Divya van Mahajan

Tag: Long-Horizon

Evaluating Long-Horizon Agent Performance: The Reality of Autonomous Business

April 12, 2026 • Divya van Mahajan

An exploration of long-horizon AI evaluations such as Vending-Bench 2, showing where modern LLMs thrive and where they break down over year-long operations.

© 2026 Van Mahajan Consulting. All rights reserved. Powered by Astro