15 April 2026, 12:00–12:30
Panel | Evaluating autonomous agents: Closing the gap between tests and real-world behaviour
Evaluating autonomous agents is fundamentally harder than evaluating static models or prompt-based systems.
Behaviour unfolds over sequences of actions, interacts with tools and environments, and changes under real traffic in ways that are difficult to capture with offline tests alone.
In this panel, engineers and system builders will compare how they evaluate agent behaviour in practice. The discussion will explore where traditional testing breaks down, how teams reason about whole trajectories rather than single outputs, and which signals matter most once agents are operating in dynamic, real-world environments.
Expect candid perspectives on what works, what doesn’t, and where evaluation remains an open problem for agentic systems.