04 June 2026 16:00 - 16:30
Evaluating autonomous agents: Bridging the gap between testing and real-world performance
Evaluating autonomous agents is harder than evaluating static models or prompt-based systems. Their behavior unfolds over sequences of actions, interacts with tools and environments, and can shift in live conditions in ways offline tests miss.
In this panel, engineers share how they measure agent behavior, analyze full trajectories rather than single outputs, and identify the signals that matter in real-world contexts. They offer candid perspectives on what works, what doesn't, and the evaluation challenges that remain.
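To make the trajectory-versus-output distinction concrete, here is a minimal, purely illustrative sketch (not drawn from the panel itself). All names here are hypothetical: a `Step` record per agent action and a `score_trajectory` function that blends an outcome signal with process signals along the way.

```python
# Illustrative sketch only: scoring a whole trajectory instead of a single
# final output. Step, tool_call_ok, and score_trajectory are hypothetical
# names, not from any particular evaluation framework.
from dataclasses import dataclass

@dataclass
class Step:
    action: str          # e.g. a tool call or a final answer
    tool_call_ok: bool   # did the tool invocation succeed?

def score_trajectory(steps: list[Step], goal_reached: bool) -> float:
    """Blend an outcome signal with process signals along the trajectory."""
    if not steps:
        return 0.0
    # Process signal: fraction of tool calls that executed cleanly.
    process = sum(s.tool_call_ok for s in steps) / len(steps)
    # Outcome signal: did the agent reach the goal at all?
    outcome = 1.0 if goal_reached else 0.0
    # A single-output evaluator would look only at `outcome`; a
    # trajectory-level evaluator also weighs how the agent got there.
    return 0.5 * outcome + 0.5 * process

# Example: the agent reached the goal, but one of three tool calls failed,
# so the trajectory score is penalized even though the outcome succeeded.
steps = [Step("search", True), Step("fetch", False), Step("answer", True)]
print(score_trajectory(steps, goal_reached=True))  # ~0.833
```

The 50/50 weighting is arbitrary; the panel's point is simply that a trajectory view surfaces failures (flaky tool calls, wasted steps, drift in live conditions) that an output-only score would hide.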