26 August 2026 11:00 - 11:30
Is it really the model, or is it a systems problem?
A single prompt is manageable. Add tools, memory, and multi-step workflows, and the system starts behaving differently under real conditions. Small inconsistencies compound, dependencies become harder to track, and issues surface in places you weren’t looking.
This session focuses on where orchestration starts to break down in practice, how complexity builds across systems, why it’s difficult to debug, and what teams are doing to make these workflows more stable.
Key takeaways:
→ Where orchestration introduces failure points as systems become more interconnected
→ Why multi-step workflows are harder to reason about than they appear
→ What teams are doing to reduce fragility across complex systems
26 August 2026 11:00 - 11:30
Agentic AI evaluation challenges
Agentic AI introduces a new set of evaluation challenges that go well beyond traditional LLM benchmarking.
When a model plans, calls tools, recovers from errors, and operates over long horizons, the question shifts from "is this answer correct?" to "did this trajectory accomplish the goal safely, efficiently, and for the right reasons?"
This talk surveys the practical challenges of evaluating agents in production: why static benchmarks saturate and mispredict real-world behavior, how path-dependence and multiple valid solutions complicate scoring, and why trajectory-level metrics (steps, tool calls, cost, latency) often matter as much as final-task success.