15 April 2026 13:50 - 14:20
Failure, recovery, and containment in autonomous systems
Autonomous systems rarely fail cleanly.
Instead, they stall mid-trajectory, loop on partial goals, or leave the services they touch in inconsistent states after taking irreversible actions.
These failures are difficult to detect, harder to recover from, and often only surface once real users and real infrastructure are involved.
This session examines how engineering teams design for failure in autonomous systems operating in production. We’ll discuss how teams detect and contain failures as they happen, recover safely without losing intent or corrupting state, and limit blast radius when agents interact with multiple tools and services.
The focus is on treating recovery as a first-class systems concern rather than an afterthought added once autonomy is already in place.