Shashank Kapadia
Staff Machine Learning Engineer
Walmart Global Tech
Shashank Kapadia is a machine learning engineering leader specializing in large-scale AI solutions that drive measurable improvements in user engagement and business outcomes. With over a decade of hands-on experience at global organizations like Walmart, Randstad, and Monster Worldwide, he has pioneered cutting-edge ML solutions that optimize revenue, boost engagement, and streamline decision-making. Shashank’s approach balances technical rigor with ethical responsibility. He champions fairness, transparency, and real-world relevance, ensuring solutions serve both the enterprise and the broader community. An active mentor and thought leader, he has spoken at global conferences, judged and mentored hackathons, authored widely read articles on NLP, and co-authored published research, guiding teams to award-winning results. A valedictorian graduate in Operations Research from Northeastern University, Shashank continues to push the boundaries of ML innovation. His work combines cutting-edge techniques, high-level strategy, and values-driven execution, advancing technology that’s as impactful as it is responsible.
15 April 2026 13:50 - 14:20
Failure, recovery, and containment in autonomous systems
Autonomous systems rarely fail cleanly. Instead, they stall mid-trajectory, loop on partial goals, or leave systems in inconsistent states after taking irreversible actions. These failures are difficult to detect, harder to recover from, and often only surface once real users and real infrastructure are involved. This session examines how engineering teams design for failure in autonomous systems operating in production. We’ll discuss how teams detect and contain failures as they happen, recover safely without losing intent or corrupting state, and limit blast radius when agents interact with multiple tools and services. The focus is on recovery as a first-class systems concern, rather than an afterthought once autonomy is already in place.