Naman Goyal
Machine Learning Engineer
Google DeepMind
Naman Goyal is a Machine Learning Software Engineer at Google DeepMind, where he is a foundational member of the team developing Gemini's Deep Research capabilities. His work focuses on enabling the model to handle complex, multi-step research tasks by formulating intricate plans, analyzing diverse online sources, and synthesizing comprehensive reports. He is also involved in applied research to enhance reasoning, planning, and instruction-following capabilities. Prior to DeepMind, Naman held roles at NVIDIA, where he worked on the perception stack for autonomous vehicles, and at Apple, where he focused on multimodal learning for visually rich document understanding. He was also a Computer Vision Research Fellow at Adobe, where he developed novel training strategies for deep metric learning and image retrieval. Naman holds a Master's degree in Computer Science from Columbia University, where his thesis explored multimodal learning and on-device natural language processing for smart replies aimed at multilingual speakers in the context of code-switching. He is a co-author of the paper "Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities."
15 April 2026 14:00 - 14:30
Taming non-determinism: A framework for evaluation and observability in autonomous agent trajectories
Deploying agentic AI in production introduces a unique engineering challenge: debugging non-deterministic execution paths. Unlike traditional software, an agent's "code" is a dynamic interplay of prompt context, model weights, and external tool outputs. This talk presents a rigorous engineering methodology for agents, focusing on quantifying agent performance and analyzing trajectories.

Key takeaways:

→ Trajectory evaluations: analyzing the "reasoning trace" with a secondary judge model to detect hallucinated logic steps or tool misuse (see the sketch after this list).
→ Cost-latency trade-offs: optimizing token usage via dynamic context compression and speculative execution of tool calls.
→ Sandboxing & side effects: technical implementation of ephemeral execution environments to safely contain agentic code execution.
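To make the first takeaway concrete, the snippet below is a minimal sketch of step-level trajectory evaluation with a secondary judge model. It is illustrative only and not taken from the talk: the `Step` dataclass, `build_judge_prompt`, `evaluate_trajectory`, and the pluggable `judge` callable are hypothetical names, and a real harness would add a scoring rubric, structured verdict parsing, and aggregation across many trajectories.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One step of an agent trajectory: reasoning, the tool call made, and its output."""
    reasoning: str
    tool_name: str
    tool_args: dict
    tool_output: str


def build_judge_prompt(task: str, step: Step) -> str:
    """Render one trajectory step into a prompt for a secondary judge model."""
    return (
        f"Task: {task}\n"
        f"Agent reasoning: {step.reasoning}\n"
        f"Tool called: {step.tool_name}({step.tool_args})\n"
        f"Tool output: {step.tool_output}\n"
        "Is the reasoning sound, and does it justify this tool call given the output? "
        "Answer PASS or FAIL, followed by a one-line explanation."
    )


def evaluate_trajectory(task: str, trajectory: List[Step],
                        judge: Callable[[str], str]) -> float:
    """Score a trajectory as the fraction of steps the judge marks PASS.

    `judge` is any callable that sends a prompt to the judge model and returns
    its text response (e.g. a thin wrapper around an LLM API client).
    """
    if not trajectory:
        return 0.0
    verdicts = [judge(build_judge_prompt(task, s)) for s in trajectory]
    passes = sum(v.strip().upper().startswith("PASS") for v in verdicts)
    return passes / len(trajectory)
```

Injecting the judge as a plain callable keeps the sketch model-agnostic and easy to unit-test with a stubbed judge before wiring it to a real model endpoint.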