
Ayushi Agarwal
Research Engineer
Google DeepMind
Ayushi is a Research Engineer at DeepMind, working on the design and optimisation of large-scale AI systems. Her work focuses on improving the performance, efficiency, and real-world deployment of large language models, bridging cutting-edge research with production-ready applications.
26 August 2026 11:00 - 12:00
Workshop | Deploying and auto-scaling LLM inference on Kubernetes
Getting an LLM application to work is one thing; scaling it reliably, efficiently, and cost-effectively is where most teams hit friction. This workshop breaks down what actually matters when moving from prototype to production. Ayushi will walk through the architectural decisions behind scalable LLM systems, how to optimize performance across latency and throughput, and where costs quietly spiral if left unmanaged. Expect a practical view into how leading teams structure inference pipelines, manage model workloads, and balance trade-offs between quality, speed, and cost.

Key takeaways:
→ How to design LLM architectures that scale without breaking under real-world demand
→ Techniques to optimize latency, throughput, and system performance
→ Where costs typically inflate and how to control them early
→ Trade-offs between model choice, infrastructure, and user experience
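As a flavour of the Kubernetes material, here is a minimal autoscaling sketch (not the workshop's actual configuration): a HorizontalPodAutoscaler targeting a hypothetical `llm-inference` Deployment, scaling on CPU utilisation. In real LLM serving, teams often scale on GPU utilisation or request-queue depth via custom metrics instead, which is one of the trade-offs the session covers.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-inference   # hypothetical Deployment serving the model
  minReplicas: 2          # keep warm capacity to avoid cold-start latency
  maxReplicas: 10         # cap spend under bursty demand
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # scale out before pods saturate
```

The `minReplicas`/`maxReplicas` bounds are where the cost-versus-latency trade-off shows up in practice: a higher floor buys predictable latency at idle cost, a lower ceiling caps spend at the risk of queuing under load.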