15 April 2025 16:00 - 16:30
Panel | What breaks when GenAI scales: Latency, cost, and reliability in the real world
As GenAI adoption grows, the infrastructure behind it comes under constant strain. Latency spikes, unpredictable traffic, rising inference costs, and brittle retrieval layers often surface long after a system looks stable in testing.
This session explores how engineering teams are redesigning serving layers, data pipelines, and performance workflows to keep GenAI systems fast, affordable, and reliable at scale.
Key takeaways:
→ How teams reduce latency under real-world load
→ The cost impact of routing, batching, and caching decisions
→ Where retrieval layers and vector search introduce scaling limits
→ Architectural choices that improve reliability as usage grows