15 April 2026 16:00 - 16:30
Scaling and optimizing agentic workflows
As agentic systems move from prototype to production, managing inference costs, latency, and throughput becomes a critical engineering challenge. This technical session explores three practical architectural patterns for optimizing large language model operations in autonomous agents.
First, we will dive into query-aware routing, demonstrating how to dynamically match incoming agent tasks with the most efficient model size without sacrificing capability.
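As a preview of the routing idea, here is a minimal sketch. The model identifiers, keyword list, and length threshold are illustrative assumptions, not material from the session: a production router would typically use a small classifier or an LLM-based judge rather than keyword heuristics.

```python
# Query-aware routing sketch: pick a model tier per task based on rough
# complexity signals. All names and thresholds here are hypothetical.

MODEL_TIERS = {
    "small": "small-model-v1",   # cheap, fast model (placeholder name)
    "large": "large-model-v1",   # capable, expensive model (placeholder name)
}

# Keywords that loosely signal reasoning-heavy work (illustrative only).
COMPLEX_KEYWORDS = {"plan", "analyze", "prove", "refactor", "multi-step"}

def route(task: str) -> str:
    """Return a model identifier based on simple complexity heuristics."""
    words = task.lower().split()
    # Long tasks, or tasks containing reasoning-heavy keywords,
    # are sent to the large model; everything else goes to the small one.
    is_complex = len(words) > 40 or any(
        w.strip(".,") in COMPLEX_KEYWORDS for w in words
    )
    return MODEL_TIERS["large" if is_complex else "small"]

print(route("Summarize this paragraph."))                        # small-model-v1
print(route("Plan a multi-step refactor of the auth module."))   # large-model-v1
```

The point of the pattern is that the routing decision is cheap relative to the inference it avoids, so even a crude classifier can pay for itself.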
Next, we will cover semantic caching strategies that bypass redundant model calls for frequently executed tasks, drastically reducing latency. Finally, we will examine advanced prompt-design techniques for token-efficient reasoning, contrasting them with traditional verbose reasoning methods to show how to minimize computational overhead while preserving logical rigor.
By the end of this session, participants will have concrete methodologies to engineer highly efficient and economically viable agentic architectures.