12 September 2024 12:30 - 13:00
Architecting for speed: Reducing overall latency in LLM use cases with effective prompting strategies and architectural techniques.
Minimising latency in large language model (LLM) applications is crucial for a responsive user experience.
This talk explores techniques for reducing overall latency through effective prompting strategies and architectural patterns. Attendees will gain insight into these techniques and learn the advantages and disadvantages of each approach.