12 September 2024 10:00 - 10:30
Optimizing the giants: navigating technical challenges and debt in LLMs deployment
Large Language Models (LLMs) have become an essential tool in artificial intelligence and machine learning, enabling powerful capabilities in natural language processing and understanding. However, deploying LLMs efficiently in production environments reveals a complex landscape of technical challenges and technical debt.
In this session, Ahmed will talk about the forms of technical challenge and debt specific to LLM deployment, including those related to memory management and parallelism, model compression, and attention optimization. These challenges call for a tailored approach to deploying LLMs, with sophisticated engineering solutions that are not readily available in general-purpose libraries.