07 November 2024 12:00 - 12:30
"Can't I just ask ChatGPT?" Production pipelines shaping the visual intelligence renaissance
We are entering a new era of Visual Intelligence, where the need for training custom models from scratch is rapidly diminishing.
Today, many vision tasks can be accomplished simply by leveraging pre-existing models (such as Foundation Models, MLLMs, or Open Vocabulary models), either fine-tuned or used directly off-the-shelf. Crucially, such models are becoming increasingly powerful and are entirely sufficient for a wide range of applications, including Retail Analytics, Surveillance, Media & Entertainment, Logistics, and many more.
While such models have made remarkable progress, several challenges remain in deploying them in production pipelines, including high operational costs at scale, performance variability across tasks, security concerns, and integration complexity. As a result, there is currently a wide gap between the capabilities of these models and fully developed, production-ready vision systems.
In this talk, we will discuss how to overcome these challenges and implement robust, scalable computer vision pipelines powered by the latest AI model advancements.