
Sagar Manglani
Perception Lead, Autonomy
Teleo
Sagar Manglani is the Perception Lead at Teleo, a startup where they lead development of the company's perception stack to automate heavy machinery in off-road environments. Over the years, Sagar has taken machine learning projects to production in both startup and large-scale corporate settings, earning promotions at every stage, a record that reflects their ability to deliver results across diverse teams and cultures.

With more than a decade of experience, including six years as a Senior Research Engineer at Ford Motor Company, Sagar played a pivotal role in building Ford's first automated vehicle system for manufacturing plants and the first L2+ monocular vision-based autonomy perception stack for Ford BlueCruise. Sagar's goal is to develop autonomous solutions that are both safe and reliable.

Sagar's expertise spans machine learning, computer vision, and robotics. They have worked on 2D and 3D object detection, scene segmentation, and depth estimation, handling everything from model development to deployment on embedded devices used in production. Their work is also reflected in multiple publications and patents.
27 August 2025 15:00 - 15:20
Leveraging vision language models (VLMs) for automated image & video data annotation
Recent developments in vision language models and foundation models for segmentation enable a coherent framework for the automated annotation of image and video datasets. Vision-based large language models are capable of open-vocabulary detection: from a single prompt they can identify a broad range of objects, producing bounding boxes that can be used directly to generate grounded segmentation. Multi-modal large language models can further refine this process by validating the accuracy of segmentation results against predefined thresholds, ensuring high quality in the resulting annotations. By combining vision-based LLMs for open-vocabulary detection, automated segmentation, and LLM-based verification, this methodology provides a systematic and scalable solution for preparing annotated datasets. Annotation sequences for an uncommon object will be presented to demonstrate the strength of the method.
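To make the pipeline concrete, here is a minimal sketch of the three stages the abstract describes: open-vocabulary detection, box-grounded segmentation, and LLM-based verification. The choice of Grounding DINO and SAM (loaded through Hugging Face Transformers), the verify helper, and the input file frame.jpg are assumptions for illustration, not necessarily the speaker's actual stack; the verification stage is left as a stub because the abstract does not name a specific multimodal model or API.

```python
# A minimal sketch, assuming Grounding DINO for open-vocabulary detection
# and SAM for grounded segmentation, both loaded through Hugging Face
# Transformers. The multimodal-LLM verification stage is a stub because
# the abstract does not specify a model or API.
import torch
from PIL import Image
from transformers import (
    AutoModelForZeroShotObjectDetection,
    AutoProcessor,
    SamModel,
    SamProcessor,
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Stage 1: open-vocabulary detection from a single text prompt.
det_processor = AutoProcessor.from_pretrained("IDEA-Research/grounding-dino-base")
det_model = AutoModelForZeroShotObjectDetection.from_pretrained(
    "IDEA-Research/grounding-dino-base"
).to(device)

# Stage 2: promptable segmentation grounded on the detected boxes.
sam_processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
sam_model = SamModel.from_pretrained("facebook/sam-vit-base").to(device)


def detect(image: Image.Image, prompt: str) -> list:
    """Return [x0, y0, x1, y1] boxes for every object matching the prompt."""
    inputs = det_processor(images=image, text=prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = det_model(**inputs)
    results = det_processor.post_process_grounded_object_detection(
        outputs, inputs.input_ids, target_sizes=[image.size[::-1]]
    )[0]
    return results["boxes"].tolist()


def segment(image: Image.Image, boxes: list) -> torch.Tensor:
    """Turn each detected box into pixel-level masks via SAM."""
    inputs = sam_processor(image, input_boxes=[boxes], return_tensors="pt").to(device)
    with torch.no_grad():
        outputs = sam_model(**inputs)
    return sam_processor.image_processor.post_process_masks(
        outputs.pred_masks.cpu(),
        inputs["original_sizes"].cpu(),
        inputs["reshaped_input_sizes"].cpu(),
    )[0]


def verify(image: Image.Image, mask: torch.Tensor, label: str) -> bool:
    """Hypothetical stage 3: query a multimodal LLM to score the mask
    against the label and accept it only above a predefined threshold."""
    raise NotImplementedError("plug in a multimodal LLM of your choice")


if __name__ == "__main__":
    frame = Image.open("frame.jpg")  # hypothetical input frame
    # Grounding DINO expects lowercase, period-terminated phrases.
    boxes = detect(frame, "an excavator.")
    if boxes:
        masks = segment(frame, boxes)
        # annotations = [m for m in masks if verify(frame, m, "excavator")]
```

Keeping verification as its own stage means low-confidence masks can be routed to a human reviewer rather than silently entering the dataset, which is how the quality thresholds mentioned in the abstract would plausibly be enforced.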