15 September 2026 11:00 - 13:00
A Hands-on Introduction to Multimodality
Multimodal AI is rapidly changing how intelligent systems interact with the world, moving beyond text-only models into systems that can understand and reason across images, audio, video, and language simultaneously.
But building reliable multimodal applications in production introduces new challenges around model orchestration, data pipelines, inference performance, and real-time interaction design.
In this hands-on workshop, NVIDIA's Antonio Rueda-Toicen will provide a practical introduction to multimodal AI development, showing developers and technical teams how to build and experiment with applications that combine multiple data types using modern AI tooling and accelerated infrastructure.
Attendees will explore:
• The fundamentals of multimodal AI and how multimodal models process different inputs
• Real-world use cases across vision, speech, video, and generative AI applications
• How to build multimodal workflows using NVIDIA's AI ecosystem and developer tools
• Key considerations around latency, inference optimisation, and GPU acceleration
• Practical demonstrations and interactive exercises for building multimodal applications
This workshop is designed for AI engineers, developers, ML practitioners, and technical leaders looking to better understand how multimodal systems are built, deployed, and scaled in real-world environments.