20 November 2025 16:30 - 17:00
Coding with agents: Deploy-ready workflows using AI at the CLI
This talk presents GeoCoder, a novel approach to solving geometry problems in Visual Question Answering (VQA) using vision-language models (VLMs) that generate and execute modular code.
Traditional VLMs often struggle to calculate precisely and to apply the correct formulas. GeoCoder addresses this by fine-tuning a VLM (LLaVA 1.5 7B) to produce executable code, improving accuracy and interpretability.
We demonstrate how this semi-parametric method outperforms chain-of-thought reasoning across problems of varying complexity, using retrieval-augmented generation and code execution to produce more reliable solutions.
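
To make the generate-and-execute idea concrete, here is a minimal Python sketch, not GeoCoder's actual implementation. It assumes a small library of geometry functions and a string standing in for model output; the function names (`pythagoras_hypotenuse`, `circle_area`, `triangle_area`, `run_generated_code`) and the convention of storing the result in a variable called `answer` are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical function library; the names are illustrative, not GeoCoder's.
def pythagoras_hypotenuse(a: float, b: float) -> float:
    """Length of the hypotenuse of a right triangle with legs a and b."""
    return math.hypot(a, b)

def circle_area(radius: float) -> float:
    """Area of a circle with the given radius."""
    return math.pi * radius ** 2

def triangle_area(base: float, height: float) -> float:
    """Area of a triangle from its base and height."""
    return 0.5 * base * height

LIBRARY = {f.__name__: f for f in (pythagoras_hypotenuse, circle_area, triangle_area)}

def run_generated_code(code: str) -> float:
    """Run model-written code with only the library in scope and return `answer`.

    Emptying __builtins__ narrows what the code can call, but it is not a
    real sandbox; untrusted code needs proper isolation.
    """
    namespace = {"__builtins__": {}, **LIBRARY}
    exec(code, namespace)
    return namespace["answer"]

# Stand-in for model output (assumed) for the question:
# "A right triangle has legs 3 and 4. Find the hypotenuse."
generated = "answer = pythagoras_hypotenuse(3, 4)"
result = run_generated_code(generated)
print(result)  # 5.0
```

Because the arithmetic runs in the interpreter rather than in the model's text output, the model only has to pick the right function and arguments, and the generated line itself documents which formula was applied.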