02 October 2025 15:20 - 15:40
Panel discussion | LLM Evaluation: frameworks, metrics, and best practices
This panel dissects LLM evaluation: the frameworks, metrics, and tooling used to assess real-world performance.
Panelists will explore trade-offs between automatic and human-in-the-loop methods, evaluate alignment with task-specific goals, and share lessons from production-grade testing pipelines.
Expect a grounded, hands-on discussion of what actually works when benchmarking LLMs for reliability, safety, and business impact.