
Rethinking Benchmarks in ML Applications


  • Speaker: Tsung-Han Wu (吳宗翰)
  • Date: 2025/01/08 (Wed.) 13:30–15:30
  • Venue: Lecture Hall 122, Research Center for Information Technology Innovation (資創中心); also available via videoconference
  • Host: Jun-Cheng Chen (陳駿丞)
Online meeting link:
Webex meeting link
Meeting number: 2512 951 9728
Password: yAxWpaPm683

Abstract


Benchmarks are vital for driving progress across AI applications, serving as a foundation for defining success and inspiring innovation. In the post-ChatGPT era, their design faces new challenges due to the growing capabilities of large models and increasingly complex tasks. This talk highlights two key principles for creating effective benchmarks: evaluation metrics and dataset design. On the evaluation front, we explore the shift from traditional, objective metrics to human-aligned metrics, exemplified by the "CLAIR-A" case study on LLMs as evaluators. For dataset design, we emphasize diverse, representative, and controlled datasets, illustrated by the "Visual Haystacks" case study for long-context visual understanding. Together, these approaches enable benchmarks to better reflect real-world challenges and drive meaningful AI progress.
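To make the "LLMs as evaluators" idea concrete, below is a minimal Python sketch of a generic LLM-as-judge metric that scores a candidate caption against a reference. It illustrates the general approach only, not the CLAIR-A method itself; the OpenAI client, the gpt-4o-mini model name, and the prompt wording are assumptions made for this example.

```python
# Minimal sketch of an LLM-as-judge metric (illustrative only; not the CLAIR-A method).
# Assumes an OpenAI-compatible client; the model name and prompt wording are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def llm_judge_score(candidate: str, reference: str, model: str = "gpt-4o-mini") -> float:
    """Ask an LLM to rate how well a candidate caption matches a reference, on a 0-1 scale."""
    prompt = (
        "You are grading a candidate caption against a reference caption.\n"
        f"Reference: {reference}\n"
        f"Candidate: {candidate}\n"
        "Reply with a single integer from 0 (no match) to 100 (perfect match)."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Parse the integer in the reply; fall back to 0 if parsing fails.
    text = resp.choices[0].message.content.strip()
    digits = "".join(ch for ch in text if ch.isdigit())
    return float(digits) / 100.0 if digits else 0.0


# Example usage:
# score = llm_judge_score("a dog barking in the rain", "rain falls while a dog barks")
# print(score)
```

Unlike n-gram overlap metrics, a judge of this kind can credit paraphrases that humans would accept, which is the shift toward human-aligned evaluation the talk examines.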
Bio
Tsung-Han (Patrick) Wu is a second-year CS PhD student at UC Berkeley, advised by Prof. Trevor Darrell and Prof. Joseph E. Gonzalez. His recent work focuses on exploring zero-shot applications of large (vision) language models and addressing their limitations. Before becoming a PhD student, he earned an MS and BS in Computer Science and Information Engineering from National Taiwan University (NTU). For more information, please visit his personal website: https://tsunghan-wu.github.io/.