
Rethinking Benchmarks in ML Applications


  • Speaker: Tsung-Han Wu (吳宗翰)
  • Date: 2025/01/08 (Wed.) 13:30–15:30
  • Venue: Lecture Hall 122, Research Center for Information Technology Innovation (資創中心); also available via videoconference
  • Host: Jun-Cheng Chen (陳駿丞)
Online meeting link:
Webex meeting link
Meeting number: 2512 951 9728
Password: yAxWpaPm683

Abstract


Benchmarks are vital for driving progress across AI applications, serving as a foundation for defining success and inspiring innovation. In the post-ChatGPT era, their design faces new challenges due to the growing capabilities of large models and increasingly complex tasks. This talk highlights two key principles for creating effective benchmarks: evaluation metrics and dataset design. On the evaluation front, we explore the shift from traditional, objective metrics to human-aligned metrics, exemplified by the "CLAIR-A" case study on LLMs as evaluators. For dataset design, we emphasize diverse, representative, and controlled datasets, illustrated by the "Visual Haystacks" case study for long-context visual understanding. Together, these approaches enable benchmarks to better reflect real-world challenges and drive meaningful AI progress.
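To make the "LLMs as evaluators" idea concrete, below is a minimal Python sketch of a generic LLM-as-judge metric that scores a candidate caption against a reference. It illustrates the general approach only, not the CLAIR-A method itself; the OpenAI client, the gpt-4o-mini model name, and the prompt wording are assumptions made for this example.

```python
# Minimal sketch of an LLM-as-judge metric (illustrative only; not the CLAIR-A method).
# Assumes an OpenAI-compatible client; the model name and prompt wording are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def llm_judge_score(candidate: str, reference: str, model: str = "gpt-4o-mini") -> float:
    """Ask an LLM to rate how well a candidate caption matches a reference, on a 0-1 scale."""
    prompt = (
        "You are grading a candidate caption against a reference caption.\n"
        f"Reference: {reference}\n"
        f"Candidate: {candidate}\n"
        "Reply with a single integer from 0 (no match) to 100 (perfect match)."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Parse the integer in the reply; fall back to 0 if parsing fails.
    text = resp.choices[0].message.content.strip()
    digits = "".join(ch for ch in text if ch.isdigit())
    return float(digits) / 100.0 if digits else 0.0


# Example usage:
# score = llm_judge_score("a dog barking in the rain", "rain falls while a dog barks")
# print(score)
```

Unlike n-gram overlap metrics, a judge of this kind can credit paraphrases that humans would accept, which is the shift toward human-aligned evaluation the talk examines.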
Bio
Tsung-Han (Patrick) Wu is a second-year CS PhD student at UC Berkeley, advised by Prof. Trevor Darrell and Prof. Joseph E. Gonzalez. His recent work focuses on exploring zero-shot applications of large (vision) language models and addressing their limitations. Before becoming a PhD student, he earned an MS and BS in Computer Science and Information Engineering from National Taiwan University (NTU). For more information, please visit his personal website: https://tsunghan-wu.github.io/.