:::

[TIGP-AIoT 2026 Spring Seminar] Beyond Accuracy: Diagnosing Shortcut Reasoning in Language Models through Controlled Perturbations


  • Speaker: Mr. 巴撒伊
  • Date: 2026/03/27 (Fri.) 14:00~16:00
  • Venue: Auditorium 122, Research Center for Information Technology Innovation (資創中心)
  • Host: TIGP-AIoT Program
Abstract
Benchmark accuracy has become a standard way to measure progress in language models, yet it often fails to capture whether a model is reasoning robustly or relying on superficial shortcuts. This talk presents the core idea behind the SCARE project, a perturbation-based evaluation framework designed to probe reasoning stability in arithmetic word problems. The central approach is simple: apply controlled modifications that preserve the underlying answer, then examine whether the model’s prediction remains consistent. Through perturbation families such as context padding, math-safe lexical substitution, symbolic numeric re-encoding, and premise reordering, SCARE reveals forms of brittleness that are invisible to ordinary accuracy-based evaluation. I will discuss the design principles of the framework, its role as a diagnostic tool for reasoning robustness, and broader implications for evaluating and improving trustworthy AI systems.
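
As a rough illustration of the consistency check described in the abstract, the sketch below applies two answer-preserving perturbations (context padding and premise reordering) to one arithmetic word problem and reports whether a model's prediction stays the same. It is not the SCARE implementation: the function names, the example problem, and the toy `shortcut_model` (which simply adds up every number it sees) are all assumptions made for illustration.

```python
import re
from typing import Callable, Dict

# Hypothetical type alias: a perturbation maps a problem text to a new text
# that is assumed to have the same correct answer.
Perturbation = Callable[[str], str]


def pad_context(problem: str) -> str:
    """Context padding: prepend an irrelevant sentence that leaves the answer unchanged."""
    return "The market opens at 9 in the morning. " + problem


def reorder_premises(problem: str) -> str:
    """Premise reordering: reverse the premise sentences, keeping the question last."""
    sentences = [s.strip() for s in re.findall(r"[^.?]+[.?]", problem)]
    *premises, question = sentences
    return " ".join(premises[::-1] + [question])


def consistency(
    model: Callable[[str], int],
    problem: str,
    perturbations: Dict[str, Perturbation],
) -> Dict[str, bool]:
    """For each perturbation, report whether the prediction matches the unperturbed one."""
    baseline = model(problem)
    return {name: model(p(problem)) == baseline for name, p in perturbations.items()}


def shortcut_model(problem: str) -> int:
    """Toy stand-in for a language model: a shortcut that sums every number in the text."""
    return sum(int(n) for n in re.findall(r"\d+", problem))


PROBLEM = "Ana has 3 apples. Ben gives her 4 apples. How many apples does Ana have?"
PERTURBATIONS = {
    "context_padding": pad_context,
    "premise_reordering": reorder_premises,
}

report = consistency(shortcut_model, PROBLEM, PERTURBATIONS)
print(report)
```

The toy model answers the original problem correctly, so an accuracy-only evaluation would not flag it. The padded variant introduces an irrelevant number, and the changed prediction exposes the shortcut, while the reordered variant leaves the prediction unchanged.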