Abstract
Benchmark accuracy has become a standard way to measure progress in language models, yet it often fails to capture whether a model is reasoning robustly or relying on superficial shortcuts. This talk presents the core idea behind the SCARE project, a perturbation-based evaluation framework designed to probe reasoning stability in arithmetic word problems. The central approach is simple: apply controlled modifications that preserve the underlying answer, then examine whether the model’s prediction remains consistent. Through perturbation families such as context padding, math-safe lexical substitution, symbolic numeric re-encoding, and premise reordering, SCARE reveals forms of brittleness that are invisible to ordinary accuracy-based evaluation. I will discuss the design principles of the framework, its role as a diagnostic tool for reasoning robustness, and broader implications for evaluating and improving trustworthy AI systems.
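To make the consistency criterion concrete, the sketch below illustrates the answer-preserving perturbation check in Python. The `Solver` interface, the `pad_context` perturbation, and all other names are illustrative assumptions for this abstract, not the actual SCARE implementation.

```python
from typing import Callable, Iterable, Protocol


class Solver(Protocol):
    # Hypothetical interface: anything that maps a word problem to an answer string.
    def predict(self, problem: str) -> str: ...


def pad_context(problem: str) -> str:
    """Context padding: prepend irrelevant but harmless text; the answer is unchanged."""
    return "Earlier that morning it had rained. " + problem


def is_consistent(
    solver: Solver,
    problem: str,
    gold_answer: str,
    perturbations: Iterable[Callable[[str], str]],
) -> bool:
    """Return True only if the prediction survives every answer-preserving perturbation."""
    if solver.predict(problem) != gold_answer:
        return False  # already wrong on the unperturbed problem
    return all(
        solver.predict(perturb(problem)) == gold_answer
        for perturb in perturbations
    )
```

Under this framing, a model that answers the original problem correctly but flips its prediction after, say, premise reordering is counted as brittle even though plain accuracy would record a success.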