Abstract
As LLMs are customized, maintaining the balance between helpfulness and safety is vital. This talk introduces a modular paradigm for alignment using weight arithmetic. I will first present the Preference Vector framework, which enables test-time control over multi-preference alignment by merging behavior-specific vectors without retraining. I will then demonstrate how merging pre- and post-fine-tuned weights effectively restores safety guardrails lost during downstream adaptation, reducing attack success rates while improving performance. Together, these methods offer a scalable, data-efficient approach to safeguarding customized AI systems through efficient parameter-space operations.
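The parameter-space operations mentioned above can be illustrated with a toy sketch. This is not the talk's actual implementation; the dict-of-lists "models", the `preference_vector` and `merge` helpers, and the coefficient values are all hypothetical, but the core idea matches the abstract: a behavior-specific vector is the difference between fine-tuned and base weights, and merging adds scaled vectors back to the base model without retraining.

```python
# Toy sketch of preference-vector merging (hypothetical helpers and weights).
# A "model" here is just a dict mapping parameter names to float lists.

def preference_vector(tuned, base):
    """Behavior vector = fine-tuned weights minus base weights."""
    return {k: [t - b for t, b in zip(tuned[k], base[k])] for k in base}

def merge(base, vectors, coeffs):
    """Add scaled preference vectors to the base model at test time."""
    merged = {k: list(v) for k, v in base.items()}
    for vec, c in zip(vectors, coeffs):
        for k in merged:
            merged[k] = [w + c * d for w, d in zip(merged[k], vec[k])]
    return merged

base = {"layer": [0.0, 1.0]}
helpful = {"layer": [0.5, 1.5]}  # weights fine-tuned for helpfulness
safe = {"layer": [-0.2, 1.0]}    # weights fine-tuned for safety

vecs = [preference_vector(helpful, base), preference_vector(safe, base)]
model = merge(base, vecs, coeffs=[1.0, 0.5])
print([round(w, 6) for w in model["layer"]])  # → [0.4, 1.5]
```

Because the coefficients are applied at merge time, the helpfulness/safety trade-off can be re-tuned by re-running `merge` with different weights, with no gradient updates involved.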
Bio
Shang-Tse Chen is an Associate Professor in the Department of Computer Science and Information Engineering at National Taiwan University. He works at the intersection of applied and theoretical machine learning, with a strong application focus on cybersecurity. His research has led to patented cyber threat detection technology with Symantec, open-sourced adversarial attack and defense tools with Intel, and a deployed fire risk prediction system with the Atlanta Fire Rescue Department. He received the K. T. Li Young Researcher Award in 2025. His recent research interests center on the security, privacy, and fairness of machine learning models.