Research Center for Information Technology Innovation, Academia Sinica

Fundamentals, Prospectives and Challenges in Deep-learning based Voice Conversion

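Speaker: Prof. Wen-Chin Huang (黃文勁)
Date: 2024/08/14 (Wed.) 10:30~12:30
Venue: Auditorium 122, Research Center for Information Technology Innovation; also held online
Host: Yu Tsao (曹昱)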

Online meeting details:
Webex meeting link
Meeting number: 2519 558 2881
Password: 4TfmipmKK66

Abstract

Voice conversion (VC) refers to the task of converting certain desired attributes between two speech utterances without changing the linguistic content. VC has a wide range of applications, from entertainment and education to medical solutions. While the earliest VC research started forty years ago, the vast progress of deep learning in the past decade has greatly benefited the development of recent VC technologies. However, although many consider VC a solved task, plenty of challenges remain. In this talk, I will first introduce the fundamentals of VC and how deep learning has changed the game. Then, I will talk about prospectives to consider when designing a VC system for a particular application. Finally, I will share my thoughts on the remaining challenges and future directions in VC.

Bio

Wen-Chin Huang received the B.S. degree from National Taiwan University, Taiwan, in 2018 and the M.S. and Ph.D. degrees from Nagoya University, Japan, in 2021 and 2024, respectively. He is currently an assistant professor at the Graduate School of Informatics, Nagoya University, Japan. He was a co-organizer of the Voice Conversion Challenge 2020, the Singing Voice Conversion Challenge 2023, and the VoiceMOS Challenge 2022, 2023, and 2024. His main research interest is speech processing, with a focus on speech generation and related fields, including voice conversion and speech quality assessment. He was the recipient of the Best Student Paper Award at ISCSLP 2018, the Best Paper Award at APSIPA ASC 2021, and the 16th IEEE Signal Processing Society Japan Best Student Journal Paper Award.