
ICASSP Tutorial on Deep Spectrum Regression


  • Speaker: Prof. Chin-Hui Lee (李錦輝)
  • Date: 2025/11/07 (Fri.) 14:00–16:00
  • Venue: Auditorium 122, CITI (資創中心); online via Google Meet
  • Host: Yu Tsao (曹昱)
Abstract


In this tutorial, we intend to lay out theoretical foundations for deep regression, a new approach to solving signal processing problems that leverages the deep learning and big data paradigms. By Kolmogorov's Representation Theorem (1957), a multivariate scalar function can be expressed exactly as a superposition of a finite number of outer functions, each with a linear combination of inner functions embedded within. Cybenko (1989) developed a universal approximation theorem showing that such a scalar function can be approximated by a superposition of sigmoid functions, inspiring a new wave of neural network algorithms. Barron (1993) later proved that the approximation error can be tightly bounded and related to the representation power in learning theory. To make the mapping learnable and computable for practical applications, we cast the classical spectrum mapping problem, from noisy to clean spectrograms, as nonlinear regression with deep neural networks (DNNs), such that the DNN parameters can be estimated with deep learning algorithms. We develop new theorems that generalize the universal approximation theorems from multilayer perceptrons to DNNs and from vector-to-scalar to vector-to-vector regression. We also show that the generalization loss, or regression error, in machine learning can be decomposed into three terms (approximation, estimation, and optimization errors) such that each can be tightly bounded. Many classical audio processing problems, such as speech enhancement, source separation, and speech dereverberation, can be formulated as finding mapping functions that transform input to output spectra. Our theorems also provide guidelines for parameter and architecture selection in DNN design. In a series of experiments on high-dimensional nonlinear regression, we validate our theory in terms of representation and generalization powers in machine learning for spectrum mapping. As a result, DNN-transformed speech usually exhibits superior quality and intelligibility under adverse acoustic conditions. Finally, our proposed deep regression framework was also tested on recent challenging tasks in the CHiME, REVERB, and DIHARD Challenges, achieving the lowest error rates in almost all open evaluations. The framework can also be extended to microphone-array-based speech processing and speech recognition.
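For concreteness, the two results referenced above can be written compactly as follows; this is our own rendering of Cybenko's sigmoid superposition and the three-term loss decomposition, not notation taken from the tutorial itself:

\[
  f(x) \approx \sum_{i=1}^{N} \alpha_i \,\sigma\!\left(w_i^{\top} x + b_i\right),
  \qquad
  \mathcal{L}(\hat{f}) \le \varepsilon_{\text{approx}} + \varepsilon_{\text{est}} + \varepsilon_{\text{opt}},
\]

where \(\sigma\) is a sigmoid, \(\varepsilon_{\text{approx}}\) reflects the expressiveness of the chosen network class, \(\varepsilon_{\text{est}}\) the finiteness of the training data, and \(\varepsilon_{\text{opt}}\) the gap left by the training algorithm.

A minimal sketch of the vector-to-vector spectrum regression described above, written in PyTorch; the layer widths, context length, and plain MSE loss are illustrative assumptions rather than the configuration used in the tutorial's experiments:

    import torch
    import torch.nn as nn

    # Vector-to-vector regression: map stacked noisy log-power spectra
    # to the clean log-power spectrum of the center frame.
    class SpectrumMapper(nn.Module):
        def __init__(self, n_bins=257, context=7, hidden=2048):
            super().__init__()
            self.net = nn.Sequential(
                # Sigmoid hidden units, echoing Cybenko-style superposition
                nn.Linear(n_bins * context, hidden), nn.Sigmoid(),
                nn.Linear(hidden, hidden), nn.Sigmoid(),
                nn.Linear(hidden, n_bins),  # linear output layer for regression
            )

        def forward(self, noisy):
            return self.net(noisy)

    model = SpectrumMapper()
    noisy = torch.randn(32, 257 * 7)          # batch of stacked noisy input frames
    clean = torch.randn(32, 257)              # matching clean target frames
    loss = nn.MSELoss()(model(noisy), clean)  # empirical regression error
    loss.backward()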
Bio
Chin-Hui Lee is a professor at the School of Electrical and Computer Engineering, Georgia Institute of Technology. Before joining academia in 2001, he had accumulated 20 years of industrial experience, ending at Bell Laboratories, Murray Hill, as Director of the Dialogue Systems Research Department. Dr. Lee is a Fellow of the IEEE and a Fellow of ISCA. He has published over 550 papers and 30 patents, with more than 33,000 citations and an h-index of 89 on Google Scholar. He has received numerous honors, including five IEEE Signal Processing Society (SPS) Best Paper Awards, three Proceedings of the IEEE papers, the Bell Labs President's Gold Award in 1998, the SPS Technical Achievement Award for “Exceptional Contributions to the Field of Automatic Speech Recognition” in 2016, and the ISCA Medal for Scientific Achievement for “Pioneering and Seminal Contributions to the Principles and Practice of Automatic Speech and Speaker Recognition” in 2012. His two pioneering papers on deep regression for speech enhancement, published in 2014 and 2015, have accumulated over 2,000 citations, were recognized as top-downloaded papers in SPS publications, and won the SPS Best Paper Award in 2019.