Friday, June 18, 2021, 10:30 AM | 2 hours | (UTC+08:00) Taipei
Meeting number: 184 943 4771
Password: sJbYYm2EM57
Based on Kolmogorov's Representation Theorem (1957), any multivariate scalar function can be expressed as a finite superposition of continuous outer functions composed with sums of continuous inner functions. Cybenko (1989) turned this function representation into a function approximation problem and established a Universal Approximation Theorem, showing that a multivariate scalar function can be approximated by a superposition of sigmoid functions. Barron (1993) later proved that the approximation error can be tightly bounded in terms of the number of sigmoid units. These results are now understood as characterizing the representation power of neural networks. To make such functions learnable in practical applications, we cast the classical function approximation problem into a regression setting using deep neural networks (DNNs), so that DNN parameters can be estimated with machine learning algorithms. We first prove theorems that generalize the Universal Approximation Theorem from sigmoid networks to DNNs and from vector-to-scalar to vector-to-vector regression; we call this framework deep regression. We also show that the generalization loss, or regression error, in machine learning can be decomposed into three terms, an approximation error, an estimation error, and an optimization error, each of which can be individually and tightly bounded. Next, we formulate classical spectral mapping problems, such as speech enhancement and source separation, as deep regression, and incorporate the emerging big-data paradigm to simulate a large collection of input/output vector pairs for high-dimensional nonlinear regression. In a series of experiments, we validate our theory in terms of the representation and generalization powers in learning. Our theorems also provide guidelines for parameter and architecture selection in DNN design. As a result, DNN-enhanced speech usually demonstrates good quality and clear intelligibility under adverse acoustic conditions.
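The vector-to-vector regression setting described above can be illustrated with a minimal sketch: a single hidden layer of sigmoid units, trained by full-batch gradient descent on a mean-squared-error loss over simulated input/output pairs. This is not the speaker's actual system; the target function, layer width, learning rate, and iteration count below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def target(x):
    # A smooth nonlinear vector-to-vector map f: R^2 -> R^2 to regress onto.
    return np.stack([np.sin(x[:, 0]) * np.cos(x[:, 1]),
                     0.5 * (x[:, 0] ** 2 - x[:, 1] ** 2)], axis=1)

# Simulate a collection of input/output vector pairs (the big-data
# paradigm in miniature).
X = rng.uniform(-1.0, 1.0, size=(2000, 2))
Y = target(X)

# One hidden layer of sigmoid units, linear output layer -- the
# superposition form appearing in the Universal Approximation Theorem.
H = 64                                    # number of sigmoid units
W1 = rng.normal(0.0, 1.0, (2, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.1, (H, 2)); b2 = np.zeros(2)

initial_mse = np.mean((sigmoid(X @ W1 + b1) @ W2 + b2 - Y) ** 2)

lr = 0.1
for step in range(2000):
    A = sigmoid(X @ W1 + b1)              # hidden activations
    P = A @ W2 + b2                       # network predictions
    E = P - Y                             # regression residual
    # Backpropagate the MSE loss through the linear and sigmoid layers.
    gP = 2.0 * E / len(X)
    gW2, gb2 = A.T @ gP, gP.sum(0)
    gZ = (gP @ W2.T) * A * (1.0 - A)      # sigmoid derivative A*(1-A)
    gW1, gb1 = X.T @ gZ, gZ.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

final_mse = np.mean((sigmoid(X @ W1 + b1) @ W2 + b2 - Y) ** 2)
print(f"MSE before: {initial_mse:.4f}, after: {final_mse:.4f}")
```

Widening the hidden layer (increasing `H`) trades estimation cost for a smaller approximation error, in the spirit of Barron's bound on the number of sigmoid units.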
Finally, the proposed deep regression framework was also tested on recent challenging tasks in the CHiME-2, CHiME-4, CHiME-5, CHiME-6, REVERB, and DIHARD III evaluations. Leveraging the top speech quality achieved in microphone-array speech enhancement, separation, and dereverberation, our teams scored the lowest error rates in all six evaluated scenarios.
Chin-Hui Lee is a professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology. Before joining academia in 2001, he accumulated 20 years of industrial experience, ending at Bell Laboratories, Murray Hill, as a Distinguished Member of Technical Staff and Director of the Dialogue Systems Research Department. Dr. Lee is a Fellow of the IEEE and a Fellow of ISCA. He has published over 500 papers and holds 30 patents, with more than 50,000 citations and an h-index of 80 on Google Scholar. He has received numerous awards, including the Bell Labs President's Gold Award in 1998. He won the IEEE SPS 2006 Technical Achievement Award for "Exceptional Contributions to the Field of Automatic Speech Recognition". In 2012 he gave an ICASSP plenary talk on the future of automatic speech recognition. In the same year he was awarded the ISCA Medal for Scientific Achievement for "pioneering and seminal contributions to the principles and practice of automatic speech and speaker recognition". His papers on deep regression are highly cited and won a recent Best Paper Award from the IEEE SPS.