
Automatic Speech Attribute Transcription (ASAT) with Implications on Brain Speech Perception, and Language-Universal Multilingual Speech Recognition


  • Speaker: Prof. Chin-Hui Lee (李錦輝)
  • Date: 2025/11/05 (Wed.) 14:00–16:00
  • Location: Auditorium 122, CITI; also via video conference
  • Host: Yu Tsao (曹昱)
  • [Google Meet] https://meet.google.com/jcz-iyce-kch

Abstract


Automatic speech attribute transcription (ASAT) is a new approach to speech processing that first detects speech anchors, then extracts speech information, and builds up words, phrases, and sentences in a bottom-up manner, so that speech knowledge sources can be integrated into speech recognition decisions. A recent study of high-density cortical surface recordings in humans listening to continuous speech also reveals that distinctive phonetic features, or speech attributes, are directly related to the superior temporal gyrus (STG) representations of the entire phonetic inventory, in contrast to the conventional notion that phonemes are the units encoded in brain speech perception. This finding agrees with our proposed ASAT approach to speech information extraction through the integration of multiple speech cues. In the second part of the talk, we address fundamental issues in language-universal multilingual speech recognition. We adopt speech attributes, as opposed to phonemes or characters, as the fundamental modeling units, and compare systems trained with abundant data in resource-rich languages against systems built with limited training data in resource-poor languages. We explore issues related to purely data-driven frameworks and ASAT-based techniques for phoneme and spoken-keyword recognition, as well as bottom-up continuous speech recognition, especially for resource-limited languages. We found that small ASAT models produce results similar to or better than those achieved with large models, such as Whisper.
Bio
Chin-Hui Lee is a professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology. Before joining academia in 2001, he had accumulated 20 years of industrial experience, ending at Bell Laboratories, Murray Hill, as Director of the Dialogue Systems Research Department. Dr. Lee is a Fellow of the IEEE and a Fellow of ISCA. He has published over 550 papers and holds 30 patents, with more than 33,000 citations and an h-index of 89 on Google Scholar. He has received numerous honors, including five IEEE Signal Processing Society (SPS) Best Paper Awards, three papers in the Proceedings of the IEEE, the Bell Labs President's Gold Award in 1998, the SPS Technical Achievement Award for "Exceptional Contributions to the Field of Automatic Speech Recognition" in 2016, and the ISCA Medal for Scientific Achievement for "Pioneering and Seminal Contributions to the Principles and Practice of Automatic Speech and Speaker Recognition" in 2012. His two pioneering papers on deep regression for speech enhancement, published in 2014 and 2015, have accumulated over 2,000 citations, were recognized as top-downloaded papers in SPS publications, and won the SPS Best Paper Award in 2019.