Hearing-impaired patients have limited hearing dynamic range for speech perception, which partially accounts for their poor speech understanding abilities, particularly in noise. Wide dynamic range compression aims to compress speech signal into the usable hearing dynamic range of hearing-impaired listeners; however, it normally uses a static compression based strategy. This work proposed a strategy to continuously adjust the envelope compression ratio for speech processing in cochlear implants. This adaptive envelope compression (AEC) strategy aims to keep the compression processing as close to linear as possible, while still confine the compressed amplitude envelope within the pre-set dynamic range. Vocoder simulation experiments showed that, when narrowed down to a small dynamic range, the intelligibility of AEC-processed sentences was significantly better than those processed by static envelope compression. This makes the proposed AEC strategy a promising way to improve speech recognition performance for implanted patients in the future.
The Gaussian mixture model (GMM)-based method has dominated the field of voice conversion (VC) for last decade. However, the converted spectra are excessively smoothed and thus produce muffled converted sound. In this study, we improve the speech quality by enhancing the dependency between the source (natural sound) and converted feature vectors (converted sound). It is believed that enhancing this dependency can make the converted sound closer to the natural sound. To this end, we propose an integrated maximum a posteriori and mutual information (MAPMI) criterion for parameter generation on spectral conversion. Experimental results demonstrate that the quality of converted speech by the proposed MAPMI method outperforms that by the conventional method in terms of formal listening test.
Use of a linear projection (LP) function to transform multiple sets of acoustic models into a single set of acoustic models is proposed for characterizing testing environments for robust automatic speech recognition. The LP function is an extension of the linear regression (LR) function used in maximum likelihood linear regression (MLLR) and maximum a posteriori linear regression (MAPLR) by incorporating local information in the ensemble acoustic space to enhance the environment modeling capacity. To estimate the nuisance parameters of the LP function, we developed maximum likelihood LP (MLLP) and maximum a posteriori LP (MAPLP) and derived a set of integrated prior (IP) densities for MAPLP. The IP densities integrate multiple knowledge sources from the training set, previously seen speech data, current utterance, and a prepared tree structure. We evaluated the proposed MLLP and MAPLP on the Aurora-2 database in an unsupervised model adaptation manner. Experimental results show that the LP function outperforms the LR function with both ML- and MAP-based estimates over different test conditions. Moreover, because the MAP-based estimate can handle over-fittings well, MAPLP has clear improvements over MLLP. Compared to the baseline result, MAPLP provides a significant 10.99% word error rate reduction.