ADAPTIVE NOISE CANCELLATION FOR ROBUST SPEECH RECOGNITION IN NOISY ENVIRONMENTS
DOI: https://doi.org/10.46991/PYSU:A.2024.58.1.022
Keywords: automatic speech recognition, noise cancellation, noise robustness, domain adaptation
Abstract
In this paper, we address the challenges that arise when combining noise cancellation and automatic speech recognition (ASR) models. When these models are combined directly, word recognition performance often suffers because the distribution of the input data changes. To overcome this limitation, we propose a novel method for combining these models that improves the ability of the speech recognition model to perform well in noisy environments. The key feature of the proposed method is a mechanism that controls the aggressiveness of noise reduction. This mechanism lets us tailor the noise reduction process to the specific requirements of the ASR model without any retraining. This advantage makes our method applicable to any ASR model and facilitates its use in practical scenarios.
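The abstract does not say how the aggressiveness control is implemented. As a minimal sketch only, one plausible reading is a single scalar that interpolates between the noisy input and the fully denoised signal, tuned per ASR model on held-out data. Everything here is an assumption made for illustration rather than the authors' method: the names `mix_with_aggressiveness` and `pick_aggressiveness`, the linear interpolation, and the `error_fn` callback (a hypothetical stand-in for something like word error rate from a frozen ASR model).

```python
from typing import Callable, List, Sequence


def mix_with_aggressiveness(
    noisy: Sequence[float],
    denoised: Sequence[float],
    aggressiveness: float,
) -> List[float]:
    """Blend two signals: 0.0 returns the noisy input, 1.0 the denoised one.

    Assumption: linear interpolation stands in for the paper's mechanism,
    which the abstract does not describe.
    """
    if len(noisy) != len(denoised):
        raise ValueError("signals must have the same length")
    if not 0.0 <= aggressiveness <= 1.0:
        raise ValueError("aggressiveness must lie in [0, 1]")
    return [
        (1.0 - aggressiveness) * n + aggressiveness * d
        for n, d in zip(noisy, denoised)
    ]


def pick_aggressiveness(
    noisy: Sequence[float],
    denoised: Sequence[float],
    error_fn: Callable[[List[float]], float],
    candidates: Sequence[float] = (0.0, 0.25, 0.5, 0.75, 1.0),
) -> float:
    """Return the candidate value whose blended signal minimizes error_fn.

    error_fn is hypothetical: in practice it might run a frozen ASR model
    on the blended audio and return its word error rate.
    """
    return min(
        candidates,
        key=lambda a: error_fn(mix_with_aggressiveness(noisy, denoised, a)),
    )
```

Because only the scalar is tuned, the ASR model itself stays untouched in this sketch, which mirrors the no-retraining property the abstract claims.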
Copyright (c) 2024 Proceedings of the YSU
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.