RI: Medium: Deep Neural Networks for Robust Speech Recognition through Integrated Acoustic Modeling and Separation


Basic Information

Award number:
1409431
Principal investigator:
Eric Fosler-Lussier
Amount:
Approx. $798,100
Institution:
Ohio State University
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2014
Funding country:
United States
Project period:
2014-06-01 to 2019-05-31
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1409431&HistoricalAwards=false
Keywords:
RI Medium Deep Neural Networks

Project abstract

Over the last decade, speech recognition technology has become steadily more present in everyday life, as seen in the proliferation of applications including mobile personal agents and transcription of voicemail messages. Performance of these systems, however, degrades significantly in the presence of background noise; for example, using speech recognition technology in a noisy restaurant or on a windy street can be difficult because speech recognizers confuse the background noise with linguistic content. Compensation for noise typically involves preprocessing the acoustic signal to emphasize the speech signal (i.e., speech separation), and then feeding this processed input into the recognizer. The innovative approach in this project is to train the recognition and separation systems in an integrated manner so that the linguistic content of the signal can inform the separation, and vice versa.
Given the impact of the recent resurgence of Deep Neural Networks (DNNs) in speech processing, this project seeks to make DNNs more resistant to noise by integrating speech separation and speech recognition, exploring three related areas. The first research area seeks to stabilize input to DNNs by combining DNN-based suppression and acoustic modeling, integrating masking estimates across time and frequency, and using this information to improve reconstruction of speech from noisy input. The second area seeks to examine a richer DNN structure, using multi-task learning techniques to guide the construction of DNNs that perform better on all tasks and whose layers have meaningful structure. The final research area examines ways to adapt the spurious output of DNN acoustic models given acoustic noise.
With its focus on integrating speech separation and recognition, the project will be evaluated both by speech recognition performance and by metrics that are more closely related to human speech perception. This will ensure a broader impact of this research by providing insights for speech technology and, in the long run, by facilitating the design of next-generation hearing technology.
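
For readers unfamiliar with the "masking estimates across time and frequency" mentioned in the abstract, the following is a minimal sketch of generic time-frequency masking. It is not the project's method: the function names (`ratio_mask`, `apply_mask`), the exponent `beta`, and the toy magnitude grids are all assumptions made for illustration. In a real system a DNN would estimate the mask from the noisy input alone, whereas here the mask is computed from known speech and noise magnitudes.

```python
# Illustrative only: a generic ratio mask over a time-frequency grid.
# All names and values here are hypothetical, not taken from the award.
# Grids are toy magnitude spectrograms indexed as [frame][frequency_bin].

def ratio_mask(speech_mag, noise_mag, beta=1.0, eps=1e-12):
    """Per-cell gain in [0, 1]: (speech power / total power) ** beta."""
    mask = []
    for s_frame, n_frame in zip(speech_mag, noise_mag):
        row = []
        for s, n in zip(s_frame, n_frame):
            s_pow, n_pow = s * s, n * n
            # eps avoids division by zero in cells with no energy at all
            row.append((s_pow / (s_pow + n_pow + eps)) ** beta)
        mask.append(row)
    return mask


def apply_mask(noisy_mag, mask):
    """Scale each time-frequency cell of the noisy grid by its mask value."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, noisy_mag)
    ]


# Toy data: 2 frames x 2 frequency bins.
speech = [[3.0, 0.0], [1.0, 1.0]]
noise = [[4.0, 2.0], [0.0, 1.0]]
# Simplifying assumption: magnitudes add, which ignores phase.
noisy = [[s + n for s, n in zip(sf, nf)] for sf, nf in zip(speech, noise)]

mask = ratio_mask(speech, noise)
enhanced = apply_mask(noisy, mask)
```

Cells dominated by speech get a gain near 1 and cells dominated by noise get a gain near 0, so the masked grid emphasizes speech before it reaches a recognizer.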
