Deep Learning Based Complex Spectral Mapping for Multi-Channel Speaker Separation and Speech Enhancement
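Basic Information
- Award number: 2125074
- Principal investigator:
- Amount: $390,600
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2021
- Funding country: United States
- Period: 2021-08-01 to 2024-07-31
- Status: Completed
- Source:
- Keywords:
Project Abstract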
Despite tremendous advances in deep learning based speech separation and automatic speech recognition, a major challenge remains how to separate concurrent speakers and recognize their speech in the presence of room reverberation and background noise. This project will develop a multi-channel complex spectral mapping approach to multi-talker speaker separation and speech enhancement in order to improve speech recognition performance in such conditions. The proposed approach trains deep neural networks to predict the real and imaginary parts of individual talkers from the multi-channel input in the complex domain. After overlapped speakers are separated into simultaneous streams, sequential grouping will be performed for speaker diarization, which is the task of grouping the speech utterances of the same talker over intervals with the utterances of other speakers and pauses. The proposed speaker diarization will integrate spatial and spectral speaker features, which will be extracted through multi-channel speaker localization and single-channel speaker embedding. Recurrent neural networks will be trained to perform classification for the purpose of speaker diarization, which can handle an arbitrary number of speakers in a meeting. The proposed separation system will be evaluated using open, multi-channel speaker separation datasets that contain both room reverberation and background noise. The results from this project are expected to substantially elevate the performance of continuous speaker separation, as well as speaker diarization, in adverse acoustic environments, helping to close the performance gap between recognizing single-talker speech and recognizing multi-talker speech.
The overall goal of this project is to develop a deep learning system that can continuously separate individual speakers in a conversational or meeting setting and accurately recognize the utterances of these speakers. Building on recent advances in simultaneous grouping to separate and enhance overlapped speakers in a talker-independent fashion, the project is mainly focused on speaker diarization, which aims to group the speech utterances of the same speaker across time. To achieve speaker diarization, deep learning based sequential grouping will be performed, integrating spatial and spectral speaker characteristics. Through sequential organization, simultaneous streams will be grouped with earlier-separated speaker streams to form sequential streams, each of which corresponds to all the utterances of the same speaker up to the current time. Speaker localization and classification will be investigated to make sequential grouping capable of creating new sequential streams and handling an arbitrary number of speakers in a meeting scenario. With the added spatial dimension, the proposed diarization approach provides a solution to the question of who spoke when and where, significantly expanding the traditional scope of who spoke when. The proposed separation system will be evaluated using multi-channel speaker separation datasets that contain highly overlapped speech in recorded conversations, as well as room reverberation and background noise present in real environments. The main evaluation metric will be word error rate in automatic speech recognition. The performance of speaker diarization will be measured using diarization error rate.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
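The abstract names word error rate as the main evaluation metric. As a rough illustration only, the sketch below computes the textbook definition of that metric: the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; the abstract does not say which scoring tool or text normalization the project uses.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; not the project's scoring tool.
    Tokenization is plain whitespace splitting, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.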
Project Outcomes
Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Multi-Resolution Location-Based Training for Multi-Channel Continuous Speech Separation
- DOI:
- Published: 2023-06
- Journal:
- Impact factor: 0
- Authors: Hassan Taherian; DeLiang Wang
- Corresponding author: DeLiang Wang
Multi-Channel Talker-Independent Speaker Separation Through Location-Based Training
- DOI: 10.1109/taslp.2022.3202129
- Published: 2024-09-13
- Journal:
- Impact factor: 0
- Authors: H. Taherian; Ke Tan; Deliang Wang
- Corresponding author: Deliang Wang
Other Grants by Eric Fosler-Lussier
RI: Small: Early Elementary Reading Verification in Challenging Acoustic Environments
- Award number: 2008043
- Fiscal year: 2020
- Funding amount: $390,600
- Award type: Standard Grant
RI: Medium: Deep Neural Networks for Robust Speech Recognition through Integrated Acoustic Modeling and Separation
- Award number: 1409431
- Fiscal year: 2014
- Funding amount: $390,600
- Award type: Continuing Grant
CI-ADDO-NEW: Collaborative Research: The Speech Recognition Virtual Kitchen
- Award number: 1305319
- Fiscal year: 2013
- Funding amount: $390,600
- Award type: Standard Grant
CI-P: Collaborative Research: The Speech Recognition Virtual Kitchen
- Award number: 1205424
- Fiscal year: 2012
- Funding amount: $390,600
- Award type: Standard Grant
RI: Medium: Collaborative Research: Explicit Articulatory Models of Spoken Language, with Application to Automatic Speech Recognition
- Award number: 0905420
- Fiscal year: 2009
- Funding amount: $390,600
- Award type: Standard Grant
CAREER: Breaking the phonetic code: novel acoustic-lexical modeling techniques for robust automatic speech recognition
- Award number: 0643901
- Fiscal year: 2006
- Funding amount: $390,600
- Award type: Continuing Grant
Workshop: Student Research in Computational Linguistics, at the HLT/NAACL 2004 Conference
- Award number: 0422841
- Fiscal year: 2004
- Funding amount: $390,600
- Award type: Standard Grant
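Similar NSFC Grants
Research on Deep Learning Based Seismic Wavelet Extraction in the Pre-Stack Time-Space Domain for Deep Earth Exploration
- Award number:
- Year approved: 2022
- Funding amount: 550,000 CNY
- Award type: General Program
Online Prediction, Prevention, and Control of Catastrophic Risk in Deep and Large Foundation Pit Construction Based on Data-Physics Fusion Deep Learning
- Award number:
- Year approved: 2022
- Funding amount: 300,000 CNY
- Award type: Young Scientists Fund
Autonomous Diagnosis of Unanticipated Faults and System Reconfiguration for Deep-Space Probes Based on Three-Way Decisions and Reinforcement Learning
- Award number: 61903015
- Year approved: 2019
- Funding amount: 230,000 CNY
- Award type: Young Scientists Fund
Deep Learning Based Dynamic Identification Technology for Glacier Monitoring in the Sanjiangyuan Region
- Award number: 51769027
- Year approved: 2017
- Funding amount: 380,000 CNY
- Award type: Regional Science Fund
Deep Mining Techniques for Heterogeneous Medical Imaging Data and Precise Prediction of Major Central Nervous System Diseases
- Award number: 61672236
- Year approved: 2016
- Funding amount: 640,000 CNY
- Award type: General Program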
Similar Overseas Grants
SHF: Small: Hardware-Software Co-design for Privacy Protection on Deep Learning-based Recommendation Systems
- Award number: 2334628
- Fiscal year: 2024
- Funding amount: $390,600
- Award type: Standard Grant
DeepMARA - Deep Reinforcement Learning based Massive Random Access Toward Massive Machine-to-Machine Communications
- Award number: EP/Y028252/1
- Fiscal year: 2024
- Funding amount: $390,600
- Award type: Fellowship
CRII: OAC: A Compressor-Assisted Collective Communication Framework for GPU-Based Large-Scale Deep Learning
- Award number: 2348465
- Fiscal year: 2024
- Funding amount: $390,600
- Award type: Standard Grant
Collaborative Research: A Physics-Informed Flood Early Warning System for Agricultural Watersheds with Explainable Deep Learning and Process-Based Modeling
- Award number: 2243776
- Fiscal year: 2023
- Funding amount: $390,600
- Award type: Standard Grant
Collaborative Research: EAGER: Deep Learning-based Multimodal Analysis of Sleep
- Award number: 2334665
- Fiscal year: 2023
- Funding amount: $390,600
- Award type: Standard Grant