Deep Learning Based Complex Spectral Mapping for Multi-Channel Speaker Separation and Speech Enhancement

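Basic information

  • Award number:
    2125074
  • Principal investigator:
  • Amount:
    $390,600
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Period:
    2021-08-01 to 2024-07-31
  • Project status:
    Completed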

Project abstract

Despite tremendous advances in deep learning based speech separation and automatic speech recognition, a major challenge remains how to separate concurrent speakers and recognize their speech in the presence of room reverberation and background noise. This project will develop a multi-channel complex spectral mapping approach to multi-talker speaker separation and speech enhancement in order to improve speech recognition performance in such conditions. The proposed approach trains deep neural networks to predict the real and imaginary parts of individual talkers from the multi-channel input in the complex domain. After overlapped speakers are separated into simultaneous streams, sequential grouping will be performed for speaker diarization, which is the task of grouping the speech utterances of the same talker over intervals with the utterances of other speakers and pauses. Proposed speaker diarization will integrate spatial and spectral speaker features, which will be extracted through multi-channel speaker localization and single-channel speaker embedding. Recurrent neural networks will be trained to perform classification for the purpose of speaker diarization, which can handle an arbitrary number of speakers in a meeting. The proposed separation system will be evaluated using open, multi-channel speaker separation datasets that contain both room reverberation and background noise. The results from this project are expected to substantially elevate the performance of continuous speaker separation, as well as speaker diarization, in adverse acoustic environments, helping to close the performance gap between recognizing single-talker speech and recognizing multi-talker speech.

The overall goal of this project is to develop a deep learning system that can continuously separate individual speakers in a conversational or meeting setting and accurately recognize the utterances of these speakers.
Building on recent advances in simultaneous grouping to separate and enhance overlapped speakers in a talker-independent fashion, the project is mainly focused on speaker diarization, which aims to group the speech utterances of the same speaker across time. To achieve speaker diarization, deep learning based sequential grouping will be performed and it will integrate spatial and spectral speaker characteristics. Through sequential organization, simultaneous streams will be grouped with earlier-separated speaker streams to form sequential streams, each of which corresponds to all the utterances of the same speaker up to the current time. Speaker localization and classification will be investigated to make sequential grouping capable of creating new sequential streams and handling an arbitrary number of speakers in a meeting scenario. With the added spatial dimension, the proposed diarization approach provides a solution to the question of who spoke when and where, significantly expanding the traditional scope of who spoke when. The proposed separation system will be evaluated using multi-channel speaker separation datasets that contain highly overlapped speech in recorded conversations, as well as room reverberation and background noise present in real environments. The main evaluation metric will be word error rate in automatic speech recognition. The performance of speaker diarization will be measured using diarization error rate.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
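
The abstract names word error rate as the main evaluation metric. As a rough illustration only, the sketch below computes word error rate in the usual way, as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; the project's actual scoring pipeline is not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; tokenization is plain whitespace
    splitting, which is an assumption and not taken from the project.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One reference word ("the") is dropped: 1 error over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.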

Project outcomes

Journal articles (2)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Multi-Channel Talker-Independent Speaker Separation Through Location-Based Training
Multi-Resolution Location-Based Training for Multi-Channel Continuous Speech Separation

Other grants by Eric Fosler-Lussier

RI: Small: Early Elementary Reading Verification in Challenging Acoustic Environments
  • Award number:
    2008043
  • Fiscal year:
    2020
  • Funding amount:
    $390,600
  • Project type:
    Standard Grant
RI: Medium: Deep Neural Networks for Robust Speech Recognition through Integrated Acoustic Modeling and Separation
  • Award number:
    1409431
  • Fiscal year:
    2014
  • Funding amount:
    $390,600
  • Project type:
    Continuing Grant
CI-ADDO-NEW: Collaborative Research: The Speech Recognition Virtual Kitchen
  • Award number:
    1305319
  • Fiscal year:
    2013
  • Funding amount:
    $390,600
  • Project type:
    Standard Grant
CI-P: Collaborative Research: The Speech Recognition Virtual Kitchen
  • Award number:
    1205424
  • Fiscal year:
    2012
  • Funding amount:
    $390,600
  • Project type:
    Standard Grant
RI: Medium: Collaborative Research: Explicit Articulatory Models of Spoken Language, with Application to Automatic Speech Recognition
  • Award number:
    0905420
  • Fiscal year:
    2009
  • Funding amount:
    $390,600
  • Project type:
    Standard Grant
CAREER: Breaking the phonetic code: novel acoustic-lexical modeling techniques for robust automatic speech recognition
  • Award number:
    0643901
  • Fiscal year:
    2006
  • Funding amount:
    $390,600
  • Project type:
    Continuing Grant
Workshop: Student Research in Computational Linguistics, at the HLT/NAACL 2004 Conference
  • Award number:
    0422841
  • Fiscal year:
    2004
  • Funding amount:
    $390,600
  • Project type:
    Standard Grant

Similar grants from the National Natural Science Foundation of China

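Online prediction, prevention, and control of catastrophic risk in deep and large foundation pit construction based on data-physics fusion deep learning
  • Award number:
  • Approval year:
    2022
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund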
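Deep learning based seismic wavelet extraction in the prestack time-space domain for deep-earth exploration
  • Award number:
  • Approval year:
    2022
  • Funding amount:
    CNY 550,000
  • Project type:
    General Program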
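Autonomous diagnosis of unanticipated faults and system reconfiguration for deep-space probes based on three-way decisions and reinforcement learning
  • Award number:
    61903015
  • Approval year:
    2019
  • Funding amount:
    CNY 230,000
  • Project type:
    Young Scientists Fund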
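Deep learning based dynamic identification techniques for glacier monitoring in the Sanjiangyuan (Three-River Source) region
  • Award number:
    51769027
  • Approval year:
    2017
  • Funding amount:
    CNY 380,000
  • Project type:
    Regional Science Fund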
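Deep mining of heterogeneous medical imaging data and precise prediction of major central nervous system diseases
  • Award number:
    61672236
  • Approval year:
    2016
  • Funding amount:
    CNY 640,000
  • Project type:
    General Program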

Similar overseas grants

CRII: OAC: A Compressor-Assisted Collective Communication Framework for GPU-Based Large-Scale Deep Learning
  • Award number:
    2348465
  • Fiscal year:
    2024
  • Funding amount:
    $390,600
  • Project type:
    Standard Grant
SHF: Small: Hardware-Software Co-design for Privacy Protection on Deep Learning-based Recommendation Systems
  • Award number:
    2334628
  • Fiscal year:
    2024
  • Funding amount:
    $390,600
  • Project type:
    Standard Grant
DeepMARA - Deep Reinforcement Learning based Massive Random Access Toward Massive Machine-to-Machine Communications
  • Award number:
    EP/Y028252/1
  • Fiscal year:
    2024
  • Funding amount:
    $390,600
  • Project type:
    Fellowship
Co-creation between content-generating AI and humans based on deep learning
  • Award number:
    23K04201
  • Fiscal year:
    2023
  • Funding amount:
    $390,600
  • Project type:
    Grant-in-Aid for Scientific Research (C)
Security Evaluation Method Against Deep-Learning-Based Side-Channel Attacks Exploiting Physical Behavior of Cryptographic Hardware
  • Award number:
    23K11102
  • Fiscal year:
    2023
  • Funding amount:
    $390,600
  • Project type:
    Grant-in-Aid for Scientific Research (C)