SpeechWave

Basic information

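  • Grant number:
    EP/R012067/1
  • Principal investigator:
  • Amount:
    $935,400
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project type:
    Research Grant
  • Fiscal year:
    2018
  • Funding country:
    United Kingdom
  • Duration:
    2018 to (no data)
  • Project status:
    Completed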

Project abstract

Speech recognition has made major advances in the past few years. Error rates have been reduced by more than half on standard large-scale tasks such as Switchboard (conversational telephone speech), MGB (multi-genre broadcast recordings), and AMI (multiparty meetings). These research advances have quickly translated into commercial products and services: speech-based applications and assistants such as Apple's Siri, Amazon's Alexa, and Google voice search have become part of daily life for many people. Underpinning the improved accuracy of these systems are advances in acoustic modelling, a field in which deep learning has had an outstanding influence.

However, speech recognition is still very fragile: it has been successfully deployed in specific acoustic conditions and task domains (for instance, voice search on a smartphone) but degrades severely when the conditions change. This is because speech recognition is highly vulnerable to reverberation and to additive noise from multiple acoustic sources. In both cases, acoustic conditions that have essentially no effect on the accuracy of human speech recognition can have a catastrophic impact on the accuracy of a state-of-the-art automatic system. One reason for this brittleness is the lack of a strong model of acoustic robustness. Robustness is usually addressed through multi-condition training, in which the training set comprises speech examples across the many required acoustic conditions, often constructed by mixing speech with noise at different signal-to-noise ratios. For a limited set of acoustic conditions these techniques can work well, but they are inefficient, they do not offer a model of multiple acoustic sources, and they do not factorise the causes of variability.
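The abstract describes multi-condition training sets built by mixing speech with noise at different signal-to-noise ratios. The following is a minimal sketch of that mixing step, not the project's code. It assumes NumPy, single-channel float arrays at a common sample rate, and SNR defined as the ratio of mean speech power to mean noise power. The function names, the 220 Hz tone standing in for speech, and the chosen SNR values are all hypothetical.

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Return speech + scaled noise whose speech-to-noise power ratio is snr_db (dB)."""
    # Tile the noise if it is shorter than the utterance, then trim it to length.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Solve 10 * log10(speech_power / (gain**2 * noise_power)) = snr_db for gain.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise


def measured_snr_db(speech: np.ndarray, noise_part: np.ndarray) -> float:
    """SNR in dB as the ratio of mean speech power to mean noise power."""
    return float(10 * np.log10(np.mean(speech ** 2) / np.mean(noise_part ** 2)))


# Hypothetical stand-ins: a 1 s, 220 Hz tone at 16 kHz as "speech", white noise as "noise".
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 220 * t)
noise = np.random.default_rng(0).standard_normal(sample_rate // 2)

# One clean utterance expanded into several acoustic conditions (assumed SNR grid).
multi_condition = {snr: mix_at_snr(clean, noise, snr) for snr in (0, 5, 10, 20)}
for snr, noisy in multi_condition.items():
    print(snr, round(measured_snr_db(clean, noisy - clean), 2))
```

Each condition here is just a rescaled copy of the same noise. This illustrates the abstract's point that such sets enumerate conditions rather than modelling separate acoustic sources.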
To illustrate this brittleness, the best reported result for transcribing the AMI corpus test set from single distant microphone recordings is a word error rate of about 38% (for non-overlapped speech), compared with about 5% for human listeners. In the past few years several approaches have tried to address these problems: explicitly learning to separate multiple sources; factorised acoustic models using auxiliary features; and learned spectral masks for multi-channel beamforming. SpeechWave will pursue an alternative approach to robust speech recognition: the development of acoustic models that learn directly from the speech waveform. The motivation for operating directly in the waveform domain is the insight that redundancy in speech signals is highly likely to be a key factor in the robustness of human speech recognition. Current approaches to speech recognition separate non-adaptive signal processing components from the adaptive acoustic model, and in doing so lose the redundancy (and, typically, information such as the phase) present in the speech waveform. Waveform models are particularly exciting because they combine the previously distinct signal processing and acoustic modelling components.

In SpeechWave, we shall explore novel waveform-based convolutional and recurrent networks that combine speech enhancement and recognition in a factorised way, as well as approaches based on kernel methods and on recent research advances in sparse signal processing and speech perception. Our research will be evaluated on standard large-scale speech corpora. In addition, we shall participate in, and organise, international challenges to assess the performance of speech recognition technologies. We shall also validate our technologies in practice, in the context of the speech recognition challenges faced by our project partners BBC, Emotech, Quorate, and SRI.
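The abstract notes that conventional front ends discard information such as the phase that is present in the waveform. The small illustration below assumes NumPy and a magnitude-spectrum front end. Two frames that differ only by a circular time shift have identical DFT magnitudes, so a magnitude-only feature cannot tell them apart. The raw waveform, or the complex spectrum with its real and imaginary parts, can. The 512-sample two-sinusoid frame and the shift of 100 samples are hypothetical choices.

```python
import numpy as np

# Hypothetical 512-sample frame: two sinusoids standing in for a speech frame.
n = np.arange(512)
frame = np.sin(2 * np.pi * 5 * n / 512) + 0.5 * np.sin(2 * np.pi * 37 * n / 512 + 1.0)

# A circular time shift multiplies each DFT bin by a unit-magnitude phase factor.
shifted = np.roll(frame, 100)

spec_a = np.fft.rfft(frame)
spec_b = np.fft.rfft(shifted)

# Magnitude-only features cannot distinguish the two frames...
print(np.allclose(np.abs(spec_a), np.abs(spec_b)))  # True
# ...but the waveforms, and the complex (real + imaginary) spectra, differ.
print(np.allclose(frame, shifted))  # False
print(np.allclose(spec_a, spec_b))  # False
```

A circular shift is used only because it makes the magnitude equality exact. The general point is that any feature built from spectral magnitudes alone is blind to phase differences.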

Project outputs

Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Multi-Stream Acoustic Modelling Using Raw Real and Imaginary Parts of the Fourier Transform
Deep Scattering Power Spectrum Features for Robust Speech Recognition
  • DOI:
    10.21437/interspeech.2020-2656
  • Publication date:
    2020-01-01
  • Journal:
  • Impact factor:
    0
  • Authors:
    Joy, Neethu M.;Oglic, Dino;Renals, Steve
  • Corresponding author:
    Renals, Steve
Towards a Unified Analysis of Random Fourier Features
  • DOI:
  • Publication date:
    2018-06
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zhu Li;Jean-Francois Ton;Dino Oglic;D. Sejdinovic
  • Corresponding author:
    Zhu Li;Jean-Francois Ton;Dino Oglic;D. Sejdinovic
Speech Acoustic Modelling Using Raw Source and Filter Components
  • DOI:
    10.21437/interspeech.2021-53
  • Publication date:
    2021-08
  • Journal:
  • Impact factor:
    0
  • Authors:
    Erfan Loweimi;Z. Cvetković;P. Bell;S. Renals
  • Corresponding author:
    Erfan Loweimi;Z. Cvetković;P. Bell;S. Renals
A Deep 2D Convolutional Network for Waveform-Based Speech Recognition
  • DOI:
    10.21437/interspeech.2020-1870
  • Publication date:
    2020-10
  • Journal:
  • Impact factor:
    0
  • Authors:
    Dino Oglic;Z. Cvetković;P. Bell;S. Renals
  • Corresponding author:
    Dino Oglic;Z. Cvetković;P. Bell;S. Renals

Other publications by Zoran Cvetkovic

Overcomplete expansions and robustness

Other grants held by Zoran Cvetkovic

Challenges in Immersive Audio Technology
  • Grant number:
    EP/X032981/1
  • Fiscal year:
    2024
  • Funding amount:
    $935,400
  • Project type:
    Research Grant
Visits to University of California, Berkeley, Stanford University, and SRI International
  • Grant number:
    EP/K034626/1
  • Fiscal year:
    2013
  • Funding amount:
    $935,400
  • Project type:
    Research Grant
Perceptual Sound Field Reconstruction and Coherent Emulation
  • Grant number:
    EP/F001142/1
  • Fiscal year:
    2008
  • Funding amount:
    $935,400
  • Project type:
    Research Grant
Robust Syllable Recognition in the Acoustic-Waveform Domain
  • Grant number:
    EP/D053005/1
  • Fiscal year:
    2006
  • Funding amount:
    $935,400
  • Project type:
    Research Grant

Similar National Natural Science Foundation of China (NSFC) grants

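Effects of ocean hypoxia on the degradation behaviour of persistent organic pollutants after they enter the sea
  • Grant number:
    42377396
  • Approval year:
    2023
  • Funding amount:
    CNY 490,000
  • Project type:
    General Program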
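Highly integrated microwave/millimetre-wave antennas supporting two-dimensional millimetre-wave beam scanning
  • Grant number:
    62371263
  • Approval year:
    2023
  • Funding amount:
    CNY 520,000
  • Project type:
    General Program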
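Heck/denitrogenative rearrangement cascade reactions of hydrazones
  • Grant number:
    22301211
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund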
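Synergistic performance regulation and dendrite-suppression mechanisms in aqueous zinc-ion batteries
  • Grant number:
    52364038
  • Approval year:
    2023
  • Funding amount:
    CNY 330,000
  • Project type:
    Fund for Less Developed Regions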
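Pathogenic role and mechanism of TSPYL1 mutations in sudden infant death syndrome, studied with a human serotonergic neuron reporter system
  • Grant number:
    82371176
  • Approval year:
    2023
  • Funding amount:
    CNY 490,000
  • Project type:
    General Program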

Similar overseas grants

An implantable biosensor microsystem for real-time measurement of circulating biomarkers
  • Grant number:
    2901954
  • Fiscal year:
    2028
  • Funding amount:
    $935,400
  • Project type:
    Studentship
Exploiting the polysaccharide breakdown capacity of the human gut microbiome to develop environmentally sustainable dishwashing solutions
  • Grant number:
    2896097
  • Fiscal year:
    2027
  • Funding amount:
    $935,400
  • Project type:
    Studentship
A Robot that Swims Through Granular Materials
  • Grant number:
    2780268
  • Fiscal year:
    2027
  • Funding amount:
    $935,400
  • Project type:
    Studentship
Likelihood and impact of severe space weather events on the resilience of nuclear power and safeguards monitoring.
  • Grant number:
    2908918
  • Fiscal year:
    2027
  • Funding amount:
    $935,400
  • Project type:
    Studentship
Proton, alpha and gamma irradiation assisted stress corrosion cracking: understanding the fuel-stainless steel interface
  • Grant number:
    2908693
  • Fiscal year:
    2027
  • Funding amount:
    $935,400
  • Project type:
    Studentship