Deep structured speech models
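Basic information
- Grant number: RGPIN-2021-02652
- Principal investigator:
- Amount: $27,700
- Host institution:
- Host institution country: Canada
- Program type: Discovery Grants Program - Individual
- Fiscal year: 2022
- Funding country: Canada
- Period: 2022-01-01 to 2023-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract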
State-of-the-art speech recognition models are either end-to-end models or hybrid models. End-to-end models are based entirely on deep neural networks (DNNs). Hybrid models combine an underlying structure of weighted finite-state automata (WFSAs) with a surface layer of deep neural networks. When trained on sufficiently large amounts of annotated recordings similar to the test data, end-to-end models can outperform hybrid models. For mismatched or smaller training data, hybrid models are often a better choice, since they can use prior linguistic knowledge encoded in their structure to generalize better and avoid overfitting. However, in hybrid models the finite-state automata and the deep neural components are not integrated; they are trained independently, with different objectives and algorithms. As a result, a performance gap remains even when enough training data is available. In practice, hybrid models are complicated to train, require specialized coding, and cannot be easily integrated with common deep learning frameworks such as PyTorch or TensorFlow. Their implementations therefore lag behind the latest developments in general deep learning. Recent proposals for differentiable automata suggest how they could be integrated with deep neural networks into a single model with an end-to-end differentiable loss, efficiently enough in time and space to scale to speech problems. This opens up several areas of research that are still almost unexplored for speech modelling. I propose to work on three promising lines of investigation.
1 - New architectures with joint training of WFSA and DNN parameters may bridge the performance gap when enough training data is available.
2 - New loss functions closer to actual sequence-based objective functions, such as word or phoneme error rate, should yield better performance than the approximate losses used in DNN-only models.
3 - Partial supervision afforded by structured generative models can significantly reduce the need for transcribed data.
Although applicable to a wide range of problems, integrated models will have their largest impact where the underlying structure is complex and annotated data is scarce. I therefore intend to apply them first to problems I encountered in my recent work on Indigenous languages spoken in Canada, ranging from subword analysis to speech recognition. Making speech technology accessible to these languages will benefit their transcription, preservation, and revitalization. This research addresses key limitations of current deep learning models in speech recognition, and potentially has broader applications in natural language processing, machine translation, or genomics, where sequence-to-sequence and segmentation problems are common. Because it combines the solid mathematical framework of probabilistic models with the practical, scalable methods of deep learning, this approach is well positioned to generate advances in knowledge while providing a rich learning environment.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Boulianne, Gilles
Speaker and session variability in GMM-based speaker verification
- DOI: 10.1109/tasl.2007.894527
- Publication date: 2007-05-01
- Journal:
- Impact factor: 0
- Authors: Kenny, Patrick; Boulianne, Gilles; Dumouchel, Pierre
- Corresponding author: Dumouchel, Pierre
Joint factor analysis versus eigenchannels in speaker recognition
- DOI: 10.1109/tasl.2006.881693
- Publication date: 2007-05-01
- Journal:
- Impact factor: 0
- Authors: Kenny, Patrick; Boulianne, Gilles; Dumouchel, Pierre
- Corresponding author: Dumouchel, Pierre
Other grants by Boulianne, Gilles
Deep structured speech models
- Grant number: DGECR-2021-00092
- Fiscal year: 2021
- Amount: $27,700
- Program type: Discovery Launch Supplement
Deep structured speech models
- Grant number: RGPIN-2021-02652
- Fiscal year: 2021
- Amount: $27,700
- Program type: Discovery Grants Program - Individual
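Similar grants from the National Natural Science Foundation of China
Structural basis of the SERT-nNOS protein interaction, and the design, synthesis, and fast-acting antidepressant activity of its small-molecule interaction inhibitors
- Grant number: 82373728
- Approval year: 2023
- Amount: 490,000 yuan
- Program type: General Program
Macro- and micro-scale mechanisms and strength model of compression-shear failure of grouted structural planes under freeze-thaw cycles
- Grant number: 42307259
- Approval year: 2023
- Amount: 300,000 yuan
- Program type: Young Scientists Fund
Construction of high-strength, high-toughness starch nanocrystal-polysaccharide biomimetic packaging films and the formation mechanism of the "chain cluster structure-three-dimensional network" during film formation
- Grant number: 32372278
- Approval year: 2023
- Amount: 500,000 yuan
- Program type: General Program
Structure-control co-optimization design of air-cooled fuel cells for uniform distribution of heat, mass, and electricity
- Grant number: 52375263
- Approval year: 2023
- Amount: 500,000 yuan
- Program type: General Program
Coordinated control of the multiscale structure and heat-transfer performance of graphene-based composite films self-assembled by controllable bubble induction
- Grant number: 52302104
- Approval year: 2023
- Amount: 300,000 yuan
- Program type: Young Scientists Fund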
Similar overseas grants
HEAR-HEARTFELT (Identifying the risk of Hospitalizations or Emergency depARtment visits for patients with HEART Failure in managed long-term care through vErbaL communicaTion)
- Grant number: 10723292
- Fiscal year: 2023
- Amount: $27,700
- Program type:
Crowd-Powered Machine Learning to Diagnose ASD and ADHD in Adolescents from Digital Social Interactions
- Grant number: 10682965
- Fiscal year: 2023
- Amount: $27,700
- Program type:
Deep structured speech models
- Grant number: DGECR-2021-00092
- Fiscal year: 2021
- Amount: $27,700
- Program type: Discovery Launch Supplement
Deep structured speech models
- Grant number: RGPIN-2021-02652
- Fiscal year: 2021
- Amount: $27,700
- Program type: Discovery Grants Program - Individual
Leveraging deep learning and clinical notes for surveillance and prediction of intentional self-harm and suicide
- Grant number: 10330113
- Fiscal year: 2021
- Amount: $27,700
- Program type: