Articulatory Speech Synthesis for Natural User Interfaces
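Basic Information
- Grant number: 463376-2014
- Principal investigator:
- Amount: $146,400
- Host institution:
- Host institution country: Canada
- Project category: Strategic Projects - Group
- Fiscal year: 2015
- Funding country: Canada
- Duration: 2015-01-01 to 2016-12-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract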
For years, articulatory synthesis research has been largely overshadowed by formant-based and acoustic-based speech synthesis techniques. While successful in some domains (e.g., voice-based databases), these techniques still cannot produce natural-looking and natural-sounding speech from text for an arbitrary speaker. Natural-looking and natural-sounding speech technology is one of the next major milestones in voice-based interaction for natural user interfaces. Articulatory speech synthesis has progressed steadily at the fringes of both industrial and academic interest and is now poised to provide the platform needed to overcome basic problems in speech production; we believe it represents the next major advance in speech synthesis technology.
Because of the structural complexity of the human vocal tract and of speech production behaviour, prior research in 3-dimensional articulatory synthesis has focused on analyzing and modeling narrowly defined aspects of speech production and vocal tract structure. Rather than modeling a few sub-components of the overall vocal tract to produce a limited set of unnatural utterances, a more complete platform is needed that allows vocal tract sub-components to be integrated and tested within a working articulatory speech synthesizer that uses the best available technologies for the entire vocal tract.
For decades, the Haskins 2D Articulatory Speech Synthesizer has been commonly used, despite well-known limitations in the shapes and sounds it can produce and its lack of accurate representations of either generic or speaker-specific production parameters. More recent systems, such as VTL by Birkholz, have advanced 3D articulatory speech synthesis but remain visually undeveloped and lack biomechanical foundations. To overcome these limitations and provide a platform for new research in articulatory speech synthesis, we propose to construct and evaluate an aerodynamically driven articulatory speech synthesizer, based on a comprehensive, parameterized 3D biomechanical model of the vocal and facial articulators, that is capable of producing both visible and acoustic speech and non-speech.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Penn, Gerald
Bubble Sets: Revealing Set Relations with Isocontours over Existing Visualizations
- DOI: 10.1109/tvcg.2009.122
- Published: 2009-11-01
- Journal:
- Impact factor: 5.2
- Authors: Collins, Christopher; Penn, Gerald; Carpendale, Sheelagh
- Corresponding author: Carpendale, Sheelagh
APPLYING CONVOLUTIONAL NEURAL NETWORKS CONCEPTS TO HYBRID NN-HMM MODEL FOR SPEECH RECOGNITION
- DOI: 10.1109/icassp.2012.6288864
- Published: 2012-01-01
- Journal:
- Impact factor: 0
- Authors: Abdel-Hamid, Ossama; Mohamed, Abdel-rahman; Penn, Gerald
- Corresponding author: Penn, Gerald
Other grants held by Penn, Gerald
Privacy-Preserving Natural Language Processing
- Grant number: RGPIN-2022-05197
- Fiscal year: 2022
- Amount: $146,400
- Project category: Discovery Grants Program - Individual
Spreading the Word: The Theory of Distributed Representations in Speech and Natural Language Processing
- Grant number: RGPIN-2015-04069
- Fiscal year: 2019
- Amount: $146,400
- Project category: Discovery Grants Program - Individual
Spreading the Word: The Theory of Distributed Representations in Speech and Natural Language Processing
- Grant number: RGPIN-2015-04069
- Fiscal year: 2018
- Amount: $146,400
- Project category: Discovery Grants Program - Individual
Spreading the Word: The Theory of Distributed Representations in Speech and Natural Language Processing
- Grant number: RGPIN-2015-04069
- Fiscal year: 2017
- Amount: $146,400
- Project category: Discovery Grants Program - Individual
Refactoring Feature-Structure-based Dialogue Systems: Software Engineering Meets Spoken Language Processing
- Grant number: 518202-2017
- Fiscal year: 2017
- Amount: $146,400
- Project category: Engage Grants Program
Spreading the Word: The Theory of Distributed Representations in Speech and Natural Language Processing
- Grant number: RGPIN-2015-04069
- Fiscal year: 2016
- Amount: $146,400
- Project category: Discovery Grants Program - Individual
Spreading the Word: The Theory of Distributed Representations in Speech and Natural Language Processing
- Grant number: RGPIN-2015-04069
- Fiscal year: 2015
- Amount: $146,400
- Project category: Discovery Grants Program - Individual
Spoken language processing in ecologically valid contexts
- Grant number: 239533-2010
- Fiscal year: 2014
- Amount: $146,400
- Project category: Discovery Grants Program - Individual
Articulatory Speech Synthesis for Natural User Interfaces
- Grant number: 463376-2014
- Fiscal year: 2014
- Amount: $146,400
- Project category: Strategic Projects - Group
Spoken language processing in ecologically valid contexts
- Grant number: 239533-2010
- Fiscal year: 2013
- Amount: $146,400
- Project category: Discovery Grants Program - Individual
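Similar grants from the National Natural Science Foundation of China (NSFC)
Research on an Assessment Model for English Public Speaking Competence Integrating Multimodal Learning Analytics, and Its Applications
- Grant number:
- Approval year: 2021
- Amount: 300,000 CNY
- Project category: Young Scientists Fund
Study of the Correlation between Auditory Behaviour and Speech Development in Children after Cochlear Implantation
- Grant number: 81170916
- Approval year: 2011
- Amount: 650,000 CNY
- Project category: General Program
Study of the Characteristics of Open-Set Auditory and Speech Development in Children after Cochlear Implantation
- Grant number: 30872859
- Approval year: 2008
- Amount: 300,000 CNY
- Project category: General Program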
Similar overseas grants
Construction of articulatory movement database, normalization of databases, and speech synthesis based on the database
- Grant number: 19K12024
- Fiscal year: 2019
- Amount: $146,400
- Project category: Grant-in-Aid for Scientific Research (C)
Control strategies for articulatory speech synthesis for natural user interfaces
- Grant number: 506576-2017
- Fiscal year: 2018
- Amount: $146,400
- Project category: Strategic Projects - Group
Control strategies for articulatory speech synthesis for natural user interfaces
- Grant number: 506576-2017
- Fiscal year: 2017
- Amount: $146,400
- Project category: Strategic Projects - Group
Articulatory text-to-speech synthesis based on digital waveguide mesh driven by deep neural network
- Grant number: 17K20004
- Fiscal year: 2017
- Amount: $146,400
- Project category: Grant-in-Aid for Challenging Research (Exploratory)
Speech synthesis based on articulatory movement HMM and LSP digital filter
- Grant number: 16K00234
- Fiscal year: 2016
- Amount: $146,400
- Project category: Grant-in-Aid for Scientific Research (C)