Articulatory Speech Synthesis for Natural User Interfaces

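Basic Information

Award number: 463376-2014
Principal investigator: Penn, Gerald
Amount: $146,400
Host institution: University of Toronto
Host institution country: Canada
Project category: Strategic Projects - Group
Fiscal year: 2015
Funding country: Canada
Duration: 2015-01-01 to 2016-12-31
Project status: Completed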

Source: https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=577054
Keywords: Articulatory Speech Synthesis Natural User

Project Abstract

For years, articulatory synthesis research has been largely overshadowed by formant-based and acoustic-based speech synthesis techniques. While these techniques have succeeded in some domains (e.g., voice-based databases), they still cannot produce natural-looking and natural-sounding speech from text for an arbitrary speaker. Natural-looking and natural-sounding speech technology is one of the next major milestones in voice-based interaction for natural user interfaces. Articulatory speech synthesis has progressed steadily at the fringes of both industrial and academic interest and is now poised to provide the platform needed to overcome basic problems in speech production; we believe it represents the next major advance in speech synthesis technology. Because of the structural complexity of the human vocal tract and of speech production behaviour, prior research in 3-dimensional articulatory synthesis has focused on analyzing and modeling narrowly defined aspects of speech production and vocal tract structure. Rather than modeling a few sub-components of the vocal tract to produce a limited set of unnatural utterances, the field needs a more complete platform in which vocal tract sub-components can be integrated and tested within a working articulatory speech synthesizer that uses the best available technologies for the entire vocal tract. For decades, the Haskins 2D Articulatory Speech Synthesizer has been in common use, despite well-known limitations in the shapes and sounds it can produce and its lack of accurate representations of either generic or speaker-specific production parameters. Advances such as Birkholz's VTL have made progress in 3D articulatory speech synthesis, but they remain visually undeveloped and lack biomechanical foundations.
To overcome these limitations and provide a platform for new research in articulatory speech synthesis, we propose to construct and evaluate an aerodynamically driven articulatory speech synthesizer based on a comprehensive, parameterized 3D biomechanical model of the vocal and facial articulators. The synthesizer will be capable of producing both visible and acoustic speech and non-speech.

Project Outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other Publications by Penn, Gerald

Bubble Sets: Revealing Set Relations with Isocontours over Existing Visualizations

DOI: 10.1109/tvcg.2009.122
Published: 2009-11-01
Published in: IEEE Transactions on Visualization and Computer Graphics
Impact factor: 5.2
Authors: Collins, Christopher; Penn, Gerald; Carpendale, Sheelagh
Corresponding author: Carpendale, Sheelagh

Applying Convolutional Neural Networks Concepts to Hybrid NN-HMM Model for Speech Recognition

DOI: 10.1109/icassp.2012.6288864
Published: 2012-01-01
Published in: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Impact factor: 0
Authors: Abdel-Hamid, Ossama; Mohamed, Abdel-rahman; Penn, Gerald
Corresponding author: Penn, Gerald