Synthesis of speech in any speaking styles based on corpus-based generation of prosodic features using the generation process model

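Basic information

Grant number:
17300055
Principal investigator:
HIROSE Keikichi
Amount:
$107,900
Host institution:
The University of Tokyo
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (B)
Fiscal year:
2005
Funding country:
Japan
Period:
2005 to 2007
Project status:
Completed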

Project abstract

Research was conducted to establish a corpus-based speech synthesis method that is based on the generation process model of fundamental frequency contours and can generate high-quality speech in any speaking style. The original research plan was fulfilled, with the following results:

1. A method was developed to predict the command parameters of the generation process model using binary decision trees, with inputs such as the linguistic information obtained by parsing the text, and thus to synthesize fundamental frequency contours. An integrated method of prosodic control was realized by combining this method with other binary-decision-tree methods that predict pause positions and lengths and phoneme durations. The validity of the method was shown through experiments on speech synthesis in various styles, including emotional speech. A method was also developed to automatically extract the command parameters from observed fundamental frequency contours using binary decision trees. The accuracy of extraction was shown to increase when linguistic information about the text was included in the inputs of the trees.

2. Binary decision trees were constructed to predict how the phrase and accent commands of utterances with a specific focus deviate from those of utterances without one. Their inputs are the accent types and sentence positions of the focused words, and the command values of the corresponding parts of the utterances without a specific focus. Appropriate focus control was realized by modifying the phrase and accent commands predicted by the method in item 1 according to the predicted deviations.

3. A two-step method was developed for generating fundamental frequency contours of Standard Chinese. It first generates phrase components in a corpus-based way and then generates tone components in a corpus-based way. The method is highly flexible in synthesizing fundamental frequency contours. As an example of flexible control, it was shown that proper focus control could be realized with a simple set of rules.

4. Speech synthesis systems were constructed for Japanese and Chinese by integrating the methods developed in items 1 and 2 above with HMM speech synthesis. It was shown that the system could produce synthetic speech with higher naturalness than a "full" HMM synthesizer, in which prosodic control is done within the HMM framework. It was also shown that the system could produce synthetic speech in various styles.

5. Spoken dialogue systems for road guidance and TV program guidance were constructed using the above speech synthesis systems. The validity of the developed speech synthesis method was demonstrated through experiments on controlling the speaking style of reply speech according to the user's characteristics and situation.
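
The abstract relies on the generation process model of fundamental frequency contours (commonly known as the Fujisaki model), in which the logarithm of F0 is a baseline plus the responses to phrase commands (impulses) and accent commands (pedestals). As a rough illustration of what the "command parameters" control, the following is a minimal Python sketch of that standard formulation. The function names, the tuple layout of the commands, and the default constants (alpha = 3.0/s, beta = 20.0/s, gamma = 0.9) are assumptions chosen for illustration; they are typical textbook values, not values reported by this project, and the sketch does not reproduce the decision-tree prediction itself.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase control mechanism:
    Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    if t < 0.0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism:
    Ga(t) = min(1 - (1 + beta * t) * exp(-beta * t), gamma) for t >= 0, else 0."""
    if t < 0.0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, base_f0, phrase_cmds, accent_cmds,
               alpha=3.0, beta=20.0, gamma=0.9):
    """Synthesize F0 values (Hz) at the given times (s).

    phrase_cmds: list of (onset_time, magnitude)        -- hypothetical layout
    accent_cmds: list of (onset_time, offset_time, amplitude)  -- hypothetical layout
    """
    contour = []
    for t in times:
        # Components are summed in the log-F0 domain.
        log_f0 = math.log(base_f0)
        for onset, magnitude in phrase_cmds:
            log_f0 += magnitude * phrase_response(t - onset, alpha)
        for onset, offset, amplitude in accent_cmds:
            log_f0 += amplitude * (
                accent_response(t - onset, beta, gamma)
                - accent_response(t - offset, beta, gamma)
            )
        contour.append(math.exp(log_f0))
    return contour


if __name__ == "__main__":
    # Illustrative commands only; these numbers are made up for the demo.
    times = [i * 0.01 for i in range(200)]
    f0 = f0_contour(times, 100.0,
                    phrase_cmds=[(0.0, 0.5)],
                    accent_cmds=[(0.2, 0.6, 0.4)])
    print(f"F0 range: {min(f0):.1f} Hz to {max(f0):.1f} Hz")
```

Under this formulation, predicting prosody reduces to predicting a small set of command timings and magnitudes, which is what makes decision-tree prediction from linguistic features and rule-based adjustment for focus practical.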

Project outputs

Journal papers (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Constrained tone transformation technique for separation and combination of Mandarin tone and intonation

DOI:
Published:
2006
Journal:
Journal of the Acoustical Society of America 119(3)
Impact factor:
0
Authors:
Corinne Touati;Atsushi Inoie;Hisao Kameda;H.Kameda;Jinfu Ni
Corresponding author:
Jinfu Ni

Estimation of intonation variation with constrained tone transformations

DOI:
Published:
2005
Journal:
Proc. 9th European Conference on Speech Communication and Technology (INTERSPEECH) CD-ROM
Impact factor:
0
Authors:
Keikichi Hirose;Yusuke Furuyama;Nobuaki Minematsu;広瀬啓吉;Quinghua Sun;Jinfu Ni
Corresponding author:
Jinfu Ni

日本語テキスト音声合成用アクセント結合規則の改良

Improvement of accent concatenation rules for Japanese text-to-speech synthesis

DOI:
Published:
2005
Journal:
日本音響学会講演論文集 (Proceedings of the Acoustical Society of Japan Meeting) CD-ROM
Impact factor:
0
Authors:
Keikichi Hirose;Yusuke Furuyama;Nobuaki Minematsu;広瀬啓吉;Quinghua Sun;Jinfu Ni;黒岩龍
Corresponding author:
黒岩龍

Corpus-based extraction of F0 contour generation process model parameters

DOI:
Published:
2005
Journal:
Proceedings of Interspeech 2005, 1
Impact factor:
0
Authors:
Keikichi Hirose;Yasufumi Asano;Nobuaki Minematsu;Jinfu Ni;Wentao Gu;Qinghua Sun;越智景子;Quinghua Sun;広瀬啓吉;浅野泰史;河村美由紀;孫慶華
Corresponding author:
Keikichi Hirose

Prosody in spoken language technologies (Special Lecture)

DOI:
Published:
2007
Journal:
Proceedings of International Workshop on Nonlinear Circuits and Signal Processing (NCSP2007) CD-ROM
Impact factor:
0
Authors:
Keikichi Hirose;Qinghua Sun;Nobuaki Minematsu;八木裕司
Corresponding author:
Keikichi Hirose