Social Perceptions of Synthetic Speakers

Basic information

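Project number: 423651352
Principal investigator: Professor Dr.-Ing. Sebastian Möller
Amount: --
Host institution: Quality and Usability Lab
Host country: Germany
Project type: Research Grants
Funding year: 2019
Funding country: Germany
Duration: 2018-12-31 to 2021-12-31
Project status: Completed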

Source: https://gepris.dfg.de/gepris/projekt/423651352?language=en
Keywords: Social Perceptions Synthetic Speakers

Project abstract

Speech signals automatically induce social perceptions of the speaker in listeners. Through acoustic analysis and signal manipulation, a large body of knowledge has been accumulated for natural speech, both on the relevant acoustic correlates of social perceptions, such as spectral and prosodic parameters, and on the underlying perceptual dimensions. However, although modern speech synthesis paradigms now provide very high quality, it remains to be understood whether results obtained for natural speech also hold for synthesized speech. Hence, the major research question is: “Which acoustic features of synthesized speech affect subjective perceptions of social speaker characteristics?”
To answer this question, this project studies the social perception of the two basic social attributions, competence and benevolence, for text-to-speech (TTS) synthesizers in two potential application domains, using stimuli on healthcare and customer service topics. Results are compared with those obtained for natural speech in earlier projects. It is tested whether competence and benevolence also emerge as the basic social attributions for synthesized speech, or whether other dimensions are more relevant. Regarding the speech signal, similarities and differences in acoustic parameters and their systematics are identified. An intermediate result is an acoustic prediction model of the identified social dimensions for synthesized speech.
On a methodological level, utterances are created with state-of-the-art TTS systems and systematically modified at the signal level in order to produce stimuli for empirical tests with human listeners. Crowdsourcing techniques are applied for the required listening and rating tests. The final goal is to examine how acoustic features and patterns can be incorporated directly into modern TTS methods (hidden Markov models, deep neural networks) instead of relying on post-processing signal manipulation. This leads to the secondary research question: “Which alterations of the synthesis procedure lead to positive perceptions of speakers?” To this end, current approaches from speaker conversion are applied.
Apart from the fundamental knowledge gained from this research, the results will be relevant for TTS system developers seeking to efficiently improve voices for particular service domains.

语音信号自动在听众中就演讲者引起社会看法。通过声学分析和信号操纵，关于社会感知的相关声学相关性，例如光谱和韵律参数，以及自然语音的知觉维度，广泛的知识已经准确。但是，尽管现代语音综合范式的发展提供了非常高质量的范围，但如果自然语音的结果也适用于合成的语音，但尚待理解。 Hence, the major research question is: “Which acoustic features of synthesized speech affect subjective perceptions of social speaker characteristics?”In order to answer this question, this project studies social perception of the two basic social attributes, competence and benevolence, for text-to-speech (TTS) synthesizers in two potential application domains: Stimuli from the topics of healthcare and of customer service.将结果与早期项目中自然语音获得的结果进行比较。它测试了能力和仁慈是否也作为基本社会属性出现，还是其他维度更相关。关于语音信号，声学参数及其系统的相似性和差异。中期结果是一个综合语音的确定社会维度的声学预测模型。在方法论水平上，用最新的TTS系统创建了话语，并在信号级别进行了系统修改，以便与人类听众产生经验测试的刺激。群源技术适用于所需的聆听和评级测试。最终的目标是检查如何将声学特征和模式直接纳入现代TTS方法（隐藏式Markov模型，深神经网络）而不是后处理信号操作中。这导致了二级研究问题：“综合程序的哪些变化导致对说话者的积极看法？”为了实现这一目标，采用了说话者转换的当前方法。从这项研究中获得的基本知识中，结果将与TTS系统开发人员相关，以便有效地改善特定服务领域的声音。