CRI: CI-P: Creating the Largest Speech Emotional Database by Leveraging Existing Naturalistic Recordings

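Basic Information

  • Award number:
    1823166
  • Principal investigator:
    Carlos Busso
  • Amount:
    $99,400
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Period:
    2018-09-01 to 2021-02-28
  • Status:
    Completed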

Abstract

This community infrastructure planning project aims to incorporate the needs of other researchers into the design of the largest publicly available naturalistic speech emotional database, broadening the impact of the corpus across speech processing areas. The project includes a workshop with researchers who have relevant but diverse expertise, where the current data collection protocol will be presented and their recommendations for improvement requested. The proposed activity will refine the protocol to address the needs of the community. Affective computing is an important research area that aims to understand, analyze, recognize, and synthesize human emotions. Adding emotion capabilities to current speech-based interfaces can enable transformative applications in Human-Computer Interaction (HCI), healthcare, security and defense, education, and entertainment. The research infrastructure envisioned in this project will open opportunities that current speech emotional databases cannot support. In affective computing, the proposed corpus will provide suitable training sets for exploring learning algorithms that are powerful but require large amounts of labeled data. The size, naturalness, and speaker and recording variety of the proposed corpus are expected to allow the community to create robust models that generalize across applications. Improvements in speech emotion recognition systems will ease the transition of these algorithms into practical applications, providing unique societal benefits. The proposed infrastructure will also play a key role in other speech processing tasks. For the first time, the community will have the infrastructure to develop speaker verification and automatic speech recognition solutions that are robust to variations caused by emotion.

The proposed infrastructure relies on a novel approach that combines emotion retrieval with crowdsourced annotations. This approach builds a large, naturalistic emotional database with balanced emotional content, at reduced cost and with less manual labor. The database draws on podcast recordings available on audio-sharing websites. Building affective databases from media content has been explored before. The contribution of this study is the use of machine learning algorithms to retrieve audio clips with balanced emotional content, which provides natural stimuli covering a wider spectrum of emotions. The approach uses automatic algorithms to post-process the podcasts and a cost-effective annotation process, which together make it possible to build large-scale speech emotional databases. It yields natural emotional renditions that are difficult to obtain with alternative data collection protocols. Involving the research community from the design stage of the corpus is the key goal of this community infrastructure planning project. The community will also play a key role in selecting the target sentences to be emotionally annotated. This selection will happen through novel grand challenges whose goal is to recognize and retrieve target emotional behaviors in unconstrained, unlabeled recordings.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
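The abstract says machine learning algorithms retrieve audio clips with balanced emotional content before crowdsourced annotation, but it does not say how. The sketch below shows one simple way such a balancing step could work. It assumes a hypothetical upstream classifier has already assigned each podcast segment a probability per emotion. The function `select_balanced`, the segment IDs, the emotion labels, and the per-class `quota` are illustrative assumptions, not the project's actual retrieval method.

```python
from typing import Dict, List


def select_balanced(
    scores: Dict[str, Dict[str, float]],
    quota: int,
) -> Dict[str, List[str]]:
    """Greedily pick up to `quota` segments per emotion class.

    `scores` maps a segment ID to the per-emotion probabilities produced by a
    hypothetical upstream emotion classifier. Each segment is assigned to at
    most one emotion. A class stops accepting segments once it reaches its
    quota, so frequently predicted classes cannot crowd out rarer ones.
    """
    emotions = sorted({emo for probs in scores.values() for emo in probs})
    selected: Dict[str, List[str]] = {emo: [] for emo in emotions}
    used = set()

    # Visit every (segment, emotion) pair from most to least confident;
    # ties are broken by segment ID, then emotion name, for determinism.
    candidates = sorted(
        ((p, seg, emo) for seg, probs in scores.items() for emo, p in probs.items()),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    for _, seg, emo in candidates:
        if seg in used or len(selected[emo]) >= quota:
            continue
        selected[emo].append(seg)
        used.add(seg)
    return selected


# Toy scores: the two most confident predictions are both "happy".
scores = {
    "seg1": {"happy": 0.9, "sad": 0.05, "angry": 0.05},
    "seg2": {"happy": 0.8, "sad": 0.1, "angry": 0.1},
    "seg3": {"happy": 0.6, "sad": 0.3, "angry": 0.1},
    "seg4": {"happy": 0.2, "sad": 0.7, "angry": 0.1},
    "seg5": {"happy": 0.1, "sad": 0.2, "angry": 0.7},
}
print(select_balanced(scores, quota=1))
# {'angry': ['seg5'], 'happy': ['seg1'], 'sad': ['seg4']}
```

The greedy, one-class-per-segment rule is only one simple way to enforce balance. A real pipeline would likely also need confidence thresholds and diversity constraints across speakers and podcasts, followed by the crowdsourced annotation step, none of which are modeled here.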

Project Outcomes

Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
An Efficient Temporal Modeling Approach for Speech Emotion Recognition by Mapping Varied Duration Sentences into Fixed Number of Chunks
  • DOI:
    10.21437/interspeech.2020-2636
  • Published:
    2020-10
  • Venue:
    Interspeech 2020
  • Authors:
    Lin, Wei; Busso, Carlos
  • Corresponding author:
    Busso, Carlos
The MSP-Conversation Corpus
  • DOI:
    10.21437/interspeech.2020-2444
  • Published:
    2020-10
  • Venue:
    Interspeech 2020
  • Authors:
    Martinez; Abdelwahab, Mohammed; Busso, Carlos
  • Corresponding author:
    Busso, Carlos
Modeling Uncertainty in Predicting Emotional Attributes from Spontaneous Speech
Semi-Supervised Speech Emotion Recognition With Ladder Networks
  • DOI:
    10.1109/taslp.2020.3023632
  • Published:
    2020-01
  • Venue:
    IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • Authors:
    Parthasarathy, Srinivas; Busso, Carlos
  • Corresponding author:
    Busso, Carlos

Other Publications by Carlos Busso

Understanding Bias in Multispectral Autofluorescence Lifetime Imaging: Are Models Sensitive to Oral Location?
  • Authors:
    Kayla Caughlin; Rodrigo Cuenca Martinez; Gabriel P. Tortorelli; Kathleen E. Higgins, DDS; Ronald Faram, DDS; Javier A. Jo; Carlos Busso
  • Corresponding author:
    Carlos Busso
Revealing Emotional Clusters in Speaker Embeddings: A Contrastive Learning Strategy for Speech Emotion Recognition
  • DOI:
    10.1109/icassp48485.2024.10447060
  • Published:
    2024-01-19
  • Venue:
    ICASSP 2024
  • Authors:
    Ismail Rasim Ulgen; Zongyang Du; Carlos Busso; Berrak Sisman
  • Corresponding author:
    Berrak Sisman
Driver Head Pose Estimation with Multimodal Temporal Fusion of Color and Depth Modeling Networks
  • Authors:
    Susmitha Gogineni; Carlos Busso
  • Corresponding author:
    Carlos Busso
Speech Emotion Recognition in Real Static and Dynamic Human-Robot Interaction Scenarios
  • DOI:
    10.1016/j.csl.2024.101666
  • Published:
    2024-05-01
  • Venue:
    Computer Speech & Language
  • Authors:
    Nicolás Grágeda; Carlos Busso; Eduardo Alvarado; Ricardo García; R. Mahú; F. Huenupán; N. B. Yoma
  • Corresponding author:
    N. B. Yoma
MSP-DISK: Naturalistic and Diverse In-Vehicle Database for Joint Pose and Seat Belt Detection

Other Grants Awarded to Carlos Busso

CCRI: Medium: MSP-Podcast: Creating The Largest Speech Emotional Database By Leveraging Existing Naturalistic Recordings
  • Award number:
    2016719
  • Fiscal year:
    2020
  • Amount:
    $99,400
  • Award type:
    Standard Grant
RI: Small: Integrative, Semantic-Aware, Speech-Driven Models for Believable Conversational Agents with Meaningful Behaviors
  • Award number:
    1718944
  • Fiscal year:
    2017
  • Amount:
    $99,400
  • Award type:
    Standard Grant
FG 2015 Doctoral Consortium: Travel Support for Graduate Students
  • Award number:
    1540944
  • Fiscal year:
    2015
  • Amount:
    $99,400
  • Award type:
    Standard Grant
CAREER: Advanced Knowledge Extraction of Affective Behaviors During Natural Human Interaction
  • Award number:
    1453781
  • Fiscal year:
    2015
  • Amount:
    $99,400
  • Award type:
    Continuing Grant
WORKSHOP: Doctoral Consortium for the International Conference on Multimodal Interaction (ICMI 2013)
  • Award number:
    1346655
  • Fiscal year:
    2013
  • Amount:
    $99,400
  • Award type:
    Standard Grant
EAGER: Exploring the Use of Synthetic Speech as Reference Model to Detect Salient Emotional Segments in Speech
  • Award number:
    1329659
  • Fiscal year:
    2013
  • Amount:
    $99,400
  • Award type:
    Standard Grant
RI: Small: Collaborative Research: Exploring Audiovisual Emotion Perception using Data-Driven Computational Modeling
  • Award number:
    1217104
  • Fiscal year:
    2012
  • Amount:
    $99,400
  • Award type:
    Continuing Grant
Workshop: Doctoral Consortium at the 14th International Conference on Multimodal Interaction
  • Award number:
    1249319
  • Fiscal year:
    2012
  • Amount:
    $99,400
  • Award type:
    Standard Grant

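Similar NSFC (National Natural Science Foundation of China) Grants

Exploring, based on the "immune-neural" network, the anti-inflammatory mechanism by which eye acupuncture activates MC in CI/RI rats to target H3R and regulate "immune surveillance"
  • Award number:
    82374375
  • Approval year:
    2023
  • Amount:
    CNY 510,000
  • Program type:
    General Program
Mechanism by which ci-Eln promotes hypoxic pulmonary artery smooth muscle cell proliferation mediated by its parental gene Eln
  • Approval year:
    2021
  • Amount:
    CNY 300,000
  • Program type:
    Young Scientists Fund
Spatiotemporal variation of LAI and CI in vertically stratified forest layers: LiDAR remote-sensing retrieval and validation
  • Approval year:
    2021
  • Amount:
    CNY 590,000
  • Program type:
    General Program
Revealing the molecular mechanism of Wolbachia-induced CI in Drosophila by single-cell transcriptome sequencing
  • Award number:
    32170497
  • Approval year:
    2021
  • Amount:
    CNY 580,000
  • Program type:
    General Program
Observational study of the [CI] line in nearby galaxies as a new tracer of molecular gas mass
  • Approval year:
    2020
  • Amount:
    CNY 240,000
  • Program type:
    Young Scientists Fund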

Similar Overseas Grants

Collaborative Research: GEO OSE Track 2: Developing CI-enabled collaborative workflows to integrate data for the SZ4D (Subduction Zones in Four Dimensions) community
  • Award number:
    2324711
  • Fiscal year:
    2024
  • Amount:
    $99,400
  • Award type:
    Standard Grant
Collaborative Research: Maritime to Inland Transitions Towards ENvironments for Convection Initiation (MITTEN CI)
  • Award number:
    2349934
  • Fiscal year:
    2024
  • Amount:
    $99,400
  • Award type:
    Continuing Grant
Collaborative Research: Maritime to Inland Transitions Towards ENvironments for Convection Initiation (MITTEN CI)
  • Award number:
    2349936
  • Fiscal year:
    2024
  • Amount:
    $99,400
  • Award type:
    Continuing Grant
Collaborative Research: GEO OSE Track 2: Developing CI-enabled collaborative workflows to integrate data for the SZ4D (Subduction Zones in Four Dimensions) community
  • Award number:
    2324713
  • Fiscal year:
    2024
  • Amount:
    $99,400
  • Award type:
    Standard Grant
Collaborative Research: GEO OSE Track 2: Developing CI-enabled collaborative workflows to integrate data for the SZ4D (Subduction Zones in Four Dimensions) community
  • Award number:
    2324709
  • Fiscal year:
    2024
  • Amount:
    $99,400
  • Award type:
    Standard Grant