Augmented speech communication using multi-modal signals with real-time, low-latency voice conversion

Basic Information

Grant number:
22KJ1519
Principal investigator:
HUANG WENCHIN
Amount:
$14,100
Host institution:
Nagoya University
Host institution country:
Japan
Project category:
Grant-in-Aid for JSPS Fellows
Fiscal year:
2023
Funding country:
Japan
Project period:
2023-03-08 to 2024-03-31
Project status:
Completed

Source:
https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-22KJ1519/
Keywords:
voice conversion; self-supervised learning; dysarthria; electrolaryngeal speech

Project Summary

The purpose of this research is to apply voice conversion (VC) to realize an interactive speech production paradigm for real-world applications, with the help of multimodal signals and real-time processing techniques. In the second year, the applicant focused on three aspects. (1) Continued improvement of fundamental VC techniques, specifically self-supervised speech representation (S3R)-based VC, an emerging trend that reduces training data requirements. The applicant continued to update S3PRL-VC, an open-source toolkit for researchers to evaluate S3R models for VC, and published the latest experimental results in the IEEE Journal of Selected Topics in Signal Processing. (2) Foreign accent conversion, a task that helps reduce foreign accents for efficient communication. A paper that provides a unified evaluation of current approaches and identifies unsolved problems has been submitted to an international conference and is currently under review. (3) Singing voice conversion, a fundamental technique that has the potential to augment the communication ability of humans. The applicant is running a scientific event named the Singing Voice Conversion Challenge 2023, which aims to provide a unified experimental setting, including task and dataset, in order to attract researchers worldwide to investigate this problem and explore the limitations of state-of-the-art techniques.

Project Outputs

Journal papers (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion

DOI:
10.21437/interspeech.2021-208
Published:
2021-06
Journal:
ArXiv
Impact factor:
0
Authors:
Wen-Chin Huang;Kazuhiro Kobayashi;Yu-Huai Peng;Ching-Feng Liu;Yu Tsao;Hsin-Min Wang;T. Toda
Corresponding author:
Wen-Chin Huang;Kazuhiro Kobayashi;Yu-Huai Peng;Ching-Feng Liu;Yu Tsao;Hsin-Min Wang;T. Toda

CRANK: an Open-Source Software for Nonparallel Voice Conversion based on Vector-Quantized Variational Autoencoder

DOI:
Published:
2021
Journal:
Impact factor:
0
Authors:
Kazuhiro Kobayashi;Wen-Chin Huang;Yi-Chiao Wu;Patrick Tobing;Tomoki Hayashi;Tomoki Toda
Corresponding author:
Tomoki Toda

S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations

DOI:
10.1109/icassp43922.2022.9746430
Published:
2021-10
Journal:
ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Impact factor:
0
Authors:
Wen-Chin Huang;Shu-Wen Yang;Tomoki Hayashi;Hung-yi Lee;Shinji Watanabe;T. Toda
Corresponding author:
Wen-Chin Huang;Shu-Wen Yang;Tomoki Hayashi;Hung-yi Lee;Shinji Watanabe;T. Toda

On Prosody Modeling for ASR+TTS Based Voice Conversion