Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices (AV-COGHEAR)
Basic Information
- Grant number: EP/M026981/1
- Principal investigator:
- Amount: $532,900
- Host institution:
- Host institution country: United Kingdom
- Project type: Research Grant
- Fiscal year: 2015
- Funding country: United Kingdom
- Duration: 2015 to (no data)
- Status: Completed
- Source:
- Keywords:
Project Summary
Current commercial hearing aids use a number of sophisticated enhancement techniques to try to improve the quality of speech signals. However, today's best aids fail to work well in many everyday situations. In particular, they fail in busy social situations where there are many competing speech sources, and they fail when the talker is too far from the listener and the speech is swamped by noise. We have identified an opportunity to solve this problem by building hearing aids that can 'see'. This ambitious project aims to develop a new generation of hearing-aid technology that extracts speech from noise by using a camera to see what the talker is saying. The wearer of the device will be able to focus their hearing on a target talker, and the device will filter out competing sound. This ability, which is beyond that of current technology, has the potential to improve the quality of life of the millions suffering from hearing loss (over 10 million in the UK alone).

Our approach is consistent with normal hearing. Listeners naturally combine information from both their ears and eyes: we use our eyes to help us hear. When listening to speech, the eyes follow the movements of the face and mouth, and a sophisticated, multi-stage process uses this information to separate speech from noise and fill in any gaps. Our hearing aid will act in much the same way. It will exploit visual information from a camera (e.g. using a Google Glass-like system) and novel algorithms for intelligently combining audio and visual information, in order to improve speech quality and intelligibility in real-world noisy environments.

The project brings together a critical mass of researchers with the complementary expertise necessary to make the audio-visual hearing aid possible. It will combine two contrasting approaches to audio-visual speech enhancement developed by the Cognitive Computing group at Stirling and the Speech and Hearing Group at Sheffield: the Stirling approach uses the visual signal to filter out noise, whereas the Sheffield approach uses the visual signal to fill in 'gaps' in the speech. The vision processing needed to track a speaker's lip and face movements will use a revolutionary 'bar code' representation developed by the Psychology Division at Stirling. The MRC Institute of Hearing Research (IHR) will provide the expertise needed to evaluate the approach with listeners who have real hearing loss. Phonak AG, a leading international hearing-aid manufacturer, will provide the advice and guidance necessary to maximise the potential for industrial impact.

The project has been designed as a series of four work packages that address the key research challenges related to each component of the device's design; these challenges have been identified by preliminary work at Sheffield and Stirling. Among them are developing improved techniques for visually-driven audio analysis; designing better metrics for weighting audio and visual evidence; and developing techniques for optimally combining the noise-filtering and gap-filling approaches. A further key challenge is that, for a hearing aid to be effective, the processing cannot delay the signal by more than 10 ms. In the final year of the project, a fully integrated software prototype will be clinically evaluated using listening tests with hearing-impaired volunteers in a range of modern noisy, reverberant environments. Evaluation will use a new purpose-built speech corpus designed specifically for testing this new class of multimodal device. The project's clinical research partner, the Scottish Section of MRC IHR, will advise on experimental design and analysis throughout the trials. Industry leader Phonak AG will provide advice and technical support for benchmarking real-time hearing devices.

The final clinically-tested prototype will be made available to the whole hearing community as a testbed for further research, development, evaluation and benchmarking.
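To make the fusion idea concrete, the sketch below illustrates how the two complementary strategies described above could be combined on a magnitude spectrogram. This is purely illustrative: the function name `enhance_av`, the per-bin `confidence` weighting, and the simple linear fusion rule are assumptions of this sketch, not the project's actual algorithms.

```python
import numpy as np

def enhance_av(noisy_mag, visual_mask, visual_spec, confidence):
    """Illustrative audio-visual fusion on a magnitude spectrogram.

    noisy_mag   -- noisy speech magnitudes, shape (freq_bins, frames)
    visual_mask -- visually-predicted soft mask in [0, 1] (noise filtering)
    visual_spec -- visually-predicted spectral estimate (gap filling)
    confidence  -- per-bin weight in [0, 1] for trusting the audio evidence
    """
    # 'Filtering' idea: attenuate noise-dominated bins with the visual mask.
    filtered = visual_mask * noisy_mag
    # 'Gap-filling' idea: where audio evidence is weak, fall back on the
    # visually-predicted spectrum instead of the masked audio.
    return confidence * filtered + (1.0 - confidence) * visual_spec

# Toy example on a 4-bin x 3-frame spectrogram with random placeholder inputs.
rng = np.random.default_rng(0)
noisy = np.abs(rng.normal(size=(4, 3)))
mask = rng.uniform(size=(4, 3))
vis_est = np.abs(rng.normal(size=(4, 3)))
conf = rng.uniform(size=(4, 3))
enhanced = enhance_av(noisy, mask, vis_est, conf)
print(enhanced.shape)  # (4, 3)
```

A real system would additionally have to learn the mask and confidence from lip and face features, resynthesise a waveform from the enhanced magnitudes, and meet the 10 ms latency budget noted above.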
Project Outcomes
- Journal articles: 10
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Fast Lip Feature Extraction Using Psychologically Motivated Gabor Features
- DOI: 10.1109/ssci.2018.8628931
- Publication date: 2018-01-01
- Journal:
- Impact factor: 0
- Authors: Abel, Andrew; Gao, Chengxiang; Hussain, Amir
- Corresponding author: Hussain, Amir
A Novel Spatiotemporal Longitudinal Methodology for Predicting Obesity Using Near Infrared Spectroscopy (NIRS) Cerebral Functional Activity Data
- DOI: 10.1007/s12559-017-9541-x
- Publication date: 2018-01
- Journal:
- Impact factor: 5.4
- Authors: A. Abdullah; A. Hussain; Imtiaz Hussain Khan
- Corresponding authors: A. Abdullah; A. Hussain; Imtiaz Hussain Khan
Lip-Reading Driven Deep Learning Approach for Speech Enhancement
- DOI: 10.1109/tetci.2019.2917039
- Publication date: 2021-06-01
- Journal:
- Impact factor: 5.3
- Authors: Adeel, Ahsan; Gogate, Mandar; Whitmer, William M.
- Corresponding author: Whitmer, William M.
An Enhanced Binary Particle Swarm Optimization (E-BPSO) algorithm for service placement in hybrid cloud platforms
- DOI: 10.1007/s00521-022-07839-5
- Publication date: 2018-06
- Journal:
- Impact factor: 6
- Authors: Wissem Abbes; Zied Kechaou; Amir Hussain; A. Qahtani; Omar Almutiry; Habib Dhahri; A. Alimi
- Corresponding authors: Wissem Abbes; Zied Kechaou; Amir Hussain; A. Qahtani; Omar Almutiry; Habib Dhahri; A. Alimi
Cognitively Inspired Audiovisual Speech Filtering: Towards an Intelligent, Fuzzy Based, Multimodal, Two-Stage Speech Enhancement System
- DOI:
- Publication date: 2015
- Journal:
- Impact factor: 0
- Author: Abel, Andrew
- Corresponding author: Abel, Andrew
Other Publications by Amir Hussain
Novel deep neural network based pattern field classification architectures
- DOI: 10.1016/j.neunet.2020.03.011
- Publication date: 2020-03
- Journal:
- Impact factor: 7.8
- Authors: Kaizhu Huang; Shufei Zhang; Rui Zhang; Amir Hussain
- Corresponding author: Amir Hussain
Can Generative AI Models Extract Deeper Sentiments as Compared to Traditional Deep Learning Algorithms?
- DOI: 10.1109/mis.2024.3374582
- Publication date: 2024
- Journal:
- Impact factor: 6.4
- Authors: Mohammad Anas; Anam Saiyeda; S. Sohail; Erik Cambria; Amir Hussain
- Corresponding author: Erik Cambria
AVSE Challenge: Audio-Visual Speech Enhancement Challenge
- DOI:
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: Andrea Lorena Aldana Blanco; Cassia Valentini; Ondrej Klejch; M. Gogate; K. Dashtipour; Amir Hussain; P. Bell
- Corresponding author: P. Bell
Automatic object-oriented coding facility for product life cycle management of discrete products
- DOI:
- Publication date: 2014
- Journal:
- Impact factor: 0
- Authors: W. Khan; Amir Hussain
- Corresponding author: Amir Hussain
Deep Complex U-Net with Conformer for Audio-Visual Speech Enhancement
- DOI:
- Publication date: 2023
- Journal:
- Impact factor: 0
- Authors: Shafique Ahmed; Chia; Wenze Ren; Chin; Ernie Chu; Jun; Amir Hussain; H. Wang; Yu Tsao; Jen
- Corresponding author: Jen
Other Grants Held by Amir Hussain
COG-MHEAR: Towards cognitively-inspired 5G-IoT enabled, multi-modal Hearing Aids
- Grant number: EP/T021063/1
- Fiscal year: 2021
- Amount: $532,900
- Project type: Research Grant
Dual Process Control Models in the Brain and Machines with Application to Autonomous Vehicle Control
- Grant number: EP/I009310/1
- Fiscal year: 2011
- Amount: $532,900
- Project type: Research Grant
Industrial CASE Account - Stirling 2009
- Grant number: EP/H501584/1
- Fiscal year: 2009
- Amount: $532,900
- Project type: Training Grant
Industrial CASE Account - Stirling 2008
- Grant number: EP/G501750/1
- Fiscal year: 2009
- Amount: $532,900
- Project type: Training Grant
Similar National Natural Science Foundation of China (NSFC) Grants
Research on visual pose recognition and automatic-gripping manipulator control methods for scrapers in intelligent hammer die forging
- Grant number: 52374167
- Year approved: 2023
- Amount: ¥500,000
- Project type: General Program
Role and mechanism of the superior colliculus-locus coeruleus-dorsal hippocampus neural circuit in visual-cue-induced cocaine-seeking behavior
- Grant number:
- Year approved: 2022
- Amount: ¥300,000
- Project type:
Neural mechanisms of visual feature representation in the mouse superior colliculus
- Grant number: 32271060
- Year approved: 2022
- Amount: ¥540,000
- Project type: General Program
Distributed optimization on Riemannian manifolds and its application to visual sensor networks
- Grant number:
- Year approved: 2022
- Amount: ¥300,000
- Project type: Young Scientists Fund
Neural mechanisms of visual feature representation in the mouse superior colliculus
- Grant number:
- Year approved: 2022
- Amount: ¥540,000
- Project type: General Program
Similar Overseas Grants
Implementing and Iterating WeWALK’s Agent-Based Guidance System (WeASSIST) in Rail Transport to Improve Visually Impaired Customer Experience
- Grant number: 10098144
- Fiscal year: 2024
- Amount: $532,900
- Project type: Collaborative R&D
Collaborative Research: CNS Core: Small: SmartSight: an AI-Based Computing Platform to Assist Blind and Visually Impaired People
- Grant number: 2418188
- Fiscal year: 2023
- Amount: $532,900
- Project type: Standard Grant
Variability of Brain Reorganization in Blindness
- Grant number: 10562129
- Fiscal year: 2023
- Amount: $532,900
- Project type:
Multisensory Augmented Reality as a bridge to audio-only accommodations for inclusive STEM interactive digital media
- Grant number: 10693600
- Fiscal year: 2023
- Amount: $532,900
- Project type:
Cross-modal plasticity after the loss of vision at two early developmental ages in the posterior parietal cortex: Adult connections, cortical function and behavior.
- Grant number: 10751658
- Fiscal year: 2023
- Amount: $532,900
- Project type: