Characterizing the recovery of spectral, temporal, and phonemic speech information from visual cues
Basic Information
- Approval number: 10563860
- Principal investigator: David Brang
- Amount: $550.4K
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2023
- Funding country: United States
- Project period: 2023-02-14 to 2028-01-31
- Project status: Not yet concluded
- Source:
- Keywords: Acoustics; Attention; Auditory; Auditory area; Auditory system; Biological; Brain; Brain Injuries; Brain Neoplasms; Classification; Cochlear Implants; Code; Compensation; Crowding; Cues; Data; Development; Devices; Dimensions; Distributional Activity; Electrodes; Electroencephalography; Emotional; Frequencies; Functional Magnetic Resonance Imaging; Health; Hearing; Human; Illusions; Impairment; Individual; Lipreading; Maps; Measures; Modality; Modeling; Movement; Neurons; Noise; Oral; Oral cavity; Participant; Patients; Pattern; Perception; Periodicity; Physiological Processes; Population; Presbycusis; Process; Reaction Time; Recovery; Rehabilitation therapy; Research; Resolution; Resources; Route; Shapes; Signal Transduction; Social Interaction; Speech; Speech Perception; Speech Sound; Stimulus; Stroke; Superior temporal gyrus; System; Testing; Titrations; Training Programs; Trauma; Vision; Visual; Vocation; audiovisual speech; auditory stimulus; density; healthy aging; improved; neural prosthesis; neuromechanism; programs; response; restoration; sensory substitution; social; speech accuracy; visual information; visual speech
Project Summary
Auditory speech perception is essential for social, vocational, and emotional health in hearing individuals.
However, the reliability of auditory signals varies widely in everyday settings (e.g., at a crowded party), requiring
supplemental processes to enable accurate speech perception. The principal mechanisms that support the
perception of degraded auditory speech signals are auditory-visual (crossmodal) interactions, which can
perceptually restore speech content using visual cues provided by lipreading, rhythmic articulatory movements,
and the natural correlations present between oral resonance and mouth shape. Moreover, receptive speech
processes can be limited through a variety of causes, including intrinsic brain tumor, stroke, cochlear implant
usage, and age-related hearing loss, making compensatory crossmodal mechanisms necessary for one to
continue working and maintaining healthy social interactions. However, the physiological processes that enable
vision to facilitate speech perception remain poorly understood, and no integrative model exists for how these
multiple visual dimensions combine to enhance auditory speech perception. In the auditory domain, distributed
populations of neurons encode spectro-temporal information about acoustic cues that are then transcoded into
phonemes. We propose a dual-route perceptual model through which visual signals integrate with
phoneme-coded neurons. The first is a direct path, in which viseme-to-phoneme conversions generate
semi-overlapping distributions of activity in the superior temporal gyrus, improving hearing by sharpening
auditory phoneme tuning functions. The second is an indirect path, in which visual features restore
spectral information about speech frequencies and alter phoneme-response timing, yielding improved
auditory spectro-temporal profiles (which are in turn transcoded into phonemes with greater precision).
Finally, we will examine the
hypothesis that our perceptual system optimizes which of these visual dimensions is prioritized for recovery
based on what is missing from the auditory signal. These studies will provide a unified framework for how speech
perception benefits from different visual signals. By understanding biological approaches to crossmodally
restoring degraded auditory speech information, we can develop better targeted rehabilitation programs and
neural prostheses to maximize speech perception recovery after trauma or during healthy aging.
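As an illustration only, and not the model proposed here, the following minimal Python sketch casts the two routes as simple computations over a toy phoneme inventory: a viseme-derived prior that sharpens auditory phoneme likelihoods (the direct route), and visual mouth-aperture features that partially restore a noise-degraded spectral envelope before decoding (the indirect route). The inventory, viseme groupings, parameter values, and function names (viseme_prior, restore_envelope, decode_phoneme) are hypothetical.

```python
# Toy sketch of a dual-route audiovisual fusion scheme; all values are illustrative assumptions.
import numpy as np

PHONEMES = ["b", "p", "m", "f", "v", "d", "t"]      # toy phoneme inventory
VISEME_GROUPS = {                                    # hypothetical viseme -> phoneme groupings
    "bilabial": {"b", "p", "m"},
    "labiodental": {"f", "v"},
    "alveolar": {"d", "t"},
}

def viseme_prior(viseme, leak=0.05):
    """Direct route: the visible mouth shape constrains which phonemes are plausible."""
    members = VISEME_GROUPS[viseme]
    p = np.array([1.0 if ph in members else leak for ph in PHONEMES])
    return p / p.sum()

def restore_envelope(noisy_envelope, mouth_aperture, alpha=0.5):
    """Indirect route: use an assumed correlation between mouth aperture and the
    speech spectral envelope to partially restore frequencies masked by noise."""
    predicted = mouth_aperture / (np.max(mouth_aperture) + 1e-9)
    return (1.0 - alpha) * noisy_envelope + alpha * predicted

def decode_phoneme(auditory_loglik, viseme):
    """Fuse auditory log-likelihoods with the viseme-derived prior (log-space sum)."""
    posterior = auditory_loglik + np.log(viseme_prior(viseme))
    return PHONEMES[int(np.argmax(posterior))]

# Usage: auditory evidence alone favors "d", but a bilabial viseme recovers "b".
loglik = np.log(np.array([0.30, 0.05, 0.05, 0.05, 0.05, 0.35, 0.15]))
print(decode_phoneme(loglik, "bilabial"))            # -> "b"

# Indirect-route demo: a noise-masked envelope is partially filled in from vision.
print(restore_envelope(np.array([0.0, 0.2, 0.0]), np.array([0.8, 0.4, 0.9])))
```

The fusion step is written as a simple sum of log-likelihood and log-prior; the project instead asks how, and under what conditions, the brain implements computations of this kind.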
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by David Brang
Other grants by David Brang
Networks Underlying Visual Modulation of Speech Perception
- Approval number: 9337601
- Fiscal year: 2016
- Funding amount: $550.4K
- Project category:

Networks Underlying Visual Modulation of Speech Perception
- Approval number: 9353752
- Fiscal year: 2016
- Funding amount: $550.4K
- Project category:

Networks underlying visual modulation of speech perception
- Approval number: 8959922
- Fiscal year: 2014
- Funding amount: $550.4K
- Project category:
Similar NSFC grants

Weakening and recovery patterns and regulation mechanisms of driver supervisory attention under human-machine co-driving
- Approval number: 52302425
- Year approved: 2023
- Funding amount: ¥300K
- Project category: Young Scientists Fund

Micro-mechanisms of dynamic capability formation in multinational enterprises under deglobalization: an executive attention allocation perspective
- Approval number: 72302220
- Year approved: 2023
- Funding amount: ¥300K
- Project category: Young Scientists Fund

Attention-aware online collaborative calibration of vehicle-mounted multimodal sensors
- Approval number: 42301468
- Year approved: 2023
- Funding amount: ¥300K
- Project category: Young Scientists Fund

Systemic financial risk measurement and early warning based on a two-stage attention deep learning method
- Approval number: 72301101
- Year approved: 2023
- Funding amount: ¥300K
- Project category: Young Scientists Fund

Transformer-based tunnel lining crack detection using multiple sparse self-attention mechanisms
- Approval number: 62301339
- Year approved: 2023
- Funding amount: ¥300K
- Project category: Young Scientists Fund
Similar overseas grants

Optimizing bilateral and single-sided-deafness cochlear implants for functioning in complex auditory environments
- Approval number: 10654316
- Fiscal year: 2023
- Funding amount: $550.4K
- Project category:

Dynamic neural coding of spectro-temporal sound features during free movement
- Approval number: 10656110
- Fiscal year: 2023
- Funding amount: $550.4K
- Project category:

Identifying acoustic-level and language-specific sensory processing mechanisms
- Approval number: 10711229
- Fiscal year: 2023
- Funding amount: $550.4K
- Project category:

Characterizing the generative mechanisms underlying the cortical tracking of natural speech
- Approval number: 10710717
- Fiscal year: 2023
- Funding amount: $550.4K
- Project category:

Step 1 in Designing Appropriate Shams and Controls in Human TUS
- Approval number: 10735292
- Fiscal year: 2023
- Funding amount: $550.4K
- Project category: