Speech Perception under Cognitive Load

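Basic Information

Grant number:
ES/R004722/1
Principal investigator:
Sven Mattys
Amount:
$362,500
Host institution:
University of York
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2018
Funding country:
United Kingdom
Period:
2018 to (no data)
Project status:
Completed

Source:
https://gtr.ukri.org/projects?ref=ES%2FR004722%2F1
Keywords: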
Speech Perception under Cognitive Load

Project Abstract

Most theories of human speech perception are derived from tasks performed in a quiet environment and under conditions of undivided attention. In the past few years, there has been a surge of interest in modelling speech recognition in more realistic conditions (e.g., noisy background, accented speech). Among these realistic conditions, however, those resulting from a cognitive load have received little attention. Here, we define cognitive load (CL) as any listening challenge arising not from a distortion of the speech signal but from the recruitment of processing resources due to concurrent attentional or mnemonic demands. For example, what are the consequences of monitoring cockpit instruments on a pilot's ability to follow spoken instructions from ground control? The disruptive effect of CL on speech perception is observed as early as the initial stages of acoustic encoding. Under some circumstances, CL can even lead to a form of transient hearing impairment called inattentional deafness. Despite the obvious implications that these results have for theory and clinical practice, little is known about the low-level mechanisms by which CL interferes with speech perception. The aim of this proposal is to address this issue in three interconnected research streams drawing upon psychometric and identification paradigms.

The first stream asks whether CL affects all acoustic dimensions of speech equally. This question is important because not all acoustic dimensions are equally crucial for communication. For example, successful word recognition is more resilient to pitch distortions than to duration distortions. The idea that CL affects some dimensions more than others is motivated by the claim that CL (e.g., a concurrent visual task) causes listeners to rapidly shift attention back and forth between the speech signal and the CL task, leading to an underestimation of the duration of the speech signal. If this hypothesis is correct, CL should lead primarily to a distortion of auditory temporal judgements and leave other core dimensions (loudness, pitch, and spectral structure) unaffected. This will be contrasted with the claim that CL leads to a general reduction in auditory precision across all acoustic dimensions.

The second stream investigates whether the format of the CL stimuli affects the severity of the CL interference. For example, is speech perception more affected by a concurrent task that requires rehearsing words silently (phonological format) or by a task that requires processing visual stimuli (visual format)? These experiments will address the debate between modal and amodal views of the processing resources used during speech perception.

The third stream aims to distinguish two potential mechanisms behind CL interference: encoding and maintenance. Encoding is the process of converting a sensory input into mental representations. Maintenance is the process of preserving these representations in memory. Encoding of the CL stimuli will be manipulated such that it takes place either during or before the speech stimuli, hence pitting encoding against maintenance as the mechanism underlying interference. An encoding hypothesis predicts that only simultaneous encoding of speech and CL stimuli should lead to CL effects.

In order to explore the generalisability of the above phenomena beyond the speech domain, the effect of CL will be tested on both speech and non-speech sounds. This comparison will situate our findings within the long-standing debate on the existence of a specialised speech mode for sound perception.

Finally, because the notion of "cognitive listening" is becoming central not only in speech research but also in hearing practice, we will engage with clinical audiologists to discuss ways of including a cognitive component in standard pure-tone audiometry (PTA) and to advise on potential phase-II clinical trials.