Dynamics of Vocal Tract Shaping

Basic Information

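Grant number:
8384871
Principal investigator:
SHRIKANTH NARAYANAN
Amount:
$411,500
Host institution:
UNIVERSITY OF SOUTHERN CALIFORNIA
Host institution country:
United States
Project category:
Fiscal year:
2005
Funding country:
United States
Project period:
2005-05-01 to 2015-01-31
Project status:
Completed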

Source:
https://reporter.nih.gov/project-details/8384871
Keywords:
Acoustics Affect Area Articulation Disorders Articulators Behavior Breathing Cardiac Clinical Cognition Communication Communities Complex Computer Simulation Custom Data Data Analyses Deglutition Deglutition Disorders Dental Dimensions Disease Event Funding Gestures Glossectomy Goals Head and neck structure Health Human Image Imaging technology In Situ Indium Individual Instruction Joints Knowledge Language Language Disorders Larynx Length Linguistics Lip structure Liquid substance Location Magnetic Resonance Imaging Measures Methods Modeling Motor output Movement Noise Nose Operative Surgical Procedures Oral Oral cavity Oropharyngeal Output Participant Personal Satisfaction Phonation Physiologic pulse Plant Roots Play Procedures Process Production Property Quality of life Recovery Research Resolution Roentgen Rays Series Shapes Signal Transduction Sleep Apnea Syndromes Speech Speech Acoustics Speech Disorders Stimulus Stress Stroke Structure System Technology Three-Dimensional Imaging Time Tongue Traction Vision Work bioimaging clinical application cognitive control computerized data processing constriction data modeling design image processing imaging Segmentation improved instrument internal control kinematics member movie novel phonology phrases prevent process optimization programs reconstruction respiratory scaffold skills spatiotemporal speech processing tool

Project Abstract

DESCRIPTION (provided by applicant): The long-term goal of this project is to wed state-of-the-art technology for imaging the vocal tract with a linguistically informed analysis of vocal tract constriction actions in order to understand the cognitive control and production of the compositional units of spoken language. We have developed the use of real-time MRI to illuminate the inherently dynamic speech production process. Our approach is to observe the time-varying changes in vocal tract shaping and to understand how these emerge lawfully from the combined effects of multiple constriction events distributed over space (subparts of the tract) and over time. An understanding of dynamic vocal tract actions as fundamental to linguistic organization will do much to add to the field's current, basically static, approach to describing speech. In the previous (and first) funding period of the proposal, our team developed and refined our novel real-time MRI acquisition ability, which has made veridical real-time movies of speech production possible for the first time without X-rays. Data show clear real-time movements of the lips, tongue, and velum, providing exquisite information about the spatiotemporal properties of speech gestures in both the oral and pharyngeal portions of the vocal tract. We also developed novel noise-mitigated, image-synchronized strategies to record speech in situ during imaging, as well as signal processing strategies for deriving linguistically meaningful measures from the data (Bresch and Narayanan, 2009). We have demonstrated the utility of this approach for linguistic studies of speech communication that were hitherto not possible (e.g., Byrd, Tobin, Bresch, Narayanan, 2009; Bresch et al., 2008). Building on these foundational efforts, we situate the specific research aims of our competing renewal proposal as follows.
The specific aims of this proposal are to further develop the technology and analysis platform of real-time MRI, which provides the scaffolding for the project, while pursuing speech production studies with an overarching theme of examining the decomposition of speech into cognitively controlled action units, or gestures. Specifically, we aim to investigate the compositionality of speech in three domains, each an area of study that is not approachable using exclusively acoustic speech data, without direct access to the dynamic information from the entire vocal tract, which can only be supplied with real-time MRI. Our specific aims examine (i) compositionality in space: deployment of concurrent constriction events distributed spatially, that is, over distinct constriction effectors within the vocal tract, (ii) compositionality in time: deployment of constriction events distributed temporally, and (iii) compositionality in cognition: deployment of constriction events during speech planning that mirror those observed during speech production. We propose to use the real-time MRI approach we have developed to advance our understanding of all three of these aspects of linguistic structuring. Our approach to decomposing speech shaping into multiple discrete events, in space and over time, can be further validated by demonstrating that we can capture the observed data time-functions using a computational model having only discrete gestural input. To do this, we will employ a computational implementation of Articulatory Phonology and Task Dynamics (called TaDA). The model is particularly appropriate because it provides a hypothesized ensemble of gestures arrayed over time for any input utterance. The model is biologically plausible and produces as its output explicit time-functions of constriction events in the vocal tract, which is precisely what we measure directly with real-time MRI.
We anticipate a highly synergistic relation between model and data that can bootstrap our understanding of the structure of speech. The model has not, to this point, been optimized using real data, as the appropriate data did not exist before real-time MRI. In turn, the use of real-time MRI as a tool in understanding speech depends on having an analytical procedure for relating the observed shaping changes to underlying (multiple) controls, which is what the model provides. The project's final specific aim is to continue to advance our technical real-time MRI approach for investigating the physical realization of phonological structure by: (i) improving image signal-to-noise ratio through the use of a novel custom 16-receiver head-neck coil, (ii) doubling the 2D acquisition frame rate through the use of novel pulse sequences in conjunction with new joint acquisition-processing optimization, and (iii) pursuing fast 3D imaging using more sophisticated pulse sequences to supplement the single-plane fast imaging work. These challenges will be pursued in tandem with the design of data-driven analyses suitable for distilling the high-dimensional information provided by real-time MRI together with the synchronized acoustic speech signal, which is critical for deriving linguistically meaningful measures. Specifically, we pursue robust and faster image segmentation and articulatory tracking, and methods for dynamical modeling using the derived time-series constriction data.
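
As a rough illustration of the idea that a discrete gestural input can yield a continuous constriction time-function, the sketch below integrates a single critically damped second-order system, the kind of point-attractor dynamics commonly used in task-dynamic models of speech. It is not the TaDA implementation and is not taken from the project: the function name `simulate_gesture`, the stiffness value, the time step, and the activation interval are all hypothetical choices made for the example.

```python
import math


def simulate_gesture(target, onset, offset, x0=0.0,
                     stiffness=400.0, dt=0.001, duration=0.5):
    """Return [(time, value), ...] for one constriction variable.

    While the gesture is active (onset <= t < offset) the variable is
    attracted to `target`; otherwise it relaxes back to the neutral
    value `x0`. All parameter values are illustrative assumptions.
    """
    # Critical damping for a unit-mass system: b = 2 * sqrt(k).
    damping = 2.0 * math.sqrt(stiffness)
    x, v = x0, 0.0
    trajectory = []
    steps = int(round(duration / dt))
    for i in range(steps):
        t = i * dt
        goal = target if onset <= t < offset else x0
        # Second-order point-attractor dynamics toward the current goal.
        a = -stiffness * (x - goal) - damping * v
        # Semi-implicit Euler step: update velocity first, then position.
        v += a * dt
        x += v * dt
        trajectory.append((t, x))
    return trajectory


if __name__ == "__main__":
    # One hypothetical gesture, active from 50 ms to 300 ms.
    traj = simulate_gesture(target=1.0, onset=0.05, offset=0.3)
    for t, x in traj[::50]:
        print(f"{t:.3f}  {x:.3f}")
```

In a comparison of this kind, the simulated time-function would be set against a constriction measure tracked from the real-time MRI frames, and parameters such as stiffness and activation timing would be adjusted to fit; that fitting step is not shown here.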
