Challenges in Immersive Audio Technology
Basic Information
- Grant number: EP/X032981/1
- Principal investigator:
- Amount: $1.2151 million
- Host institution:
- Host institution country: United Kingdom
- Grant type: Research Grant
- Fiscal year: 2024
- Funding country: United Kingdom
- Duration: 2024 to (no data)
- Status: ongoing
- Source:
- Keywords:
Project Summary
Immersive technologies will transform not only how we communicate and experience entertainment, but also how we experience the physical world, from shops to museums, cars to classrooms. This transformation has been driven primarily by unprecedented progress in visual technologies, which can transport users to an alternate visual reality. In the audio domain, however, long-standing fundamental challenges must be overcome to enable striking immersive experiences in which a group of listeners can simply walk into a scene and feel transported to an alternate reality, enjoying a seamless shared experience without the need for headphones, head-tracking, personalisation or calibration.

The first key challenge is the delivery of immersive audio experiences to multiple listeners. Recent advances in audio technology are beginning to succeed in generating high-quality immersive audio experiences. In practice, however, these are restricted to individual listeners, with the appropriate signals presented either via headphones or via systems based on a modest number of loudspeakers using cross-talk cancellation or beamforming. The technologically efficient delivery of "3D sound" to multiple listeners remains a fundamental challenge, whether for small groups (2-5) in a home environment, for audiences of 5-20 in museums, galleries and other public spaces, or for 20-100 listeners in cinema and theatre auditoria. In principle, shared auditory experiences can be generated using physics-based methods such as wavefield synthesis or higher-order ambisonics, but a sweet spot of even modest size requires a prohibitive number of channels.
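The channel-count problem can be made concrete with a back-of-the-envelope calculation. A common rule of thumb (an illustrative assumption here, not a figure from the project) is that higher-order ambisonics reproduces a sound field accurately up to frequency f within a sweet spot of radius r only if the ambisonic order N is at least kr = 2πfr/c, implying (N+1)² loudspeaker channels for full 3D reproduction:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at ~20 degrees C

def required_order(freq_hz: float, radius_m: float) -> int:
    """Ambisonic order needed for accurate reproduction up to freq_hz
    within a sweet spot of radius radius_m (rule of thumb: N >= kr)."""
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber (rad/m)
    return math.ceil(k * radius_m)

def channel_count(order: int) -> int:
    """Number of ambisonic channels for a given order in 3D."""
    return (order + 1) ** 2

# A sweet spot just large enough for a small group (radius 1 m) at
# full audio bandwidth (16 kHz) already demands an enormous system:
N = required_order(16_000, 1.0)
print(N, channel_count(N))  # order in the hundreds, channels in the tens of thousands
```

Even for a one-metre sweet spot the naive physics-based approach lands in the tens of thousands of channels, which is why capturing and reproducing only perceptually relevant information is attractive.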
CIAT aims to transform the state of the art by developing a principled, scalable and reconfigurable framework for capturing and reproducing only the perceptually relevant information, thus leading to a step change in the quality of immersive audio experiences achievable by practically viable systems.

The second key challenge is the real-time computation of the environment acoustics needed to transport listeners to an alternate reality, allowing them to interact with the environment and the sound sources in it. This is pertinent to applications where immersive audio content is synthesised rather than recorded, and to object-based audio in general. The sound field of an acoustic event consists of the direct wavefront, followed by early and higher-order reflections. A convincing experience of being transported to the environment where the event takes place requires the rendering of these reflections, which cannot all be computed in real time. In applications where the sense of realism is critical, e.g. extended reality (XR) and to some extent gaming, impulse responses of the environment are typically computed only at several locations, with preset limits on the number of reflections and directions of arrival, and then convolved with source sounds to achieve what is referred to as high-quality reverberation. Even so, the computation of impulse responses and their convolution with source signals may require GPU implementation and careful hands-on balancing between quality and complexity, and between CPU and GPU computation. CIAT aims to deliver a paradigm shift in environment modelling that will enable numerically efficient, seamless, high-quality environment simulation in real time.

By addressing these challenges, CIAT will enable the creation and delivery of shared interactive immersive audio experiences for emerging XR applications, whilst making a step advance in the quality of immersive audio in traditional media.
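The reverberation pipeline described above — convolving a source signal with a pre-computed room impulse response — can be sketched in a few lines. The sketch below uses a toy exponentially decaying noise impulse response and FFT-based fast convolution; the IR model and its parameters are illustrative assumptions, not the project's method:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 48_000  # sample rate (Hz)

# Toy impulse response: exponentially decaying noise, a crude stand-in
# for a measured or simulated room response with RT60 of about 0.5 s
# (amplitude falls by 60 dB over rt60 seconds).
rt60 = 0.5
t = np.arange(int(rt60 * fs)) / fs
ir = rng.standard_normal(t.size) * 10.0 ** (-3.0 * t / rt60)

# Dry source: a short 440 Hz tone burst (0.1 s).
dry = np.sin(2 * np.pi * 440 * np.arange(fs // 10) / fs)

# Fast convolution via FFT: O(n log n) instead of O(n^2) for direct
# convolution, which is why practical convolution reverbs are built on
# (often partitioned) FFTs.
n = dry.size + ir.size - 1            # full convolution length
nfft = 1 << (n - 1).bit_length()      # next power of two
wet = np.fft.irfft(np.fft.rfft(dry, nfft) * np.fft.rfft(ir, nfft), nfft)[:n]
```

Even this single static convolution touches tens of thousands of samples per source; doing it for many sources, at many listener positions, with time-varying impulse responses is what pushes current systems onto GPUs.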
In particular, efficient real-time synthesis of high-quality environment acoustics is essential both for XR and for object-based audio in general, including streaming and broadcasting. The delivery of 3D soundscapes to multiple listeners is also a major unresolved problem in traditional applications, including broadcasting, cinema, music events, and audio-visual installations.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by Zoran Cvetkovic
Overcomplete expansions and robustness
- DOI: 10.1109/tfsa.1996.547479
- Published: 1996
- Journal:
- Impact factor: 0
- Authors: Zoran Cvetkovic; Martin Vetterli
- Corresponding author: Martin Vetterli
Other Grants by Zoran Cvetkovic
Visits to University of California, Berkeley, Stanford University, and SRI International
- Grant number: EP/K034626/1
- Fiscal year: 2013
- Amount: $1.2151 million
- Grant type: Research Grant
Perceptual Sound Field Reconstruction and Coherent Emulation
- Grant number: EP/F001142/1
- Fiscal year: 2008
- Amount: $1.2151 million
- Grant type: Research Grant
Robust Syllable Recognition in the Acoustic-Waveform Domain
- Grant number: EP/D053005/1
- Fiscal year: 2006
- Amount: $1.2151 million
- Grant type: Research Grant
Similar NSFC Grants
Research on perceptual coding theory and methods for immersive video with six-degrees-of-freedom interaction
- Grant number: 62371081
- Year approved: 2023
- Amount: ¥490,000
- Grant type: General Program
Research on viewport-aware dynamic adaptive streaming technology for immersive media
- Grant number: 62301070
- Year approved: 2023
- Amount: ¥300,000
- Grant type: Young Scientists Fund
Research on dynamic point cloud coding theory and standards for immersive interactive applications
- Grant number: 62371358
- Year approved: 2023
- Amount: ¥490,000
- Grant type: General Program
Research on geometry-driven view synthesis methods for immersive teaching scenes
- Grant number: 62302047
- Year approved: 2023
- Amount: ¥300,000
- Grant type: Young Scientists Fund
Similar Overseas Grants
Challenges in Immersive Audio Technology
- Grant number: EP/X032914/1
- Fiscal year: 2024
- Amount: $1.2151 million
- Grant type: Research Grant
Challenges in Immersive Audio Technologies
- Grant number: EP/X032558/1
- Fiscal year: 2024
- Amount: $1.2151 million
- Grant type: Research Grant
Immersive Audio-Visual 3D Scene Reproduction Using a Single 360 Camera
- Grant number: EP/V03538X/1
- Fiscal year: 2021
- Amount: $1.2151 million
- Grant type: Research Grant
CHS: Small: Audio-Visual Reconstruction for Immersive Virtualized Reality
- Grant number: 1910940
- Fiscal year: 2019
- Amount: $1.2151 million
- Grant type: Standard Grant
3D 5G Immersive Audio for Urban Outdoor Heritage Contexts
- Grant number: 2067108
- Fiscal year: 2018
- Amount: $1.2151 million
- Grant type: Studentship