RI: Small: Toward Efficient and Robust Dynamic Scene Understanding Based on Visual Correspondences

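Basic Information

Award number:
2310254
Principal investigator:
Huaizu Jiang
Amount:
Approximately $593,900
Host institution:
Northeastern University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2023
Funding country:
United States
Start and end dates:
2023-08-01 to 2026-07-31
Project status:
Ongoing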

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2310254&HistoricalAwards=false
Keywords:
RI Small Toward Efficient Robust

Project Abstract

Finding correspondences is a fundamental problem in computer vision; visual correspondences provide useful cues for a machine to understand its dynamic surroundings in a manner similar to what humans do. For instance, as an agent moves around, it may learn that objects that are far away like mountains typically do not move much, whereas nearby buildings and bushes appear to move rapidly in the environments as the agent changes position relative to them. Although significant advances have been made in solving various forms of visual correspondence problems, different correspondence models maintain different designs despite their inherent similarity, making the effective design principles and the learned representations difficult to transfer from one problem to another. In response to this challenge, this project aims to solve disparate visual correspondence problems with a unified model. In doing so, the project will also address two practical aspects of implementation of the developed models in scenarios with diverse visual appearance and significant resource constraints. These advances are expected to unlock novel applications and improve dynamic scene understanding in the areas of Augmented Reality, sports broadcasting, sports analytics, robotics, etc. The project outcomes may also unveil new markets and economic opportunities through solutions that augment cognitive and physical abilities of users in their daily lives. The team of researchers will actively integrate proposed research into the curriculum development and attract undergraduate researchers to the project. This project is particularly well-suited for outreach activities to broaden participation of underrepresented and K-12 students, by connecting abstract technical concepts with tangible research demonstrations.

The project has three tightly connected thrusts, presenting fundamental advances in correspondence determination, in applications of these correspondences, and in making these algorithms efficient and robust in deployment. Concretely, first, a unified model to solve all the visual correspondence problems, ranging from 2D to 3D, will be developed, taking advantage of recent progress of the Transformer model and self-supervised learning from large-scale unlabeled data. The Transformer model naturally captures the correspondences of candidates with less inductive bias, making it a better choice to learn from the large-scale data and improve accuracy of data-poor domains when transferred from data-rich ones. Second, with the correspondences, novel applications will be unlocked to advance dynamic scene understanding, particularly for slow-motion video synthesis and robotic obstacle avoidance. Finally, the investigators will study mechanisms to improve efficiency and robustness when deploying the models on edge computing devices. The developed algorithms will be rigorously evaluated on standard benchmarks and in real-world deployment on edge devices.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal papers (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Huaizu Jiang

Half&Half: New Tasks and Benchmarks for Studying Visual Common Sense

DOI:
Publication date:
2024-09-14
Journal:
Journal of the Korea Institute of Military Science and Technology
Impact factor:
0
Authors:
Ashish Singh; Hang Su; SouYoung Jin; Huaizu Jiang; Chetan Manjesh; Geng Luo; Ziwei He; Li Hong; E. Learned; Rosie Cowell
Corresponding author:
Rosie Cowell