Collaborative Research: RI: Medium: Lie group representation learning for vision

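Basic information

Award number:
2313149
Principal investigator:
Bruno Olshausen
Amount:
$500,000
Host institution:
University of California-Berkeley
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2023
Funding country:
United States
Project period:
2023-10-01 to 2026-09-30
Project status:
Not yet concluded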

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2313149&HistoricalAwards=false
Keywords:
Collaborative Research RI Medium Lie

Project abstract

The quest to build intelligent machines capable of sensing, understanding and acting in their environment presents one of the great scientific challenges of our time. Despite recent advances in artificial intelligence (AI), the realization of robust, autonomous vision systems that understand and interact with the physical world remains elusive. Mathematically, vision requires understanding the relationships among an immense variety of object shapes, each subject to an immense variety of geometric and lighting transformations, leading to an explosion of possible visual scenes. This project aims to break through this barrier by developing a mathematically grounded computational theory of vision that will enable a new class of neural network learning algorithms to parse visual scenes into their constituent objects and transformations, thereby enabling computers to better represent the world around them. The results and computational tools arising from this research will be disseminated to the scientific community and general public through courses, seminars, hackathons, and open-source software contributed to the Geomstats library.

The premise of this project is that the current limitations of AI and computer vision can be addressed with an appropriate mathematical framework, Lie theory, that models the hierarchical structure of natural transformations in the visual world. The investigators will develop generalizations of foundational signal processing transforms through explicit Lie group operations encoded in learnable G-Modules (Group-Modules). These modules directly tackle the combinatoric explosion in vision by factorizing images into shapes and their underlying transformations. Specifically, the team will develop G-modules that learn group-equivariant representations of the transformations contained in natural images (Aim 1), robust representations of shape by collapsing group orbits only with respect to specific transformations (Aim 2), and disentangling of transformation and shape via factorization (Aim 3). The modules are assembled into hierarchical architectures that can learn complex representations of transformations and shapes (Aim 4). Together, these aims provide a new paradigm that grounds existing models of vision and gives a set of guiding principles for the design of future deep learning architectures with enhanced abilities to sense and understand the world.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
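
The notions of a group-equivariant representation (Aim 1) and of collapsing group orbits (Aim 2) can be illustrated with a toy example. The sketch below is not the project's G-module: it assumes the simplest possible setting, the finite cyclic group of shifts acting on a 1-D signal, in place of a Lie group acting on images. The helper names `circular_conv`, `shift`, and `orbit_invariant` are hypothetical and chosen only for this illustration. It checks that circular convolution commutes with shifts (equivariance) and that Fourier magnitudes are the same for every shifted copy of a signal (the shift orbit is collapsed to a single descriptor).

```python
import numpy as np

def circular_conv(x, w):
    """Circular convolution: y[i] = sum_j w[j] * x[(i - j) mod n]."""
    n = len(x)
    return np.array(
        [sum(w[j] * x[(i - j) % n] for j in range(len(w))) for i in range(n)]
    )

def shift(x, s):
    """Action of the cyclic group Z_n on a signal: translate by s samples."""
    return np.roll(x, s)

def orbit_invariant(x):
    """Fourier magnitudes, which take one value on each shift orbit."""
    return np.abs(np.fft.fft(x))

rng = np.random.default_rng(0)
x = rng.standard_normal(8)   # toy 1-D "image"
w = rng.standard_normal(3)   # toy filter
s = 3                        # group element: shift by three samples

# Equivariance: transforming the input transforms the output the same way.
equivariant = np.allclose(
    circular_conv(shift(x, s), w), shift(circular_conv(x, w), s)
)

# Invariance: every element of the orbit {shift(x, s)} maps to one descriptor.
invariant = np.allclose(orbit_invariant(shift(x, s)), orbit_invariant(x))

print(equivariant, invariant)  # prints: True True
```

The equivariant map keeps track of which transformation was applied, while the invariant map discards it. The abstract's Aim 3 concerns separating these two kinds of information.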

Project outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Bruno Olshausen

Expecting the World: Perception, Prediction, and the Origins of Human Knowledge

DOI:
10.1111/tran.12213
Publication date:
2024-09-13
Journal:
Transactions of the Institute of British Geographers
Impact factor:
3.3
Authors:
Nihat Ay; Ray Guillery; Bruno Olshausen; Murray Sherman; Fritz Sommer; Karl J. Friston; Daniel Dennett; Peter König; Suzanna Siegel; Mark D. Sprevak; Matteo Colombo; Matthew Nudds; Bill Phillips
Corresponding author:
Bill Phillips