RI: Small: Learning Dynamics and Evolution towards Cognitive Understanding of Videos

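Basic Information

Award number:
1813709
Principal investigator:
Chenliang Xu
Amount:
$450,000
Institution:
University of Rochester
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2018
Funding country:
United States
Period:
2018-09-01 to 2021-08-31
Status:
Completed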

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1813709&HistoricalAwards=false
Keywords:
RI Small Learning Dynamics Evolution

Abstract

A fundamental capability of human intelligence is being able to learn to act by watching instructional videos. Such capability is reflected in the abstraction and summarization of instructional procedures as well as in answering questions such as "why" and "how" something happened in the video. This project aims to build computational models that perform well in the above tasks, which require, beyond the conventional recognition of objects, actions, and attributes in the scene, higher-order inference of the relations among them. Here, higher-order inference refers to inference that cannot be answered immediately by direct observation and thus requires stronger semantics. The developed technology will enable many applications in other fields, e.g., multimedia (video indexing and retrieval), robotics (reasoning capability for why and how questions), and healthcare (assistive devices for visually impaired people). In addition, the project will contribute to education and diversity by involving underrepresented groups in research activities, integrating research results into the teaching curriculum, and conducting outreach activities to local K-12 communities.

The research will develop a framework to perform higher-order inference in understanding web instructional videos, such that models devised in this framework are capable of not only discovering and captioning the procedures that constitute the instructional event but also answering questions such as why and how something happened. The framework is built on a video story graph that models the dynamics (the composition of actions at different scales) and evolution (the change in object states and attributes), and it supports higher-order inference upon deep learning units and the incorporation of an external knowledge graph in a unified framework. Methodologies to extract such video story graphs and use them to discover and caption procedures and to perform question answering will be explored. Expected outcomes of this project include: a software package for constructing and performing inference on video story graphs and incorporating external knowledge; a web-deployed system to process user-uploaded instructional videos; and a large video dataset with procedure and question-answering annotations.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outputs

Journal papers (45)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Attentive Relational Networks for Mapping Images to Scene Graphs

DOI:
10.1109/cvpr.2019.00408
Published:
2019-01-01
Journal:
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019)
Impact factor:
0
Authors:
Qi, Mengshi; Li, Weijian; Luo, Jiebo
Corresponding author:
Luo, Jiebo

Learning by Planning: Language-Guided Global Image Editing

DOI:
10.1109/cvpr46437.2021.01338
Published:
2021-01-01
Journal:
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021
Impact factor:
0
Authors:
Shi, Jing; Xu, Ning; Xu, Chenliang
Corresponding author:
Xu, Chenliang

Video Re-localization via Cross Gated Bilinear Matching

DOI:
Published:
2018
Journal:
2018 European Conference on Computer Vision (ECCV
Impact factor:
0
Authors:
Yang Feng, Lin Ma
Corresponding author:
Yang Feng, Lin Ma

Audio-Visual Event Localization in the Wild

DOI:
Published:
2019
Journal:
IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
Impact factor:
0
Authors:
Tian, Yapeng; Shi, Jing; Li, Bochen; Duan, Zhiyao; Xu, Chenliang
Corresponding author:
Xu, Chenliang

Unified Multisensory Perception: Weakly-Supervised Audio-Visual Video Parsing