CAREER: Natural Narratives and Multimodal Context as Weak Supervision for Learning Object Categories


Basic Information

  • Award Number:
    2046853
  • Principal Investigator:
    Adriana Kovashka
  • Amount:
    $547,100
  • Host Institution:
  • Host Institution Country:
    United States
  • Project Category:
    Continuing Grant
  • Fiscal Year:
    2021
  • Funding Country:
    United States
  • Project Period:
    2021-05-01 to 2026-04-30
  • Project Status:
    Active

Project Abstract

This project develops a framework to train computer vision models for detection of objects from weak, naturally-occurring supervision of language (text or speech) and additional multimodal signals. It considers dynamic settings, where humans interact with their visual environment and refer to the encountered objects, e.g., “Carefully put the tomato plants in the ground” and “Please put the phone down and come set the table,” and captions written for a human audience to complement an image, e.g., news article captions. The challenge of using such language-based supervision for training detection systems is that, along with useful signal, the speech contains many irrelevant tokens. The project will benefit society by exploring novel avenues for overcoming this challenge and reducing the need for expensive and potentially unnatural crowdsourced labels for training. It has the potential to make object detection systems more scalable and thus more usable by a broad user base in a variety of settings. The resources and tools developed would allow natural, lightweight learning in different environments, e.g., different languages or types of imagery where the well-known object categories are not useful, or where there is a shift in both the pixels and the way in which humans refer to objects (different cultures, medicine, art). This project opens possibilities for learning in vivo rather than in vitro; while the focus here is on object categories, multimodal weak supervision is useful for a larger variety of tasks. Research and education are integrated through local community outreach and research mentoring for students from lesser-known universities, new programs for student training including honing graduate students' writing skills, and development of interactive educational modules and demos based on research findings.

This project creatively connects two domains, vision-and-language and object detection, and pioneers training of object detection models with weak language supervision and a large vocabulary of potential classes. The impact of noise in the language channel will be mitigated through three complementary techniques that model the visual concreteness of words, the extent to which the text refers to the visual environment it appears with, and whether the weakly-supervised models that are learned are logically consistent. Two complementary word-region association mechanisms will be used (metric learning and cross-modal transformers), whose application is novel for weakly-supervised detection. Importantly, to make detection feasible, not only the semantics of image-text pairs but also their discourse relationship will be captured. To facilitate and disambiguate the association of words with a physical environment, the latter will be represented through additional modalities, namely sound, motion, depth and touch, which are either present in the data or estimated. This project advances knowledge of how multimodal cues contextualize the relation between image and text; no prior work has modeled image-text relationships along multiple channels (sound, depth, touch, motion). Finally, to connect the appearance of objects to the purpose and use of these objects, relationships between objects, properties and actions will be semantically organized in a graph, and grammars to represent activities involving objects will be extracted, while still maintaining the weakly-supervised setting.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
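The abstract names metric learning as one of the two word-region association mechanisms for training detectors from caption- or speech-level supervision. The sketch below is a minimal, hypothetical illustration of that general idea, not the project's actual implementation: region features and word embeddings (assumed to be precomputed by a detector backbone and a text encoder) are projected into a joint space, each word is matched to its best region, and a batch-level contrastive loss pushes matched image-caption pairs to outscore mismatched ones. The names `WordRegionScorer` and `contrastive_loss`, and all dimensions, are illustrative assumptions.

```python
# Illustrative sketch of caption-supervised word-region association via
# metric learning; not the project's code. Inputs are assumed precomputed.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WordRegionScorer(nn.Module):
    """Projects region and word features into a joint space and scores pairs."""
    def __init__(self, region_dim=2048, word_dim=300, joint_dim=256):
        super().__init__()
        self.region_proj = nn.Linear(region_dim, joint_dim)
        self.word_proj = nn.Linear(word_dim, joint_dim)

    def forward(self, regions, words):
        # regions: (B, R, region_dim) box features; words: (B, W, word_dim)
        r = F.normalize(self.region_proj(regions), dim=-1)
        w = F.normalize(self.word_proj(words), dim=-1)
        # Similarity of every caption i to every image j: (B, B, W, R)
        sims = torch.einsum('iwd,jrd->ijwr', w, r)
        # Multiple-instance aggregation: each word is explained by its
        # best-matching region (max over R); averaging over words yields
        # a single image-caption compatibility score.
        return sims.max(dim=-1).values.mean(dim=-1)  # (B, B)

def contrastive_loss(pair_scores, temperature=0.07):
    """Matched image-caption pairs (the diagonal) should outscore mismatches."""
    logits = pair_scores / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    scorer = WordRegionScorer()
    regions = torch.randn(4, 36, 2048)  # e.g., 36 region proposals per image
    words = torch.randn(4, 12, 300)     # e.g., 12 caption tokens per image
    loss = contrastive_loss(scorer(regions, words))
    loss.backward()
```

At inference time, the per-word, per-region similarities from such a model could serve as detection scores for a word's category; the concreteness and discourse cues described in the abstract would then act as weights that down-weight words unlikely to refer to the visible scene.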

Project Outcomes

Journal articles (6)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Weakly-Supervised Action Detection Guided by Audio Narration
Improving language-supervised object detection with linguistic structure analysis
Boosting Weakly Supervised Object Detection using Fusion and Priors from Hallucinated Depth
Complementary Cues from Audio Help Combat Noise in Weakly-Supervised Object Detection
Learning to Overcome Noise in Weak Caption Supervision for Object Detection
  • DOI:
    10.1109/tpami.2022.3187350
  • Publication Date:
    2022-06
  • Journal:
    IEEE Transactions on Pattern Analysis and Machine Intelligence
  • Impact Factor:
    23.6
  • Authors:
    Mesut Erhan Unal; Keren Ye; Mingda Zhang; Christopher Thomas; Adriana Kovashka; Wei Li; Danfeng Qin
  • Corresponding Authors:
    Mesut Erhan Unal; Keren Ye; Mingda Zhang; Christopher Thomas; Adriana Kovashka; Wei Li; Danfeng Qin

Other Publications by Adriana Kovashka

Detecting Sexually Provocative Images
Syntharch: Interactive Image Search with Attribute-Conditioned Synthesis
Inferring Visual Persuasion via Body Language, Setting, and Deep Features
Interactive image search with attributes
  • DOI:
  • Publication Date:
    2014-08
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Adriana Kovashka
  • Corresponding Author:
    Adriana Kovashka
Dorian: Music Recommendation Strategies using Social Network Mining
  • DOI:
  • Publication Date:
    2008
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Adriana Kovashka
  • Corresponding Author:
    Adriana Kovashka


Other Grants by Adriana Kovashka

RI: Small: Multilingual Supervision for Object Detection under Geographic Domain and Concept Shifts
  • Award Number:
    2329992
  • Fiscal Year:
    2023
  • Award Amount:
    $547,100
  • Project Category:
    Standard Grant
Travel: Group Travel Grant for the Doctoral Consortium of the IEEE Conference on Computer Vision and Pattern Recognition
  • Award Number:
    2222346
  • Fiscal Year:
    2022
  • Award Amount:
    $547,100
  • Project Category:
    Standard Grant
RI: Small: Domain-robust object detection through shape and context
  • Award Number:
    2006885
  • Fiscal Year:
    2020
  • Award Amount:
    $547,100
  • Project Category:
    Standard Grant
Group Travel Grant for the Doctoral Consortium of the IEEE Conference on Computer Vision and Pattern Recognition
  • Award Number:
    1742714
  • Fiscal Year:
    2017
  • Award Amount:
    $547,100
  • Project Category:
    Standard Grant
RI: Small: Modeling Vividness and Symbolism for Decoding Visual Rhetoric
  • Award Number:
    1718262
  • Fiscal Year:
    2017
  • Award Amount:
    $547,100
  • Project Category:
    Standard Grant
CRII: RI: Automatically Understanding the Messages and Goals of Visual Media
  • Award Number:
    1566270
  • Fiscal Year:
    2016
  • Award Amount:
    $547,100
  • Project Category:
    Standard Grant
Group Travel Grant for the Doctoral Consortium of the IEEE Conference on Computer Vision and Pattern Recognition
  • Award Number:
    1630019
  • Fiscal Year:
    2016
  • Award Amount:
    $547,100
  • Project Category:
    Standard Grant
Group Travel Grant for the Doctoral Consortium of the IEEE Conference on Computer Vision and Pattern Recognition
  • Award Number:
    1529929
  • Fiscal Year:
    2015
  • Award Amount:
    $547,100
  • Project Category:
    Standard Grant

Similar NSFC Grants

Mechanisms and Interventions of Nature Contact on Adolescents' Problematic Internet Behavior
  • Award Number:
    72374025
  • Year Approved:
    2023
  • Award Amount:
    ¥400,000
  • Project Category:
    General Program
Evaluating the Role of the Oral Microbiota in Esophageal Precancerous Lesions and Carcinogenesis Based on a Natural Population Cohort
  • Award Number:
    82304214
  • Year Approved:
    2023
  • Award Amount:
    ¥300,000
  • Project Category:
    Young Scientists Fund
Molecular Imaging of Natural Killer Cells: Their Role and Mechanisms in Neonatal Hypoxic-Ischemic Encephalopathy
  • Award Number:
    82371929
  • Year Approved:
    2023
  • Award Amount:
    ¥500,000
  • Project Category:
    General Program
Natural-Language-Driven Reconstruction of Integrated Indoor-Outdoor Vector Building Models
  • Award Number:
    42371457
  • Year Approved:
    2023
  • Award Amount:
    ¥480,000
  • Project Category:
    General Program
Optimizing the Pattern of Protected Areas on the Qinghai-Tibet Plateau by Coupling Ecological Risk with Key Ecosystem Services
  • Award Number:
    32301380
  • Year Approved:
    2023
  • Award Amount:
    ¥300,000
  • Project Category:
    Young Scientists Fund

Similar Overseas Grants

Challenging Cultural Norms through Asset-focused Narratives: Examining Intersecting Stigmatized Identities from Graduate Student and Faculty Perspectives in the Natural Sciences
  • Award Number:
    2321219
  • Fiscal Year:
    2023
  • Award Amount:
    $547,100
  • Project Category:
    Standard Grant
Using Narratives to Identify Stigma Phenotypes - A Socio-Ecological Approach
  • Award Number:
    10508469
  • Fiscal Year:
    2022
  • Award Amount:
    $547,100
  • Project Category:
Extraction of Symptom Burden from Clinical Narratives of Cancer Patients using Natural Language Processing
  • Award Number:
    10591957
  • Fiscal Year:
    2022
  • Award Amount:
    $547,100
  • Project Category:
Joint learning methods for event and relation extraction from clinical narratives
  • Award Number:
    10507223
  • Fiscal Year:
    2022
  • Award Amount:
    $547,100
  • Project Category:
Extraction of Symptom Burden from Clinical Narratives of Cancer Patients using Natural Language Processing
  • Award Number:
    10179677
  • Fiscal Year:
    2021
  • Award Amount:
    $547,100
  • Project Category: