CAREER: Detecting, Understanding, and Fixing Vulnerabilities in Natural Language Processing Models

Basic Information

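  • Award number:
    2046873
  • Principal investigator:
  • Amount:
    $500,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Project period:
    2021-07-01 to 2026-06-30
  • Project status:
    Ongoing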

Project Abstract

With recent advances in machine learning, models have achieved high accuracy on many challenging natural language processing (NLP) tasks, such as question answering, machine translation, and dialog agents, sometimes approaching or surpassing human performance on these benchmarks. However, these NLP models are often brittle in many different ways: they latch onto spurious artifacts, do not handle natural variations in language, are not robust to adversarial attacks, and work in only a few domains. Existing pipelines for developing NLP models offer little support for gaining useful insights, and identifying bugs requires considerable effort from experts in both machine learning and the application domain. This CAREER project develops several techniques to meet the need for more robust training and evaluation pipelines for NLP, providing easy-to-use, scalable, and accurate mechanisms for identifying, understanding, and addressing the vulnerabilities of NLP models. The developed methods will support diverse application areas such as conversational agents, sentiment classifiers, and abuse/hate speech detection. Further, the team engages with developers of NLP models in academia and industry to develop a data science curriculum for K-12 education, particularly for students from underrepresented communities.

Defining a vulnerability as unexpected model behavior under certain input transformations, the team will contribute across three thrusts. The first thrust identifies vulnerabilities by testing user-defined behaviors and searching over a large space of possible vulnerabilities. In the second thrust, the investigators develop methods to understand a model's vulnerabilities by tracing the causes of errors to individual training data points and data artifacts. The last thrust develops approaches to address vulnerabilities by directly injecting the vulnerability definitions into the model during training and by using explanation-based annotations to supervise the models. These thrusts build on the goals of behavioral testing, explanation-based interaction, and architecture agnosticism to support most current and future NLP models and applications.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
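The working definition of a vulnerability used in this abstract, unexpected behavior under certain input transformations, can be illustrated with a small invariance test. The following is a hypothetical sketch, not the project's code: `toy_sentiment` is an assumed keyword classifier standing in for a real NLP model, and `add_typos` and `invariance_test` are illustrative names. A transformation that should not change the label (here, a character swap simulating a typo) is applied to each input, and every input whose prediction changes is reported as a failure.

```python
from typing import Callable, List, Tuple

Model = Callable[[str], str]
Transform = Callable[[str], str]

# Hypothetical cue-word lists for the toy classifier below.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def toy_sentiment(text: str) -> str:
    """Hypothetical keyword classifier standing in for a trained NLP model.

    Splits on whitespace only (no punctuation handling) and counts cue words.
    """
    tokens = text.lower().split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "pos" if score >= 0 else "neg"


def add_typos(text: str) -> str:
    """Label-preserving perturbation (an assumption for this sketch):
    swap the 3rd and 4th characters of every word longer than 4 characters."""
    out = []
    for w in text.split():
        if len(w) > 4:
            w = w[:2] + w[3] + w[2] + w[4:]
        out.append(w)
    return " ".join(out)


def invariance_test(
    model: Model, transform: Transform, inputs: List[str]
) -> List[Tuple[str, str, str, str]]:
    """Return (input, transformed input, original label, new label)
    for every input whose predicted label changes under the transformation."""
    failures = []
    for x in inputs:
        x_t = transform(x)
        y, y_t = model(x), model(x_t)
        if y != y_t:
            failures.append((x, x_t, y, y_t))
    return failures


if __name__ == "__main__":
    sentences = [
        "The food was great but the service was bad",
        "I love this movie",
    ]
    for x, x_t, y, y_t in invariance_test(toy_sentiment, add_typos, sentences):
        # prints: 'The food was great but the service was bad' -> 'The food was graet but the sevrice was bad': pos -> neg
        print(f"{x!r} -> {x_t!r}: {y} -> {y_t}")
```

Only the first sentence fails: the typo hides the positive cue word "great" from the toy model, flipping its label. Because `invariance_test` only calls the model as a black-box function from text to label, a real classifier or a different label-preserving transformation (for example, swapping a person's name) can be substituted without changing the test, which is in the spirit of the architecture-agnostic goal stated in the abstract.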

Project Outputs

Journal articles (9)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Quantifying Social Biases Using Templates is Unreliable
Explaining machine learning models with interactive natural language conversations using TalkToModel
  • DOI:
    10.1038/s42256-023-00692-8
  • Publication date:
    2022-07
  • Journal:
  • Impact factor:
    23.8
  • Authors:
    Dylan Slack;Satyapriya Krishna;Himabindu Lakkaraju;Sameer Singh
  • Corresponding author:
    Dylan Slack;Satyapriya Krishna;Himabindu Lakkaraju;Sameer Singh
MISGENDERED: Limits of Large Language Models in Understanding Pronouns
TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations
  • DOI:
  • Publication date:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Sameer Singh
  • Corresponding author:
    Sameer Singh
Pylon: A PyTorch Framework for Learning with Constraints

Other Publications by Sameer Singh

Modeling Performance of Different Classification Methods : Deviation from the Power Law
  • DOI:
  • Publication date:
    2005
  • Journal:
  • Impact factor:
    0
  • Authors:
    Sameer Singh
  • Corresponding author:
    Sameer Singh
ezCoref: A Scalable Approach for Collecting Crowdsourced Annotations for Coreference Resolution
  • DOI:
  • Publication date:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    A. Crowdsourced;David Bamman;Olivia Lewke;Rachel Bawden;Rico Sennrich;Alexandra Birch;Ari Bornstein;Arie Cattan;Ido Dagan;Hong Chen;Zhenhua Fan;Hao Lu;Alan Yuille;Eduard Hovy;Mitch Marcus;M. Palmer;Lance;Rodney Huddleston. 2002;Frédéric Landragin;T. Poibeau;Bernard Vic;Belinda Z. Li;Gabriel Stanovsky;Robert L Logan;Andrew McCallum;Sameer Singh
  • Corresponding author:
    Sameer Singh
Multi-stage Classification for Audio Based Activity Recognition
  • DOI:
    10.1007/11875581_100
  • Publication date:
    2006
  • Journal:
  • Impact factor:
    0
  • Authors:
    José Lopes;Charles Lin;Sameer Singh
  • Corresponding author:
    Sameer Singh
Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills
  • DOI:
    10.48550/arxiv.2402.03244
  • Publication date:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Kolby Nottingham;Bodhisattwa Prasad Majumder;Bhavana Dalvi;Sameer Singh;Peter Clark;Roy Fox
  • Corresponding author:
    Roy Fox
A survey of object recognition methods for automatic asset detection in high-definition video

Other Grants by Sameer Singh

Collaborative Research: RI: Small: Post hoc Explanations in the Wild: Exposing Vulnerabilities and Ensuring Robustness
  • Award number:
    2008956
  • Fiscal year:
    2020
  • Award amount:
    $500,000
  • Project type:
    Standard Grant
CCRI: ENS: Machine Learning Democratization via a Linked, Annotated Repository of Datasets
  • Award number:
    1925741
  • Fiscal year:
    2019
  • Award amount:
    $500,000
  • Project type:
    Standard Grant
CRII: RI: Explaining Decisions of Black-box Models via Input Perturbations
  • Award number:
    1756023
  • Fiscal year:
    2018
  • Award amount:
    $500,000
  • Project type:
    Standard Grant
RI: Small: Modeling Multiple Modalities for Knowledge-Base Construction
  • Award number:
    1817183
  • Fiscal year:
    2018
  • Award amount:
    $500,000
  • Project type:
    Standard Grant

Similar Grants from the National Natural Science Foundation of China

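Research on Large-Scale Internet Fake News Detection Based on Deep Understanding
  • Award number:
    62302333
  • Approval year:
    2023
  • Award amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund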
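Research on Surface Defect Detection Methods for Optoelectronic Integrated Chips Based on Scene Understanding and Visual Reasoning
  • Award number:
    52375499
  • Approval year:
    2023
  • Award amount:
    CNY 500,000
  • Project type:
    General Program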
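Research on Recognition and Tracking of Moving Targets Across All Scenarios of Aircraft Carrier Flight-Deck Support Operations
  • Award number:
    61906173
  • Approval year:
    2019
  • Award amount:
    CNY 240,000
  • Project type:
    Young Scientists Fund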
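Research on Online Multi-Person Behavior Analysis in Complex Video Scenes
  • Award number:
    61902101
  • Approval year:
    2019
  • Award amount:
    CNY 280,000
  • Project type:
    Young Scientists Fund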
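Research on Vulnerability Feature Library Analysis and Prediction Methods Based on Vulnerability Datasets
  • Award number:
    U1836211
  • Approval year:
    2018
  • Award amount:
    CNY 2,530,000
  • Project type:
    Joint Fund Program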

Similar Overseas Grants

Detecting and Understanding Disparities in Pediatric Safety Events for Hospitalized Children
  • Award number:
    10661525
  • Fiscal year:
    2022
  • Award amount:
    $500,000
  • Project type:
Detecting and Understanding Disparities in Pediatric Safety Events for Hospitalized Children
  • Award number:
    10450528
  • Fiscal year:
    2022
  • Award amount:
    $500,000
  • Project type:
SaTC: CORE: Small: Collaborative: Understanding and Detecting Memory Bugs in Rust
  • Award number:
    1955965
  • Fiscal year:
    2020
  • Award amount:
    $500,000
  • Project type:
    Standard Grant
SaTC: CORE: Small: Collaborative: Understanding and Detecting Memory Bugs in Rust
  • Award number:
    1956364
  • Fiscal year:
    2020
  • Award amount:
    $500,000
  • Project type:
    Standard Grant
CRII: CSR: Toward Understanding and Automatically Detecting Specious Configuration in Large Systems
  • Award number:
    1755737
  • Fiscal year:
    2018
  • Award amount:
    $500,000
  • Project type:
    Standard Grant