CAREER: Detecting, Understanding, and Fixing Vulnerabilities in Natural Language Processing Models

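Basic Information

Award number:
2046873
Principal investigator:
Sameer Singh
Amount:
$0.5 million
Host institution:
University of California-Irvine
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2021
Funding country:
United States
Start and end dates:
2021-07-01 to 2026-06-30
Project status:
Not yet concluded

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2046873&HistoricalAwards=false
Keywords:
CAREER Detecting Understanding Fixing Vulnerabilities

Project Abstract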

With recent advances in machine learning, models have achieved high accuracy on many challenging tasks in natural language processing (NLP) such as question answering, machine translation, and dialog agents, sometimes coming close to or beating human performance on these benchmarks. However, these NLP models often suffer from brittleness in many different ways: they latch onto erroneous artifacts, do not support natural variations in language, are not robust to adversarial attacks, and only work on a few domains. Existing pipelines for developing NLP models lack support for useful insights, and identifying bugs requires considerable effort from experts both in machine learning and the domain. This CAREER project develops several techniques to support this need for more robust training and evaluation pipelines for NLP, providing easy-to-use, scalable, and accurate mechanisms for identifying, understanding, and addressing NLP models' vulnerabilities. The developed methods will support diverse application areas such as conversational agents, sentiment classifiers, and abuse/hate speech detection. Further, the team engages with the developers of NLP models in academia and industry to develop a data science curriculum for K-12 education, particularly for students from underrepresented communities.

Based on the notion of vulnerability as unexpected behavior on certain input transformations, the team will contribute across the following three thrusts. The first thrust identifies vulnerabilities by testing user-defined behaviors and searching over many possible vulnerabilities. In the second thrust, the investigators develop methods to understand the model's vulnerabilities by tracing the causes of errors to individual training data points and data artifacts. The last thrust will develop approaches to address vulnerabilities in models by directly injecting the vulnerability definitions into the model during training and using explanation-based annotations to supervise the models. These thrusts build upon the goals of behavioral testing, explanation-based interactions, and architecture agnosticism to support most current and future NLP models and applications.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (9)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Quantifying Social Biases Using Templates is Unreliable

DOI:
Publication date:
2022
Journal:
NeurIPS Workshop on Trustworthy and Socially Responsible Machine Learning (TSRML)
Impact factor:
0
Authors:
Seshadri, Preethi;Pezeshkpour, Pouya;Singh, Sameer
Corresponding author:
Singh, Sameer

Explaining machine learning models with interactive natural language conversations using TalkToModel

DOI:
10.1038/s42256-023-00692-8
Publication date:
2022-07
Journal:
Nature Machine Intelligence
Impact factor:
23.8
Authors:
Dylan Slack;Satyapriya Krishna;Himabindu Lakkaraju;Sameer Singh
Corresponding author:
Dylan Slack;Satyapriya Krishna;Himabindu Lakkaraju;Sameer Singh

MISGENDERED: Limits of Large Language Models in Understanding Pronouns

DOI:
10.18653/v1/2023.acl-long.293
Publication date:
2023
Journal:
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Impact factor:
0
Authors:
Hossain, Tamanna;Dev, Sunipa;Singh, Sameer
Corresponding author:
Singh, Sameer

TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations

DOI:
Publication date:
2022
Journal:
Impact factor:
0
Authors:
Sameer Singh
Corresponding author:
Sameer Singh

Pylon: A PyTorch Framework for Learning with Constraints

DOI:
10.1609/aaai.v36i11.21711
Publication date:
2022
Journal:
Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track
Impact factor:
0
Authors:
Ahmed, Kareem;Li, Tao;Ton, Thy;Guo, Quan;Chang, Kai-Wei;Kordjamshidi, Parisa;Srikumar, Vivek;Van den Broeck, Guy;Singh, Sameer
Corresponding author:
Singh, Sameer

Other publications by Sameer Singh

Modeling Performance of Different Classification Methods: Deviation from the Power Law

DOI:
Publication date:
2005
Journal:
Impact factor:
0
Authors:
Sameer Singh
Corresponding author:
Sameer Singh

ezCoref: A Scalable Approach for Collecting Crowdsourced Annotations for Coreference Resolution

DOI:
Publication date:
2022
Journal:
Impact factor:
0
Authors:
A. Crowdsourced;David Bamman;Olivia Lewke;Rachel Bawden;Rico Sennrich;Alexandra Birch;Ari Bornstein;Arie Cattan;Ido Dagan;Hong Chen;Zhenhua Fan;Hao Lu;Alan Yuille;Eduard Hovy;Mitch Marcus;M. Palmer;Lance;Rodney Huddleston. 2002;Frédéric Landragin;T. Poibeau;Bernard Vic;Belinda Z. Li;Gabriel Stanovsky;Robert L Logan;Andrew McCallum;Sameer Singh
Corresponding author:
Sameer Singh

Multi-stage Classification for Audio Based Activity Recognition

DOI:
10.1007/11875581_100
Publication date:
2006
Journal:
Impact factor:
0
Authors:
José Lopes;Charles Lin;Sameer Singh
Corresponding author:
Sameer Singh

Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills

DOI:
10.48550/arxiv.2402.03244
Publication date:
2024
Journal:
ArXiv
Impact factor:
0
Authors:
Kolby Nottingham;Bodhisattwa Prasad Majumder;Bhavana Dalvi;Sameer Singh;Peter Clark;Roy Fox
Corresponding author:
Roy Fox