CAREER: Robust, Fair, and Culturally Aware Commonsense Reasoning in Natural Language

Basic Information

  • Award Number:
    2339746
  • Principal Investigator:
  • Amount:
    $598,900
  • Host Institution:
  • Host Institution Country:
    United States
  • Project Type:
    Continuing Grant
  • Fiscal Year:
    2024
  • Funding Country:
    United States
  • Project Period:
    2024-05-01 to 2029-04-30
  • Project Status:
    Ongoing

Project Abstract

Recent advances in artificial intelligence have led to the proliferation of Large Language Models (LLMs). LLMs are models that can be used for interactions with human users through written language; for example, a user inputs an instruction or question in English to the LLM-based program, and the LLM outputs a response in fluent English. With these linguistic capabilities, LLMs are being developed for use in applications that are both ubiquitous (e.g., internet search, customer support, writing tools) and high-stakes (e.g., mental health care, classroom education, assistive technology for people with disabilities). Despite their growing adoption, many fundamental properties of LLMs are not yet well understood, and pressing questions remain about when and whether LLMs can be entrusted with such important tasks. For example, when instructed to make simple predictions about everyday situations, like cooking a meal or riding in a vehicle, LLMs can make strange and surprising errors, exhibiting concerning lapses in basic commonsense judgment and reasoning abilities. Additionally, the predictions made by LLMs can reflect social stereotypes and cultural assumptions which, at best, limit the usefulness of the technology for certain populations and, at worst, cause active harm. This project seeks to address unfairness and bias due to stereotyping and cultural context by proposing a generalized framework for defeasible commonsense inference in natural language, in which a system compares two similar situations with respect to their support for a given inference. The proposed work aims to develop scientific methods to measure and improve the abilities of LLMs to (1) reason correctly about everyday situations, (2) do so in a manner that is fair and unprejudiced, and (3) adapt these reasoning abilities across specific cultural contexts. By measuring these fundamental capabilities of LLMs, we can better understand and mitigate the risks of applying this technology in high-stakes settings. The three phases of the project focus on the (1) robustness, (2) social fairness, and (3) cultural awareness dimensions of reasoning in LLMs. The project assumes a basic task formulation in which a situation description is provided to an LLM (e.g., “Someone drops a glass”), and the LLM must either evaluate a possible inference or generate an inference from scratch (“The glass breaks”). In phase 1, methods will be developed to automatically manipulate situation descriptions in order to train and evaluate an LLM’s ability to make nuanced inferences, with the goal of learning to distinguish which factors influence a particular inference and which do not (e.g., when trying to predict whether a dropped glass will break, the thickness of the glass matters but the color of the glass does not). In phase 2, methods will be developed to automatically test whether LLMs make socially fair inferences, for example via name substitution tests, and to intervene when a proposed output is detected as unfair. In phase 3, survey participants from the U.S. and Ghana will answer multiple stages of questions about everyday situations; the collected data will be used to develop evaluation questions for a case study on the adaptability of LLMs across these two cultural settings.
For each phase of the project, the resulting datasets, methods, and scientific findings will be made available to the public. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
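To make the phase-2 idea concrete, below is a minimal sketch in Python of a name-substitution test over the abstract's glass-dropping example: the same everyday situation is rendered with different names, and the model's support for a fixed inference is compared across the resulting minimal pairs. The template, the name list, and the `score_inference` stand-in are illustrative assumptions, not part of the project's methods or released code.

```python
# Minimal sketch of a name-substitution fairness probe (illustrative only).
from itertools import combinations
from typing import Callable, Dict, List, Tuple

TEMPLATE = "{name} drops a glass on the kitchen floor."  # situation description
INFERENCE = "The glass breaks."                          # inference to evaluate
NAMES = ["Emily", "Keisha", "Jamal", "Connor"]           # illustrative name set


def name_substitution_gaps(
    score_inference: Callable[[str, str], float],
    threshold: float = 0.05,
) -> Tuple[Dict[str, float], List[Tuple[str, str, float]]]:
    """Score the inference for each name variant and flag name pairs whose
    scores differ by more than `threshold` (a candidate fairness gap)."""
    scores = {
        name: score_inference(TEMPLATE.format(name=name), INFERENCE)
        for name in NAMES
    }
    flagged = [
        (a, b, abs(scores[a] - scores[b]))
        for a, b in combinations(NAMES, 2)
        if abs(scores[a] - scores[b]) > threshold
    ]
    return scores, flagged


if __name__ == "__main__":
    # Dummy scorer so the sketch runs end to end; a real test would query an
    # LLM here and return its confidence that the inference holds.
    dummy = lambda situation, inference: 0.9
    print(name_substitution_gaps(dummy))
```

Since only the name varies between prompts, any large score gap isolates the name as the influencing factor, which is exactly the kind of factor the abstract argues should not matter.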

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Publications by Rachel Rudinger

Neural Models of Factuality
  • DOI:
    10.18653/v1/n18-1067
  • Publication Date:
    2018-04-06
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Rachel Rudinger;Aaron Steven White;Benjamin Van Durme
  • Corresponding Author:
    Benjamin Van Durme
Analyzing Stereotypes in Generative Text Inference Tasks
  • DOI:
    10.18653/v1/2021.findings-acl.355
  • Publication Date:
    2021
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Anna Sotnikova;Yang Trista Cao;Hal Daumé;Rachel Rudinger
  • Corresponding Author:
    Rachel Rudinger
The First Workshop on Commonsense Representation and Reasoning, May 27, 2022
  • DOI:
  • Publication Date:
    2022-05-27
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Antoine Bosselut;Xiangci Li;Bill Yuchen Lin;Vered Schwartz;Bodhisattwa Prasad Majumdar;Yash Kumar Lal;Rachel Rudinger;Xiang Ren;Niket Tandon;Yucheng Lin
  • Corresponding Author:
    Yucheng Lin
Pregnant Questions: The Importance of Pragmatic Awareness in Maternal Health Question Answering
  • DOI:
  • Publication Date:
    2023-11-16
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Neha Srikanth;Rupak Sarkar;Rachel Rudinger;Jordan Boyd-Graber
  • Corresponding Author:
    Jordan Boyd-Graber
Cross-lingual Decompositional Semantic Parsing

Similar NSFC (National Natural Science Foundation of China) Grants

Molecular mechanism of the algal-growth-promoting effect of phosphonate degradation by bacteria symbiotic with Amphidinium carterae
  • Award Number:
    42306167
  • Year Approved:
    2023
  • Funding Amount:
    CNY 300,000
  • Project Type:
    Young Scientists Fund Project
Research on new methods for underwater active covert detection based on composite-coded pulse trains
  • Award Number:
    61271414
  • Year Approved:
    2012
  • Funding Amount:
    CNY 600,000
  • Project Type:
    General Program
Research on semidefinite relaxation and nonconvex quadratically constrained quadratic programming
  • Award Number:
    11271243
  • Year Approved:
    2012
  • Funding Amount:
    CNY 600,000
  • Project Type:
    General Program
Analysis and design of efficient and robust message authentication codes
  • Award Number:
    61202422
  • Year Approved:
    2012
  • Funding Amount:
    CNY 230,000
  • Project Type:
    Young Scientists Fund Project
Research on several problems in revenue management for civil aviation passenger networks
  • Award Number:
    60776817
  • Year Approved:
    2007
  • Funding Amount:
    CNY 200,000
  • Project Type:
    Joint Fund Project

Similar Overseas Grants

Robust privacy preserving distributed analysis platform for cancer research: addressing data bias and disparities
  • Award Number:
    10642562
  • Fiscal Year:
    2023
  • Funding Amount:
    $598,900
  • Project Type:
Containerizing tasks to ensure robust AI/ML data curation pipelines to estimate environmental disparities in the rural south
  • Award Number:
    10842665
  • Fiscal Year:
    2022
  • Funding Amount:
    $598,900
  • Project Type:
CAREER: Advancing Fair Data Mining via New Robust and Explainable Algorithms and Human-Centered Approaches
  • Award Number:
    2146091
  • Fiscal Year:
    2022
  • Funding Amount:
    $598,900
  • Project Type:
    Standard Grant
Robust, Generalizable, and Fair Machine Learning Models for Biomedicine
  • Award Number:
    10275864
  • Fiscal Year:
    2021
  • Funding Amount:
    $598,900
  • Project Type:
Robust workflow software for MRI tracking of glymphatic-lymphatic coupling
  • Award Number:
    10609195
  • Fiscal Year:
    2021
  • Funding Amount:
    $598,900
  • Project Type: