NSF-NSERC: SaTC: CORE: Small: Managing Risks of AI-generated Code in the Software Supply Chain

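Basic Information

  • Award number:
    2341206
  • Principal investigator:
  • Amount:
    $600,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2024
  • Funding country:
    United States
  • Project period:
    2024-06-01 to 2027-05-31
  • Project status:
    Ongoing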

Project Abstract

Modern software is created by combining pre-existing software packages into a software product. This approach is enabled by the growing popularity of the Open-Source paradigm, where the source code of software packages is made available under licenses that allow reuse. It speeds up software development and yields significant economic benefits, but also creates the risk of inadvertently importing vulnerable code into critical software tools. The risk is further compounded by the increasing use of Artificial Intelligence (AI) tools for code generation in Open-Source development. These tools must be trained on enormous amounts of data, which is not always rigorously reviewed, and thus they may learn to generate vulnerable code. To make matters worse, malicious parties may actively inject malicious code into the tools' training sets. Unfortunately, all these issues are still poorly understood. This project aims to measure and mitigate the risks emerging from AI-generated code in the software supply chain. It will investigate how prevalent the use of AI tools is, and characterize the security risks they entail. In doing so, it will address pressing economic and societal needs: AI promises to bring significant benefits to software development, but those can only be achieved if its risks are mitigated. The research outcomes will be disseminated through workshops and hackathons, and the results will be incorporated into curricula and courses. The work will benefit the open-source community by producing provenance tools to improve software supply chain security. The project is a collaboration with researchers from Canada whose complementary expertise provides additional resources to the project.
Technically, the AI tools being investigated consist of various Large Language Models (LLMs) for code generation. The threat model of interest is one where a developer inserts vulnerable LLM-generated code into a security-critical program, whether due to low-quality code generation or the use of a poisoned/backdoored LLM. This project consists of three thrusts, each addressing a research question relevant to the threat model: (i) how, and to what extent, LLM code can be distinguished from code written by humans; (ii) to what extent LLM code is already present in the supply chain, and what its security implications are; and (iii) to what extent poisoning attacks against LLM code generation can succeed in realistic conditions. In thrust (i), this project extends existing code stylometry techniques, until now used to distinguish among human programmers, to the novel problem of distinguishing human-written from LLM-generated code. In thrust (ii), the investigators conduct measurement studies of Open-Source software, generating empirical understanding of the presence and implications of LLM-generated code in the supply chain. Finally, thrust (iii) looks at the practical feasibility of code backdoors, and the effectiveness of automated reputation-based vetting as a defense.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Rachel Greenstadt

Feature Vector Difference based Authorship Verification for Open-World Settings
  • DOI:
  • Published:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Janith Weerasinghe; Rhia Singh; Rachel Greenstadt
  • Corresponding author:
    Rachel Greenstadt
From User Insights to Actionable Metrics: A User-Focused Evaluation of Privacy-Preserving Browser Extensions
Stoking the Flames: Understanding Escalation in an Online Harassment Community
This paper is included in the Proceedings of the 32nd USENIX Security Symposium
  • DOI:
  • Published:
  • Journal:
  • Impact factor:
    0
  • Authors:
    Alan F. Luo; N. Warford; Samuel Dooley; Rachel Greenstadt; Michelle L. Mazurek; Nora McDonald
  • Corresponding author:
    Nora McDonald
Challenges in Restructuring Community-based Moderation
  • DOI:
    10.48550/arxiv.2402.17880
  • Published:
    2024-02-27
  • Journal:
  • Impact factor:
    0
  • Authors:
    Chau Tran; Kejsi Take; Kaylea Champion; Benjamin Mako Hill; Rachel Greenstadt
  • Corresponding author:
    Rachel Greenstadt

Other grants by Rachel Greenstadt

Collaborative Research: Conference: 2023 Workshop for Aspiring PIs in Secure and Trusted Cyberspace
  • Award number:
    2247405
  • Fiscal year:
    2023
  • Award amount:
    $600,000
  • Project type:
    Standard Grant
Collaborative Research: SaTC: CORE: Medium: Threat Intelligence for Targets of Coordinated Harassment
  • Award number:
    2016061
  • Fiscal year:
    2020
  • Award amount:
    $600,000
  • Project type:
    Standard Grant
SaTC: CORE: Medium: Collaborative: Measuring the Value of Anonymous Online Participation
  • Award number:
    2031951
  • Fiscal year:
    2019
  • Award amount:
    $600,000
  • Project type:
    Continuing Grant
SaTC: CORE: Small: Collaborative: Understanding and Mitigating Adversarial Manipulation of Content Curation Algorithms
  • Award number:
    1931005
  • Fiscal year:
    2019
  • Award amount:
    $600,000
  • Project type:
    Standard Grant
SaTC: CORE: Small: Collaborative: Understanding and Mitigating Adversarial Manipulation of Content Curation Algorithms
  • Award number:
    1813697
  • Fiscal year:
    2018
  • Award amount:
    $600,000
  • Project type:
    Standard Grant
SaTC: CORE: Medium: Collaborative: Measuring the Value of Anonymous Online Participation
  • Award number:
    1703736
  • Fiscal year:
    2017
  • Award amount:
    $600,000
  • Project type:
    Continuing Grant
Student Travel Support: Privacy Enhancing Technology Symposium (PETS) 2015
  • Award number:
    1523108
  • Fiscal year:
    2015
  • Award amount:
    $600,000
  • Project type:
    Standard Grant
EAGER: Cybercrime Science
  • Award number:
    1347151
  • Fiscal year:
    2013
  • Award amount:
    $600,000
  • Project type:
    Standard Grant
CAREER: Privacy Analytics for Users in a Big Data World
  • Award number:
    1253418
  • Fiscal year:
    2013
  • Award amount:
    $600,000
  • Project type:
    Continuing Grant

Similar overseas grants

NSF-NSERC: Fairness Fundamentals: Geometry-inspired Algorithms and Long-term Implications
  • Award number:
    2342253
  • Fiscal year:
    2024
  • Award amount:
    $600,000
  • Project type:
    Standard Grant
NSF-NSERC: Building a two-qubit controlled phase gate using laterally coupled semiconductor quantum dots
  • Award number:
    2317047
  • Fiscal year:
    2023
  • Award amount:
    $600,000
  • Project type:
    Standard Grant
L2M NSERC - Integrated Microfluidic Electrochemical Assay for Cervical Cancer Detection at Point-of-Care Testing
  • Award number:
    576535-2022
  • Fiscal year:
    2022
  • Award amount:
    $600,000
  • Project type:
    Idea to Innovation
SuperNOVA NSERC PromoScience Supplement for Science Odyssey (Spring 2022)
  • Award number:
    571642-2022
  • Fiscal year:
    2022
  • Award amount:
    $600,000
  • Project type:
    PromoScience Supplement for Science Odyssey
NSERC-IRCC in In-situ Oil Sands Steam Generation and Clean Technologies
  • Award number:
    488011-2020
  • Fiscal year:
    2022
  • Award amount:
    $600,000
  • Project type:
    Industrial Research Chairs for Colleges Grants