NSF-NSERC: SaTC: CORE: Small: Managing Risks of AI-generated Code in the Software Supply Chain
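Basic information
- Award number: 2341206
- Principal investigator:
- Amount: $600,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2024
- Funding country: United States
- Period: 2024-06-01 to 2027-05-31
- Project status: Ongoing
- Source:
- Keywords:
Project abstract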
Modern software is created by combining pre-existing software packages into a software product. This approach is enabled by the growing popularity of the open-source paradigm, in which the source code of software packages is made available under licenses that allow reuse. It speeds up software development and brings significant economic benefits, but it also creates the risk of inadvertently importing vulnerable code into critical software tools. The risk is compounded by the increasing use of Artificial Intelligence (AI) tools for code generation in open-source development. These tools must be trained on enormous amounts of data, which is not always rigorously reviewed, so they may learn to generate vulnerable code. To make matters worse, malicious parties may actively inject malicious code into the tools' training sets. Unfortunately, all of these issues are still poorly understood. This project aims to measure and mitigate the risks emerging from AI-generated code in the software supply chain. It will investigate how prevalent the use of AI tools is and characterize the security risks they entail. In doing so, it will address pressing economic and societal needs: AI promises to bring significant benefits to software development, but those benefits can only be achieved if its risks are mitigated. The research outcomes will be disseminated through workshops and hackathons, and the results will be incorporated into curricula and courses. The work will benefit the open-source community by producing provenance tools that improve software supply chain security. The project is a collaboration with researchers from Canada, who bring complementary expertise and additional resources to the project.
Technically, the AI tools being investigated consist of various Large Language Models (LLMs) for code generation. The threat model of interest is one in which a developer inserts vulnerable LLM-generated code into a security-critical program, whether because of low-quality code generation or the use of a poisoned or backdoored LLM. The project consists of three thrusts, each addressing a research question relevant to the threat model: (i) how, and to what extent, LLM code can be distinguished from code written by humans; (ii) to what extent LLM code is already present in the supply chain, and what its security implications are; and (iii) to what extent poisoning attacks against LLM code generation can succeed in realistic conditions. In thrust (i), the project extends existing code stylometry techniques, until now used to distinguish among human programmers, to the novel problem of distinguishing human-written from LLM-generated code. In thrust (ii), the investigators conduct measurement studies of open-source software, generating an empirical understanding of the presence and implications of LLM-generated code in the supply chain. Finally, thrust (iii) looks at the practical feasibility of code backdoors and the effectiveness of automated reputation-based vetting as a defense.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Rachel Greenstadt
Challenges in Restructuring Community-based Moderation
- DOI: 10.48550/arxiv.2402.17880
- Publication year: 2024
- Journal:
- Impact factor: 0
- Authors: Chau Tran; Kejsi Take; Kaylea Champion; Benjamin Mako Hill; Rachel Greenstadt
- Corresponding author: Rachel Greenstadt
From User Insights to Actionable Metrics: A User-Focused Evaluation of Privacy-Preserving Browser Extensions
- DOI:
- Publication year: 2024
- Journal:
- Impact factor: 0
- Authors: Ritik Roongta; Rachel Greenstadt
- Corresponding author: Rachel Greenstadt
Stoking the Flames: Understanding Escalation in an Online Harassment Community
- DOI: 10.1145/3641015
- Publication year: 2024
- Journal:
- Impact factor: 0
- Authors: Kejsi Take; Victoria Zhong; Chris Geeng; Emmi Bevensee; Damon McCoy; Rachel Greenstadt
- Corresponding author: Rachel Greenstadt
Feature Vector Difference based Authorship Verification for Open-World Settings
- DOI:
- Publication year: 2021
- Journal:
- Impact factor: 0
- Authors: Janith Weerasinghe; Rhia Singh; Rachel Greenstadt
- Corresponding author: Rachel Greenstadt
Other grants by Rachel Greenstadt
Collaborative Research: Conference: 2023 Workshop for Aspiring PIs in Secure and Trusted Cyberspace
- Award number: 2247405
- Fiscal year: 2023
- Funding amount: $600,000
- Project type: Standard Grant
Collaborative Research: SaTC: CORE: Medium: Threat Intelligence for Targets of Coordinated Harassment
- Award number: 2016061
- Fiscal year: 2020
- Funding amount: $600,000
- Project type: Standard Grant
SaTC: CORE: Medium: Collaborative: Measuring the Value of Anonymous Online Participation
- Award number: 2031951
- Fiscal year: 2019
- Funding amount: $600,000
- Project type: Continuing Grant
SaTC: CORE: Small: Collaborative: Understanding and Mitigating Adversarial Manipulation of Content Curation Algorithms
- Award number: 1931005
- Fiscal year: 2019
- Funding amount: $600,000
- Project type: Standard Grant
SaTC: CORE: Small: Collaborative: Understanding and Mitigating Adversarial Manipulation of Content Curation Algorithms
- Award number: 1813697
- Fiscal year: 2018
- Funding amount: $600,000
- Project type: Standard Grant
SaTC: CORE: Medium: Collaborative: Measuring the Value of Anonymous Online Participation
- Award number: 1703736
- Fiscal year: 2017
- Funding amount: $600,000
- Project type: Continuing Grant
Student Travel Support: Privacy Enhancing Technology Symposium (PETS) 2015
- Award number: 1523108
- Fiscal year: 2015
- Funding amount: $600,000
- Project type: Standard Grant
CAREER: Privacy Analytics for Users in a Big Data World
- Award number: 1253418
- Fiscal year: 2013
- Funding amount: $600,000
- Project type: Continuing Grant
EAGER: Investigating Diversity in Online Community Filtering
- Award number: 1048515
- Fiscal year: 2010
- Funding amount: $600,000
- Project type: Standard Grant
Similar overseas grants
NSF-NSERC: Fairness Fundamentals: Geometry-inspired Algorithms and Long-term Implications
- Award number: 2342253
- Fiscal year: 2024
- Funding amount: $600,000
- Project type: Standard Grant
NSF-NSERC: Building a two-qubit controlled phase gate using laterally coupled semiconductor quantum dots
- Award number: 2317047
- Fiscal year: 2023
- Funding amount: $600,000
- Project type: Standard Grant
NSERC/BC SPCA Industrial Research Chair in Animal Welfare
- Award number: 554745-2019
- Fiscal year: 2022
- Funding amount: $600,000
- Project type: Industrial Research Chairs
L2M NSERC - A Novel Mechanical Sensor for Online Flow Rate Monitoring in Subsea Pipeline Networks
- Award number: 580749-2023
- Fiscal year: 2022
- Funding amount: $600,000
- Project type: Idea to Innovation
L2M NSERC - Terahertz wired technology for future networks
- Award number: 580661-2023
- Fiscal year: 2022
- Funding amount: $600,000
- Project type: Idea to Innovation