EAGER: Proof-Carrying Code Completions


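Basic Information

Award number:
2403762
Principal investigator:
Caleb Stanford
Amount:
$300,000
Institution:
University of California-Davis
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2024
Funding country:
United States
Project period:
2024-02-15 to 2025-07-31
Project status:
Ongoing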

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2403762&HistoricalAwards=false
Keywords:
EAGER Proof Carrying Code Completions

Project Abstract

Today's programmers are using large language models (LLMs) to accelerate software development by automatically generating code suggestions and code completions. Widely used examples include GitHub Copilot and OpenAI ChatGPT. However, code generated by these tools can have bugs that are not caught by users, and this presents a serious safety risk. This project will leverage an idea called "proof-carrying code", in which code suggestions are packaged together with a mathematical proof of their safety, allowing programmers to be confident that the program is safe to deploy. This project will develop tools, techniques, and empirical results for using LLMs to generate trustworthy code together with mathematical proofs. Project outcomes, including code, data sets, and course materials, will be developed in the open and made available online to researchers working on LLMs, end users of LLM-based code generation, and early industry and open source adopters.

In the 1990s, researchers in the programming languages community recognized a powerful idea known as proof-carrying code (PCC): they showed how code can be shipped together with a proof of its safety that could be vetted efficiently by an end user. LLMs can be viewed as high-resource computations, and LLM users as low-resource entities. Seen through this lens, PCC maps naturally to the safety problem for LLM-generated code. The technical aims of this project are divided into four thrusts: (1) Gather empirical data on code that is currently generated by LLMs and determine its core safety risks, enabling the construction of a dataset that will be useful to other researchers; (2) Develop a framework for PCC, including enumeration of safety properties of interest and showing how to instantiate the framework with existing program verification, proof languages, and proof frameworks; (3) Implement new tools for verification condition generation from source code for popular programming languages and for specific safety properties; and (4) Evaluate the use of LLMs for generating proofs in this context, including developing new algorithms and proof sampling techniques to improve model effectiveness. The research will lead to new insights into the current capabilities of LLMs, to new relevant safety properties for code generation in a black-box setting, and to new techniques to generate verification conditions, bridging the gap in formal verification technology from special-purpose languages like Coq and Dafny to general-purpose programming languages in popular use.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
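The asymmetry the abstract relies on, an untrusted high-resource producer and a low-resource consumer who only checks, can be illustrated with a toy sketch. This is an analogy only and not the project's framework: the "safety property" here is merely "the output is a sorted rearrangement of the input", the "proof" is a permutation certificate, and the names `untrusted_sort` and `check_certificate` are hypothetical.

```python
from typing import List, Tuple


def untrusted_sort(xs: List[int]) -> Tuple[List[int], List[int]]:
    """Stand-in for the untrusted, high-resource producer (e.g. an LLM).

    Returns the claimed result together with a certificate: perm[j] is the
    index in xs from which output position j was taken.
    """
    perm = sorted(range(len(xs)), key=lambda i: xs[i])
    return [xs[i] for i in perm], perm


def check_certificate(xs: List[int], ys: List[int], perm: List[int]) -> bool:
    """Small trusted checker run by the low-resource consumer.

    Accepts only if perm is a genuine permutation of the input indices,
    ys is xs rearranged by perm, and ys is in non-decreasing order.
    Runs in linear time and never trusts the producer.
    """
    n = len(xs)
    if len(ys) != n or len(perm) != n:
        return False
    seen = [False] * n
    for j, i in enumerate(perm):
        # Reject out-of-range or repeated indices: not a permutation.
        if not isinstance(i, int) or not (0 <= i < n) or seen[i]:
            return False
        seen[i] = True
        # Reject outputs that do not match the claimed source element.
        if ys[j] != xs[i]:
            return False
    # Finally, check the ordering property itself.
    return all(ys[j] <= ys[j + 1] for j in range(n - 1))


if __name__ == "__main__":
    data = [3, 1, 2]
    result, cert = untrusted_sort(data)
    print(check_certificate(data, result, cert))
```

The consumer only needs to trust `check_certificate`; a faulty or malicious producer can cause a rejection but cannot get an incorrect result accepted. In actual PCC the certificate would be a machine-checkable proof about program behaviour rather than a permutation.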
