CAREER: Controllable generation for instruction-following language models

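Basic information

Award number:
2338866
Principal investigator:
Tatsunori Hashimoto
Amount:
$531,300
Awardee institution:
Stanford University
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2024
Funding country:
United States
Period:
2024-04-15 to 2029-03-31
Status:
Ongoing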

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2338866&HistoricalAwards=false
Keywords:
CAREER Controllable generation instruction following

Project abstract

Instruction-following language models like ChatGPT are beginning to see widespread development, and the ability to understand these systems and control them is critically important to make sure that they benefit society. Despite the success of these language models in generating fluent and convincing-looking outputs, there has been a growing body of work indicating that these systems can generate outputs that are undesirable to users, model creators, and even society at large. This gap between the ability to create models that imitate humans and the inability to have them fulfill specific desiderata (e.g., refuse to generate incorrect information) shows a major deficiency in the ability to precisely control these systems. This project aims to build principled, transparent, and precise methods for controlling language models.

To achieve these goals, this project views controllable generation as a viable long-term path to creating instruction-following language models that precisely follow our design goals. Controllable generation offers several benefits. First, it defines a precise statistical modeling problem on which it is possible to build principled methods and rigorous evaluations. Second, it separates the control target from the task, improving transparency by allowing users to see exactly what is being optimized by the model designers. Third, it enables much more precise controls via inference-time methods such as rejection sampling, which strictly enforces the control as a constraint. While controllable generation has major long-term benefits for language models, there also remain significant open problems that must be resolved first, including the difficulty of performing discrete search, the need for specialized training, and the lack of realistic benchmarks of control tasks in the wild. We will address these challenges through a combination of new models (such as diffusion-based models), zero-shot and decoder-based control methods, and a broad benchmark of in-the-wild control behaviors.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
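
The abstract's remark that rejection sampling enforces a control as a hard constraint can be illustrated with a small sketch. This is not the project's method: `toy_model`, `RESPONSES`, and `no_lyon` are hypothetical stand-ins for a language model and a control predicate, chosen only to show the accept/reject loop.

```python
import random
from typing import Callable, Optional


def rejection_sample(
    propose: Callable[[random.Random], str],
    satisfies_control: Callable[[str], bool],
    rng: random.Random,
    max_tries: int = 1000,
) -> Optional[str]:
    """Draw from `propose` until a sample passes `satisfies_control`.

    Returns None if no sample is accepted within `max_tries` draws.
    """
    for _ in range(max_tries):
        candidate = propose(rng)
        if satisfies_control(candidate):
            return candidate
    return None


# Hypothetical stand-in for a language model: picks one canned response.
RESPONSES = [
    "The capital of France is Paris.",
    "The capital of France is Lyon.",
    "I am not sure.",
]


def toy_model(rng: random.Random) -> str:
    return rng.choice(RESPONSES)


# Hypothetical control: the output must not mention "Lyon".
def no_lyon(text: str) -> bool:
    return "Lyon" not in text


rng = random.Random(0)
samples = [rejection_sample(toy_model, no_lyon, rng) for _ in range(20)]
```

Every accepted sample satisfies the predicate by construction, which is the sense in which the control acts as a constraint and not as a soft preference. The cost is extra draws: the rarer the acceptable outputs are under the base model, the more candidates are discarded.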

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Tatsunori Hashimoto

Benchmarking and Improving Generator-Validator Consistency of Language Models

DOI:
10.48550/arxiv.2310.01846
Published:
2023-10-03
Journal:
ArXiv
Impact factor:
0
Authors:
Xiang Lisa Li;Vaishnavi Shrivastava;Siyan Li;Tatsunori Hashimoto;Percy Liang
Corresponding author:
Percy Liang

The Troubling Emergence of Hallucination in Large Language Models - An Extensive Definition, Quantification, and Prescriptive Remediations

DOI:
10.48550/arxiv.2310.04988
Published:
2023-10-08
Journal:
ArXiv
Impact factor:
0
Authors:
Vipula Rawte;Swagata Chakraborty;Agnibh Pathak;Anubhav Sarkar;S.M. Towhidul Islam Tonmoy;Aman Chadha;Amit P. Sheth;Amitava Das
Corresponding author:

On the Opportunities and Risks of Foundation Models

DOI:
10.48550/arxiv.2108.07258
Published:
2021-08-16
Journal:
ArXiv
Impact factor:
0
Authors:
Rishi Bommasani;Drew A. Hudson;E. Adeli;R. Altman;Simran Arora;Sydney von Arx;Michael S. Bernstein;J. Bohg;Antoine Bosselut;E. Brunskill;Erik Brynjolfsson;S. Buch;Dallas Card;Rodrigo Castellon;Niladri S. Chatterji;Annie S. Chen;Kathleen A. Creel;Jared Davis;Dora Demszky;Chris Donahue;Moussa Doumbouya;Esin Durmus;Stefano Ermon;J. Etchemendy;Kawin Ethayarajh;L. Fei-Fei;Chelsea Finn;Trevor Gale;Lauren Gillespie;Karan Goel;Noah D. Goodman;S. Grossman;Neel Guha;Tatsunori Hashimoto;Peter Henderson;John Hewitt;Daniel E. Ho;Jenny Hong;Kyle Hsu;Jing Huang;Thomas F. Icard;Saahil Jain;Dan Jurafsky;Pratyusha Kalluri;Siddharth Karamcheti;G. Keeling;Fereshte Khani;O. Khattab;Pang Wei Koh;M. Krass;Ranjay Krishna;Rohith Kuditipudi;Ananya Kumar;Faisal Ladhak;Mina Lee;Tony Lee;J. Leskovec;Isabelle Levent;Xiang Lisa Li;Xuechen Li;Tengyu Ma;Ali Malik;Christopher D. Manning;Suvir Mirchandani;E. Mitchell;Zanele Munyikwa;Suraj Nair;A. Narayan;D. Narayanan;Benjamin Newman;Allen Nie;Juan Carlos Niebles;H. Nilforoshan;J. Nyarko;Giray Ogut;Laurel J. Orr;Isabel Papadimitriou;J. Park;C. Piech;Eva Portelance;Christopher Potts;Aditi Raghunathan;Robert Reich;Hongyu Ren;Frieda Rong;Yusuf H. Roohani;Camilo Ruiz;Jack Ryan;Christopher Ré;Dorsa Sadigh;Shiori Sagawa;Keshav Santhanam;Andy Shih;K. Srinivasan;Alex Tamkin;Rohan Taori;A. Thomas;Florian Tramèr;Rose E. Wang;William Wang;Bohan Wu;Jiajun Wu;Yuhuai Wu;Sang Michael Xie;Michihiro Yasunaga;Jiaxuan You;M. Zaharia;Michael Zhang;Tianyi Zhang;Xikun Zhang;Yuhui Zhang;Lucia Zheng;Kaitlyn Zhou;Percy Liang
Corresponding author:
Percy Liang

Identifying the Risks of LM Agents with an LM-Emulated Sandbox

DOI:
10.48550/arxiv.2309.15817
Published:
2023-09-25
Journal:
ArXiv
Impact factor:
0
Authors:
Yangjun Ruan;Honghua Dong;Andrew Wang;Silviu Pitis;Yongchao Zhou;Jimmy Ba;Yann Dubois;Chris J. Maddison;Tatsunori Hashimoto
Corresponding author:
Tatsunori Hashimoto

Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions

DOI:
10.48550/arxiv.2309.07875
Published:
2023-09-14
Journal:
ArXiv
Impact factor:
0
Authors:
Federico Bianchi;Mirac Suzgun;Giuseppe Attanasio;Paul Röttger;Dan Jurafsky;Tatsunori Hashimoto;James Zou
Corresponding author:
James Zou