CAREER: Dual Reinforcement Learning: A Unifying Framework with Guarantees

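Basic information

  • Award number:
    2340651
  • Principal investigator:
  • Amount:
    $599,800
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2024
  • Funding country:
    United States
  • Period:
    2024-09-01 to 2029-08-31
  • Project status:
    Ongoing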

Project abstract

Reinforcement learning (RL) holds the promise of automating and improving many real-world processes that require sequential decision-making to optimize a long-term objective, such as self-driving cars, industrial automation, recommendation systems, and, more recently, natural language processing. Deep reinforcement learning has made much exciting progress in the past few years, with RL agents demonstrating remarkable performance across a wide range of problem domains. However, this progress has required access to a fast simulator and tens or hundreds of millions of data points that are collected, trained on, and then thrown away. Off-policy methods are an alternative approach that is far more data-efficient, because they are not restricted to training on on-policy data and can even be trained on existing offline data. This suggests that to truly unlock the potential of reinforcement learning, we must develop principled off-policy algorithms. This project focuses on advancing RL through a framework that aims to provide a unified, principled objective that applies to both standard and offline RL settings and allows large-scale, real-world sequential decision-making problems to be solved efficiently.
In this project, the PI will examine the dual formulation of this objective, which gives rise to a principled off-policy objective that sidesteps issues present in the more commonly used primal formulation. This objective will lead to algorithms particularly suited to the large state-action spaces, long horizons, and sparse rewards encountered in real-world problems. The PI will explore connections between the proposed framework and both existing and new imitation learning and reinforcement learning methods. The PI will show that imitation learning and reinforcement learning methods are unified under this objective and will present theoretical guarantees for this class of methods. Finally, the PI will extend the dual framework to leverage pre-training and fine-tuning for improved sample efficiency. This includes exploring methods for incorporating out-of-domain datasets and multiple modalities in self-supervised pre-training, which is especially relevant for applications in household robotics.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Amy Zhang

Circularity and Enclosures: Metabolizing Waste with the Black Soldier Fly
  • DOI:
    10.14506/ca35.1.08
  • Publication date:
    2020-02-01
  • Journal:
  • Impact factor:
    1.9
  • Authors:
    Amy Zhang
  • Corresponding author:
    Amy Zhang
Mechanical Trap Surface-Enhanced Raman Spectroscopy for Three-Dimensional Surface Molecular Imaging of Single Live Cells.
  • DOI:
    10.1002/anie.201700695
  • Publication date:
    2017-03-27
  • Journal:
  • Impact factor:
    0
  • Authors:
    Q. Jin;Ming Li;Beril Polat;S. Paidi;A. Dai;Amy Zhang;Jayson V. Pagaduan;I. Barman;D. Gracias
  • Corresponding author:
    D. Gracias
Causal Transformers: Improving the Robustness on Spurious Correlations
  • DOI:
  • Publication date:
    2024-09-14
  • Journal:
  • Impact factor:
    0
  • Authors:
    David Krueger;Ethan Caballero;Joern;Amy Zhang;Jonathan Binas;Dinghuai Zhang;Yinhan Liu;Myle Ott;Naman Goyal;Jingfei Du;Mandar Joshi;Danqi Chen;Omer Levy;Mike Lewis;Ze Liu;Yutong Lin;Yue Cao;Han Hu;Yixuan Wei;Jiasen Lu;Dhruv Batra;Devi Parikh;Tom McCoy;Ellie Pavlick;Tal Linzen;Nikita Nangia;Adina Williams;A. Lazaridou;Ankur P. Parikh;Oscar Täckström;Dipanjan Das;Jeffrey Pennington;R. Socher;Lihua Qian;Hao Zhou;Yu Bao;Mingxuan Wang;Lin;Alec Radford;Jeffrey Wu;R. Child;D. Luan;Chitwan Saharia;William Chan;Saurabh Saxena;Rico Sennrich;B. Haddow;Alexandra Birch
  • Corresponding author:
Confidence-aware 3D Gaze Estimation and Evaluation Metric
  • DOI:
    10.48550/arxiv.2303.10062
  • Publication date:
    2023-03-17
  • Journal:
  • Impact factor:
    0
  • Authors:
    Qiaojie Zheng;Jiucai Zhang;Amy Zhang;Xiaoli Zhang
  • Corresponding author:
    Xiaoli Zhang
Comparing binary & ordinal definitions of urinary & stool continence outcomes: Data from the National Spina Bifida Patient Registry.
  • DOI:
    10.1016/j.jpurol.2024.01.029
  • Publication date:
    2024-02-01
  • Journal:
  • Impact factor:
    0
  • Authors:
    M. Kelly;Tiebin Liu;J. Routh;H. Castillo;Stacy T. Tanaka;Kathryn A. Smith;L. Krach;Amy Zhang;Eileen Sherburne;Jonathan Castillo;Joseph David;John S. Wiener
  • Corresponding author:
    John S. Wiener

Other grants held by Amy Zhang

CAREER: Tools for User and Community-Led Social Media Curation
  • Award number:
    2236618
  • Fiscal year:
    2023
  • Amount:
    $599,800
  • Project type:
    Continuing Grant
Collaborative Research: DASS: Transitioning open-source software projects to accountable community governance
  • Award number:
    2217653
  • Fiscal year:
    2022
  • Amount:
    $599,800
  • Project type:
    Standard Grant
Collaborative Research: SaTC: CORE: Large: Privacy-Preserving Abuse Prevention for Encrypted Communications Platforms
  • Award number:
    2120497
  • Fiscal year:
    2021
  • Amount:
    $599,800
  • Project type:
    Continuing Grant

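Similar NSFC (National Natural Science Foundation of China) grants

Mechanism of synergistic dechlorination in chemical looping with oxygen uncoupling combustion based on dual regulation by copper-based oxygen carriers
  • Award number:
    52306132
  • Approval year:
    2023
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Dynamic evaluation, by visualized microsphere tracing, of the role and mechanism of VEGF-C/PGF2α dual regulation of cervical lymphatic vessel function in APP/PS1 mice
  • Award number:
    82370506
  • Approval year:
    2023
  • Amount:
    CNY 490,000
  • Project type:
    General Program
Dual fluid-structure interaction response mechanism of lazy-wave flexible risers conveying multiphase flow
  • Award number:
    52301338
  • Approval year:
    2023
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Consumer perception and behavior in human-machine collaborative service scenarios: a dual perspective of objectification and anthropomorphism
  • Award number:
    72372166
  • Approval year:
    2023
  • Amount:
    CNY 420,000
  • Project type:
    General Program
Mechanism by which SLAMF8 inhibits STAT1 signaling to dually edit the functions of TAMs and CD8+ T cells and promote immune escape in colorectal cancer
  • Award number:
    82303970
  • Approval year:
    2023
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund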

Similar overseas grants

Improving Treatment Engagement in Individuals with Co-occurring Substance Use and Psychosis: A Telemedicine Family-Based Approach
  • Award number:
    10668969
  • Fiscal year:
    2020
  • Amount:
    $599,800
  • Project type:
Improving Treatment Engagement in Individuals with Co-occurring Substance Use and Psychosis: A Telemedicine Family-Based Approach
  • Award number:
    10421286
  • Fiscal year:
    2020
  • Amount:
    $599,800
  • Project type:
Dual-Process Models of Alcohol Use in Late Adolescence
  • Award number:
    10227450
  • Fiscal year:
    2017
  • Amount:
    $599,800
  • Project type:
Dual-Process Models of Alcohol Use in Late Adolescence
  • Award number:
    10158376
  • Fiscal year:
    2017
  • Amount:
    $599,800
  • Project type:
Behavioral economic mechanisms of prescription opioid addiction in chronic pain
  • Award number:
    9902387
  • Fiscal year:
    2016
  • Amount:
    $599,800
  • Project type: