CAREER: Dual Reinforcement Learning: A Unifying Framework with Guarantees


Basic Information

  • Award Number:
    2340651
  • Principal Investigator:
  • Amount:
    $599,800
  • Host Institution:
  • Host Institution Country:
    United States
  • Project Type:
    Continuing Grant
  • Fiscal Year:
    2024
  • Funding Country:
    United States
  • Project Period:
    2024-09-01 to 2029-08-31
  • Project Status:
    Ongoing

Project Abstract

Reinforcement learning (RL) holds the promise of automating and improving many real-world processes that require sequential decision-making to optimize a long-term objective, such as self-driving cars, industrial automation, recommendation systems, and, more recently, natural language processing. The field of deep reinforcement learning has seen much exciting progress in the past few years, with RL agents demonstrating remarkable performance across a wide range of problem domains. Achieving this progress, however, has required access to a fast simulator and tens or hundreds of millions of data points that are collected, trained on, and then thrown away. Off-policy methods are an alternative approach that offers far greater data efficiency: they are not restricted to training on on-policy data and can even train on existing offline data. This suggests that truly unlocking the potential of reinforcement learning requires principled off-policy algorithms. This project focuses on advancing RL through a framework that aims to provide a unified, principled objective applicable to both standard and offline RL settings, enabling efficient solutions to large-scale, real-world sequential decision-making problems.

In this project, the PI will examine the dual formulation of this objective, which gives rise to a principled off-policy objective that sidesteps issues present in the more commonly used primal formulation. This objective leads to algorithms particularly suited to the large state-action spaces, long horizons, and sparse rewards encountered in real-world problems. The PI will explore connections between existing and new imitation learning and reinforcement learning methods and the proposed framework, show that both imitation learning and reinforcement learning methods are unified under this objective, and present theoretical guarantees for this class of methods. Finally, the PI will extend the dual framework to leverage pre-training and fine-tuning for improved sample efficiency, including methods for incorporating out-of-domain datasets and multiple modalities in self-supervised pre-training, which is especially relevant for applications in household robotics.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
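The award text refers to the dual formulation only at a high level. As orientation, work in the dual-RL / DICE line of research typically starts from an f-divergence-regularized objective over state-action visitation distributions and passes to its Lagrangian dual over value functions. The sketch below follows that literature; all symbols here (d, d^O, \mu_0, \alpha, f^*) are assumptions drawn from it, not from the award text.

\max_{d \ge 0}\ \mathbb{E}_{(s,a)\sim d}\,[r(s,a)] \;-\; \alpha\, D_f\!\left(d \,\Vert\, d^{O}\right)
\quad \text{s.t.} \quad
\sum_{a} d(s,a) \;=\; (1-\gamma)\,\mu_0(s) \;+\; \gamma \sum_{s',a'} P(s \mid s',a')\, d(s',a') \quad \forall s,

\min_{V}\ (1-\gamma)\,\mathbb{E}_{s\sim\mu_0}\,[V(s)] \;+\; \alpha\,\mathbb{E}_{(s,a)\sim d^{O}}\!\left[\, f^{*}\!\left(\frac{r(s,a) + \gamma\,\mathbb{E}_{s'\sim P(\cdot\mid s,a)}[V(s')] - V(s)}{\alpha}\right)\right].

Here f^* is the convex conjugate of f, d^O is the distribution of the available (off-policy or offline) data, and \mu_0 is the initial state distribution. The key property is that the dual objective is an expectation under d^O alone, so it can be estimated directly from off-policy samples; this is the sense in which a dual formulation sidesteps the on-policy data requirements of the primal.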

Project Outcomes

Journal Articles (0)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)


Other Publications by Amy Zhang

Preliminary mapping of HopZ1b resistance-associated loci in Arabidopsis thaliana via EMS and ecotype screens
  • DOI:
  • Publication Date:
    2016-11
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Amy Zhang
  • Corresponding Author:
    Amy Zhang
A review of principles in design and usability testing of tactile technology for individuals with visual impairments
  • DOI:
    10.1080/10400435.2016.1176083
  • Publication Date:
    2017
  • Journal:
  • Impact Factor:
    1.8
  • Authors:
    Emily L. Horton;R. Renganathan;Bryan N. Toth;Alexa J. Cohen;Andrea V. Bajcsy;A. Bateman;Mathew C. Jennings;Anish Khattar;Ryan S. Kuo;Felix A. Lee;Meilin K. Lim;Laura W. Migasiuk;Amy Zhang;Oliver K. Zhao;Márcio A. Oliveira
  • Corresponding Author:
    Márcio A. Oliveira
Visual outcomes of combined cataract and minimally invasive glaucoma surgery.
  • DOI:
  • Publication Date:
    2020
  • Journal:
  • Impact Factor:
    2.8
  • Authors:
    S. Sarkisian;N. Radcliffe;P. Harasymowycz;S. Vold;Thomas D. Patrianakos;Amy Zhang;L. Herndon;J. Brubaker;M. Moster;Brian A. Francis
  • Corresponding Author:
    Brian A. Francis
A Deep Learning Approach to Population Based COVID-19 Case Prediction in the US
  • DOI:
  • Publication Date:
    2021
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Sameer Sundrani;Amy Zhang
  • Corresponding Author:
    Amy Zhang
Learning Action-based Representations Using Invariance
  • DOI:
    10.48550/arxiv.2403.16369
  • Publication Date:
    2024
  • Journal:
  • Impact Factor:
    0
  • Authors:
    Max Rudolph;Caleb Chuck;Kevin Black;Misha Lvovsky;S. Niekum;Amy Zhang
  • Corresponding Author:
    Amy Zhang

Other Grants by Amy Zhang

CAREER: Tools for User and Community-Led Social Media Curation
  • Award Number:
    2236618
  • Fiscal Year:
    2023
  • Funding Amount:
    $599,800
  • Project Type:
    Continuing Grant
Collaborative Research: DASS: Transitioning open-source software projects to accountable community governance
  • Award Number:
    2217653
  • Fiscal Year:
    2022
  • Funding Amount:
    $599,800
  • Project Type:
    Standard Grant
Collaborative Research: SaTC: CORE: Large: Privacy-Preserving Abuse Prevention for Encrypted Communications Platforms
  • Award Number:
    2120497
  • Fiscal Year:
    2021
  • Funding Amount:
    $599,800
  • Project Type:
    Continuing Grant

Similar NSFC Grants

Molecular mechanisms by which acevaltrate induces ferroptosis in colorectal cancer cells through dual targeting of PCBP1/2 and GPX4
  • Award Number:
    82374086
  • Year Approved:
    2023
  • Funding Amount:
    CNY 490,000
  • Project Type:
    General Program
Dual CRISPR dCas9 interference/activation for targeted reactivation of X-linked tumor suppressor genes: role in sex differences and sex-specific targeted therapy of hepatocellular carcinoma
  • Award Number:
    82303116
  • Year Approved:
    2023
  • Funding Amount:
    CNY 300,000
  • Project Type:
    Young Scientists Fund
Experimental study of ultrasound nanomotors that target and reduce the dual hepatic sinusoidal barrier for the treatment of NASH fibrosis
  • Award Number:
    82302220
  • Year Approved:
    2023
  • Funding Amount:
    CNY 300,000
  • Project Type:
    Young Scientists Fund
Design, synthesis, and mechanism-of-action studies of anti-AD hybrid molecules based on dual GSK-3β/AChE inhibition
  • Award Number:
    22367005
  • Year Approved:
    2023
  • Funding Amount:
    CNY 320,000
  • Project Type:
    Regional Science Fund Program
Xenotransplantation of neonatal porcine islet organoids modified for immune tolerance via dual CTLA4/VISTA targeting
  • Award Number:
    32370985
  • Year Approved:
    2023
  • Funding Amount:
    CNY 500,000
  • Project Type:
    General Program

Similar Overseas Grants

Improving Treatment Engagement in Individuals with Co-occurring Substance Use and Psychosis: A Telemedicine Family-Based Approach
  • Award Number:
    10668969
  • Fiscal Year:
    2020
  • Funding Amount:
    $599,800
  • Project Type:
Improving Treatment Engagement in Individuals with Co-occurring Substance Use and Psychosis: A Telemedicine Family-Based Approach
  • Award Number:
    10421286
  • Fiscal Year:
    2020
  • Funding Amount:
    $599,800
  • Project Type:
Dual-Process Models of Alcohol Use in Late Adolescence
  • Award Number:
    10158376
  • Fiscal Year:
    2017
  • Funding Amount:
    $599,800
  • Project Type:
Dual-Process Models of Alcohol Use in Late Adolescence
  • Award Number:
    10227450
  • Fiscal Year:
    2017
  • Funding Amount:
    $599,800
  • Project Type:
Behavioral economic mechanisms of prescription opioid addiction in chronic pain
  • Award Number:
    9902387
  • Fiscal Year:
    2016
  • Funding Amount:
    $599,800
  • Project Type: