CAREER: Towards Real-world Reinforcement Learning

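Basic information

  • Award number:
    2339395
  • Principal investigator:
  • Amount:
    $600,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2024
  • Funding country:
    United States
  • Project period:
    2024-03-01 to 2029-02-28
  • Project status:
    Ongoing

Project abstract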

Reinforcement learning (RL) is one of the most important paradigms for modeling data-driven decision-making. Recent years have witnessed several empirical successes of RL, such as RL agents that outperform humans in video and board games. However, many empirical RL algorithms today often require many training examples to learn and can produce unreliable solutions (solutions that exhibit catastrophic failures, for example). While these issues are typically not problematic when training RL agents in simulators, they pose significant difficulties when deploying RL to real-world problems where data (including human feedback) is expensive, and reliability is essential. The main novelty of this project will be the development of new RL algorithms that can learn efficiently (from as few training data points as possible) and reliably (avoid catastrophic failures with high probability). The development of such RL algorithms can expand the applications of RL systems from simulation to real-world applications where data is expensive to collect and safety is critical. In autonomous driving, the developed technologies can make self-driving cars adapt to new road conditions safely by making fewer mistakes. In generative Artificial Intelligence (AI), efficient and reliable RL algorithms that can learn from rich human feedback will enable better human-AI alignment, making AI systems improve reliably and safely under human guidance.

The main research goal of this project is to enable real-world RL by advancing RL techniques, theoretically and empirically. The critical innovation in the project is to develop safe and efficient RL algorithms by leveraging specific problem structures and rich human feedback. The project has three main thrusts. First, the project will establish risk-averse RL algorithms that are provably correct and scalable to high dimensional data. Second, the project will develop RL algorithms that can leverage common problem-specific structures for improved sample efficiency. Third, the project will create new algorithms for RL with rich feedback beyond scalar rewards (including preference-based feedback and positive demonstrations). In addition to the proposed work on algorithmic advancements, the project will focus on their deployment to real-world problems, including database query optimization and optimizing generative models such as Large Language Models and Diffusion Models.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Wen Sun

Composite of nonexpansion reduced graphite oxide and carbon derived from pitch as anodes of Na-ion batteries with high coulombic efficiency
  • DOI:
    10.1016/j.cej.2016.10.074
  • Publication date:
    2017-02
  • Journal:
  • Impact factor:
    15.1
  • Authors:
    Wen Sun; Xiaodong Hong; Ming Wang; Yongqiang Mao
  • Corresponding author:
    Yongqiang Mao
Research on TVD Control of Cornering Energy Consumption for Distributed Drive Electric Vehicles Based on PMP
  • DOI:
    10.3390/en15072641
  • Publication date:
    2022-04
  • Journal:
  • Impact factor:
    3.2
  • Authors:
    Wen Sun; Yang Chen; Junnian Wang; Xiangyu Wang; Lili Liu
  • Corresponding author:
    Lili Liu
In Utero Exposure to Fine Particles Decreases Early Birth Weight of Rat Offspring and TLR4/NF-κB Expression in Lungs
  • DOI:
    10.1021/acs.chemrestox.0c00056
  • Publication date:
    2021
  • Journal:
  • Impact factor:
    0
  • Authors:
    Wenting Tang; Zhongjun Li; Yaoguang Huang; Lili Du; Chuangyu Wen; Wen Sun; Zhiqiang Yu; Suran Huang; Dunjin Chen
  • Corresponding author:
    Dunjin Chen
Online No-regret Model-Based Meta RL for Personalized Navigation
  • DOI:
    10.48550/arxiv.2204.01925
  • Publication date:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yuda Song; Ye Yuan; Wen Sun; Kris M. Kitani
  • Corresponding author:
    Kris M. Kitani
Dipole-induced modulation of effective work function of metal gate in junctionless FETs
  • DOI:
    10.1063/1.5143771
  • Publication date:
    2020
  • Journal:
  • Impact factor:
    1.6
  • Authors:
    Xinhe Wang; Zhigang Zhang; Jianshi Tang; B. Gao; Wen Sun; Feng Xu; Huaqiang Wu; He Qian
  • Corresponding author:
    He Qian

Other grants held by Wen Sun

RI: Small: Towards Provably Efficient Representation Learning in Reinforcement Learning via Rich Function Approximation
  • Award number:
    2154711
  • Fiscal year:
    2022
  • Funding amount:
    $600,000
  • Project type:
    Standard Grant

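Similar National Natural Science Foundation of China (NSFC) grants

Role and mechanism of SHP2-regulated plastic conversion of Treg cells into Th2-like Treg cells in allergic rhinitis
  • Award number:
    82301281
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Numerical simulation of core-edge compatibility mechanisms in the EAST high poloidal beta operating regime
  • Award number:
    12375228
  • Approval year:
    2023
  • Funding amount:
    CNY 530,000
  • Project type:
    General Program
CXCR5-dependent delivery of exosomes by marginal zone B cells to follicular dendritic cells in triggering cardiac transplant rejection
  • Award number:
    82300460
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Mechanism by which Dlx2 affects osteogenic differentiation of maxillary process mesenchymal stem cells through regulation of Tspan13
  • Award number:
    82301008
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund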

Similar overseas grants

CAREER: Towards Safety-Critical Real-Time Systems with Learning Components
  • Award number:
    2340171
  • Fiscal year:
    2024
  • Funding amount:
    $600,000
  • Project type:
    Continuing Grant
Towards a real implementation of quantum network systems
  • Award number:
    24K07485
  • Fiscal year:
    2024
  • Funding amount:
    $600,000
  • Project type:
    Grant-in-Aid for Scientific Research (C)
CAREER: Towards Fairness in the Real World under Generalization, Privacy and Robustness Challenges
  • Award number:
    2339198
  • Fiscal year:
    2024
  • Funding amount:
    $600,000
  • Project type:
    Continuing Grant
CRII: SCH: Towards Smart Patient Flow Management: Real-time Inpatient Length of Stay Modeling and Prediction
  • Award number:
    2246158
  • Fiscal year:
    2023
  • Funding amount:
    $600,000
  • Project type:
    Standard Grant
Towards Real-Time Fine-Grained Tracking in Distributed Large-Scale RF Tag Systems
  • Award number:
    2225337
  • Fiscal year:
    2023
  • Funding amount:
    $600,000
  • Project type:
    Standard Grant