RI: Small: Towards Optimal and Adaptive Reinforcement Learning with Offline Data and Limited Adaptivity

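Basic information

  • Award number:
    2007117
  • Principal investigator:
  • Award amount:
    $450,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Start and end dates:
    2020-10-01 to 2024-09-30
  • Project status:
    Completed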

Project abstract

Reinforcement learning (RL) is one of the fastest-growing research areas in machine learning. RL-based techniques have led to several recent breakthroughs in artificial intelligence, such as beating human champions at the game of Go. The application of RL to real-life problems, however, remains limited, even in areas where a large amount of data has already been collected. The crux of the problem is that most existing RL methods require an environment for the agent to interact with, but in real-life applications such an environment is rarely available: deploying an algorithm that learns by trial and error may raise serious legal, ethical and safety issues. This project aims to address this conundrum by developing algorithms that learn from offline data. The outcome of the research could significantly reduce the overhead of using RL techniques in real-life sequential decision-making problems such as those in power transmission, personalized medicine, scientific discovery, computer networking and public policy.
The project focuses on two settings that address the aforementioned challenge of limited access to an environment. In the first setting, the agent is given only historical data from logged interactions with the environment. In the second setting, the agent is able to change how it interacts with the environment only a few times. The investigators will develop mathematical theory that describes the difficulty of the problem and ensures that the developed algorithms are robust and optimal in the sense that they use the least possible resources (data, energy, computation). Using techniques such as marginalized importance sampling, uniform convergence and batched exploration, the project will generalize the recent line of work on "breaking the curse of horizon" to allow function approximation and establish the much-needed statistical learning theory for offline and low-adaptive reinforcement learning.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outcomes

Journal papers: 16
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0
No-Regret Linear Bandits beyond Realizability
  • DOI:
    10.48550/arxiv.2302.13252
  • Publication date:
    2023-02
  • Journal:
  • Impact factor:
    0
  • Authors:
    Chong Liu;Ming Yin;Yu-Xiang Wang
  • Corresponding author(s):
    Chong Liu;Ming Yin;Yu-Xiang Wang
Offline Reinforcement Learning with Closed-Form Policy Improvement Operators
  • DOI:
    10.48550/arxiv.2211.15956
  • Publication date:
    2022-11
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jiachen Li;Edwin Zhang;Ming Yin;Qinxun Bai;Yu-Xiang Wang;William Yang Wang
  • Corresponding author(s):
    Jiachen Li;Edwin Zhang;Ming Yin;Qinxun Bai;Yu-Xiang Wang;William Yang Wang
Near-Optimal Deployment Efficiency in Reward-Free Reinforcement Learning with Linear Function Approximation
  • DOI:
    10.48550/arxiv.2210.00701
  • Publication date:
    2022-10
  • Journal:
  • Impact factor:
    0
  • Authors:
    Dan Qiao;Yu-Xiang Wang
  • Corresponding author(s):
    Dan Qiao;Yu-Xiang Wang
Optimal Dynamic Regret in Proper Online Learning with Strongly Convex Losses and Beyond
  • DOI:
  • Publication date:
    2022-01
  • Journal:
  • Impact factor:
    0
  • Authors:
    Dheeraj Baby;Yu-Xiang Wang
  • Corresponding author(s):
    Dheeraj Baby;Yu-Xiang Wang
Near-Optimal Differentially Private Reinforcement Learning
  • DOI:
    10.48550/arxiv.2212.04680
  • Publication date:
    2022-12
  • Journal:
  • Impact factor:
    0
  • Authors:
    Dan Qiao;Yu-Xiang Wang
  • Corresponding author(s):
    Dan Qiao;Yu-Xiang Wang
14 entries in total

Other publications by Yu-Xiang Wang

Celastrol induces lipophagy via LXRα/ABCA1 pathway in clear cell renal cell carcinoma
  • DOI:
  • Publication date:
    2020
  • Journal:
  • Impact factor:
    0
  • Authors:
    Chanjuan Zhang;Neng Zhu;Jia Long;Hong-Tao Wu;Yu-Xiang Wang;Bi-Yuan Liu;Duan-Fang Liao;Li Qin
  • Corresponding author(s):
    Li Qin
New Paradigms and Optimality Guarantees in Statistical Learning and Estimation
  • DOI:
    10.1184/r1/6720836.v1
  • Publication date:
    2017
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yu-Xiang Wang
  • Corresponding author(s):
    Yu-Xiang Wang
Genetic improvement of tilapias in China: Genetic parameters and selection responses in fillet traits of Nile tilapia (Oreochromis niloticus) after six generations of multi-trait selection for growth and fillet yield
  • DOI:
    10.1016/j.aquaculture.2012.08.028
  • Publication date:
    2012-11-05
  • Journal:
  • Impact factor:
  • Authors:
    Jørn Thodesen;Morten Rye;Yu-Xiang Wang;Hans B. Bentsen;Trygve Gjedrem
  • Corresponding author(s):
    Trygve Gjedrem
Enhancing the Fe3+ Sensing Sensitivity by Energy Transfer and Phase Transformation in a Bimetallic Lanthanide Metal-Organic Framework
  • DOI:
    10.1002/slct.201802147
  • Publication date:
    2018
  • Journal:
  • Impact factor:
    2.1
  • Authors:
    Xue-Zhi Song;Yu-Xiang Wang;Jia-Wei Yan;Xi Chen;Yu-Lan Meng;Zhenquan Tan
  • Corresponding author(s):
    Zhenquan Tan
The Expression Characterization of Chicken Uncoupling Protein Gene
9 entries in total

Other grants held by Yu-Xiang Wang

CAREER: Exact Optimal and Data-Adaptive Algorithms and Tools for Differential Privacy
  • Award number:
    2048091
  • Fiscal year:
    2021
  • Award amount:
    $450,000
  • Award type:
    Continuing Grant
Collaborative Research: SCALE MoDL: Adaptivity of Deep Neural Networks
  • Award number:
    2134214
  • Fiscal year:
    2021
  • Award amount:
    $450,000
  • Award type:
    Standard Grant

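Similar grants from the National Natural Science Foundation of China (NSFC)

Role and mechanism of TIM-4 in regulating microglial conversion to a phagocytic phenotype to promote blood clearance after subarachnoid hemorrhage
  • Award number:
    82301485
  • Approval year:
    2023
  • Award amount:
    300,000 CNY
  • Award type:
    Young Scientists Fund
Molecular mechanisms and intervention strategies for the transformation of EGFR-mutant lung adenocarcinoma into small cell lung cancer
  • Award number:
    82341002
  • Approval year:
    2023
  • Award amount:
    2,000,000 CNY
  • Award type:
    Special Fund Project
Investigating the mechanism by which macrophage A20 regulates efferocytosis by tubular epithelial cells in the AKI-to-CKD transition
  • Award number:
    82270728
  • Approval year:
    2022
  • Award amount:
    500,000 CNY
  • Award type:
    General Program
Mechanisms by which microglial exosomes regulate the conversion of astrocyte subtypes into neural stem cells after stroke
  • Award number:
    82271320
  • Approval year:
    2022
  • Award amount:
    520,000 CNY
  • Award type:
    General Program
Spatiotemporal distribution of small-scale field-aligned currents and their relationship with precipitating particles
  • Award number:
    42174191
  • Approval year:
    2021
  • Award amount:
    590,000 CNY
  • Award type:
    General Program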

Similar overseas grants

RI: Small: Towards Abstractive Summarization That Preserves the Original Meaning
  • Award number:
    2303678
  • Fiscal year:
    2022
  • Award amount:
    $450,000
  • Award type:
    Standard Grant
RI: Small: Towards Provably Efficient Representation Learning in Reinforcement Learning via Rich Function Approximation
  • Award number:
    2154711
  • Fiscal year:
    2022
  • Award amount:
    $450,000
  • Award type:
    Standard Grant
RI: Small: Towards Abstractive Summarization That Preserves the Original Meaning
  • Award number:
    1909603
  • Fiscal year:
    2019
  • Award amount:
    $450,000
  • Award type:
    Standard Grant
RI: Small: Learning Dynamics and Evolution towards Cognitive Understanding of Videos
  • Award number:
    1813709
  • Fiscal year:
    2018
  • Award amount:
    $450,000
  • Award type:
    Standard Grant
RI: Small: Towards a Formal Theory of Blameworthiness, Intention, and Moral Responsibility
  • Award number:
    1718108
  • Fiscal year:
    2017
  • Award amount:
    $450,000
  • Award type:
    Standard Grant