CAREER: Robust Reinforcement Learning Under Model Uncertainty: Algorithms and Fundamental Limits

Basic Information

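  • Award number:
    2337375
  • Principal investigator:
  • Amount:
    $520,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Continuing Grant
  • Fiscal year:
    2024
  • Funding country:
    United States
  • Project period:
    2024-09-01 to 2029-08-31
  • Project status:
    Ongoing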

Project Abstract

Existing reinforcement learning (RL) approaches usually assume that a learned policy will be deployed in the same environment in which it was trained. This assumption is often violated in practice due to, e.g., adversarial perturbations, modeling error between simulators and real-world applications, non-stationary environments, and a limited amount of training data. The discrepancy between the training and test environments gives rise to a model mismatch, which leads to a notable decline in performance and restricts the suitability of RL in crucial domains, e.g., healthcare, critical infrastructure, transportation systems, and smart cities. To address this challenge, there have been noteworthy efforts to develop distributionally robust RL approaches. This CAREER project aims to advance the fundamental algorithmic and theoretical limits of distributionally robust RL. The research outcome of this project holds the promise to push the algorithmic and theoretical boundaries of robust RL, and will deliver provably convergent, efficient, and minimax optimal robust RL algorithms. The project will have a significant impact on the theory and practice of sequential decision making in various domains, e.g., special education, intelligent transportation systems, wireless communication networks, power systems, and drone networks. The activities in this project will provide concrete principles and design guidelines to achieve robustness in the face of model uncertainty.
The integration of research work into education and outreach will target K-12 educators and graduate, undergraduate, and underrepresented students, with efforts on (i) an Artificial Intelligence (AI) summer camp for K-12 educators; (ii) a Buffalo Day workshop; (iii) curriculum development; and (iv) student supervision.
The research efforts are organized around three complementary thrusts: (i) Thrust A focuses on developing theoretical and algorithmic foundations for distributionally robust RL under the long-term average-reward criterion; (ii) Thrust B focuses on developing a unified framework of distributional robustness for learning (robust) policies from offline datasets without active data acquisition and exploration, and further uncovering their fundamental limits; (iii) Thrust C focuses on constructive approaches and fundamental limits of robust RL under constraints, i.e., optimizing reward while simultaneously guaranteeing constraints under model uncertainty. This project will develop a fundamental understanding of robust RL, minimax optimal robust RL algorithms, and novel technical convergence and complexity analyses. The research outcome will significantly improve the robustness of RL algorithms and will be of interest to a broad range of communities, e.g., machine learning, statistics, information theory, networking, communication, power, and education. The proposed work will also foster new interdisciplinary research directions across these research communities.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Shaofeng Zou

Asymptotic optimality of D-CuSum for quickest change detection under transient dynamics
Linear Complexity Exponentially Consistent Tests for Outlying Sequence Detection
  • DOI:
  • Year published:
    2017
  • Journal:
  • Impact factor:
    0
  • Authors:
    Yuheng Bu; Shaofeng Zou; V. Veeravalli
  • Corresponding author:
    V. Veeravalli
Layered decoding and secrecy over degraded broadcast channels
K-user degraded broadcast channel with secrecy outside a bounded range
  • DOI:
  • Year published:
    2016
  • Journal:
  • Impact factor:
    0
  • Authors:
    Shaofeng Zou; Yingbin Liang; L. Lai; H. Poor; S. Shamai
  • Corresponding author:
    S. Shamai
Sequential (Quickest) Change Detection: Classical Results and New Directions

Other grants held by Shaofeng Zou

CCSS: Collaborative Research: Quickest Threat Detection in Adversarial Sensor Networks
  • Award number:
    2112693
  • Fiscal year:
    2021
  • Amount:
    $520,000
  • Project type:
    Standard Grant
Collaborative Research: CIF: Medium: Emerging Directions in Robust Learning and Inference
  • Award number:
    2106560
  • Fiscal year:
    2021
  • Amount:
    $520,000
  • Project type:
    Continuing Grant
CIF: Small: Reinforcement Learning with Function Approximation: Convergent Algorithms and Finite-sample Analysis
  • Award number:
    2007783
  • Fiscal year:
    2020
  • Amount:
    $520,000
  • Project type:
    Standard Grant
CRII: CIF: Dynamic Network Event Detection with Time-Series Data
  • Award number:
    1948165
  • Fiscal year:
    2020
  • Amount:
    $520,000
  • Project type:
    Standard Grant

Similar grants from the National Natural Science Foundation of China (NSFC)

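Molecular mechanism by which symbiotic bacteria of Amphidinium carterae degrade phosphonates to produce an algal growth-promoting effect
  • Award number:
    42306167
  • Year approved:
    2023
  • Amount:
    300,000 CNY
  • Project type:
    Young Scientists Fund
Research on a new method for covert underwater active detection based on composite coded pulse trains
  • Award number:
    61271414
  • Year approved:
    2012
  • Amount:
    600,000 CNY
  • Project type:
    General Program
Research on semidefinite relaxation and nonconvex quadratically constrained quadratic programming
  • Award number:
    11271243
  • Year approved:
    2012
  • Amount:
    600,000 CNY
  • Project type:
    General Program
Analysis and design of efficient and robust message authentication codes
  • Award number:
    61202422
  • Year approved:
    2012
  • Amount:
    230,000 CNY
  • Project type:
    Young Scientists Fund
Research on several problems in network revenue management for civil aviation passenger transport
  • Award number:
    60776817
  • Year approved:
    2007
  • Amount:
    200,000 CNY
  • Project type:
    Joint Fund Program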

Similar overseas grants

Exploring Causality in Reinforcement Learning for Robust Decision Making
  • Award number:
    EP/Y003187/1
  • Fiscal year:
    2023
  • Amount:
    $520,000
  • Project type:
    Research Grant
CPS: Medium: Collaborative Research: Provably Safe and Robust Multi-Agent Reinforcement Learning with Applications in Urban Air Mobility
  • Award number:
    2312094
  • Fiscal year:
    2023
  • Amount:
    $520,000
  • Project type:
    Standard Grant
Robust and Efficient Model-based Reinforcement Learning
  • Award number:
    EP/X03917X/1
  • Fiscal year:
    2023
  • Amount:
    $520,000
  • Project type:
    Research Grant
CPS: Medium: Collaborative Research: Provably Safe and Robust Multi-Agent Reinforcement Learning with Applications in Urban Air Mobility
  • Award number:
    2312092
  • Fiscal year:
    2023
  • Amount:
    $520,000
  • Project type:
    Standard Grant
CIF: Small: Adversarially Robust Reinforcement Learning: Attack, Defense, and Analysis
  • Award number:
    2232907
  • Fiscal year:
    2023
  • Amount:
    $520,000
  • Project type:
    Standard Grant