CIF: SMALL: Theoretical Foundations of Partially Observable Reinforcement Learning: Minimax Sample Complexity and Provably Efficient Algorithms

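Basic Information

Award number:
2315725
Principal investigator:
Song Mei
Amount:
$483,700
Host institution:
University of California-Berkeley
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2023
Funding country:
United States
Start and end dates:
2023-06-15 to 2026-05-31
Project status:
Ongoing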

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2315725&HistoricalAwards=false
Keywords:
CIF SMALL Theoretical Foundations Partially

Project Abstract

Many reinforcement learning (RL) systems operate within environments that provide only partial observations and imperfect information to the agents. Despite notable empirical success, partially observable RL models still present considerable theoretical challenges, potentially posing significant risks to sensitive tasks. This project will design efficient learning algorithms and provide sharp sample complexity analyses for partially observable RL systems. The theoretical tools will build on a broad range of subjects, including machine learning, information theory, control theory, and high-dimensional statistics. The developed results will have an impact on a variety of applications such as robotic control, autonomous driving, and strategic games. The investigator is committed to fostering diversity by actively recruiting and training students, particularly those from underrepresented minorities and women in Science, Technology, Engineering, and Math (STEM).

This project will tackle the theoretical challenges in learning two partially observable RL models: partially observable Markov decision processes (POMDPs) and extensive-form games (EFGs). The main goal is to provide theoretical tools and new insights for developing algorithms and proving sharp statistical complexity bounds. The first component will focus on POMDPs, with the goal of closing the sample complexity gap of learning in the basic tabular setting and addressing the computational challenges by identifying structural conditions that admit planning efficiency. The second component will focus on EFGs, with the goal of designing near-optimal algorithms for three types of regret: external regret, Phi-regret, and dynamic regret. The proposed algorithms and sharp statistical complexity bounds will provide a solid theoretical foundation for future research by RL theorists and practitioners. These algorithms will be coded and tested within the OpenSpiel environment to evaluate their empirical performance.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
