IIS:RI Theoretical Foundations of Reinforcement Learning: From Tabula Rasa to Function Approximation

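Basic Information

Award number:
2110170
Principal investigator:
Simon Du
Amount:
$500,000
Institution:
University of Washington
Institution country:
United States
Award type:
Standard Grant
Fiscal year:
2021
Funding country:
United States
Period:
2021-10-01 to 2024-09-30
Project status:
Completed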

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2110170&HistoricalAwards=false
Keywords:
IIS RI Theoretical Foundations Reinforcement

Abstract

Reinforcement learning, a technique that trains intelligent agents to make decisions, has become the central algorithmic paradigm for various applications, such as robotics, healthcare, manufacturing production, game playing, and transportation. However, reinforcement learning is equally infamous for demanding significant amounts of data and computing resources. This project aims to contribute to the fundamental understanding of reinforcement learning to reveal its inherent difficulties and develop efficient algorithms with strong theoretical guarantees. The results of the project are readily applicable to solving practical resource-hungry problems. The success of this project also requires new algorithmic techniques and mathematical tools in a variety of disciplines. An education plan is integrated into this project; the investigator will develop new courses, mentor students, organize workshops, and deliver lessons to high school students through the University of Washington’s Partner School program.

This project has two major components. The first thrust studies the most canonical setting, tabula rasa reinforcement learning. The investigator will identify fundamental limits and develop optimal algorithms for several problems of both theoretical and practical interests: worst-case complexity, adaptation to problem structure, and data collection for batch RL. The second thrust is motivated by the modern usage of RL, where function approximation is employed for generalization over a large state space. The investigator will systematically examine the necessary and sufficient conditions that permit efficient learning algorithms for three of the most popular function approximation schemes: value-based, policy-based, and model-based. For both thrusts, the investigator will utilize the inherent combinatorial structures of reinforcement learning to characterize its fundamental hardness and design efficient algorithms. In addition to theoretical developments, the project also aims to implement all algorithms developed as open-source software and evaluate them on benchmark simulation environments.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
