Collaborative Research: CIF: Medium: MoDL: Toward a Mathematical Foundation of Deep Reinforcement Learning
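Basic Information
- Award number: 2212261
- Principal investigator:
- Amount: $600,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-10-01 to 2026-09-30
- Project status: Ongoing
- Source:
- Keywords:
Project Abstract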
Deep Reinforcement Learning (DRL), which uses neural networks to solve sequential decision-making problems, has made breakthroughs in real-world applications, such as robotics, gaming, healthcare, and transportation systems. However, current theoretical work on reinforcement learning is restricted to problems with a small number of states; as these results do not cover neural networks, they cannot be used to satisfactorily explain the empirical successes of DRL. This project seeks to bridge this gap by building a mathematical foundation for DRL that leverages ideas from approximation theory, control theory, and optimization theory. This will allow the computational and statistical complexity of DRL to be systematically characterized, and will help with designing more efficient and reliable empirical methods. Education and outreach plans are integrated into this project. Specifically, the investigators will mentor graduate and undergraduate students (some through the STARS program for underrepresented groups at the University of Washington), develop new courses and monographs, organize research workshops, and develop course materials for a high school data science and artificial intelligence curriculum.
This project has three major components. The first thrust identifies which types of guarantees are achievable by policies for different reinforcement learning problem instances. Concretely, this requires investigating how increasingly structured problem instances enable stronger guarantees for policies; this will be done by using, and further developing, tools from non-convex optimization to describe policies that achieve stationary points, local maxima, and global maxima of the reward function. The second thrust takes the perspective of approximation theory and capacity control to investigate how the neural network complexity can be gradually increased to eventually find the most complex sub-family of neural networks that permit sample-efficient algorithms. The third thrust builds upon the knowledge gained in the first two thrusts, and is devoted to the design of computationally efficient algorithms; this will be done by leveraging tools from optimization theory and by making connections with control theory.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
Journal articles (6)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Near-Optimal Randomized Exploration for Tabular Markov Decision Processes
- DOI:
- Published: 2021-02
- Journal:
- Impact factor: 0
- Authors: Zhihan Xiong; Ruoqi Shen; Qiwen Cui; Maryam Fazel; S. Du
- Corresponding authors: Zhihan Xiong; Ruoqi Shen; Qiwen Cui; Maryam Fazel; S. Du
Learning in Congestion Games with Bandit Feedback
- DOI: 10.48550/arxiv.2206.01880
- Published: 2022-06
- Journal:
- Impact factor: 0
- Authors: Qiwen Cui; Zhihan Xiong; Maryam Fazel; S. Du
- Corresponding authors: Qiwen Cui; Zhihan Xiong; Maryam Fazel; S. Du
On Controller Reduction in Linear Quadratic Gaussian Control with Performance Bounds
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Ren, Zhaolin; Zheng, Yang; Fazel, Maryam; Li, Na
- Corresponding author: Li, Na
Toward a Theoretical Foundation of Policy Optimization for Learning Control Policies
- DOI: 10.1146/annurev-control-042920-020021
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Hu, Bin; Zhang, Kaiqing; Li, Na; Mesbahi, Mehran; Fazel, Maryam; Başar, Tamer
- Corresponding author: Başar, Tamer
Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies
- DOI: 10.48550/arxiv.2210.01400
- Published: 2022-10
- Journal:
- Impact factor: 0
- Authors: Rui Yuan; S. Du; Robert Mansel Gower; A. Lazaric; Lin Xiao
- Corresponding authors: Rui Yuan; S. Du; Robert Mansel Gower; A. Lazaric; Lin Xiao
Other publications by Simon Du
Decoding-Time Language Model Alignment with Multiple Objectives
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Ruizhe Shi; Yifang Chen; Yushi Hu; Alisa Liu; Hanna Hajishirzi; Noah A. Smith; Simon Du
- Corresponding author: Simon Du
Sample-Complexity of Estimating Convolutional and Recurrent Neural Networks: How Many Samples are Needed to Estimate a Convolutional or Recurrent Neural Network?
- DOI:
- Published: 2019
- Journal:
- Impact factor: 0
- Authors: Simon Du
- Corresponding author: Simon Du
Other grants held by Simon Du
CAREER: Toward a Foundation of Over-Parameterization
- Award number: 2143493
- Fiscal year: 2022
- Funding amount: $600,000
- Project type: Continuing Grant
Collaborative Research: SCALE MoDL: Adaptivity of Deep Neural Networks
- Award number: 2134106
- Fiscal year: 2021
- Funding amount: $600,000
- Project type: Continuing Grant
IIS:RI Theoretical Foundations of Reinforcement Learning: From Tabula Rasa to Function Approximation
- Award number: 2110170
- Fiscal year: 2021
- Funding amount: $600,000
- Project type: Standard Grant
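Similar NSFC (National Natural Science Foundation of China) grants
Research on highly integrated microwave/millimeter-wave antennas supporting two-dimensional millimeter-wave beam scanning
- Award number: 62371263
- Approval year: 2023
- Funding amount: CNY 520,000
- Project type: General Program
Study of tandem Heck/denitrogenative rearrangement reactions of hydrazones
- Award number: 22301211
- Approval year: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund
Study of synergistic performance regulation and dendrite suppression mechanisms in aqueous zinc-ion batteries
- Award number: 52364038
- Approval year: 2023
- Funding amount: CNY 330,000
- Project type: Regional Science Fund
Pathogenic role and mechanism of TSPYL1 mutations in sudden infant death syndrome, studied with a human serotonergic neuron reporter system
- Award number: 82371176
- Approval year: 2023
- Funding amount: CNY 490,000
- Project type: General Program
Mechanistic study of FOXO3 m6A methylation-induced trophoblast senescence in the treatment of spontaneous abortion by the kidney-tonifying method
- Award number: 82305286
- Approval year: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund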
Similar overseas grants
Collaborative Research: CIF: Medium: Snapshot Computational Imaging with Metaoptics
- Award number: 2403122
- Fiscal year: 2024
- Funding amount: $600,000
- Project type: Standard Grant
Collaborative Research: CIF-Medium: Privacy-preserving Machine Learning on Graphs
- Award number: 2402815
- Fiscal year: 2024
- Funding amount: $600,000
- Project type: Standard Grant
Collaborative Research: CIF: Small: Mathematical and Algorithmic Foundations of Multi-Task Learning
- Award number: 2343599
- Fiscal year: 2024
- Funding amount: $600,000
- Project type: Standard Grant
Collaborative Research: CIF: Small: Mathematical and Algorithmic Foundations of Multi-Task Learning
- Award number: 2343600
- Fiscal year: 2024
- Funding amount: $600,000
- Project type: Standard Grant
Collaborative Research: CIF: Small: Acoustic-Optic Vision - Combining Ultrasonic Sonars with Visible Sensors for Robust Machine Perception
- Award number: 2326905
- Fiscal year: 2024
- Funding amount: $600,000
- Project type: Standard Grant