CAREER: Reinforcement Learning for Recursive Markov Decision Processes and Beyond
Basic Information
- Award Number: 2146563
- Principal Investigator: Ashutosh Trivedi
- Amount: $596,600
- Host Institution:
- Host Institution Country: United States
- Project Type: Continuing Grant
- Fiscal Year: 2022
- Funding Country: United States
- Project Period: 2022-05-01 to 2027-04-30
- Project Status: Ongoing
- Source:
- Keywords:
Project Abstract
This award is funded in whole or in part under the American Rescue Plan Act of 2021 (Public Law 117-2).

Reinforcement Learning (RL) is a sampling-based approach to the optimization of Markov decision processes (MDPs), in which agents rely on rewards to discover optimal solutions. When combined with powerful approximation schemes (e.g., deep neural networks), RL has been effective in highly complex tasks traditionally considered beyond the reach of Artificial Intelligence. However, its sensitivity to the approximation parameters makes RL difficult to use (significant Machine Learning expertise is demanded of the programmer) and difficult to trust (manual approximations can invalidate guarantees). The vision of this project is to democratize RL by developing principled methodologies and powerful tools to improve the usability and trustworthiness of RL-based programming at scale. These research objectives are complemented by efforts to integrate the foundations of RL-based computability into CS education and to explore the role of RL-based programming in CS education.

Approximation in RL is needed because RL algorithms with guaranteed convergence work on finite MDPs, yet scale poorly. Approximation affects both usability and trustworthiness. This proposal identifies two goals addressing both concerns: 1) to discover convergent RL beyond finite MDPs, and 2) to develop abstraction-based approaches for RL with rigorous optimization guarantees. The success of the proposed approaches will be evaluated by their ability to handle systems at scale. The algorithms and datasets will be disseminated as open-source software. The proposed research makes fundamental contributions to three disciplines: formal methods, machine learning, and control theory; at the same time, it takes fundamental, concrete steps toward broadening participation in computing by making RL-based programming easier and more inclusive.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
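For readers unfamiliar with the finite-MDP setting the abstract contrasts against, the sketch below illustrates tabular Q-learning, the classic RL algorithm whose convergence guarantees hold for finite state and action spaces. The toy chain MDP, its reward structure, and all parameter values are hypothetical illustrations, not artifacts of this project.

```python
import random
from collections import defaultdict

# A minimal tabular Q-learning sketch on a hypothetical finite MDP:
# a chain of states 0..4 with actions "left"/"right" and reward 1.0
# on reaching the terminal state 4.

N_STATES = 5
ACTIONS = ("left", "right")

def step(state, action):
    """Deterministic toy transition: move along the chain."""
    nxt = state + 1 if action == "right" else max(state - 1, 0)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    done = nxt == N_STATES - 1
    return nxt, reward, done

def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    q = defaultdict(float)  # (state, action) -> value estimate
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy exploration over the finite action set.
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            # Tabular Q-learning update; converges on finite MDPs
            # under standard step-size and exploration conditions.
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

if __name__ == "__main__":
    q = q_learning()
    policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)}
    print(policy)  # expected: "right" in every non-terminal state
```

On this chain the learned greedy policy should select "right" everywhere; the project's stated goals concern extending such convergence guarantees beyond this finite tabular setting (e.g., to recursive MDPs) and to abstraction-based methods.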
Project Outcomes
Journal Articles (8)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)
Reinforcement Learning with Guarantees that Hold for Ever
- DOI:
- Publication Date: 2022-09
- Journal:
- Impact Factor: 0
- Authors: Hahn, Ernst Moritz;Perez, Mateo;Schewe, Sven;Somenzi, Fabio;Trivedi, Ashutosh;Wojtczak, Dominik
- Corresponding Author: Wojtczak, Dominik
The Octatope Abstract Domain for Verification of Neural Networks.
- DOI:
- Publication Date: 2023-03
- Journal:
- Impact Factor: 0
- Authors: Bak, S.;Dohmen, T.;Subramani, K.;Trivedi, A.;Velasquez, A.;Wojciechowski, P.
- Corresponding Author: Wojciechowski, P.
Recursive Reinforcement Learning
- DOI:
- Publication Date: 2022-11
- Journal:
- Impact Factor: 0
- Authors: Hahn, Ernst Moritz;Perez, Mateo;Schewe, Sven;Somenzi, Fabio;Trivedi, Ashutosh;Wojtczak, Dominik
- Corresponding Author: Wojtczak, Dominik
Optimal Repair for Omega-Regular Properties
- DOI:
- Publication Date: 2022-10
- Journal:
- Impact Factor: 0
- Authors: Dave, V.;Krishna, S.;Murali, V.;Trivedi, A.
- Corresponding Author: Trivedi, A.
An Impossibility Result in Automata-Theoretic Reinforcement Learning.
- DOI:
- Publication Date: 2022-10
- Journal:
- Impact Factor: 0
- Authors: Hahn, Ernst Moritz;Perez, Mateo;Schewe, Sven;Somenzi, Fabio;Trivedi, Ashutosh;Wojtczak, Dominik
- Corresponding Author: Wojtczak, Dominik
Other Publications by Ashutosh Trivedi
Quantitative estimation of side-channel leaks with neural networks
- DOI: 10.1007/s10009-021-00622-2
- Publication Date: 2021-05-26
- Journal:
- Impact Factor: 1.5
- Authors: Saeid Tizpaz;Pavol Cerný;S. Sankaranarayanan;Ashutosh Trivedi
- Corresponding Author: Ashutosh Trivedi
Metamorphic Testing and Debugging of Tax Preparation Software
- DOI: 10.1109/icse-seis58686.2023.00019
- Publication Date: 2022-05-10
- Journal:
- Impact Factor: 0
- Authors: Saeid Tizpaz;Morgan Wagner;Shiva Darian;Krystia Reed;Ashutosh Trivedi
- Corresponding Author: Ashutosh Trivedi
Co-Buchi Barrier Certificates for Discrete-time Dynamical Systems
- DOI: 10.48550/arxiv.2311.07695
- Publication Date: 2023-11-13
- Journal:
- Impact Factor: 0
- Authors: Vishnu Murali;Ashutosh Trivedi;Majid Zamani
- Corresponding Author: Majid Zamani
Stochastic Timed Games Revisited
- DOI:
- Publication Date: 2016
- Journal:
- Impact Factor: 0
- Authors: S. Akshay;P. Bouyer;S. Krishna;L. Manasa;Ashutosh Trivedi
- Corresponding Author: Ashutosh Trivedi
Alternating Good-for-MDP Automata
- DOI:
- Publication Date: 2022
- Journal:
- Impact Factor: 0
- Authors: E. M. Hahn;Mateo Perez;S. Schewe;F. Somenzi;Ashutosh Trivedi;D. Wojtczak
- Corresponding Author: D. Wojtczak
Other Grants by Ashutosh Trivedi
Collaborative Research: DASS: Assessing Accountability of Tax Preparation Software Systems
- Award Number: 2317207
- Fiscal Year: 2023
- Funding Amount: $596,600
- Project Type: Standard Grant
SHF: Small: Omega-Regular Objectives for Model-Free Reinforcement Learning
- Award Number: 2009022
- Fiscal Year: 2020
- Funding Amount: $596,600
- Project Type: Standard Grant
Similar NSFC Grants
Structural synergistic regulation mechanism of proton/charge transport in charge-enhanced membranes for aqueous sulfur-based flow batteries
- Award Number: 22378319
- Year Approved: 2023
- Funding Amount: CNY 500,000
- Project Type: General Program
Applications of dissipation enhancement theory in nonlinear systems and random sampling
- Award Number: 12301283
- Year Approved: 2023
- Funding Amount: CNY 300,000
- Project Type: Young Scientists Fund
Theoretical and experimental study of nonlinear dynamics in internal-resonance-enhanced parametrically excited energy harvesting
- Award Number: 12302010
- Year Approved: 2023
- Funding Amount: CNY 300,000
- Project Type: Young Scientists Fund
Mechanistic study of the immune-enhancing effects of heterologous COVID-19 booster vaccination
- Award Number:
- Year Approved: 2022
- Funding Amount: CNY 300,000
- Project Type: Young Scientists Fund
The impact of strengthened intellectual property protection on cross-border technology transfer in an open economy
- Award Number: 72273028
- Year Approved: 2022
- Funding Amount: CNY 450,000
- Project Type: General Program
Similar Overseas Grants
CAREER: Intelligent Battery Management with Safe, Efficient, Fast-Adaption Reinforcement Learning and Physics-Inspired Machine Learning: From Cells to Packs
- Award Number: 2340194
- Fiscal Year: 2024
- Funding Amount: $596,600
- Project Type: Continuing Grant
CAREER: Dual Reinforcement Learning: A Unifying Framework with Guarantees
- Award Number: 2340651
- Fiscal Year: 2024
- Funding Amount: $596,600
- Project Type: Continuing Grant
CAREER: Robust Reinforcement Learning Under Model Uncertainty: Algorithms and Fundamental Limits
- Award Number: 2337375
- Fiscal Year: 2024
- Funding Amount: $596,600
- Project Type: Continuing Grant
CAREER: Structure Exploiting Multi-Agent Reinforcement Learning for Large Scale Networked Systems: Locality and Beyond
- Award Number: 2339112
- Fiscal Year: 2024
- Funding Amount: $596,600
- Project Type: Continuing Grant
CAREER: Towards Real-world Reinforcement Learning
- Award Number: 2339395
- Fiscal Year: 2024
- Funding Amount: $596,600
- Project Type: Continuing Grant