CAREER: Stochasticity and Resilience in Reinforcement Learning: From Single to Multiple Agents

Basic Information

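Award number:
2339794
Principal investigator:
Qiaomin Xie
Amount:
$532,900
Host institution:
University of Wisconsin-Madison
Host institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2024
Funding country:
United States
Start and end dates:
2024-03-01 to 2029-02-28
Project status:
Ongoing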

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2339794&HistoricalAwards=false
Keywords:
CAREER; Stochasticity; Resilience; Reinforcement Learning

Project Abstract

Reinforcement Learning (RL) has emerged as a promising data-driven paradigm for learning to control unknown and complex systems. It has achieved impressive success in simulated environments such as games. However, for applications in real-world engineering systems, existing RL algorithms and theory fall short of addressing three fundamental challenges: high stochasticity, long-horizon regimes, and vulnerability to model uncertainty. These challenges are exacerbated in systems with multiple strategic agents. The goal of this CAREER project is to advance the algorithmic and theoretical foundations of RL by addressing these challenges, and to enable efficient and resilient RL-based control in engineering systems. The project will focus in particular on applications in computer and communication networks, which will guide the problem formulation, methodology development, and evaluation. The project is enhanced by an education plan that aims to offer students from K–12 to college a pathway to gain experience and training in RL and in machine learning more broadly, as well as in their applications to engineering systems. The project will also support a mentoring program for students from underrepresented groups in STEM.

The research in this project will address the aforementioned challenges through three technical thrusts. Thrust 1 studies the finite-time convergence of various iterative algorithms that arise in RL through a unified variational inequality framework, leveraging tools from modern Markov chain theory. Thrust 2 will develop techniques to tame the high stochasticity in long-horizon problems, and will further develop RL algorithms that provably learn a stable and near-optimal policy. Thrust 3 studies scalable multi-agent RL through the frameworks of mean-field games and graphon games, as well as the game-theoretic foundations of robust Markov games under model uncertainty. The developed RL algorithms will be implemented and evaluated on a broad range of decision-making problems in computer and communication networks.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.