Robust and Efficient Model-based Reinforcement Learning

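Basic Information

Grant Reference:
EP/X03917X/1
Principal Investigator:
Ilija Bogunovic
Amount:
$507,300
Host Institution:
University College London
Host Institution Country:
United Kingdom
Project Type:
Research Grant
Fiscal Year:
2023
Funding Country:
United Kingdom
Duration:
2023 to (no data)
Project Status:
Ongoing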

Source:
https://gtr.ukri.org/projects?ref=EP%2FX03917X%2F1
Keywords:
Robust Efficient Model based Reinforcement

Project Abstract

Reinforcement learning (RL) is concerned with training data-driven agents to make decisions. In particular, an RL agent interacting with an environment needs to learn an optimal policy, i.e., which actions to take in different states to maximize its cumulative reward. Recently, RL has become one of the most prominent areas of machine learning, since RL methods have tremendous potential for solving complex tasks across many fields (e.g., autonomous driving, nuclear fusion, healthcare, and hardware design). However, a number of challenges still stand in the way of its widespread adoption. Contemporary RL algorithms are often data-intensive and lack robustness guarantees. Established (deep) RL approaches require vast amounts of data, which are readily available in some environments (e.g., video games) but usually not in real-world tasks, where data acquisition is costly. Another major challenge is to deploy the learned control policies in the real world while ensuring reliable, robust, and safe performance. This research aims to provide practical model-based RL algorithms with rigorous statistical and robustness guarantees. This is significant in safety-critical applications where obtaining data is expensive; in nuclear fusion, for example, policies for controlling plasmas are learned using expensive simulators. The key novelty will be to incorporate versatile robustness aspects into model-based RL, allowing it to be applied broadly across applications and domains.
This project focuses on designing algorithms that use powerful non-linear statistical models to learn about the world and that can handle the large state spaces of modern RL tasks. The focus is on obtaining near-optimal policies that are robust to distributional shifts in the environment dynamics and to (adversarial) data corruptions and outliers, and that satisfy application-dependent safety constraints during exploration.
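To make the robustness objective concrete, here is a minimal Python sketch for the simplest possible setting: a small tabular MDP whose dynamics are represented by a hypothetical finite set of candidate transition models, one of which stands in for a distributional shift. It is not the project's algorithm (the abstract targets large state spaces and non-linear statistical models); it is standard worst-case (robust) value iteration, assuming an (s, a)-rectangular uncertainty set, and the names `robust_value_iteration`, `NOMINAL`, and `SHIFTED` are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical tabular encoding (illustration only):
#   rewards[s][a]    -> immediate reward for action a in state s
#   model[s][a][s2]  -> probability of reaching s2 from s under action a
Model = List[List[List[float]]]


def robust_value_iteration(
    rewards: List[List[float]],
    models: List[Model],
    gamma: float = 0.9,
    tol: float = 1e-8,
    max_iters: int = 10_000,
) -> Tuple[List[float], List[int]]:
    """Worst-case value iteration over a finite set of candidate dynamics.

    The pessimistic choice of model is made separately for every (s, a) pair,
    i.e. the uncertainty set is assumed to be (s, a)-rectangular.
    """
    n_states, n_actions = len(rewards), len(rewards[0])
    values = [0.0] * n_states

    def robust_q(s: int, a: int, v: List[float]) -> float:
        # Expected next-state value under the least favourable candidate model.
        worst = min(sum(p * v[s2] for s2, p in enumerate(m[s][a])) for m in models)
        return rewards[s][a] + gamma * worst

    for _ in range(max_iters):
        new_values = [
            max(robust_q(s, a, values) for a in range(n_actions))
            for s in range(n_states)
        ]
        delta = max(abs(x - y) for x, y in zip(new_values, values))
        values = new_values
        if delta < tol:  # the robust Bellman operator is a gamma-contraction
            break

    # Greedy policy with respect to the robust action values.
    policy = [
        max(range(n_actions), key=lambda a: robust_q(s, a, values))
        for s in range(n_states)
    ]
    return values, policy


# Toy example: state 0 offers a "safe" action 0 (reward 1) and a "risky"
# action 1 (reward 2); state 1 is an absorbing state with zero reward.
REWARDS = [[1.0, 2.0], [0.0, 0.0]]
NOMINAL = [[[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]]]
# Shifted dynamics: the risky action now drops the agent into state 1.
SHIFTED = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 1.0]]]

if __name__ == "__main__":
    print(robust_value_iteration(REWARDS, [NOMINAL]))           # prefers the risky action
    print(robust_value_iteration(REWARDS, [NOMINAL, SHIFTED]))  # prefers the safe action
```

Under the nominal model alone the agent prefers the higher-paying risky action (value about 20 in state 0), whereas planning against both models makes it choose the safe action (value about 10), which illustrates the trade-off between nominal performance and robustness to shifted dynamics.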
A major contribution will be novel, rigorous statistical sample-complexity guarantees for the designed algorithms that characterize their convergence to optimal robust and safe policies. These guarantees will be efficient in the sense of being independent of the number of states, and hence applicable to complex applications. This will require designing new robust estimators and confidence intervals for popular statistical models. Moreover, the project will produce a complete testbed with distributional shifts and attack strategies for benchmarking the robustness of both standard and novel robust RL algorithms. By providing practical algorithms, this project will be among the first contributions to achieve both robustness and efficiency in model-based RL (MBRL); these algorithms can be readily applied to emerging, impactful real-world tasks such as robust control of nuclear plasmas (an exciting and promising path toward sustainable energy) and efficient discovery of system-on-chip designs.
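The abstract does not specify which robust estimators will be developed. As a hedged illustration of the general idea of estimating a quantity from data that contains (adversarial) outliers, the classical median-of-means estimator is sketched below in Python; it is a textbook technique chosen for illustration, not the estimator developed in this project, and `median_of_means` is a hypothetical helper name.

```python
import statistics
from typing import Sequence


def median_of_means(samples: Sequence[float], n_blocks: int) -> float:
    """Median-of-means estimate of the mean of `samples`.

    Samples are split into `n_blocks` contiguous blocks of equal size (any
    remainder is dropped), each block is averaged, and the median of the
    block averages is returned. A corrupted sample can only distort the block
    it falls in, so the estimate stays within the range of the uncorrupted
    block averages as long as fewer than half of the blocks are affected.
    """
    if not 1 <= n_blocks <= len(samples):
        raise ValueError("n_blocks must be between 1 and len(samples)")
    block_size = len(samples) // n_blocks
    block_means = [
        statistics.mean(samples[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]
    return statistics.median(block_means)


if __name__ == "__main__":
    # 18 clean reward observations equal to 1.0, plus two adversarial outliers.
    rewards = [1.0] * 18 + [1000.0, 1000.0]
    print(statistics.mean(rewards))               # empirical mean, pulled far from 1.0
    print(median_of_means(rewards, n_blocks=10))  # stays at the clean value
```

With 20 observations split into 10 blocks, the two outliers corrupt only one block, so the median of the block means stays at 1.0 while the plain empirical mean is pulled to 100.9. In practice the samples are often shuffled before blocking so that an adversary cannot choose which blocks its corruptions land in.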

强化学习（RL）涉及训练数据驱动的智能体做出决策。特别是，与环境交互的强化学习代理需要学习最优策略，即在不同状态下采取哪些操作来最大化其奖励。最近，强化学习已成为机器学习最突出的领域之一，因为强化学习方法在解决各个领域的复杂任务（例如自动驾驶、核聚变、医疗保健、硬件设计等）方面具有巨大的潜力。然而，许多挑战仍然阻碍其广泛采用。当代强化学习算法通常是数据密集型的并且缺乏鲁棒性保证。成熟的（深度）强化学习方法需要大量数据，这些数据在某些环境（例如视频游戏）中很容易获得。对于数据获取成本高昂的现实任务来说，情况通常并非如此。另一个主要挑战是在现实世界中使用学习到的控制策略，同时确保可靠、稳健和安全的性能。本研究旨在提供实用的基于模型的强化学习算法，并具有严格的统计和鲁棒性保证。这在获取数据成本昂贵的安全关键型应用中非常重要，例如在核聚变中，控制等离子体的学习策略是通过昂贵的模拟器来执行的。关键的新颖之处是将多功能的鲁棒性方面纳入基于模型的强化学习中，从而使其能够在不同的应用程序和领域中广泛应用。该项目的重点是设计算法，利用强大的非线性统计模型来了解世界，并能够解决现代强化学习任务中存在的大型状态空间。重点是获得近乎最优的策略，这些策略对于环境动态的分布变化、（对抗性）数据损坏/异常值具有鲁棒性，并在探索过程中满足依赖于应用程序的安全约束。一个主要贡献将是为设计的算法提供新颖严格的统计样本复杂性保证，这些算法表征了收敛到最佳鲁棒和安全策略的特征。所获得的保证在独立于状态数量的意义上将是有效的，因此适用于复杂的应用。这将需要为流行的统计模型设计新的稳健估计器和置信区间。此外，该项目将产生一个具有分布变化和攻击策略的完整测试平台，用于对标准和新型鲁棒强化学习算法的鲁棒性进行基准测试。该项目将通过提供实用算法来实现 MBRL 的鲁棒性和效率，从而成为实现 MBRL 鲁棒性和效率的第一个贡献，这些算法可以轻松应用于新兴的有影响力的现实世界任务，例如核等离子体的鲁棒控制（一条令人兴奋且有希望的可持续能源之路）和有效发现片上系统设计。