Distributionally Robust Adaptive Control: Enabling Safe and Robust Reinforcement Learning

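Basic Information

Award Number:
2135925
Principal Investigator:
Naira Hovakimyan
Amount:
$375,000
Institution:
University of Illinois at Urbana-Champaign
Institution Country:
United States
Award Type:
Standard Grant
Fiscal Year:
2022
Funding Country:
United States
Period:
2022-07-01 to 2025-06-30
Status:
Ongoing (not yet concluded)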

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2135925&HistoricalAwards=false
Keywords:
Distributionally Robust Adaptive Control Enabling

Project Abstract

Data-driven algorithms can autonomously control complex systems such as autonomous cars and drones. However, the use of such powerful algorithms remains relegated primarily to controlled laboratory environments. The main reason data-driven methods have seen so little adoption in safety-critical systems is the difficulty of establishing safety and predictability guarantees of the kind that well-established control-theoretic methods provide. This award supports fundamental research to identify the best methodologies for consolidating data-driven and control-theoretic tools so that the overall methodology is safe, robust, and high-performing. The new approach lifts control tools to speak the same language as data-driven methods. In doing so, the performance of the data-driven methods is not compromised, and yet the safety guarantees of control-theoretic tools can still be constructed. Safe and predictable autonomous operation of complex systems can bring immense socio-economic benefits through applications in medical robotics, autonomous logistics, transportation, and extraterrestrial exploration, to name a few. This research involves multiple disciplines, including robotics, control theory, statistical learning, and mathematics. Its cross-disciplinary nature will help broaden the participation of underrepresented groups in STEM and will impact engineering education.

To adopt data-driven methods that rely on reinforcement learning (RL) algorithms in safety-critical systems, we need guarantees on safety and robustness. Robust and adaptive control methodologies developed for classical systems with parametric uncertainties cannot be used directly in conjunction with RL, because RL operates on data-driven models for which identifying parametric and deterministic uncertainties is difficult, if not impossible.
This research will construct a new class of robust adaptive controllers that are robust to errors in the learned distributions, thus allowing RL algorithms to interact directly with these controllers without further restrictions. Because robustness holds at the level of distributions, notions of risk-aware safety can be included in a straightforward manner. The research will first aim to construct controllers that track temporally evolving state distributions with uniform bounds. Then, epistemic uncertainties will be introduced, together with a novel adaptive control scheme that quantifiably controls their effect in the space of distributions. The results of this effort will bring the two distinct worlds of data-driven control and classical control together at a natural intersection point, where trajectories of distributions, rather than of sample paths, are considered.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
