An Abstraction-based Technique for Safe Reinforcement Learning
Basic Information
- Grant number: EP/X015823/1
- Principal investigator:
- Amount: $384,900
- Host institution:
- Host institution country: United Kingdom
- Project type: Research Grant
- Fiscal year: 2023
- Funding country: United Kingdom
- Duration: 2023 to (no data)
- Status: ongoing (not concluded)
- Source:
- Keywords:
Project Summary
Autonomous agents learning to act in unknown environments have attracted research interest both for their wider implications for AI and for their applications in complex domains, including robotics, network optimisation, and resource allocation. Currently, one of the most successful approaches is reinforcement learning (RL). However, to learn how to act, agents must explore the environment, which in safety-critical scenarios means they might take dangerous actions, possibly harming themselves or even putting human lives at risk. Consequently, reinforcement learning is still rarely used in real-world applications, where multiple safety-critical constraints need to be satisfied simultaneously.

To alleviate this problem, RL algorithms are being combined with formal verification techniques to ensure safety during learning. Indeed, formal methods are nowadays routinely applied to the specification, design, and verification of complex systems, as they make it possible to obtain proof-like certification of correct and safe behaviour that is intelligible to system engineers and human users alike. These desirable features have motivated the adoption of formal methods for the verification of general AI systems, an endeavour variously called safe, verifiable, or trustworthy AI [1]. Still, applying formal methods to AI systems raises significant new challenges, including the "black-box" nature of most machine learning algorithms in use today. Specific to the application of formal methods to RL, we identify two main shortcomings of current approaches, which will be tackled in this project:

- Most current verification methodologies do not scale well as the complexity of the application increases. This state-explosion problem is particularly acute in RL scenarios, where agents may have to choose among a huge number of action/state transitions (e.g., autonomous cars).
- Systems with multiple learning agents are comparatively less explored, and therefore less understood, than single-agent settings, partly because of the high dimensionality of their state space and their non-stationarity. Yet multi-agent settings are key for applications such as platooning for autonomous vehicles and robot swarms.

To tackle both problems, we put forward an abstraction-based approach to verification, which is meant to reduce the state space, also by leveraging symmetries of the system, while preserving all of its safety-related features, thus leading to guaranteed and scalable safe behaviour. The research envisaged in this project is timely and fits the current portfolio of EPSRC-funded research, as it aligns with the theme of AI and robotics, in particular the key strategic investment in trustworthy autonomous systems. The present proposal aims to develop a verifiably safe RL methodology, which is meant to have a positive societal impact on the trust of the general public in deployed AI solutions, and to facilitate their adoption within society at large.
Project Outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other Publications by Francesco Belardinelli
On the Stability of Learning in Network Games with Many Players
- DOI: 10.48550/arxiv.2403.15848
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: A. Hussain; D.G. Leonte; Francesco Belardinelli; G. Piliouras
- Corresponding author: G. Piliouras
The Reasons that Agents Act: Intention and Instrumental Goals
- DOI: 10.48550/arxiv.2402.07221
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Francis Rhys Ward; Matt MacDermott; Francesco Belardinelli; Francesca Toni; Tom Everitt
- Corresponding author: Tom Everitt
Stability of Multi-Agent Learning in Competitive Networks: Delaying the Onset of Chaos
- DOI: 10.48550/arxiv.2312.11943
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: A. Hussain; Francesco Belardinelli
- Corresponding author: Francesco Belardinelli
Other Grants of Francesco Belardinelli
Strategy Logics for the Verification of Security Protocols
- Grant number: EP/V009214/1
- Fiscal year: 2021
- Funding amount: $384,900
- Project type: Research Grant
The Third International Workshop on Formal Methods in Artificial Intelligence
- Grant number: EP/V008013/1
- Fiscal year: 2021
- Funding amount: $384,900
- Project type: Research Grant
Similar NSFC Grants

Research on binary software vulnerability discovery techniques based on program analysis and testing
- Grant number: 61702540
- Approval year: 2017
- Funding amount: ¥270,000
- Project type: Young Scientists Fund

Research on key techniques of network-abstraction-based SDN programming methods
- Grant number: 61602264
- Approval year: 2016
- Funding amount: ¥200,000
- Project type: Young Scientists Fund

Research on fast security decision methods for access control policies based on predicate abstraction
- Grant number: 61300228
- Approval year: 2013
- Funding amount: ¥230,000
- Project type: Young Scientists Fund

Research on construction techniques for typical network security incident scenarios based on collaborative event and environment sensing
- Grant number: 61370215
- Approval year: 2013
- Funding amount: ¥750,000
- Project type: General Program

Research on "information diagram construction" strategies based on diagrammatic abstraction mechanisms in innovative architectural design methods
- Grant number: 51308393
- Approval year: 2013
- Funding amount: ¥220,000
- Project type: Young Scientists Fund
Similar Overseas Grants

A novel damage characterization technique based on adaptive deconvolution extraction algorithm of multivariate AE signals for accurate diagnosis of osteoarthritic knees
- Grant number: 24K07389
- Fiscal year: 2024
- Funding amount: $384,900
- Project type: Grant-in-Aid for Scientific Research (C)

Testing a Memory-Based Hypothesis for Anhedonia
- Grant number: 10598974
- Fiscal year: 2023
- Funding amount: $384,900
- Project type:

Innovating anti-tuberculosis drug susceptibility testing with a novel and rapid non-culture based phenotypic test using MPT64 biomarker
- Grant number: 10663034
- Fiscal year: 2023
- Funding amount: $384,900
- Project type:

Vector Flow Velocity Imaging of Human Placenta using Angle-resolved Ultrasound and Deep Learning
- Grant number: 10886180
- Fiscal year: 2023
- Funding amount: $384,900
- Project type:

Harmony AI: State of the Art Natural Language Processing for Genetic Engineering
- Grant number: 10698805
- Fiscal year: 2023
- Funding amount: $384,900
- Project type: