Exploring Causality in Reinforcement Learning for Robust Decision Making

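Basic information

  • Grant number:
    EP/Y003187/1
  • Principal investigator:
  • Amount:
    $209,700
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project type:
    Research Grant
  • Fiscal year:
    2023
  • Funding country:
    United Kingdom
  • Period:
    2023 to (no data)
  • Project status:
    Not yet completed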

Project abstract

Reinforcement learning (RL) has developed significantly in recent years and has demonstrated impressive capabilities in decision-making tasks such as games (AlphaStar, OpenAI Five), chatbots (ChatGPT), and recommendation systems (Microsoft). RL techniques can also be applied in many other fields, including transportation, network communications, autonomous driving, sequential treatment in healthcare, robotics, and control. Unlike traditional supervised learning, RL focuses on making a sequence of decisions to achieve a long-term goal, which makes it particularly well suited to complex problems. However, challenges must be addressed before RL is practical for real-world applications, where changing factors such as traffic regulations, weather, and clouds cannot be fully accounted for when training the agent. To enable RL algorithms to be deployed across a range of real applications, we need to evaluate and improve the robustness of RL in the face of complex real-world changes and task shifts.

In this project, we aim to develop robust and generalisable reinforcement learning techniques from a causal modelling perspective. The first thrust focuses on using causal model learning to create compact and robust representations of tasks. Such a representation can greatly benefit the overall performance of the RL agent by reducing the complexity of the problem and making the agent's decision-making process more efficient. As a result, the agent can learn faster and generalise better to unseen tasks, which is especially important in real-world scenarios where data is scarce and task complexity varies greatly.

The second research thrust focuses on developing efficient and generalisable algorithms for task assignment transfer. These can enable the RL agent to adapt to new tasks more quickly and effectively and to generalise learned knowledge to different but related tasks. This is crucial in real-world scenarios where the agent must operate in different environments or where task requirements change over time.

One example of an application that would benefit from these contributions is autonomous driving in an industrial setting. While RL agents are usually trained in simulators, they may not perform well in real-world road scenarios and can be easily distracted by task-irrelevant information. For example, the visual images that autonomous cars observe contain predominantly task-irrelevant information, such as cloud shapes and architectural details, which should not influence driving decisions.

In this project, we aim to enable the agent to learn a compact and robust representation of the task, so that it retains only task-relevant state information, adapts safely to changing driving scenarios, and generalises its knowledge to related tasks, such as adapting to the different driving rules in the United States (driving on the right).

A causal understanding can help identify the minimal sufficient representations that are essential for policy learning and transfer, and can achieve safe and controllable exploration by leveraging causal structures and counterfactual reasoning. It can mitigate issues that affect most existing RL approaches, such as being data-hungry and lacking interpretability and generalisability. The outcome of this project can greatly improve the scalability and adaptability of RL agents, making them more suitable for real-world applications.

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Yali Du

Differential Expression of Endocrine Regulatory Genes in Apis cerana and Apis mellifera (Hymenoptera: Apidae) at High Temperature
  • DOI:
    10.18474/jes21-78
  • Published:
    2022-06-22
  • Journal:
  • Impact factor:
    0.9
  • Authors:
    Xinyu Li;Weihua Ma;Yali Du;Kai Xu;Yusuo Jiang
  • Corresponding author:
    Yusuo Jiang
Low-Temperature NH3 Selective Catalytic Reduction Performance Enhancement of Fe-Based Oxides by Employing Carbon Nanotubes to Decorate the MgFe-LDH
  • DOI:
    10.1002/slct.202203767
  • Published:
    2023-02-20
  • Journal:
  • Impact factor:
    2.1
  • Authors:
    Yali Du;Xianfeng Wu;L. Liu;Xiaodong Li;Lifei Liu;Xu Wu
  • Corresponding author:
    Xu Wu
Contextual Transformer for Offline Meta Reinforcement Learning
  • DOI:
    10.48550/arxiv.2211.08016
  • Published:
    2022-11-15
  • Journal:
  • Impact factor:
    0
  • Authors:
    Runji Lin;Ye Li;Xidong Feng;Zhaowei Zhang;Xian Hong Wu Fung;Haifeng Zhang;Jun Wang;Yali Du;Yaodong Yang
  • Corresponding author:
    Yaodong Yang
Safe Multi-agent Reinforcement Learning with Natural Language Constraints
  • DOI:
    10.48550/arxiv.2405.20018
  • Published:
    2024-05-30
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ziyan Wang;Meng Fang;Tristan Tomilin;Fei Fang;Yali Du
  • Corresponding author:
    Yali Du
Neurological manifestations and risk factors associated with poor prognosis in hospitalized children with Omicron variant infection.
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    3.6
  • Authors:
    Li Tang;Yuxin Guo;C. Shu;Xiaokang Peng;Sikai Qiu;Ruina Li;Pan Liu;Huijing Wei;Shan Liao;Yali Du;Dandan Guo;Ning Gao;Qing;Xiaoguai Liu;Fanpu Ji
  • Corresponding author:
    Fanpu Ji

Similar NSFC (National Natural Science Foundation of China) grants

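Research on federated causal inference algorithms for privacy-protected data
  • Grant number:
    62376087
  • Approval year:
    2023
  • Amount:
    CNY 510,000
  • Project type:
    General Program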
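Causal relationship between mating-system transition and species divergence in Halenia elliptica var. grandiflora (大花花锚) and Halenia elliptica (卵萼花锚), and the maintenance mechanism of the mixed mating system in Halenia elliptica
  • Grant number:
    32371586
  • Approval year:
    2023
  • Amount:
    CNY 500,000
  • Project type:
    General Program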
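Investigating the causal relationship between peripheral blood leukocyte counts and Parkinson's disease using large-scale genetic data: a Mendelian randomization study and genetic risk score analysis
  • Grant number:
    82301434
  • Approval year:
    2023
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund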
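Methods and applications for discovering interpretable cognitive evolution paths based on causal computation
  • Grant number:
    62377024
  • Approval year:
    2023
  • Amount:
    CNY 500,000
  • Project type:
    General Program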
Research on causal discovery for high-dimensional complex systems
  • Grant number:
  • Approval year:
    2022
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund

Similar overseas grants

CHAI - EPSRC AI Hub for Causality in Healthcare AI with Real Data
  • Grant number:
    EP/Y028856/1
  • Fiscal year:
    2024
  • Amount:
    $209,700
  • Project type:
    Research Grant
Rapid, Scalable, and Joint Assessment of Seismic Multi-Hazards and Impacts: From Satellite Images to Causality-Informed Deep Bayesian Networks
  • Grant number:
    2242590
  • Fiscal year:
    2024
  • Amount:
    $209,700
  • Project type:
    Standard Grant
Collaborative Research: Learning and forecasting high-dimensional extremes: sparsity, causality, privacy
  • Grant number:
    2310974
  • Fiscal year:
    2023
  • Amount:
    $209,700
  • Project type:
    Standard Grant
Development of a Causality Analysis Method for Point Processes Based on Nonlinear Dynamical Systems Theory and Elucidation of the Representation of Information Processing in the Brain
  • Grant number:
    22KJ2815
  • Fiscal year:
    2023
  • Amount:
    $209,700
  • Project type:
    Grant-in-Aid for JSPS Fellows
EAGER: North American Monsoon Prediction Using Causality Informed Machine Learning
  • Grant number:
    2313689
  • Fiscal year:
    2023
  • Amount:
    $209,700
  • Project type:
    Standard Grant