Robust and Sample Efficient Reinforcement Learning
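Basic Information
- Grant number: RGPIN-2019-05014
- Principal investigator:
- Amount: $40,100
- Host institution:
- Host institution country: Canada
- Project category: Discovery Grants Program - Individual
- Fiscal year: 2020
- Funding country: Canada
- Duration: 2020-01-01 to 2021-12-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract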
Reinforcement Learning (RL) is arguably one of the most comprehensive forms of machine learning. It facilitates active learning and it allows a system to learn over an extended period of time about its environment as it makes a sequence of decisions. The system can also learn from weak signals that might be delayed. This is particularly useful in robotics, autonomous vehicles, conversational agents, game playing, operations research, automated trading, non-myopic recommender systems and self-managing networks. The generality of reinforcement learning also makes it complex and therefore challenging algorithmically.
Objectives: The goal of this work is to develop algorithms to improve the robustness and sample efficiency of reinforcement learning. Tremendous progress has been achieved in recent years by deep reinforcement learning techniques that scale to high dimensional inputs (e.g., images and natural language) and complex tasks. However, most of the successes are limited to applications with simulated environments (e.g., games, simulated robotic environments) since current algorithms may execute costly/catastrophic actions and may require an amount of data that is prohibitively large for interaction with real environments.
Methods: I will develop novel Bayesian reinforcement learning techniques that can quantify the uncertainty of the environment. This will be helpful both for robustness and sample efficiency. In Bayesian learning, a distribution over the unknowns is estimated and refined at each time step. This distribution also allows a system to explore more efficiently by focusing its actions on the parts of the environment that are still unknown. To that effect, I will develop scalable Bayesian techniques for deep reinforcement learning that explore safely and efficiently. I will also develop novel constrained reinforcement learning techniques that take into account secondary objectives such as variance and cost functions that should not be exceeded. This will further improve robustness by ensuring that key performance indicators (KPIs) are met in industrial applications. I will also develop generative reinforcement learning techniques that are robust to missing inputs. In some applications (e.g., non-myopic recommender systems and self-managing networks), some observations/sensors might not be available at each time step. Generative reinforcement learning techniques that can marginalize inputs in a principled way will be designed.
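The Bayesian idea in the Methods paragraph (keep a distribution over the unknowns, refine it at each time step, and let it direct exploration) can be illustrated on a toy problem. The sketch below is not the project's method. It is a minimal Beta-Bernoulli Thompson sampling loop for a multi-armed bandit, and the function name `thompson_sampling`, the argument `true_means`, and the reward probabilities are all hypothetical choices made for illustration.

```python
import random


def thompson_sampling(true_means, steps=2000, seed=0):
    """Beta-Bernoulli Thompson sampling on a toy multi-armed bandit.

    true_means: hypothetical success probability of each arm
                (unknown to the agent, used only to simulate rewards).
    Returns (alpha, beta, pulls): posterior parameters and pull counts per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    # Beta(1, 1) prior (uniform) over each arm's unknown success probability.
    alpha = [1.0] * n_arms
    beta = [1.0] * n_arms
    pulls = [0] * n_arms

    for _ in range(steps):
        # Draw one plausible success probability per arm from its posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Act greedily with respect to the sampled model. Arms with wide
        # posteriors still get tried because their samples vary a lot.
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_means[arm] else 0
        # Conjugate update: refine the posterior of the arm that was pulled.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return alpha, beta, pulls


if __name__ == "__main__":
    alpha, beta, pulls = thompson_sampling([0.2, 0.5, 0.8])
    for a, (al, be, n) in enumerate(zip(alpha, beta, pulls)):
        print(f"arm {a}: pulls={n}, posterior mean={al / (al + be):.3f}")
```

As the posterior of a poor arm concentrates, its samples rarely come out on top, so pulls shift toward arms that are either good or still uncertain. This is the sense in which a posterior lets an agent focus its actions on what is still unknown. Deep or safety-constrained variants of the kind the abstract proposes would need considerably more machinery than this toy loop.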
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Poupart, Pascal
Online Structure Learning for Feed-Forward and Recurrent Sum-Product Networks
- DOI:
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Kalra, Agastya; Rashwan, Abdullah; Hsu, Wei-Shou; Poupart, Pascal; Doshi, Prashant; Trimponias, Georgios
- Corresponding author: Trimponias, Georgios
Measuring Life Space in Older Adults with Mild-to-Moderate Alzheimer's Disease Using Mobile Phone GPS
- DOI: 10.1159/000355669
- Published: 2014-01-01
- Journal:
- Impact factor: 3.5
- Authors: Tung, James Yungjen; Rose, Rhiannon Victoria; Poupart, Pascal
- Corresponding author: Poupart, Pascal
Affective Neural Response Generation
- DOI: 10.1007/978-3-319-76941-7_12
- Published: 2018-01-01
- Journal:
- Impact factor: 0
- Authors: Asghar, Nabiha; Poupart, Pascal; Mou, Lili
- Corresponding author: Mou, Lili
Automated handwashing assistance for persons with dementia using video and a partially observable Markov decision process
- DOI: 10.1016/j.cviu.2009.06.008
- Published: 2010-05-01
- Journal:
- Impact factor: 4.5
- Authors: Hoey, Jesse; Poupart, Pascal; Mihailidis, Alex
- Corresponding author: Mihailidis, Alex
Other grants by Poupart, Pascal
Robust and Sample Efficient Reinforcement Learning
- Grant number: RGPIN-2019-05014
- Fiscal year: 2022
- Funding amount: $40,100
- Project category: Discovery Grants Program - Individual
Robust and Sample Efficient Reinforcement Learning
- Grant number: RGPIN-2019-05014
- Fiscal year: 2021
- Funding amount: $40,100
- Project category: Discovery Grants Program - Individual
Reinforcement Learning for Sports Analytics
- Grant number: 521357-2018
- Fiscal year: 2020
- Funding amount: $40,100
- Project category: Strategic Projects - Group
Reinforcement Learning for Sports Analytics
- Grant number: 521357-2018
- Fiscal year: 2019
- Funding amount: $40,100
- Project category: Strategic Projects - Group
Robust and Sample Efficient Reinforcement Learning
- Grant number: RGPIN-2019-05014
- Fiscal year: 2019
- Funding amount: $40,100
- Project category: Discovery Grants Program - Individual
Lifelong Machine Learning and Sequential Decision Making for Natural Language Interfaces
- Grant number: 312388-2013
- Fiscal year: 2018
- Funding amount: $40,100
- Project category: Discovery Grants Program - Individual
Lifelong Machine Learning and Sequential Decision Making for Natural Language Interfaces
- Grant number: 312388-2013
- Fiscal year: 2017
- Funding amount: $40,100
- Project category: Discovery Grants Program - Individual
Lifelong Machine Learning and Sequential Decision Making for Natural Language Interfaces
- Grant number: 312388-2013
- Fiscal year: 2016
- Funding amount: $40,100
- Project category: Discovery Grants Program - Individual
Lifelong Machine Learning and Sequential Decision Making for Natural Language Interfaces
- Grant number: 312388-2013
- Fiscal year: 2015
- Funding amount: $40,100
- Project category: Discovery Grants Program - Individual
Lifelong Machine Learning and Sequential Decision Making for Natural Language Interfaces
- Grant number: 312388-2013
- Fiscal year: 2014
- Funding amount: $40,100
- Project category: Discovery Grants Program - Individual
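Similar grants from the National Natural Science Foundation of China (NSFC)
Research on Knowledge-Driven, Sample-Efficient Cross-Modal Reinforcement Learning Methods
- Grant number: 62376041
- Year approved: 2023
- Funding amount: 490,000 CNY
- Project category: General Program
Research on Data-Efficient Few-Shot Learning Methods
- Grant number: 62106123
- Year approved: 2021
- Funding amount: 240,000 CNY
- Project category: Young Scientists Fund
Research on Efficient Visual Saliency Detection under Multimodal and Few-Shot Conditions
- Grant number: 62176169
- Year approved: 2021
- Funding amount: 570,000 CNY
- Project category: General Program
Research on Data-Efficient Few-Shot Learning Methods
- Grant number:
- Year approved: 2021
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund
Research on Zero-Shot Learning Based on High-Quality Image Generation and Efficient Feature Extraction
- Grant number: 62106260
- Year approved: 2021
- Funding amount: 300,000 CNY
- Project category: Young Scientists Fund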
Similar overseas grants
Arthroscopic-assisted tibial plateau fixation (AATPF) vs. Open reduction internal fixation (ORIF): A multicenter randomized controlled trial
- Grant number: 10723527
- Fiscal year: 2023
- Funding amount: $40,100
- Project category:
Non-invasive detection of tumor NTRK gene fusions via rapid, efficient and low-cost extracellular vesicle isolation method
- Grant number: 10707684
- Fiscal year: 2023
- Funding amount: $40,100
- Project category:
CIF: SMALL: Theoretical Foundations of Partially Observable Reinforcement Learning: Minimax Sample Complexity and Provably Efficient Algorithms
- Grant number: 2315725
- Fiscal year: 2023
- Funding amount: $40,100
- Project category: Standard Grant
Acceleration of risk gene discovery for Tic Disorders through large-scale collaboration
- Grant number: 10726443
- Fiscal year: 2023
- Funding amount: $40,100
- Project category:
Rapid Acute Leukemia Genomic Profiling with CRISPR enrichment and Real-time long-read sequencing
- Grant number: 10651543
- Fiscal year: 2023
- Funding amount: $40,100
- Project category: