Developing inquisitive, model-based agents for reinforcement learning

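Basic information

  • Grant number:
    RGPIN-2019-06079
  • Principal investigator:
  • Amount:
    $20,400
  • Host institution:
  • Host institution country:
    Canada
  • Program:
    Discovery Grants Program - Individual
  • Fiscal year:
    2022
  • Funding country:
    Canada
  • Start and end dates:
    2022-01-01 to 2023-12-31
  • Project status:
    Completed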

Project abstract

Natural agents, such as animals, learn from a lifetime of experience. Most artificial learning systems do not. Newborns begin life with a frenzy of learning, attempting to master their muscle twitches and make sense of their visual inputs. This knowledge is continuously reused and refined throughout life. Our current Artificial Intelligence (AI) systems are well suited to problems with a clear cause-and-effect relationship between the system's decisions and the utility of those decisions. Swimming into a shark will cause a loss of life. Shooting an alien ship will increase the score. However, in problems where the consequences of a decision are significantly delayed, this mapping is more difficult to learn. The most challenging and largely unsolved AI benchmark problems feature such delayed consequences. It is common practice for state-of-the-art systems to train for the equivalent of 30 days on each Atari game and still achieve well below human performance in games that feature delayed consequences.
One way to deal with the problem of delayed consequences is for the AI to construct its own understanding of how the world works, usually called a model of the world. A model encodes the regularities of the world. For example, a model might encode: (1) when I am lined up with a shark and I decide to fire a torpedo, the shark will disappear, and (2) if I am standing on a platform and I decide to jump down, I will end up on the ground. Given access to a model of this form, an AI can mentally simulate the future situations that would result from behaving in particular ways, without actually interacting with the world, just as a human can work out where they might end up if they took a new path down to the river. We can imagine the outcome of taking this alternative path without physically doing it, and avoid unnecessary exploration unless we decide it is valuable to do so. Model-based mental simulation can dramatically improve the efficiency of learning.
The remaining question is how the system should decide how best to make use of mental simulation. People often decide to try out things they have never done before. We choose to engage in activities that are mentally and physically challenging, but not beyond our abilities. Humans are motivated by novelty, curiosity, and knowledge seeking, and bored by things we already know about. Combining this idea with a model could allow an AI to simulate different ways of behaving, preferring those that result in a reduction of uncertainty and the acquisition of new knowledge. With a model, the AI can generate its own internal feedback to focus its mental simulations. The objective of this research program is twofold: (1) to design new approaches for representing and learning models of the world, and (2) to integrate mechanisms that can guide mental simulations (planning) and exploration toward uncertainty and knowledge acquisition.

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by White, Adam

Questioning Anglocentrism in plural policing studies: Private security regulation in Belgium and the United Kingdom
  • DOI:
    10.1177/14773708211014853
  • Publication date:
    2021-05-12
  • Journal:
  • Impact factor:
    1.9
  • Authors:
    Leloup, Pieter;White, Adam
  • Corresponding author:
    White, Adam
Multi-timescale nexting in a reinforcement learning robot
  • DOI:
    10.1177/1059712313511648
  • Publication date:
    2014-04-01
  • Journal:
  • Impact factor:
    1.6
  • Authors:
    Modayil, Joseph;White, Adam;Sutton, Richard S.
  • Corresponding author:
    Sutton, Richard S.
Teachers' stories: physical education teachers' constructions and experiences of masculinity within secondary school physical education
  • DOI:
    10.1080/13573322.2015.1112779
  • Publication date:
    2017-01-01
  • Journal:
  • Impact factor:
    2.9
  • Authors:
    White, Adam;Hobson, Michael
  • Corresponding author:
    Hobson, Michael
A Qualitative Exploration of Parents' Perceptions of Risk in Youth Contact Rugby.
  • DOI:
    10.3390/bs12120510
  • Publication date:
    2022-12-14
  • Journal:
  • Impact factor:
    2.6
  • Authors:
    Anderson, Eric;White, Adam;Hardwicke, Jack
  • Corresponding author:
    Hardwicke, Jack
From eye-blinks to state construction: Diagnostic benchmarks for online representation learning.
  • DOI:
    10.1177/10597123221085039
  • Publication date:
    2023-03
  • Journal:
  • Impact factor:
    1.6
  • Authors:
    Rafiee, Banafsheh;Abbas, Zaheer;Ghiassian, Sina;Kumaraswamy, Raksha;Sutton, Richard S.;Ludvig, Elliot A.;White, Adam
  • Corresponding author:
    White, Adam

Other grants by White, Adam

Developing inquisitive, model-based agents for reinforcement learning
  • Grant number:
    RGPIN-2019-06079
  • Fiscal year:
    2021
  • Funding amount:
    $20,400
  • Program:
    Discovery Grants Program - Individual
Developing inquisitive, model-based agents for reinforcement learning
  • Grant number:
    RGPIN-2019-06079
  • Fiscal year:
    2020
  • Funding amount:
    $20,400
  • Program:
    Discovery Grants Program - Individual
Developing inquisitive, model-based agents for reinforcement learning
  • Grant number:
    RGPIN-2019-06079
  • Fiscal year:
    2019
  • Funding amount:
    $20,400
  • Program:
    Discovery Grants Program - Individual
Developing inquisitive, model-based agents for reinforcement learning
  • Grant number:
    DGECR-2019-00479
  • Fiscal year:
    2019
  • Funding amount:
    $20,400
  • Program:
    Discovery Launch Supplement
Leveraging spectrally encoded beads for multiplexed nucleic acid detection
  • Grant number:
    503082-2017
  • Fiscal year:
    2018
  • Funding amount:
    $20,400
  • Program:
    Postdoctoral Fellowships
Leveraging spectrally encoded beads for multiplexed nucleic acid detection
  • Grant number:
    503082-2017
  • Fiscal year:
    2017
  • Funding amount:
    $20,400
  • Program:
    Postdoctoral Fellowships
Particle Size Analysis in Marine Sediments
  • Grant number:
    516368-2017
  • Fiscal year:
    2017
  • Funding amount:
    $20,400
  • Program:
    University Undergraduate Student Research Awards
Particle Size Analysis in Marine Sediments
  • Grant number:
    505971-2016
  • Fiscal year:
    2016
  • Funding amount:
    $20,400
  • Program:
    University Undergraduate Student Research Awards
Single cell gene expression analysis by microfluidic digital PCR
  • Grant number:
    427647-2012
  • Fiscal year:
    2013
  • Funding amount:
    $20,400
  • Program:
    Alexander Graham Bell Canada Graduate Scholarships - Doctoral
Single cell gene expression analysis by microfluidic digital PCR
  • Grant number:
    427647-2012
  • Fiscal year:
    2012
  • Funding amount:
    $20,400
  • Program:
    Alexander Graham Bell Canada Graduate Scholarships - Doctoral

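Similar grants from the National Natural Science Foundation of China

How curiosity influences the generation of inspiration: phenomena and mechanisms along dual cognitive and emotional processing pathways
  • Grant number:
    71701080
  • Approval year:
    2017
  • Funding amount:
    170,000 CNY
  • Program:
    Young Scientists Fund
Research on secure query mechanisms and algorithms for multi-source data in cloud storage
  • Grant number:
    61472125
  • Approval year:
    2014
  • Funding amount:
    800,000 CNY
  • Program:
    General Program
Mechanisms by which expert film reviews and word of mouth influence film consumption: a cross-cultural study
  • Grant number:
    70872005
  • Approval year:
    2008
  • Funding amount:
    282,000 CNY
  • Program:
    General Program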

Similar overseas grants

Developing inquisitive, model-based agents for reinforcement learning
  • Grant number:
    RGPIN-2019-06079
  • Fiscal year:
    2021
  • Funding amount:
    $20,400
  • Program:
    Discovery Grants Program - Individual
Developing inquisitive, model-based agents for reinforcement learning
  • Grant number:
    RGPIN-2019-06079
  • Fiscal year:
    2020
  • Funding amount:
    $20,400
  • Program:
    Discovery Grants Program - Individual
Model Theory and proof theory of probabilistic logic in propositional and modal team semantics
  • Grant number:
    19F19797
  • Fiscal year:
    2019
  • Funding amount:
    $20,400
  • Program:
    Grant-in-Aid for JSPS Fellows
Developing inquisitive, model-based agents for reinforcement learning
  • Grant number:
    RGPIN-2019-06079
  • Fiscal year:
    2019
  • Funding amount:
    $20,400
  • Program:
    Discovery Grants Program - Individual
Developing inquisitive, model-based agents for reinforcement learning
  • Grant number:
    DGECR-2019-00479
  • Fiscal year:
    2019
  • Funding amount:
    $20,400
  • Program:
    Discovery Launch Supplement