RI: Small: Collaborative Research: Hidden Parameter Markov Decision Processes: Exploiting Structure in Families of Tasks

Basic Information

  • Award number:
    1718306
  • Principal investigator:
    Finale Doshi-Velez
  • Amount:
    $242,000
  • Host institution:
  • Institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2017
  • Funding country:
    United States
  • Duration:
    2017-08-01 to 2022-07-31
  • Status:
    Completed

Abstract

Part 1

Machine learning has the potential to automate many complex, real-life tasks. However, learning algorithms typically require a substantial amount of data from each specific task they are asked to solve, which means repeated interactions with the world, each of which takes time and effort. Many real-life learning scenarios involve repeated interactions with tasks that are similar, but not identical. For example, an immunologist may encounter HIV patients with different comorbid conditions and latent viral reservoirs: each has a similar disease but a different progression, requiring individualized treatment. Likewise, a robot may have to manipulate objects of different size and weight, each requiring a similar but not identical grasping strategy. In such cases, treating all of the tasks as the same results in poor performance, but learning to solve each one as if it were completely different takes far too long. This project will develop intelligent agents that use knowledge gained when solving prior tasks to learn new tasks that are similar, but not quite the same, much more rapidly.

The principal technical component of this project will lie in rigorously defining what it means for tasks to be related and in producing algorithms that leverage that definition to enable rapid learning. To do so, the project will introduce the Hidden-Parameter Markov Decision Process, which models a family of tasks through a parameter that describes variation across the family but is hidden from the learner. The project will investigate methods that exploit this structure by learning a model of task variation and then seeking to identify the parameter value for each specific task. The planned work will focus on healthcare applications, where families of related but distinct tasks are common (each patient has unique characteristics). However, the project aims to produce foundational learning algorithms applicable to many application areas, ranging from robotics to systems design.
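To make the hidden-parameter idea concrete, here is a minimal Python sketch under loudly stated assumptions: a hypothetical one-dimensional task family whose dynamics are s' = s + θ·a + Gaussian noise, where θ is the hidden task parameter (think of how strongly a particular patient responds to a dose). The agent holds a prior over a small grid of candidate θ values, updates it with Bayes' rule from a few transitions of a new task, and then plans with the most probable value. The dynamics model, the grid, and every function name (`posterior_over_theta`, `map_theta`, `greedy_action`) are illustrative inventions, not the project's algorithm.

```python
import math
from typing import List, Sequence, Tuple

# (state, action, next_state) for one observed step of the current task
Transition = Tuple[float, float, float]


def transition_log_likelihood(theta: float, transitions: Sequence[Transition],
                              noise_std: float) -> float:
    """Gaussian log-likelihood of the transitions under the ASSUMED
    hypothetical family dynamics s' = s + theta * a + N(0, noise_std^2)."""
    log_norm = math.log(noise_std * math.sqrt(2.0 * math.pi))
    total = 0.0
    for s, a, s_next in transitions:
        residual = s_next - (s + theta * a)
        total += -0.5 * (residual / noise_std) ** 2 - log_norm
    return total


def posterior_over_theta(grid: Sequence[float], prior: Sequence[float],
                         transitions: Sequence[Transition],
                         noise_std: float = 0.1) -> List[float]:
    """Bayes' rule over a discrete grid of candidate hidden-parameter values."""
    log_post = [math.log(p) + transition_log_likelihood(t, transitions, noise_std)
                for t, p in zip(grid, prior)]
    peak = max(log_post)  # subtract the max before exponentiating, for stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def map_theta(grid: Sequence[float], posterior: Sequence[float]) -> float:
    """Most probable hidden-parameter value under the posterior."""
    return max(zip(grid, posterior), key=lambda pair: pair[1])[0]


def greedy_action(state: float, goal: float, theta: float) -> float:
    """One-step action that would reach `goal` exactly if the dynamics were
    noise-free and `theta` (assumed nonzero) were the true parameter."""
    return (goal - state) / theta


if __name__ == "__main__":
    grid = [0.5, 1.0, 2.0]         # hypothetical candidate task parameters
    prior = [1 / 3, 1 / 3, 1 / 3]  # uniform here; assumed to come from earlier tasks
    observed = [(0.0, 1.0, 2.0), (1.0, 0.5, 2.0)]  # two steps of a new task
    post = posterior_over_theta(grid, prior, observed)
    theta_hat = map_theta(grid, post)
    print(theta_hat, greedy_action(0.0, 4.0, theta_hat))
```

Running the script prints `2.0 2.0`: two transitions are enough to single out θ = 2.0 on this coarse grid, after which a single action reaches the goal in the noise-free case.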
This research will also be integrated into the courses taught by the PIs at Harvard and Brown and made available online. The PIs will involve a diverse population of students, including Research Experiences for Undergraduates (REU) participants, both in these classes and in their research groups.

Part 2

Many real-life learning scenarios involve repeated interactions with tasks that have similar, but not identical, dynamics. For example, an immunologist may encounter HIV patients with different comorbid conditions and latent viral reservoirs; a robot may have to manipulate objects of different size and weight. These cases describe a family of related tasks, each similar to the others but not quite the same. An intelligent agent should be able to transfer knowledge learned during previous experiences to rapidly solve new tasks in the same family. However, although many algorithms have been developed to transfer knowledge, the lack of a model of task relatedness limits our ability to formally understand the benefits of such algorithms or the structure they exploit.

The planned work will model such scenarios by embedding the tasks on a low-dimensional manifold that captures the relevant variation between instances. Each location on this manifold (unobserved by the agent) describes a task instance, forming a sufficient statistic for solving the task in the context of the task family. Preliminary work by the PIs has shown that it is possible to learn such a manifold after solving just a few individual task instances, and that doing so enables rapid optimization of policies for new task instances. Building on these promising initial results, the PIs plan to:
1) Develop methods for task family characterization, by determining whether a collection of tasks can be modeled via a single manifold or consists of several clusters; whether a new task belongs to an existing cluster or manifold; and, if so, whether transfer is worthwhile.
2) Scale inference by adapting recent results from machine learning to deal with large state and action spaces.
3) Generate policies using Bayesian reinforcement learning algorithms and by exploiting formal links between state and policy representations.

In addition to synthetic domains, progress on these directions will be applied to problems of treatment optimization for patients with HIV, sepsis, and depression, through the PIs' clinical collaborations with world experts in these diseases.
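Item 1 asks whether a new task belongs to an existing cluster. One simple way to pose that question, shown here as a hypothetical sketch and not as the PIs' method, is Bayesian model comparison: score the new task's transitions by their marginal likelihood under each known cluster (modeled as a small set of candidate hidden-parameter values) and under a broad "unfamiliar task" prior, and transfer only when the best cluster wins by a margin. The scalar dynamics s' = s + θ·a + noise, the cluster contents, the `margin` threshold (in log units), and all function names are illustrative assumptions.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

# (state, action, next_state) for one observed step of the new task
Transition = Tuple[float, float, float]


def log_likelihood(theta: float, transitions: Sequence[Transition],
                   noise_std: float) -> float:
    """Gaussian log-likelihood under the ASSUMED dynamics s' = s + theta * a + noise."""
    log_norm = math.log(noise_std * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * ((s_next - (s + theta * a)) / noise_std) ** 2 - log_norm
               for s, a, s_next in transitions)


def log_marginal_likelihood(thetas: Sequence[float],
                            transitions: Sequence[Transition],
                            noise_std: float) -> float:
    """log p(data | model) with a uniform prior over the model's candidate
    thetas, computed with the log-sum-exp trick."""
    lls = [log_likelihood(t, transitions, noise_std) for t in thetas]
    peak = max(lls)
    return peak + math.log(sum(math.exp(ll - peak) for ll in lls) / len(lls))


def assign_cluster(clusters: Dict[str, Sequence[float]],
                   transitions: Sequence[Transition],
                   broad_prior: Sequence[float],
                   noise_std: float = 0.1,
                   margin: float = 2.0) -> Optional[str]:
    """Return the best-matching cluster name, or None when the broad
    'unfamiliar task' model explains the data nearly as well (no transfer)."""
    scores = {name: log_marginal_likelihood(thetas, transitions, noise_std)
              for name, thetas in clusters.items()}
    best = max(scores, key=scores.get)
    baseline = log_marginal_likelihood(broad_prior, transitions, noise_std)
    return best if scores[best] - baseline > margin else None


if __name__ == "__main__":
    clusters = {"light": [0.9, 1.0, 1.1], "heavy": [0.25, 0.3, 0.35]}
    broad = [0.1 * k for k in range(1, 51)]          # 0.1 ... 5.0
    familiar = [(0.0, 1.0, 1.0), (1.0, 1.0, 2.0)]    # generated with theta = 1.0
    unfamiliar = [(0.0, 1.0, 3.0), (1.0, 1.0, 4.0)]  # generated with theta = 3.0
    print(assign_cluster(clusters, familiar, broad))
    print(assign_cluster(clusters, unfamiliar, broad))
```

Running it prints `light` and then `None`. Because the broad prior spreads its mass over many implausible values, it is penalized whenever a narrow cluster fits the data, which is what lets a single threshold separate familiar from unfamiliar tasks in this toy setting.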

Project Outcomes

Journal articles (3)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Combining Parametric and Nonparametric Models for Off-Policy Evaluation
  • DOI:
    10.1016/j.jsames.2011.12.006
  • Published:
    2019-05-14
  • Journal:
  • Impact factor:
    1.8
  • Authors:
    Omer Gottesman; Yao Liu; Scott Sussex; E. Brunskill; F. Doshi
  • Corresponding author:
    F. Doshi
Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes
Representation Balancing MDPs for Off-Policy Policy Evaluation

Other grants by Finale Doshi-Velez

RI: Small: Human Validation in Batch Reinforcement Learning
  • Award number:
    2007076
  • Fiscal year:
    2020
  • Amount:
    $242,000
  • Award type:
    Continuing Grant
CAREER: Generative Models for Targeted Domain Interpretability with Applications to Healthcare
  • Award number:
    1750358
  • Fiscal year:
    2018
  • Amount:
    $242,000
  • Award type:
    Continuing Grant
RI: Small: Workshop for Women in Machine Learning
  • Award number:
    1649706
  • Fiscal year:
    2016
  • Amount:
    $242,000
  • Award type:
    Standard Grant
Scalable Bayesian Inference for Interpretable Time-Series Models
  • Award number:
    1544628
  • Fiscal year:
    2015
  • Amount:
    $242,000
  • Award type:
    Standard Grant
Scalable Bayesian Inference in Large Medical Databases
  • Award number:
    1225204
  • Fiscal year:
    2012
  • Amount:
    $242,000
  • Award type:
    Fellowship Award

Similar National Natural Science Foundation of China (NSFC) Grants

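Mechanism by which the small-molecule metabolite catechin interacts with TRPV1 to activate peripheral sensory neurons and mediate uremic pruritus
  • Grant number:
    82371229
  • Year approved:
    2023
  • Amount:
    CNY 490,000
  • Project type:
    General Program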
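Mechanism by which DHEA alleviates POCD by inhibiting Fis1 lactylation in microglia
  • Grant number:
    82301369
  • Year approved:
    2023
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund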
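Role of SETDB1 in regulating microglial function and in the pathogenesis of Alzheimer's disease
  • Grant number:
    82371419
  • Year approved:
    2023
  • Amount:
    CNY 490,000
  • Project type:
    General Program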
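Mechanism by which PTBP1 drives an H4K12la/BRD4/HIF1α complex-PKM2 positive feedback loop to promote glucose metabolic reprogramming in non-small cell lung cancer, and exploration of therapeutic strategies
  • Grant number:
    82303616
  • Year approved:
    2023
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund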

Similar International Grants

Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
  • Award number:
    2232298
  • Fiscal year:
    2023
  • Amount:
    $242,000
  • Award type:
    Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
  • Award number:
    2334936
  • Fiscal year:
    2023
  • Amount:
    $242,000
  • Award type:
    Standard Grant
Collaborative Research: RI: Small: Foundations of Few-Round Active Learning
  • Award number:
    2313131
  • Fiscal year:
    2023
  • Amount:
    $242,000
  • Award type:
    Standard Grant
Collaborative Research: RI: Small: End-to-end Learning of Fair and Explainable Schedules for Court Systems
  • Award number:
    2232054
  • Fiscal year:
    2023
  • Amount:
    $242,000
  • Award type:
    Standard Grant
Collaborative Research: RI: Small: Motion Fields Understanding for Enhanced Long-Range Imaging
  • Award number:
    2232300
  • Fiscal year:
    2023
  • Amount:
    $242,000
  • Award type:
    Standard Grant