RI: Small: Using and Gathering Data for Efficient Batch Reinforcement Learning

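Basic Information

Award Number:
2112926
Principal Investigator:
Emma Brunskill
Amount:
$500,000
Host Institution:
Stanford University
Host Institution Country:
United States
Award Type:
Standard Grant
Fiscal Year:
2021
Funding Country:
United States
Project Period:
2021-10-01 to 2024-09-30
Project Status:
Completed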

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2112926&HistoricalAwards=false
Keywords:
RI Small Using Gathering Data

Project Abstract

Imagine if we could provide each child with the right support, at the right time, to help them learn best, or ensure that a diabetes patient receives the best interventions to help them manage their chronic condition at home over time. Unfortunately, such personalization is expensive. More scalable computerized approaches can lack the real-time information needed to provide effective personalization, or the ability to specialize interventions. However, the huge rise in more user-friendly software tools means that it is now possible to do such targeted personalization in a broad array of settings. This research will develop new methods for leveraging existing data, and create algorithms to acquire new data in a way that is compatible with the limitations of common systems. This work could help enable personalized interventions across a much broader array of applications than currently benefit from such approaches. The research will be particularly focused on the technical challenges arising from areas like education and healthcare.

More specifically, this research will create data-efficient algorithms and statistical estimators for leveraging past datasets about decisions made and their outcomes, and for acquiring new batch data that might lead to better results, in order to create decision policies: mappings from features describing the current context to a particular decision or intervention. In particular, the project will center on developing new algorithms that (1) optimize policies using data-efficient, minimal-assumption statistical lower bounds on their future performance; (2) bound the benefit of gathering a budget of additional data; and (3) inspired by insights from optimal experimental design, construct non-adaptive policies for gathering data that can then be leveraged to identify a near-optimal decision policy. The research will focus both on settings where a single decision is made for a particular context, and on settings where a sequence of decisions is made and those decisions affect the next context observed (common in sequential decision making under uncertainty).

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
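
As an illustration of what a statistical lower bound on a policy's future performance can look like in the single-decision setting, the sketch below combines an importance-sampling estimate from logged data with a one-sided Hoeffding bound. It is not the project's method, only a textbook-style stand-in. It assumes rewards in [0, 1], known logging probabilities, and a known cap `w_max` on the importance weights; every name in it (`is_lower_bound`, `simulate_logs`, `match_policy`) is hypothetical and the data are simulated.

```python
import math
import random


def is_lower_bound(logged, target_prob, w_max, delta=0.05):
    """Return (estimate, lower_bound) for the target policy's expected reward.

    logged      : list of (context, action, reward, behavior_prob), rewards in [0, 1]
    target_prob : function (context, action) -> probability under the target policy
    w_max       : assumed known upper bound on the importance weights
    delta       : allowed failure probability of the bound
    """
    n = len(logged)
    terms = []
    for context, action, reward, behavior_prob in logged:
        weight = target_prob(context, action) / behavior_prob
        terms.append(weight * reward)  # each term lies in [0, w_max]
    estimate = sum(terms) / n
    # One-sided Hoeffding inequality for i.i.d. terms bounded in [0, w_max].
    slack = w_max * math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return estimate, estimate - slack


def simulate_logs(n, seed=0):
    """Toy logged data: a uniform-random behavior policy over two actions."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        context = rng.choice([0, 1])
        action = rng.choice([0, 1])  # logged with probability 0.5
        p_success = 0.8 if action == context else 0.2
        reward = 1.0 if rng.random() < p_success else 0.0
        logs.append((context, action, reward, 0.5))
    return logs


def match_policy(context, action):
    """Deterministic target policy: choose the action equal to the context."""
    return 1.0 if action == context else 0.0


logs = simulate_logs(5000)
estimate, lower = is_lower_bound(logs, match_policy, w_max=2.0)
print(f"IS estimate: {estimate:.3f}, 95% lower bound: {lower:.3f}")
```

In this toy setup the target policy's true expected reward is 0.8 by construction. The Hoeffding slack shrinks at rate 1/sqrt(n) and scales with `w_max`, which is one reason bounds of this simple kind become loose when the target policy differs sharply from the logging policy.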
