RI: Small: Non-parametric Approximate Dynamic Programming for Continuous Domains

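Basic Information

Award number:
1218931
Principal investigator:
Ronald Parr
Amount:
$450,000
Host institution:
Duke University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2012
Funding country:
United States
Period:
2012-08-01 to 2018-07-31
Project status:
Completed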

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1218931&HistoricalAwards=false
Keywords:
RI Small Non parametric Approximate

Project Abstract

This project concerns a machine learning technique known as reinforcement learning, which is related to, but distinct from, the notion of reinforcement learning used in psychology. The common element is that both views study changes in behavior that result from experience. In the machine learning case, the behaviors are often decision making in dynamic environments, such as controlling a robot, a factory, inventory levels for a warehouse or even drug dosage levels. Current theoretical development in this area guarantees that optimal decisions can be made by reinforcement learning algorithms, but only under restrictive assumptions that are difficult to ensure in practice. Efforts to apply reinforcement learning to significant practical problems have enjoyed some success, but such efforts often forgo theoretical guarantees and rely upon tedious parameter adjustments by experts (human trial and error) to achieve success.

This research seeks to reduce the amount of human trial and error needed to make reinforcement learning successful, thereby making it a more accessible tool to a wider range of people. Specifically, it will focus on algorithms for domains described by continuous variables, seeking to provide stronger theoretical guarantees for such domains as well as an approach that balances the anticipated benefit of trying new things with the benefit of sticking to what is already known about a problem (exploration vs. exploitation). A practical benefit of success in this area would be improved techniques that make it easier for people to deploy algorithms that learn and improve performance in a variety of practical tasks like those mentioned above: robot or factory control, inventory management, or drug delivery.

This project plans to use a model helicopter as a challenge domain, but it is not about helicopter control per se. Rather, it seeks to develop general techniques that can apply to many problems, including helicopters, and will use model helicopters as an inexpensive and fun way to motivate students. The project aims to develop a model helicopter simulator (to reduce the cost and risk of trying everything on an actual helicopter) and plans to make this simulator available to the research community, providing a fun and challenging benchmark problem.

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Ronald Parr

Amazing Things Come From Having Many Good Models

DOI:
Publication date:
Journal:
Impact factor:
0
Authors:
Cynthia Rudin; Chudi Zhong; Lesia Semenova; Margo Seltzer; Ronald Parr; Jiachang Liu; Srikar Katta; Jon Donnelly; Harry Chen; Zachery Boner
Corresponding author:
Zachery Boner