RI: Small: Non-parametric Approximate Dynamic Programming for Continuous Domains
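Basic information
- Award number: 1218931
- Principal investigator:
- Amount: $450,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2012
- Funding country: United States
- Duration: 2012-08-01 to 2018-07-31
- Project status: Completed
- Source:
- Keywords: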
Project abstract
This project concerns a machine learning technique known as reinforcement learning, which is related to, but distinct from, the notion of reinforcement learning used in psychology. The common element is that both views study changes in behavior that result from experience. In the machine learning case, the behaviors are often decision making in dynamic environments, such as controlling a robot, a factory, inventory levels for a warehouse, or even drug dosage levels. Current theoretical development in this area guarantees that optimal decisions can be made by reinforcement learning algorithms, but only under restrictive assumptions that are difficult to ensure in practice. Efforts to apply reinforcement learning to significant practical problems have enjoyed some success, but such efforts often forgo theoretical guarantees and rely upon tedious parameter adjustments by experts (human trial and error) to achieve success.
This research seeks to reduce the amount of human trial and error needed to make reinforcement learning successful, thereby making it a more accessible tool to a wider range of people. Specifically, it will focus on algorithms for domains described by continuous variables, seeking to provide stronger theoretical guarantees for such domains as well as an approach that balances the anticipated benefit of trying new things with the benefit of sticking to what is already known about a problem (exploration vs. exploitation). A practical benefit of success in this area would be improved techniques that make it easier for people to deploy algorithms that learn and improve performance in a variety of practical tasks like those mentioned above: robot or factory control, inventory management, or drug delivery.
This project plans to use a model helicopter as a challenge domain, but it is not about helicopter control per se. Rather, it seeks to develop general techniques that can apply to many problems, including helicopters, and will use model helicopters as an inexpensive and fun way to motivate students. The project aims to develop a model helicopter simulator (to reduce the cost and risk of trying everything on an actual helicopter) and plans to make this simulator available to the research community, providing a fun and challenging benchmark problem.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Ronald Parr
Amazing Things Come From Having Many Good Models
- DOI:
- Publication year:
- Journal:
- Impact factor: 0
- Authors: Cynthia Rudin; Chudi Zhong; Lesia Semenova; Margo Seltzer; Ronald Parr; Jiachang Liu; Srikar Katta; Jon Donnelly; Harry Chen; Zachery Boner
- Corresponding author: Zachery Boner
An Optimal Tightness Bound for the Simulation Lemma
- DOI:
- Publication year: 2024
- Journal:
- Impact factor: 0
- Authors: Sam Lobel; Ronald Parr
- Corresponding author: Ronald Parr
Other grants by Ronald Parr
RI: Small: Feature Encoding for Reinforcement Learning
- Award number: 1815300
- Fiscal year: 2018
- Amount: $450,000
- Project type: Continuing Grant
EAGER: Collaborative Research: An Unified Learnable Roadmap for Sequential Decision Making in Relational Domains
- Award number: 1836575
- Fiscal year: 2018
- Amount: $450,000
- Project type: Standard Grant
EAGER: IIS: RI: Learning in Continuous and High Dimensional Action Spaces
- Award number: 1147641
- Fiscal year: 2011
- Amount: $450,000
- Project type: Standard Grant
Collaborative: RI: Feature Discovery and Benchmarks for Exportable Reinforcement Learning
- Award number: 0713435
- Fiscal year: 2007
- Amount: $450,000
- Project type: Standard Grant
CAREER: Observing to Plan - Planning to Observe
- Award number: 0546709
- Fiscal year: 2006
- Amount: $450,000
- Project type: Continuing Grant
Prediction and Planning: Bridging the Gap
- Award number: 0209088
- Fiscal year: 2002
- Amount: $450,000
- Project type: Standard Grant
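Similar grants from the National Natural Science Foundation of China
Mechanism by which histone H4 lysine 12 lactylation regulates cisplatin resistance in non-small cell lung cancer
- Award number: 82303085
- Year approved: 2023
- Amount: 300,000 CNY
- Project type: Young Scientists Fund
Role and mechanism of CLDN6-high tumor cell subpopulations in the development of resistance to ICB therapy in non-small cell lung cancer
- Award number: 82373364
- Year approved: 2023
- Amount: 490,000 CNY
- Project type: General Program
Molecular mechanism by which GPC3-CD81 regulates tumor-associated macrophage polarization via SHP-2 to mediate resistance to immuno-radiotherapy in non-small cell lung cancer
- Award number: 82373217
- Year approved: 2023
- Amount: 490,000 CNY
- Project type: General Program
Mechanism and biological function of the cancer-testis protein PIWIL4 in small RNA biogenesis in non-small cell lung cancer
- Award number: 32371347
- Year approved: 2023
- Amount: 500,000 CNY
- Project type: General Program
Mechanism by which NRF2 regulation of KPNB1 promotes PD-L1 nuclear translocation and mediates immunotherapy resistance in non-small cell lung cancer
- Award number: 82303969
- Year approved: 2023
- Amount: 300,000 CNY
- Project type: Young Scientists Fund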
Similar overseas grants
NSF-AoF: RI: Small: Safe Reinforcement Learning in Non-Stationary Environments With Fast Adaptation and Disturbance Prediction
- Award number: 2133656
- Fiscal year: 2021
- Amount: $450,000
- Project type: Standard Grant
RI: Small: Non-parametric Machine Learning in the Age of Deep and High-Dimensional Models
- Award number: 1909816
- Fiscal year: 2019
- Amount: $450,000
- Project type: Standard Grant
RI: Small: Understanding Subtle Non-Social Facial Expressivity to Boost Learning and Computer Interaction
- Award number: 1911197
- Fiscal year: 2019
- Amount: $450,000
- Project type: Standard Grant
RI: Small: ConnotationNet: Modeling Non-Literal Meaning in Context
- Award number: 1714566
- Fiscal year: 2017
- Amount: $450,000
- Project type: Standard Grant
RI: Small: Collaborative Research: On-Line Learning Algorithms for Path Experts with Non-Additive Losses
- Award number: 1618662
- Fiscal year: 2016
- Amount: $450,000
- Project type: Standard Grant