RI: Small: Feature Encoding for Reinforcement Learning
Basic Information
- Award Number: 1815300
- Principal Investigator:
- Amount: $500K
- Host Institution:
- Host Institution Country: United States
- Program Type: Continuing Grant
- Fiscal Year: 2018
- Funding Country: United States
- Project Period: 2018-08-01 to 2023-07-31
- Status: Completed
- Source:
- Keywords:
Project Summary
This project focuses on the subfield of machine learning known as Reinforcement Learning (RL), in which algorithms or robots learn by trial and error. As in many areas of machine learning, there has been a surge of interest in "deep learning" approaches to reinforcement learning, i.e., "Deep RL." Deep learning uses computational models inspired by structures found in the brains of animals. Deep RL has enjoyed some stunning successes, including a recent advance in which a program learned to play the Asian game of Go better than the best human player. Notably, this level of performance was achieved without any human guidance: given only the rules of the game, the program learned by playing against itself. Although games are intriguing and attention-grabbing, this feat was merely a technology demonstration. Firms are seeking to deploy Deep RL methods to increase the efficiency of their operations across a range of applications, such as data center management and robotics. To realize fully the potential of Deep RL, further research is required to make the training process more predictable, reliable, and efficient. Current techniques require massive amounts of training data and computation, and subtle changes in the configuration of the system can cause huge differences in the quality of the results obtained. Thus, even though RL systems can learn autonomously by trial and error, a large amount of human intuition, experience, and experimentation may be required to lay the groundwork for these systems to succeed. This proposal seeks to develop new techniques and theory to make high-quality deep RL results more widely and easily obtainable. In addition, this proposal will provide opportunities for undergraduates to be involved in research through Duke's Data+ initiative.
The proposed research is partly inspired by past work on feature selection and discovery for reinforcement learning, much of which focused primarily on linear value function approximation. Its relevance to deep reinforcement learning is that methods such as Deep Q-learning have a linear final layer. The preceding, nonlinear layers can therefore be interpreted as performing feature discovery for what is ultimately a linear value function approximation process. Sufficient conditions on the features that were specified for successful linear value function approximation in earlier work can now be re-interpreted as an intermediate objective function for the penultimate layer of a deep network. The proposed research aims to achieve the following objectives: 1) develop a theory of feature construction that explains and informs deep reinforcement learning methods; 2) develop improved approaches to value function approximation that are applicable to deep reinforcement learning; 3) develop improved approaches to policy search that are applicable to deep reinforcement learning; 4) develop new algorithms for exploration in reinforcement learning that take advantage of learned feature representations; and 5) perform computational experiments demonstrating the efficacy of the new algorithms on benchmark problems.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
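The observation that a deep Q-network's final layer is linear over the features produced by the preceding layers can be sketched minimally in NumPy. All dimensions, weights, and names below are illustrative placeholders, not details of the project: a tiny one-hidden-layer network stands in for the "feature discovery" layers, and the Q-values are an exactly linear function of those learned features phi(s).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (placeholders, not from the project).
STATE_DIM, HIDDEN_DIM, NUM_ACTIONS = 4, 8, 2

# Nonlinear "feature discovery" layers (here: a single ReLU layer).
W1 = rng.normal(size=(HIDDEN_DIM, STATE_DIM))
b1 = rng.normal(size=HIDDEN_DIM)

# Linear final layer: Q(s, a) = w_a . phi(s) + c_a for each action a.
W2 = rng.normal(size=(NUM_ACTIONS, HIDDEN_DIM))
b2 = rng.normal(size=NUM_ACTIONS)

def features(s):
    """phi(s): output of the penultimate (nonlinear) layers."""
    return np.maximum(0.0, W1 @ s + b1)

def q_values(s):
    """Q(s, .) is an exactly linear function of the learned features."""
    return W2 @ features(s) + b2

s = rng.normal(size=STATE_DIM)
phi = features(s)
q = q_values(s)

# Linearity in phi: scaling the features scales (Q - b2) identically,
# which is what lets linear-approximation theory apply to the last layer.
assert np.allclose(W2 @ (2.0 * phi) + b2 - b2, 2.0 * (q - b2))
```

Under this view, conditions on the features that guarantee good linear value function approximation become candidate training objectives for `features(s)` itself, i.e., for the penultimate layer of the network.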
Project Outcomes
- Journal articles: 1
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Policy Caches with Successor Features
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Mark W. Nemecek; R. Parr
- Corresponding authors: Mark W. Nemecek; R. Parr
Other Publications by Ronald Parr
Amazing Things Come From Having Many Good Models
- DOI:
- Published:
- Journal:
- Impact factor: 0
- Authors: Cynthia Rudin; Chudi Zhong; Lesia Semenova; Margo Seltzer; Ronald Parr; Jiachang Liu; Srikar Katta; Jon Donnelly; Harry Chen; Zachery Boner
- Corresponding author: Zachery Boner
An Optimal Tightness Bound for the Simulation Lemma
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Sam Lobel; Ronald Parr
- Corresponding author: Ronald Parr
Other Grants by Ronald Parr
EAGER: Collaborative Research: An Unified Learnable Roadmap for Sequential Decision Making in Relational Domains
- Award Number: 1836575
- Fiscal Year: 2018
- Funding Amount: $500K
- Program Type: Standard Grant
RI: Small: Non-parametric Approximate Dynamic Programming for Continuous Domains
- Award Number: 1218931
- Fiscal Year: 2012
- Funding Amount: $500K
- Program Type: Standard Grant
EAGER: IIS: RI: Learning in Continuous and High Dimensional Action Spaces
- Award Number: 1147641
- Fiscal Year: 2011
- Funding Amount: $500K
- Program Type: Standard Grant
Collaborative: RI: Feature Discovery and Benchmarks for Exportable Reinforcement Learning
- Award Number: 0713435
- Fiscal Year: 2007
- Funding Amount: $500K
- Program Type: Standard Grant
CAREER: Observing to Plan - Planning to Observe
- Award Number: 0546709
- Fiscal Year: 2006
- Funding Amount: $500K
- Program Type: Continuing Grant
Prediction and Planning: Bridging the Gap
- Award Number: 0209088
- Fiscal Year: 2002
- Funding Amount: $500K
- Program Type: Standard Grant
Similar NSFC Grants
Analysis of N-glycosylation modification features of serum IgG in non-small cell lung cancer and their genetic regulatory mechanisms
- Award Number: 82272418
- Year Approved: 2022
- Funding Amount: ¥520,000
- Program Type: General Program
Fractionation characteristics and processes of iron isotopes in soils of typical small karst watersheds
- Award Number:
- Year Approved: 2022
- Funding Amount: ¥300,000
- Program Type: Young Scientists Fund
Radar identification of small targets in airport clearance zones based on multi-frequency fully polarimetric fingerprint features
- Award Number:
- Year Approved: 2022
- Funding Amount: ¥300,000
- Program Type: Young Scientists Fund
Differences in soil preferential-flow characteristics between naturally revegetated and afforested small watersheds on the Loess Plateau and their effects on runoff generation
- Award Number: 42201033
- Year Approved: 2022
- Funding Amount: ¥300,000
- Program Type: Young Scientists Fund
Similar Overseas Grants
tRNA-derived RNA Fragments (tRF) as Prognostic and Diagnostic Biomarkers for Alzheimer’s Disease
- Award Number: 10578546
- Fiscal Year: 2023
- Funding Amount: $500K
- Program Type:
Collaborative Research: SHF: Small: Sub-millisecond Topological Feature Extractor for High-Rate Machine Learning
- Award Number: 2234921
- Fiscal Year: 2023
- Funding Amount: $500K
- Program Type: Standard Grant
Collaborative Research: SHF: Small: Sub-millisecond Topological Feature Extractor for High-Rate Machine Learning
- Award Number: 2234920
- Fiscal Year: 2023
- Funding Amount: $500K
- Program Type: Standard Grant
Collaborative Research: SHF: Small: Sub-millisecond Topological Feature Extractor for High-Rate Machine Learning
- Award Number: 2234919
- Fiscal Year: 2023
- Funding Amount: $500K
- Program Type: Standard Grant
Implementing and Scaling the STEADI Fall Prevention Algorithm Using a Conversational Relational Agent for Community-Dwelling Older Adults with and without Mild Cognitive Impairment (MCI).
- Award Number: 10822816
- Fiscal Year: 2023
- Funding Amount: $500K
- Program Type: