RI: Small: Feature Encoding for Reinforcement Learning
Basic Information
- Award Number: 1815300
- Principal Investigator:
- Amount: $500K
- Host Institution:
- Host Institution Country: United States
- Program Type: Continuing Grant
- Fiscal Year: 2018
- Funding Country: United States
- Project Period: 2018-08-01 to 2023-07-31
- Status: Completed
- Source:
- Keywords:
Project Summary
This project focuses on the subfield of machine learning known as Reinforcement Learning (RL), in which algorithms or robots learn by trial and error. As in many areas of machine learning, there has been a surge of interest in "deep learning" approaches to reinforcement learning, i.e., "Deep RL." Deep learning uses computational models inspired by structures found in the brains of animals. Deep RL has enjoyed some stunning successes, including a recent advance in which a program learned to play the Asian game of Go better than the best human player. Notably, this level of performance was achieved without any human guidance: given only the rules of the game, the program learned by playing against itself. Although games are intriguing and attention-grabbing, this feat was merely a technology demonstration. Firms are seeking to deploy Deep RL methods to increase the efficiency of their operations across a range of applications, such as data center management and robotics. To realize fully the potential of Deep RL, further research is required to make the training process more predictable, reliable, and efficient. Current techniques require massive amounts of training data and computation, and subtle changes in the configuration of the system can cause huge differences in the quality of the results obtained. Thus, even though RL systems can learn autonomously by trial and error, a large amount of human intuition, experience, and experimentation may be required to lay the groundwork for these systems to succeed. This proposal seeks to develop new techniques and theory to make high-quality deep RL results more widely and easily obtainable. In addition, this proposal will provide opportunities for undergraduates to be involved in research through Duke's Data+ initiative.
The proposed research is partly inspired by past work on feature selection and discovery for reinforcement learning, much of which focused primarily on linear value function approximation. Its relevance to deep reinforcement learning is that methods such as Deep Q-learning have a linear final layer. The preceding, nonlinear layers can therefore be interpreted as performing feature discovery for what is ultimately a linear value function approximation process. Sufficient conditions on the features that were specified for successful linear value function approximation in earlier work can now be re-interpreted as an intermediate objective function for the penultimate layer of a deep network. The proposed research aims to achieve the following objectives: 1) develop a theory of feature construction that explains and informs deep reinforcement learning methods; 2) develop improved approaches to value function approximation that are applicable to deep reinforcement learning; 3) develop improved approaches to policy search that are applicable to deep reinforcement learning; 4) develop new algorithms for exploration in reinforcement learning that take advantage of learned feature representations; and 5) perform computational experiments demonstrating the efficacy of the new algorithms on benchmark problems.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
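The observation that a deep Q-network's final layer is linear over the features produced by the preceding layers can be sketched minimally in NumPy. All dimensions, weights, and names below are illustrative placeholders, not details of the project: a tiny one-hidden-layer network stands in for the "feature discovery" layers, and the Q-values are an exactly linear function of those learned features phi(s).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (placeholders, not from the project).
STATE_DIM, HIDDEN_DIM, NUM_ACTIONS = 4, 8, 2

# Nonlinear "feature discovery" layers (here: a single ReLU layer).
W1 = rng.normal(size=(HIDDEN_DIM, STATE_DIM))
b1 = rng.normal(size=HIDDEN_DIM)

# Linear final layer: Q(s, a) = w_a . phi(s) + c_a for each action a.
W2 = rng.normal(size=(NUM_ACTIONS, HIDDEN_DIM))
b2 = rng.normal(size=NUM_ACTIONS)

def features(s):
    """phi(s): output of the penultimate (nonlinear) layers."""
    return np.maximum(0.0, W1 @ s + b1)

def q_values(s):
    """Q(s, .) is an exactly linear function of the learned features."""
    return W2 @ features(s) + b2

s = rng.normal(size=STATE_DIM)
phi = features(s)
q = q_values(s)

# Linearity in phi: scaling the features scales (Q - b2) identically,
# which is what lets linear-approximation theory apply to the last layer.
assert np.allclose(W2 @ (2.0 * phi) + b2 - b2, 2.0 * (q - b2))
```

Under this view, conditions on the features that guarantee good linear value function approximation become candidate training objectives for `features(s)` itself, i.e., for the penultimate layer of the network.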
Project Outcomes
- Journal articles: 1
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Policy Caches with Successor Features
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Mark W. Nemecek; R. Parr
- Corresponding authors: Mark W. Nemecek; R. Parr
Other Publications by Ronald Parr
Amazing Things Come From Having Many Good Models
- DOI:
- Published:
- Journal:
- Impact factor: 0
- Authors: Cynthia Rudin; Chudi Zhong; Lesia Semenova; Margo Seltzer; Ronald Parr; Jiachang Liu; Srikar Katta; Jon Donnelly; Harry Chen; Zachery Boner
- Corresponding author: Zachery Boner
An Optimal Tightness Bound for the Simulation Lemma
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Sam Lobel; Ronald Parr
- Corresponding author: Ronald Parr
Other Grants by Ronald Parr
EAGER: Collaborative Research: An Unified Learnable Roadmap for Sequential Decision Making in Relational Domains
- Award Number: 1836575
- Fiscal Year: 2018
- Funding Amount: $500K
- Program Type: Standard Grant
RI: Small: Non-parametric Approximate Dynamic Programming for Continuous Domains
- Award Number: 1218931
- Fiscal Year: 2012
- Funding Amount: $500K
- Program Type: Standard Grant
EAGER: IIS: RI: Learning in Continuous and High Dimensional Action Spaces
- Award Number: 1147641
- Fiscal Year: 2011
- Funding Amount: $500K
- Program Type: Standard Grant
Collaborative: RI: Feature Discovery and Benchmarks for Exportable Reinforcement Learning
- Award Number: 0713435
- Fiscal Year: 2007
- Funding Amount: $500K
- Program Type: Standard Grant
CAREER: Observing to Plan - Planning to Observe
- Award Number: 0546709
- Fiscal Year: 2006
- Funding Amount: $500K
- Program Type: Continuing Grant
Prediction and Planning: Bridging the Gap
- Award Number: 0209088
- Fiscal Year: 2002
- Funding Amount: $500K
- Program Type: Standard Grant
Similar NSFC Grants
Analysis of N-glycosylation modification features of serum IgG in non-small cell lung cancer and their genetic regulatory mechanisms
- Award Number: 82272418
- Year Approved: 2022
- Funding Amount: ¥520,000
- Program Type: General Program
Fractionation characteristics and processes of iron isotopes in soils of typical small karst watersheds
- Award Number:
- Year Approved: 2022
- Funding Amount: ¥300,000
- Program Type: Young Scientists Fund
Radar identification of small targets in airport clearance zones based on multi-frequency fully polarimetric fingerprint features
- Award Number:
- Year Approved: 2022
- Funding Amount: ¥300,000
- Program Type: Young Scientists Fund
Differences in soil preferential-flow characteristics between naturally revegetated and afforested small watersheds on the Loess Plateau and their effects on runoff generation
- Award Number: 42201033
- Year Approved: 2022
- Funding Amount: ¥300,000
- Program Type: Young Scientists Fund
Similar Overseas Grants
tRNA-derived RNA Fragments (tRF) as Prognostic and Diagnostic Biomarkers for Alzheimer’s Disease
- Award Number: 10578546
- Fiscal Year: 2023
- Funding Amount: $500K
- Program Type:
Collaborative Research: SHF: Small: Sub-millisecond Topological Feature Extractor for High-Rate Machine Learning
- Award Number: 2234921
- Fiscal Year: 2023
- Funding Amount: $500K
- Program Type: Standard Grant
Collaborative Research: SHF: Small: Sub-millisecond Topological Feature Extractor for High-Rate Machine Learning
- Award Number: 2234920
- Fiscal Year: 2023
- Funding Amount: $500K
- Program Type: Standard Grant
Collaborative Research: SHF: Small: Sub-millisecond Topological Feature Extractor for High-Rate Machine Learning
- Award Number: 2234919
- Fiscal Year: 2023
- Funding Amount: $500K
- Program Type: Standard Grant
Implementing and Scaling the STEADI Fall Prevention Algorithm Using a Conversational Relational Agent for Community-Dwelling Older Adults with and without Mild Cognitive Impairment (MCI).
- Award Number: 10822816
- Fiscal Year: 2023
- Funding Amount: $500K
- Program Type: