EAGER: Training A Mobile Robot from Human Feedback via Income Learning

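Basic Information

Award number:
1643413
Principal investigator:
Michael Littman
Amount:
$70 thousand
Host institution:
Brown University
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2016
Funding country:
United States
Start and end dates:
2016-08-01 to 2018-07-31
Project status:
Completed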

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1643413&HistoricalAwards=false
Keywords:
EAGER Training Mobile Robot Human

Project Abstract

As cyberphysical systems become more widespread, there is an increasing number of complex tasks that they can usefully perform to assist human users. Tasks are typically formalized in the sequential decision framework, where the learner perceives states, takes actions, and receives a reward feedback signal. In practice, there is a critical need to learn directly from human users if such machines are to accomplish tasks outside of those pre-specified by the original developers. This project will develop new algorithms that can learn more effectively from humans. We will evaluate these algorithms both in virtual agents and on robot platforms. We will investigate whether and how non-expert humans can construct sequences of tasks of increasing difficulty, similar to how expert animal trainers shape tasks. Insights from these user studies will be leveraged to further improve our algorithms' abilities to learn from human trainers. If successful, this project will make critical progress towards allowing non-technical users to teach virtual and physical agents to perform complex tasks in a natural setting, familiar to many from previous experience in training household pets.

This project is part of a larger effort among Washington State University (WSU), North Carolina State University, and Brown University. The Brown effort will focus on deriving a well-motivated learning algorithm (tentatively called "I-learning") and understanding its theoretical properties. Of particular interest is the behavior of these algorithms in settings that are well studied in the reinforcement-learning community, such as Markov decision processes, k-armed bandits, and learning with function approximation. Algorithms will be implemented and tested on virtual and physical platforms (robots), and broader impacts on education and control will be pursued.

Project Outcomes

Journal articles (13)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Showing versus doing: Teaching by demonstration

DOI:
Published:
2016
Journal:
NeurIPS
Impact factor:
0
Authors:
Ho, M. K.;Littman, M. L.;MacGlashan, J.;Cushman, F.;Austerweil, J. L.
Corresponding author:
Austerweil, J. L.

Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning

DOI:
10.1007/s10458-015-9283-7
Published:
2015
Journal:
Autonomous Agents and Multi-Agent Systems
Impact factor:
1.9
Authors:
R. Loftin;Bei Peng;J. MacGlashan;M. Littman;Matthew E. Taylor;Jeff Huang;D. Roberts
Corresponding authors:
R. Loftin;Bei Peng;J. MacGlashan;M. Littman;Matthew E. Taylor;Jeff Huang;D. Roberts

Curriculum Design for Machine Learners in Sequential Decision Tasks

DOI:
10.1109/tetci.2018.2829980
Published:
2017-05
Journal:
IEEE Transactions on Emerging Topics in Computational Intelligence
Impact factor:
5.3
Authors:
Bei Peng;J. MacGlashan;R. Loftin;M. Littman;David L. Roberts;Matthew E. Taylor
Corresponding authors:
Bei Peng;J. MacGlashan;R. Loftin;M. Littman;David L. Roberts;Matthew E. Taylor

Teaching by Intervention: Working Backwards, Undoing Mistakes, or Correcting Mistakes?

DOI:
Published:
2017
Journal:
Proceedings of the Cognitive Science Conference
Impact factor:
0
Authors:
Ho, M. K.;Austerweil, J. L.
Corresponding author:
Austerweil, J. L.

Interactive Learning from Policy-Dependent Human Feedback

DOI:
Published:
2017-01
Journal:
ArXiv
Impact factor:
0
Authors:
J. MacGlashan;Mark K. Ho;R. Loftin;Bei Peng;Guan Wang;David L. Roberts;Matthew E. Taylor;M. Littman
Corresponding authors:
J. MacGlashan;Mark K. Ho;R. Loftin;Bei Peng;Guan Wang;David L. Roberts;Matthew E. Taylor;M. Littman

Other publications by Michael Littman

Model-based reasoning

DOI:
10.1016/j.compedu.2012.11.014
Published:
2013
Journal:
Comput. Educ.
Impact factor:
0
Authors:
Michael Jackson;Janusz Wojtusiak;Dayne Freitag;Eugene Subbotsky;Hans M. Nordahl;Jens C. Thimm;John Burgoyne;Roberto Poli;Thomas R. Guskey;Michael Davison;J. Magnotti;Adam M. Goodman;Jeffrey S. Katz;L. Verschaffel;W. Dooren;B. Smedt;Sean A. Fulop;Melva R. Grant;Leonid I. Perlovsky;B. De Smedt;P. Ghesquière;Dariusz Plewczynski;Leily Ziglari;P. Birjandi;Scott Rick;Roberto Weber;N. Seel;Maike Luhmann;Michael Eid;A. Antonietti;Barbara Colombo;Hamish Coates;Ali Radloff;P. Pirnay;Dirk Ifenthaler;Edward Swing;Craig A Anderson;David Tzuriel;Norman M. Weinberger;David C. Riccio;Patrick K. Cullen;J. Tallet;Megan L. Hoffman;David A. Washburn;Iván Izquierdo;Jorge H. Medina;M. Cammarota;A. Podolskiy;Joke Torbeyns;J. Kranzler;P. A. Kirschner;F. Kirschner;Kenn Apel;Julie A. Wolter;J. Masterson;JungMi Lee;Stefan N Groesser;Sabine Al;Philip Barker;Paul Schaik;I. Cutica;Monica Bucciarelli;K. Pata;Anna Strasser;A. Guillot;N. Hoyek;Christian Collet;Maria Opfermann;Roger Azevedo;Detlev Leutner;Thomas C. Toppino;Alice Y. Kolb;David A. Kolb;P. Brazdil;Ricardo Vilalta;Carlos Soares;C. Giraud;Jeffrey W. Bloom;Tyler Volk;Marwan A. Dwairy;Richard A. Swanson;Johanna Pöysä;K. Luwel;Theo Hug;Angélique Martin;Nicolas Guéguen;Craig Hassed;Fabio Alivernini;Michael Herczeg;M. Mastropieri;T. Scruggs;Angelika Rieder;S. Castillo;Gerardo Ayala;R. Low;R. Babuška;Barbara C. Buckley;Henry Markovits;Sungho Kim;In;Michael J. Spector;A. Towse;Charlie N. Lewis;Brian Francis;David N. Rapp;Pratim Sengupta;Sidney D’Mello;Serge Brand;J. Patry;Cees Klaassen;Sieglinde Weyringer;Alfred Weinberger;Marilla D. Svinicki;Jane S. Vogler;Andrew J. Martin;John M. Keller;ChanMin Kim;Gabriele Wulf;Lynne E. Parker;Michael Wunder;Michael Littman;Lisa J. Lehmberg;C. Victor Fung;Hannele Niemi;Steven Reiss;Piet Desmet;F. Cornillie;Helmut M. Niegemann;Steffi Heidig;Dominic W. Massaro;Charles Fadel;Cheryl Lemke;R. Grabner;Michael D. Basil;Daniel R. Little;Stephan Lewandowsky;Parmjit Singh;Zheng Liu;Marcelo H. Ang;W. Seah;Jack Heller;C. Randles;Kenneth S. Aigen
Corresponding author:
Kenneth S. Aigen