CAREER: Learning from demonstrations and beyond -- consolidating imitation and reinforcement learning

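Basic information

  • Award number:
    2238979
  • Principal investigator:
  • Amount:
    $584,500
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Continuing Grant
  • Fiscal year:
    2023
  • Funding country:
    United States
  • Period:
    2023-06-01 to 2028-05-31
  • Project status:
    Ongoing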

Project abstract

Recent advancements in deep reinforcement learning (RL) hold unprecedented potential for automating and optimizing control of real-world tasks such as autonomous driving, traffic management, medical procedures, robotic manufacturing, and energy management. Unfortunately, it is common for RL algorithms to exhibit unstable and/or inefficient learning, which limits their applicability. Seeking to address this critical concern, this CAREER project leverages imitation learning (IL), or behavior copying, which is better understood and typically more stable. The project targets the unification of IL and RL into a holistic paradigm that can safely and effectively learn from, and outperform, existing solutions. This project will address outstanding knowledge gaps in both types of learning through a novel curriculum decomposition of the tasks, where simplified demonstrations are used to bootstrap the learner's behavior. The project will also foster education and outreach activities. Specifically, it will enhance undergraduate STEM training by providing students with exposure to scientific research and knowledge discovery processes relating to safety-critical AI applications through an original multidisciplinary undergraduate engineering program. Moreover, it will facilitate a unique K-12 outreach activity within a large minority (Hispanic/Latino) community in Bryan, TX. The project will support and advance an existing research collaboration with an industrial partner in the context of defense technology. This collaboration, in turn, is expected to advance the US national defense.

This project will form the basis for a new research thrust in ML, one that combines IL and RL toward a holistic, robust, and safe learning framework. It will define and prove a no-regret bound on the training process within the Markov decision process (MDP) formalization. The approach is to reduce an IL problem to an RL one that includes a domain-independent curriculum-learning trajectory. The resulting algorithms and solutions are expected to achieve state-of-the-art performance in complex control domains as well as to deepen theoretical understanding of the potential and limitations of the resulting solutions. Specifically, the research seeks to prove conditions guaranteeing policy convergence and monotonic improvement during training. Moreover, the project will develop domain-specific adaptation to and analysis of real-world applications (autonomous driving and robotics testbeds) while providing stable and efficient RL from demonstrations.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project outputs

Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Comparison between popular Genetic Algorithm (GA)-based tool and Covariance Matrix Adaptation - Evolutionary Strategy (CMA-ES) for optimizing indoor daylight
The (Un)Scalability of Informed Heuristic Function Estimation in NP-Hard Search Problems
  • DOI:
  • Year published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Sumedh Pendurkar; Taoan Huang; Brendan Juba; Jiapeng Zhang; Sven Koenig; Guni Sharon
  • Corresponding author:
    Guni Sharon
Bilevel Entropy based Mechanism Design for Balancing Meta in Video Games
  • DOI:
    10.5555/3545946.3598887
  • Year published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Sumedh Pendurkar; Chris Chow; Luo Jie; Guni Sharon
  • Corresponding author:
    Sumedh Pendurkar; Chris Chow; Luo Jie; Guni Sharon
Task Phasing: Automated Curriculum Learning from Demonstrations

Other publications by Guni Sharon

Curriculum Generation for Learning Guiding Functions in State-Space Search Algorithms
Technical Report: Hybrid Autonomous Intersection Management
  • DOI:
    10.48550/arxiv.2204.07704
  • Year published:
    2022
  • Journal:
  • Impact factor:
    0
  • Authors:
    Aaron Parks; Guni Sharon
  • Corresponding author:
    Guni Sharon
Traffic Optimization For a Mixture of Self-interested and Compliant Agents
An Assessment of Autonomous Vehicles: Traffic Impacts and Infrastructure Needs: Final Report
  • DOI:
  • Year published:
    2017
  • Journal:
  • Impact factor:
    0
  • Authors:
    K. Kockelman; S. Boyles; P. Stone; Daniel J. Fagnant; Rahul Patel; M. Levin; Guni Sharon; M. Simoni; Michael Albert; Hagen Fritz; Rebecca Hutchinson; P. Bansal; Gleb B. Domnenko; P. Bujanovic; Bumsik Kim; Elham Pourrahmani; Sudesh Agrawal; Tianxin Li; Josiah P. Hanna; Aqshems Nichols; Jia Li
  • Corresponding author:
    Jia Li
Socially Optimal Non-discriminatory Restrictions for Continuous-Action Games

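Similar NSFC (National Natural Science Foundation of China) grants

Laser absorption spectroscopy tomography based on progressive sparse modeling and deep learning
  • Award number:
    62371415
  • Year approved:
    2023
  • Amount:
    CNY 500,000
  • Award type:
    General Program
Research on developing innovative high-accuracy prediction models for urban wind speed and pollutant dispersion using deep learning methods
  • Award number:
    42375193
  • Year approved:
    2023
  • Amount:
    CNY 510,000
  • Award type:
    General Program
Theory of inverse problems for medical image quality transfer based on self-supervised learning
  • Award number:
    12301546
  • Year approved:
    2023
  • Amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Research on urban electric vehicle charging and driving behavior and on coordinated control strategies for transportation and power distribution networks, based on stigmergic learning
  • Award number:
    62363022
  • Year approved:
    2023
  • Amount:
    CNY 320,000
  • Award type:
    Regional Science Fund
Research on driving behavior recognition based on multi-domain EEG signal features and deep learning
  • Award number:
    62366028
  • Year approved:
    2023
  • Amount:
    CNY 330,000
  • Award type:
    Regional Science Fund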

Similar overseas grants

Collaborative Research: RI: Medium: Superhuman Imitation Learning from Heterogeneous Demonstrations
  • Award number:
    2312955
  • Fiscal year:
    2023
  • Amount:
    $584,500
  • Award type:
    Standard Grant
FMitF: Track I: Program Synthesis for Robot Learning from Demonstrations
  • Award number:
    2319471
  • Fiscal year:
    2023
  • Amount:
    $584,500
  • Award type:
    Standard Grant
Collaborative Research: RI: Medium: Superhuman Imitation Learning from Heterogeneous Demonstrations
  • Award number:
    2312956
  • Fiscal year:
    2023
  • Amount:
    $584,500
  • Award type:
    Standard Grant
Learning robot navigation and manipulation from demonstrations
  • Award number:
    2601734
  • Fiscal year:
    2021
  • Amount:
    $584,500
  • Award type:
    Studentship
NRI: FND: Robust Learning of Sequential Motion from Human Demonstrations to Enable Robot-Guided Exercise Training
  • Award number:
    1830597
  • Fiscal year:
    2019
  • Amount:
    $584,500
  • Award type:
    Standard Grant