CPS: Medium: Sufficient Statistics for Learning Multi-Agent Interactions

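Basic Information

Award Number:
2125511
Principal Investigator:
Dorsa Sadigh
Amount:
$1.1142 million
Institution:
Stanford University
Institution Country:
United States
Award Type:
Standard Grant
Fiscal Year:
2021
Funding Country:
United States
Start/End Date:
2021-09-15 to 2025-08-31
Project Status:
Ongoing (not yet concluded)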

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2125511&HistoricalAwards=false
Keywords:
CPS Medium Sufficient Statistics Learning

Project Abstract

Multi-agent coordination and collaboration is a core challenge for future cyber-physical systems as they begin to interact in more complex ways with each other and with humans in homes and cities. A key challenge is that agents must be able to reason about and learn the behavior of other agents in order to make decisions. This is particularly difficult because state-of-the-art approaches, such as recursive belief modeling over partner policies, often do not scale. Humans, however, coordinate and collaborate with each other very effectively without any expensive recursive belief modeling. One hypothesis is that humans effectively capture the sufficient representations required for coordinating on tasks. Similarly, agents in a multi-agent setting can look for the sufficient statistics needed for coordination and collaboration. This project is about learning and approximating such sufficient statistics to enable effective collaboration and coordination. In addition, the investigators will study teaching and learning in settings where agents have only partial observations of the world and must teach and learn from each other to achieve a collaborative task. Prominent successful demonstrations of reinforcement learning for single agents have spurred efforts to determine whether such methods can extend to multiple agents. There have also been notable developments in multi-agent systems, both in understanding the structure of the resulting interaction dynamics and in developing practical reinforcement learning algorithms.
The core objectives of this project are: 1) the development of learning methods that approximate the well-known concept of sufficient statistics in multi-agent interactions; 2) the development of a reinforcement learning algorithm that leverages representations of sufficient statistics for more effective planning, coordination, and collaboration in multi-agent settings; and 3) the development of algorithms that use representations of sufficient statistics to enable teaching and learning in multi-agent settings under partial observability of the environment. The overall outcome of this project will be a new formalism, along with algorithms, tools, and techniques that enhance multi-agent learning and control. The investigators will ground this work in two main applications: 1) collaborative search and exploration and 2) collaborative transport of objects.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
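As a minimal textbook illustration of the term "sufficient statistic" (not the project's method, which the abstract does not specify), assume a partner whose actions are drawn independently from a fixed categorical policy. Under that assumption, the per-action counts are a sufficient statistic for the policy: the likelihood of any hypothesized policy depends on the interaction history only through the counts, so an agent can discard the ordering of observations without losing information about the partner. All function and variable names below are hypothetical.

```python
from collections import Counter
import math


def action_counts(history, num_actions):
    """Compress an observed partner-action history into per-action counts."""
    counts = Counter(history)
    return [counts.get(a, 0) for a in range(num_actions)]


def log_likelihood_from_history(history, policy):
    """Log-likelihood of an i.i.d. action history under a categorical policy."""
    return sum(math.log(policy[a]) for a in history)


def log_likelihood_from_counts(counts, policy):
    """The same log-likelihood, computed from the counts alone."""
    return sum(n * math.log(p) for n, p in zip(counts, policy) if n > 0)


def mle_policy(counts):
    """Maximum-likelihood estimate of the partner's categorical policy."""
    total = sum(counts)
    return [n / total for n in counts]


# Two different orderings of the same observed partner actions.
h1 = [0, 1, 1, 2]
h2 = [1, 2, 0, 1]
c1 = action_counts(h1, num_actions=3)
c2 = action_counts(h2, num_actions=3)
print(c1, c2)  # [1, 2, 1] [1, 2, 1]

candidate = [0.2, 0.5, 0.3]  # an arbitrary hypothesized partner policy
# Likelihood from the full history matches likelihood from the counts.
print(math.isclose(log_likelihood_from_history(h1, candidate),
                   log_likelihood_from_counts(c1, candidate)))  # True
print(mle_policy(c1))  # [0.25, 0.5, 0.25]
```

The counts stop being sufficient once the partner's actions depend on history or context; in that case a different, possibly learned, summary of the interaction would be required.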

Project Outcomes

Journal Articles (6)

Monographs (0)

Research Awards (0)

Conference Papers (0)

Patents (0)

Conditional Imitation Learning for Multi-Agent Games

DOI:
10.1109/hri53351.2022.9889671
Published:
2022-01
Venue:
2022 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI)
Impact Factor:
0
Authors:
Andy Shih;Stefano Ermon;Dorsa Sadigh
Corresponding Author(s):
Andy Shih;Stefano Ermon;Dorsa Sadigh

Partner-Aware Algorithms in Decentralized Cooperative Bandit Teams

DOI:
Published:
2022
Venue:
Proceedings of the 36th AAAI Conference on Artificial Intelligence
Impact Factor:
0
Authors:
Erdem Bıyık;Anusha Lalitha
Corresponding Author(s):
Erdem Bıyık;Anusha Lalitha

Reward Design with Language Models

DOI:
10.48550/arxiv.2303.00001
Published:
2023-02
Venue:
ArXiv
Impact Factor:
0
Authors:
Minae Kwon;Sang Michael Xie;Kalesha Bullard;Dorsa Sadigh
Corresponding Author(s):
Minae Kwon;Sang Michael Xie;Kalesha Bullard;Dorsa Sadigh

Leveraging Smooth Attention Prior for Multi-Agent Trajectory Prediction

DOI:
10.48550/arxiv.2203.04421
Published:
2022-03
Venue:
2022 International Conference on Robotics and Automation (ICRA)
Impact Factor:
0
Authors:
Zhangjie Cao;Erdem Biyik;G. Rosman;Dorsa Sadigh
Corresponding Author(s):
Zhangjie Cao;Erdem Biyik;G. Rosman;Dorsa Sadigh

Influencing Towards Stable Multi-Agent Interactions

DOI:
Published:
2021-10
Venue:
ArXiv
Impact Factor:
0
Authors:
Woodrow Z. Wang;Andy Shih;Annie Xie;Dorsa Sadigh
Corresponding Author(s):
Woodrow Z. Wang;Andy Shih;Annie Xie;Dorsa Sadigh

Other Publications by Dorsa Sadigh

[Title not recovered; the extracted text consists of figure labels: repeated interactions, convention dependence (high/low), rule representation, convention representation, 4-player chess, Friendly Rock Paper Scissors]

DOI:
Published:
2021
Venue:
Impact Factor:
0
Authors:
Andy Shih;Arjun Sawhney;J. Kondic;Stefano Ermon;Dorsa Sadigh
Corresponding Author(s):
Dorsa Sadigh

Altruistic Autonomy: Beating Congestion on Shared Roads

DOI:
Published:
2018
Venue:
Workshop on the Algorithmic Foundations of Robotics
Impact Factor:
0
Authors:
Erdem Biyik;Daniel A. Lazar;Ramtin Pedarsani;Dorsa Sadigh
Corresponding Author(s):
Dorsa Sadigh

Shared Autonomy for Robotic Manipulation with Language Corrections

DOI:
Published:
2022
Venue:
Impact Factor:
0
Authors:
Siddharth Karamcheti;Raj Palleti;Yuchen Cui;Percy Liang;Dorsa Sadigh
Corresponding Author(s):
Dorsa Sadigh

Deep Local Trajectory Replanning and Control for Robot Navigation

DOI:
Published:
2019
Venue:
IEEE International Conference on Robotics and Automation
Impact Factor:
0
Authors:
Ashwini Pokle;Roberto Martín;P. Goebel;Vincent Chow;H. Ewald;Junwei Yang;Zhenkai Wang;Amir Sadeghian;Dorsa Sadigh;S. Savarese;Marynel Vázquez
Corresponding Author(s):
Marynel Vázquez