Collaborative Research: HCC: Medium: Aligning Robot Representations with Humans

Basic Information

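Award number: 2310757
Principal investigator: Anca Dragan
Amount: $420,500
Host institution: University of California-Berkeley
Host institution country: United States
Award type: Standard Grant
Fiscal year: 2023
Funding country: United States
Project period: 2023-08-15 to 2026-07-31
Project status: Ongoing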

Source: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2310757&HistoricalAwards=false
Keywords: Collaborative Research HCC Medium Aligning

Project Abstract

This project seeks to make robots more robust and aligned with human preferences and values. Traditionally, robot behaviors and objectives were trained to include a set of hand-crafted features (i.e., variables represented in the data) that reflect task-relevant aspects of the environment. Using well-chosen features is very data-efficient, but it is unrealistic for human engineers to identify and write code ahead of time for all the features that could matter. Training modern high-capacity models from a lot of data is a great alternative, as long as we do not probe the learned models on novel (out-of-distribution) inputs. The reason these models fail to generalize to out-of-distribution inputs is that they will generally fail to learn the correct representation, comprising the features that matter, and instead pick up on spurious patterns in the data. The central goal of this project is to enable robots to arrive at the underlying correct representation for objectives (and, hence, behaviors). And since learning the objective function---what the human user wants---is fundamentally about humans, this work proposes that only the human can determine what actually matters vs. what is spurious. The research will introduce the problem of aligning robot representations to humans. The key observation behind the project is that traditional input used in learning, such as demonstrations or comparisons, which is designed to teach the robot the full task, is not ideal for aligning the robot’s representation. With representation alignment defined as a problem, there is the opportunity to design new types of human feedback that help the robot explicitly isolate the right representation. The project will develop new types of human feedback and algorithms for efficiently learning from them to arrive at an aligned representation. 
Preliminary work leveraged this observation to introduce feature traces---a novel type of human input through which users can teach the robot about specific features they care about. The project will pursue four objectives that together tackle the aspects of aligning robot representations with humans: (1) Teaching one feature at a time, beyond feature traces: It will investigate new input types for aligning robot representations with users, contribute active learning algorithms that help the human teacher provide the most informative input, and build transparency tools that enable robots to teach back to the user their current understanding of the representation. (2) Extracting features all at once from new, representation-specific human input: It will investigate new human input types that teach the full representation all at once by combining self-supervised representation learning methods with human-centric representation learning. (3) Using a correct representation in the right way: Given a new task, the robot needs to learn which features matter and in which contexts. (4) Extending earlier work to policy learning: It will extend new tools to the policy learning setting and use the lens of human-aligned representations to enable better policy generalization to new users and to improve goal mis-generalization in reinforcement learning. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal papers (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

No data available

Data last updated: 2024-06-01

Other publications by Anca Dragan

Learning Temporal Distances: Contrastive Successor Features Can Provide a Metric Structure for Decision-Making

DOI:
Publication year: 2024
Journal:
Impact factor: 0
Authors: Vivek Myers; Chongyi Zheng; Anca Dragan; Sergey Levine; Benjamin Eysenbach
Corresponding author: Benjamin Eysenbach

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning

DOI: 10.48550/arxiv.2402.17747
Publication year: 2024
Journal: arXiv
Impact factor: 0
Authors: Leon Lang; Davis Foote; Stuart J. Russell; Anca Dragan; Erik Jenner; Scott Emmons
Corresponding author: Scott Emmons

Adversaries Can Misuse Combinations of Safe Models

DOI:
Publication year: 2024
Journal:
Impact factor: 0
Authors: Erik Jones; Anca Dragan; Jacob Steinhardt
Corresponding author: Jacob Steinhardt