S&AS: FND: COLLAB: Learning from Stories: Practical Value Alignment and Taskability for Autonomous Systems

Basic Information

Award number:
1849262
Principal investigator:
Mark Riedl
Amount:
Approx. $308,700
Host institution:
Georgia Tech Research Corporation
Host institution country:
United States
Project type:
Standard Grant
Fiscal year:
2019
Funding country:
United States
Start and end dates:
2019-06-01 to 2023-05-31
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1849262&HistoricalAwards=false
Keywords:
S&AS FND COLLAB Learning Stories

Project Abstract

In the near future we are likely to see increasingly capable autonomous systems operating in proximity to humans and immersed in society. As these systems become more sophisticated, they will interact increasingly with humans. With this increased human-agent interaction comes an increased obligation to ensure that autonomous systems do not cause even unintentional harm to a human. Creating systems that cannot intentionally or unintentionally harm humans is not an easy task. This is because there are infinitely many undesirable outcomes that can be achieved in an open world, making it impossible to instruct these systems to avoid each one. If the desired behavior cannot be directly specified, then it must be learned. Past approaches to learning these types of behaviors have focused on learning from human examples, but these methods are unlikely to scale. This research uses natural language explanations of behavior as a scalable alternative for training autonomous agents for safe operation. Naturalistic descriptions contain vast amounts of information about sociocultural norms, which makes them rich sources for such training. Enabling systems to better understand and learn from such descriptions will enable human operators to more naturally specify goals or tasks for the agent to complete.

This research explores the concept of learning via natural language descriptions of desired behavior. This technique uses procedural knowledge contained in natural language explanations to help train autonomous agents. Concretely, this approach learns utility functions that can be used to guide autonomous agents towards behaviors that are aligned with the description used for training. To accomplish this, researchers will create computational models capable of extracting both knowledge about sociocultural norms and procedural knowledge from naturally occurring corpora. These models will then be used to create behavior policies that are both aligned with sociocultural norms and procedurally plausible. To further ensure that these models can be practically deployed, researchers will enable their models to incorporate a "human in the loop" to provide online feedback about the quality of these learned behavior policies in terms of their social acceptability and appropriateness. Safeguards will also be investigated to protect the learned behavior policies against the effects of adversarial or malicious training examples.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
