RI: Small: Improving Crowd-Sourced Annotation by Autonomous Intelligent Agents
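Basic information
- Award number: 1420667
- Principal investigator:
- Amount: $460,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2014
- Funding country: United States
- Start and end dates: 2014-08-01 to 2018-07-31
- Project status: Completed
- Source:
- Keywords:
Project abstract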
Supervised machine learning methods are arguably the greatest success story for Artificial Intelligence, with a deep underlying theory and applications ranging from medical diagnosis and scientific data analysis to e-commerce recommender systems and credit-card fraud detection. Unfortunately, all these methods require labeled training data that has been annotated by a human, a time-consuming and extremely expensive process. This project will use automated decision theory to control the annotation process, saving significant amounts of human labor and extending the practical use of machine learning to a much broader array of societal problems. Specifically, the methods address the case where labeled data is crowd-sourced by a large number of human annotators whose skill and error rates are variable. The project develops new control algorithms that let the learner efficiently ask specific workers to label (or redundantly re-label) specific examples. To test the practicality of their methods, the PIs build and conduct studies with the Information Omnivore, a fully autonomous agent that optimizes the annotation of natural language processing (NLP) training data. By continuously posing questions to paid workers and volunteer citizen-scientists, the Omnivore 1) will learn which problems are hard and which are easy, 2) will learn about the skills of the various workers, and 3) will decide which questions to ask which workers in order to maximize the accuracy of the learned model given scarce human help. Besides contributing to the science of automated control, the Omnivore will generate labeled training data for two important NLP problems: named entity linking (NEL) and information extraction (IE), greatly helping the community of NLP researchers. Furthermore, the researchers plan a number of outreach efforts, including curriculum development, participation in the K12 Paws on Science program at the Pacific Science Center, and interaction with the diverse students comprising the Washington STate Academic RedShirt (STARS) in Engineering program.
The specific algorithms proposed by the PIs are notable in several respects. Their decision-theoretic optimization framework operationalizes intuitions like (1) one should assign more or better workers to hard problems and (2) one should redirect effort away from easy questions or from tasks that are too hard to solve. Automating this reasoning is hard because problem difficulty and worker skill are latent variables, and thus the agent must confront an exploration/exploitation tradeoff as it balances actions that enable it to learn about the capabilities of workers with the ultimate goal of producing quality annotations. The PIs consider two cases. Task Allocation for Annotation Accuracy tries to maximize the overall annotation accuracy of a fixed-size data set through batch assignment of workers to tasks. Re-Active Learning seeks instead to directly construct an accurate ML classifier through a balanced mix of annotator requests to re-label old examples or label new ones. In both cases they propose a model based on decision-theoretic methods (e.g., partially-observable Markov decision processes (POMDPs) and multi-armed bandits). The PIs propose to integrate their methods in the Information Omnivore, a long-lived software agent that integrates planning and execution, acts in the real world, and learns a model of its environment. The Omnivore will allow large-scale latitudinal studies of their algorithms, and as a byproduct will generate NLP training data that will greatly assist a large community of other researchers.
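The exploration/exploitation tradeoff mentioned in the abstract can be illustrated with a small multi-armed bandit. The following is only a sketch under stated assumptions, not the PIs' algorithm: it uses Thompson sampling with a Beta posterior over each worker's accuracy, and it assumes the agent can observe whether each answer was correct (for example through gold questions), which the abstract does not specify. The names thompson_pick and simulate, and the worker accuracies used, are hypothetical.

```python
import random


def thompson_pick(posteriors, rng):
    """Sample an accuracy for each worker from its Beta posterior
    and return the worker with the highest sampled value."""
    samples = {w: rng.betavariate(a, b) for w, (a, b) in posteriors.items()}
    return max(samples, key=samples.get)


def simulate(true_accuracy, rounds=2000, seed=0):
    """Repeatedly route one question to a worker chosen by Thompson sampling.

    true_accuracy maps a (hypothetical) worker id to the probability that
    the worker answers correctly. It is hidden from the agent and used only
    to simulate answers. Assumption: correctness is observed after each answer.
    """
    rng = random.Random(seed)
    # Beta(1, 1) prior: [correct + 1, incorrect + 1] per worker.
    posteriors = {w: [1, 1] for w in true_accuracy}
    counts = {w: 0 for w in true_accuracy}
    for _ in range(rounds):
        worker = thompson_pick(posteriors, rng)
        correct = rng.random() < true_accuracy[worker]
        posteriors[worker][0 if correct else 1] += 1
        counts[worker] += 1
    return counts, posteriors


if __name__ == "__main__":
    counts, posteriors = simulate({"w1": 0.6, "w2": 0.75, "w3": 0.9})
    for worker in sorted(counts):
        a, b = posteriors[worker]
        print(worker, "questions:", counts[worker],
              "estimated accuracy:", round(a / (a + b), 3))
```

Early rounds spread questions across all workers because the posteriors are wide; as evidence accumulates, most questions go to the worker with the highest estimated accuracy. A fuller treatment along the lines the abstract describes would also model per-question difficulty as a latent variable, which this sketch omits.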
Project outcomes
Journal papers (8)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Active Learning with Unbalanced Classes & Example-Generated Queries
- DOI:
- Publication date: 2018-01
- Journal:
- Impact factor: 0
- Authors: C. Lin; Mausam
- Corresponding author: Mausam
Intelligible Artificial Intelligence
- DOI:
- Publication date: 2018-01
- Journal:
- Impact factor: 0
- Authors: D.S. Weld; G. Bansal
- Corresponding author: G. Bansal
Semi-Supervised Event Extraction with Paraphrase Clusters
- DOI: 10.18653/v1/n18-2058
- Publication date: 2018-06-01
- Journal:
- Impact factor: 0
- Authors: James Ferguson; Colin Lockard; Daniel S. Weld; Hannaneh Hajishirzi
- Corresponding author: Hannaneh Hajishirzi
Sprout: Crowd-Powered Task Design for Crowdsourcing
- DOI:
- Publication date: 2018-01
- Journal:
- Impact factor: 0
- Authors: J. Bragg; Mausam
- Corresponding author: Mausam
Self-Improving Crowdsourcing: Near-Effortless Design of Adaptive Distributed Work
- DOI: 10.1016/j.chbr.2021.100077
- Publication date: 2018-11-28
- Journal:
- Impact factor: 3.9
- Authors: Jonathan Bragg
- Corresponding author: Jonathan Bragg
Other grants by Daniel Weld
CCRI: Research Infrastructure: NEW: Semantic Scholar Open Data Platform: Enabling Research Into Scientific Search and Discovery
- Award number: 2213656
- Fiscal year: 2022
- Funding amount: $460,000
- Project type: Standard Grant
RAPID: Augmented Intelligence for Accelerating Covid-Related Scientific Discovery
- Award number: 2040196
- Fiscal year: 2020
- Funding amount: $460,000
- Project type: Standard Grant
RI: Small: Integrating Paradigms for Approximate Stochastic Planning
- Award number: 1016465
- Fiscal year: 2010
- Funding amount: $460,000
- Project type: Standard Grant
RI: Small: Decision-Theoretic Control of Crowd-Sourced Workflows
- Award number: 1016713
- Fiscal year: 2010
- Funding amount: $460,000
- Project type: Standard Grant
Supporting Students Attending IUI 2009 Conference
- Award number: 0914591
- Fiscal year: 2009
- Funding amount: $460,000
- Project type: Standard Grant
Representation and Reasoning about Adaptive Interfaces
- Award number: 0307906
- Fiscal year: 2003
- Funding amount: $460,000
- Project type: Continuing Grant
Extending Graphplan to Handle Uncertainty and Sensing Actions
- Award number: 9872128
- Fiscal year: 1998
- Funding amount: $460,000
- Project type: Standard Grant
Principled Planning with Simultaneous Actions, Metric Time and Continuous Effects
- Award number: 9303461
- Fiscal year: 1994
- Funding amount: $460,000
- Project type: Continuing Grant
Managing Complexity in Qualitative Physics
- Award number: 8902010
- Fiscal year: 1989
- Funding amount: $460,000
- Project type: Standard Grant
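Similar NSFC (National Natural Science Foundation of China) grants
Translatomics-based study of the mechanism by which the LncRNA H19-encoded peptide PELRM promotes microglial activation to mediate relief of postoperative knee pain by contralateral (Juci) electroacupuncture
- Award number: 82305399
- Approval year: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund
Mechanism by which CB2R-β-arrestin1 inhibits microglial metabolic reprogramming to regulate neuroinflammation in ameliorating POCD
- Award number: 82360227
- Approval year: 2023
- Funding amount: CNY 322,000
- Project type: Regional Science Fund
Mechanism by which ergothioneine ameliorates high-fat-diet-induced impairment of microglial phagocytosis via the Nrf-2-CD36 pathway
- Award number: 32372326
- Approval year: 2023
- Funding amount: CNY 500,000
- Project type: General Program
Mechanism by which Xiaoyao San treats depression by regulating microglial phagocytosis via IDO1 to improve neuroplasticity
- Award number: 82305176
- Approval year: 2023
- Funding amount: CNY 300,000
- Project type: Young Scientists Fund
Mechanism by which Xiaoyao San ameliorates early-stress-induced depression by regulating fatty-acid-metabolism-mediated microglial "sensitization"
- Award number: 82304990
- Approval year: 2023
- Funding amount: CNY 200,000
- Project type: Young Scientists Fund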
Similar overseas grants
SHF: Small: Improving Efficiency of Vision Transformers via Software-Hardware Co-Design and Acceleration
- Award number: 2233893
- Fiscal year: 2023
- Funding amount: $460,000
- Project type: Standard Grant
Improving Diagnosis in Gastrointestinal Cancer: Integrating Prediction Models into Routine Clinical Care
- Award number: 10641060
- Fiscal year: 2023
- Funding amount: $460,000
- Project type:
通过调节肿瘤炎症提高胶质瘤免疫治疗效果
- 批准号:
10750788 - 财政年份:2023
- 资助金额:
$ 46万 - 项目类别:
Collaborative Research: SaTC: CORE: Small: Measuring, Validating and Improving upon App-Based Privacy Nutrition Labels
- Award number: 2247952
- Fiscal year: 2023
- Funding amount: $460,000
- Project type: Standard Grant
Collaborative Research: SaTC: CORE: Small: Measuring, Validating and Improving upon App-Based Privacy Nutrition Labels
- Award number: 2247953
- Fiscal year: 2023
- Funding amount: $460,000
- Project type: Standard Grant