CAREER: Advancing Open-Ended Crowdsourcing: The Next Frontier in Crowdsourced Data Management

Basic Information

Project Abstract

Machine learning on big data is finally having an impact on our daily lives, from small triumphs like Siri and Google Translate to much tougher emerging applications like driverless cars and computer-assisted medical image diagnosis. From mundane online fraud detection to the most sophisticated uses of computer vision, these applications share an insatiable appetite for massive labeled training data. The primary source of high-quality labels is crowdsourcing, and research to date on crowdsourcing has focused on the key problem of how to maximize the production of high-quality crowdsourced labels per dollar spent, for problems where workers must choose between just a few predefined labels. However, more open-ended labeling problems have grown to constitute almost half of crowdsourced tasks today, and open-ended tasks raise an entirely new set of research challenges for crowdsourced data management.
This activity addresses the key new research challenges in managing and optimizing open-ended crowdsourcing. Since open-ended crowdsourcing employs tasks with a large number of alternatives, humans struggle to select error-free ones. Additional challenges emerge in determining the open-ended task types appropriate for a specific problem, developing schemes to ascertain the right answer given open-ended worker responses, and inferring the hidden perspectives behind worker answers. The activity targets open-ended crowdsourcing problems that span nearly 90% of those used in practice today, with wide applicability in computer vision, natural language processing, and machine learning in general. The technical outcomes of the activity include the first foundational principles for open-ended crowdsourced data management, which in turn will expand the reach of machine learning into new and more challenging domains and more effective solutions in existing applications that impact our everyday lives.
The pedagogical outcomes of the activity include a course on human-in-the-loop data analytics, crowdsourcing education modules for school teachers, as well as a quantification and dissemination of how crowdsourcing is performed in practice, along with a benchmark to accelerate crowdsourcing research in the future.

Project Outcomes

Journal articles (8)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Understanding workers, developing effective tasks, and enhancing marketplace dynamics: a study of a large crowdsourcing marketplace
  • DOI:
    10.14778/3067421.3067431
  • Published:
    2017-03
  • Journal:
  • Impact factor:
    2.5
  • Authors:
    Jain, Ayush; Sarma, Akash Das; Parameswaran, Aditya; Widom, Jennifer
  • Corresponding author:
    Widom, Jennifer
How Developers Iterate on Machine Learning Workflows
  • DOI:
  • Published:
    2018-01
  • Journal:
  • Impact factor:
    0
  • Authors:
    Xin, D; Ma, L; Song, S; Parameswaran, A.
  • Corresponding author:
    Parameswaran, A.
Holistic Crowd-Powered Sorting via AID: Optimizing for Accuracies, Inconsistencies, and Difficulties
  • DOI:
    10.1145/3269206.3269279
  • Published:
    2018-01
  • Journal:
  • Impact factor:
    0
  • Authors:
    Rajpal, Shreya; Parameswaran, Aditya
  • Corresponding author:
    Parameswaran, Aditya
Accelerating Human-in-the-loop Machine Learning: Challenges and Opportunities
  • DOI:
    10.1145/3209889.3209897
  • Published:
    2018-01
  • Journal:
  • Impact factor:
    0
  • Authors:
    Xin, Doris; Ma, Litian; Liu, Jialin; Macke, Stephen; Song, Shuchen; Parameswaran, Aditya
  • Corresponding author:
    Parameswaran, Aditya
Helix: Holistic Optimization for Accelerating Iterative Machine Learning
  • DOI:
    10.14778/3297753.3297763
  • Published:
    2018-12-01
  • Journal:
  • Impact factor:
    0
  • Authors:
    Doris Xin; Stephen Macke; Litian Ma; Jialin Liu; Shuchen Song; Aditya G. Parameswaran
  • Corresponding author:
    Aditya G. Parameswaran

Other publications by Aditya Parameswaran

MIT Open Access Articles Towards Visualization Recommendation Systems
  • DOI:
    10.1109/access.2022.3159976
  • Published:
    2024-09-14
  • Journal:
  • Impact factor:
    3.9
  • Authors:
    Manasi Vartak; Silu Huang; Tarique Siddiqui; Samuel Madden; Aditya Parameswaran
  • Corresponding author:
    Aditya Parameswaran
Automatic email response suggestion for support departments within a university
  • DOI:
    10.7287/peerj.preprints.26531v1
  • Published:
    2018-02-17
  • Journal:
  • Impact factor:
    0
  • Authors:
    Aditya Parameswaran; D. Mishra; Sanchit Bansal; Vinayak Agarwal; Anjali Goyal; A. Sureka
  • Corresponding author:
    A. Sureka

Other grants by Aditya Parameswaran

FW-HTF-R: Human-Machine Teaming for Effective Data Work at Scale: Upskilling Defense Lawyers Working with Police and Court Process Data
  • Grant number:
    2129008
  • Fiscal year:
    2021
  • Funding amount:
    $517,200
  • Project type:
    Standard Grant
CAREER: Advancing Open-Ended Crowdsourcing: The Next Frontier in Crowdsourced Data Management
  • Grant number:
    1940757
  • Fiscal year:
    2019
  • Funding amount:
    $517,200
  • Project type:
    Continuing Grant
AitF: Collaborative Research: Fast, Accurate, and Practical: Adaptive Sublinear Algorithms for Scalable Visualization
  • Grant number:
    1940759
  • Fiscal year:
    2019
  • Funding amount:
    $517,200
  • Project type:
    Standard Grant
AitF: Collaborative Research: Fast, Accurate, and Practical: Adaptive Sublinear Algorithms for Scalable Visualization
  • Grant number:
    1733878
  • Fiscal year:
    2017
  • Funding amount:
    $517,200
  • Project type:
    Standard Grant
III: Medium: Collaborative Research: DataHub - A Collaborative Dataset Management Platform for Data Science
  • Grant number:
    1513407
  • Fiscal year:
    2015
  • Funding amount:
    $517,200
  • Project type:
    Continuing Grant

Similar National Natural Science Foundation of China (NSFC) grants

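Neural mechanisms underlying the initiation of forward locomotion in Drosophila larvae
  • Grant number:
  • Year approved:
    2022
  • Funding amount:
    CNY 540,000
  • Project type:
    General Program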
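Neural information transmission pathways and feedback in the control of "forward" locomotion in robo-birds
  • Grant number:
    61903230
  • Year approved:
    2019
  • Funding amount:
    CNY 240,000
  • Project type:
    Young Scientists Fund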
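Geochemical petrogenesis and tectonic significance of the Early Carboniferous strongly peraluminous granite belt at Maodeng-Qianjinchang, central-eastern Inner Mongolia
  • Grant number:
    41702054
  • Year approved:
    2017
  • Funding amount:
    CNY 230,000
  • Project type:
    Young Scientists Fund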
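Periodic pulsating oscillation of the advancing resistance force during friction stir welding and its control
  • Grant number:
    51675248
  • Year approved:
    2016
  • Funding amount:
    CNY 620,000
  • Project type:
    General Program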
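Mechanisms and influence patterns of the large reverse-flow region at high advance ratio on rotor control response
  • Grant number:
    51505216
  • Year approved:
    2015
  • Funding amount:
    CNY 200,000
  • Project type:
    Young Scientists Fund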

Similar overseas grants

Development of new drugs for Toxoplasma by advancing hits from the Global Health Chemical Diversity Library
  • Grant number:
    10552608
  • Fiscal year:
    2020
  • Funding amount:
    $517,200
  • Project type:
Development of new drugs for Toxoplasma by advancing hits from the Global Health Chemical Diversity Library
  • Grant number:
    9891756
  • Fiscal year:
    2020
  • Funding amount:
    $517,200
  • Project type:
Development of new drugs for Toxoplasma by advancing hits from the Global Health Chemical Diversity Library
  • Grant number:
    10438518
  • Fiscal year:
    2020
  • Funding amount:
    $517,200
  • Project type:
CAREER: Advancing Open-Ended Crowdsourcing: The Next Frontier in Crowdsourced Data Management
  • Grant number:
    1940757
  • Fiscal year:
    2019
  • Funding amount:
    $517,200
  • Project type:
    Continuing Grant
The Utah Regional Network for Excellence in Neuroscience Clinical Trials
  • Grant number:
    10593643
  • Fiscal year:
    2018
  • Funding amount:
    $517,200
  • Project type: