III: Large: Collaborative Research: Analysis Engineering for Robust End-to-End Data Science
Basic Information
- Award number: 1856641
- Amount: $712,500
- Institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2019
- Funding country: United States
- Period: 2019-10-01 to 2024-09-30
- Status: Completed
Project Abstract
From poor statistical practices that lead to retractions of scientific "discoveries" to low-level spreadsheet errors that subvert high-stakes analyses, failures of data analysis can have catastrophic consequences. The rapid growth of data science practice in the last decade has led to large collaborative efforts to develop new data processing, machine learning, and analytics tools that put more advanced data analysis into the hands of a wider audience of practitioners, from students to scientists to designers. The dominant tool for data science is code, through which cutting-edge algorithms can be applied from existing libraries. However, although this democratization of data science has lowered the barrier to using advanced methods, using these tools safely under sound statistical practice remains as difficult as ever. To facilitate more robust data science, this project investigates models and tools for analysis engineering by data scientists who write programs. The focus is on the complete end-to-end process of data analysis performed with code: the iterative, and often exploratory, steps that analysts go through to turn data into […]. This project will contribute insights into and characterizations of analytic work, as well as novel methods for capturing and analyzing data science activities, and will develop new programming tools and visualization methods for authoring and validating analyses. If successful, this project will augment people's ability to conduct and assess data analyses, promoting more robust results and reducing the gap between novice and expert analysts. The findings and tools from the project will be incorporated into educational efforts, including classroom teaching and tutorials, and will be made available as open-source software integrated into popular analytical environments (e.g., Jupyter).

Data analysis is central to scientific research, yet it is too often conducted in an undisciplined fashion. This project treats the entire analytic process as its central phenomenon of study.
The project will employ mixed methods to study and characterize common analysis practices and pitfalls, including direct observation of data analysts, large-scale analysis of computational notebooks, and instrumentation of analytic programming environments such as JupyterLab. The project will contribute new methods for specifying and safeguarding analyses, including domain-specific languages and program synthesis methods that guide users toward preferred next steps. It will also explore "multiverse" workflows to manage and assess a diversity of analysis decisions. Analogues of debugging and testing tools will be developed to flag problems and perform error analysis, while analytic provenance will be captured and visualized to aid reproducibility, verification, and collaborative review. The work will be evaluated through controlled studies, classroom use, and open-source deployment for wide-scale field use.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project Outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other publications by Brad Myers
Using traits of web macro scripts to predict reuse
- DOI: 10.1016/j.jvlc.2010.08.003
- Published: 2010-12-01
- Authors: Chris Scaffidi; Chris Bogart; Margaret Burnett; Allen Cypher; Brad Myers; Mary Shaw
- Corresponding author: Mary Shaw
Other grants by Brad Myers
SHF: Small: Personalizing API Documentation
- Award number: 2007482
- Fiscal year: 2020
- Amount: $712,500
- Award type: Standard Grant
CHS: Small: Multimodal Conversational Assistant that Learns from Demonstrations
- Award number: 1814472
- Fiscal year: 2018
- Amount: $712,500
- Award type: Standard Grant
TWC: Small: Empirical Evaluation of the Usability and Security Implications of Application Programming Interface Design
- Award number: 1423054
- Fiscal year: 2014
- Amount: $712,500
- Award type: Standard Grant
HCC: Large: Collaborative Research: Variations to Support Exploratory Programming
- Award number: 1314356
- Fiscal year: 2013
- Amount: $712,500
- Award type: Standard Grant
HCC: Small: Better Tools for Authoring Interactive Behaviors
- Award number: 1116724
- Fiscal year: 2011
- Amount: $712,500
- Award type: Standard Grant
Pilot: Exploratory Programming for Interactive Behaviors: Unleashing Interaction Designers' Creativity
- Award number: 0757511
- Fiscal year: 2008
- Amount: $712,500
- Award type: Standard Grant
CPA-SEL: Better Tools for Software Understanding
- Award number: 0811610
- Fiscal year: 2008
- Amount: $712,500
- Award type: Standard Grant
Automatically Generating Consistent User Interfaces for Multiple Appliances
- Award number: 0534349
- Fiscal year: 2005
- Amount: $712,500
- Award type: Continuing Grant
Lowering the Barriers to Successful Programming
- Award number: 0329090
- Fiscal year: 2003
- Amount: $712,500
- Award type: Continuing Grant
Using Handhelds to Help People with Motor Impairments
- Award number: 0308065
- Fiscal year: 2003
- Amount: $712,500
- Award type: Continuing Grant
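Similar NSFC (National Natural Science Foundation of China) grants
Regulation of segregation behavior in back-pressure thixotropic backward extrusion of tin bronze based on grain refinement by severe plastic deformation
- Award number: 52365047
- Approval year: 2023
- Amount: 320,000 CNY
- Project type: Regional Science Fund Project
Creep behavior and modeling of UHPC internally cured with high-strength porous aggregate for long-span structures
- Award number: 52308231
- Approval year: 2023
- Amount: 300,000 CNY
- Project type: Young Scientists Fund Project
Miniaturized microscopic imaging with a large field of view, high resolution, and extended depth of field based on deep optics
- Award number: 62301293
- Approval year: 2023
- Amount: 100,000 CNY
- Project type: Young Scientists Fund Project
High-energy tunable light sources based on multimode nonlinear effects in gas-filled multipass cells
- Award number: 12374318
- Approval year: 2023
- Amount: 520,000 CNY
- Project type: General Program Project
Design and synthesis of two-dimensional molybdenum nitride/molybdenum phosphide in-plane heterostructure catalysts and their hydrogen evolution performance at high current density
- Award number: 22379116
- Approval year: 2023
- Amount: 500,000 CNY
- Project type: General Program Project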
Similar international grants
III: Medium: Collaborative Research: Integrating Large-Scale Machine Learning and Edge Computing for Collaborative Autonomous Vehicles
- Award number: 2348169
- Fiscal year: 2023
- Amount: $712,500
- Award type: Continuing Grant
Collaborative Research: III: Small: Taming Large-Scale Streaming Graphs in an Open World
- Award number: 2236578
- Fiscal year: 2023
- Amount: $712,500
- Award type: Standard Grant
Collaborative Research: III: Small: Taming Large-Scale Streaming Graphs in an Open World
- Award number: 2236579
- Fiscal year: 2023
- Amount: $712,500
- Award type: Standard Grant
III: Small: Collaborative Research: Cost-Efficient Sampling and Estimation from Large-Scale Networks
- Award number: 2209921
- Fiscal year: 2021
- Amount: $712,500
- Award type: Standard Grant
Collaborative Research: Chameleon Phase III: A Large-Scale, Reconfigurable Experimental Environment for Cloud Research
- Award number: 2027170
- Fiscal year: 2020
- Amount: $712,500
- Award type: Cooperative Agreement