EAGER: High Performance Algorithms for Interactive Data Science at Scale

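Basic Information

  • Award number:
    2109988
  • Principal investigator:
  • Amount:
    $187,400
  • Host institution:
  • Host institution country:
    United States
  • Project category:
    Standard Grant
  • Fiscal year:
    2021
  • Funding country:
    United States
  • Project period:
    2021-03-01 to 2025-06-30
  • Project status:
    Ongoing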

Project Abstract

A real-world challenge in data science is to develop interactive methods for quickly analyzing new data sets that are potentially of massive scale. This award will design and implement fundamental algorithms for high performance computing solutions that enable interactive analysis of massive data sets. Based on the widely used data types and structures of strings, sets, matrices and graphs, this methodology will produce efficient and scalable software for three classes of fundamental algorithms that will drastically improve performance on a wide range of real-world queries or directly realize frequent queries. These innovations will allow the broad community to move massive-scale data exploration from time-consuming batch processing to interactive analyses that give a data analyst the ability to comprehensively, deeply and efficiently explore the insights and science in real-world data sets. By enabling the increasing number of developers to easily manipulate large data sets, this work will greatly enlarge the data science community and find much broader use in new communities. Materials from this project will be included in graduate and undergraduate course curricula. In particular, women, high school students and other underrepresented groups in STEM areas will be encouraged to participate in this research activity. This project focuses on three important data structures for data analytics: 1) suffix array construction, 2) 'treap' construction and 3) distributed memory join algorithms, useful for analyzing large scale strings, implementing random search in large string data sets, and generating new relations, respectively. These fundamental algorithms serve as the cornerstone to support interactive data science at scale.
Based on the theoretical achievements and systematic algorithm design, a novel symbiotic optimization methodology that combines theoretical analysis, data structure features, and typical data distribution features as a whole will be developed to significantly improve the practical performance of the proposed algorithms. To evaluate and demonstrate their effectiveness, these algorithms will be implemented in, and contributed to, an open source NumPy-like software framework that aims to provide productive data discovery tools on massive, dozens-of-terabytes data sets by bringing together the productivity of Python with world-class high performance computing. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
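For readers unfamiliar with the first data structure named in the abstract, the following is a minimal single-machine sketch of a suffix array and of substring search over it. It is an illustration only, not the project's algorithm: the function names `build_suffix_array` and `find_occurrences` are hypothetical, the construction is the naive sort-all-suffixes approach, and nothing here is parallel or distributed.

```python
def build_suffix_array(text):
    """Return the start indices of all suffixes of `text`, sorted lexicographically.

    Naive construction for illustration: sorting compares whole suffixes,
    so it is far slower than the specialized algorithms used at scale.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, suffix_array, pattern):
    """Return the sorted start positions where `pattern` occurs in `text`.

    All suffixes beginning with `pattern` form one contiguous run in the
    suffix array, so two binary searches locate the run's boundaries.
    """
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

Once the array is built, each query costs a logarithmic number of prefix comparisons, which is why the structure suits repeated interactive lookups over a fixed string collection.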

Project Outcomes

Journal articles (26)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Triangle Counting Through Cover-Edges
Triangle Centrality in Arkouda
Anti-Section Transitive Closure
Parallel Longest Common SubSequence Analysis In Chapel
Fast Triangle Counting

Other publications by David Bader

The effect of combined spinal-epidural anesthesia versus general anesthesia on the recovery time of intestinal function in young infants undergoing intestinal surgery: a randomized, prospective, controlled trial
  • DOI:
    10.1016/j.jclinane.2012.02.004
  • Publication date:
    2012-09-01
  • Journal:
  • Impact factor:
  • Authors:
    Mostafa Somri; Ibrahim Matter; Constantinos A. Parisinos; Ron Shaoul; Jorge G. Mogilner; David Bader; Eldar Asphandiarov; Luis A. Gaitini
  • Corresponding author:
    Luis A. Gaitini
Investigating an interchangeable potential between heart and gut mesothelial development
  • DOI:
    10.1016/j.ydbio.2011.05.236
  • Publication date:
    2011-08-01
  • Journal:
  • Impact factor:
  • Authors:
    Rebecca T. Thomason; Niki Winters; Emily Cross; David Bader
  • Corresponding author:
    David Bader
Unintended Consequence: Diversity as a Casualty of Eliminating United States Medical Licensing Examination Step 1 Scores
  • DOI:
    10.1016/j.jacr.2023.07.019
  • Publication date:
    2023-11-01
  • Journal:
  • Impact factor:
  • Authors:
    Felipe M. Campos; Lars J. Grimm; Charles M. Maxfield; Sabina Amin; David Bader; Brooke Beckett; Kevin Carter; Teresa Chapman; Bernard Chow; Amanda Derylo; Francis Flaherty; Michael Fox; Jennifer Gould; Robert Groves; Darel Heitkamp; John Heymann; Christopher Ho; Marion Hughes; Nathan Hull; Abtin Jafroodifar
  • Corresponding author:
    Abtin Jafroodifar
Local cues influence atrial and ventricular differentiation of precardiac mesoderm
  • DOI:
    10.1016/s0022-2828(87)80673-9
  • Publication date:
    1987-01-01
  • Journal:
  • Impact factor:
  • Authors:
    Jonathan Satin; David Bader; Robert L. DeHaan
  • Corresponding author:
    Robert L. DeHaan

Other grants by David Bader

Collaborative Research: PPoSS: Planning: Streamware - A Scalable Framework for Accelerating Streaming Data Science
  • Award number:
    2118458
  • Fiscal year:
    2021
  • Award amount:
    $187,400
  • Project category:
    Standard Grant
Collaborative Research: PPoSS: Planning: Extreme-scale Sparse Data Analytics
  • Award number:
    2118385
  • Fiscal year:
    2021
  • Award amount:
    $187,400
  • Project category:
    Standard Grant
Collaborative Research: EMBRACE: Evolvable Methods for Benchmarking Realism through Application and Community Engagement
  • Award number:
    1535058
  • Fiscal year:
    2015
  • Award amount:
    $187,400
  • Project category:
    Standard Grant
Collaborative Research: IEEE IPDPS Conference Student Participation Support
  • Award number:
    1362300
  • Fiscal year:
    2014
  • Award amount:
    $187,400
  • Project category:
    Standard Grant
EAGER: Collaborative Research: Using PDE Descriptions to Generate Code Precisely Tailored to Energy-Constrained Systems Including Large GPU Accelerated Clusters
  • Award number:
    1265434
  • Fiscal year:
    2013
  • Award amount:
    $187,400
  • Project category:
    Standard Grant
SI2-SSI: Collaborative: The XScala Project: A Community Repository for Model-Driven Design and Tuning of Data-Intensive Applications for Extreme-Scale Accelerator-Based Systems
  • Award number:
    1339745
  • Fiscal year:
    2013
  • Award amount:
    $187,400
  • Project category:
    Standard Grant
Collaborative Research: Software Infrastructure for Accelerating Grand Challenge Science with Future Computing Platforms
  • Award number:
    1216504
  • Fiscal year:
    2012
  • Award amount:
    $187,400
  • Project category:
    Standard Grant
Collaborative Research: Understanding Whole-genome Evolution through Petascale Simulation
  • Award number:
    0904461
  • Fiscal year:
    2009
  • Award amount:
    $187,400
  • Project category:
    Standard Grant
Collaborative Research: Establishing an I/UCRC Center for Multicore Productivity Research (CMPR)
  • Award number:
    0831110
  • Fiscal year:
    2008
  • Award amount:
    $187,400
  • Project category:
    Standard Grant
Collaborative Research: CRI: IAD: Development of a Research Infrastructure
  • Award number:
    0708307
  • Fiscal year:
    2007
  • Award amount:
    $187,400
  • Project category:
    Continuing Grant

Similar grants from the National Natural Science Foundation of China

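Long-term effects of children's time preferences on academic and in-school behavioral performance, and the underlying mechanisms
  • Award number:
    72303081
  • Approval year:
    2023
  • Award amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund project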
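Is play the opposite of work? Mechanisms by which playful work affects employee and team performance
  • Award number:
    72302024
  • Approval year:
    2023
  • Award amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund project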
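Effects of ecological resettlement on migrants' labor market performance, child development, and intergenerational mobility
  • Award number:
    72303181
  • Approval year:
    2023
  • Award amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund project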
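Multi-omics analysis of the mechanisms by which racehorse gut microbiota enhance host athletic performance
  • Award number:
    32360016
  • Approval year:
    2023
  • Award amount:
    CNY 320,000
  • Project category:
    Regional Science Fund project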
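Astrochronological constraints on the different expressions of the Ediacaran Shuram event in South China
  • Award number:
    42302129
  • Approval year:
    2023
  • Award amount:
    CNY 300,000
  • Project category:
    Young Scientists Fund project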

Similar overseas grants

EAGER: The Performance Evaluation of Intra-domain Bandwidth Allocation and Inter-domain Routing Algorithms for a QoS-guaranteed Routing Path Discovery
  • Award number:
    1633978
  • Fiscal year:
    2015
  • Award amount:
    $187,400
  • Project category:
    Standard Grant
EAGER: High Performance Algorithms and Implementations for Genome Alignment
  • Award number:
    1441384
  • Fiscal year:
    2013
  • Award amount:
    $187,400
  • Project category:
    Standard Grant
EAGER: High Performance Algorithms and Implementations for Genome Alignment
  • Award number:
    1250264
  • Fiscal year:
    2012
  • Award amount:
    $187,400
  • Project category:
    Standard Grant
EAGER: The Performance Evaluation of Intra-domain Bandwidth Allocation and Inter-domain Routing Algorithms for a QoS-guaranteed Routing Path Discovery
  • Award number:
    1050267
  • Fiscal year:
    2010
  • Award amount:
    $187,400
  • Project category:
    Standard Grant
EAGER: The Performance Evaluation of Intra-domain Bandwidth Allocation and Inter-domain Routing Algorithms for a QoS-guaranteed Routing Path Discovery
  • Award number:
    1065665
  • Fiscal year:
    2010
  • Award amount:
    $187,400
  • Project category:
    Standard Grant