CRII: OAC: Scalable Cyberinfrastructure for Big Graph and Matrix/Tensor Analytics

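Basic Information

  • Award number:
    1755464
  • Principal investigator:
  • Amount:
    $170,900
  • Host institution:
  • Host institution country:
    United States
  • Project category:
    Standard Grant
  • Fiscal year:
    2018
  • Funding country:
    United States
  • Project period:
    2018-06-01 to 2022-05-31
  • Project status:
    Completed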

Project Abstract

Existing distributed graph and matrix analytics frameworks are designed with data-intensive workloads in mind, which makes them inefficient for compute-intensive applications such as graph mining and scientific computing. The goal of this project is to develop novel big data frameworks for two compute-intensive tasks: graph mining and matrix/tensor computation. The two frameworks advance the field of big data analytics by motivating future systems for compute-intensive analytics and by promoting their application in various scientific areas to improve research productivity. The two systems will be publicly available and can serve several cross-disciplinary projects in computer forensics, computational physics, and bioinformatics. The project includes mentoring graduate students, training K-12 students through summer internships, developing related course materials, and conducting outreach activities that help the public learn big data technologies. Thus, the project aligns with NSF's mission to promote the progress of science and to advance the national health and prosperity.

The graph mining system and the matrix/tensor platform share a common design: (i) a tailor-made storage subsystem that provides efficient and flexible data access, and (ii) a computation subsystem with fine-grained task control for data-reuse-aware task assignment and load balancing. The graph mining system, called G-thinker, aims to facilitate writing distributed programs that mine, from a big graph, those subgraphs that satisfy certain requirements. Such mining problems are useful in many applications, such as community detection and subgraph matching. These problems usually have high computational complexity, and existing serial algorithms tackle them by backtracking in a duplication-free vertex-set enumeration tree, which recursively partitions the search space.
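The serial strategy just described, backtracking over a duplication-free set-enumeration tree, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not G-thinker code: the mining problem is assumed to be clique enumeration (a stand-in for the problems the abstract names), the graph is a hypothetical adjacency dictionary, and `enumerate_cliques` is an illustrative name. Each tree node is a vertex set kept in increasing ID order, and a child only appends a larger-ID candidate, which is what keeps the enumeration duplication-free.

```python
from typing import Dict, List, Set


def enumerate_cliques(adj: Dict[int, Set[int]], min_size: int = 1) -> List[List[int]]:
    """Hypothetical example: list every clique with at least min_size vertices.

    A set-enumeration tree node is a clique kept in increasing vertex-ID
    order; its children append one later candidate adjacent to all of it.
    Since vertices are only appended in increasing ID order, each clique is
    reached along exactly one root-to-node path, so nothing is duplicated.
    """
    results: List[List[int]] = []

    def backtrack(current: List[int], candidates: List[int]) -> None:
        if len(current) >= min_size:
            results.append(list(current))
        for i, v in enumerate(candidates):
            # Keep only later candidates that are also neighbors of v; this
            # shrinks the search space of the subtree rooted at current + [v].
            next_candidates = [u for u in candidates[i + 1:] if u in adj[v]]
            current.append(v)
            backtrack(current, next_candidates)
            current.pop()  # backtrack: undo the choice of v

    backtrack([], sorted(adj))
    return results


if __name__ == "__main__":
    # Triangle 1-2-3 plus a pendant edge 3-4.
    graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    print(enumerate_cliques(graph, min_size=3))  # [[1, 2, 3]]
```

Each recursive call works on a strictly smaller candidate list, so the tree recursively partitions the search space as the abstract describes. Distributing such subtrees across machines, spawning them from individual vertices, is the part this single-process sketch does not attempt.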
G-thinker adopts an intuitive programming interface that minimizes the effort of adapting an existing serial subgraph mining algorithm for distributed execution. The subgraphs to be mined are spawned from individual vertices and grow their frontiers as needed, and memory overflow is avoided by spilling subgraphs to disk when necessary. In each machine, vertices and edges shared by multiple subgraphs need to be transmitted and cached only once, which minimizes communication (and hence data waiting) so that CPU cores are better utilized. To address load imbalance on power-law graphs, G-thinker explores recursive decomposition and work stealing, allowing idle machines to steal subgraphs for mining from heavily loaded machines.

The project also explores a distributed matrix/tensor storage and computing framework, in which matrix/tensor partitions are stored in multiple replicas using different storage schemes to efficiently support all kinds of submatrix access operations. This flexible storage scheme gives upper-layer computations many more opportunities for fine-grained optimizations, including smarter task scheduling and in-situ updates. The use of this framework is exemplified by matrix multiplication and LU factorization. Both proposed frameworks can help build a cyberinfrastructure for collaborations with scientists in science, medicine, and industry.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
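Why storing one matrix in several layouts helps can be illustrated with block matrix multiplication: output block C[i][j] needs a whole block row of A and a whole block column of B. The sketch below is a minimal single-process illustration, assuming a square matrix split into an equal grid of square blocks; `ReplicatedBlockMatrix`, `row_panel`, `col_panel`, and `multiply` are hypothetical names, not the project's API, and a real system would keep the replicas as separate copies on different machines.

```python
from typing import List

Matrix = List[List[float]]


class ReplicatedBlockMatrix:
    """Hypothetical n x n matrix store holding two replicas of its block grid.

    by_row[i] lists the blocks of block row i (left to right) and by_col[j]
    lists the blocks of block column j (top to bottom), so both access
    patterns read one ready-made list. In this single-process sketch the two
    replicas share block objects; a distributed store would keep physically
    separate copies.
    """

    def __init__(self, dense: Matrix, block_size: int) -> None:
        n = len(dense)
        if n % block_size != 0:
            raise ValueError("sketch assumes n is divisible by block_size")
        self.block_size = block_size
        self.grid = n // block_size
        s = block_size

        def block(bi: int, bj: int) -> Matrix:
            # Copy the s x s sub-block at block coordinates (bi, bj).
            return [row[bj * s:(bj + 1) * s] for row in dense[bi * s:(bi + 1) * s]]

        blocks = [[block(bi, bj) for bj in range(self.grid)] for bi in range(self.grid)]
        # Replica 1: block-row layout (serves whole block-row reads).
        self.by_row = blocks
        # Replica 2: block-column layout (serves whole block-column reads).
        self.by_col = [[blocks[bi][bj] for bi in range(self.grid)] for bj in range(self.grid)]

    def row_panel(self, i: int) -> List[Matrix]:
        return self.by_row[i]

    def col_panel(self, j: int) -> List[Matrix]:
        return self.by_col[j]


def multiply(a: ReplicatedBlockMatrix, b: ReplicatedBlockMatrix) -> Matrix:
    """Return the dense product A * B, computed one output block at a time."""
    if a.grid != b.grid or a.block_size != b.block_size:
        raise ValueError("operands must share the same block grid")
    s, g = a.block_size, a.grid
    c: Matrix = [[0.0] * (g * s) for _ in range(g * s)]
    for i in range(g):
        for j in range(g):
            # Output block (i, j) is an independent task: it pairs A[i][k]
            # with B[k][j] for every k, reading each operand from the replica
            # whose layout matches that access pattern.
            for a_blk, b_blk in zip(a.row_panel(i), b.col_panel(j)):
                for r in range(s):
                    for k in range(s):
                        a_rk = a_blk[r][k]
                        for col in range(s):
                            c[i * s + r][j * s + col] += a_rk * b_blk[k][col]
    return c


if __name__ == "__main__":
    A = ReplicatedBlockMatrix([[1, 2], [3, 4]], block_size=1)
    B = ReplicatedBlockMatrix([[5, 6], [7, 8]], block_size=1)
    print(multiply(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

Because each (i, j) task names exactly which panels it reads, a scheduler could place the task near the replicas holding those panels; that kind of data-reuse-aware placement is what the abstract attributes to the flexible storage scheme, and it is only hinted at here.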

Project Outcomes

Journal papers (17)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
The Future is Big Graphs! A Community View on Graph Processing Systems
  • DOI:
  • Published:
    2021
  • Journal:
  • Impact factor:
    22.7
  • Authors:
    Sakr, Sherif;Bonifati, Angela;Voigt, Hannes;Iosup, Alexandru;Ammar, Khaled;Angles, Renzo;Aref, Walid G.;Arenas, Marcelo;Besta, Maciej;Boncz, Peter A.
  • Corresponding author(s):
    Boncz, Peter A.
Parallel mining of large maximal quasi-cliques
  • DOI:
    10.1007/s00778-021-00712-2
  • Published:
    2021-11
  • Journal:
  • Impact factor:
    0
  • Authors:
    J. Khalil;Da Yan;Guimu Guo;Lyuheng Yuan
  • Corresponding author(s):
    J. Khalil;Da Yan;Guimu Guo;Lyuheng Yuan
Accurate Tensor Decomposition with Simultaneous Rank Approximation for Surveillance Videos
G-thinker: A Distributed Framework for Mining Subgraphs in a Big Graph
  • DOI:
    10.1109/icde48307.2020.00122
  • Published:
    2020-04
  • Journal:
  • Impact factor:
    0
  • Authors:
    Da Yan;Guimu Guo;Md Mashiur Rahman Chowdhury;M. Tamer Özsu;Wei-Shinn Ku;John C.S. Lui
  • Corresponding author(s):
    Da Yan;Guimu Guo;Md Mashiur Rahman Chowdhury;M. Tamer Özsu;Wei-Shinn Ku;John C.S. Lui
Parallel Mining of Frequent Subtree Patterns
  • DOI:
    10.1007/978-3-030-61133-0_2
  • Published:
    2020-09
  • Journal:
  • Impact factor:
    0
  • Authors:
    Wenwen Qu;Da Yan;Guimu Guo;Xiaoling Wang;Lei Zou;Yang Zhou
  • Corresponding author(s):
    Wenwen Qu;Da Yan;Guimu Guo;Xiaoling Wang;Lei Zou;Yang Zhou

Other Publications by Da Yan

A high-fidelity zoning and characterization approach for building energy models in urban building energy modeling
  • DOI:
    10.26868/25222708.2023.1435
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Hanyun Wang;Zhaoru Liu;Changxiang Xu;Jiangjun Tan;Tao Wang;Da Yan
  • Corresponding author(s):
    Da Yan
District household electricity consumption pattern analysis based on auto-encoder algorithm
Spatial-Logic-Aware Weakly Supervised Learning for Flood Mapping on Earth Imagery
  • DOI:
    10.1609/aaai.v38i20.30253
  • Published:
    2024
  • Journal:
  • Impact factor:
    0
  • Authors:
    Zelin Xu;Tingsong Xiao;Wenchong He;Yu Wang;Zhe Jiang;Shigang Chen;Yiqun Xie;Xiaowei Jia;Da Yan;Yang Zhou
  • Corresponding author(s):
    Yang Zhou
A district-level building electricity use profile simulation model based on probability distribution inferences
  • DOI:
    10.1016/j.scs.2024.105822
  • Published:
    2024-11-15
  • Journal:
  • Impact factor:
  • Authors:
    Xuyuan Kang;Hongyin Chen;Zhenlan Dou;Xiao Wang;Zhaoru Liu;Chunyan Zhang;Kunqi Jia;Da Yan
  • Corresponding author(s):
    Da Yan
Lighting System Control in Office Building Using Occupancy Prediction Based on Historical Occupied Ratio

Other Grants by Da Yan

Collaborative Research: OAC CORE: Federated-Learning-Driven Traffic Event Management for Intelligent Transportation Systems
  • Award number:
    2414474
  • Fiscal year:
    2024
  • Funding amount:
    $170,900
  • Project category:
    Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
  • Award number:
    2414185
  • Fiscal year:
    2024
  • Funding amount:
    $170,900
  • Project category:
    Standard Grant
RII Track-4: NSF: Massively Parallel Graph Processing on Next-Generation Multi-GPU Supercomputers
  • Award number:
    2229394
  • Fiscal year:
    2023
  • Funding amount:
    $170,900
  • Project category:
    Standard Grant
Collaborative Research: OAC CORE: Federated-Learning-Driven Traffic Event Management for Intelligent Transportation Systems
  • Award number:
    2313192
  • Fiscal year:
    2023
  • Funding amount:
    $170,900
  • Project category:
    Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
  • Award number:
    2106461
  • Fiscal year:
    2021
  • Funding amount:
    $170,900
  • Project category:
    Standard Grant

Similar NSFC Grants

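Molecular basis by which Z8-12:OH and Z8-14:OAc maintain the sex-attractant specificity of the oriental fruit moth (Grapholita molesta) and the plum fruit moth (Grapholita funebrana), respectively
  • Award number:
    32160636
  • Approval year:
    2021
  • Funding amount:
    CNY 350,000
  • Project category:
    Regional Science Fund Project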
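Mechanism of the photoisomerization reaction of the nitrosyl ruthenium complex [Ru(OAc)(2mqn)2NO]
  • Award number:
    21603131
  • Approval year:
    2016
  • Funding amount:
    CNY 190,000
  • Project category:
    Young Scientists Fund Project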
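Mn(OAc)3-promoted radical cascade reactions under mechanochemical conditions
  • Award number:
    21242013
  • Approval year:
    2012
  • Funding amount:
    CNY 100,000
  • Project category:
    Special Fund Project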

Similar Overseas Grants

OAC Core: A Scalable and Deployable Container Orchestration Cyber Infrastructure Toolkit for Deploying Big Data Analytics Applications in Public Cloud
  • Award number:
    2313738
  • Fiscal year:
    2023
  • Funding amount:
    $170,900
  • Project category:
    Standard Grant
OAC Core: Geometry-aware and Deep Learning-based Cyberinfrastructure for Scalable Modeling of Solids and Fluids
  • Award number:
    2211908
  • Fiscal year:
    2022
  • Funding amount:
    $170,900
  • Project category:
    Standard Grant
OAC Core: A Scalable and Deployable Container Orchestration Cyber Infrastructure Toolkit for Deploying Big Data Analytics Applications in Public Cloud
  • Award number:
    2212256
  • Fiscal year:
    2022
  • Funding amount:
    $170,900
  • Project category:
    Standard Grant
OAC Core: Scalable Graph ML on Distributed Heterogeneous Systems
  • Award number:
    2209563
  • Fiscal year:
    2022
  • Funding amount:
    $170,900
  • Project category:
    Standard Grant
Collaborative Research: OAC Core: Robust, Scalable, and Practical Low Rank Approximation
  • Award number:
    2106738
  • Fiscal year:
    2021
  • Funding amount:
    $170,900
  • Project category:
    Standard Grant