Collaborative Research: OAC Core: Small: Efficient and Policy-driven Burst Buffer Sharing

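Basic Information

  • Award number:
    2008388
  • Principal investigator:
  • Amount:
    $292,900
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Project period:
    2020-10-01 to 2023-09-30
  • Project status:
    Completed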

Project Abstract

Modern scientific research heavily relies on supercomputers. Supercomputing applications, such as traditional numerical simulations (HPC), data intensive applications (Big Data), and most recently, deep learning (DL) applications, are increasingly run on supercomputers to obtain timely results and explore new research methods that combine multiple application types. However, a bottleneck in their design reduces the potential performance of modern supercomputers. This project, bbThemis, addresses this problem by enabling efficient and policy-driven sharing of an intermediate storage layer known as a "burst buffer", so that more scientists and applications can leverage state-of-the-art storage techniques to significantly reduce their runtime and enhance the productivity of their research. This project will deliver substantial gains to almost every research area that uses HPC resources, leading to improved science and engineering methods and products in all fields. This research will have an immediate and significant impact on existing scientific applications and on deriving guidelines for next-generation HPC system design, deployment, and utilization. The project will also contribute to educational outcomes. In addition to students working directly on project goals, results developed in the project will be used in tutorial and training sessions at Texas Advanced Computing Center’s summer institute in deep learning and other major conferences, and in University of Illinois Urbana-Champaign student projects. The project is aligned with the National Strategic Computing Initiative (NSCI) to advance US leadership in HPC.

This project, bbThemis (https://github.com/bbThemis), leverages a suite of technologies, such as disassociation of I/O processing from control logic, time-sliced intra I/O node sharing, function interception for low overhead POSIX I/O, and metadata and data placement for optimal individual application performance. It is investigating how to best apply these technologies by: 1) Identifying optimal burst buffer configurations for a suite of representative supercomputing applications; 2) Proposing, prototyping, and verifying different design options to address intra-node and inter-node I/O performance sharing; and 3) Designing and evaluating a set of sharing policies, such as fair sharing and priority sharing, with real applications and I/O traces. This project will dramatically increase the sharing capacity of existing burst buffers and enhance domain scientists’ productivity at a large scale. It explores various sharing policies that permit efficient sharing of I/O resources and that meet the requirements of computing centers. The results will enable the provisioning of I/O resources, where users can request specific IOPS or bandwidth for a period of time. The prototype burst buffer sharing framework will immediately increase the capacity of existing supercomputers with enhanced I/O performance. The lessons learned will guide next-generation I/O system design for large scale systems. The general improvement of HPC, Big Data, and DL applications will also increase the coherence of the hardware and software used for data analytics computing and modeling and simulation.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
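
The fair-sharing and priority-sharing policies named in the abstract can be illustrated with a small sketch. The code below is not taken from bbThemis; it is a minimal Python illustration, under stated assumptions, of weighted max-min fair allocation, one common way to divide a shared bandwidth budget. The function name share_bandwidth, the job names, and the bandwidth figures are all hypothetical. Equal weights model fair sharing; unequal weights model priority sharing.

```python
from typing import Dict


def share_bandwidth(total: float,
                    demands: Dict[str, float],
                    weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted max-min fair split of `total` bandwidth among jobs.

    Hypothetical helper for illustration only (not the bbThemis API).
    A job never receives more than it demands; bandwidth left over by
    small jobs is redistributed to the others in proportion to weight.
    """
    alloc = {job: 0.0 for job in demands}
    active = {job for job, demand in demands.items() if demand > 0}
    remaining = total
    while active and remaining > 1e-9:
        weight_sum = sum(weights[job] for job in active)
        # Offer every active job its weighted share of what is left.
        offers = {job: remaining * weights[job] / weight_sum for job in active}
        satisfied = {job for job in active if demands[job] <= offers[job]}
        if not satisfied:
            # Every job wants more than its share: hand out the offers.
            for job in active:
                alloc[job] = offers[job]
            break
        # Fully serve the small jobs, then redistribute the leftover.
        for job in satisfied:
            alloc[job] = demands[job]
            remaining -= demands[job]
        active -= satisfied
    return alloc


# Hypothetical demands in GB/s on a burst buffer that sustains 100 GB/s.
demands = {"sim": 20.0, "bigdata": 80.0, "dl": 80.0}
fair = share_bandwidth(100.0, demands, {"sim": 1, "bigdata": 1, "dl": 1})
priority = share_bandwidth(100.0, demands, {"sim": 1, "bigdata": 3, "dl": 1})
print(fair)
print(priority)
```

With equal weights, the small job is fully served and the two large jobs split the remainder evenly. Raising one job's weight shifts the leftover bandwidth toward it without starving the others, which is the basic trade-off a computing center would tune when choosing between the two policies.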

Project Outcomes

Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Fine-grained Policy-driven I/O Sharing for Burst Buffers
  • DOI:
    10.1145/3581784.3607041
  • Published:
    2023-11
  • Journal:
  • Impact factor:
    0
  • Authors:
    Karrels, Ed; Huang, Lei; Kan, Yuhong; Arora, Ishank; Wang, Yinzhi; Katz, Daniel S.; Gropp, William; Zhang, Zhao
  • Corresponding author:
    Zhang, Zhao

Other publications by Zhao Zhang

The genomic history of the Iberian Peninsula over the past 8000 years
  • DOI:
    10.4236/jbbs.2019.96018
  • Published:
    2024-09-14
  • Journal:
  • Impact factor:
    0
  • Authors:
    I. Olalde;Swapan Mallick;Nick Patterson;N. Rohl;Mouco;Marina Silva;Katharina Dulias;C. Edwards;Francesca G;ini;ini;Maria;Pala;Pedro;Soares;Manuel;Ferr;o;o;Nicole;Adamski;Broom;khoshbacht;khoshbacht;O. Cheronet;B. Culleton;Daniel Fern;es;es;Marie Lawson;Matthew Mah;Jonas Oppenheimer;Kristin Stewardson;Zhao Zhang;Juan Manuel Jiménez Arenas;Isidro Jorge Toro Moyano;Domingo C. Salazar;P. Castanyer;Marta Santos;J. Tremoleda;Marina Lozano;Pablo García;Borja;J. Fernández;J. A. Mujika;Cecilio Barroso;J. Bermúdez;E. Mínguez;Josep Burch;Neus Coromina;David Vivó;A. Cebrià;Josep Maria Fullola;Oreto García‐Puchol;J. I. Morales;F. Xavier;12;Oms;Tona;Majó;Josep;Vergés;Antònia;Díaz;Imma;13;Castanyer;F. J. López;A. M. Silva;C. Alonso;Germán;Delibes;de;Castro;Javier;Jiménez;Echevarría;Adolfo;Moreno;Guillermo Pascual Berlanga;Pablo Ramos;José Ramos Muñoz;E. Vij;e;e;16;Vila;Gustau Aguilella Arzo;Ángel Esparza Arroyo;K. Lillios;Jennifer Mack;J. Velasco;A. Waterman;Luis Benítez de Lugo Enrich;María Benito;18;Sánchez;B. Agustí;F. Codina;Gabriel de Prado;A. Estalrrich;Álvaro;Fernández;Flores;Clive;Finlayson;Geraldine;Stewart;20;Francisco Giles;Antonio Rosas;V. González;Gabriel García Atiénzar;M. S. H. Pérez;Arm;o Llanos;o;Carrión Marco;Isabel Beneyto;David López;Mar Tormo;A. C. Valera;C. Blasco;Corina Liesau;Patricia Ríos;Joan Daura;Jesús de Pedro Michó;Agustín A Diez Castillo;R. F. Fernández;R. Garrido;V. S. Gonçalves;E. Guerra;Ana Mercedes;26;Herrero;Joaquim Juan;Dani López;S. McClure;Merino Pérez;Arturo Oliver Foix;Montse Borràs;A. Sousa;Manuel Vidal Encinas;D. Kennett;Martin B. Richards;K. Alt;W. Haak;R. Pinhasi;C. Lalueza;David Reich
  • Corresponding author:
    David Reich
Hawkeye: Change-targeted Testing for Android Apps based on Deep Reinforcement Learning
Identification of microenvironment‐related genes with prognostic value in clear cell renal cell carcinoma
  • DOI:
    10.1002/jcb.29654
  • Published:
    2020-01-21
  • Journal:
  • Impact factor:
    4
  • Authors:
    Zhao Zhang; Zeyan Li; Zhao Liu; Xiang Zhang; Nengwang Yu; Zhonghua Xu
  • Corresponding author:
    Zhonghua Xu
A performance comparison of DRAM memory system optimizations for SMT processors
Association Between Sex and Immune-Related Adverse Events During Immune Checkpoint Inhibitor Therapy.
  • DOI:
    10.1093/jnci/djab035
  • Published:
    2021-03-10
  • Journal:
  • Impact factor:
    0
  • Authors:
    Ying Jing; Yongchang Zhang; Jing Wang; Kunyan Li; Xue Chen; Jianfu Heng; Qian Gao; Youqiong Ye; Zhao Zhang; Yaoming Liu; Y. Lou; Steven H. Lin; L. Diao; Hong Liu; Xiang Chen; G. Mills; Leng Han
  • Corresponding author:
    Leng Han

Other grants held by Zhao Zhang

Collaborative Research: Frameworks: hpcGPT: Enhancing Computing Center User Support with HPC-enriched Generative AI
  • Award number:
    2411294
  • Fiscal year:
    2024
  • Award amount:
    $292,900
  • Project type:
    Standard Grant
CAREER: Efficient and Scalable Large Foundational Model Training on Supercomputers for Science
  • Award number:
    2340011
  • Fiscal year:
    2024
  • Award amount:
    $292,900
  • Project type:
    Standard Grant
Collaborative Research: Frameworks: Diamond: Democratizing Large Neural Network Model Training for Science
  • Award number:
    2401245
  • Fiscal year:
    2023
  • Award amount:
    $292,900
  • Project type:
    Standard Grant
Collaborative Research: Frameworks: Diamond: Democratizing Large Neural Network Model Training for Science
  • Award number:
    2311766
  • Fiscal year:
    2023
  • Award amount:
    $292,900
  • Project type:
    Standard Grant
Collaborative Research: CSR: Medium: Fortuna: Characterizing and Harnessing Performance Variability in Accelerator-rich Clusters
  • Award number:
    2312689
  • Fiscal year:
    2023
  • Award amount:
    $292,900
  • Project type:
    Continuing Grant
Collaborative Research: OAC Core: ScaDL: New Approaches to Scaling Deep Learning for Science Applications on Supercomputers
  • Award number:
    2401246
  • Fiscal year:
    2023
  • Award amount:
    $292,900
  • Project type:
    Standard Grant
Collaborative Research: CSR: Medium: Fortuna: Characterizing and Harnessing Performance Variability in Accelerator-rich Clusters
  • Award number:
    2401244
  • Fiscal year:
    2023
  • Award amount:
    $292,900
  • Project type:
    Continuing Grant
Collaborative Research: OAC Core: ScaDL: New Approaches to Scaling Deep Learning for Science Applications on Supercomputers
  • Award number:
    2106661
  • Fiscal year:
    2021
  • Award amount:
    $292,900
  • Project type:
    Standard Grant
SHF: Medium:Collaborative Research: Architectural and System Support for Building Versatile Memory Systems
  • Award number:
    1643271
  • Fiscal year:
    2016
  • Award amount:
    $292,900
  • Project type:
    Continuing Grant
SHF: Medium:Collaborative Research: Architectural and System Support for Building Versatile Memory Systems
  • Award number:
    1514229
  • Fiscal year:
    2015
  • Award amount:
    $292,900
  • Project type:
    Continuing Grant

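Similar NSFC (National Natural Science Foundation of China) grants

Mechanism by which IGF-1R regulates HIF-1α to promote Th17 cell differentiation in the pathogenesis of thyroid eye disease
  • Award number:
    82301258
  • Year approved:
    2023
  • Award amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Mechanism by which CTCFL regulates IL-10 to suppress bystander activation of CD4+ CTLs and promote resistance to neoadjuvant immunotherapy in oral squamous cell carcinoma
  • Award number:
    82373325
  • Year approved:
    2023
  • Award amount:
    CNY 490,000
  • Project type:
    General Program
Mechanism by which mutations in the RNA splicing factor PRPF31 cause human retinitis pigmentosa
  • Award number:
    82301216
  • Year approved:
    2023
  • Award amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Role and mechanism of vascular endothelial cells regulating macrophage activation via the E2F1/NF-kB/IL-6 axis in orbital venous malformations
  • Award number:
    82301257
  • Year approved:
    2023
  • Award amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
Regulation of matrix clusters and strengthening mechanisms in aluminum alloys based on multi-element interatomic interactions
  • Award number:
    52371115
  • Year approved:
    2023
  • Award amount:
    CNY 500,000
  • Project type:
    General Program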

Similar overseas grants

Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
  • Award number:
    2403088
  • Fiscal year:
    2024
  • Award amount:
    $292,900
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: CropDL - Scheduling and Checkpoint/Restart Support for Deep Learning Applications on HPC Clusters
  • Award number:
    2403090
  • Fiscal year:
    2024
  • Award amount:
    $292,900
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
  • Award number:
    2403313
  • Fiscal year:
    2024
  • Award amount:
    $292,900
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
  • Award number:
    2414185
  • Fiscal year:
    2024
  • Award amount:
    $292,900
  • Project type:
    Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
  • Award number:
    2402946
  • Fiscal year:
    2024
  • Award amount:
    $292,900
  • Project type:
    Standard Grant