Collaborative Research: OAC Core: Small: Efficient and Policy-driven Burst Buffer Sharing
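Basic information
- Award number: 2008388
- Principal investigator:
- Amount: $292,900
- Institution:
- Institution country: United States
- Award type: Standard Grant
- Fiscal year: 2020
- Funding country: United States
- Period: 2020-10-01 to 2023-09-30
- Status: Completed
- Source:
- Keywords:
Project abstract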
Modern scientific research heavily relies on supercomputers. Supercomputing applications, such as traditional numerical simulations (HPC), data intensive applications (Big Data), and most recently, deep learning (DL) applications, are increasingly run on supercomputers to obtain timely results and explore new research methods that combine multiple application types. However, a bottleneck in their design reduces the potential performance of modern supercomputers. This project, bbThemis, addresses this problem by enabling efficient and policy-driven sharing of an intermediate storage layer known as a "burst buffer", so that more scientists and applications can leverage state-of-the-art storage techniques to significantly reduce their runtime and enhance the productivity of their research. This project will deliver substantial gains to almost every research area that uses HPC resources, leading to improved science and engineering methods and products in all fields. This research will have an immediate and significant impact on existing scientific applications and on deriving guidelines for next-generation HPC system design, deployment, and utilization. The project will also contribute to educational outcomes. In addition to students working directly on project goals, results developed in the project will be used in tutorial and training sessions at Texas Advanced Computing Center’s summer institute in deep learning and other major conferences, and in University of Illinois Urbana-Champaign student projects. The project is aligned with the National Strategic Computing Initiative (NSCI) to advance US leadership in HPC.
This project, bbThemis (https://github.com/bbThemis), leverages a suite of technologies, such as disassociation of I/O processing from control logic, time-sliced intra I/O node sharing, function interception for low overhead POSIX I/O, and metadata and data placement for optimal individual application performance. It is investigating how to best apply these technologies, by: 1) Identifying optimal burst buffer configurations for a suite of representative supercomputing applications; 2) Proposing, prototyping, and verifying different design options to address intra-node and inter-node I/O performance sharing; and 3) Designing and evaluating a set of sharing policies, such as fair sharing and priority sharing, with real applications and I/O traces. This project will dramatically increase the sharing capacity of existing burst buffers and enhance domain scientists’ productivity at a large scale. It explores various sharing policies that permit efficient sharing of I/O resources and that meet the requirements of computing centers. The results will enable the provisioning of I/O resources, where users can request specific IOPS or bandwidth for a period of time. The prototype burst buffer sharing framework will immediately increase the capacity of existing supercomputers with enhanced I/O performance. The lessons learned will guide next-generation I/O system design for large scale systems. The general improvement of HPC, Big Data, and DL applications will also increase the coherence of the hardware and software used for data analytics computing and modeling and simulation.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
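As a rough illustration of the "fair sharing" and "priority sharing" policies the abstract mentions, the sketch below divides a fixed burst buffer bandwidth among jobs using weighted max-min fairness. This is a generic textbook technique, not the bbThemis implementation; the function name `weighted_fair_share`, the job names, and all numbers are hypothetical and chosen only for the example.

```python
from typing import Dict


def weighted_fair_share(
    capacity: float,
    demands: Dict[str, float],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Weighted max-min fair allocation (water-filling).

    Hypothetical helper: `capacity` is the shared bandwidth (or IOPS),
    `demands` is what each job asks for, `weights` encodes priority.
    """
    alloc = {job: 0.0 for job in demands}
    active = {job for job, demand in demands.items() if demand > 0}
    remaining = capacity
    while active and remaining > 1e-9:
        total_weight = sum(weights[job] for job in active)
        # Jobs whose unmet demand fits inside their weighted share
        # are fully satisfied in this round.
        satisfied = {
            job for job in active
            if demands[job] - alloc[job] <= remaining * weights[job] / total_weight
        }
        if not satisfied:
            # Every active job wants more than its share: split what is
            # left in proportion to the weights and stop.
            for job in active:
                alloc[job] += remaining * weights[job] / total_weight
            break
        for job in satisfied:
            remaining -= demands[job] - alloc[job]
            alloc[job] = demands[job]
        active -= satisfied
    return alloc


# Fair sharing: equal weights, the small job is fully served first.
fair = weighted_fair_share(
    100.0,
    {"sim": 20.0, "bigdata": 80.0, "dl": 80.0},
    {"sim": 1.0, "bigdata": 1.0, "dl": 1.0},
)

# Priority sharing: the job with weight 3 gets three times the share.
prio = weighted_fair_share(
    100.0,
    {"low": 100.0, "high": 100.0},
    {"low": 1.0, "high": 3.0},
)
```

With equal weights the policy behaves as plain fair sharing; raising a job's weight is one simple way to express priority while still leaving lower-priority jobs a nonzero share.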
Project outcomes
Journal papers (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Fine-Grained Policy-Driven I/O Sharing for Burst Buffers
- DOI: 10.1145/3581784.3607041
- Published: 2023-06
- Journal:
- Impact factor: 0
- Authors: E. Karrels; Lei Huang; Yuhong Kan; Ishank Arora; Yinzhi Wang; D. Katz; W. Gropp; Zhao Zhang
- Corresponding authors: E. Karrels; Lei Huang; Yuhong Kan; Ishank Arora; Yinzhi Wang; D. Katz; W. Gropp; Zhao Zhang
Other publications by Zhao Zhang
Probe-Type Microforce Sensor for Mirco/Nano Experimental Mechanics
- DOI: 10.4028/www.scientific.net/amr.33-37.943
- Published: 2008-03
- Journal:
- Impact factor: 0
- Authors: Xide Li; Zhao Zhang
- Corresponding author: Zhao Zhang
3D trajectory tracking control of an underactuated AUV based on adaptive neural network dynamic surface
- DOI: 10.1504/ijvd.2020.115864
- Published: 2020
- Journal:
- Impact factor: 0.5
- Authors: Xiao Liang; Zhao Zhang; Xingru Qu
- Corresponding author: Xingru Qu
Uncertainty analysis and robust design optimization for the heat-assisted bending of high-strength titanium tube
- DOI: 10.1007/s11431-021-1881-8
- Published: 2021-09
- Journal:
- Impact factor: 0
- Authors: Zhao Zhang; Jingchao Yang; Weiliang Huang; Jun Ma; Heng Li
- Corresponding author: Heng Li
Tunable erbium-doped fiber ring laser based on an all-fiber filter
- DOI: 10.1117/12.2000105
- Published: 2012
- Journal:
- Impact factor: 0
- Authors: X. Ji; Z. Cao; Zhao Zhang; Tao Shui; Wenliang Hao; B. Yu
- Corresponding author: B. Yu
An efficient and convenient formal synthesis of Jaspine B from D-xylose.
- DOI: 10.1016/j.carres.2012.01.013
- Published: 2012-04
- Journal:
- Impact factor: 3.1
- Authors: Zhao Zhang; Yu-Tao Zhao; Wen Qu; Hong-Min Liu
- Corresponding author: Hong-Min Liu
Other grants by Zhao Zhang
CAREER: Efficient and Scalable Large Foundational Model Training on Supercomputers for Science
- Award number: 2340011
- Fiscal year: 2024
- Funding amount: $292,900
- Award type: Standard Grant
Collaborative Research: Frameworks: hpcGPT: Enhancing Computing Center User Support with HPC-enriched Generative AI
- Award number: 2411294
- Fiscal year: 2024
- Funding amount: $292,900
- Award type: Standard Grant
Collaborative Research: CSR: Medium: Fortuna: Characterizing and Harnessing Performance Variability in Accelerator-rich Clusters
- Award number: 2312689
- Fiscal year: 2023
- Funding amount: $292,900
- Award type: Continuing Grant
Collaborative Research: CSR: Medium: Fortuna: Characterizing and Harnessing Performance Variability in Accelerator-rich Clusters
- Award number: 2401244
- Fiscal year: 2023
- Funding amount: $292,900
- Award type: Continuing Grant
Collaborative Research: Frameworks: Diamond: Democratizing Large Neural Network Model Training for Science
- Award number: 2311766
- Fiscal year: 2023
- Funding amount: $292,900
- Award type: Standard Grant
Collaborative Research: OAC Core: ScaDL: New Approaches to Scaling Deep Learning for Science Applications on Supercomputers
- Award number: 2401246
- Fiscal year: 2023
- Funding amount: $292,900
- Award type: Standard Grant
Collaborative Research: Frameworks: Diamond: Democratizing Large Neural Network Model Training for Science
- Award number: 2401245
- Fiscal year: 2023
- Funding amount: $292,900
- Award type: Standard Grant
Collaborative Research: OAC Core: ScaDL: New Approaches to Scaling Deep Learning for Science Applications on Supercomputers
- Award number: 2106661
- Fiscal year: 2021
- Funding amount: $292,900
- Award type: Standard Grant
SHF: Medium: Collaborative Research: Architectural and System Support for Building Versatile Memory Systems
- Award number: 1643271
- Fiscal year: 2016
- Funding amount: $292,900
- Award type: Continuing Grant
SHF: Medium: Collaborative Research: Architectural and System Support for Building Versatile Memory Systems
- Award number: 1514229
- Fiscal year: 2015
- Funding amount: $292,900
- Award type: Continuing Grant
Similar NSFC (National Natural Science Foundation of China) grants
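Research on highly integrated microwave/millimeter-wave antennas supporting two-dimensional millimeter-wave beam scanning
- Award number: 62371263
- Approval year: 2023
- Funding amount: CNY 520,000
- Award type: General Program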
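Study of tandem Heck/denitrogenative rearrangement reactions of hydrazones
- Award number: 22301211
- Approval year: 2023
- Funding amount: CNY 300,000
- Award type: Young Scientists Fund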
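Synergistic performance regulation and dendrite suppression mechanisms in aqueous zinc-ion batteries
- Award number: 52364038
- Approval year: 2023
- Funding amount: CNY 330,000
- Award type: Regional Science Fund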
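Pathogenic role and mechanism of TSPYL1 mutations in sudden infant death syndrome, studied with a human serotonergic neuron reporter system
- Award number: 82371176
- Approval year: 2023
- Funding amount: CNY 490,000
- Award type: General Program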
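Mechanism of FOXO3 m6A methylation-induced trophoblast senescence in the treatment of spontaneous abortion by the kidney-tonifying method
- Award number: 82305286
- Approval year: 2023
- Funding amount: CNY 300,000
- Award type: Young Scientists Fund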
Similar overseas grants
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403312
- Fiscal year: 2024
- Funding amount: $292,900
- Award type: Standard Grant
Collaborative Research: OAC CORE: Federated-Learning-Driven Traffic Event Management for Intelligent Transportation Systems
- Award number: 2414474
- Fiscal year: 2024
- Funding amount: $292,900
- Award type: Standard Grant
Collaborative Research: OAC Core: Learning AI Surrogate of Large-Scale Spatiotemporal Simulations for Coastal Circulation
- Award number: 2402947
- Fiscal year: 2024
- Funding amount: $292,900
- Award type: Standard Grant
Collaborative Research: OAC Core: Distributed Graph Learning Cyberinfrastructure for Large-scale Spatiotemporal Prediction
- Award number: 2403313
- Fiscal year: 2024
- Funding amount: $292,900
- Award type: Standard Grant
Collaborative Research: OAC Core: Large-Scale Spatial Machine Learning for 3D Surface Topology in Hydrological Applications
- Award number: 2414185
- Fiscal year: 2024
- Funding amount: $292,900
- Award type: Standard Grant