Framework: Software: NSCI: Collaborative Research: Hermes: Extending the HDF Library to Support Intelligent I/O Buffering for Deep Memory and Storage Hierarchy Systems
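Basic Information
- Award number: 1835764
- Principal investigator:
- Amount: $2.85 million
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2018
- Funding country: United States
- Duration: 2018-11-01 to 2023-10-31
- Status: Completed
- Source:
- Keywords: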
Project Abstract
Modern high-performance computing (HPC) applications generate massive amounts of data. However, the performance of disk-based storage systems has improved much more slowly than that of memory, creating a significant input/output (I/O) performance gap. To narrow this gap, storage subsystems are undergoing extensive changes: they are adopting new technologies and adding more layers to the memory/storage hierarchy. A deeper hierarchy significantly increases the complexity of data movement, which makes it harder to realize the potential of the deep memory and storage hierarchy (DMSH) design. As we move toward the exascale era, the I/O bottleneck is a performance problem the HPC community must solve. DMSHs with multiple memory/storage layers offer a feasible solution but are very complex to use effectively. Ideally, the presence of multiple storage layers should be transparent to applications without sacrificing I/O performance. Current software systems therefore need to be enhanced and extended to support transparent and efficient data access and movement on DMSHs. Hierarchical Data Format (HDF) technologies are a set of current I/O solutions that address the problems of organizing, accessing, analyzing, and preserving data. The HDF5 library is widely popular within the scientific community. Among the high-level I/O libraries used in DOE labs, HDF5 is the undeniable leader, with a 99% share. HDF5 addresses the I/O bottleneck in two ways: it hides the complexity of performing coordinated I/O to single, shared files, and it encapsulates general-purpose optimizations. Like other existing I/O middleware, HDF technologies were not designed to support DMSHs. Even so, HDF5's wide popularity and middleware nature make it an ideal candidate to enable, manage, and supervise I/O buffering on DMSHs.
This project proposes the development of Hermes, a heterogeneity-aware, multi-tiered, dynamic, and distributed I/O buffering system that will significantly accelerate I/O performance. It also proposes to extend HDF technologies with the Hermes design. Both Hermes and this enhancement of HDF5 are new. The deliverables of this research include an enhanced HDF5 library, a set of extended HDF technologies, and a group of general I/O buffering and memory-system optimization mechanisms and methods. We believe that combining DMSH I/O buffering with HDF technologies is an achievable, practical solution that can efficiently support scientific discovery. Hermes will advance HDF5 core technology by developing new buffering algorithms and mechanisms to support the following:

1) Vertical and horizontal buffering in DMSHs. Here, vertical means moving data to and from different tiers on the local node, and horizontal means spreading and gathering data across remote compute nodes.

2) Selective buffering via HDF5. Here, selective means that a given memory layer, e.g., NVMe, is used only for selected data.

3) Dynamic buffering via online system profiling. The buffering schema can be changed dynamically based on messaging traffic.

4) Adaptive buffering via reinforcement learning. By learning the application's access pattern, prefetching algorithms and cache replacement policies can be adapted at runtime.

Hermes will be developed into high-quality, dependable software and released with the core HDF5 library.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
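The Python sketch below is a rough illustration of vertical and selective buffering. It places each piece of data in the fastest local tier that has free capacity and falls back to slower tiers. An optional allow-list restricts which tiers a given piece of data may occupy.

All names here are hypothetical and are not part of the Hermes or HDF5 APIs. This includes the `Tier` class, the `place_blob` function, the tier names, and the toy capacities. Horizontal (cross-node) buffering, online profiling, and learned prefetching/replacement policies are not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Tier:
    """One layer of a hypothetical DMSH (e.g. DRAM, NVMe, burst buffer, PFS)."""
    name: str
    capacity: int                                         # toy capacity in bytes
    used: int = 0                                         # bytes currently occupied
    blobs: Dict[str, int] = field(default_factory=dict)  # blob id -> size

    def free(self) -> int:
        return self.capacity - self.used


def place_blob(tiers: List[Tier], blob_id: str, size: int,
               allowed: Optional[Set[str]] = None) -> Optional[str]:
    """Greedy vertical placement: try tiers fastest-first and take the first that fits.

    `tiers` must be ordered from fastest to slowest. If `allowed` is given,
    only tiers whose names appear in it are considered (selective buffering).
    Returns the chosen tier's name, or None if no permitted tier has room.
    """
    for tier in tiers:
        if allowed is not None and tier.name not in allowed:
            continue  # this blob was not selected for this tier
        if tier.free() >= size:
            tier.blobs[blob_id] = size
            tier.used += size
            return tier.name
    return None  # no permitted tier has enough free space


if __name__ == "__main__":
    tiers = [Tier("DRAM", 100), Tier("NVMe", 300), Tier("PFS", 10_000)]
    print(place_blob(tiers, "a", 80))                           # DRAM
    print(place_blob(tiers, "b", 80))                           # NVMe: only 20 bytes left in DRAM
    print(place_blob(tiers, "c", 50, allowed={"DRAM", "PFS"}))  # PFS: NVMe not selected for "c"
```

First-fit placement is used here only for brevity. A real multi-tier buffer would also evict or demote data as faster tiers fill, and this sketch omits both.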
Project Outcomes
Journal articles (6)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Bridging Storage Semantics Using Data Labels and Asynchronous I/O
- DOI: 10.1145/3415579
- Published: 2020-10-13
- Journal:
- Impact factor: 0
- Authors: Anthony Kougkas; H. Devarajan; Xian
- Corresponding author: Xian
I/O Acceleration via Multi-Tiered Data Buffering and Prefetching
- DOI: 10.1007/s11390-020-9781-1
- Published: 2020-01
- Journal:
- Impact factor: 1.9
- Authors: Kougkas, Anthony; Devarajan, Hariharan; Sun, Xian
- Corresponding author: Sun, Xian
Stimulus: Accelerate Data Management for Scientific AI applications in HPC
- DOI: 10.1109/ccgrid54584.2022.00020
- Published: 2022-05
- Journal:
- Impact factor: 0
- Authors: Devarajan, Hariharan; Kougkas, Anthony; Zheng, Huihuo; Vishwanath, Venkatram; Sun, Xian
- Corresponding author: Sun, Xian
Apollo: An ML-assisted Real-Time Storage Resource Observer
- DOI: 10.1145/3431379.3460640
- Published: 2021-06
- Journal:
- Impact factor: 0
- Authors: Rajesh, Neeraj; Devarajan, Hariharan; Garcia, Jaime Cernuda; Bateman, Keith; Logan, Luke; Ye, Jie; Kougkas, Anthony; Sun, Xian
- Corresponding author: Sun, Xian
DLIO: A Data-Centric Benchmark for Scientific Deep Learning Applications
- DOI: 10.1109/ccgrid51090.2021.00018
- Published: 2021-05
- Journal:
- Impact factor: 0
- Authors: Devarajan, Hariharan; Zheng, Huihuo; Kougkas, Anthony; Sun, Xian; Vishwanath, Venkatram
- Corresponding author: Vishwanath, Venkatram
Other Publications by Xian-He Sun
HARL: Optimizing Parallel File Systems with Heterogeneity-Aware Region-Level Data Layout
- DOI: 10.1109/tc.2016.2637905
- Published: 2017
- Journal:
- Impact factor: 0
- Authors: Shuibing He; Yang Wang; Xian-He Sun; Chengzhong Xu
- Corresponding author: Chengzhong Xu
Optimizing Parallel I/O Accesses through Pattern-Directed and Layout-Aware Replication
- DOI: 10.1109/tc.2019.2946135
- Published:
- Journal:
- Impact factor: 0
- Authors: Shuibing He; Yanlong Yin; Xian-He Sun; Xuechen Zhang; Zongpeng Li
- Corresponding author: Zongpeng Li
HCDA: From Computational Thinking to a Generalized Thinking Paradigm
- DOI:
- Published:
- Journal:
- Impact factor: 0
- Authors: Yuhang Liu; Xian-He Sun; Yang Wang; Yungang Bao
- Corresponding author: Yungang Bao
Enhancing Hybrid Parallel File System through Performance and Space-Aware Data layout
- DOI: 10.1177/1094342016631610
- Published: 2016
- Journal:
- Impact factor: 0
- Authors: Shuibing He; Yan Liu; Yang Wang; Xian-He Sun; Chuanhe Huang
- Corresponding author: Chuanhe Huang
On Cost-Driven Collaborative Data Caching: A New Model Approach
- DOI:
- Published: 2019
- Journal:
- Impact factor: 0
- Authors: Yang Wang; Shuibing He; Xiaopeng Fan; Chengzhong Xu; Xian-He Sun
- Corresponding author: Xian-He Sun
Other Grants by Xian-He Sun
Collaborative Research: CSR: Medium: Towards A Unified Memory-centric Computing System with Cross-layer Support
- Award number: 2310422
- Fiscal year: 2023
- Amount: $2.85 million
- Project type: Continuing Grant
OAC Core: LABIOS: Storage Acceleration via Data Labeling and Asynchronous I/O
- Award number: 2313154
- Fiscal year: 2023
- Amount: $2.85 million
- Project type: Standard Grant
CNS Core: Small: Practical Memory Access Pattern Obfuscation with Algorithm, Application and Architecture Co-designs
- Award number: 2152497
- Fiscal year: 2022
- Amount: $2.85 million
- Project type: Standard Grant
Frameworks: Collaborative Research: ChronoLog: A High-Performance Storage Infrastructure for Activity and Log Workloads
- Award number: 2104013
- Fiscal year: 2021
- Amount: $2.85 million
- Project type: Standard Grant
Collaborative Research: SHF: Small: Optimization of Memory Architectures: A Foundation Approach
- Award number: 2008907
- Fiscal year: 2020
- Amount: $2.85 million
- Project type: Standard Grant
CSR: Small: IRIS: A unified data access framework for the merging of compute-centric and data-centric storage
- Award number: 1814872
- Fiscal year: 2019
- Amount: $2.85 million
- Project type: Standard Grant
Eager: Collaborative Research: DiRecMR: Reconciling the Dichotomy of MapReduce for Efficient Speculation and Resilience
- Award number: 1744317
- Fiscal year: 2017
- Amount: $2.85 million
- Project type: Standard Grant
CRI: II-NEW: A Big Data Professing Infrastructure for Smart Energy Systems
- Award number: 1730488
- Fiscal year: 2017
- Amount: $2.85 million
- Project type: Standard Grant
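Similar National Natural Science Foundation of China (NSFC) Grants
Governance System and Key Technologies for Software Service Ecosystems from a Value Perspective
- Award number: 62372323
- Award year: 2023
- Amount: CNY 500,000
- Project type: General Program
Research on Human-in-the-Loop Search-Based Performance Assurance in Self-Adaptive Software Systems
- Award number: 62372084
- Award year: 2023
- Amount: CNY 500,000
- Project type: General Program
Research on Cross-Language Open-Source Software Vulnerability Detection and Repair Based on Compiler Multi-Level Intermediate Representations
- Award number: 62372373
- Award year: 2023
- Amount: CNY 500,000
- Project type: General Program
Development of General-Purpose High-Performance Particle Track Reconstruction Software and Research on GPU-Accelerated Track Reconstruction
- Award number: 12375194
- Award year: 2023
- Amount: CNY 520,000
- Project type: General Program
Simulation-Based Consistency Analysis Between Design Models and Requirements Models of Embedded Control Software
- Award number: 62372181
- Award year: 2023
- Amount: CNY 500,000
- Project type: General Program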
Similar Overseas Grants
Collaborative Research: Framework: Software: NSCI : Computational and data innovation implementing a national community hydrologic modeling framework for scientific discovery
- Award number: 2054506
- Fiscal year: 2020
- Amount: $2.85 million
- Project type: Standard Grant
Collaborative Research: NSCI Framework: Software: SCALE-MS - Scalable Adaptive Large Ensembles of Molecular Simulations
- Award number: 1835607
- Fiscal year: 2019
- Amount: $2.85 million
- Project type: Standard Grant
Collaborative Research: NSCI Framework: Software: SCALE-MS - Scalable Adaptive Large Ensembles of Molecular Simulations
- Award number: 1835720
- Fiscal year: 2019
- Amount: $2.85 million
- Project type: Standard Grant
Collaborative Research: NSCI Framework. Software: SCALE-MS - Scalable Adaptive Large Ensembles of Molecular Simulations
- Award number: 1835780
- Fiscal year: 2019
- Amount: $2.85 million
- Project type: Standard Grant
Collaborative Research: NSCI Framework: Software: SCALE-MS - Scalable Adaptive Large Ensembles of Molecular Simulations
- Award number: 1835449
- Fiscal year: 2019
- Amount: $2.85 million
- Project type: Standard Grant