NSCI Elements: Software - PFSTRASE - A Parallel FileSystem TRacing and Analysis SErvice to Enhance Cyberinfrastructure Performance and Reliability
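Basic information
- Grant number: 1835135
- Principal investigator:
- Amount: $385,900
- Institution:
- Institution country: United States
- Project type: Standard Grant
- Fiscal year: 2018
- Funding country: United States
- Project period: 2018-10-01 to 2022-09-30
- Project status: Completed
- Source:
- Keywords:
Project abstract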
This project will develop an open-source software service, the Parallel FileSystem TRacing and Analysis SErvice (PFSTRASE), that improves the reliability and performance of data storage systems for the nation's largest supercomputers. As simulations and computations represent reality more faithfully, they grow commensurately in scale, along with the size of the data they consume and generate. To handle the storage and movement of this data, supercomputing systems are built on the backbone of massively parallel data storage systems. Due to their parallel nature, these storage systems are capable of moving data at hundreds of times the speed of conventional storage systems, enabling otherwise impractical computations. The performance capabilities these storage systems provide are accompanied by a complexity that often causes them to function significantly below their optimum and, in some instances, to fail. This results in wasted computational time and ultimately lost scientific progress. The tools that could cast light on these problems and improve storage system reliability and performance are inadequate for current and future computing systems. PFSTRASE will fill this gap by continually and automatically monitoring storage system health and performance, providing insights through an easy-to-use interface that will improve the reliability and performance of storage and supercomputer systems.
Parallel filesystems (PFSs) are the most critical high-availability components of High Performance Computing (HPC) architectures, providing input/output (I/O) services to running computations, the environment that users and system services operate in, and storage for applications and data. Because of this central role, failure or performance degradation events in the PFS impact every user of an HPC resource. PFS events must be dealt with quickly and effectively by system administrators; however, there is typically insufficient information to establish precise causal relationships between PFS activity and events, impeding the implementation of timely and targeted remedies. To fill this information gap, an open-source Parallel FileSystem TRacing and Analysis SErvice (PFSTRASE) will be developed that traces and analyzes the requisite data to establish causal relationships between PFS activity and both realized and imminent events. This project will implement the service for the open-source Lustre filesystem, which is the most commonly used PFS at large-scale HPC sites. Loads for specific PFS directory and file operations will be measured and incorporated into the service to construct authentic server load contributions from every job, process, and user. The service's infrastructure will continuously monitor the entire PFS and generate a real-time, seamless representation that connects contributions of jobs, processes, and users to storage server loads, network bandwidth, and storage capacities. The infrastructure will provide an easily navigable web interface that presents this data, both real-time and historical, in a visual format.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Project outcomes
Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Democratizing Parallel Filesystem Monitoring
- DOI: 10.1109/cluster49012.2020.00065
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Evans, Richard Todd
- Corresponding author: Evans, Richard Todd
Other publications by Richard Evans
Web services-based knowledge sharing, reuse and integration in the design evaluation of mechanical systems
- DOI: 10.1016/j.rcim.2018.12.010
- Published: 2019-06
- Journal:
- Impact factor: 10.4
- Authors: Liu Jun; Zhang Zhinan; Richard Evans; Xie Youbai
- Corresponding author: Xie Youbai
Memoirs: A Description of Ephydatia Blembingia, with an Account of the Formation and Structure of the Gemmule
- DOI:
- Published:
- Journal:
- Impact factor: 0
- Authors: Richard Evans
- Corresponding author: Richard Evans
Memoirs: On the Malayan Species of Onychophora. Part II.--The Development of Eoperipatus weldoni
- DOI:
- Published:
- Journal:
- Impact factor: 0
- Authors: Richard Evans
- Corresponding author: Richard Evans
Massive Hemoptysis in a Patient With Marantic Endocarditis
- DOI: 10.1378/chest.1376428
- Published: 2012-10-01
- Journal:
- Impact factor:
- Authors: Larisa Buyantseva; Andrew Lutzkanin; Eduardo Villarreal; Mubashir Mumtaz; Richard Evans; Hiren Shingala
- Corresponding author: Hiren Shingala
Prophylactic use of epsilon aminocaproic acid for oral surgery in a patient with hereditary angioneurotic edema
- DOI: 10.1016/0091-6749(74)90109-2
- Published: 1974-05-01
- Journal:
- Impact factor:
- Authors: Hobert L. Pence; Richard Evans; Louis H. Guernsey; Roy C. Gerhard
- Corresponding author: Roy C. Gerhard
Other grants by Richard Evans
COLLABORATIVE RESEARCH: We are thriving: Challenging negative discourse through voices of women in project teams
- Grant number: 2015741
- Fiscal year: 2020
- Amount: $385,900
- Project type: Standard Grant
Size, shape and surface properties in realistic models of magnetic nanocrystals
- Grant number: EP/P022006/1
- Fiscal year: 2017
- Amount: $385,900
- Project type: Research Grant
Mapping "missing" conformations of ATP-gated P2X receptor ion channels
- Grant number: BB/P001076/1
- Fiscal year: 2016
- Amount: $385,900
- Project type: Research Grant
Cross-linking and molecular modelling to determine the structure and dynamics of the intracellular regions of ATP gated P2X receptor ion channels
- Grant number: BB/M000990/1
- Fiscal year: 2014
- Amount: $385,900
- Project type: Research Grant
Integrated mutagenesis, bio-informatic and fluorescence approaches to characterize the molecular basis of antagonist action at P2X7 receptors for ATP
- Grant number: MR/K027018/1
- Fiscal year: 2013
- Amount: $385,900
- Project type: Research Grant
Mathematics Teacher Development in Central and Northern New Hampshire
- Grant number: 8470632
- Fiscal year: 1985
- Amount: $385,900
- Project type: Standard Grant
Minority Institutions Science Improvement Program-Individual Institutional Project
- Grant number: 7419640
- Fiscal year: 1974
- Amount: $385,900
- Project type: Standard Grant
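Similar grants from the National Natural Science Foundation of China (NSFC)
Detailed ore-forming process of the Caledonian Lijia tin deposit in northeastern Guangxi: constraints from in situ elemental and isotopic compositions of cassiterite and tourmaline
- Grant number: 42302109
- Approval year: 2023
- Amount: CNY 200,000
- Project type: Young Scientists Fund
Mechanistic study of the trace element vanadium regulating energy metabolism for monitoring colorectal cancer treatment and inhibiting metastasis
- Grant number: 62305121
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
A new digital detection method based on element-tagged single-particle plasma mass spectrometry and its bioanalytical applications
- Grant number: 22374111
- Approval year: 2023
- Amount: CNY 500,000
- Project type: General Program
Impact-induced oxidation of iron in lunar soil minerals and its implications for the oxidizing environment of the lunar surface
- Grant number: 42303039
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Cloning, functional characterization, and breeding application of a novel major QTL for magnesium accumulation in rice grain
- Grant number: 32372095
- Approval year: 2023
- Amount: CNY 500,000
- Project type: General Program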
Similar overseas grants
Collaborative Research: Elements: Software: NSCI: Chrono-An open-source simulation platform for computational dynamics problems
- Grant number: 1835727
- Fiscal year: 2019
- Amount: $385,900
- Project type: Standard Grant
Elements: NSCI-Software -- A General and Effective B-Spline R-Matrix Package for Charged-Particle and Photon Collisions with Atoms, Ions, and Molecules
- Grant number: 1834740
- Fiscal year: 2019
- Amount: $385,900
- Project type: Standard Grant
Collaborative Research: Elements:Software:NSCI: Chrono - An Open-Source Simulation Platform for Computational Dynamics Problems
- Grant number: 1835674
- Fiscal year: 2019
- Amount: $385,900
- Project type: Standard Grant
Elements: Software: NSCI: A high performance suite of SVD related solvers for machine learning
- Grant number: 1835821
- Fiscal year: 2019
- Amount: $385,900
- Project type: Standard Grant
Collaborative Research: Elements: Software: NSCI: HDR: Building An HPC/HTC Infrastructure For The Synthesis And Analysis Of Current And Future Cosmic Microwave Background Datasets
- Grant number: 1835526
- Fiscal year: 2018
- Amount: $385,900
- Project type: Standard Grant