Elements: PASSPP: Provenance-Aware Scalable Seismic Data Processing with Portability
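Basic Information
- Award number: 1931352
- Principal investigator:
- Amount: $232,800
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2019
- Funding country: United States
- Duration: 2019-11-01 to 2022-10-31
- Status: Completed
- Source:
- Keywords: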
Project Abstract
Most of what we know about the Earth's deep interior comes from analyzing ground motion data, collected by instruments around the entire planet, that record seismic waves produced by large earthquakes. Seismologists have developed a long list of methods to process modern seismic data to "image" the Earth's interior. Much of our understanding of Earth's interior has been limited by the resolution of the tools available to construct these "images". At present, the massive increase in data volume has pushed the data processing infrastructure of seismology to the breaking point. The inability to handle data of this scale has imposed a significant barrier to scientific discoveries, especially for smaller research groups with limited resources. Aiming to improve this situation, this project introduces a new data management and processing system that is portable and scalable enough to run on any platform from a personal computer to a large-scale supercomputer. By leveraging and integrating sophisticated tools from the cloud computing and high-performance computing (HPC) communities, the system can fill the widening gap between the massive data made available by data centers and the inadequate data management and processing capability of current tools. Seamless discovery, access, transfer, and processing of data and metadata outside of data centers will become possible for the community. This project will also serve as the foundation to enable novel research utilizing massive data to change the way we study the structure, composition, and evolution of the Earth.
This project aims to develop a seismic data management and processing system composed of a scalable parallel processing framework based on a dataflow computation model, a NoSQL database system centered on a document store, and a container-based virtualization environment. The scalable processing component will be based on the iterative map-reduce model, using Apache Spark to handle scheduling and the flow of data through systems of different scales. Provenance-aware data management will be enabled by managing all data created during processing with MongoDB, including process-generated metadata, processed waveform data, processing parameters, and log outputs. All these core components, as well as a script to configure and deploy the framework on different systems, will be containerized with Singularity to provide portability. These components serve the two primary goals of the project: to produce a system that allows common seismology algorithms to run effectively on modern HPC platforms, and to provide the means for seismologists with average programming experience to implement their own algorithms to extend the system. The system will serve as the infrastructure that makes data-intensive research such as deep learning possible for smaller research groups, which usually lack the manpower to manage and process massive data in a sustainable fashion. By enabling the processing of massive data collected by an increasing number of instruments, it will facilitate the transition of the field into the data-intensive paradigm of scientific discovery.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
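The provenance-aware map-reduce design described in the abstract can be illustrated with a minimal sketch. It is not the project's implementation: Python's built-in `map` and `functools.reduce` stand in for Apache Spark, a plain list of dictionaries stands in for a MongoDB collection, and the names `with_provenance`, `history`, `scale`, and `stack` are hypothetical, chosen only for illustration.

```python
import uuid
from functools import reduce

# Assumption: a plain list of dicts stands in for a MongoDB collection.
history = []


def with_provenance(func, **params):
    """Wrap a processing function so every call logs a provenance document."""
    def wrapped(record):
        out = {
            "id": uuid.uuid4().hex,
            "samples": func(record["samples"], **params),
        }
        # Record which algorithm, with which parameters, turned which
        # input into which output.
        history.append({
            "algorithm": func.__name__,
            "parameters": params,
            "input_id": record["id"],
            "output_id": out["id"],
        })
        return out
    return wrapped


def scale(samples, factor):
    """Hypothetical processing step: multiply every sample by a constant."""
    return [s * factor for s in samples]


def stack(a, b):
    """Hypothetical reduce step: sum two waveforms sample by sample."""
    summed = [x + y for x, y in zip(a["samples"], b["samples"])]
    return {"id": uuid.uuid4().hex, "samples": summed}


# Three toy "waveforms" with known identifiers.
waveforms = [{"id": f"w{i}", "samples": [1.0, 2.0, 3.0]} for i in range(3)]

# Map: apply the wrapped algorithm to each waveform (Spark would distribute this).
scaled = list(map(with_provenance(scale, factor=2.0), waveforms))

# Reduce: combine the processed waveforms into a single stack.
stacked = reduce(stack, scaled)
```

After this runs, `history` holds one document per map call, so the parameters and input of every scaled waveform can be looked up later. In a real deployment those documents would presumably be written to the database rather than kept in memory.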
Project Outcomes
Journal articles (1)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
MsPASS: A Data Management and Processing Framework for Seismology
- DOI:
- Published: 2021
- Journal:
- Impact factor: 3.3
- Authors: Wang, Yinzhi; Pavlis, Gary L.; Yang, Weiming; Ma, Jinxin
- Corresponding author: Ma, Jinxin
Other publications by Yinzhi Wang
Performance Comparison of Julia Distributed Implementations of Dirichlet Process Mixture Models
- DOI: 10.1109/bigdata47090.2019.9005453
- Published: 2019
- Journal:
- Impact factor: 0
- Authors: Ruizhu Huang; Weijia Xu; Yinzhi Wang; S. Liverani; A. Stapleton
- Corresponding author: A. Stapleton
(U-Th)/He thermochronology of metallic ore deposits in the Liaodong Peninsula: Implications for orefield evolution in northeast China
- DOI: 10.1016/j.oregeorev.2017.11.025
- Published: 2018
- Journal:
- Impact factor: 3.3
- Authors: Yinzhi Wang; Fei Wang; Lin Wu; Wenbei Shi; Liekun Yang
- Corresponding author: Liekun Yang
Automatic BLAS Offloading on Unified Memory Architecture: A Study on NVIDIA Grace-Hopper
- DOI: 10.1145/3626203.3670561
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Junjie Li; Yinzhi Wang; Xiao Liang; Hang Liu
- Corresponding author: Hang Liu
Perspectives and Experiences Supporting Containers for Research Computing at the Texas Advanced Computing Center
- DOI:
- Published: 2023
- Journal:
- Impact factor: 0
- Authors: Erik Ferlanti; William J. Allen; Ernesto A. B. F. Lima; Yinzhi Wang; John Fonner
- Corresponding author: John Fonner
Optimizing GPU-Enhanced HPC System and Cloud Procurements for Scientific Workloads
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: R. T. Evans; M. Cawood; Stephen Lien Harrell; Lei Huang; Si Liu; Chun; Amit Ruhela; Yinzhi Wang; Zhao Zhang
- Corresponding author: Zhao Zhang
Other grants by Yinzhi Wang
OAC Core: Cost-Adaptive Monitoring and Real-Time Tuning at Function-Level
- Award number: 2402542
- Fiscal year: 2024
- Award amount: $232,800
- Award type: Standard Grant
Collaborative Research: Frameworks: Seismic COmputational Platform for Empowering Discovery (SCOPED)
- Award number: 2103494
- Fiscal year: 2021
- Award amount: $232,800
- Award type: Standard Grant