Data CI Pilot: VariMat Streaming Polystore Integration of Varied Experimental Materials Data
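Basic information
- Award number: 2129051
- Principal investigator:
- Amount: $1.3163 million
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2021
- Funding country: United States
- Period: 2021-10-01 to 2024-09-30
- Status: Completed
- Source:
- Keywords:
Project abstract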
VariMat is a pilot project to integrate experimental data sources with cyberinfrastructure components and bridge the data variety gap in experimental materials research data. The central goal of Materials Science and Engineering is to discover and deploy innovative materials to serve society, but the process is complex and frequently slow. Recent work to accelerate such materials discovery is driven by the Materials Genome Initiative (MGI), which recognizes the critical need for new materials in fields as diverse as energy, transportation, and national security. The MGI centers on harnessing the data revolution and fueling data-intensive methods including artificial intelligence and machine learning. To reach the potential of the MGI, however, there is a pressing need for robust, high-performance data cyberinfrastructure (CI) that facilitates machine-actionable data and better implementation of FAIR data principles suited to the materials domain. Among remaining CI gaps, none is more important than the need to integrate experimental data in ways that make it more operable and consonant with the research community's needs. This complex gap is compounded by the transdisciplinary nature of materials science and engineering, where most projects depend on experimental data collected by highly varied techniques, in distributed labs, and by multiple investigators. The variety and volume of this data layer onto the dispersed nature of materials research to create a data variety gap that impedes both rapid use and valuable reuse of data as required by many data-hungry machine learning methods. The VariMat Data CI is designed to break these barriers with a pilot instantiation in the subdomain of quantum materials and to maximize experimental data value across its whole lifecycle. The project links teams from the UCSB Quantum Foundry and PARADIM, a Materials Innovation Platform (MIP) – two of the NSF's premier investments in the Quantum Leap.
This linkage integrates strong science drivers with infrastructure development while adopting a strategic focus on influential centers of high-quality, high-volume data production that are conduits to user training and workflows. VariMat will provide investigators with integrated and timely access to the breadth of experimental data needed to enable new discovery pathways and drive novel-materials development that relies on controlled, replicable synthesis; structural and compositional characterization; property determination; and connectability to theory and modeling studies.
The proposed research will establish a new paradigm for an integrated data infrastructure leveraging a streaming layer for real-time ingest to a polystore. VariMat uses an automated streaming layer to link instrumental data to a polystore of heterogeneous data management systems that optimize storage, query, and access for disparate data types. The polystore encompasses multiple data models while unifying the query process for users. The VariMat polystore creates a new option for unified management and query of disparate experimental Big Data created across distributed facilities and expands FAIR data compliance in the materials domain. VariMat implements an innovative stream processing Data Ingress Module for analytics-driven ingest of experimental data. Together, the streaming and polystore layers serve a user-oriented web portal that combines advanced search with data analysis, visualization, and compute resources. Such integration will be facilitated by a unified semantic standard that is specific to the instantiation and spans the project. VariMat will leverage the PARADIM data model, which describes synthesis and characterization with a directed acyclic graph (DAG) that allows traversal of a material's entire history.
VariMat's "loosely coupled" architecture provides operational and managerial independence of subsystems, which is well suited to geographically distributed systems with ongoing evolution in components, as is typical in mid-scale or larger materials science research. While the infrastructure fills a critical, community-identified gap, VariMat components will be readily deployable and have broad applicability in other scientific fields dependent on distributed, operationally independent instrumental laboratories. Automated deployment and open source components will facilitate ready instantiation in new domains. To maximize impact, concepts and tools developed will be disseminated through freely available, open source codes, online tutorials and data sets, and trainings that leverage existing schools and workshops at the Quantum Foundry and PARADIM.
This award by the Office of Advanced Cyberinfrastructure is jointly supported by the Division of Materials Research within the NSF Directorate for Mathematical and Physical Sciences.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
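The abstract describes a data model in which synthesis and characterization steps form a directed acyclic graph (DAG) that can be traversed to recover a material's history. The following is a minimal sketch of that idea only, not the PARADIM or VariMat implementation: the node names, the `PARENTS` mapping, and the `history` function are all hypothetical, chosen for illustration.

```python
from collections import deque

# Hypothetical provenance graph (illustrative names only).
# Each key is a sample or measurement; each value lists the items
# it was directly derived from (its parents in the DAG).
PARENTS = {
    "xrd_scan_7": ["crystal_A"],
    "transport_run_3": ["crystal_A"],
    "crystal_A": ["powder_mix_1"],
    "powder_mix_1": ["precursor_Bi", "precursor_Se"],
    "precursor_Bi": [],
    "precursor_Se": [],
}


def history(node, parents):
    """Return every ancestor of `node`, nearest first (breadth-first)."""
    seen = []
    queue = deque(parents.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            # A node reachable by several paths is reported only once.
            continue
        seen.append(current)
        queue.extend(parents.get(current, []))
    return seen


print(history("xrd_scan_7", PARENTS))
# → ['crystal_A', 'powder_mix_1', 'precursor_Bi', 'precursor_Se']
```

Storing only parent links keeps each record small, and a breadth-first walk returns the most recent processing steps before the original precursors. A production system would presumably keep these links in a graph or relational store rather than an in-memory dictionary.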
Project outcomes
Journal articles: 0
Monographs: 0
Research awards: 0
Conference papers: 0
Patents: 0
Other publications by David Elbert
OpenMSIStream: A Python package for facilitating integration of streaming data in diverse laboratory environments
- DOI: 10.21105/joss.04896
- Published: 2023-03-09
- Journal:
- Impact factor: 0
- Authors: M. Eminizer; Sam Tabrisky; Amir Sharifzadeh; C. DiMarco; Jacob M. Diamond; K. T. Ramesh; T. Hufnagel; T. McQueen; David Elbert
- Corresponding author: David Elbert
Other grants by David Elbert
Collaborative Research: Disciplinary Improvements: Creating a FAIROS Materials Research Coordination Network (MaRCN) in the Materials Research Data Alliance
- Award number: 2226414
- Fiscal year: 2022
- Amount: $1.3163 million
- Award type: Standard Grant
Collaborative: Summit on Big Data and Cyberinfrastructure in Materials Research
- Award number: 1933640
- Fiscal year: 2019
- Amount: $1.3163 million
- Award type: Standard Grant
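Similar grants from the National Natural Science Foundation of China
Exploring, through the "immune-neural" network, the anti-inflammatory mechanism by which eye acupuncture activates MC in CI/RI rats to target H3R and regulate "immune surveillance"
- Award number: 82374375
- Approval year: 2023
- Amount: CNY 510,000
- Award type: General Program
Mechanism by which ci-Eln promotes hypoxic pulmonary artery smooth muscle cell proliferation mediated by its parental gene Eln
- Award number:
- Approval year: 2021
- Amount: CNY 300,000
- Award type: Young Scientists Fund
Spatiotemporal variation of LAI and CI across vertical forest strata, with LiDAR remote-sensing retrieval and validation
- Award number:
- Approval year: 2021
- Amount: CNY 590,000
- Award type: General Program
Revealing the molecular mechanism of Wolbachia-induced CI in Drosophila through single-cell transcriptome sequencing
- Award number: 32170497
- Approval year: 2021
- Amount: CNY 580,000
- Award type: General Program
Observational study of the [CI] line as a new probe of molecular gas mass in nearby galaxies
- Award number:
- Approval year: 2020
- Amount: CNY 240,000
- Award type: Young Scientists Fund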
Similar overseas grants
CI CoE: Demo Pilot: Advancing Research Computing and Data: Strategic Tools, Practices, and Professional Development
- Award number: 2100003
- Fiscal year: 2021
- Amount: $1.3163 million
- Award type: Standard Grant
Data CI Pilot: NCAR and NEON Cyberinfrastructure Collaborations to Enable Convergence Research Linking the Atmospheric and Biological Sciences
- Award number: 2039932
- Fiscal year: 2020
- Amount: $1.3163 million
- Award type: Standard Grant
Data CI Pilot: CI-Based Collaborative Development of Data-Driven Interatomic Potentials for Predictive Molecular Simulations
- Award number: 2039575
- Fiscal year: 2020
- Amount: $1.3163 million
- Award type: Standard Grant