SHF: Small: Methods, Workflows, and Data Commons for Reducing Training Costs in Neural Architecture Search on High-Performance Computing Platforms
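Basic Information
- Award number: 2223704
- Principal investigator: Michela Taufer
- Amount: $624,000
- Host institution: University of Tennessee, Knoxville
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2022
- Funding country: United States
- Period: 2022-10-01 to 2025-09-30
- Project status: Not yet concluded
Project Abstract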
Neural networks are powerful artificial-intelligence models that automatically capture the knowledge embedded in scientific data. Scientists can use this knowledge to solve problems in domains such as physics, materials science, neuroscience, and medical imaging. Finding an accurate neural network for a specific scientific dataset or problem comes at a high training cost: it requires searching among thousands of candidate networks on a large number of high-performance-computing resources. This project delivers methods, workflows, and a data commons for reducing the training cost of neural networks. The methods are based on parametric modeling and allow training to be terminated early, making the search faster and cheaper. The workflows decouple the search from the prediction of neural-network accuracy, so that predictions can be adapted to different datasets and problems. The data commons shares the full provenance of the neural networks so that other scientists can deploy them in their own research. Advances in neural-network research have a far-reaching impact on many scientific applications: accurate neural networks can extract structural information from raw microscopy data, predict the performance of business processes, analyze cancer pathology data, map protein sequences to folds, and predict soil moisture or crop yield. The researchers' efforts to build a broader community of high-performance-computing experts also shape the efficient design and use of artificial-intelligence products. The team promotes increased participation of underrepresented students, particularly women, by mentoring students in Systers, the organization for women in Electrical Engineering and Computer Science at the University of Tennessee, Knoxville.
The researchers also develop curricula tailored to a diverse population of graduate and undergraduate students in scientific domains beyond computer science.
This project addresses the urgent need to reduce the high-performance-computing resources consumed in training neural networks while ensuring that the resulting networks are explainable, reproducible, and nearly optimal. To this end, the researchers propose a flexible fitness-prediction method that uses parametric modeling to predict the future fitness of a neural network from its training progress so far, allowing training to be terminated early. The researchers create an index of effective parametric functions for a diverse suite of fitness curves, including edge cases in the modeling (e.g., neural networks that never learn or that experience a learning delay). They transform neural-architecture search (NAS) implementations from tightly coupled, monolithic software tools that embed both search and prediction into flexible, modular workflows in which search and prediction are decoupled. These workflows enable users to reduce training cost, increase NAS throughput, and adapt fitness predictions to different fitness measurements, datasets, and problems. The researchers also build a searchable, reusable neural-network data commons of record trails. Each trail captures a network's lifespan through the generation, training, and validation stages, recording the network architecture, the training dataset, and the loss and accuracy values throughout each stage. The data commons enables users to study how neural-network performance evolves during training and to identify relationships between a network's architecture and its performance on a given dataset with specific properties, ultimately supporting effective searches for accurate neural networks across a spectrum of real-world scientific datasets.
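As a rough illustration of parametric fitness prediction with early termination, and of keeping search and prediction decoupled, the Python sketch below fits one assumed saturating function, f(t) = a - b*exp(-c*t), to a partial fitness curve and forecasts fitness at a final epoch. The model, the grid of candidate rates, the stop-if-below-target rule, and the names `fit_saturating_curve`, `predict_and_decide`, and `screen_candidates` are hypothetical choices for illustration only; the project's own index of parametric functions is not specified here.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class CurveFit:
    """Parameters of the illustrative model f(t) = a - b * exp(-c * t)."""
    a: float    # asymptotic fitness the curve approaches as t grows
    b: float    # gap between the asymptote and the fitness at t = 0
    c: float    # rate at which fitness approaches the asymptote (per epoch)
    sse: float  # sum of squared residuals on the observed epochs

    def predict(self, t: float) -> float:
        return self.a - self.b * math.exp(-self.c * t)


def fit_saturating_curve(epochs: Sequence[float], fitness: Sequence[float]) -> CurveFit:
    """Fit f(t) = a - b*exp(-c*t) to a partial fitness curve.

    For a fixed c the model is linear in (a, b), so a and b come from
    ordinary least squares; c is chosen by grid search over 0.01..2.00.
    """
    n = len(epochs)
    if n < 2 or n != len(fitness):
        raise ValueError("need at least two (epoch, fitness) pairs")
    mean_y = sum(fitness) / n
    best = None
    for k in range(1, 201):
        c = k / 100
        xs = [math.exp(-c * t) for t in epochs]
        mean_x = sum(xs) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0.0:  # regressor is constant: slope undefined for this c
            continue
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, fitness))
        slope = sxy / sxx  # regression slope on exp(-c*t) equals -b
        a = mean_y - slope * mean_x
        sse = sum((a + slope * x - y) ** 2 for x, y in zip(xs, fitness))
        if best is None or sse < best.sse:
            best = CurveFit(a=a, b=-slope, c=c, sse=sse)
    if best is None:
        raise ValueError("epochs must not all be identical")
    return best


def predict_and_decide(epochs: Sequence[float], fitness: Sequence[float],
                       final_epoch: float, target: float) -> Tuple[float, bool]:
    """Prediction engine: forecast fitness at final_epoch; stop early if below target."""
    predicted = fit_saturating_curve(epochs, fitness).predict(final_epoch)
    return predicted, predicted < target


def screen_candidates(partial_curves: Dict[str, Tuple[List[float], List[float]]],
                      final_epoch: float, target: float) -> Dict[str, float]:
    """Search side: talks to the predictor only through predict_and_decide and
    keeps candidates whose predicted final fitness reaches the target."""
    survivors = {}
    for name, (epochs, fitness) in partial_curves.items():
        predicted, stop_early = predict_and_decide(epochs, fitness, final_epoch, target)
        if not stop_early:
            survivors[name] = predicted
    return survivors
```

Because the search side only calls `predict_and_decide`, the fitted model could be swapped (for example, for a function better suited to delayed learning) without changing the search loop, which is the kind of decoupling described above. A single exponential fits a learning-delay curve poorly, so this sketch should not be read as handling that edge case.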
The data commons also gives the scientific community a resource for studying the relationships among datasets, network architectures, and performance. To assess robustness across datasets, the project considers both well-known benchmark datasets and real-world scientific datasets: protein diffraction patterns from X-ray free-electron lasers for protein structural analysis, crop-scouting drone images for precision farming, and forestry-scouting drone images for wildfire prevention.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
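The record trails described in the abstract can be pictured as a simple nested data structure. The Python sketch below is a hypothetical schema, not the project's actual data-commons format: the classes `StageRecord` and `RecordTrail`, their fields, and the `search` helper are assumptions chosen to mirror the description (architecture, training dataset, and per-stage loss and accuracy across generation, training, and validation).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class StageRecord:
    stage: str  # hypothetical labels: "generation", "training", or "validation"
    loss: List[float] = field(default_factory=list)      # one value per recorded step
    accuracy: List[float] = field(default_factory=list)  # one value per recorded step


@dataclass
class RecordTrail:
    network_id: str
    architecture: Dict[str, object]  # e.g. layer list and hyperparameters
    dataset: str                     # identifier of the training dataset
    stages: List[StageRecord] = field(default_factory=list)

    def final_accuracy(self, stage: str = "validation") -> Optional[float]:
        """Last recorded accuracy for the given stage, or None if absent."""
        for s in self.stages:
            if s.stage == stage and s.accuracy:
                return s.accuracy[-1]
        return None

    def to_json(self) -> str:
        # asdict recursively converts the nested StageRecord objects to dicts
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "RecordTrail":
        raw = json.loads(text)
        stages = [StageRecord(**s) for s in raw.pop("stages")]
        return RecordTrail(stages=stages, **raw)


def search(trails: List[RecordTrail], dataset: str, min_accuracy: float) -> List[str]:
    """Ids of networks trained on `dataset` whose final validation accuracy
    is at least `min_accuracy`."""
    hits = []
    for t in trails:
        acc = t.final_accuracy()
        if t.dataset == dataset and acc is not None and acc >= min_accuracy:
            hits.append(t.network_id)
    return hits
```

Serializing each trail to JSON keeps it shareable and reusable, and a query like `search` suggests how architecture-performance relationships might be retrieved for a given dataset.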
Project Outputs
Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Composable Workflow for Accelerating Neural Architecture Search Using In Situ Analytics for Protein Classification
- DOI: 10.1145/3605573.3605636
- Published: 2023-08
- Impact factor: 0
- Authors: Channing, Georgia; Patel, Ria; Olaya, Paula; Rorabaugh, Ariel; Miyashita, Osamu; Caíno-Lores, Silvina; Schuman, Catherine; Tama, Florence; Taufer, Michela
- Corresponding author: Taufer, Michela
Building High-throughput Neural Architecture Search Workflows via a Decoupled Fitness Prediction Engine
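- DOI: 10.1109/tpds.2022.3140681
- Published: 2024-09-13
- Journal: IEEE Transactions on Parallel and Distributed Systems
- Impact factor: 5.3
- Authors: A. Rorabaugh; Silvina Caíno-Lores; Travis Johnston; M. Taufer
- Corresponding author: M. Taufer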
High frequency accuracy and loss data of random neural networks trained on image datasets
- DOI: 10.1016/j.dib.2021.107780
- Published: 2022-03
- Journal: Data in Brief
- Impact factor: 1.2
- Authors: Rorabaugh AK; Caíno-Lores S; Johnston T; Taufer M
- Corresponding author: Taufer M
A Methodology to Generate Efficient Neural Networks for Classification of Scientific Datasets
- DOI: 10.1109/escience55777.2022.00052
- Published: 2022-10
- Impact factor: 0
- Authors: Patel, Ria; Rorabaugh, Ariel Keller; Olaya, Paula; Caíno-Lores, Silvina; Channing, Georgia; Schuman, Catherine; Miyashita, Osamu; Tama, Florence; Taufer, Michela
- Corresponding author: Taufer, Michela
Identifying Structural Properties of Proteins from X-ray Free Electron Laser Diffraction Patterns
- DOI: 10.1109/escience55777.2022.00017
- Published: 2022-10
- Impact factor: 0
- Authors: Olaya, Paula; Caíno-Lores, Silvina; Lama, Vanessa; Patel, Ria; Rorabaugh, Ariel Keller; Miyashita, Osamu; Tama, Florence; Taufer, Michela
- Corresponding author: Taufer, Michela
Other Publications by Michela Taufer
Scalable Incremental Checkpointing using GPU-Accelerated De-Duplication
- DOI: 10.1145/3605573.3605639
- Published: 2023-08-07
- Impact factor: 0
- Authors: Nigel Tan; Jakob Luettgau; Jack Marquez; K. Teranishi; Nicolas Morales; Sanjukta Bhowmick; Franck Cappello; Michela Taufer; Bogdan Nicolae
- Corresponding author: Bogdan Nicolae
Integrating FAIR Digital Objects (FDOs) into the National Science Data Fabric (NSDF) to Revolutionize Dataflows for Scientific Discovery
- Impact factor: 0
- Authors: Michela Taufer; Heberth Martinez; Jakob Luettgau; Lauren Whitnah; Giorgio Scorzelli; Pania Newel; Aashish Panta; Timo Bremer; Doug Fils; Christine R. Kirkpatrick; Nina McCurdy; V. Pascucci
Computational multiscale modeling in protein--ligand docking
- DOI: 10.1109/memb.2009.931789
- Published: 2009-04-03
- Journal: IEEE Engineering in Medicine and Biology Magazine
- Impact factor: 0
- Authors: Michela Taufer; R. Armen; Jianhan Chen; Patricia Teller; Charles Brooks
- Corresponding author: Charles Brooks
NSDF-Services: Integrating Networking, Storage, and Computing Services into a Testbed for Democratization of Data Delivery
- DOI: 10.1145/3603166.3632136
- Published: 2023-12-04
- Impact factor: 0
- Authors: Jakob Luettgau; Heberth Martinez; Paula Olaya; G. Scorzelli; G. Tarcea; Jay F. Lofstead; Christine R. Kirkpatrick; Valerio Pascucci; Michela Taufer
- Corresponding author: Michela Taufer
Special issue of computer communications on information and future communication security
- DOI: 10.1016/j.comcom.2010.10.008
- Published: 2011-03-01
- Journal: Computer Communications
- Impact factor: 0
- Authors: Jong Hyuk Park; Sheikh Iqbal Ahamed; Willy Susilo; Michela Taufer
- Corresponding author: Michela Taufer
Other Grants by Michela Taufer
EAGER: A Comprehensive Approach for Generating, Sharing, Searching, and Using High-Resolution Terrain Parameters
- Award number: 2334945
- Fiscal year: 2023
- Amount: $624,000
- Project type: Standard Grant
Collaborative Research: SHF: Small: Model-driven Design and Optimization of Dataflows for Scientific Applications
- Award number: 2331152
- Fiscal year: 2023
- Amount: $624,000
- Project type: Standard Grant
Collaborative Research: Elements: SENSORY: Software Ecosystem for kNowledge diScOveRY - a data-driven framework for soil moisture applications
- Award number: 2103845
- Fiscal year: 2021
- Amount: $624,000
- Project type: Standard Grant
Collaborative Research: EAGER: Advancing Reproducibility in Multi-Messenger Astrophysics
- Award number: 2041977
- Fiscal year: 2020
- Amount: $624,000
- Project type: Standard Grant
Collaborative Research: PPoSS: Planning: Performance Scalability, Trust, and Reproducibility: A Community Roadmap to Robust Science in High-throughput Applications
- Award number: 2028923
- Fiscal year: 2020
- Amount: $624,000
- Project type: Standard Grant
SHF: Medium: Collaborative Research: ANACIN-X: Analysis and modeling of Nondeterminism and Associated Costs in eXtreme scale applications
- Award number: 1900888
- Fiscal year: 2019
- Amount: $624,000
- Project type: Continuing Grant
BIGDATA: IA: Collaborative Research: In Situ Data Analytics for Next Generation Molecular Dynamics Workflows
- Award number: 1841758
- Fiscal year: 2018
- Amount: $624,000
- Project type: Standard Grant
SHF: Medium: Collaborative Research: A comprehensive methodology to pursue reproducible accuracy in ensemble scientific simulations on multi- and many-core platforms
- Award number: 1841552
- Fiscal year: 2018
- Amount: $624,000
- Project type: Standard Grant
Collaborative: EAGER: Exploring and Advancing the State of the Art in Robust Science in Gravitational Wave Physics
- Award number: 1823372
- Fiscal year: 2018
- Amount: $624,000
- Project type: Standard Grant
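Similar National Natural Science Foundation of China (NSFC) Grants
Design and synthesis of novel small-molecule LCK kinase inhibitors driven by synthetic methodology, and study of their mechanism of action against acute T-lymphoblastic leukemia
- Award number: 22307009
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Tuning electrochemiluminescence by intercalating small organic molecules into covalent organic frameworks, and new analytical methods for uranium
- Award number: 22376023
- Approval year: 2023
- Amount: CNY 500,000
- Project type: General Program
Identification of functional small peptides in soybean-rhizobium symbiotic nitrogen fixation using integrated multi-omics approaches
- Award number: 32300219
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Thermographic characterization of small defects in composite materials based on unsupervised deep learning
- Award number: 62301507
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Small-area estimation methods for health-service survey data under complex sampling and spatiotemporal effects
- Award number: 82304238
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund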
Similar Overseas Grants
SHF: Small: Methods and Architectures for Optimization and Hardware Acceleration of Spiking Neural Networks
- Award number: 2310170
- Fiscal year: 2023
- Amount: $624,000
- Project type: Standard Grant
SHF: Small: Efficient, Deterministic and Formally Certified Methods for Solving Low-dimensional Linear Programs with Floating-point Precision
- Award number: 2312220
- Fiscal year: 2023
- Amount: $624,000
- Project type: Standard Grant
SHF: Small: Algorithms and Software for Scalable Kernel Methods
- Award number: 1817048
- Fiscal year: 2018
- Amount: $624,000
- Project type: Standard Grant
SHF: Small: New models, design, and test methods for long-term aging of nanometer VLSI
- Award number: 1719047
- Fiscal year: 2017
- Amount: $624,000
- Project type: Standard Grant
SHF: Small: Formal Methods for Modern System Configuration Languages
- Award number: 1717636
- Fiscal year: 2017
- Amount: $624,000
- Project type: Standard Grant