Collaborative Research: CNS Core: Small: Optimizing Large-Scale Heterogeneous ML Platforms

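Basic Information

  • Award number:
    2146909
  • Principal investigator:
  • Amount:
    $250,000
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2022
  • Funding country:
    United States
  • Period:
    2022-01-01 to 2024-12-31
  • Status:
    Completed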

Project Abstract

Large-scale artificial intelligence and machine learning (AI/ML) platforms are playing a vital role in the current data revolution. To minimize user effort, an end-to-end solution is desired to deploy complex workflows over possibly heterogeneous computing clusters. However, the scheduling and resource management problems behind such “push-button” deployment are challenging. If left unsolved, these costly systems will be severely under-utilized, leading to unnecessary electricity consumption and greenhouse gas emissions. This project will develop efficient resource allocation policies for distributed, large-scale AI/ML systems to tackle these challenges. Specifically, this project will accelerate and parallelize the large-scale optimization and inference tasks that dominate workloads in AI/ML platforms via distributed optimization that provides fault tolerance and robustness to stragglers in heterogeneous settings. Building on this distributed optimization, the project will further schedule AI/ML workflows with precedence constraints among sub-tasks. Finally, heterogeneous resources will be allocated among jobs fairly and efficiently in the case where the resources being allocated are exchangeable, which is key for AI/ML platforms with graphics processing units (GPUs) and other accelerators. The project will provide new fundamental algorithms for scheduling and resource allocation in AI/ML platforms used across academia and industry. The algorithmic ideas will be developed in the context of core, classical models and so will apply more broadly than AI/ML platforms, e.g., to networking, storage, supply chain management, and beyond.
The project will seek to broaden the participation of underrepresented groups in Science, Technology, Engineering and Mathematics through planned activities including the development of accelerated mathematics programs for middle-school students, summer programs for middle-school and high-school students, and summer research programs for undergraduate students. The project will make its software artifacts, datasets, and research results available to the research community on the project website at https://adamwierman.com/optimizing-large-scale-heterogeneous-ml-platforms/. Artifacts will be maintained for a minimum of 10 years. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

Project Outcomes

Journal articles (5)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Deep Learning-Assisted Online Task Offloading for Latency Minimization in Heterogeneous Mobile Edge
  • DOI:
    10.1109/tmc.2023.3285882
  • Published:
    2024-05
  • Journal:
  • Impact factor:
    7.9
  • Authors:
    Yu Liu;Yingling Mao;Z. Liu;Yuanyuan Yang
  • Corresponding author:
    Yu Liu;Yingling Mao;Z. Liu;Yuanyuan Yang
Applied Online Algorithms with Heterogeneous Predictors
  • DOI:
  • Published:
    2023
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jessica Maghakian;Russell Lee;M. Hajiesmaili;Jian Li;R. Sitaraman;Zhenhu Liu
  • Corresponding author:
    Jessica Maghakian;Russell Lee;M. Hajiesmaili;Jian Li;R. Sitaraman;Zhenhu Liu
Online Container Scheduling for Data-intensive Applications in Serverless Edge Computing
  • DOI:
    10.1109/infocom53939.2023.10229034
  • Published:
    2023-05
  • Journal:
  • Impact factor:
    0
  • Authors:
    Xiaojun Shang;Yingling Mao;Yu Liu;Yaodong Huang;Zhen Liu;Yuanyuan Yang
  • Corresponding author:
    Xiaojun Shang;Yingling Mao;Yu Liu;Yaodong Huang;Zhen Liu;Yuanyuan Yang
Joint Task Offloading and Resource Allocation in Heterogeneous Edge Environments
Energy-Aware Online Task Offloading and Resource Allocation for Mobile Edge Computing

Other publications by Zhenhua Liu

Obesity: An independent protective factor for localized renal cell carcinoma in a systemic inflammation state
Luminescence-Resonance-Energy-Transfer-Based Luminescence Nanoprobe for In Situ Imaging of CD36 Activation and CD36−oxLDL Binding in Atherogenesis.
  • DOI:
    10.1021/acs.analchem.9b01398
  • Published:
    2019
  • Journal:
  • Impact factor:
    7.4
  • Authors:
    Yuhui Sun;Wen Gao*;Zhenhua Liu;Huazhen Yang;Wenhua Cao;Lili Tong;Bo Tang*
  • Corresponding author:
    Bo Tang*
New iridoids from Patrinia scabiosaefolia and their hypoglycemic effects by activating PI3K/Akt signaling pathway
  • DOI:
    10.1016/j.fitote.2022.105423
  • Published:
    2023
  • Journal:
  • Impact factor:
    3.4
  • Authors:
    Zhenhua Liu;Lijun Meng;Mengke Wang;Li Wang;Yuhang Liu;Gaixia Hou;Shiming Li;Wenyi Kang
  • Corresponding author:
    Wenyi Kang
A non-interior-point smoothing method for variational inequality problem
  • DOI:
    10.1016/j.cam.2010.01.011
  • Published:
    2010
  • Journal:
  • Impact factor:
    0
  • Authors:
    Xiangsong Zhang;Sanyang Liu;Zhenhua Liu
  • Corresponding author:
    Zhenhua Liu
Transperineal 3-core Magnetic Resonance Imaging Ultrasound Fusion Targeted plus laterally 6-core Systematic Biopsy in Prostate Cancer Diagnosis
  • DOI:
  • Published:
    2024
  • Journal:
  • Impact factor:
    3.2
  • Authors:
    Chichen Zhang;Qiyou Wu;Qiong Zhang;Mengni Zhang;Diming Cai;Ling Nie;Xueqin Chen;Zhenhua Liu;Tianhai Lin;Shulei Xiao;Lu Yang;Shi Qiu;Yige Bao;Qiang Wei;X. Tu
  • Corresponding author:
    X. Tu

Other grants held by Zhenhua Liu

Collaborative Research: CNS Core: Medium: Dynamic Data-driven Systems - Theory and Applications
  • Award number:
    2106027
  • Fiscal year:
    2021
  • Funding amount:
    $250,000
  • Award type:
    Standard Grant
CAREER: An adaptive framework to accelerate real-time workloads in heterogeneous and reconfigurable environments
  • Award number:
    2046444
  • Fiscal year:
    2021
  • Funding amount:
    $250,000
  • Award type:
    Continuing Grant
NeTS: Small: Collaborative Research: Enabling Application-Level Performance Predictability in Public Clouds
  • Award number:
    1617698
  • Fiscal year:
    2016
  • Funding amount:
    $250,000
  • Award type:
    Standard Grant
CRII: NeTS: Enabling Demand Response from Cloud Data Centers -- from Sustainable IT to IT for Sustainability
  • Award number:
    1464388
  • Fiscal year:
    2015
  • Funding amount:
    $250,000
  • Award type:
    Standard Grant

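Similar NSFC (National Natural Science Foundation of China) grants

Mechanism by which the chromatin remodeling factor CHD3 regulates oligodendrocyte development in the central nervous system
  • Award number:
    82301950
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Mechanism and intervention study of somatic-mutation-induced mural cell defects in hemorrhage of central nervous system vascular malformations
  • Award number:
    82330038
  • Approval year:
    2023
  • Funding amount:
    CNY 2,200,000
  • Award type:
    Key Program
Role and mechanism of IL-17A in suppressing regulatory T cell function by affecting CNS2 region methylation through STAT5 in the pathogenesis of psoriasis
  • Award number:
    82304006
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund
Mutual adaptation mechanism of prostheses based on the human mirror central nervous system and trust level
  • Award number:
    62363006
  • Approval year:
    2023
  • Funding amount:
    CNY 310,000
  • Award type:
    Regional Science Fund
Mechanistic study and dose-response analysis of S100A9 as a predictor for individualized vancomycin anti-infective therapy of the central nervous system in children
  • Award number:
    82304631
  • Approval year:
    2023
  • Funding amount:
    CNY 300,000
  • Award type:
    Young Scientists Fund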

Similar overseas grants

Collaborative Research: CNS Core: Medium: Reconfigurable Kernel Datapaths with Adaptive Optimizations
  • Award number:
    2345339
  • Fiscal year:
    2023
  • Funding amount:
    $250,000
  • Award type:
    Standard Grant
Collaborative Research: CNS Core: Small: A Compilation System for Mapping Deep Learning Models to Tensorized Instructions (DELITE)
  • Award number:
    2230945
  • Fiscal year:
    2023
  • Funding amount:
    $250,000
  • Award type:
    Standard Grant
Collaborative Research: NSF-AoF: CNS Core: Small: Towards Scalable and AI-based Solutions for Beyond-5G Radio Access Networks
  • Award number:
    2225578
  • Fiscal year:
    2023
  • Funding amount:
    $250,000
  • Award type:
    Standard Grant
Collaborative Research: CNS Core: Medium: Movement of Computation and Data in Splitkernel-disaggregated, Data-intensive Systems
  • Award number:
    2406598
  • Fiscal year:
    2023
  • Funding amount:
    $250,000
  • Award type:
    Continuing Grant
Collaborative Research: CNS Core: Small: SmartSight: an AI-Based Computing Platform to Assist Blind and Visually Impaired People
  • Award number:
    2418188
  • Fiscal year:
    2023
  • Funding amount:
    $250,000
  • Award type:
    Standard Grant