
Surrogate Modeling for HPC Application Iteration Times Forecasting with Network Features


Basic information

DOI:
--
Publication year:
2024
Journal:
SIGSIM Principles of Advanced Discrete Simulation
Impact factor:
--
Corresponding author:
Kai Shu
CAS journal ranking:
Document type:
--
Authors: Xiongxiao Xu; Kevin A. Brown; Tanwi Mallick; Xin Wang; Elkin Cruz; Robert B. Ross; Christopher D. Carothers; Zhiling Lan; Kai Shu
Research area: --
MeSH terms: --
Keywords: --

Abstract

Interconnect networks are the foundation for modern high performance computing (HPC) systems. Parallel discrete event simulation (PDES), serving as a cornerstone in the study of large-scale networking systems by modeling and simulating the real-world behaviors of HPC facilities, faces escalating computational complexities at an unsustainable scale. The research community is interested in building a surrogate-ready PDES framework where an accurate surrogate model can be used to forecast HPC behaviors and replace computationally expensive PDES phases. In this paper, we focus on forecasting application iteration times, the key indicator of large-scale networking performance, with network features, such as bandwidth-consumed and busy time on routers. We introduce five representative methods, including LAST, Average, ARIMA, LSTM, and the proposed framework LSTM-Feat, to forecast the iteration times of an exemplar application MILC running on a dragonfly system. By incorporating network features, LSTM-Feat can understand dependencies between network features and iteration times, thus facilitating forecasts. The experiments demonstrate the effectiveness of incorporating network features into surrogate models and the potential of surrogate models to accelerate PDES.
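The abstract names LAST and Average as two of the compared forecasting methods and describes LSTM-Feat as pairing iteration times with network features. The record does not give their exact definitions, so the following is only a minimal sketch under common assumptions: LAST repeats the most recent observed iteration time, Average repeats the mean of recent observations, and a feature-augmented model would consume sliding windows in which each time step carries the iteration time plus that step's network features. All function names, the `lookback` and `window` parameters, and the sample numbers are hypothetical and are not taken from the paper.

```python
def forecast_last(history, horizon):
    """LAST baseline (assumed definition): repeat the most recent value."""
    if not history:
        raise ValueError("history must be non-empty")
    return [history[-1]] * horizon


def forecast_average(history, horizon, window=None):
    """Average baseline (assumed definition): repeat the mean of the last
    `window` values, or of the whole history when `window` is None."""
    if not history:
        raise ValueError("history must be non-empty")
    recent = history if window is None else history[-window:]
    return [sum(recent) / len(recent)] * horizon


def mean_absolute_error(actual, predicted):
    """Mean absolute error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def make_windows(iter_times, net_features, lookback):
    """Build (inputs, targets) pairs for a feature-augmented sequence model.

    Each input is a list of `lookback` rows of the form
    [iteration_time, feature_1, feature_2, ...]; the target is the
    iteration time that immediately follows the window.
    """
    if len(iter_times) != len(net_features):
        raise ValueError("one feature row is required per iteration")
    inputs, targets = [], []
    for end in range(lookback, len(iter_times)):
        rows = [[iter_times[t], *net_features[t]]
                for t in range(end - lookback, end)]
        inputs.append(rows)
        targets.append(iter_times[end])
    return inputs, targets


if __name__ == "__main__":
    # Synthetic numbers for illustration only (not measurements from the paper).
    history = [10.0, 12.0, 11.0, 13.0]   # past iteration times
    future = [12.0, 14.0]                # iteration times to forecast
    # Hypothetical per-iteration features: [bandwidth consumed, router busy time]
    features = [[0.5, 0.1], [0.7, 0.2], [0.6, 0.1], [0.8, 0.3]]

    print(mean_absolute_error(future, forecast_last(history, len(future))))
    print(mean_absolute_error(future, forecast_average(history, len(future))))
    inputs, targets = make_windows(history, features, lookback=2)
    print(len(inputs), targets)
```

The windows returned by `make_windows` are the kind of input a recurrent model such as an LSTM could be trained on; the model itself is omitted here because its architecture is not described in this record.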
References (1)
Cited by (0)
Trade-Off Study of Localizing Communication and Balancing Network Traffic on a Dragonfly System
DOI:
Publication year:
2018
Journal:
IPDPS .... [proceedings]
Impact factor:
0
Authors:
Wang, Xin; Mubarak, Misbah; Yang, Xu; Ross, Rob; Lan, Zhiling
Corresponding author:
Lan, Zhiling


Kai Shu
Mailing address:
--
Affiliation:
--
Email address:
--