
FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs

Basic Information

DOI:
10.1145/3588195.3592994
Publication date:
2023-04
Venue:
Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing
Impact factor:
--
Corresponding authors:
Bo Zhang;Jiannan Tian;S. Di;Xiaodong Yu;Yunhe Feng;Xin Liang;Dingwen Tao;F. Cappello
CAS journal tier:
Other
Document type:
--
Authors: Bo Zhang;Jiannan Tian;S. Di;Xiaodong Yu;Yunhe Feng;Xin Liang;Dingwen Tao;F. Cappello
Research area: --
MeSH terms: --
Keywords: --
Source link: PubMed detail page

Abstract

Today's large-scale scientific applications running on high-performance computing (HPC) systems generate vast data volumes. Thus, data compression is becoming a critical technique to mitigate the storage burden and data-movement cost. However, existing lossy compressors for scientific data cannot achieve a high compression ratio and throughput simultaneously, hindering their adoption in many applications requiring fast compression, such as in-memory compression. To this end, in this work, we develop a fast and high-ratio error-bounded lossy compressor on GPUs for scientific data (called FZ-GPU). Specifically, we first design a new compression pipeline that consists of fully parallelized quantization, bitshuffle, and our newly designed fast encoding. Then, we propose a series of deep architectural optimizations for each kernel in the pipeline to take full advantage of CUDA architectures. We propose a warp-level optimization to avoid data conflicts for bit-wise operations in bitshuffle, maximize shared memory utilization, and eliminate unnecessary data movements by fusing different compression kernels. Finally, we evaluate FZ-GPU on two NVIDIA GPUs (i.e., A100 and RTX A4000) using six representative scientific datasets from SDRBench. Results on the A100 GPU show that FZ-GPU achieves an average speedup of 4.2× over cuSZ and an average speedup of 37.0× over a multi-threaded CPU implementation of our algorithm under the same error bound. FZ-GPU also achieves an average speedup of 2.3× and an average compression ratio improvement of 2.0× over cuZFP under the same data distortion.
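The pipeline described in the abstract (error-bounded quantization, bitshuffle, then a fast encoder) can be illustrated with a minimal CUDA sketch of the first stage. This is not the authors' fused, warp-optimized implementation; it assumes a hypothetical 1D Lorenzo predictor and simple per-thread linear quantization, chosen only to show how each reconstructed value is kept within the user-specified error bound.

#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Illustrative sketch only: FZ-GPU's real kernels are fused, warp-optimized,
// and operate on multi-dimensional blocks with shared memory.
__global__ void lorenzo_quantize_1d(const float* data, int* quant, int n,
                                    float errorBound) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // 1D Lorenzo prediction: predict each value from its left neighbor.
    float pred = (i == 0) ? 0.0f : data[i - 1];
    float diff = data[i] - pred;
    // Error-bounded linear quantization: reconstructing pred + q * 2 * eb
    // keeps the pointwise error within +/- errorBound.
    quant[i] = __float2int_rn(diff / (2.0f * errorBound));
}

int main() {
    const int n = 1 << 20;
    const float eb = 1e-3f;               // absolute error bound (assumed)
    float* h = new float[n];
    for (int i = 0; i < n; ++i) h[i] = sinf(i * 1e-4f);  // synthetic data

    float* d_data; int* d_quant;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMalloc(&d_quant, n * sizeof(int));
    cudaMemcpy(d_data, h, n * sizeof(float), cudaMemcpyHostToDevice);

    lorenzo_quantize_1d<<<(n + 255) / 256, 256>>>(d_data, d_quant, n, eb);
    cudaDeviceSynchronize();

    int h_q[4];
    cudaMemcpy(h_q, d_quant, 4 * sizeof(int), cudaMemcpyDeviceToHost);
    printf("first quantization codes: %d %d %d %d\n",
           h_q[0], h_q[1], h_q[2], h_q[3]);

    cudaFree(d_data); cudaFree(d_quant); delete[] h;
    return 0;
}

A real compressor would then bit-shuffle these integer codes (grouping the same bit position of consecutive codes together so that the many leading-zero bits form long compressible runs) before the encoding stage; those steps are omitted here.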
References (60)
Cited by (5)


Bo Zhang;Jiannan Tian;S. Di;Xiaodong Yu;Yunhe Feng;Xin Liang;Dingwen Tao;F. Cappello
Correspondence address:
--
Affiliation:
--
Email:
--