SHF: Small: Solving the Problem of Scalable Multi-Precision Matrix Arithmetic on GPUs
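Basic Information
- Award number: 1217590
- Principal investigator:
- Amount: $450,000
- Host institution:
- Host institution country: United States
- Project category: Standard Grant
- Fiscal year: 2012
- Funding country: United States
- Project period: 2012-06-01 to 2016-05-31
- Project status: Completed
- Source:
- Keywords:
Project Abstract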
Computers directly support arithmetic that is typically limited to 64 bits (about 19 decimal digits) of precision. Applications that need more precision must implement arithmetic through computationally expensive software. Beyond about 256 bits of precision, such calculations become quite costly. The RSA encryption algorithm, for example, can require arithmetic with up to 4096 bits of precision. Applications in areas such as experimental mathematics and number theory can require millions of bits of precision. One multiplication with 10 million bits of precision can take a tenth of a second to compute on a modern processor, which means that matrix arithmetic using such large values can take days to weeks to execute. In previous work the investigators have shown that it is possible to obtain a factor of 20 improvement in performance by utilizing the parallel processing capabilities of a commodity graphics processing unit (GPU) in place of the traditional CPU. However, programming a GPU to achieve this level of performance is quite difficult, and the resulting code requires considerable hand-tuning to move it to new generations of GPU and gain the advantage of their performance, which is scaling up at a rate that exceeds CPU performance scaling.
This project is working to develop a framework that automatically generates and tunes multi-precision arithmetic libraries to execute on successive generations of GPUs. The libraries include both scalar and basic matrix arithmetic routines. They support scaling in precision as well as matrix size. The problem is challenging because different parallel algorithms must be automatically selected for different levels of precision, which must be balanced with the exploitation of the alternate dimension of parallelism inherent in matrix arithmetic.
In addition, the work seeks to employ distributed parallelism across a cluster of computers enhanced with GPUs, so that the libraries can be used on a new generation of GPU-based supercomputers that is beginning to be deployed at national laboratories. The work is significant because it enables easier exploitation of low-cost commodity graphics processors to achieve more than an order of magnitude increase in performance for multi-precision scalar and matrix arithmetic. One important application is enhancing performance of RSA encryption to support longer, more secure keys, at greater data rates, so that it becomes feasible to encrypt greater volumes of internet traffic. Another important use is experimental mathematics, where computationally expensive functions (e.g., integrals, infinite series) are computed at high precision and compared to other functions and high precision constants to help identify more efficient closed-form solutions. Results from experimental mathematics have found applications in particle physics, chaos theory, and calculation of fundamental constants. The resulting software framework offers a significant performance enhancement for multi-precision arithmetic to systems that range from individual researcher workstations to large supercomputers.
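As a rough illustration of why arithmetic beyond the 64-bit hardware limit must be done in software, and why its cost grows with precision, the sketch below multiplies two large integers stored as arrays of 32-bit "limbs" using the schoolbook method. It is not the project's GPU code: the function names, the 32-bit limb size, and the choice of Python are assumptions made for illustration only.

```python
LIMB_BITS = 32                      # assumed limb width; each limb fits a 32-bit word
LIMB_MASK = (1 << LIMB_BITS) - 1


def to_limbs(value):
    """Split a non-negative integer into little-endian 32-bit limbs."""
    limbs = []
    while value:
        limbs.append(value & LIMB_MASK)
        value >>= LIMB_BITS
    return limbs or [0]


def from_limbs(limbs):
    """Reassemble little-endian limbs into a single integer."""
    value = 0
    for limb in reversed(limbs):
        value = (value << LIMB_BITS) | limb
    return value


def schoolbook_multiply(a, b):
    """Multiply two limb arrays with the quadratic schoolbook method.

    Every partial sum (old limb + 32x32-bit product + carry) fits in
    64 bits, which is what native hardware arithmetic can hold.
    """
    result = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            total = result[i + j] + ai * bj + carry
            result[i + j] = total & LIMB_MASK   # keep the low 32 bits
            carry = total >> LIMB_BITS          # pass the high bits on
        result[i + len(b)] = carry              # slot is still zero here
    return result
```

A 4096-bit operand occupies 128 such limbs, so one product takes 128 × 128 limb multiplications. This quadratic growth is one reason a library would switch algorithms as precision increases, which matches the abstract's point that different algorithms must be selected for different levels of precision.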
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Charles Weems
DPF-ECC: A Framework for Efficient ECC With Double Precision Floating-Point Computing Power
- DOI: 10.1109/tifs.2021.3098987
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: 高莉莉; 郑昉昱; 魏荣; 董建阔; Niall Emmart; 马原; 林璟锵; Charles Weems
- Corresponding author: Charles Weems
Retrorectus prosthetic mesh repair of midline abdominal hernia.
- DOI:
- Published: 1997
- Journal:
- Impact factor: 3
- Authors: D. Mclanahan; L. King; Charles Weems; Michael L. Novotney; K. Gibson
- Corresponding author: K. Gibson
The smallest eigenvalue of large Hankel matrices
- DOI: 10.1016/j.amc.2018.04.012
- Published: 2018-10-01
- Journal:
- Impact factor:
- Authors: Mengkun Zhu; Yang Chen; Niall Emmart; Charles Weems
- Corresponding author: Charles Weems
Other grants by Charles Weems
Collaborative Research: CyberTraining: Implementation: Medium: Modern Course Exemplars infused with Parallel and Distributed Computing for the Introductory Computing Course Sequence
- Award number: 2321016
- Fiscal year: 2023
- Award amount: $450,000
- Project category: Standard Grant
Collaborative Research: CyberTraining: Implementation: Medium: Broadening Adoption of Parallel and Distributed Computing in Undergraduate Computer Science and Engineering Curricula
- Award number: 2017427
- Fiscal year: 2020
- Award amount: $450,000
- Project category: Standard Grant
Collaborative Research: CyberTraining: Conceptualization: Planning a Sustainable Ecosystem for Incorporating Parallel and Distributed Computing into Undergraduate Education
- Award number: 1924023
- Fiscal year: 2019
- Award amount: $450,000
- Project category: Standard Grant
Collaborative Research: CyberTraining: CDL: Preparing Instructors to Offer Experimental Courses in an Updated PDC Curriculum, and Broadening Participation
- Award number: 1730527
- Fiscal year: 2017
- Award amount: $450,000
- Project category: Standard Grant
Workshop for Updating and Broadening the Parallel and Distributed Computing Curriculum in Undergraduate Education; Arlington, VA, August 17-18, 2015
- Award number: 1546086
- Fiscal year: 2015
- Award amount: $450,000
- Project category: Standard Grant
SHF: Small: Solving the Problems of Scalability and Portability while Maximizing Performance of Multiprecision Scalar and Vector Arithmetic on Clusters of GPUs
- Award number: 1525754
- Fiscal year: 2015
- Award amount: $450,000
- Project category: Standard Grant
EAGER: Collaborative Research: Developing a Parallel and Distributed Computing Concepts Curriculum Enhancement for the Computer Science Principles Course
- Award number: 1550794
- Fiscal year: 2015
- Award amount: $450,000
- Project category: Standard Grant
Collaborative Research: CI-ADDO-NEW: Parallel and Distributed Computing Curriculum Development and Educational Resources
- Award number: 1205492
- Fiscal year: 2012
- Award amount: $450,000
- Project category: Standard Grant
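Similar NSFC (National Natural Science Foundation of China) grants
Structural basis of the SERT-nNOS protein interaction, and the design, synthesis, and rapid antidepressant activity of its small-molecule interaction inhibitors
- Award number: 82373728
- Year approved: 2023
- Award amount: CNY 490,000
- Project category: General Program
Role and mechanism of APOE-regulated microglial lipid metabolism patterns in cognitive and social impairment in ASD
- Award number: 82373597
- Year approved: 2023
- Award amount: CNY 490,000
- Project category: General Program
Mechanism by which microglial exosomes mediate electroacupuncture repair of spinal cord injury by inhibiting neuronal ferroptosis via miR-486
- Award number: 82360454
- Year approved: 2023
- Award amount: CNY 320,000
- Project category: Regional Science Fund Project
Mechanism by which CUL4B positive-feedback regulation of the FOXO3a-FOXM1 pathway promotes radiotherapy resistance in non-small cell lung cancer
- Award number: 82360584
- Year approved: 2023
- Award amount: CNY 320,000
- Project category: Regional Science Fund Project
Molecular mechanism by which the AMPK-CREB-PPA1 signaling pathway promotes non-small cell lung cancer cell proliferation under glucose starvation
- Award number: 82360518
- Year approved: 2023
- Award amount: CNY 320,000
- Project category: Regional Science Fund Project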
Similar overseas grants
SHF: Small: Efficient, Deterministic and Formally Certified Methods for Solving Low-dimensional Linear Programs with Floating-point Precision
- Award number: 2312220
- Fiscal year: 2023
- Award amount: $450,000
- Project category: Standard Grant
SHF: Small: Solving the Parallel Functional Programming Challenge
- Award number: 2115104
- Fiscal year: 2021
- Award amount: $450,000
- Project category: Standard Grant
SHF: Small: MaPaMaP: Massively Parallel Solving of Math Problems
- Award number: 2006363
- Fiscal year: 2019
- Award amount: $450,000
- Project category: Standard Grant
SHF: Small: MaPaMaP: Massively Parallel Solving of Math Problems
- Award number: 1813993
- Fiscal year: 2018
- Award amount: $450,000
- Project category: Standard Grant
SHF: Small: Solving the Problems of Scalability and Portability while Maximizing Performance of Multiprecision Scalar and Vector Arithmetic on Clusters of GPUs
- Award number: 1525754
- Fiscal year: 2015
- Award amount: $450,000
- Project category: Standard Grant