Graph Neural Network Inference on Multi-FPGA Clusters

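Basic information

  • Grant number:
    2894270
  • Principal investigator:
  • Amount:
    --
  • Host institution:
  • Host institution country:
    United Kingdom
  • Project type:
    Studentship
  • Fiscal year:
    2023
  • Funding country:
    United Kingdom
  • Period:
    2023 to (no data)
  • Project status:
    Not concluded

Project abstract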

Neural networks have been widely deployed to achieve state-of-the-art performance across domains such as image classification, machine translation, and text generation. Such models are typically executed on Graphics Processing Units (GPUs), which are widely commercially available and offer large performance improvements over general-purpose processors due to their highly parallel architecture.

With the increasing complexity of cutting-edge models, GPUs have shown performance limitations due to expensive data management mechanisms. In particular, low-latency applications such as high-energy physics or autonomous vehicles show the need for custom hardware to achieve sub-microsecond computation. Field-Programmable Gate Arrays (FPGAs) are a class of integrated circuit well suited to these requirements due to their reconfigurable fabric, and have been shown to achieve up to 10x latency and throughput improvements over GPU counterparts, with orders of magnitude lower power consumption. Additionally, this reconfigurability gives FPGAs the flexibility to perform fine-grained optimisations in the network implementation.

In recent times, Graph Neural Networks (GNNs) have attracted great attention due to their classification performance on non-Euclidean data, such as in social networks, drug discovery and recommendation systems. FPGA acceleration proves particularly beneficial for GNNs given their irregular memory access patterns, which result from the sparse structure of graphs. These unique compute requirements have been addressed by several FPGA accelerators in the literature. Despite the benefits of inference on reconfigurable logic, high-end FPGAs are still limited by on-chip resource availability. This challenge can be addressed by FPGA clusters that connect multiple devices through high-speed interconnects, which offers the ability to scale inference performance approximately linearly with the number of devices in the network. This approach has been explored in the literature to accelerate Convolutional Neural Networks (CNNs) through dedicated layer partitioning approaches.

Although this method has proved effective for CNN acceleration, GNNs offer an unexplored problem setting. GNNs have an inherently shallower structure than CNNs, since the number of layers corresponds to the number of neighbourhood hops through which features propagate. As such, my research aims to demonstrate that GNN inference on FPGA clusters benefits most from partitioning in the graph dimension rather than the layer dimension.

Several graph partitioning approaches have been proposed in the literature. A naïve approach involves splitting the adjacency matrix into regular node intervals. Alternatively, dynamic sliding-window-based approaches consider the graph data, leading to denser partitions and higher spatial locality. In real-time applications, the latency of this pre-processing step needs to be traded off against the added throughput in node feature transformations per layer. With any given partitioning scheme, a distributed node transformation engine requires careful consideration of data coherency, a classic problem in computer architecture. The distribution of feature updates across several devices with dedicated memory components shows the need for "residual" connections between devices such that messages can be computed. Various hardware optimisations could then be explored to limit the overhead of inter-device communication.

In conclusion, as the demand for efficient hardware acceleration grows beyond traditional GPUs, FPGAs present a compelling solution. However, scalability challenges in high-end FPGAs prompt the exploration of FPGA clusters. For GNNs, the proposal to shift from layer partitioning to graph partitioning in FPGA clusters shows promise, but refining partitioning strategies and addressing data coherency are critical for unlocking the full potential.
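
As a rough illustration of the naïve scheme mentioned in the abstract (splitting nodes into regular intervals, one interval per device), the following Python sketch assigns contiguous node ranges to devices and counts the edges whose endpoints land on different devices. Those edges are the ones that would need inter-device communication. The function names, the edge-list representation, and the toy graph are assumptions made for this example only; they are not taken from the project.

```python
from typing import List, Tuple

Edge = Tuple[int, int]  # (source node ID, destination node ID)


def interval_partition(num_nodes: int, num_devices: int) -> List[range]:
    """Split node IDs 0..num_nodes-1 into contiguous, near-equal intervals."""
    base, extra = divmod(num_nodes, num_devices)
    parts: List[range] = []
    start = 0
    for device in range(num_devices):
        # The first `extra` devices take one additional node each.
        size = base + (1 if device < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts


def owner(node: int, parts: List[range]) -> int:
    """Return the index of the (hypothetical) device that holds `node`."""
    for device, interval in enumerate(parts):
        if node in interval:
            return device
    raise ValueError(f"node {node} is outside every partition")


def count_cross_device_edges(edges: List[Edge], parts: List[range]) -> int:
    """Count edges whose endpoints are stored on different devices."""
    return sum(1 for src, dst in edges if owner(src, parts) != owner(dst, parts))


# Toy graph: a 6-node ring, split across 2 devices.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5)]
parts = interval_partition(num_nodes=6, num_devices=2)
cross = count_cross_device_edges(edges, parts)
print(parts, cross)
```

Because this scheme ignores the graph structure, the number of cross-device edges depends entirely on how the nodes happen to be numbered, which is one reason data-aware partitioning may be worth its pre-processing latency.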

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

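Other literature

リンの回収方法および回収装置
Method and apparatus for recovering phosphorus
  • DOI:
  • Publication date:
    2009
  • Journal:
  • Impact factor:
    0
  • Authors:
  • Corresponding author:
ホームページ等
Homepage, etc.
  • DOI:
  • Publication date:
  • Journal:
  • Impact factor:
    0
  • Authors:
  • Corresponding author:
20世紀前半のフィリピン降水量データセット作成(DIAS地球観測データ統合解析プロダクトに掲載)
Creation of a Philippine precipitation dataset for the first half of the 20th century (published in the DIAS Earth observation data integration and analysis products)
  • DOI:
  • Publication date:
  • Journal:
  • Impact factor:
    0
  • Authors:
  • Corresponding author:
浅沼順
Jun Asanuma
  • DOI:
  • Publication date:
  • Journal:
  • Impact factor:
    0
  • Authors:
  • Corresponding author:
陽極酸化アルミナの製造方法、陽極酸化アルミナ、および高密度構造体
Method for producing anodized alumina, anodized alumina, and high-density structure
  • DOI:
  • Publication date:
    2008
  • Journal:
  • Impact factor:
    0
  • Authors:
  • Corresponding author: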

Other grants

An implantable biosensor microsystem for real-time measurement of circulating biomarkers
  • Grant number:
    2901954
  • Fiscal year:
    2028
  • Funding amount:
    --
  • Project type:
    Studentship
Exploiting the polysaccharide breakdown capacity of the human gut microbiome to develop environmentally sustainable dishwashing solutions
  • Grant number:
    2896097
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Field Assisted Sintering of Nuclear Fuel Simulants
  • Grant number:
    2908917
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Development of a new solid tritium breeder blanket
  • Grant number:
    2908923
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Landscapes of Music: The more-than-human lives and politics of musical instruments
  • Grant number:
    2889655
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Cosmological hydrodynamical simulations with calibrated non-universal initial mass functions
  • Grant number:
    2903298
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Proton, alpha and gamma irradiation assisted stress corrosion cracking: understanding the fuel-stainless steel interface
  • Grant number:
    2908693
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Understanding the interplay between the gut microbiome, behavior and urbanisation in wild birds
  • Grant number:
    2876993
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
Likelihood and impact of severe space weather events on the resilience of nuclear power and safeguards monitoring.
  • Grant number:
    2908918
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship
A Robot that Swims Through Granular Materials
  • Grant number:
    2780268
  • Fiscal year:
    2027
  • Funding amount:
    --
  • Project type:
    Studentship

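Similar National Natural Science Foundation of China grants

Nonlinear programmable metasurface diffractive neural networks
  • Grant number:
    62301147
  • Approval year:
    2023
  • Funding amount:
    300,000 yuan
  • Project type:
    Young Scientists Fund
Research on algorithms for retrieving evaporation ducts from radar echo data based on physics-informed neural networks
  • Grant number:
    42305048
  • Approval year:
    2023
  • Funding amount:
    300,000 yuan
  • Project type:
    Young Scientists Fund
Research on low-power activation methods for neurons in piecewise linear neural networks
  • Grant number:
    62303472
  • Approval year:
    2023
  • Funding amount:
    300,000 yuan
  • Project type:
    Young Scientists Fund
Research on the neural network circuit mechanisms of eye-closure sensitivity in epilepsy with eyelid myoclonia
  • Grant number:
    82372033
  • Approval year:
    2023
  • Funding amount:
    500,000 yuan
  • Project type:
    General Program
Research on the anti-cerebral-ischaemia mechanism by which a Yin-nourishing, Qi-replenishing formula combined with a blood-activating formula regulates the HPA axis-Th17 cell-neuronal PANoptosis network
  • Grant number:
    82330120
  • Approval year:
    2023
  • Funding amount:
    2,200,000 yuan
  • Project type:
    Key Program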

Similar overseas grants

Vessel Identification and Tracing in DSA Image Series for Cerebrovascular Surgical Planning
  • Grant number:
    10726103
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
Enabling AI-based Mouse Genetic Discovery
  • Grant number:
    10724522
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
Evaluating the utility of cis-regulatory element graphs for modeling gene regulation
  • Grant number:
    10776793
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
Developmental Pathophysiology of Adverse Patterns of Substance Use in Adolescents with Anxiety
  • Grant number:
    10566213
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type:
Data-driven modeling of the vibrational spectroscopy of ion channels
  • Grant number:
    10715048
  • Fiscal year:
    2023
  • Funding amount:
    --
  • Project type: