The 3D FFT is critical in many physical simulations and image processing applications. On FPGAs, however, the 3D FFT has been thought to be inefficient relative to other methods, such as convolution-based implementations of multigrid. We find the opposite: a simple design, operating at a conservative frequency, takes 4 μs for 16³, 21 μs for 32³, and 215 μs for 64³ single-precision data points. The first two of these compare favorably with the 25 μs and 29 μs obtained on a current Nvidia GPU. More broadly, this result is a critical piece in implementing a large-scale FPGA-based molecular dynamics (MD) engine: even a single FPGA is capable of keeping the FFT off the critical path for a large fraction of possible MD simulations.