This work studies the porting and optimization of the tensor network simulator QTensor on GPUs, with the ultimate goal of simulating quantum circuits efficiently at scale on large GPU supercomputers. We implement NumPy, PyTorch, and CuPy backends and benchmark the code to find the optimal allocation of tensor contractions to either a CPU or a GPU. We also present a dynamic mixed backend that achieves optimal performance. To demonstrate the performance, we simulate QAOA circuits for computing the MaxCut energy expectation. Our method achieves a 176× speedup on a GPU over the NumPy baseline on a CPU for the benchmarked QAOA circuits solving the MaxCut problem on a 3-regular graph with 30 nodes at depth p = 4.
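The dynamic mixed backend mentioned above can be illustrated with a minimal sketch: route each contraction to NumPy on the CPU when the operands are small (where GPU transfer overhead dominates) and to CuPy on the GPU when they are large. The function names (`pick_backend`, `contract`) and the size threshold below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# Hypothetical threshold on total operand size (illustrative only):
# small contractions stay on the CPU, large ones go to the GPU.
SIZE_THRESHOLD = 2**20

try:
    import cupy as cp  # GPU array backend, used only if installed
    GPU_AVAILABLE = True
except ImportError:
    cp = None
    GPU_AVAILABLE = False


def pick_backend(*tensors):
    """Choose an array module (NumPy or CuPy) based on operand size."""
    total_elements = sum(t.size for t in tensors)
    if GPU_AVAILABLE and total_elements >= SIZE_THRESHOLD:
        return cp
    return np


def contract(a, b, subscripts="ij,jk->ik"):
    """Contract two tensors on whichever backend pick_backend selects."""
    xp = pick_backend(a, b)
    a, b = xp.asarray(a), xp.asarray(b)
    return xp.einsum(subscripts, a, b)


# Small operands fall below the threshold, so this runs on the NumPy/CPU path.
a = np.random.rand(8, 8)
b = np.random.rand(8, 8)
result = contract(a, b)
```

The same dispatch idea extends to a PyTorch backend by returning `torch` from `pick_backend` and moving tensors with `.to("cuda")`; the key design point is that the backend decision is made per contraction, not once for the whole simulation.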