We describe an efficient contour generation scheme, implemented on a general-purpose graphics processing unit (GPU), that extracts multiple contours in parallel from a high-resolution multivalued image or depth map. The scheme extends a two-stage, crack-based contour generation method for the GPU whose performance is hampered by the horizontal dependencies that arise when a contour crosses a horizontal scan line multiple times. We process blocks of pixels from a scan line for every contour and, rather than allowing these dependencies to cross block boundaries unnecessarily, we detect when a dependency terminates within a block and ‘start again’ in a subsequent block. In practice, this reduces the effect of the dependencies and yields a significant performance increase. Although some dependencies are not eliminated, on typical real images the approach is 2× faster than the previous variant of our algorithm, because the ‘pathological’ situations in which a dependency must block the next thread do not occur in every scan line for every contour. It allows us to generate up to 128 contours from high-resolution (up to 2 megapixels) multivalued real images in less than 20 ms on a GTX 980 GPU.
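To make the notions of scan-line crossings and block boundaries concrete, the following is a minimal sequential sketch in Python. It is not the GPU implementation described above. It assumes that a contour at a given level separates pixels with value at or above that level from pixels below it, and that a vertical crack lies between two horizontally adjacent pixels on opposite sides of the level. The function names (`scanline_cracks`, `cracks_per_block`, `runs_crossing_block_end`) and the block size are hypothetical, chosen only for illustration. The sketch lists where each contour crosses one scan line, groups the crossings by block, and flags the blocks whose final run of inside pixels continues into the next block, which is the situation in which information would have to be carried across a block boundary.

```python
from typing import Dict, List


def scanline_cracks(row: List[int], level: int) -> List[int]:
    """Return positions x where a vertical crack lies between pixel x-1 and x,
    i.e. exactly one of the two pixels is >= level (assumed contour definition)."""
    return [x for x in range(1, len(row))
            if (row[x - 1] >= level) != (row[x] >= level)]


def cracks_per_block(row: List[int], levels: List[int],
                     block_size: int) -> Dict[int, List[List[int]]]:
    """For every contour level, group the crack positions of one scan line
    by fixed-size pixel block. Each (level, block) pair is independent here,
    so on a GPU it could in principle be handled by a separate thread."""
    n_blocks = (len(row) + block_size - 1) // block_size
    result: Dict[int, List[List[int]]] = {}
    for level in levels:
        blocks: List[List[int]] = [[] for _ in range(n_blocks)]
        for x in scanline_cracks(row, level):
            blocks[x // block_size].append(x)
        result[level] = blocks
    return result


def runs_crossing_block_end(row: List[int], level: int,
                            block_size: int) -> List[bool]:
    """For each block, report whether a run of inside pixels (>= level)
    continues from its last pixel into the first pixel of the next block.
    A False entry means the run ended inside the block, so the next block
    can 'start again' without waiting for its predecessor."""
    n_blocks = (len(row) + block_size - 1) // block_size
    flags: List[bool] = []
    for b in range(n_blocks):
        last = min((b + 1) * block_size, len(row)) - 1
        nxt = last + 1
        flags.append(nxt < len(row)
                     and row[last] >= level
                     and row[nxt] >= level)
    return flags


if __name__ == "__main__":
    # One scan line of a multivalued image, split into blocks of 4 pixels.
    row = [0, 0, 5, 5, 0, 0, 5, 5, 5, 0, 0, 0]
    print(cracks_per_block(row, levels=[3, 6], block_size=4))
    print(runs_crossing_block_end(row, level=3, block_size=4))
```

In this example the contour at level 3 crosses the scan line four times. Only the boundary between the second and third blocks is straddled by a run, so under the stated assumptions only that boundary would require one block to wait for another.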