Data generation is growing at an unprecedented rate. Data is being accumulated about ever larger dynamic systems and captures events at finer granularity, so processing requirements are increasingly approaching real time. To keep up, data-analytics pipelines need to be viable at massive scale and must move away from static, offline scenarios to support fully online analysis of dynamic systems. This paper uses a challenge problem, graph colouring, to explore massive-scale analytics for dynamic graph processing. We present an event-based infrastructure and a novel, online, distributed graph colouring algorithm. Our implementation for colouring static graphs, used as a performance baseline, is up to an order of magnitude faster than previous results and handles massive graphs with over 257 billion edges. Our framework supports dynamic graph colouring with performance at large scale better than that of GraphLab's static analysis. Our experience indicates that online solutions are feasible and can be more efficient than those based on snapshotting.