Real-time data processing frameworks like S4 and Flume have become scalable and reliable solutions for acquiring, moving, and processing the large volumes of data continuously produced by large numbers of online sources. Yet these frameworks lack the elasticity to horizontally scale up or scale down based on current input-event rates and desired event-processing latencies. The Project Hoover middleware provides distributed methods for measuring, aggregating, and analyzing the performance of distributed Flume components, thereby enabling online configuration changes to meet varying processing demands. Experimental evaluations with a sample Flume data-processing application show that Hoover's approach can dynamically and continuously monitor Flume performance, and that the collected data can be used to right-size the number of Flume collectors for different log production rates.
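To make the measure-aggregate-analyze loop described above concrete, the following is a minimal sketch, not Hoover's actual implementation, of how one might poll each Flume agent's built-in HTTP metrics endpoint (enabled with `-Dflume.monitoring.type=http -Dflume.monitoring.port=<port>`), aggregate the observed event rates, and estimate how many collectors are needed. The host names, the poll interval, and the per-collector capacity figure are illustrative assumptions.

```python
"""Sketch of a monitoring loop in the spirit of Hoover: poll Flume agents'
JSON metrics, aggregate event-take rates, and suggest a collector count.
Endpoints, capacity, and poll interval below are assumed, not from the paper."""
import json
import math
import time
import urllib.request

AGENT_METRIC_URLS = [              # assumed addresses of the agents' /metrics endpoints
    "http://collector-1:34545/metrics",
    "http://collector-2:34545/metrics",
]
EVENTS_PER_COLLECTOR = 50_000      # assumed sustainable events/sec per collector
POLL_INTERVAL_S = 5                # assumed sampling period


def total_take_count() -> int:
    """Sum EventTakeSuccessCount over every channel reported by every agent."""
    total = 0
    for url in AGENT_METRIC_URLS:
        with urllib.request.urlopen(url, timeout=2) as resp:
            metrics = json.load(resp)
        for component, values in metrics.items():
            if component.startswith("CHANNEL."):
                total += int(values.get("EventTakeSuccessCount", 0))
    return total


def main() -> None:
    previous = total_take_count()
    while True:
        time.sleep(POLL_INTERVAL_S)
        current = total_take_count()
        rate = (current - previous) / POLL_INTERVAL_S   # aggregate events/sec
        previous = current
        # Right-size: at least one collector, scaled to the observed input rate.
        desired = max(1, math.ceil(rate / EVENTS_PER_COLLECTOR))
        print(f"aggregate rate ~ {rate:.0f} events/s -> suggest {desired} collector(s)")


if __name__ == "__main__":
    main()
```

In a full system, the suggested collector count would feed an online reconfiguration step (starting or stopping collector agents and updating the sources that feed them) rather than being printed; that control path is outside the scope of this sketch.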