With the relentless growth in data collection, data mining plays an important role in how massive data sets are analyzed. Trends clearly indicate that future decision-making systems will rely on even faster and more reliable models for data analysis. To achieve this, current algorithms and computing systems must be optimized and tuned to process effectively the large volumes of raw data expected in the future. In this paper, we present a brief overview of current approaches to system design and the challenges they face. The paper begins by highlighting the unique characteristics of data mining applications, which make current “generic” system designs unsuitable for mining large data sets. We then summarize recent innovations and research efforts toward designing systems that process data mining workloads efficiently.