Study of High-speed Data Mining Algorithms from Massive Data Streams

Basic Information

Grant number:
15300036
Principal investigator:
IKEDA Daisuke
Amount:
$99,800
Host institution:
KYUSHU UNIVERSITY (2003, 2005); Hokkaido University (2004)
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (B)
Fiscal year:
2003
Funding country:
Japan
Project period:
2003 to 2005
Project status:
Completed

Project Abstract

In this research, we investigated a high-speed online knowledge discovery system for extracting useful information from massive semi-structured data streams. This year in particular, on the theoretical side, we further extended the theory of efficient pattern matching and pattern discovery methods for online streams. On the application side, we carried out a series of experiments on collecting and analyzing network data from real high-speed networks in a large organization. We also published the results obtained over the three-year research period. Specifically, we pursued the following topics:

(1) Survey on semi-structured data: We summarized the work on stream data mining studied in this project over the last three years and published it as a survey in an academic journal.

(2) Streaming pattern matching technology for semi-structured data: We developed an efficient method for tree pattern matching with horizontal wildcards using bit-parallel techniques, which could potentially give a drastic speed-up for the XPath and XQuery pattern matching languages on huge XML data.

(3) Sequential and streaming pattern discovery technology for semi-structured data: We developed efficient algorithms for finding interesting patterns in massive data streams for various classes of complex patterns and motifs. This year we also published the pattern discovery algorithms developed last year, one of which received the 2004 JSAI SIG Award.

(4) Empirical study on knowledge discovery from real massive network data: As an application, we performed a series of surveys on data collection and online analysis of a high-speed, large-scale network for a medium-sized organization at Kyushu University. These experiments should give insights for future research on efficient pattern matching and discovery algorithms for high-speed streaming data.
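
To give a feel for the bit-parallel idea mentioned under item (2), here is a minimal sketch of the classic Shift-And string matcher, extended with a single-symbol wildcard and run over a stream one symbol at a time. It is not the project's tree pattern matching method, which the abstract does not describe in detail. The names `WILDCARD`, `build_masks`, and `shift_and_stream` are hypothetical and chosen only for this illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

WILDCARD = "?"  # hypothetical wildcard symbol; matches any single input symbol


def build_masks(pattern: str) -> Tuple[Dict[str, int], int]:
    """Return per-symbol bit masks and the wildcard mask for `pattern`.

    Bit i of masks[c] is set when pattern[i] == c; bit i of the wildcard
    mask is set when pattern[i] is the wildcard.
    """
    masks: Dict[str, int] = {}
    wild = 0
    for i, ch in enumerate(pattern):
        if ch == WILDCARD:
            wild |= 1 << i
        else:
            masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks, wild


def shift_and_stream(pattern: str, stream: Iterable[str]) -> Iterator[int]:
    """Yield the end position of every match of `pattern` in `stream`.

    Bit i of `state` is set when pattern[:i + 1] matches the input ending
    at the current position, so each symbol costs one shift, one OR, and
    one AND, regardless of how many partial matches are alive.
    """
    if not pattern:
        return
    masks, wild = build_masks(pattern)
    accept = 1 << (len(pattern) - 1)
    state = 0
    for pos, symbol in enumerate(stream):
        # Extend every live prefix by one symbol and start a new one at bit 0.
        state = ((state << 1) | 1) & (masks.get(symbol, 0) | wild)
        if state & accept:
            yield pos


if __name__ == "__main__":
    print(list(shift_and_stream("a?c", "abcaxcac")))
```

Because the matcher keeps only one integer of state between symbols, it never needs to buffer or revisit earlier input, which is the property that makes this family of techniques attractive for streams.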
