As networks develop and data volumes grow, RDF graphs are becoming increasingly complex, and querying massive RDF graphs efficiently remains an active research topic. Traditional graph query and graph traversal methods produce large numbers of redundant intermediate results, and processing RDF subgraph set queries on a single machine cannot match efficiently when the data volume is extremely large. Moreover, when a subgraph set is queried, the common subgraphs shared by its query graphs are matched repeatedly, once for each query graph, which lowers execution efficiency. To address these problems, this paper proposes a distributed query strategy for RDF subgraph sets based on a composite relation tree. First, a composite relation graph is built for the RDF subgraph set. This graph is then pruned by deleting its redundant nodes and edges, which yields the composite relation tree. Finally, a MapReduce-based query method is built on the composite relation tree. It exploits parallelism in the distributed computing environment to process the queries of the RDF subgraph set in batches, and it obtains the query results for the subgraph set by traversing the composite relation tree. Experimental results show that the proposed algorithm improves the query efficiency of RDF subgraph sets.
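The three steps summarized in the abstract (build a composite relation graph, prune it into a tree, then answer the whole set with one batched MapReduce pass plus a tree traversal) can be illustrated with a small Python sketch. This is not the paper's algorithm. It rests on several stated assumptions. Each query in the subgraph set is treated as a conjunctive set of SPARQL-style triple patterns. The "composite relation" is simplified to strict pattern-set containment rather than true subgraph containment. Each query keeps only its largest contained query as its parent, which discards transitively redundant edges and leaves a tree. The map and reduce phases are simulated in-process with plain functions, not run on a MapReduce cluster. All names (`build_relation_tree`, `map_phase`, `reduce_phase`, `evaluate`) and the sample data are hypothetical.

```python
from collections import defaultdict


def is_var(term):
    """Terms starting with '?' are treated as variables (assumption)."""
    return term.startswith("?")


def match_pattern(pattern, triple):
    """Return the variable binding if `triple` matches `pattern`, else None."""
    binding = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            if binding.get(p, t) != t:  # a repeated variable must bind consistently
                return None
            binding[p] = t
        elif p != t:
            return None
    return binding


def build_relation_tree(queries):
    """Hypothetical composite relation tree: each query's parent is the largest
    other query whose pattern set it strictly contains (None for roots).
    Keeping a single parent drops transitively redundant containment edges."""
    parent = {}
    for name, pats in queries.items():
        contained = [o for o, other in queries.items() if other < pats]
        parent[name] = max(contained, key=lambda o: (len(queries[o]), o), default=None)
    return parent


def map_phase(triples, patterns):
    """Simulated map: one scan of the data emits (pattern, binding) per match."""
    for triple in triples:
        for pat in patterns:
            binding = match_pattern(pat, triple)
            if binding is not None:
                yield pat, binding


def reduce_phase(pairs):
    """Simulated reduce: group the emitted bindings by triple pattern."""
    grouped = defaultdict(list)
    for pat, binding in pairs:
        grouped[pat].append(binding)
    return grouped


def join(left, right):
    """Natural join of two lists of bindings on their shared variables."""
    return [{**l, **r} for l in left for r in right
            if all(l[v] == r[v] for v in l.keys() & r.keys())]


def evaluate(queries, triples):
    """Answer every query in the set, reusing each parent's cached results."""
    parent = build_relation_tree(queries)
    patterns = set().union(*queries.values())
    bindings = reduce_phase(map_phase(triples, patterns))  # one pass for all queries
    results = {}

    def solve(name):
        if name not in results:
            par = parent[name]
            rows = [{}] if par is None else solve(par)
            extra = queries[name] if par is None else queries[name] - queries[par]
            for pat in sorted(extra):  # join only the patterns missing from the parent
                rows = join(rows, bindings.get(pat, []))
            results[name] = rows
        return results[name]

    for name in queries:
        solve(name)
    return results


# Hypothetical sample data: Q1 is contained in Q2, and Q2 is contained in Q3.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "worksAt", "acme"),
    ("bob", "worksAt", "acme"),
]
queries = {
    "Q1": frozenset({("?a", "knows", "?b")}),
    "Q2": frozenset({("?a", "knows", "?b"), ("?a", "worksAt", "?c")}),
    "Q3": frozenset({("?a", "knows", "?b"), ("?a", "worksAt", "?c"),
                     ("?b", "worksAt", "?c")}),
}
results = evaluate(queries, triples)
print(build_relation_tree(queries))  # {'Q1': None, 'Q2': 'Q1', 'Q3': 'Q2'}
print(results["Q3"])  # [{'?a': 'alice', '?b': 'bob', '?c': 'acme'}]
```

Q3's parent is Q2, so Q3 joins only its one extra pattern onto Q2's cached rows instead of re-matching the patterns it shares with Q2 and Q1. This illustrates the kind of repeated matching of common subgraphs that the composite relation tree is meant to avoid.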