Hypergraphs can capture higher-order relations among subsets of objects, rather than only the pairwise relations captured by graphs. Hypergraph clustering is an important task in information retrieval and machine learning. We study the problem of distributed hypergraph clustering in the message-passing communication model with small communication cost. We propose an algorithmic framework for distributed hypergraph clustering based on spectral hypergraph sparsification. Consider an $n$-vertex hypergraph $G$ whose hyperedges have maximum size $r$ and are distributed arbitrarily across $s$ sites, and a parameter $\varepsilon \in (0,1)$. Our algorithm produces a vertex set with conductance $O\left(\sqrt{\frac{1+\varepsilon}{1-\varepsilon}}\sqrt{\phi_G}\right)$, where $\phi_G$ is the conductance of $G$, using communication cost $\tilde{O}(n r^2 s / \varepsilon^{O(1)})$ ($\tilde{O}$ hides a polylogarithmic factor). The theoretical results are complemented by extensive experiments on various real-world datasets that demonstrate the efficiency and effectiveness of the proposed algorithm. Our source code is publicly available at github.com/chunjiangzhu/dhgc.
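The guarantee above is stated in terms of hypergraph conductance. As an illustration only, the sketch below computes it under a common definition that is assumed here rather than taken from the paper. For a vertex set $S$, $\phi(S) = w(\partial S) / \min(\mathrm{vol}(S), \mathrm{vol}(V \setminus S))$, where $\partial S$ is the set of hyperedges with at least one vertex on each side and $\mathrm{vol}$ sums weighted vertex degrees. $\phi_G$ is the minimum of $\phi(S)$ over nonempty proper subsets. The function names are hypothetical and do not come from the dhgc repository. The search is brute force and only suitable for tiny examples.

```python
from itertools import combinations
from math import inf
from typing import Dict, FrozenSet, Hashable, Iterable, Sequence


def weighted_degrees(hyperedges: Sequence[FrozenSet[Hashable]],
                     weights: Sequence[float]) -> Dict[Hashable, float]:
    """Degree of v = total weight of the hyperedges that contain v."""
    deg: Dict[Hashable, float] = {}
    for edge, w in zip(hyperedges, weights):
        for v in edge:
            deg[v] = deg.get(v, 0.0) + w
    return deg


def set_conductance(hyperedges: Sequence[FrozenSet[Hashable]],
                    weights: Sequence[float],
                    subset: Iterable[Hashable]) -> float:
    """phi(S) = cut weight / min(vol(S), vol(V \\ S)) (assumed definition)."""
    s = set(subset)
    deg = weighted_degrees(hyperedges, weights)
    vol_s = sum(d for v, d in deg.items() if v in s)
    vol_rest = sum(d for v, d in deg.items() if v not in s)
    # A hyperedge is cut if it has at least one vertex inside S and one outside.
    cut = sum(w for edge, w in zip(hyperedges, weights) if edge & s and edge - s)
    denom = min(vol_s, vol_rest)
    if denom == 0:
        raise ValueError("conductance undefined: one side has zero volume")
    return cut / denom


def hypergraph_conductance(hyperedges: Sequence[FrozenSet[Hashable]],
                           weights: Sequence[float]) -> float:
    """phi_G = min of phi(S) over nonempty proper subsets; exponential in n."""
    vertices = list(set().union(*hyperedges))
    best = inf
    for k in range(1, len(vertices)):
        for subset in combinations(vertices, k):
            try:
                best = min(best, set_conductance(hyperedges, weights, subset))
            except ValueError:
                continue
    return best


if __name__ == "__main__":
    # Two size-3 hyperedges joined by a single size-2 hyperedge.
    edges = [frozenset({0, 1, 2}), frozenset({2, 3}), frozenset({3, 4, 5})]
    w = [1.0, 1.0, 1.0]
    print(set_conductance(edges, w, {0, 1, 2}))  # 0.25
    print(hypergraph_conductance(edges, w))      # 0.25
```

In the example, cutting the bridging hyperedge $\{2, 3\}$ splits the total volume of 8 evenly, which gives the minimum conductance $1/4$. A distributed algorithm like the one described would not enumerate subsets. It would instead run a spectral method on a sparsifier whose cut structure approximates that of $G$ within a $(1 \pm \varepsilon)$ factor.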