Hypergraphs can capture higher-order relations between subsets of objects instead of only pairwise relations as in graphs. Hypergraph clustering is an important task in information retrieval and machine learning. We study the problem of distributed hypergraph clustering in the message passing communication model using small communication cost. We propose an algorithm framework for distributed hypergraph clustering based on spectral hypergraph sparsification. For an n-vertex hypergraph G with hyperedges of maximum size r distributed at s sites arbitrarily and a parameter ε∈ (0,1), our algorithm can produce a vertex set with conductance O(√1+ε/1-ε √φG), where φG is the conductance of G, using communication cost ~O(nr2s/εO(1)) (~O hides a polylogarithmic factor). The theoretical results are complemented with extensive experiments to demonstrate the efficiency and effectiveness of the proposed algorithm under different real-world datasets. Our source code is publicly available at github.com/chunjiangzhu/dhgc.
超图能够捕捉对象子集之间的高阶关系,而不像图中那样仅仅是成对关系。超图聚类是信息检索和机器学习中的一项重要任务。我们在消息传递通信模型中以较小的通信成本研究分布式超图聚类问题。我们提出了一个基于谱超图稀疏化的分布式超图聚类算法框架。对于一个在\(s\)个站点任意分布、最大超边大小为\(r\)的\(n\)个顶点的超图\(G\)以及一个参数\(\varepsilon\in(0,1)\),我们的算法能够生成一个具有电导率\(O(\sqrt{\frac{1 + \varepsilon}{1 - \varepsilon}}\sqrt{\varphi_G})\)的顶点集,其中\(\varphi_G\)是\(G\)的电导率,使用的通信成本约为\(O(nr^{2}s / \varepsilon^{O(1)})\)(\(\sim O\)隐藏了一个多项对数因子)。理论结果通过大量实验得到补充,以证明所提出的算法在不同的真实数据集下的效率和有效性。我们的源代码可在github.com/chunjiangzhu/dhgc公开获取。