
A Domain-Oblivious Approach for Learning Concise Representations of Filtered Topological Spaces for Clustering

Basic information

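DOI:
10.1109/tvcg.2021.3114872
Publication date:
2021-05
Impact factor:
5.2
Corresponding authors:
Yuzhen Qin; Brittany Terese Fasy; C. Wenk; B. Summa
CAS journal ranking:
Computer Science, Tier 1
Document type:
--
Authors: Yuzhen Qin; Brittany Terese Fasy; C. Wenk; B. Summa
Research area: --
MeSH terms: --
Keywords: --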

Abstract

Persistence diagrams have been widely used to quantify the underlying features of filtered topological spaces in data visualization. In many applications, computing distances between diagrams is essential; however, computing these distances has been challenging due to the computational cost. In this paper, we propose a persistence diagram hashing framework that learns a binary code representation of persistence diagrams, which allows for fast computation of distances. This framework is built upon a generative adversarial network (GAN) with a diagram distance loss function to steer the learning process. Instead of using standard representations, we hash diagrams into binary codes, which have natural advantages in large-scale tasks. The training of this model is domain-oblivious in that it can be computed purely from synthetic, randomly created diagrams. As a consequence, our proposed method is directly applicable to various datasets without the need for retraining the model. These binary codes, when compared using fast Hamming distance, better maintain topological similarity properties between datasets than other vectorized representations. To evaluate this method, we apply our framework to the problem of diagram clustering and we compare the quality and performance of our approach to the state-of-the-art. In addition, we show the scalability of our approach on a dataset with 10k persistence diagrams, which is not possible with current techniques. Moreover, our experimental results demonstrate that our method is significantly faster with the potential of less memory usage, while retaining comparable or better quality comparisons.
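The abstract's claim that binary codes allow fast distance computation can be illustrated with a minimal sketch. It does not reproduce the paper's learned GAN-based hash: the 8-bit codes below are made up for illustration, and the helper names (`pack_bits`, `hamming`, `pairwise_hamming`) are hypothetical. The sketch only shows that once diagrams are represented as binary codes, comparing two of them reduces to an XOR followed by a bit count.

```python
from typing import List, Sequence


def pack_bits(bits: Sequence[int]) -> int:
    """Pack a sequence of 0/1 values into one Python int (first bit is most significant)."""
    code = 0
    for b in bits:
        if b not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        code = (code << 1) | b
    return code


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two packed codes differ."""
    return bin(a ^ b).count("1")


def pairwise_hamming(codes: Sequence[int]) -> List[List[int]]:
    """Build the symmetric matrix of Hamming distances between all codes."""
    n = len(codes)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = hamming(codes[i], codes[j])
            dist[i][j] = d
            dist[j][i] = d
    return dist


# Hypothetical 8-bit codes standing in for hashed persistence diagrams.
raw_codes = [
    [1, 0, 1, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 1, 0, 1],
]
codes = [pack_bits(bits) for bits in raw_codes]
D = pairwise_hamming(codes)
```

A distance matrix like `D` could then be passed to any clustering routine that accepts precomputed distances; which clustering method the authors use is not stated in the abstract.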
References (90)
Cited by (3)
