
GraphSMOTE: Imbalanced Node Classification on Graphs with Graph Neural Networks

Basic information

DOI: 10.1145/3437963.3441720
Publication date: 2021-03
Published in: Proceedings of the 14th ACM International Conference on Web Search and Data Mining
Impact factor: --
Corresponding authors: Tianxiang Zhao; Xiang Zhang; Suhang Wang
CAS journal ranking: Other
Document type: --
Authors: Tianxiang Zhao; Xiang Zhang; Suhang Wang
Research area: --
MeSH terms: --
Keywords: --

Abstract

Node classification is an important research topic in graph learning. Graph neural networks (GNNs) have achieved state-of-the-art performance on node classification. However, existing GNNs address the setting in which node samples for different classes are balanced, whereas in many real-world scenarios some classes have far fewer instances than others. Directly training a GNN classifier in this case would under-represent samples from those minority classes and result in sub-optimal performance. It is therefore important to develop GNNs for imbalanced node classification, yet work on this problem is rather limited. Hence, we seek to extend previous imbalanced learning techniques for i.i.d. data to the imbalanced node classification task to facilitate GNN classifiers. In particular, we choose to adopt synthetic minority over-sampling algorithms, as they are found to be the most effective and stable. This task is non-trivial, as previous synthetic minority over-sampling algorithms fail to provide relation information for newly synthesized samples, which is vital for learning on graphs. Moreover, node attributes are high-dimensional, and directly over-sampling in the original input domain could generate out-of-domain samples, which may impair the accuracy of the classifier. We propose a novel framework, GraphSMOTE, in which an embedding space is constructed to encode the similarity among the nodes. New samples are synthesized in this space to ensure genuineness. In addition, an edge generator is trained simultaneously to model the relation information and provide it for those new samples. This framework is general and can be easily extended into different variations. The proposed framework is evaluated on three different datasets, and it outperforms all baselines by a large margin.
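
The two steps the abstract describes, synthesizing minority samples in an embedding space and predicting edges for them, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the node embeddings are already available (the abstract does not say how they are learned), uses the classic SMOTE rule of interpolating toward the nearest same-class neighbour, and stands in for the edge generator with a thresholded sigmoid of a weighted inner product. The function names, the `weight` matrix, and the `threshold` value are all hypothetical, and the sketch needs at least two nodes in the minority class.

```python
import numpy as np

def smote_in_embedding_space(emb, labels, minority_class, rng):
    """Create one synthetic embedding per minority node by interpolating
    toward its nearest same-class neighbour (the classic SMOTE step)."""
    idx = np.flatnonzero(labels == minority_class)
    minority = emb[idx]
    # Pairwise Euclidean distances among the minority embeddings.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a node is not its own neighbour
    nearest = dist.argmin(axis=1)
    # One random interpolation weight in [0, 1) per synthetic sample.
    delta = rng.random((len(idx), 1))
    return minority + delta * (minority[nearest] - minority)

def predict_edges(new_emb, emb, weight, threshold=0.5):
    """Hypothetical edge generator: sigmoid of a weighted inner product,
    thresholded into a binary block of shape (new nodes, existing nodes)."""
    scores = 1.0 / (1.0 + np.exp(-(new_emb @ weight @ emb.T)))
    return (scores > threshold).astype(np.int64)

rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 4))            # assumed pre-computed embeddings
labels = np.array([0] * 7 + [1] * 3)      # class 1 is the minority class
new_emb = smote_in_embedding_space(emb, labels, minority_class=1, rng=rng)
weight = np.eye(4)                        # placeholder; would be learned
new_edges = predict_edges(new_emb, emb, weight)
```

Here `new_emb` holds one synthetic embedding per minority node and `new_edges` connects each synthetic node to the existing nodes. In a trained system the `weight` matrix would be learned jointly with the embeddings rather than fixed to the identity.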
