Graph neural networks (GNNs) have shown great power in modeling graph-structured data. However, like other machine learning models, GNNs may make predictions biased with respect to protected sensitive attributes, e.g., skin color and gender. This is because machine learning algorithms, including GNNs, are trained to reflect the distribution of the training data, which often contains historical bias toward sensitive attributes. Moreover, the discrimination in GNNs can be magnified by graph structures and the message-passing mechanism. As a result, the application of GNNs in sensitive domains such as crime rate prediction would be largely limited. Though fair classification has been extensively studied on i.i.d. data, methods that address discrimination on non-i.i.d. data are rather limited. Furthermore, the practical scenario of sparse annotations of sensitive attributes is rarely considered in existing work. Therefore, we study the novel and important problem of learning fair GNNs with limited sensitive attribute information. We propose FairGNN, which eliminates the bias of GNNs while maintaining high node classification accuracy by leveraging graph structures and limited sensitive information. Our theoretical analysis shows that, under mild conditions, FairGNN can ensure the fairness of GNNs given a limited number of nodes with known sensitive attributes. Extensive experiments on real-world datasets also demonstrate the effectiveness of FairGNN in debiasing while keeping high accuracy.
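To make the setting concrete, the following is a minimal PyTorch sketch of one common way to pursue this goal: training a GNN node classifier for accuracy while an adversary, fitted only on the few nodes whose sensitive attribute is observed, is prevented from recovering that attribute from the classifier's outputs. This is an illustrative sketch of adversarial debiasing under limited sensitive labels, not the authors' exact FairGNN architecture; all names here (GCNLayer, FairNodeClassifier, train_step, alpha, s_known) are assumptions introduced for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph-convolution step: linear transform of normalized neighbor aggregation."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj_norm):
        # adj_norm: symmetrically normalized adjacency matrix (dense, for brevity)
        return self.lin(adj_norm @ x)

class FairNodeClassifier(nn.Module):
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.gnn1 = GCNLayer(in_dim, hid_dim)
        self.gnn2 = GCNLayer(hid_dim, n_classes)
        # Adversary tries to predict the (binary) sensitive attribute from the
        # classifier's logits; the classifier is trained to fool it.
        self.adv = nn.Linear(n_classes, 1)

    def forward(self, x, adj_norm):
        h = F.relu(self.gnn1(x, adj_norm))
        return self.gnn2(h, adj_norm)

def train_step(model, opt_cls, opt_adv, x, adj_norm, y, train_mask, s, s_known, alpha=1.0):
    """One alternating update; s is a float 0/1 tensor, s_known a boolean mask
    marking the (few) nodes whose sensitive attribute is observed."""
    logits = model(x, adj_norm)
    # 1) Update the adversary to predict s from the (detached) classifier outputs.
    s_logit = model.adv(logits.detach()).squeeze(-1)
    loss_adv = F.binary_cross_entropy_with_logits(s_logit[s_known], s[s_known])
    opt_adv.zero_grad(); loss_adv.backward(); opt_adv.step()
    # 2) Update the classifier for accuracy while *maximizing* the adversary's
    # loss on the labeled-sensitive nodes (a simple stand-in for gradient reversal).
    s_logit = model.adv(logits).squeeze(-1)
    loss_cls = F.cross_entropy(logits[train_mask], y[train_mask])
    loss_fair = -F.binary_cross_entropy_with_logits(s_logit[s_known], s[s_known])
    loss = loss_cls + alpha * loss_fair
    opt_cls.zero_grad(); loss.backward(); opt_cls.step()
    return loss_cls.item(), loss_adv.item()

In such a setup the classifier's optimizer would exclude the adversary's parameters, e.g. opt_adv = torch.optim.Adam(model.adv.parameters()) and opt_cls over the remaining parameters, so that each alternating step updates only its own player; alpha trades off accuracy against fairness.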