Negative sampling (NS) loss plays an important role in learning knowledge graph embedding (KGE) to handle the huge number of entities. However, KGE performance degrades unless the hyperparameters of the NS loss, such as the margin term and the number of negative samples, are appropriately selected. Currently, empirical hyperparameter tuning addresses this problem at the cost of computational time. To address this problem, we theoretically analyzed the NS loss to assist hyperparameter tuning and to better understand how the NS loss is used in KGE learning. Our theoretical analysis showed that scoring methods with restricted value ranges, such as TransE and RotatE, require adjustments of the margin term or the number of negative samples that differ from those for methods without restricted value ranges, such as RESCAL, ComplEx, and DistMult. We also propose subsampling methods specialized for the NS loss in KGE, derived from the same theoretical perspective. Our empirical analysis on the FB15k-237, WN18RR, and YAGO3-10 datasets showed that the results of actually trained models agree with our theoretical findings.
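For reference, a commonly used form of the NS loss in KGE (as in RotatE-style training) is sketched below; the notation here is illustrative rather than taken from the abstract, with $s_\theta(x, y)$ the scoring method, $\gamma$ the margin term, $\nu$ the number of negative samples, $p_n$ the noise distribution, and $\sigma$ the sigmoid function:

```latex
\ell_{\mathrm{NS}}(\theta)
  = -\frac{1}{|D|} \sum_{(x, y) \in D}
    \Bigl[
      \log \sigma\bigl(s_\theta(x, y) + \gamma\bigr)
      + \frac{1}{\nu} \sum_{i=1}^{\nu}
          \mathbb{E}_{y_i \sim p_n}\,
          \log \sigma\bigl(-s_\theta(x, y_i) - \gamma\bigr)
    \Bigr]
```

In this form, $\gamma$ and $\nu$ are exactly the hyperparameters whose interaction with the value range of $s_\theta$ is analyzed in the paper.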