Despite recent studies on understanding deep neural networks (DNNs), numerous questions remain about how DNNs generate their predictions. In particular, given similar predictions on different input samples, are the underlying mechanisms generating those predictions the same? In this work, we propose NeuCEPT, a method to locally discover critical neurons that play a major role in the model's predictions and to identify the model's mechanisms in generating those predictions. We first formulate the critical-neuron identification problem as maximizing a sequence of mutual-information objectives and provide a theoretical framework to efficiently solve for critical neurons while keeping the precision under control. NeuCEPT then heuristically learns the model's different mechanisms in an unsupervised manner. Our experimental results show that neurons identified by NeuCEPT not only have a strong influence on the model's predictions but also hold meaningful information about the model's mechanisms.
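To make the mutual-information objective concrete, the sketch below scores neurons by the estimated mutual information between each neuron's (discretized) activation and the model's prediction, and ranks them. This is an illustrative plug-in estimator on synthetic data, not NeuCEPT's actual algorithm; all names and the toy setup are assumptions for exposition.

```python
import numpy as np

def mutual_information(x, y):
    # Discrete plug-in estimator of I(X; Y) in nats from paired samples.
    n = len(x)
    joint, px, py = {}, {}, {}
    for xi, yi in zip(x, y):
        joint[(xi, yi)] = joint.get((xi, yi), 0) + 1
        px[xi] = px.get(xi, 0) + 1
        py[yi] = py.get(yi, 0) + 1
    mi = 0.0
    for (xi, yi), c in joint.items():
        p_xy = c / n
        # p(x,y) * log( p(x,y) / (p(x) p(y)) )
        mi += p_xy * np.log(p_xy * n * n / (px[xi] * py[yi]))
    return mi

# Toy example: 100 samples, 5 binary "neuron activations";
# the model's prediction is driven entirely by neuron 0.
rng = np.random.default_rng(0)
acts = rng.integers(0, 2, size=(100, 5))
preds = acts[:, 0]

# Rank neurons by mutual information with the prediction:
scores = [mutual_information(acts[:, j], preds) for j in range(acts.shape[1])]
print(int(np.argmax(scores)))  # neuron 0 attains the maximal score
```

Since neuron 0 determines the prediction exactly, its score equals the prediction entropy H(preds), the maximum attainable, while independent neurons score near zero; a critical-neuron ranking would keep the top scorers.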