The security of modern Deep Neural Networks (DNNs) is under severe scrutiny as the deployment of these models becomes widespread in many intelligence-based applications. Most recently, DNNs have been attacked through Trojans, which infect the model during the training phase and are activated only by specific input patterns (i.e., triggers) during inference. In this work, for the first time, we propose a Targeted Bit Trojan (TBT) method, which can insert a targeted neural Trojan into a DNN through a bit-flip attack. Our algorithm efficiently generates a trigger specifically designed to locate certain vulnerable bits of the DNN weights stored in main memory (i.e., DRAM). The objective is that once the attacker flips these vulnerable bits, the network still operates with normal inference accuracy on benign inputs. However, when the attacker activates the trigger by embedding it in any input, the network is forced to classify all inputs to a certain target class. We demonstrate that flipping only a few vulnerable bits identified by our method, using available bit-flip techniques (e.g., row-hammer), can transform a fully functional DNN model into a Trojan-infected model. We perform extensive experiments on the CIFAR-10, SVHN, and ImageNet datasets with both VGG-16 and ResNet-18 architectures. Our proposed TBT can classify 92% of test images to a target class with as few as 84 bit-flips out of 88 million weight bits on ResNet-18 for the CIFAR-10 dataset.
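To make the notion of a weight bit-flip concrete, the following is a minimal software sketch of what a single flipped bit does to one stored weight. It assumes weights are held as 8-bit two's-complement integers, which the abstract does not state, and the helper name `flip_bit` and the sample values are hypothetical. It only simulates the effect in a NumPy array; it performs no row-hammer attack and is not the TBT search for vulnerable bits.

```python
import numpy as np


def flip_bit(weights: np.ndarray, index: int, bit: int) -> np.ndarray:
    """Return a copy of 1-D int8 `weights` with one bit of one element flipped.

    Hypothetical helper for illustration; int8 storage is an assumption.
    """
    flipped = weights.copy()
    raw = flipped.view(np.uint8)      # reinterpret the same bytes as unsigned
    raw[index] ^= np.uint8(1 << bit)  # XOR toggles exactly the chosen bit
    return flipped


# Made-up weight values, not taken from any real model.
weights = np.array([23, -5, 100, 7], dtype=np.int8)

# Flipping the most significant bit (bit 7) changes the sign in two's complement.
attacked = flip_bit(weights, index=0, bit=7)
print(weights[0], "->", attacked[0])  # 23 -> -105

# Fraction of weight bits touched in the reported ResNet-18 / CIFAR-10 result.
fraction = 84 / 88_000_000
print(f"{fraction:.2e}")  # 9.55e-07
```

The most significant bit is used here because it produces the largest change in a two's-complement value, and the final ratio shows that the reported 84 flips amount to roughly one in a million of the 88 million weight bits.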