Recent advances in Deep Neural Networks (DNNs) have enabled their widespread deployment in security-sensitive domains. The cost of resource-intensive training and the value of domain-specific training data have made these models the primary intellectual property (IP) of their owners. One major threat to DNN privacy is the model extraction attack, in which an adversary attempts to steal sensitive information from a DNN model. In this work, we propose an advanced model extraction framework, DeepSteal, that for the first time steals DNN weights remotely with the aid of a memory side-channel attack. DeepSteal comprises two key stages. First, we develop a new weight-bit extraction method, called HammerLeak, which adopts rowhammer-based fault injection as the information leakage vector. HammerLeak leverages several novel system-level techniques tailored to DNN applications to enable fast and efficient weight stealing. Second, we propose a novel substitute model training algorithm with a Mean Clustering weight penalty, which effectively leverages the partially leaked bit information to generate a substitute prototype of the target victim model. We evaluate the proposed framework on three popular image datasets (CIFAR-10, CIFAR-100, and GTSRB) and four DNN architectures (ResNet-18/34, Wide-ResNet, and VGG-11). The extracted substitute model achieves more than 90% test accuracy on deep residual networks for the CIFAR-10 dataset. Moreover, the extracted substitute model can generate effective adversarial input samples that fool the victim model: notably, it achieves performance (i.e., ~1-2% victim test accuracy under attack) similar to white-box adversarial attacks such as PGD and TRADES.
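The Mean Clustering weight penalty named above can be illustrated with a minimal sketch. Note this is an assumption-laden illustration, not the paper's exact formulation: the function name, the interval-midpoint target, and the `leaked_msb` encoding are all hypothetical stand-ins for how partially leaked sign/MSB information might constrain substitute weights during training.

```python
import numpy as np

def mean_clustering_penalty(weights, leaked_msb, w_max=1.0):
    """Hypothetical sketch of a mean-clustering-style penalty.

    Each leaked most-significant bit restricts a weight to a half of its
    value range; the penalty pulls the substitute weight toward the mean
    (midpoint) of that admissible interval.

    leaked_msb: +1 if the leaked bit indicates a non-negative weight,
                -1 if it indicates a negative weight,
                 0 if the bit was not recovered (no constraint applied).
    """
    # Midpoint of the admissible interval implied by the leaked bit.
    target = np.where(leaked_msb > 0, w_max / 2.0,
             np.where(leaked_msb < 0, -w_max / 2.0, weights))
    mask = leaked_msb != 0
    if not mask.any():
        return 0.0
    # Mean squared distance to the interval midpoint over constrained weights.
    return float(np.mean((weights[mask] - target[mask]) ** 2))
```

In a full training loop this term would be added to the standard task loss, so that the substitute model's weights stay consistent with the bits recovered by HammerLeak while fitting the victim's input-output behavior.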