WBP: Training-time Backdoor Attacks through Hardware-based Weight Bit Poisoning
Kunbei Cai*, Zhenkai Zhang, Qian Lou, Fan Yao*
Abstract
"Training from pre-trained models (PTM) is a popular approach for fast machine learning (ML) service deployment. Recent studies on hardware security have revealed that ML systems could be compromised through flipping bits in model parameters (e.g., weights) with memory faults. In this paper, we introduce (i.e., weight bit poisoning), a novel task-agnostic backdoor attack that manifests during the victim’s training time (i.e., fine-tuning from a public and clean PTM) by inducing hardware-based weight bit flips. utilizes a novel distance-aware algorithm that identifies bit flips to maximize the distance between the distribution of poisoned output representations (ORs) and clean ORs based on the public PTM. This unique set of bit flips can be applied to backdoor any victim model during the fine-tuning of the same public PTM, regardless of the downstream tasks. We evaluate on state-of-the-art CNNs and Vision Transformer models with representative downstream tasks. The results show that can compromise a wide range of PTMs and downstream tasks with an average 99.3% attack success rate by flipping as few as 11 model weight bits. can be effective in various training configurations with respect to learning rate, optimizer, and fine-tuning duration. We investigate limitations of existing backdoor protection techniques against and discuss potential future mitigation. 1 1 Our code can be accessed at: https://github.com/casrl/WBP"
Related Material
[pdf]
[DOI]