Towards Backdoor Stealthiness in Model Parameter Space
Recent research on backdoor stealthiness focuses mainly on indistinguishable triggers in input space and inseparable backdoor representations in feature space, aiming to circumvent backdoor defenses that examine these respective spaces. However, existing backdoor attacks are typically designed to resist a specific type of backdoor defense without considering the diverse range of defense mechanisms. Based on this observation, we pose a natural question: Are current backdoor attacks truly a real-world threat when facing diverse practical defenses? To answer this question, we examine 12 common backdoor attacks that focus on input-space or feature-space stealthiness and 17 diverse representative defenses. Surprisingly, we reveal a critical blind spot: Backdoor attacks designed to be stealthy in input and feature spaces can be mitigated by examining backdoored models in parameter space. To investigate the underlying causes behind this common vulnerability, we study the characteristics of backdoor attacks in the parameter space. Notably, we find that input- and feature-space attacks introduce prominent backdoor-related neurons in parameter space, which are not thoroughly considered by current backdoor attacks. Taking comprehensive stealthiness into account, we propose a novel supply-chain attack called Grond. Grond limits the parameter changes by a simple yet effective module, Adversarial Backdoor Injection (ABI), which adaptively increases the parameter-space stealthiness during the backdoor injection. Extensive experiments demonstrate that Grond outperforms all 12 backdoor attacks against state-of-the-art (including adaptive) defenses on CIFAR-10, GTSRB, and a subset of ImageNet. In addition, we show that ABI consistently improves the effectiveness of common backdoor attacks.
💡 Research Summary
This paper investigates a critical blind spot in the current landscape of backdoor attacks on deep neural networks: while many attacks achieve stealthiness in the input space (imperceptible triggers) and the feature space (inseparable representations), they remain vulnerable when defenses examine the model’s parameter space. The authors systematically evaluate twelve representative backdoor attacks—ranging from classic BadNets and Blend to more recent input‑stealthy methods such as WaNet, IAD, and adaptive attacks like Adap‑Blend—and test them against seventeen state‑of‑the‑art defenses. These defenses cover detection (both model‑level and input‑level) and mitigation (pruning‑based and fine‑tuning‑based), including recent parameter‑space defenses such as RNP, CLP, ANP, FT‑SAM, and proactive training strategies.
The experimental results reveal that almost all evaluated attacks, despite being designed to evade input‑space or feature‑space defenses, are easily mitigated by parameter‑space defenses. The authors attribute this vulnerability to the emergence of “prominent backdoor‑related neurons” in the weight matrices: during training, a small subset of neurons acquires unusually large weight magnitudes to encode the malicious functionality. By measuring the Lipschitz continuity of neuron activations, the authors identify these sensitive neurons; pruning them effectively removes the backdoor while preserving benign performance.
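The idea of flagging prominent neurons by their Lipschitz behavior can be sketched concretely. A common weight-based proxy (used by pruning defenses in the CLP family) upper-bounds each output channel's Lipschitz constant by the spectral norm of its flattened convolution kernel, then flags channels whose score is an outlier. The function names, the `(out, in, k, k)` weight layout, and the `mean + u·std` threshold are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def channel_lipschitz(weights):
    """Upper-bound each output channel's Lipschitz constant by the
    spectral norm (largest singular value) of its flattened kernel.
    weights: array of shape (out_channels, in_channels, k, k)."""
    scores = np.empty(weights.shape[0])
    for c in range(weights.shape[0]):
        mat = weights[c].reshape(weights.shape[1], -1)
        scores[c] = np.linalg.svd(mat, compute_uv=False)[0]
    return scores

def flag_prominent(scores, u=3.0):
    """Flag channels whose score exceeds mean + u * std; these are the
    'prominent' candidates a parameter-space defense would prune."""
    return np.where(scores > scores.mean() + u * scores.std())[0]
```

A backdoored model typically shows a few channels with scores far above the bulk of the distribution, so a simple statistical threshold suffices to isolate them.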
Motivated by this analysis, the paper introduces a novel supply‑chain attack named Grond, which seeks comprehensive stealthiness across three dimensions: input, feature, and parameter spaces. Grond employs Universal PGD (UPGD) to generate adversarial perturbations that serve as invisible triggers, ensuring input‑space stealth. The core of Grond is the Adversarial Backdoor Injection (ABI) module. ABI iteratively trains the model on perturbed samples while simultaneously applying a pruning‑like regularization that limits the magnitude of weight changes associated with backdoor‑related neurons. This adaptive constraint forces the backdoor to be distributed across many parameters rather than concentrated in a few conspicuous weights, thereby reducing detectability in the parameter space.
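The constraint ABI imposes can be illustrated as a post-epoch correction step: recompute per-channel Lipschitz scores and shrink any channel that has become conspicuous back to the statistical threshold, so the backdoor signal stays spread across many small weights. This is a self-contained sketch of the general idea only; the function names, the rescaling rule, and the `mean + u·std` threshold are assumptions, not the paper's exact ABI algorithm:

```python
import numpy as np

def spectral_norm(kernel):
    """Largest singular value of a flattened conv kernel, an upper
    bound on that channel's Lipschitz constant."""
    mat = kernel.reshape(kernel.shape[0], -1)
    return np.linalg.svd(mat, compute_uv=False)[0]

def abi_step(weights, u=3.0):
    """Illustrative ABI-style constraint: after a backdoor-injection
    epoch, rescale any channel whose Lipschitz score exceeds
    mean + u * std back down to the threshold, preserving its direction.
    weights: array of shape (out_channels, in_channels, k, k)."""
    scores = np.array([spectral_norm(k) for k in weights])
    thresh = scores.mean() + u * scores.std()
    out = weights.copy()
    for c, s in enumerate(scores):
        if s > thresh:
            out[c] *= thresh / s  # shrink only the conspicuous channel
    return out
```

Applied after every injection epoch, this keeps the weight distribution of the backdoored model statistically close to a clean one, which is exactly what Lipschitz-based pruning defenses test for.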
Extensive experiments on CIFAR‑10, GTSRB, and a subset of ImageNet (ImageNet‑200) demonstrate that Grond consistently outperforms all twelve baseline attacks against a wide spectrum of defenses. Specifically, Grond maintains high attack success rates (>90%) even when faced with five pruning‑based defenses, five fine‑tuning‑based defenses, five model‑detection defenses, two input‑detection defenses, and a proactive defense. Moreover, when the ABI module is attached to existing attacks, it significantly reduces the parameter‑space footprint (≈40% lower weight deviation) and improves resistance to pruning and detection, confirming the generality of the approach.
The contributions of the paper are fourfold: (1) a systematic audit showing that current backdoor attacks lack parameter‑space stealthiness; (2) an analysis linking backdoor effectiveness to a small set of prominent neurons in the weight space; (3) the design of Grond and its ABI module to achieve multi‑dimensional stealth; and (4) comprehensive empirical validation that Grond sets a new benchmark for robust, stealthy backdoor attacks.
In conclusion, the work highlights that future backdoor defense research must consider the parameter space alongside input and feature spaces. Defenses that jointly monitor weight distributions, activation sensitivities, and input anomalies are likely required to counter sophisticated supply‑chain attacks like Grond. The paper also opens avenues for developing new metrics to quantify parameter‑space stealth and for designing hybrid defenses that integrate pruning, fine‑tuning, and trigger‑inversion techniques.