Exploring Cross-Stage Adversarial Transferability in Class-Incremental Continual Learning

Notice: This research summary and analysis were automatically generated using AI technology. For accuracy, please refer to the original arXiv source.

Class-incremental continual learning addresses catastrophic forgetting by enabling classification models to preserve knowledge of previously learned classes while acquiring new ones. However, the vulnerability of such models to adversarial attacks during this process has not been sufficiently investigated. In this paper, we present the first exploration of vulnerability to stage-transferred attacks, i.e., attacks in which an adversarial example generated using the model at an earlier stage is used to attack the model at a later stage. Our findings reveal that continual learning methods are highly susceptible to these attacks, raising a serious security issue. We explain this phenomenon through model similarity between stages and gradual robustness degradation. Additionally, we find that existing adversarial training-based defense methods are not sufficiently effective against stage-transferred attacks. Code is available at https://github.com/mcml-official/CSAT.


💡 Research Summary

Class‑incremental continual learning (Class‑IL) seeks to mitigate catastrophic forgetting by progressively adding new classes while preserving knowledge of previously learned ones. Although extensive work has examined adversarial robustness for static models, the vulnerability of Class‑IL systems to attacks that exploit their multi‑stage training dynamics has remained largely unexplored. This paper introduces the concept of “stage‑transferred attacks,” where adversarial examples are crafted using a model from an earlier learning stage (fθₐ) and then deployed against a model from a later stage (fθ_T, T > a).
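To make the threat model concrete, here is a minimal sketch of a stage-transferred FGSM attack on a toy linear softmax classifier. Everything here (the weight matrices, input dimensions, and perturbation budget) is a hypothetical stand-in, not the paper's networks; the point is only that the perturbation is crafted on an earlier-stage model and then evaluated on a later-stage model whose parameters have drifted.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fgsm_example(W, x, y, eps):
    """Craft an FGSM perturbation of input x against a linear softmax model W.
    For cross-entropy loss, d(loss)/dx = W.T @ (softmax(W @ x) - onehot(y))."""
    p = softmax(W @ x)
    p[y] -= 1.0                       # p - onehot(y)
    grad_x = W.T @ p
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
W_early = rng.normal(size=(3, 8))                   # hypothetical stage-a model f_theta_a
W_late = W_early + 0.1 * rng.normal(size=(3, 8))    # later-stage model f_theta_T, similar params
x, y = rng.normal(size=8), 1

# Stage-transferred attack: craft on W_early, evaluate on W_late
x_adv = fgsm_example(W_early, x, y, eps=0.3)
print("late-stage prediction on clean input:", np.argmax(W_late @ x))
print("late-stage prediction on adv input:  ", np.argmax(W_late @ x_adv))
```

Because the two weight matrices are close, the gradient direction computed on the early model often remains adversarial for the later one, which is the transfer effect the paper measures at scale.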

The authors evaluate this threat on two widely used benchmarks: Split‑MNIST (5 stages, 2 classes per stage) and Split‑CIFAR‑100 (10 stages, 10 classes per stage). Four representative Class‑IL algorithms are examined: iCaRL, GDumb, ER‑ACE, and ER‑AML. For each stage, three attack methods are employed—FGSM, PGD, and AutoAttack (AA)—with perturbation budgets ε = 0.3 for MNIST and ε = 8/255 for CIFAR‑100.
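The experimental grid above can be captured in a small configuration sketch. The numeric values are taken directly from the summary; the dictionary layout itself is an assumption, not the authors' code.

```python
# Benchmark and attack settings as described in the summary above
BENCHMARKS = {
    "Split-MNIST": {"stages": 5, "classes_per_stage": 2, "eps": 0.3},
    "Split-CIFAR-100": {"stages": 10, "classes_per_stage": 10, "eps": 8 / 255},
}
METHODS = ["iCaRL", "GDumb", "ER-ACE", "ER-AML"]
ATTACKS = ["FGSM", "PGD", "AutoAttack"]

for name, cfg in BENCHMARKS.items():
    total = cfg["stages"] * cfg["classes_per_stage"]
    print(f"{name}: {total} classes over {cfg['stages']} stages, eps = {cfg['eps']}")
```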

Results (Tables I and II) reveal that adversarial examples generated at early stages retain high attack success rates (ASR) when evaluated on the final model. For instance, an iCaRL FGSM attack crafted at stage 1 achieves an ASR of 0.774 on the final stage-5 model, only modestly lower than the direct-attack ASR of 0.953. GDumb shows a similar pattern on CIFAR-100, where a stage-1 FGSM perturbation yields an ASR of 0.697 on the stage-10 model, close to the direct-attack ASR of 0.836. Across all methods, ASR increases as the attacker stage approaches the target stage, indicating clear distance-dependent transferability.
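One common definition of ASR, assumed here as a plausible reading of the paper's metric rather than its confirmed implementation, is the fraction of originally correctly classified examples that the perturbation flips to a wrong class:

```python
import numpy as np

def attack_success_rate(clean_pred, adv_pred, labels):
    """ASR restricted to examples the model gets right on clean inputs:
    fraction of those that become misclassified under the adversarial input."""
    clean_pred = np.asarray(clean_pred)
    adv_pred = np.asarray(adv_pred)
    labels = np.asarray(labels)
    correct = clean_pred == labels
    if correct.sum() == 0:
        return 0.0
    return float((adv_pred[correct] != labels[correct]).mean())

# Toy check: 4 correctly classified examples, 3 of which the attack flips
print(attack_success_rate([0, 1, 2, 3, 1], [9, 9, 2, 9, 1], [0, 1, 2, 3, 0]))  # → 0.75
```

Conditioning on clean-correct examples keeps the metric from crediting the attack for errors the model would make anyway.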

To explain this phenomenon, the authors analyze two factors. First, they measure inter‑stage model similarity using cosine similarity of parameters and centered kernel alignment (CKA) of hidden representations. Both metrics increase as the attacker stage gets closer to the final stage, and a strong positive correlation is observed between similarity and the normalized ASR ratio (ASRₐ→T / ASR_T→T). This confirms that the shared decision‑boundary direction across stages facilitates transfer.
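Linear CKA, one of the two similarity metrics above, has a compact closed form. A minimal NumPy sketch follows; the feature matrices are synthetic stand-ins for the hidden representations measured in the paper.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices of shape
    (n_samples, n_features); 1.0 means identical up to rotation/scale."""
    X = X - X.mean(axis=0)                       # center each feature
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)

rng = np.random.default_rng(1)
H = rng.normal(size=(50, 16))                    # hypothetical stage-a hidden features
H_near = H + 0.05 * rng.normal(size=(50, 16))    # nearby stage: small representation drift
H_rand = rng.normal(size=(50, 16))               # unrelated representation

print(round(linear_cka(H, H), 3))                        # → 1.0
print(linear_cka(H, H_near) > linear_cka(H, H_rand))     # → True
```

In the paper's setting, higher CKA between an attacker-stage model and the target-stage model goes hand in hand with a higher normalized ASR ratio.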

Second, they investigate robustness degradation across stages. By computing the average local Lipschitz constant Lₜ and the maximal spectral norm λ_max of the Hessian of the loss w.r.t. the final‑layer weights, they find that both quantities grow monotonically with the stage index. This indicates that as new classes are added, the model’s decision boundary becomes steeper and more complex, making it increasingly susceptible to small perturbations. Notably, transfer is asymmetric: Table III shows that forward transfer (e.g., stage 4 → 5) yields higher ASR than backward transfer (stage 5 → 4), suggesting that later models are intrinsically less robust.
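The dominant Hessian eigenvalue λ_max referenced above can be estimated without a full eigendecomposition via power iteration. This is a generic sketch under the assumption that the (symmetric) Hessian is materialized as a matrix; the paper may instead use Hessian-vector products, and the test matrix below is purely illustrative.

```python
import numpy as np

def dominant_eigenvalue(H, iters=200, seed=0):
    """Estimate the dominant (largest-magnitude) eigenvalue of a symmetric
    matrix H, e.g. a loss Hessian w.r.t. final-layer weights, by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=H.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = H @ v
        v = w / np.linalg.norm(w)
    return float(v @ H @ v)   # Rayleigh quotient at convergence

# Symmetric test matrix with a known dominant eigenvalue of 5
A = np.diag([5.0, 2.0, 1.0])
print(dominant_eigenvalue(A))  # ≈ 5.0
```

A stage-wise increase in this quantity means the loss surface around the weights is getting sharper, consistent with the reported monotone robustness degradation.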

The paper also evaluates existing adversarial‑training‑based defenses (e.g., margin‑aware loss, boundary‑preserving regularizers, unified replay‑attack optimization) within the Class‑IL pipeline. Empirical results demonstrate that these defenses, while improving robustness against attacks crafted on the current model, fail to substantially reduce ASR for stage‑transferred attacks. The underlying reason is that adversarial training optimizes for the current parameter configuration, whereas the continual‑learning process continually reshapes the model, leaving previously crafted perturbations effective.
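For context on why these defenses fall short, standard adversarial training solves an inner maximization such as L-infinity PGD against the current weights. The minimal sketch below (on the same toy linear softmax model, hypothetical and unrelated to the paper's defense code) makes explicit that the crafted perturbation is tied to the parameters at crafting time; once continual learning updates those parameters, the defense's coverage no longer matches the attacker's checkpoint.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def pgd_perturb(W, x, y, eps, alpha=0.05, steps=10):
    """L-infinity PGD against a linear softmax model: the inner maximization
    of adversarial training, computed on the CURRENT weights W only."""
    x_adv = x.copy()
    for _ in range(steps):
        p = softmax(W @ x_adv)
        p[y] -= 1.0
        grad = W.T @ p                            # d(cross-entropy)/dx
        x_adv = x_adv + alpha * np.sign(grad)     # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project back into eps-ball
    return x_adv

rng = np.random.default_rng(2)
W = rng.normal(size=(3, 8))   # hypothetical current-stage weights
x, y = rng.normal(size=8), 0
x_adv = pgd_perturb(W, x, y, eps=0.3)
print("max perturbation:", np.max(np.abs(x_adv - x)))
```

Training on `x_adv` hardens the model against perturbations crafted on `W`, but an adversary holding an older checkpoint optimizes against different weights, which is exactly the gap the stage-transferred attack exploits.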

In summary, this work uncovers a previously unrecognized security risk: an adversary needs only access to an outdated checkpoint of a Class‑IL system to launch successful attacks against its latest version. The dual mechanisms of high inter‑stage similarity and progressive robustness decay jointly enable this cross‑stage adversarial transferability. The findings call for novel defense strategies that explicitly account for the evolving nature of continual‑learning models—such as multi‑stage adversarial regularization, memory‑augmented robust replay, or architectural designs that limit similarity drift. Addressing these challenges will be crucial for deploying continual‑learning systems in safety‑critical and open‑world applications.

