Redirection for Erasing Memory (REM): Towards a universal unlearning method for corrupted data
Machine unlearning is studied for a multitude of tasks, but specialization of unlearning methods to particular tasks has made their systematic comparison challenging. To address this issue, we propose a conceptual space to characterize diverse corrupted data unlearning tasks in vision classifiers. This space is described by two dimensions: the discovery rate (the fraction of the corrupted data that is known at unlearning time) and the statistical regularity of the corrupted data (from random exemplars to shared concepts). Previously proposed methods have targeted portions of this space and, we show, fail predictably outside those regions. We propose a novel method, Redirection for Erasing Memory (REM), whose key feature is that corrupted data are redirected to dedicated neurons introduced at unlearning time, which are then discarded or deactivated to suppress the influence of the corrupted data. REM performs strongly across the space of tasks, in contrast to prior SOTA methods, which fail outside the regions for which they were designed.
💡 Research Summary
The paper tackles the practical problem of removing the influence of corrupted or malicious training data from already‑trained neural networks, a task known as machine unlearning. Existing unlearning methods are typically designed for narrow scenarios (e.g., full discovery of the corrupt set, or high‑regularity poison attacks), and the field has therefore lacked a systematic way to compare their performance across diverse real‑world situations. To address this gap, the authors introduce a two‑dimensional taxonomy for corrupted‑data unlearning tasks. The first axis, discovery rate, measures the fraction of the corrupted data that is identified at unlearning time (ranging from 0 % to 100 %). The second axis, statistical regularity, captures how similar the corrupted examples are to each other, from low‑regularity random label flips to high‑regularity patterns such as a shared visual trigger in a poisoning attack. By mapping existing state‑of‑the‑art (SOTA) methods onto this 2‑D space, the authors demonstrate that each method succeeds only in a narrow “slice” of the space and fails catastrophically elsewhere.
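The two axes can be made concrete with a small sketch. The class and grid values below are illustrative (chosen to match the discovery rates and regularity regimes reported later in this summary), not taken from the paper's code.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class UnlearningTask:
    """A point in the 2-D corrupted-data unlearning space."""
    discovery_rate: float  # fraction of corrupted data known at unlearning time, in [0, 1]
    regularity: str        # "low" (random exemplars) ... "high" (shared concept/trigger)

# A dense grid over the taxonomy, mirroring the evaluation described below
# (hypothetical enumeration for illustration only).
discovery_rates = [0.1, 0.3, 0.5, 0.7, 1.0]
regularities = ["low", "medium", "high"]
task_grid = [UnlearningTask(d, r) for d, r in product(discovery_rates, regularities)]
print(len(task_grid))  # → 15 tasks
```

Each prior method, in this framing, covers only a subset of `task_grid`; the paper's claim is that REM covers all of it.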
The paper’s main technical contribution is REM (Redirection for Erasing Memory), a universal unlearning algorithm that works across the entire taxonomy. REM operates by adding a dedicated set of neurons (or a small auxiliary layer) at unlearning time and routing all identified corrupted samples through this new pathway. Because these neurons are isolated from the rest of the network, they can be deactivated or zeroed out once unlearning is complete, effectively erasing any trace of the corrupted data without retraining the whole model. The approach is inspired by Example‑Tied Dropout (ETD) but differs crucially: ETD separates “generalization” and “memorization” partitions during training, whereas REM creates a post‑hoc redirection module that can be dropped after unlearning, handling both discovered and undiscovered corrupted data regardless of regularity.
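The redirection idea can be sketched with a toy linear scorer, assuming (as a simplification, not the authors' implementation) that flagged corrupted samples receive an extra contribution from auxiliary "redirection" weights, which absorb the corrupted signal and can later be switched off:

```python
def forward(x, shared_w, aux_w, is_flagged, aux_active=True):
    """Score = shared pathway + (auxiliary pathway, only for flagged samples)."""
    score = sum(wi * xi for wi, xi in zip(shared_w, x))
    if is_flagged and aux_active:
        # The auxiliary neurons fire only for samples routed through them.
        score += sum(wi * xi for wi, xi in zip(aux_w, x))
    return score

shared_w = [0.5, -0.2]  # weights carrying clean behaviour (illustrative values)
aux_w = [2.0, 1.0]      # capacity that soaks up the corrupted signal

x = [1.0, 1.0]
with_aux = forward(x, shared_w, aux_w, is_flagged=True)                  # redirection on
after_unlearn = forward(x, shared_w, aux_w, is_flagged=True, aux_active=False)
clean = forward(x, shared_w, aux_w, is_flagged=False)

# Once the auxiliary neurons are deactivated, a flagged sample is scored
# identically to a clean one: the corrupted influence is gone.
print(with_aux, after_unlearn, clean)
```

This isolation is what lets REM discard the auxiliary pathway without touching, or retraining, the shared weights.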
The experimental protocol is thorough. The authors evaluate REM and a suite of baselines (Bad Teacher, SCRUB, Potion, Gradient Ascent, NPO, ETD, fine‑tuning, and full retraining) on CIFAR‑10 and SVHN using two architectures, ResNet‑9 and Vision Transformer (ViT), and three optimizers (SGD, Adam, AdamW). Corrupted data are generated in three regularity regimes: low (random mislabeling), medium (class swapping), and high (trigger‑based poisoning). For each regime, discovery rates of 10 %, 30 %, 50 %, 70 %, and 100 % are tested, yielding a dense grid over the taxonomy. Performance is measured by three metrics: (1) Unlearning Accuracy (error on the forget set after unlearning), (2) Utility Retention (overall test accuracy), and (3) Healing Score (fraction of corrupted examples correctly restored to their true label). Across all combinations, REM consistently achieves near‑perfect unlearning (error ≈ 0) while preserving test accuracy within 1 % of the original model. In contrast, prior methods excel only in limited regions: Potion shines in high‑regularity, high‑discovery scenarios; ETD works for low‑regularity corruption but collapses at high regularity; Gradient Ascent and NPO are effective only when the corrupted signal is concentrated in a few parameters; and full retraining, while universally correct, is computationally prohibitive.
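The three metrics are straightforward to sketch; the implementations below are plausible readings of the metric names in this summary, and the exact definitions in the paper may differ in detail:

```python
def unlearning_error(preds, corrupt_labels):
    """Error on the forget set w.r.t. the corrupted labels (higher = better forgetting)."""
    wrong = sum(p != y for p, y in zip(preds, corrupt_labels))
    return wrong / len(preds)

def utility_retention(test_preds, test_labels):
    """Overall test accuracy after unlearning."""
    correct = sum(p == y for p, y in zip(test_preds, test_labels))
    return correct / len(test_preds)

def healing_score(preds, true_labels):
    """Fraction of corrupted examples restored to their true (pre-corruption) label."""
    healed = sum(p == y for p, y in zip(preds, true_labels))
    return healed / len(preds)

# Toy example: four corrupted samples, with their true and injected labels.
true_y    = [0, 1, 2, 3]
corrupt_y = [1, 0, 2, 3]   # first two labels were swapped by the attacker
preds     = [0, 1, 2, 0]   # model predictions after unlearning

print(unlearning_error(preds, corrupt_y))  # → 0.75
print(healing_score(preds, true_y))        # → 0.75
```

Note that forgetting (high `unlearning_error`) does not by itself imply healing: a model can diverge from the corrupted label without recovering the true one, which is why the third metric is reported separately.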
A notable aspect of REM is its resource efficiency. The auxiliary neurons constitute less than 2 % of the total parameter count, and the unlearning step requires only a forward pass through the original network plus a brief fine‑tuning of the new neurons, orders of magnitude cheaper than full retraining or the iterative distillation used by methods like SCRUB and Potion. The authors also give a formal definition of statistical regularity via the consistency score (C‑score) of Jiang et al. (2020), and show that this metric aligns with their empirical regularity categories.
The paper concludes with several promising directions: extending REM to non‑vision modalities (NLP, speech), integrating dynamic memory allocation to further reduce overhead, and adapting the approach to federated learning where data privacy constraints prevent full access to the training set. Overall, the work delivers a universal, efficient, and theoretically grounded solution to machine unlearning, filling a critical gap in the literature and offering a practical tool for deploying robust, privacy‑compliant AI systems.