Is Gradient Ascent Really Necessary? Memorize to Forget for Machine Unlearning
For ethical and safe AI, machine unlearning has emerged as a critical topic, aiming to protect sensitive, private, and copyrighted knowledge from misuse. To achieve this goal, it is common to apply gradient ascent (GA) to reverse training on the undesired data. However, such reversal is prone to catastrophic collapse, which leads to serious performance degradation on general tasks. As a solution, we propose model extrapolation as an alternative to GA: given a reference model, it moves from one model in the opposite direction in the hypothesis space. Concretely, we take the original model as the reference and further train it to memorize the undesired data while keeping its predictions consistent on the retained data, obtaining a memorization model. Counterintuitive as it may sound, a forget model can then be obtained by extrapolating from the memorization model beyond the reference model. We thus avoid acquiring the forget model directly via GA and instead rely only on gradient descent to train the memorization model, which stabilizes the machine unlearning process. Model extrapolation is simple and efficient to implement, converges reliably throughout training, and achieves improved unlearning performance.
💡 Research Summary
The paper addresses a critical challenge in machine unlearning (MU): how to remove sensitive, private, or copyrighted information from large language models (LLMs) without incurring the severe performance degradation that current gradient‑ascent (GA) based methods cause. Existing approaches typically split the training data into a retain set (desired knowledge) and a forget set (undesired knowledge). They then apply GA on the forget set—maximizing loss—to “reverse” the learning process, while continuing gradient descent (GD) on the retain set. Although GA can improve forget quality, prior work has shown that it often leads to catastrophic collapse: the model’s parameters drift far from the original pre‑trained state, the output distribution collapses to near‑deterministic predictions, and overall utility on downstream tasks plummets.
To overcome these issues, the authors propose a fundamentally different paradigm: memorize‑to‑forget via model extrapolation (MOX). The key steps are:
1. Memorization Phase (GD only) – Starting from the original model θ_ref, they train a memorization model θ_mem using only gradient descent. The loss combines two terms:
   - A cross-entropy term on the forget set D_F, encouraging the model to over-fit (memorize) the undesired data.
   - A KL-divergence term on the retain set D_R, forcing predictions on retained data to stay close to those of θ_ref, thereby preserving utility.

   This yields a model that is highly specialized to the forget data while still behaving like the original model on everything else.

2. Extrapolation Phase – They compute the difference vector Δ = θ_ref − θ_mem, which points from the memorized model back toward the original. Scaling this vector by a positive scalar α and adding it to θ_ref yields the forget model:

   θ_fgt = θ_ref + α Δ = θ_ref + α (θ_ref − θ_mem).
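The two phases above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' implementation: numpy arrays stand in for model parameters and per-batch logits, and the names (`mox_loss`, `extrapolate`, `lam`, `alpha`) are invented for the sketch.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mox_loss(logits_fgt, labels_fgt, logits_ret, logits_ref, lam=1.0):
    """Memorization-phase loss (GD only): cross-entropy on the forget
    set D_F plus a KL term keeping retain-set (D_R) predictions close
    to the reference model theta_ref. `lam` is an assumed trade-off weight."""
    # Cross-entropy on the forget set: encourage over-fitting (memorization).
    p_fgt = softmax(logits_fgt)
    ce = -np.mean(np.log(p_fgt[np.arange(len(labels_fgt)), labels_fgt]))
    # KL(p_ref || p_ret) on the retain set: preserve utility.
    p_ret, p_ref = softmax(logits_ret), softmax(logits_ref)
    kl = np.mean(np.sum(p_ref * (np.log(p_ref) - np.log(p_ret)), axis=-1))
    return ce + lam * kl

def extrapolate(theta_ref, theta_mem, alpha=1.0):
    """Extrapolation phase: theta_fgt = theta_ref + alpha * (theta_ref - theta_mem),
    stepping past the reference in the direction opposite to memorization."""
    return theta_ref + alpha * (theta_ref - theta_mem)
```

With `alpha = 1`, extrapolating a 1-D "parameter" from `theta_mem = 2.0` back through `theta_ref = 1.0` lands at `0.0`, i.e. as far on the forget side of the reference as the memorization model is on the other side.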