Conformal Unlearning: A New Paradigm for Unlearning in Conformal Predictors
Conformal unlearning aims to ensure that a trained conformal predictor miscovers data points with specific shared characteristics, such as those from a particular label class, associated with a specific user, or belonging to a defined cluster, while maintaining valid coverage on the remaining data. Existing machine unlearning methods, which typically approximate a model retrained from scratch after removing the data to be forgotten, face significant challenges when applied to conformal unlearning. These methods often lack rigorous, uncertainty-aware statistical measures to evaluate unlearning effectiveness, and they exhibit a mismatch between their degraded performance on forgotten data and the frequency with which that data is still correctly covered by conformal predictors, a phenomenon we term "fake conformal unlearning". To address these limitations, we propose a new paradigm for conformal machine unlearning that provides finite-sample, uncertainty-aware guarantees on unlearning performance without relying on a retrained model as a reference. We formalize conformal unlearning to require high coverage on retained data and high miscoverage on forgotten data, introduce practical empirical metrics for evaluation, and present an algorithm that optimizes these conformal objectives. Extensive experiments on vision and text benchmarks demonstrate that the proposed approach effectively removes targeted information while preserving utility.
💡 Research Summary
The paper tackles the problem of removing specific groups of data from conformal predictors—a task it terms “conformal unlearning.” Traditional machine unlearning (MU) methods aim to approximate a model retrained from scratch (the RT model) and typically evaluate success using parameter‑level similarity or accuracy on forgotten data. The authors demonstrate that such approaches are inadequate for conformal prediction because (1) they lack uncertainty‑aware metrics, and (2) a phenomenon they call “fake conformal unlearning” occurs: even when accuracy on forgotten data drops dramatically, the conformal prediction sets still cover the true label at the prescribed confidence level, meaning the model retains substantial knowledge about the forgotten data.
To resolve this, the authors propose a fundamentally different definition of unlearning that is rooted in the coverage properties of conformal predictors. The goal is two‑fold: (i) achieve high coverage on retained data (the usual CP guarantee) and (ii) achieve high mis‑coverage on forgotten data (the true label should be excluded from the prediction set with high probability). This definition does not rely on any retrained baseline, thereby eliminating dependence on costly RT models and avoiding the “forgeability” issue where identical parameters can arise from different training histories.
The paper introduces two empirical metrics that directly quantify these objectives:
- Empirical Coverage Frequency (ECF) – the proportion of retained points whose true label lies in a prediction set whose size does not exceed a user‑specified threshold c.
- Empirical mis‑Coverage Frequency (EmCF) – the proportion of forgotten points whose true label is excluded from a prediction set whose size does not exceed a threshold d.
ECF and EmCF are finite‑sample, distribution‑free estimators of the theoretical coverage and mis‑coverage probabilities, and they can be computed without any reference model.
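The two metrics above can be computed directly from prediction sets and true labels, with no reference model. A minimal sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def empirical_coverage_frequency(pred_sets, labels, c):
    """ECF: fraction of points whose true label lies in a
    prediction set whose size does not exceed the threshold c."""
    hits = [(y in s) and (len(s) <= c) for s, y in zip(pred_sets, labels)]
    return float(np.mean(hits))

def empirical_miscoverage_frequency(pred_sets, labels, d):
    """EmCF: fraction of points whose true label is excluded from a
    prediction set whose size does not exceed the threshold d."""
    misses = [(y not in s) and (len(s) <= d) for s, y in zip(pred_sets, labels)]
    return float(np.mean(misses))

# Example: three points with prediction sets and true labels.
pred_sets = [{0, 1}, {2}, {0}]
labels = [0, 2, 1]
ecf = empirical_coverage_frequency(pred_sets, labels, c=2)    # 2/3
emcf = empirical_miscoverage_frequency(pred_sets, labels, d=2)  # 1/3
```

In practice ECF would be evaluated on retained points and EmCF on forgotten points, each against its own size threshold.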
Algorithmically, the authors build on split conformal prediction. After training a base model, they compute non‑conformity scores on a calibration set. For retained data they keep the usual quantile‑based threshold q̂_α, guaranteeing (1 − α) coverage under exchangeability. For forgotten data they deliberately inflate the scores (e.g., by adding a learned offset or by re‑weighting the calibration distribution) so that the resulting threshold yields a much larger mis‑coverage rate. The overall objective combines the two metrics as L = λ·EmCF − (1 − λ)·ECF, where λ balances forgetting versus utility. This objective is optimized in a batch‑wise fashion, making the method scalable to large image and text corpora.
Experiments are conducted on CIFAR‑100, an ImageNet subset, and several text classification benchmarks. The authors evaluate three forgetting scenarios: (a) class‑wise forgetting, (b) cluster‑wise forgetting based on feature similarity, and (c) user‑attribute forgetting. Results consistently show:
- High mis‑coverage on forgotten groups (EmCF ≈ 80 % or higher), indicating that the true label is rarely present in the prediction set.
- Preserved coverage on retained groups (ECF ≈ 95 % or higher), matching the nominal CP guarantee.
- Minimal loss in overall accuracy and prediction‑set size, demonstrating that utility is retained.
A notable comparison with the RT baseline reveals that while Grad‑CAM visualizations differ markedly for forgotten data (the model’s attention shifts away), the parameter distance between the unlearned model and the RT model is negligible, underscoring that parameter‑level certification does not capture behavioral changes.
In summary, the paper introduces a new paradigm—conformal unlearning—that shifts the focus from point‑estimate accuracy to set‑based uncertainty. By defining unlearning through observable coverage properties, providing finite‑sample guarantees, and offering practical metrics and scalable algorithms, it addresses the shortcomings of existing MU methods and opens a path toward privacy‑preserving, uncertainty‑aware machine learning systems suitable for regulated domains such as medical diagnosis, content moderation, and e‑commerce.