Towards Privacy-Guaranteed Label Unlearning in Vertical Federated Learning: Few-Shot Forgetting without Disclosure
This paper addresses the critical challenge of unlearning in Vertical Federated Learning (VFL), a setting that has received far less attention than its horizontal counterpart. Specifically, we propose the first method tailored to label unlearning in VFL, where labels play a dual role as both essential inputs and sensitive information. To this end, we employ a representation-level manifold mixup mechanism to generate synthetic embeddings for both unlearned and retained samples, providing richer signals for the subsequent gradient-based label forgetting and recovery steps. These augmented embeddings are then subjected to gradient-based label forgetting, effectively removing the associated label information from the model. To recover performance on the retained data, we introduce a recovery-phase optimization step that refines the remaining embeddings. This design achieves effective label unlearning while maintaining computational efficiency. We validate our method through extensive experiments on diverse datasets (MNIST, CIFAR-10, CIFAR-100, ModelNet, Brain Tumor MRI, COVID-19 Radiography, and Yahoo Answers), demonstrating strong efficacy and scalability. Overall, this work establishes a new direction for unlearning in VFL, showing that re-imagining mixup as an efficient unlearning mechanism can unlock practical and utility-preserving unlearning. The code is publicly available at https://github.com/bryanhx/Towards-Privacy-Guaranteed-Label-Unlearning-in-Vertical-Federated-Learning
💡 Research Summary
Vertical Federated Learning (VFL) enables multiple parties to collaboratively train a model without sharing raw data: an active party holds the labels and a top model, while one or more passive parties hold feature subsets and corresponding bottom models. In many high‑stakes domains (healthcare, finance), the labels themselves are highly sensitive and may need to be removed after training to comply with “right‑to‑be‑forgotten” regulations such as GDPR and CCPA. Existing federated unlearning work focuses almost exclusively on horizontal federated learning (HFL) or on the removal of entire passive parties in VFL; none address selective label deletion while keeping the rest of the model intact.
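The split architecture described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the linear bottom/top models, the two-party setup, and all dimensions are hypothetical choices made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: two passive parties holding disjoint feature subsets.
d1, d2, d_emb, n_classes = 4, 6, 3, 5

# Bottom models G_theta_k (one per passive party), sketched here as linear maps.
W1 = rng.normal(size=(d1, d_emb))
W2 = rng.normal(size=(d2, d_emb))
# Top model held by the active party, which also holds the labels.
W_top = rng.normal(size=(2 * d_emb, n_classes))

x1 = rng.normal(size=(1, d1))   # passive party 1's feature subset
x2 = rng.normal(size=(1, d2))   # passive party 2's feature subset

# Each passive party computes its local embedding H_k = G_theta_k(x_k)
# and sends only the embedding (never raw features) to the active party.
h1, h2 = x1 @ W1, x2 @ W2
# The active party concatenates the embeddings and applies the top model.
logits = np.concatenate([h1, h2], axis=1) @ W_top
```

The key property is that raw features never leave the passive parties; only intermediate embeddings cross the party boundary.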
The paper introduces a novel few‑shot label unlearning framework for VFL that requires only a small public dataset. The method consists of three tightly coupled stages: (1) Vertical Manifold Mixup, (2) Gradient‑Based Label Unlearning, and (3) Remained Accuracy Recovery.
Stage 1 – Vertical Manifold Mixup.
Given a tiny public set of labeled samples for the target label (Dₚ,ᵤ) and a small public set for the retained labels (Dₚ,ᵣ), each passive party computes local embeddings Hₚ,ᵤ,ₖ = G_{θₖ}(xₚ,ᵤ,ₖ). The active party then performs manifold mixup on embeddings belonging to the same passive party: Ĥᵤ,ₖ = λ·Hₚ,ᵤ,ₖ,ᵢ + (1−λ)·Hₚ,ᵤ,ₖ,ⱼ, with the same mixing coefficient λ applied to the corresponding labels to obtain the mixed labels ŷᵤ. This creates synthetic representations that approximate the distribution of the full unlearned dataset while using only a few public examples. The same procedure is applied to Dₚ,ᵣ to obtain Ĥᵣ,ₖ and ŷᵣ for the recovery phase.
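The mixing step above can be sketched as follows. This is an illustrative reading of the formula, assuming one-hot labels and a Beta-distributed λ (a common mixup convention); the function name, sampling strategy, and α value are assumptions, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)

def vertical_manifold_mixup(H_k, y_onehot, n_mix, alpha=2.0, rng=rng):
    """Mix random pairs of embeddings from the SAME passive party k.

    H_k:      (n, d) local embeddings H_{p,u,k} from a few public samples.
    y_onehot: (n, C) one-hot labels held by the active party.
    Returns n_mix synthetic embeddings and their soft mixed labels.
    """
    n = H_k.shape[0]
    i = rng.integers(0, n, size=n_mix)          # first element of each pair
    j = rng.integers(0, n, size=n_mix)          # second element of each pair
    lam = rng.beta(alpha, alpha, size=(n_mix, 1))  # one lambda per pair
    # Same lambda mixes both the embeddings and the corresponding labels.
    H_mix = lam * H_k[i] + (1 - lam) * H_k[j]
    y_mix = lam * y_onehot[i] + (1 - lam) * y_onehot[j]
    return H_mix, y_mix
```

Because each λ is shared between an embedding pair and its label pair, the synthetic points stay label-consistent while densifying the few-shot public set.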
Stage 2 – Gradient‑Based Label Unlearning.
The active party concatenates all mixed embeddings into Ĥᵤ = [Ĥᵤ,₁, …, Ĥᵤ,K] across the K passive parties, then applies gradient-based forgetting on these concatenated embeddings and their mixed labels ŷᵤ to remove the target label's information from the model.
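One plausible instantiation of this stage is gradient ascent on the top model's loss over the mixed unlearned embeddings, so the model is pushed away from predicting the target label. The summary does not spell out the update rule, so the ascent formulation, learning rate, and step count below are all assumptions for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gradient_ascent_forget(W_top, H_u, y_u, lr=0.1, steps=10):
    """Sketch of gradient-based label forgetting on a linear top model.

    H_u: (n, d) concatenated mixed embeddings; y_u: (n, C) mixed labels.
    Ascends the cross-entropy loss on the unlearned data (lr and steps
    are illustrative hyperparameters, not taken from the paper).
    """
    for _ in range(steps):
        p = softmax(H_u @ W_top)
        grad = H_u.T @ (p - y_u) / len(H_u)  # dL/dW for softmax cross-entropy
        W_top = W_top + lr * grad            # '+' instead of '-': ascent
    return W_top
```

After this step, accuracy on the retained labels typically degrades as a side effect, which is what the Stage 3 recovery phase then repairs using Ĥᵣ and ŷᵣ.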