DriftGuard: Mitigating Asynchronous Data Drift in Federated Learning

Notice: This research summary and analysis were generated automatically using AI. For full accuracy, please refer to the original arXiv source.

In real-world Federated Learning (FL) deployments, data distributions on devices that participate in training evolve over time. This leads to asynchronous data drift, where different devices shift at different times and toward different distributions. Mitigating such drift is challenging: frequent retraining incurs high computational cost on resource-constrained devices, while infrequent retraining degrades performance on drifting devices. We propose DriftGuard, a federated continual learning framework that efficiently adapts to asynchronous data drift. DriftGuard adopts a Mixture-of-Experts (MoE) inspired architecture that separates shared parameters, which capture globally transferable knowledge, from local parameters that adapt to group-specific distributions. This design enables two complementary retraining strategies: (i) global retraining, which updates the shared parameters when system-wide drift is identified, and (ii) group retraining, which selectively updates local parameters for clusters of devices identified via MoE gating patterns, without sharing raw data. Experiments across multiple datasets and models show that DriftGuard matches or exceeds state-of-the-art accuracy while reducing total retraining cost by up to 83%. As a result, it achieves the highest accuracy per unit retraining cost, improving over the strongest baseline by up to 2.3x. DriftGuard is available for download from https://github.com/blessonvar/DriftGuard.


💡 Research Summary

DriftGuard addresses a critical gap in federated learning (FL) deployments where data distributions on edge devices evolve independently over time—a phenomenon termed asynchronous data drift. Traditional federated continual learning (FCL) either retrains the entire global model on all devices, incurring prohibitive computational and communication costs, or clusters devices and retrains each cluster in isolation, which sacrifices the benefits of globally transferable knowledge.

The proposed framework introduces a Mixture‑of‑Experts (MoE) inspired architecture that explicitly separates model parameters into two categories: (1) shared parameters that capture knowledge useful across all devices, and (2) local expert parameters that adapt to the specific distribution of a device group. Each device runs the same shared backbone but routes its inputs through a gating network that selects which local expert(s) to activate. The gating outputs—vectors indicating expert activation probabilities—are transmitted to the central server. By clustering these gating vectors (e.g., using K‑means), the server groups devices that are experiencing similar distributional shifts without ever exchanging raw data, thereby preserving privacy.
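The server-side grouping step described above can be sketched with a small, self-contained k-means over the reported gating vectors. This is an illustrative reconstruction, not the paper's implementation: the function name, the deterministic farthest-point initialisation, and the synthetic gating values are all assumptions made for the example.

```python
import numpy as np

def cluster_gating_vectors(gating, n_clusters, n_iters=50):
    """Group devices by their MoE gating vectors with plain k-means.

    gating: (n_devices, n_experts) array of expert-activation
    probabilities reported by each device.  Returns one cluster id
    per device; devices in the same cluster are treated as drifting
    toward a similar distribution, without any raw data leaving them.
    """
    # Deterministic farthest-point initialisation of the centroids.
    centroids = [gating[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.linalg.norm(gating - c, axis=1) for c in centroids], axis=0)
        centroids.append(gating[np.argmax(d)])
    centroids = np.array(centroids)

    for _ in range(n_iters):
        # Assign each device to its nearest centroid.
        dists = np.linalg.norm(gating[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster empties.
        for k in range(n_clusters):
            if (labels == k).any():
                centroids[k] = gating[labels == k].mean(axis=0)
    return labels

# Synthetic example: devices 0-2 mostly activate expert 0, devices 3-5 expert 2.
gates = np.array([[0.9, 0.05, 0.05], [0.85, 0.1, 0.05], [0.8, 0.1, 0.1],
                  [0.1, 0.1, 0.8], [0.05, 0.05, 0.9], [0.1, 0.2, 0.7]])
labels = cluster_gating_vectors(gates, n_clusters=2)
```

In a deployment, the number of clusters would itself be a tunable hyper-parameter, and the paper notes the clustering can run infrequently to limit server-side cost.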

DriftGuard’s core contribution is a two‑level retraining strategy:

  • Global retraining updates only the shared parameters when a system‑wide drift is detected (e.g., overall accuracy falls below a threshold or the aggregate gating distribution changes significantly). This ensures that globally useful knowledge remains up‑to‑date while limiting the amount of computation each device must perform.

  • Group retraining targets a specific cluster of devices whose local accuracy degrades or whose gating pattern shifts. Only the local expert parameters of that cluster are updated, leaving the rest of the network untouched.
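The two retraining levels above can be summarised as a small dispatch step. This is a minimal sketch under assumed names (`plan_retraining`, the `(scope, params)` action tuples); the paper does not specify this interface.

```python
def plan_retraining(global_drift, drifted_clusters):
    """Decide which parameters to update this round (illustrative only).

    global_drift: True when system-wide drift has been detected.
    drifted_clusters: ids of clusters whose local accuracy has degraded
    or whose gating pattern has shifted.
    Returns a list of (device scope, parameter subset) actions.
    """
    actions = []
    if global_drift:
        # Global retraining: all devices update only the shared parameters.
        actions.append(("all_devices", "shared_params"))
    for cid in drifted_clusters:
        # Group retraining: only that cluster's local expert parameters change.
        actions.append((f"cluster_{cid}", f"expert_params_{cid}"))
    return actions

plan = plan_retraining(global_drift=True, drifted_clusters=[2])
```

The key point the sketch captures is that the two strategies are independent: a round may trigger either, both, or neither.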

Formally, the framework defines a retraining configuration πₜ = (Trig, S, θ) at each time step t, where Trig indicates whether retraining occurs, S is the selected subset of devices, and θ denotes the subset of model parameters to be updated. The total cost over T steps is the sum of per‑round computation across participating devices, while the average accuracy is the mean of per‑device accuracies after each retraining. DriftGuard optimizes the ratio of average accuracy to total cost, effectively maximizing “accuracy per unit retraining cost.”
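The objective in the paragraph above reduces to a simple ratio, sketched here with hypothetical inputs (the function name and the toy numbers are not from the paper):

```python
def retraining_efficiency(accuracies, costs):
    """Accuracy per unit retraining cost over T rounds.

    accuracies: mean per-device accuracy measured after each retraining round.
    costs: per-round computation, summed over the participating devices.
    """
    total_cost = sum(costs)               # total cost over T steps
    avg_accuracy = sum(accuracies) / len(accuracies)  # mean post-retraining accuracy
    return avg_accuracy / total_cost

# e.g. two rounds at cost 10 each, accuracies 0.80 and 0.82:
eff = retraining_efficiency([0.80, 0.82], [10, 10])
```

Maximising this ratio is what lets DriftGuard trade a small accuracy change for a large reduction in retraining cost.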

Experiments were conducted on three benchmark datasets (Office‑Home, CIFAR‑10‑C, FEMNIST) using four deep models (ResNet‑18, MobileNet‑V2, etc.). DriftGuard was compared against strong baselines: standard FedAvg/FedProx, classic FCL (global retraining on all devices), and clustering‑based FCL (group‑only retraining). Results show that DriftGuard reduces total retraining cost by up to 83% while achieving equal or slightly higher average accuracy (typically 1–2% improvement). The derived efficiency metric (accuracy divided by cost) is up to 2.3× higher than the best baseline. A real‑world IoT prototype consisting of 20 smart‑city traffic cameras confirmed these gains: retraining time dropped by 20% and the efficiency metric improved by 1.2×.

Ablation studies demonstrate that updating only shared parameters yields low cost but insufficient accuracy, whereas updating only local experts preserves accuracy for the targeted group but loses global knowledge. The combined two‑level approach consistently delivers the best trade‑off.

Key insights include:

  • Gating‑based clustering provides a privacy‑preserving signal for grouping devices with similar drift, eliminating the need for raw data exchange.
  • Parameter separation allows the system to adapt quickly to local shifts without repeatedly retraining the entire model, dramatically cutting communication rounds and local compute.
  • Dynamic triggering based on both global accuracy and gating variance balances timely adaptation against unnecessary overhead.
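The dynamic-triggering insight in the last bullet can be illustrated as follows. This is a hypothetical sketch: the threshold values and the use of variance of the aggregate gating distribution as the drift signal are assumptions for the example, not the paper's exact detector.

```python
import numpy as np

def should_retrain_globally(acc_history, gating_history,
                            acc_threshold=0.75, var_jump=0.05):
    """Combine the two trigger signals described in the text:
    (a) overall accuracy falling below a threshold, or
    (b) a significant change in the aggregate gating distribution,
    measured here (as an assumption) by its variance."""
    if acc_history[-1] < acc_threshold:
        return True  # accuracy-based trigger
    if len(gating_history) >= 2:
        prev_var = np.var(gating_history[-2])
        curr_var = np.var(gating_history[-1])
        if abs(curr_var - prev_var) > var_jump:
            return True  # gating-shift trigger
    return False
```

Tuning `acc_threshold` and `var_jump` is exactly the balance the bullet describes: tighter values adapt sooner but retrain more often.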

Limitations are acknowledged. The number of experts and gating dimensionality are hyper‑parameters that affect performance and may require tuning per deployment. Clustering introduces additional server‑side computation, though it can be performed infrequently. In scenarios where many devices drift simultaneously, the frequency of global retraining may increase, reducing cost savings.

Future work could explore automated expert scaling, hierarchical gating mechanisms, and predictive drift detection to further reduce unnecessary retraining.

In summary, DriftGuard offers a practical, cost‑effective solution for federated learning under asynchronous data drift, delivering superior accuracy‑cost efficiency while respecting device privacy. The implementation and datasets are publicly available at https://github.com/blessonvar/DriftGuard.

