Byzantine Machine Learning: MultiKrum and an optimal notion of robustness
Aggregation rules are the cornerstone of distributed (or federated) learning in the presence of adversaries, under the so-called Byzantine threat model. They are also interesting mathematical objects from the point of view of robust mean estimation. The Krum aggregation rule has been extensively studied, and endowed with formal robustness and convergence guarantees. Yet, MultiKrum, a natural extension of Krum, is often preferred in practice for its superior empirical performance, even though no theoretical guarantees were available until now. In this work, we provide the first proof that MultiKrum is a robust aggregation rule, and bound its robustness coefficient. To do so, we introduce $κ^\star$, the optimal robustness coefficient of an aggregation rule, which quantifies the accuracy of mean estimation in the presence of adversaries in a tighter manner compared with previously adopted notions of robustness. We then construct an upper and a lower bound on MultiKrum’s robustness coefficient. As a by-product, we also improve on the best-known bounds on Krum’s robustness coefficient. We show that MultiKrum’s bounds are never worse than Krum’s, and better in realistic regimes. We illustrate this analysis by an experimental investigation on the quality of the lower bound.
💡 Research Summary
This paper tackles a fundamental problem in Byzantine‑resilient distributed and federated learning: how to aggregate client updates when a fraction of them may be arbitrarily malicious. While the Krum aggregation rule has been extensively studied and enjoys formal (f, κ)‑robustness guarantees, its natural extension MultiKrum is widely used in practice because it empirically yields better accuracy and faster convergence. However, before this work no theoretical robustness guarantees existed for MultiKrum, leaving a gap between practice and theory.
The authors close this gap by introducing a new, mathematically precise notion called the optimal robustness coefficient κ*. For any aggregation rule F, κ*(F) is defined as the worst‑case ratio between the distance from the aggregated output to the mean of the honest clients and the intrinsic dispersion of those honest clients.
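To make the ratio described above concrete, here is a minimal Python sketch that evaluates it on one hand-picked input. Several things in it are assumptions rather than the paper's definitions: the function names (`multikrum`, `robustness_ratio`) are hypothetical, the Krum score uses the n − f − 2 nearest neighbours as in the original Krum formulation, and both the distance and the dispersion are taken as squared Euclidean quantities, which is the usual convention for (f, κ)‑robustness. The summary's formal definition is cut off, so the exact form used by the authors may differ. A single input only yields one sample of the ratio; κ* itself is a worst case over all inputs and honest subsets.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two vectors given as lists."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean(vectors):
    """Coordinate-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def multikrum(vectors, f, m):
    """Average the m vectors with the lowest Krum scores (m = 1 gives Krum).

    Assumption: the score of a vector is the sum of squared distances to
    its n - f - 2 nearest neighbours, as in the original Krum formulation.
    """
    n = len(vectors)
    k = n - f - 2
    if k < 1 or not 1 <= m <= n:
        raise ValueError("need n - f - 2 >= 1 and 1 <= m <= n")
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(sq_dist(v, w) for j, w in enumerate(vectors) if j != i)
        scores.append(sum(dists[:k]))
    # sorted() is stable, so ties are broken by input order.
    chosen = sorted(range(n), key=lambda i: scores[i])[:m]
    return mean([vectors[i] for i in chosen])


def robustness_ratio(output, honest):
    """Squared error to the honest mean divided by the honest dispersion."""
    centre = mean(honest)
    dispersion = sum(sq_dist(x, centre) for x in honest) / len(honest)
    return sq_dist(output, centre) / dispersion


# Four honest updates on the unit square plus one far-away Byzantine update.
honest = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
byzantine = [[10.0, 10.0]]
updates = honest + byzantine

krum_ratio = robustness_ratio(multikrum(updates, f=1, m=1), honest)
multikrum_ratio = robustness_ratio(multikrum(updates, f=1, m=4), honest)
print(krum_ratio, multikrum_ratio)
```

On this toy input Krum returns a single honest vector, which sits at a corner of the square, while MultiKrum with m = 4 averages all four honest vectors and recovers their mean exactly. This matches the summary's claim that MultiKrum's bounds are never worse than Krum's, but one example is an illustration, not evidence about the worst case.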