A Federated Generalized Expectation-Maximization Algorithm for Mixture Models with an Unknown Number of Components
We study the problem of federated clustering when the total number of clusters $K$ across clients is unknown, and the clients have heterogeneous but potentially overlapping cluster sets in their local data. To that end, we develop FedGEM: a federated generalized expectation-maximization algorithm for the training of mixture models with an unknown number of components. Our proposed algorithm relies on each of the clients performing EM steps locally, and constructing an uncertainty set around the maximizer associated with each local component. The central server utilizes the uncertainty sets to learn potential cluster overlaps between clients, and infer the global number of clusters via closed-form computations. We perform a thorough theoretical study of our algorithm, presenting probabilistic convergence guarantees under common assumptions. Subsequently, we study the specific setting of isotropic GMMs, providing tractable, low-complexity computations to be performed by each client during each iteration of the algorithm, as well as rigorously verifying assumptions required for algorithm convergence. We perform various numerical experiments, where we empirically demonstrate that our proposed method achieves comparable performance to centralized EM, and that it outperforms various existing federated clustering methods.
💡 Research Summary
The paper tackles a fundamental challenge in federated learning: clustering data across many clients when the total number of clusters is unknown and each client may possess a different, possibly overlapping subset of those clusters. To address this, the authors propose FedGEM, a federated generalized expectation‑maximization (GEM) algorithm that can learn mixture models without prior knowledge of the global component count.
Problem setting
- There are G clients, each holding N_g i.i.d. samples drawn from a local mixture model with K_g components. The global number of distinct components K is unknown, and no single client observes all K clusters.
- Overlapping clusters share the same underlying parameters (means, covariances) across clients, but component weights may differ per client.
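A minimal sketch of this data-generating setup, using hypothetical names (`true_means`, `make_client`) and isotropic unit-variance Gaussians for concreteness: a global pool of K components, from which each client draws its own subset with client-specific mixture weights.

```python
import numpy as np

rng = np.random.default_rng(0)
K, d = 5, 2                                 # global component count, dimension
true_means = rng.normal(0, 10, size=(K, d))  # shared across clients

def make_client(component_ids, weights, n):
    """Draw n isotropic unit-variance samples from the listed global components."""
    labels = rng.choice(component_ids, size=n, p=weights)
    return true_means[labels] + rng.normal(0, 1.0, size=(n, d)), labels

# Two clients with overlapping cluster sets {0,1,2} and {2,3,4},
# and different local mixture weights over the shared components.
X1, y1 = make_client([0, 1, 2], [0.5, 0.3, 0.2], 300)
X2, y2 = make_client([2, 3, 4], [0.2, 0.4, 0.4], 300)
```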
Algorithmic framework
FedGEM operates in two phases: (1) an iterative collaborative training stage and (2) a final aggregation step.
Client side – In each round t, a client performs (i) one or more standard EM updates using its current parameters θ_{t‑1}^g, obtaining responsibilities γ_{k}^{g} and updated component means x̂_{k}^{g}; and (ii) for each component k, solves a convex optimization problem that yields a radius ε_{k}^{g}(t). This radius defines a Euclidean ball U_{k}^{g}(t) = B₂(x̂_{k}^{g}, ε_{k}^{g}(t)) called an “uncertainty set”. The optimization guarantees that no parameter inside the ball decreases the finite‑sample expected complete‑data log‑likelihood, thus satisfying the GEM condition. The client then sends the pair (x̂_{k}^{g}, ε_{k}^{g}) for each of its components to the server.
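The client-side round can be sketched as follows for the isotropic case. Note the paper obtains the radius from a convex program; the responsibility-weighted standard error used in `uncertainty_radius` below is only a hypothetical stand-in, and equal mixture weights with known unit variance are further simplifying assumptions.

```python
import numpy as np

def em_step_isotropic(X, means, sigma2=1.0):
    """One EM iteration for an isotropic GMM with equal weights (sketch)."""
    # E-step: responsibilities gamma[n, k] ∝ exp(-||x_n - mu_k||^2 / (2 sigma2))
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    logp = -d2 / (2.0 * sigma2)
    logp -= logp.max(axis=1, keepdims=True)   # stabilize before exponentiating
    gamma = np.exp(logp)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted means
    new_means = (gamma.T @ X) / gamma.sum(axis=0)[:, None]
    return new_means, gamma

def uncertainty_radius(X, gamma, means):
    """Stand-in for the paper's convex radius optimization: the
    responsibility-weighted standard error around each updated mean."""
    Nk = gamma.sum(axis=0)
    var = (gamma * ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)).sum(0) / Nk
    return np.sqrt(var / Nk)
```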
Server side – The server collects the uncertainty sets from all clients and checks pairwise whether two balls intersect, i.e., whether ‖x̂_{k}^{g} − x̂_{k’}^{g’}‖₂ ≤ ε_{k}^{g} + ε_{k’}^{g’}. If they do, the corresponding local components are declared to belong to the same “super‑cluster”. For each intersecting pair, the server computes a consensus vector ν̂ that lies in the intersection (a clipped weighted combination of the two centers) and adds ν̂ to the candidate set of each involved client. After processing all pairs, each client’s new parameter estimate is obtained by aggregating all vectors in its candidate set (e.g., by averaging).
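The overlap test and consensus vector can be sketched as below. The paper only describes ν̂ as a clipped weighted combination of the two centers; `consensus_point` is one plausible realization, chosen so the returned point provably lies in both balls whenever they intersect.

```python
import numpy as np

def balls_intersect(c1, r1, c2, r2):
    """True when the Euclidean balls B2(c1, r1) and B2(c2, r2) overlap."""
    return np.linalg.norm(c1 - c2) <= r1 + r2

def consensus_point(c1, r1, c2, r2):
    """A point in the intersection of two overlapping balls: step from c1
    toward c2 by a clipped amount so the result lies inside both balls."""
    direction = c2 - c1
    dist = np.linalg.norm(direction)
    if dist == 0:
        return c1.copy()
    # Any step s with dist - r2 <= s <= r1 lands in both balls; take the
    # midpoint of that interval and clip it to the segment [c1, c2].
    s = np.clip((dist + r1 - r2) / 2.0, 0.0, dist)
    return c1 + (s / dist) * direction
```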
During the final aggregation step the server merges all components that belong to the same super‑cluster, producing a single global estimate for each discovered cluster and an estimate \hat K of the total number of clusters.
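The final merge can be sketched as a union-find pass over pairwise ball intersections, followed by per-group averaging; this is a plausible reading of the aggregation step under the paper's intersection rule, not its exact procedure.

```python
import numpy as np

def merge_super_clusters(centers, radii):
    """Union-find merge of pairwise-intersecting uncertainty balls; returns
    the estimated cluster count and one averaged center per super-cluster."""
    n = len(centers)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(centers[i] - centers[j]) <= radii[i] + radii[j]:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    global_means = [np.mean([centers[i] for i in idx], axis=0)
                    for idx in groups.values()]
    return len(groups), np.array(global_means)
```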
Theoretical contributions
- Convergence under strong concavity – Assuming each local Q‑function is strongly concave (a standard GEM requirement), the authors prove that the sequence of iterates stays within the corresponding uncertainty sets and converges to a neighborhood of the true parameters with high probability.
- First‑order stability (FOS) for isotropic GMMs – For isotropic Gaussian mixtures the paper derives a bi‑convex reformulation of the radius‑optimization problem and establishes the FOS condition, which yields an explicit contraction region. This guarantees linear convergence when the initialization lies inside that region.
- Cluster separability via R_min and R_max – By defining the minimal inter‑cluster distance R_min and maximal distance R_max, the analysis shows that if the uncertainty radii are smaller than R_min/2, different clusters cannot be mistakenly merged, ensuring correct recovery of \hat K.
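One way to see this condition, under the additional assumption (hedged here, since the summary does not spell it out) that each local estimate lies inside its own uncertainty ball around the true mean:

```latex
\text{If } \|\hat{x} - \mu\|_2 \le \varepsilon,\quad
\|\hat{x}' - \mu'\|_2 \le \varepsilon',\quad
\|\mu - \mu'\|_2 \ge R_{\min},\quad
\varepsilon + \varepsilon' < R_{\min}/2,
\text{ then}
\]
\[
\|\hat{x} - \hat{x}'\|_2
\;\ge\; \|\mu - \mu'\|_2 - \varepsilon - \varepsilon'
\;\ge\; R_{\min} - (\varepsilon + \varepsilon')
\;>\; \varepsilon + \varepsilon',
```

so the two balls cannot intersect, and components estimating distinct true clusters are never merged.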
Complexity and communication
Each client transmits only K_g pairs of a d‑dimensional mean and a scalar radius per round, resulting in O(∑_g K_g·d) communication cost, independent of the local sample size. Server‑side pairwise overlap checks have O((∑_g K_g)²) complexity; the authors suggest using approximate nearest‑neighbor or hashing techniques to keep this tractable in practice.
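A back-of-the-envelope sketch of the per-round payload, assuming float64 encoding (the encoding is not stated in the paper):

```python
def payload_bytes(K_g, d, bytes_per_float=8):
    """Per-round upload for one client: K_g components, each contributing
    a d-dimensional mean plus one scalar radius (float64 assumed)."""
    return K_g * (d + 1) * bytes_per_float

# e.g. 10 local components in 128 dimensions -> about 10 KB per round,
# independent of the number of local samples N_g.
size = payload_bytes(10, 128)
```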
Empirical evaluation
The authors test FedGEM on synthetic mixtures with varying K, K_g, and cluster separations, as well as on high‑dimensional real datasets (e.g., CIFAR‑10 feature embeddings, 20 Newsgroups text embeddings). Results show:
- Accurate estimation of the true number of clusters, even when K is far larger than any individual K_g.
- Clustering quality (Adjusted Rand Index, Normalized Mutual Information) comparable to centralized EM and consistently superior to existing federated clustering baselines such as k‑FED, FedKmeans, and AFCL.
- Robustness to reduced inter‑cluster distances and to heterogeneity in component weights.
- Low communication overhead and inherent privacy preservation, as only aggregated statistics (means and radii) are shared, preventing raw data reconstruction.
Conclusion and future directions
FedGEM is the first federated GEM algorithm that simultaneously handles unknown global component count, heterogeneous and overlapping client cluster sets, and privacy constraints. It provides rigorous convergence guarantees, low‑complexity client updates, and strong empirical performance. Future work could extend the framework to non‑isotropic or non‑Gaussian mixtures, incorporate differential privacy mechanisms, and develop online or dynamic‑cluster variants for streaming federated environments.