Adaptive Transfer Clustering: A Unified Framework
We propose a general transfer learning framework for clustering given a main dataset and an auxiliary one about the same subjects. The two datasets may reflect similar but different latent grouping structures of the subjects. We propose an adaptive transfer clustering (ATC) algorithm that automatically leverages the commonality in the presence of unknown discrepancy, by optimizing an estimated bias-variance decomposition. It applies to a broad class of statistical models including Gaussian mixture models, stochastic block models, and latent class models. A theoretical analysis proves the optimality of ATC under the Gaussian mixture model and explicitly quantifies the benefit of transfer. Extensive simulations and real data experiments confirm our method’s effectiveness in various scenarios.
💡 Research Summary
The paper introduces Adaptive Transfer Clustering (ATC), a unified framework for leveraging auxiliary data when clustering a primary dataset that shares the same subjects but may exhibit a different latent grouping structure. The authors formalize the problem by assuming that each dataset is generated from a mixture model (e.g., Gaussian mixture, stochastic block model, latent class model) with \(K\) latent clusters, and that the true label vectors \(Z^{0*}\) and \(Z^{1*}\) differ on a fraction \(\varepsilon\) of the \(n\) subjects. The key challenge is that \(\varepsilon\) is unknown, so the algorithm must adaptively decide how much information to borrow from the auxiliary view.
The paper first studies a warm‑up case: a one‑dimensional, two‑component symmetric Gaussian mixture model (GMM). Two baseline strategies are defined: Independent Task Learning (ITL), which clusters using only the primary data, and Data Pooling (DP), which treats the concatenated pair \((X^{0}, X^{1})\) as a two‑dimensional mixture and clusters jointly. ITL achieves a misclassification probability of \(\Phi(-\mu/\sigma)\); DP improves this to \(\Phi(-\sqrt{2}\mu/\sigma)\) when the labels perfectly match, but incurs an additional error term proportional to \(\varepsilon\) when they do not.
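The ITL-versus-DP trade-off can be checked by direct simulation. The sketch below is illustrative, not the paper's procedure: it assumes the component means \(\pm\mu\) and noise level \(\sigma\) are known, so ITL and DP reduce to the oracle sign rules \(\operatorname{sign}(X^{0})\) and \(\operatorname{sign}(X^{0}+X^{1})\) rather than fitted clusterings.

```python
import numpy as np
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

rng = np.random.default_rng(0)
n, mu, sigma, eps = 200_000, 1.0, 1.0, 0.05  # illustrative parameter choices

# Primary labels; the auxiliary labels flip on an eps-fraction of subjects.
z0 = rng.choice([-1, 1], size=n)
z1 = np.where(rng.random(n) < eps, -z0, z0)

# One observation per view from the symmetric two-component GMM.
x0 = mu * z0 + sigma * rng.standard_normal(n)
x1 = mu * z1 + sigma * rng.standard_normal(n)

# Oracle plug-in rules: ITL uses the primary view alone, DP pools both views.
err_itl = np.mean(np.sign(x0) != z0)
err_dp = np.mean(np.sign(x0 + x1) != z0)

print(f"ITL: {err_itl:.4f}  (theory {phi(-mu / sigma):.4f})")
print(f"DP : {err_dp:.4f}  (matched-label theory {phi(-sqrt(2) * mu / sigma):.4f})")
```

With \(\varepsilon = 0.05\) the pooled rule still beats ITL, but its error exceeds the matched-label rate \(\Phi(-\sqrt{2}\mu/\sigma)\) by a term growing with \(\varepsilon\); for large enough \(\varepsilon\), DP can fall behind ITL, which is exactly the tension ATC is designed to resolve.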
To bridge the gap between these extremes, the authors propose a penalized joint MAP estimator: for each subject \(i\), solve
\[
(\hat{z}_i^{0}, \hat{z}_i^{1}) \in \operatorname*{arg\,max}_{(z^{0}, z^{1})} \Big\{ \log f_0\big(X_i^{0} \mid z^{0}\big) + \log f_1\big(X_i^{1} \mid z^{1}\big) - \lambda \, \mathbf{1}\{z^{0} \neq z^{1}\} \Big\},
\]
where \(\lambda \ge 0\) penalizes disagreement between the two label estimates: \(\lambda = 0\) decouples the views and recovers ITL, while \(\lambda \to \infty\) forces the labels to agree and recovers DP. ATC chooses \(\lambda\) adaptively by optimizing an estimated bias-variance decomposition of the clustering error.
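For the symmetric one-dimensional GMM, this optimization is a four-way enumeration per subject. Below is a minimal vectorized sketch under the same illustrative assumptions as before (component means \(\pm\mu\) and \(\sigma\) known); the penalty `lam` is held fixed here, whereas ATC tunes it from the data.

```python
import numpy as np

def penalized_map_labels(x0, x1, mu, sigma, lam):
    """Penalized joint MAP for the symmetric two-component 1-D GMM.

    For each subject, enumerate the four label pairs (z0, z1) in {-1, +1}^2
    and maximize the joint Gaussian log-likelihood minus lam * 1{z0 != z1}.
    Illustrative sketch: the component means +/-mu are assumed known.
    """
    x0, x1 = np.asarray(x0, float), np.asarray(x1, float)
    best_score = np.full(x0.shape, -np.inf)
    z0_hat = np.empty_like(x0)
    z1_hat = np.empty_like(x1)
    for z0 in (-1, 1):
        for z1 in (-1, 1):
            score = (
                -((x0 - z0 * mu) ** 2 + (x1 - z1 * mu) ** 2) / (2 * sigma ** 2)
                - lam * (z0 != z1)
            )
            better = score > best_score
            best_score = np.where(better, score, best_score)
            z0_hat = np.where(better, z0, z0_hat)
            z1_hat = np.where(better, z1, z1_hat)
    return z0_hat, z1_hat
```

Setting `lam=0` reproduces the two independent sign rules (ITL), and a very large `lam` forces agreement, reproducing the pooled rule \(\operatorname{sign}(X^{0}+X^{1})\) (DP), so the single knob interpolates between the two baselines.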