Block clustering with collapsed latent block models
We introduce a Bayesian extension of the latent block model for model-based block clustering of data matrices. Our approach considers a block model in which the block parameters can be integrated out. The result is a posterior defined over the number of row and column clusters and the cluster memberships. The number of row and column clusters need not be known in advance, as these are sampled along with the cluster memberships using Markov chain Monte Carlo. This differs from existing work on latent block models, where the number of clusters is assumed known or is chosen using an information criterion. We analyze both simulated and real data to validate the technique.
💡 Research Summary
This paper presents a Bayesian extension of the latent block model (LBM) for block clustering of data matrices, addressing a key limitation of existing LBM approaches: the need to pre‑specify or externally select the numbers of row and column clusters (K and G). The authors introduce a “collapsed” Bayesian LBM in which all block‑specific parameters (θₖg) and the cluster proportion parameters (ω for rows, ρ for columns) are analytically integrated out using conjugate priors. As a result, the posterior distribution depends only on K, G, and the latent allocation vectors z (row memberships) and w (column memberships). Priors for K and G are taken as truncated Poisson(1) distributions (with upper bounds K_max and G_max), which naturally penalise overly complex models while allowing the data to drive the number of clusters.
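The truncated Poisson(1) prior described above is straightforward to reproduce. The following is a minimal sketch, not the paper's code; the function name is illustrative, and we assume the support is {1, …, K_max}, with the Poisson(1) mass renormalised over that truncated range:

```python
import math

def log_trunc_poisson1(k, k_max):
    """Log prior on the number of clusters: Poisson(lambda=1) truncated
    to {1, ..., k_max}.  Unnormalised mass is 1/k!, so larger k is
    penalised, matching the complexity penalty described in the paper.
    Illustrative sketch; the exact support used in the paper may differ."""
    if not 1 <= k <= k_max:
        return float("-inf")
    log_unnorm = -math.lgamma(k + 1)  # log(1/k!)
    # renormalise over the truncated support {1, ..., k_max}
    norm = sum(1.0 / math.factorial(j) for j in range(1, k_max + 1))
    return log_unnorm - math.log(norm)
```

Because the unnormalised mass 1/k! decays rapidly, the prior probability halves from K = 1 to K = 2 and shrinks faster thereafter, which is what lets the data, rather than the prior, argue for extra clusters.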
Two data models are detailed: a Bernoulli model for binary matrices with a Beta prior on θₖg, and a Gaussian model for continuous matrices with a Normal–Inverse‑Gamma prior on (μₖg, σ²ₖg). For each block the integrated likelihood Mₖg can be computed in closed form, yielding the collapsed posterior (equation 4). This eliminates the need for reversible‑jump MCMC; instead a standard Metropolis‑within‑Gibbs sampler explores the posterior.
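For the Bernoulli model, the integrated block likelihood has the standard Beta–Bernoulli closed form, log Mₖg = log B(a + s, b + n − s) − log B(a, b), for a block containing s ones among n entries under a Beta(a, b) prior on θₖg. A minimal sketch (function names are ours, not the paper's):

```python
import math

def log_beta_fn(x, y):
    """log of the Beta function B(x, y) via log-gamma."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def log_block_marginal(s, n, a=1.0, b=1.0):
    """Closed-form log integrated likelihood of one Bernoulli block:
    s ones among n entries, theta_kg ~ Beta(a, b) integrated out.
    Illustrative sketch of the quantity M_kg described in the text."""
    return log_beta_fn(a + s, b + n - s) - log_beta_fn(a, b)
```

Since every block marginal is a ratio of Beta functions, the full collapsed posterior is a product of such terms over the K × G blocks, which is why no reversible-jump machinery is needed to evaluate it.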
The sampler consists of four move types: (1) a Gibbs update of a single row or column allocation; (2) a “multiple‑row/column” move inspired by Nobile & Fearnside (2007) that reallocates several rows (or columns) between two selected clusters to improve mixing; (3) a birth move that proposes adding a new cluster; and (4) a death move that removes an empty cluster. Acceptance probabilities are derived from the collapsed posterior, so the algorithm remains computationally tractable. Label‑switching is handled post‑hoc by matching sampled labelings across iterations (e.g., via minimum‑cost permutation).
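Move type (1), a collapsed Gibbs update of a single row allocation, can be sketched for the Bernoulli model as follows. This is an illustrative reconstruction, not the authors' code: the function names, the symmetric Dirichlet weight `alpha` for the collapsed row proportions, and the Beta(a, b) hyper-parameters are all our assumptions.

```python
import math
import numpy as np

def log_beta_fn(x, y):
    """log of the Beta function B(x, y) via log-gamma."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def gibbs_update_row(X, z, w, i, K, G, a=1.0, b=1.0, alpha=1.0, rng=None):
    """One collapsed Gibbs update of row i's cluster label under the
    Bernoulli model with Beta(a, b) block priors, holding the column
    labels w fixed.  Illustrative sketch of sampler move type (1)."""
    rng = rng or np.random.default_rng()
    n_g = np.array([(w == g).sum() for g in range(G)])       # column-cluster sizes
    s_ig = np.array([X[i, w == g].sum() for g in range(G)])  # row i's per-block sums
    z[i] = -1  # temporarily remove row i from its cluster
    logp = np.zeros(K)
    for k in range(K):
        rows = (z == k)
        n_k = rows.sum()
        # collapsed symmetric-Dirichlet term for the row proportions
        logp[k] = math.log(n_k + alpha)
        for g in range(G):
            s_kg = X[np.ix_(rows, w == g)].sum() if n_k else 0  # block ones, minus row i
            N_kg = n_k * n_g[g]                                 # block size, minus row i
            # ratio of Beta-Bernoulli block marginals with vs. without row i
            logp[k] += (log_beta_fn(a + s_kg + s_ig[g],
                                    b + N_kg + n_g[g] - s_kg - s_ig[g])
                        - log_beta_fn(a + s_kg, b + N_kg - s_kg))
    p = np.exp(logp - logp.max())  # normalise in log space for stability
    z[i] = rng.choice(K, p=p / p.sum())
    return z[i]
```

Column allocations are updated symmetrically by swapping the roles of rows and columns; the birth and death moves in (3) and (4) reuse the same block-marginal ratios in their acceptance probabilities.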
Empirical evaluation covers three settings. In synthetic data with known 2×2 block structure, the collapsed Bayesian LBM outperforms the BEM2 EM algorithm (the state‑of‑the‑art variational EM for LBM) in recovering the true K and G and achieves higher Adjusted Rand Index scores. In a real‑world political dataset (U.S. congressional voting records), the method simultaneously discovers row clusters corresponding to party affiliation and column clusters reflecting policy dimensions, without pre‑specifying the number of clusters; the posterior over K and G quantifies model uncertainty, which is absent in maximum‑likelihood analyses. In a microarray case study, a Bernoulli version of the model identifies coherent gene‑condition biclusters, demonstrating applicability to high‑dimensional biological data.
Key contributions include: (i) a fully Bayesian treatment that integrates out nuisance parameters, yielding a collapsed posterior that is easy to sample; (ii) an MCMC scheme that avoids trans‑dimensional moves while still allowing K and G to vary, thus simplifying implementation and improving scalability; (iii) demonstration that the approach automatically balances model fit and complexity via the Poisson prior, reducing the need for ad‑hoc information criteria. Limitations noted are the dependence on the chosen upper bounds for K and G (which affect computational cost) and the sensitivity to prior hyper‑parameters, which the authors suggest exploring in future work. Potential extensions involve non‑conjugate priors, variational approximations for very large matrices, and incorporation of sparsity‑inducing priors for sparse count data.