Dirichlet process mixtures of block $g$ priors for model selection and prediction in linear models
This paper introduces Dirichlet process mixtures of block $g$ priors for model selection and prediction in linear models. These priors extend traditional mixtures of $g$ priors by allowing differential shrinkage for various (data-selected) blocks of parameters while fully accounting for the predictors' correlation structure, providing a bridge between the literatures on model selection and continuous shrinkage priors. We show that Dirichlet process mixtures of block $g$ priors are consistent in various senses and, in particular, that they avoid the conditional Lindley "paradox" highlighted by Som et al. (2016). Further, we develop a Markov chain Monte Carlo algorithm for posterior inference that requires only minimal ad hoc tuning. Finally, we investigate the empirical performance of the prior on various real and simulated datasets. In the presence of a small number of very large effects, Dirichlet process mixtures of block $g$ priors lead to higher power for detecting smaller but significant effects with only a minimal increase in the number of false discoveries.
💡 Research Summary
The paper proposes a novel Bayesian prior, Dirichlet-process (DP) mixtures of block-$g$ priors, for simultaneous variable selection and prediction in Gaussian linear models. Traditional $g$-priors and their mixtures employ a single global shrinkage parameter $g$ for all regression coefficients. While this yields attractive theoretical properties such as model-selection consistency, it suffers from the conditional Lindley paradox: when a subset of coefficients grows without bound, Bayes factors can erroneously favor a smaller nested model, effectively over-shrinking the remaining coefficients toward zero. Som (2014) and Som et al. (2016) showed that block-$g$ priors, which assign separate $g$'s to pre-specified groups of coefficients, can resolve the paradox, but the need to pre-define groups limits practical applicability, especially in the presence of strong collinearity.
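For reference, Zellner's $g$-prior places a single global scale on all coefficients, while a block-$g$ prior (shown schematically here; Som et al.'s construction differs in its details) assigns a separate scale to each pre-specified block:

$$\beta \mid g, \sigma^2 \sim \mathrm{N}_p\!\left(0,\; g\,\sigma^2\,(X^\top X)^{-1}\right), \qquad \beta_k \mid g_k, \sigma^2 \sim \mathrm{N}\!\left(0,\; g_k\,\sigma^2\,(X_k^\top X_k)^{-1}\right),\quad k = 1,\dots,K,$$

with a hyper-prior on $g$ (or on each $g_k$) yielding the corresponding mixture of $g$-priors.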
To overcome these limitations, the authors endow each coefficient $\beta_j$ with its own local shrinkage factor $g_j$ and place a non-parametric DP prior on the collection $\{g_j\}$. The base measure $H_0$ is a flexible beta-type distribution (equation 3.3) that includes the hyper-$g/n$ prior, the half-Cauchy (horseshoe) prior, and many other tail behaviours as special cases. The concentration parameter $\alpha$ governs the random partition of coefficients: $\alpha \to 0$ collapses the model to a single-$g$ prior, while $\alpha \to \infty$ yields a fully local structure in which each coefficient has its own $g_j$. By treating both the partition $\rho$ and $\alpha$ as unknown, the model learns the appropriate grouping of coefficients directly from the data, thereby automatically addressing collinearity and differential shrinkage.
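Because a draw $G$ from a DP is almost surely discrete, independent draws $g_j \sim G$ exhibit ties, and those ties are exactly what defines the random partition $\rho$ of coefficients into blocks. Schematically:

$$g_j \mid G \overset{\text{iid}}{\sim} G, \qquad G \sim \mathrm{DP}(\alpha, H_0), \qquad j = 1, \dots, p.$$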
Theoretical contributions are threefold. First, the DP-mix block-$g$ prior avoids the conditional Lindley paradox; the authors prove that Bayes factors retain a positive lower bound even when some coefficients diverge, unlike standard mixtures of $g$-priors. Second, the prior satisfies all desiderata listed by Bayarri et al. (2012), including model-selection consistency, information consistency, invariance, and predictive matching. Third, the authors establish posterior consistency for the random partition and for the shrinkage distribution, showing that the DP learns the true clustering structure asymptotically.
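Stated schematically (not in the paper's exact notation): suppose the true model $M_{12}$ contains a block $\beta_1$ of very large effects and a block $\beta_2$ of moderate ones. The conditional Lindley paradox is that, under a single global $g$, letting $\lVert\beta_1\rVert \to \infty$ can drive the Bayes factor $\mathrm{BF}(M_{12} : M_1)$ toward values favoring the nested model $M_1$, so the moderate effects in $\beta_2$ are shrunk away; the result here guarantees instead that $\mathrm{BF}(M_{12} : M_1)$ stays bounded away from zero in that limit.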
From a computational standpoint, the paper develops a Gibbs sampler that cycles through: (i) updating the regression coefficients given the current $g_j$'s; (ii) sampling the local shrinkage factors from their full conditionals (either directly when conjugate or via Metropolis–Hastings); (iii) updating the cluster assignments $\xi_j$ using the Chinese restaurant process representation; and (iv) sampling $\alpha$ using the parameter-invariant prior of Rodríguez (2013). The algorithm requires essentially no tuning of hyper-parameters, scales well with $p$, and can be implemented with standard linear-algebra routines.
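To convey the structure of steps (i)–(iii), here is a minimal, heavily simplified Python sketch of a Gibbs sampler in this spirit; it is not the authors' algorithm. It assumes a known error variance $\sigma^2$, fixes $\alpha$ instead of sampling it (step (iv) is omitted), replaces the block-$g$ covariance structure with an independent prior $\beta_j \mid g \sim \mathrm{N}(0, g\sigma^2)$, and uses a hypothetical inverse-gamma stand-in for the base measure $H_0$. The cluster-assignment step follows Neal's (2000) Algorithm 8 with one auxiliary component.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- toy data (a stand-in; not from the paper) ---
n, p = 200, 8
X = rng.standard_normal((n, p))
beta_true = np.array([5.0, 5.0, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.standard_normal(n)

sigma2 = 1.0        # error variance, treated as known for brevity
alpha = 1.0         # DP concentration, fixed here (the paper samples it)
a0, b0 = 2.0, 2.0   # hypothetical inverse-gamma stand-in for H0

def draw_g():
    """Draw a shrinkage factor from the stand-in base measure H0."""
    return 1.0 / rng.gamma(a0, 1.0 / b0)

def log_h0(g):
    """Unnormalized log density of the stand-in base measure."""
    return -(a0 + 1.0) * np.log(g) - b0 / g

def log_lik(bj, g):
    """log N(bj; 0, g * sigma2): the simplified coefficient-level prior."""
    v = g * sigma2
    return -0.5 * (np.log(2.0 * np.pi * v) + bj * bj / v)

xi = np.zeros(p, dtype=int)   # cluster assignment of each coefficient
g_vals = [draw_g()]           # shrinkage factor of each cluster

for it in range(2000):
    # (i) beta | g: conjugate multivariate-normal update
    prior_prec = np.diag(1.0 / (sigma2 * np.array([g_vals[k] for k in xi])))
    post_prec = X.T @ X / sigma2 + prior_prec
    post_cov = np.linalg.inv(post_prec)
    beta = rng.multivariate_normal(post_cov @ X.T @ y / sigma2, post_cov)

    # (iii) cluster assignments via the CRP (Neal's Algorithm 8, m = 1)
    for j in range(p):
        counts = np.bincount(np.delete(xi, j), minlength=len(g_vals))
        # a singleton keeps its current g as the auxiliary candidate
        g_aux = g_vals[xi[j]] if counts[xi[j]] == 0 else draw_g()
        cand = g_vals + [g_aux]
        logw = np.full(len(cand), -np.inf)
        for k in range(len(g_vals)):
            if counts[k] > 0:
                logw[k] = np.log(counts[k]) + log_lik(beta[j], cand[k])
        logw[-1] = np.log(alpha) + log_lik(beta[j], g_aux)
        w = np.exp(logw - logw.max())
        k_new = rng.choice(len(cand), p=w / w.sum())
        if k_new == len(g_vals):
            g_vals.append(g_aux)
        xi[j] = k_new

    # drop empty clusters and relabel
    occupied = sorted(set(xi))
    relabel = {old: new for new, old in enumerate(occupied)}
    xi = np.array([relabel[k] for k in xi])
    g_vals = [g_vals[k] for k in occupied]

    # (ii) g | beta within each occupied cluster: multiplicative-walk MH
    for k in set(xi):
        members = beta[xi == k]
        g_cur = g_vals[k]
        g_prop = g_cur * np.exp(0.3 * rng.standard_normal())
        log_r = (log_h0(g_prop) - log_h0(g_cur)
                 + np.log(g_prop) - np.log(g_cur)   # Jacobian of log-scale walk
                 + sum(log_lik(b, g_prop) - log_lik(b, g_cur) for b in members))
        if np.log(rng.uniform()) < log_r:
            g_vals[k] = g_prop
```

The key qualitative behavior survives the simplifications: coefficients of similar magnitude tend to be merged into a shared-shrinkage cluster by the CRP step, while a few very large coefficients claim their own cluster with a large $g$, which is the mechanism behind the differential shrinkage described above.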
Empirical evaluation includes (a) simulations where a few large effects coexist with many modest but non-zero effects, (b) scenarios with strong multicollinearity among predictors, and (c) two real-world data sets (a genomics expression study and an economic forecasting task). Competing methods comprise hyper-$g$, mixtures of $g$-priors, the horseshoe, the Bayesian lasso, the elastic net, and spike-and-slab priors. Across all settings, the DP-mix block-$g$ prior achieves higher true-positive rates for small effects while keeping the false-discovery rate comparable to or lower than competitors'. Predictive performance measured by RMSE and MAE is consistently superior, especially in the high-collinearity simulations, where the data-driven block formation yields more stable model probabilities.
In summary, the authors present a unifying framework that bridges the variable‑selection literature (which emphasizes handling collinearity) and the continuous‑shrinkage literature (which emphasizes differential shrinkage). By leveraging a Dirichlet‑process prior on the shrinkage factors, the method automatically discovers an appropriate grouping of coefficients, avoids the conditional Lindley paradox, satisfies a comprehensive set of theoretical criteria, and delivers strong empirical performance with minimal tuning. The paper opens avenues for extensions to generalized linear models, non‑Gaussian errors, and variational approximations for massive data.