A Flexible Empirical Bayes Approach to Generalized Linear Models, with Applications to Sparse Logistic Regression
We introduce a flexible empirical Bayes approach for fitting Bayesian generalized linear models. Specifically, we adopt a novel mean-field variational inference (VI) method in which the prior is estimated within the VI algorithm, making the method tuning-free. Unlike traditional VI methods that optimize the posterior density function, our approach directly optimizes the posterior mean and prior parameters. This formulation reduces the number of parameters to optimize and enables the use of scalable algorithms such as L-BFGS and stochastic gradient descent. Furthermore, our method automatically determines the optimal posterior based on the prior and likelihood, distinguishing it from existing VI methods that often assume a Gaussian variational family. Our approach represents a unified framework applicable to a wide range of exponential family distributions, removing the need to develop unique VI methods for each combination of likelihood and prior distributions. We apply the framework to sparse logistic regression and demonstrate in extensive numerical studies that our method outperforms prevalent sparse logistic regression approaches in predictive performance.
💡 Research Summary
The paper introduces a unified empirical Bayes framework for generalized linear models (GLMs) that integrates mean‑field variational inference (MF‑VI) with automatic prior estimation, called EBGLM. Traditional Bayesian GLM approaches either require pre‑specified priors and costly cross‑validation for hyper‑parameters, or rely on variational methods that are limited to specific prior‑likelihood combinations. EBGLM overcomes these limitations by reformulating the variational evidence lower bound (ELBO) into a real‑valued objective function h(θ, g) that depends only on the posterior mean vector θ and the prior parameters g.
Key technical steps:
- The log‑likelihood l(η) is expanded to second order around the variational mean predictor η̄ = x^T β̄. This yields an approximate ELBO where the only remaining intractable terms are per‑coordinate KL penalties r_j(θ, g).
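For the logistic (Bernoulli) likelihood, the expansion above can be written out explicitly: with l(η) = yη − log(1 + e^η), the gradient is y − σ(η) and the Hessian is −σ(η)(1 − σ(η)). A minimal sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def logistic_loglik(y, eta):
    # Bernoulli log-likelihood: l(eta) = y*eta - log(1 + e^eta)
    return y * eta - np.log1p(np.exp(eta))

def loglik_taylor2(y, eta, eta_bar):
    # Second-order Taylor expansion of l(eta) around the
    # variational mean predictor eta_bar
    p = sigmoid(eta_bar)
    grad = y - p              # l'(eta_bar)
    hess = -p * (1.0 - p)     # l''(eta_bar)
    d = eta - eta_bar
    return logistic_loglik(y, eta_bar) + grad * d + 0.5 * hess * d**2
```

Near the expansion point the quadratic surrogate tracks the exact log-likelihood closely, which is what makes the remaining ELBO terms tractable per coordinate.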
- Each r_j is expressed as a penalty derived from a univariate Bayesian normal‑means (BNM) problem: a Gaussian observation z with variance s² combined with the prior g. Theorem 2.2 shows that the optimal variational factor q_j is the convolution of g with a Gaussian, and that r_j can be computed using the BNM marginal log‑likelihood and its derivative. Consequently, the whole variational problem reduces to minimizing
  h(θ, g) = −∑_i l(x_i^T θ) + ∑_j r_j(θ, g).
- This formulation yields a penalized‑likelihood problem where the penalty acts on the posterior mean rather than the MAP estimate, aligning the solution with the Bayes‑risk (squared‑error) optimal estimator.
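The structure of the reduced objective can be sketched directly. Below, the per-coordinate penalty is a placeholder: a Gaussian prior g = N(0, σ²) gives a ridge-like quadratic term, whereas the paper's r_j is derived from the BNM marginal log-likelihood. The quadratic form here is an assumption for illustration only:

```python
import numpy as np

def h_objective(theta, X, y, penalty, prior_params):
    # h(theta, g) = -sum_i l(x_i^T theta) + sum_j r_j(theta, g)
    eta = X @ theta
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))  # logistic l(eta)
    pen = sum(penalty(t, prior_params) for t in theta)
    return -loglik + pen

def gauss_penalty(t, params):
    # Placeholder stand-in for r_j under a Gaussian prior g = N(0, sigma2);
    # NOT the paper's BNM-derived penalty
    return 0.5 * t**2 / params["sigma2"]
```

Swapping `gauss_penalty` for a penalty computed from the BNM marginal likelihood of a sparsity-inducing prior recovers the sparse-regression setting discussed next.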
The authors instantiate the framework for sparse logistic regression using three widely used sparsity‑inducing priors: (i) point‑normal (spike‑and‑slab with a normal slab), (ii) point‑Laplace, and (iii) adaptive shrinkage (ash) mixtures of normals. All prior hyper‑parameters (e.g., mixing weight π₀, slab variance σ²) are estimated jointly with θ by optimizing h, eliminating any need for external tuning.
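For the point-normal (spike-and-slab) prior, the associated BNM problem has a closed-form posterior mean: the observation is shrunk toward zero by both the slab's ridge factor and the posterior probability of the slab. A sketch of this standard shrinkage operator (the function name is illustrative; the paper's exact parameterization may differ):

```python
import numpy as np

def norm_pdf(z, var):
    return np.exp(-0.5 * z**2 / var) / np.sqrt(2 * np.pi * var)

def point_normal_posterior_mean(z, s2, pi0, sigma2):
    # BNM problem: z ~ N(theta, s2),
    # prior: theta ~ pi0 * delta_0 + (1 - pi0) * N(0, sigma2)
    m_spike = norm_pdf(z, s2)           # marginal density under the spike
    m_slab = norm_pdf(z, s2 + sigma2)   # marginal density under the slab
    w = (1 - pi0) * m_slab / (pi0 * m_spike + (1 - pi0) * m_slab)
    # Posterior slab probability times the usual ridge shrinkage factor
    return w * (sigma2 / (sigma2 + s2)) * z
```

Because the posterior mean, not the MAP, is returned, the estimate is shrunk toward zero but never forced exactly to zero by the point mass, which is the distinction the paper draws with MAP-based sparse methods.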
Because h(θ,g) is smooth and its gradient can be written in closed form, any scalable optimizer—L‑BFGS, limited‑memory BFGS, stochastic gradient descent, or even second‑order natural‑gradient methods—can be applied directly. This contrasts with many black‑box VI approaches that require Monte‑Carlo gradient estimators and variance‑reduction tricks.
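Since the objective and its gradient are available in closed form, off-the-shelf optimizers apply directly. A minimal sketch using SciPy's L-BFGS-B, with a quadratic penalty standing in for the BNM-derived r_j (a simplified illustration, not the paper's implementation):

```python
import numpy as np
from scipy.optimize import minimize

def fit_lbfgs(X, y, sigma2=1.0):
    # Minimize a smooth stand-in for h(theta, g): negative logistic
    # log-likelihood plus a quadratic penalty (the paper's r_j comes
    # from the BNM marginal likelihood instead).
    def obj(theta):
        eta = X @ theta
        nll = -np.sum(y * eta - np.log1p(np.exp(eta)))
        return nll + 0.5 * theta @ theta / sigma2

    def grad(theta):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        return -X.T @ (y - p) + theta / sigma2

    res = minimize(obj, np.zeros(X.shape[1]), jac=grad, method="L-BFGS-B")
    return res.x
```

With an exact gradient, no Monte-Carlo gradient estimation or variance reduction is needed, which is the contrast drawn with black-box VI.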
Empirical evaluation includes synthetic high‑dimensional experiments and real‑world binary classification tasks (genomics, text spam detection). EBGLM consistently outperforms classic penalized methods (the lasso, elastic net, MCP, SCAD) in AUC, accuracy, and F1 score, while reducing computational time by a factor of 5–10 because no cross‑validation is needed. The method also scales well to tens of thousands of predictors, as only the posterior means and a few scalar prior parameters are stored.
The paper’s contributions are fourfold: (1) a novel variational objective that depends solely on posterior means, (2) a principled way to estimate prior hyper‑parameters within the same optimization, (3) a generalizable framework applicable to any exponential‑family GLM, and (4) practical algorithms that leverage existing large‑scale optimizers. By focusing on the posterior mean rather than the MAP, EBGLM provides a Bayes‑optimal estimator for sparse regression, avoids the pitfalls of point‑mass priors that force MAP solutions to zero, and delivers both statistical and computational advantages. The work opens avenues for extending empirical‑Bayes variational methods to other non‑conjugate models, hierarchical priors, and even deep probabilistic networks.