High-Dimensional Mediation Analysis for Generalized Linear Models Using Bayesian Variable Selection Guided by Mediator Correlation

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

High-dimensional mediation analysis aims to identify mediating pathways and to estimate indirect effects linking an exposure to an outcome. In this paper, we propose a Bayesian framework to address key challenges in these analyses, including high dimensionality, complex dependence among omics mediators, and non-continuous outcomes. Furthermore, commonly used approaches assume independent mediators or ignore correlations in the selection stage, which can reduce power when mediators are highly correlated. Addressing these challenges leads to a non-Gaussian likelihood and specialized selection priors, which in turn require efficient and adaptive posterior computation. Our proposed framework selects active pathways under generalized linear models while accounting for mediator dependence. Specifically, the mediators are modeled using a multivariate distribution, exposure-mediator selection is guided by a Markov random field prior on inclusion indicators, and mediator-outcome activation is restricted to mediators supported in the exposure-mediator model through a sequential subsetting Bernoulli prior. Simulation studies show improved operating characteristics in correlated-mediator settings, with appropriate error control under the global null and stable performance under model misspecification. We illustrate the method using real-world metabolomics data to study metabolites that mediate the association between adherence to the Alternate Mediterranean Diet score and two cardiometabolic outcomes.


💡 Research Summary

This paper introduces a Bayesian framework for high‑dimensional mediation analysis that simultaneously handles correlated omics mediators and non‑continuous outcomes. Traditional methods either assume independent mediators or treat variable selection and effect estimation as separate steps, which loses power when mediators are highly correlated and gives inadequate uncertainty quantification. The authors address these shortcomings in two ways. First, they model the q‑dimensional mediator vector M with a multivariate normal distribution whose residual covariance Σ is parameterized through a parsimonious factor‑analytic (FA) structure, which captures realistic correlation patterns while keeping the model computationally tractable. Second, they place a Markov random field (MRF) prior on the binary inclusion indicators γ for the exposure‑mediator paths (τ). The MRF prior links each γj to its neighbors via the empirical correlations rjl, with hyperparameters η1γ (baseline sparsity) and η2γ (correlation‑driven smoothness). When η2γ>0, highly correlated mediators tend to be selected together, which the authors report improves power without inflating the false discovery rate.
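To make the role of η1γ and η2γ concrete, here is a minimal sketch of an Ising-type MRF prior on the inclusion indicators. The exact functional form is an assumption, not taken from the paper: it uses the common parameterization log p(γ) = η1 Σ_j γ_j + η2 Σ_{j<l} |r_jl| γ_j γ_l + const, and the function names and the toy correlation matrix are hypothetical.

```python
import math

def mrf_log_prior(gamma, R, eta1, eta2):
    """Unnormalized log-prior of gamma under an ASSUMED Ising-type MRF form."""
    q = len(gamma)
    main = eta1 * sum(gamma)  # baseline sparsity term
    pair = 0.0
    for j in range(q):
        for l in range(j + 1, q):
            # Pairwise term: correlated mediators that are both included
            pair += abs(R[j][l]) * gamma[j] * gamma[l]
    return main + eta2 * pair

def mrf_conditional_inclusion(j, gamma, R, eta1, eta2):
    """P(gamma_j = 1 | all other indicators) implied by the form above."""
    s = sum(abs(R[j][l]) * gamma[l] for l in range(len(gamma)) if l != j)
    return 1.0 / (1.0 + math.exp(-(eta1 + eta2 * s)))

# Toy empirical correlation matrix (illustrative values only).
R = [[1.0, 0.8, 0.1],
     [0.8, 1.0, 0.0],
     [0.1, 0.0, 1.0]]
gamma = [0, 1, 0]  # mediator 1 is currently included

p_corr = mrf_conditional_inclusion(0, gamma, R, eta1=-2.0, eta2=2.0)
p_weak = mrf_conditional_inclusion(2, gamma, R, eta1=-2.0, eta2=2.0)
p_base = mrf_conditional_inclusion(0, gamma, R, eta1=-2.0, eta2=0.0)
print(p_corr, p_weak, p_base)
```

Under this assumed form, mediator 0 (strongly correlated with the included mediator 1) has a higher conditional inclusion probability than mediator 2, and setting η2 to zero reduces the prior to independent Bernoulli indicators.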

For the mediator‑outcome paths (δ), the authors introduce a sequential subsetting Bernoulli (SSB) prior that conditions the inclusion indicator ωj on γj: ωj can be 1 only if the corresponding exposure‑mediator link is active. This hierarchical construction eliminates unnecessary parameters, reduces computational burden, and ensures that only mediators with a confirmed exposure effect are examined for outcome effects. Both τ and δ receive spike‑and‑slab priors, while regression coefficients for covariates and intercepts are given multivariate normal priors.
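The subsetting constraint can be sketched in a few lines. This is an illustration under an assumed form, ω_j | γ_j ~ Bernoulli(π_ω · γ_j), so that ω_j is forced to 0 whenever γ_j = 0. The name `pi_omega` and both function names are hypothetical and not taken from the paper.

```python
import math
import random

def sample_omega(gamma, pi_omega, rng):
    """Draw omega given gamma; omega_j can be 1 only where gamma_j == 1."""
    return [1 if (g == 1 and rng.random() < pi_omega) else 0 for g in gamma]

def ssb_log_prior(omega, gamma, pi_omega):
    """Log-probability of omega given gamma under the assumed SSB form."""
    logp = 0.0
    for w, g in zip(omega, gamma):
        if g == 0:
            if w == 1:
                return -math.inf  # violates the subsetting constraint
            continue  # omega_j = 0 with probability 1, contributes log(1) = 0
        logp += math.log(pi_omega) if w == 1 else math.log(1.0 - pi_omega)
    return logp

rng = random.Random(0)
gamma = [1, 0, 1, 1, 0]
omega = sample_omega(gamma, pi_omega=0.5, rng=rng)

# A mediator is on an active pathway only if both indicators equal 1.
active = [j for j, (g, w) in enumerate(zip(gamma, omega)) if g == 1 and w == 1]
print(omega, active)
```

Only the positions with γ_j = 1 contribute free parameters to the mediator-outcome model, which is the source of the reduction in parameters described above.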

The outcome model is a generalized linear model (GLM) with a link function g(·) suited to the exponential‑family distribution of the outcome. The authors focus on binary outcomes using the logit link, and the framework is described as extending to probit, Poisson, or multinomial responses. Because the likelihood is non‑Gaussian, the closed‑form conditional updates that make Gibbs sampling convenient for Gaussian outcomes are not available. Instead, the posterior is explored using Hamiltonian Monte Carlo (HMC) with the No‑U‑Turn Sampler (NUTS), which efficiently traverses the high‑dimensional space and accommodates the complex MRF and SSB priors.
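As an illustration of the non-Gaussian likelihood, the following sketch evaluates a Bernoulli log-likelihood with a logit link in which the mediator-outcome coefficients δ are masked by the indicators ω. The linear predictor β0 + βx·x_i + Σ_j ω_j δ_j M_ij is an assumed simplification (covariates omitted), and the names `beta0` and `beta_x` are hypothetical.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def logit_outcome_loglik(y, x, M, beta0, beta_x, delta, omega):
    """Bernoulli log-likelihood under a logit link with masked mediator effects."""
    total = 0.0
    for y_i, x_i, m_i in zip(y, x, M):
        eta = beta0 + beta_x * x_i
        # Only mediators with omega_j = 1 enter the linear predictor.
        eta += sum(w * d * m for w, d, m in zip(omega, delta, m_i))
        total += y_i * eta - softplus(eta)
    return total

# Toy data: 3 subjects, 2 mediators (illustrative values only).
y = [1, 0, 1]
x = [0.5, -1.0, 0.2]
M = [[0.1, 1.0], [0.3, -0.5], [-0.2, 0.4]]

ll_null = logit_outcome_loglik(y, x, M, 0.0, 0.0, [0.0, 0.0], [1, 1])
ll_masked = logit_outcome_loglik(y, x, M, 0.0, 0.0, [5.0, -3.0], [0, 0])
print(ll_null, ll_masked)
```

The log-likelihood is smooth in the continuous coefficients, which is what a gradient-based sampler such as HMC relies on. How the discrete indicators are handled inside the sampler is not specified here and is left out of the sketch.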

Simulation studies explore a range of scenarios: varying degrees of mediator correlation (ρ = 0, 0.3, 0.5), signal strength, and model misspecification (e.g., insufficient factor rank). Results demonstrate that the proposed method maintains false discovery rate (FDR) control at the nominal level under the global null, while achieving substantially higher true positive rates (TPR) than methods assuming independent mediators. The approach also shows robustness to modest misspecification of the covariance structure and to different hyper‑parameter settings, provided η2γ is chosen within a stable “phase‑transition” region identified via diagnostic plots.
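The operating characteristics mentioned above can be computed per simulation replicate as follows. This is a generic sketch of the false discovery proportion and TPR for a selected set against a known truth, not the authors' evaluation code. The convention of reporting a proportion of 0 when nothing is selected, and NaN for TPR under the global null, is an assumption.

```python
import math

def selection_metrics(selected, truth):
    """Return (false discovery proportion, true positive rate) for one replicate."""
    tp = sum(1 for s, t in zip(selected, truth) if s and t)
    fp = sum(1 for s, t in zip(selected, truth) if s and not t)
    n_selected = tp + fp
    n_true = sum(1 for t in truth if t)
    fdp = fp / n_selected if n_selected > 0 else 0.0  # no selections, no false discoveries
    tpr = tp / n_true if n_true > 0 else math.nan     # undefined under the global null
    return fdp, tpr

def mean_fdp(replicates):
    """Average the false discovery proportion over replicates to estimate FDR."""
    values = [selection_metrics(sel, tru)[0] for sel, tru in replicates]
    return sum(values) / len(values)

# One replicate with two true mediators, one found and one false positive.
fdp, tpr = selection_metrics([1, 1, 0, 0], [1, 0, 1, 0])

# Two replicates under the global null: one clean, one with a single false selection.
null_fdr = mean_fdp([([0, 0, 0], [0, 0, 0]), ([0, 1, 0], [0, 0, 0])])
print(fdp, tpr, null_fdr)
```

Under the global null any selection is a false discovery, so each replicate's proportion is either 0 or 1 and the average is the share of replicates with at least one selection.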

The methodology is applied to metabolomics data from the Health Professionals Follow‑up Study (HPFS) and Nurses’ Health Study II (NHSII). The exposure is adherence to the Alternate Mediterranean Diet score, two cardiometabolic outcomes are studied, and 298 plasma metabolites serve as candidate mediators. Pairwise correlations among metabolites are substantial (mean |r| ≈ 0.16, with >5 % exceeding 0.5), which motivates correlation‑aware selection. The proposed model identifies a concise set of metabolites whose exposure‑mediator and mediator‑outcome paths are both active, pointing to biologically plausible mediation pathways that independent‑mediator approaches may miss.
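Summary statistics such as the mean |r| and the share of pairs above 0.5 can be computed from a mediator matrix as sketched below. The data here are toy values, not the HPFS or NHSII metabolite data, and the function names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_summary(columns, threshold=0.5):
    """Mean absolute pairwise correlation and share of pairs with |r| > threshold."""
    q = len(columns)
    abs_r = [abs(pearson(columns[j], columns[l]))
             for j in range(q) for l in range(j + 1, q)]
    mean_abs = sum(abs_r) / len(abs_r)
    share_above = sum(1 for r in abs_r if r > threshold) / len(abs_r)
    return mean_abs, share_above

# Three toy mediators measured on four subjects (one list per mediator).
columns = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with the first
    [1.0, -1.0, -1.0, 1.0],  # uncorrelated with both
]
mean_abs_r, share_above = correlation_summary(columns)
print(mean_abs_r, share_above)
```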

In summary, the paper contributes four key innovations: (i) a factor‑analytic representation of mediator covariance; (ii) an MRF prior that leverages empirical mediator correlations for joint variable selection; (iii) a sequential subsetting prior that enforces logical consistency between exposure‑mediator and mediator‑outcome links while improving computational efficiency; and (iv) an HMC‑based posterior algorithm that extends Bayesian high‑dimensional mediation analysis to generalized linear outcomes. The work offers a practical, statistically rigorous solution for contemporary omics studies where both high dimensionality and complex outcome types are the norm, and it sets the stage for future extensions to survival, longitudinal, and multi‑outcome mediation frameworks.

