Bayesian Kernel Machine Regression via Random Fourier Features for Estimating Joint Health Effects of Multiple Exposures

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Environmental epidemiology has traditionally examined single exposures one at a time. Advances in exposure assessment and statistical methods now enable studies of multiple exposures and their combined health impacts. Bayesian Kernel Machine Regression (BKMR) is a widely used approach for flexibly estimating the joint, nonlinear effects of multiple exposures. However, BKMR is computationally intensive for large datasets: the repeated kernel inversion required in Markov chain Monte Carlo (MCMC) can be time-consuming and is often infeasible in practice. To address this issue, we propose using supervised random Fourier basis functions to replace the Gaussian process random effects. This reframes kernel machine regression as a linear mixed-effects model that facilitates computationally efficient estimation and prediction. Bayesian inference is conducted using MCMC with Hamiltonian Monte Carlo algorithms. Simulation studies demonstrate that our method yields results comparable to BKMR while significantly reducing computation time. Our approach outperforms BKMR when the exposure-response surface has stronger dependency, and it also outperforms BKMR fitted with the predictive process as an alternative approximation method. Finally, we apply this approach to over 270,000 birth records, examining associations between multiple ambient air pollutants and birthweight in Georgia.


💡 Research Summary

This paper addresses the computational bottleneck of Bayesian Kernel Machine Regression (BKMR), a popular method for estimating joint, nonlinear health effects of multiple environmental exposures. Traditional BKMR treats the exposure‑response surface as a realization of a Gaussian Process (GP) and requires repeated inversion of an n × n kernel matrix during Markov chain Monte Carlo (MCMC) sampling. As the sample size grows beyond a few thousand, the memory and time demands become prohibitive, limiting BKMR’s use in large administrative health databases.
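A compact statement of the standard BKMR model helps locate the bottleneck. The display below assumes the usual BKMR formulation with confounders x_i and the symbols used in this summary (γ, θ_m, σ², τ²), with θ_m entering the Gaussian kernel as inverse squared length-scales; the summary itself does not write the model out.

```latex
\begin{aligned}
y_i &= h(\mathbf{z}_i) + \mathbf{x}_i^\top\gamma + \varepsilon_i,
  \qquad \varepsilon_i \sim \mathcal{N}(0,\sigma^2),\\
h &\sim \mathcal{GP}\bigl(0,\ \tau^2 K\bigr),
  \qquad K(\mathbf{z},\mathbf{z}') = \exp\Bigl(-\sum_{m=1}^{M}\theta_m\,(z_m - z'_m)^2\Bigr),\\
\mathbf{y} \mid \gamma,\theta,\sigma^2,\tau^2 &\sim
  \mathcal{N}\bigl(X\gamma,\ \tau^2\mathbf{K}_\theta + \sigma^2 I_n\bigr).
\end{aligned}
```

Because the n × n matrix K_θ changes whenever θ is updated, each evaluation of this marginal likelihood needs a fresh Cholesky factorization: O(n³) time and O(n²) memory. For n = 270,000 births, the matrix alone has 270,000² ≈ 7.3 × 10¹⁰ entries, roughly 583 GB in double precision.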

The authors propose “Fast BKMR,” which replaces the GP random effects with supervised Random Fourier Features (RFF). By invoking Bochner’s theorem, any stationary positive‑definite kernel (here the Gaussian kernel) can be expressed as the Fourier transform of its spectral density. Sampling a finite set of frequencies ω₁,…,ω_J from the kernel’s spectral density (a multivariate normal with covariance Σ determined by the kernel length‑scale parameters θ₁,…,θ_M) yields basis functions cos(ω_jᵀx) and sin(ω_jᵀx). The GP is then approximated as a linear combination of these basis functions with random coefficients a_j and b_j, each assigned a normal prior. This reformulation converts the original infinite‑dimensional GP into a finite‑dimensional linear mixed‑effects model, eliminating the need to compute or invert the full kernel matrix.
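The random-feature approximation can be checked numerically. Below is a minimal NumPy sketch. It assumes the BKMR-style parameterization K(z, z′) = exp(−Σ_m θ_m (z_m − z′_m)²), for which Bochner's theorem gives frequencies ω ~ N(0, diag(2θ_1, …, 2θ_M)). The function names are illustrative, and the frequencies are sampled once from the spectral density rather than learned, so this is plain (unsupervised) RFF. If the coefficients are given the assumed prior a_j, b_j ~ N(0, τ²/J), the covariance of the approximate h is (τ²/J) Σ_j cos(ω_jᵀ(z − z′)), which converges to τ² K(z, z′) as J grows. The code compares the unit-variance version ΦΦᵀ/J against K.

```python
import numpy as np

def gaussian_kernel(Z1, Z2, theta):
    """Exact kernel K(z, z') = exp(-sum_m theta_m * (z_m - z'_m)**2)."""
    diff = Z1[:, None, :] - Z2[None, :, :]
    return np.exp(-np.sum(theta * diff**2, axis=-1))

def sample_frequencies(J, theta, rng):
    """Draw J frequency vectors from the spectral density N(0, diag(2 * theta))."""
    return rng.normal(0.0, np.sqrt(2.0 * theta), size=(J, theta.size))

def rff_features(Z, omega):
    """Map an n x M exposure matrix to the n x 2J basis [cos(Z w^T), sin(Z w^T)]."""
    proj = Z @ omega.T
    return np.hstack([np.cos(proj), np.sin(proj)])

rng = np.random.default_rng(0)
theta = np.array([0.5, 1.0, 2.0])            # assumed kernel parameters
Z = rng.normal(size=(50, 3))                 # 50 subjects, 3 exposures
omega = sample_frequencies(5000, theta, rng)
Phi = rff_features(Z, omega)
K_hat = Phi @ Phi.T / omega.shape[0]         # Monte Carlo estimate of K
K = gaussian_kernel(Z, Z, theta)
print("max |K_hat - K| =", np.max(np.abs(K_hat - K)))
```

Once Φ is built, h = Φβ is an ordinary linear random effect with 2J columns, so no n × n matrix is ever formed or inverted.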

Bayesian inference proceeds via MCMC that combines Hamiltonian Monte Carlo (HMC) with Gibbs steps to sample the posterior of all model parameters: regression coefficients for confounders (γ), random‑effect coefficients (a_j, b_j), frequencies (ω_j), kernel hyper‑parameters (θ_m), and variance components (σ², τ²). θ_m and the variance components are updated with Gibbs steps, while (γ, a_j, b_j) and ω_j are updated in separate HMC blocks. Each HMC block uses a leapfrog integrator with adaptive step‑size tuning to keep the acceptance rate between 65% and 85%. The number of Fourier features J is selected using the Watanabe‑Akaike Information Criterion (WAIC), which balances model fit against complexity.
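A leapfrog HMC update for the coefficient block might look like the minimal sketch below. It makes several simplifying assumptions: a Gaussian likelihood with σ², τ² and the frequencies held fixed; a single N(0, τ² I) prior on the stacked coefficient vector; and a simple stochastic-approximation rule that nudges the log step size toward a 75% acceptance rate (the midpoint of the 65%–85% range) during warm-up. The paper's exact tuning rule is not described here, and `hmc_coefficients` is a hypothetical name. The number of leapfrog steps is jittered to avoid periodic trajectories.

```python
import numpy as np

def log_post_and_grad(beta, Phi, y, sigma2, tau2):
    """Log posterior (up to a constant) and gradient for
    y ~ N(Phi beta, sigma2 I) with prior beta ~ N(0, tau2 I)."""
    resid = y - Phi @ beta
    logp = -0.5 * resid @ resid / sigma2 - 0.5 * beta @ beta / tau2
    grad = Phi.T @ resid / sigma2 - beta / tau2
    return logp, grad

def hmc_coefficients(Phi, y, sigma2, tau2, n_warmup=500, n_samples=2000,
                     target_accept=0.75, seed=0):
    rng = np.random.default_rng(seed)
    p = Phi.shape[1]
    beta = np.zeros(p)
    logp, grad = log_post_and_grad(beta, Phi, y, sigma2, tau2)
    log_eps = np.log(0.01)                       # initial step size
    draws = []
    for it in range(n_warmup + n_samples):
        eps = np.exp(log_eps)
        n_steps = int(rng.integers(10, 31))      # jittered trajectory length
        mom0 = rng.normal(size=p)                # fresh momentum ~ N(0, I)
        b, g = beta.copy(), grad.copy()
        m = mom0 + 0.5 * eps * g                 # leapfrog: half momentum step
        for step in range(n_steps):
            b = b + eps * m                      # full position step
            lp, g = log_post_and_grad(b, Phi, y, sigma2, tau2)
            if step < n_steps - 1:
                m = m + eps * g                  # full momentum step
        m = m + 0.5 * eps * g                    # final half momentum step
        # Metropolis correction on the Hamiltonian (negative log joint density).
        log_ratio = (lp - 0.5 * m @ m) - (logp - 0.5 * mom0 @ mom0)
        if not np.isfinite(log_ratio):
            log_ratio = -np.inf                  # treat divergences as rejections
        accept_prob = float(np.exp(min(0.0, log_ratio)))
        if rng.uniform() < accept_prob:
            beta, logp, grad = b, lp, g
        if it < n_warmup:
            # Nudge the log step size toward the target acceptance rate.
            log_eps += 0.1 * (accept_prob - target_accept)
        else:
            draws.append(beta.copy())
    return np.array(draws), float(np.exp(log_eps))

rng = np.random.default_rng(1)
Phi = rng.normal(size=(100, 4))
y = Phi @ np.array([1.0, -0.5, 0.25, 0.0]) + rng.normal(size=100)
draws, eps = hmc_coefficients(Phi, y, sigma2=1.0, tau2=10.0)
print("posterior mean:", draws.mean(axis=0), "adapted step size:", eps)
```

In this conjugate toy case the exact posterior is available, which makes it easy to check the sampler. In the full model, the Gibbs updates for θ_m, σ², τ² and a separate HMC block for ω_j would alternate with this step.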

Simulation studies explore a range of scenarios: numbers of exposures M = 2, 5, 10; sample sizes N = 200, 500, 1 000, 5 000, 10 000; and two correlation regimes (strong vs. weak) for the true exposure‑response surface. The authors also examine kernel misspecification (using absolute‑value instead of squared‑distance kernels). For each setting, they compare Fast BKMR to standard BKMR and to BKMR with a predictive process (PP) low‑rank approximation. Results show that when the true surface is strongly correlated, Fast BKMR with modest J (20–200) achieves lower root‑mean‑square error (RMSE) than standard BKMR while reducing computation time by an order of magnitude. In weak‑correlation settings, increasing J improves accuracy, and Fast BKMR still outperforms the PP approach. Even under kernel misspecification, Fast BKMR's performance degrades less than that of the PP method.
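A toy version of this kind of comparison can show how J enters. The sketch below does not reproduce the paper's simulation design. The surface `h_true`, the noise level, θ, σ², τ² and the J values are all hypothetical choices. Frequencies are drawn once from the spectral density and held fixed (plain RFF, not the supervised variant), and the variance components are fixed rather than sampled. Under those assumptions, the posterior mean of h only requires solving a 2J × 2J linear system, which is where the saving over an n × n kernel solve comes from.

```python
import numpy as np

def rff_features(Z, omega):
    """n x M exposures -> n x 2J basis [cos(Z w^T), sin(Z w^T)]."""
    proj = Z @ omega.T
    return np.hstack([np.cos(proj), np.sin(proj)])

def rff_posterior_mean(Z, y, J, theta, sigma2, tau2, rng):
    """Posterior mean of h(Z) in y = Phi b + e, with b ~ N(0, (tau2 / J) I).

    Frequencies are drawn once from N(0, diag(2 * theta)) and held fixed,
    i.e. plain RFF rather than learned (supervised) frequencies.
    """
    omega = rng.normal(0.0, np.sqrt(2.0 * theta), size=(J, theta.size))
    Phi = rff_features(Z, omega)
    # 2J x 2J system: cost O(n J^2 + J^3) instead of an n x n solve.
    precision = Phi.T @ Phi / sigma2 + np.eye(2 * J) * (J / tau2)
    b_mean = np.linalg.solve(precision, Phi.T @ y / sigma2)
    return Phi @ b_mean

def toy_simulation(N=1000, M=3, J_values=(10, 50, 200), seed=42):
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(N, M))
    # Hypothetical smooth exposure-response surface with an interaction.
    h_true = np.sin(Z[:, 0]) + 0.5 * Z[:, 1] * Z[:, 2]
    y = h_true + rng.normal(scale=0.5, size=N)
    theta = np.full(M, 0.25)  # assumed kernel parameters, not estimated here
    rmse = {}
    for J in J_values:
        h_hat = rff_posterior_mean(Z, y, J, theta, sigma2=0.25, tau2=1.0, rng=rng)
        rmse[J] = float(np.sqrt(np.mean((h_hat - h_true) ** 2)))
    return rmse, float(np.std(h_true))

rmse, sd_h = toy_simulation()
for J, r in rmse.items():
    print(f"J={J:4d}  RMSE={r:.3f}  (sd of true h = {sd_h:.3f})")
```

Because the frequencies are random, RMSE at a given J varies with the seed. A fair comparison across J values would average over replicated datasets, as a simulation study does.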

The method is applied to a real‑world dataset of over 270 000 births in Georgia, examining the joint impact of several ambient air pollutants (PM₂.₅, NO₂, O₃, SO₂, CO) on birthweight. Fast BKMR successfully captures nonlinear joint effects, identifies exposure thresholds, and produces interpretable exposure‑response surfaces. The full analysis completes in a few hours, whereas a comparable BKMR analysis would require many tens of hours or be infeasible.
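In the BKMR literature, exposure-response surfaces are commonly summarized by univariate curves. One exposure is varied over a grid while the others are held at their medians, h is evaluated under every posterior draw, and the pointwise mean and 95% credible band are reported. The sketch below computes that summary for the random-feature representation. It assumes that posterior draws of the frequencies (S × J × M) and of the coefficients, stacked as (a_1, …, a_J, b_1, …, b_J) (S × 2J), are available. `univariate_response` and the fake draws in the usage lines are hypothetical and are not output from the paper's analysis. Intercept and confounder terms are excluded, so the curve is for h alone.

```python
import numpy as np

def univariate_response(omega_draws, coef_draws, Z, m, grid, probs=(0.025, 0.975)):
    """Posterior summary of h(z) as exposure m varies over `grid`,
    with all other exposures held at their sample medians.

    omega_draws: S x J x M frequency draws.
    coef_draws:  S x 2J coefficient draws, ordered (a_1..a_J, b_1..b_J).
    """
    S = omega_draws.shape[0]
    Zg = np.tile(np.median(Z, axis=0), (grid.size, 1))  # G x M, others at median
    Zg[:, m] = grid
    curves = np.empty((S, grid.size))
    for s in range(S):
        proj = Zg @ omega_draws[s].T                     # G x J
        Phi = np.hstack([np.cos(proj), np.sin(proj)])    # G x 2J
        curves[s] = Phi @ coef_draws[s]
    lower, upper = np.quantile(curves, probs, axis=0)
    return curves.mean(axis=0), lower, upper

# Usage with fake draws (placeholders for real MCMC output).
rng = np.random.default_rng(0)
S, J, M = 100, 20, 5
Z = rng.normal(size=(500, M))
omega_draws = rng.normal(size=(S, J, M))
coef_draws = rng.normal(scale=0.1, size=(S, 2 * J))
grid = np.linspace(np.quantile(Z[:, 0], 0.05), np.quantile(Z[:, 0], 0.95), 25)
mean, lo, hi = univariate_response(omega_draws, coef_draws, Z, 0, grid)
```

Because each draw carries its own frequencies, the band reflects uncertainty in both the basis and the coefficients.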

Key contributions of the paper include: (1) a supervised RFF construction that treats the frequencies as model parameters learned from the data, (2) a reformulation of BKMR as a computationally tractable linear mixed model, (3) integration of HMC for efficient posterior sampling, and (4) a data‑driven, WAIC‑based selection of the number of Fourier features. Limitations noted by the authors include the need for careful choice of J, potential sensitivity to HMC tuning, and the current restriction to Gaussian kernels. Future work is suggested on adaptive J selection, extensions to non‑Gaussian kernels (e.g., Matérn), hierarchical exposure structures, and GPU‑accelerated HMC for million‑scale datasets.
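WAIC-based selection of J can be sketched as follows, assuming a Gaussian likelihood. The pointwise log-likelihood matrix (S posterior draws × n observations) is built from draws of the fitted mean and σ². WAIC is then −2(lppd − p_WAIC), where lppd is the log pointwise predictive density and p_WAIC is the summed posterior variance of the pointwise log-likelihood. The function names are hypothetical. Whether the paper reports WAIC on this deviance scale or on the elpd scale is not stated here, but both scales rank candidate J values the same way.

```python
import numpy as np

def gaussian_pointwise_loglik(y, mu_draws, sigma2_draws):
    """S x n matrix of log N(y_i | mu_si, sigma2_s)."""
    sigma2 = np.asarray(sigma2_draws, dtype=float)[:, None]
    return -0.5 * np.log(2 * np.pi * sigma2) - (y[None, :] - mu_draws) ** 2 / (2 * sigma2)

def waic(loglik):
    """WAIC = -2 * (lppd - p_waic) from an S x n pointwise log-likelihood matrix."""
    mx = loglik.max(axis=0)
    # Log of the posterior-mean likelihood per observation, computed stably.
    lppd = np.sum(mx + np.log(np.mean(np.exp(loglik - mx), axis=0)))
    # Effective number of parameters: posterior variance of the log-likelihood.
    p_waic = np.sum(np.var(loglik, axis=0, ddof=1))
    return -2.0 * (lppd - p_waic)
```

To choose J, fit the model for each candidate value, compute `waic` from its posterior draws, and keep the J with the smallest value (on this deviance scale, lower is better).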

Overall, Fast BKMR demonstrates that random‑feature approximations can retain the flexibility of Bayesian kernel methods while making them viable for large‑scale environmental epidemiology studies, opening the door to more comprehensive assessments of complex exposure mixtures.

