The Effect of Alcohol intake on Brain White Matter Microstructural Integrity: A New Causal Inference Framework for Incomplete Phenomic Data
Although substance use, such as alcohol intake, is known to be associated with cognitive decline during aging, its direct influence on the central nervous system remains incompletely understood. In this study, we investigate the influence of alcohol intake frequency on reduction of brain white matter microstructural integrity in the fornix, a brain region considered a promising marker of age-related microstructural degeneration, using a large UK Biobank (UKB) cohort with extensive phenomic data reflecting a comprehensive lifestyle profile. Two major challenges arise: 1) potentially nonlinear confounding effects from phenomic variables and 2) a limited proportion of participants with complete phenomic data. To address these challenges, we develop a novel ensemble learning framework tailored for robust causal inference and introduce a data integration step to incorporate information from UKB participants with incomplete phenomic data, improving estimation efficiency. Our analysis reveals that daily alcohol intake may significantly reduce fractional anisotropy, a neuroimaging-derived measure of white matter structural integrity, in the fornix and increase systolic and diastolic blood pressure levels. Moreover, extensive numerical studies demonstrate the superiority of our method over competing approaches in terms of estimation bias, while outcome regression-based estimators may be preferred when minimizing mean squared error is prioritized.
💡 Research Summary
The paper investigates the causal impact of alcohol consumption frequency on the microstructural integrity of brain white matter, focusing on the fornix—a key tract implicated in memory and highly sensitive to age‑related degeneration. Using the UK Biobank (UKB) cohort, the authors confront two major statistical obstacles: (1) a high‑dimensional set of phenomic covariates that may exert complex, nonlinear confounding effects, and (2) a limited number of participants (3,435) with complete phenomic data, while a much larger auxiliary sample (21,874) lacks most of these covariates. Simply discarding the auxiliary data would waste information, whereas naïvely merging the two datasets could introduce bias due to unmeasured confounding.
To address these challenges, the authors develop a novel causal inference framework that combines (i) a robust ensemble learning approach for estimating the generalized propensity score (PS) and the conditional mean (CM) of the outcome, and (ii) an empirical-likelihood-based data-integration scheme that leverages the auxiliary sample without requiring its covariates. The ensemble, termed “robust causal machine learning (CML)”, incorporates multiple machine-learning algorithms (e.g., multinomial logistic regression, random forests, gradient boosting) for both PS and CM. Weights ω_i are obtained by solving a constrained optimization problem that forces the weighted average of every candidate PS and CM estimate to match its sample mean, i.e., ∑ω_i g_i = 0, where g_i stacks the deviations of subject i's candidate estimates from those sample means. This calibration guarantees consistency and √n-rate convergence as long as at least one PS model and one CM model are correctly specified, thereby removing the need for prior knowledge about the best algorithm.
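The calibration constraint ∑ω_i g_i = 0 can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an empirical-likelihood objective (maximize ∑ log ω_i subject to ∑ω_i = 1 and ∑ω_i g_i = 0), a binary exposure instead of the six intake levels, and simulated data. The names `calibration_weights`, `ps_hat`, and `cm_hat` are hypothetical, and the true propensity score stands in for a fitted PS model to keep the example short.

```python
import numpy as np


def calibration_weights(g, max_iter=100, tol=1e-10):
    """Empirical-likelihood weights w with sum(w) = 1 and sum_i w_i * g_i = 0.

    g : (n, k) array; row i holds the calibration functions g_i.
    The weights have the form w_i = 1 / (n * (1 + lam' g_i)), and lam is
    found by damped Newton steps on the concave dual sum_i log(1 + lam' g_i).
    """
    n, k = g.shape
    lam = np.zeros(k)  # Lagrange multipliers
    for _ in range(max_iter):
        denom = 1.0 + g @ lam
        grad = (g / denom[:, None]).sum(axis=0)
        if np.linalg.norm(grad) < tol:
            break
        # Negative Hessian of the dual objective (positive definite).
        neg_hess = (g / denom[:, None] ** 2).T @ g
        step = np.linalg.solve(neg_hess, grad)
        t = 1.0
        # Halve the step until every implied weight stays positive.
        while np.any(1.0 + g @ (lam + t * step) <= 1e-8):
            t *= 0.5
        lam = lam + t * step
    return 1.0 / (n * (1.0 + g @ lam))


# Simulated data (an assumption for illustration, not UK Biobank data).
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)                       # a single confounder
ps_true = 1.0 / (1.0 + np.exp(-0.5 * z))     # P(exposed | z)
a = rng.random(n) < ps_true                  # exposure indicator
y = 1.0 + 2.0 * z + rng.normal(size=n)       # outcome under exposure, true mean 1.0

# Candidate models: the true PS as a stand-in for a fitted PS model,
# and an ordinary least squares CM model fitted on the exposed group.
ps_hat = ps_true
design = np.column_stack([np.ones(n), z])
beta, *_ = np.linalg.lstsq(design[a], y[a], rcond=None)
cm_hat = design @ beta

# g_i: candidate estimates for exposed subjects minus full-sample means.
g = np.column_stack([ps_hat[a] - ps_hat.mean(), cm_hat[a] - cm_hat.mean()])
w = calibration_weights(g)

est = w @ y[a]        # calibrated estimate of the mean outcome under exposure
naive = y[a].mean()   # unweighted mean among the exposed, confounded by z
print(f"calibrated: {est:.3f}  naive: {naive:.3f}")
```

In this sketch, adding further candidate PS or CM models simply adds columns to `g`; the weights then balance all candidates at once, which is the mechanism that lets the estimator avoid choosing a single algorithm in advance.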
The auxiliary‑data integration treats the missing covariates as an “extreme missing‑data” problem. By constructing informative scores through empirical likelihood using only the observed outcomes Y and exposures X in the auxiliary set, the method injects additional information into the weight‑estimation step, improving efficiency while preserving unbiasedness under the assumption that the main and auxiliary samples share the same underlying distribution.
Theoretical results establish asymptotic normality of the CML estimator and demonstrate that the proposed integration reduces variance relative to using the main data alone. Extensive simulation studies confirm that the CML estimator exhibits the smallest bias among competing methods, whereas outcome‑regression‑based estimators achieve lower mean‑squared error when the true model is close to linear.
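The contrast between bias and mean squared error follows from the standard decomposition, written here for a generic estimator of a causal effect. The symbols τ and τ̂ are introduced for illustration only and are not taken from the paper.

```latex
\mathrm{MSE}(\hat{\tau})
  = \mathbb{E}\!\left[(\hat{\tau} - \tau)^2\right]
  = \left(\mathbb{E}[\hat{\tau}] - \tau\right)^2 + \mathrm{Var}(\hat{\tau})
```

An estimator with the smallest bias can therefore still have a larger MSE than a competitor whose variance is sufficiently lower, which is consistent with the simulation pattern reported for the CML and outcome-regression estimators.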
Applying the framework to the UKB data, the authors categorize alcohol intake into six levels (0 = none to 5 = daily). The causal effect estimates indicate that daily alcohol consumption reduces fornix fractional anisotropy (FA) by approximately 0.02–0.03 units and simultaneously raises systolic blood pressure by 3–5 mmHg and diastolic pressure by 2–4 mmHg, relative to non-drinkers. These findings suggest that alcohol adversely affects a specific white-matter tract and cardiovascular parameters, extending prior observational links between alcohol, cognitive decline, and overall brain aging.
In summary, the study contributes (1) a robust, machine‑learning‑driven causal inference tool capable of handling high‑dimensional, nonlinear confounding without a priori algorithm selection, and (2) a principled strategy for incorporating large auxiliary datasets with incomplete covariate information. The methodological advances enable more accurate estimation of causal effects in large observational cohorts, and the substantive results underscore the detrimental impact of regular alcohol consumption on brain microstructure and blood pressure, informing public‑health recommendations and future neuroimaging research.