Causal Inference in Biomedical Imaging via Functional Linear Structural Equation Models

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Understanding the causal effects of organ-specific features from medical imaging on clinical outcomes is essential for biomedical research and patient care. We propose a novel Functional Linear Structural Equation Model (FLSEM) to capture the relationships among clinical outcomes, functional imaging exposures, and scalar covariates like genetics, sex, and age. Traditional methods struggle with the infinite-dimensional nature of exposures and complex covariates. Our FLSEM overcomes these challenges by establishing identifiability conditions using scalar instrumental variables. We develop the Functional Group Support Detection and Root Finding (FGS-DAR) algorithm for efficient variable selection, supported by rigorous theoretical guarantees, including selection consistency and accurate parameter estimation. We further propose a statistic for testing the nullity of the functional coefficient and establish its null limit distribution. Our approach is validated through extensive simulations and applied to UK Biobank data, demonstrating robust performance in detecting causal relationships from medical imaging.


💡 Research Summary

This paper addresses the challenging problem of estimating causal effects of high‑dimensional functional imaging exposures on scalar clinical outcomes, a setting common in modern biomedical research but poorly served by existing methods. The authors introduce the Functional Linear Structural Equation Model (FLSEM), which consists of two linked equations: (i) an outcome model Y = Xᵀβ + ∫ Z(t)B(t)dt + ε, where X denotes a vector of scalar covariates (e.g., genetics, age, sex), Z(t) is a functional exposure (e.g., brain fMRI signal), and B(t) is the causal effect function of interest; and (ii) an exposure model Z(t) = ∑_{ℓ=1}^p f_ℓ(X_ℓ, t) + E(t), allowing each scalar covariate to influence the exposure through a possibly nonlinear function f_ℓ. Endogeneity arises because the error term ε may be correlated with the functional exposure error E(t).
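To make the two equations concrete, the following is a minimal simulation sketch of an FLSEM on a discretized grid. It is not the paper's simulation design: the linear form f_ℓ(x, t) = x·C_ℓ(t), the specific coefficient functions, the choice of four covariates, and the shared latent term U used to induce endogeneity are all assumptions made for illustration.

```python
import numpy as np

def simulate_flsem(n=500, grid_size=50, seed=0):
    """Draw (X, Z, Y) from a toy FLSEM on an equally spaced grid over [0, 1]."""
    rng = np.random.default_rng(seed)
    t = (np.arange(grid_size) + 0.5) / grid_size   # midpoints of the grid cells
    dt = 1.0 / grid_size

    # Four scalar covariates: instrument, confounder, precision variable, irrelevant.
    X = rng.normal(size=(n, 4))

    # Exposure model with linear effects f_l(x, t) = x * C_l(t) (an assumption).
    C = np.zeros((4, grid_size))
    C[0] = np.sin(np.pi * t)                # instrument: affects the exposure only
    C[1] = np.cos(np.pi * t)                # confounder: affects the exposure ...
    beta = np.array([0.0, 1.0, 1.0, 0.0])   # ... and the outcome; precision: outcome only

    # A shared latent term U makes eps and E(t) correlated (endogeneity).
    U = rng.normal(size=n)
    E = U[:, None] * np.ones(grid_size) + 0.5 * rng.normal(size=(n, grid_size))
    eps = U + 0.5 * rng.normal(size=n)

    Z = X @ C + E
    B = np.sin(2 * np.pi * t)                      # hypothetical causal effect function
    Y = X @ beta + (Z * B).sum(axis=1) * dt + eps  # Riemann sum for the integral
    return {"t": t, "X": X, "Z": Z, "Y": Y, "B": B, "E": E, "eps": eps}

data = simulate_flsem()
# Sample correlation between the outcome error and the exposure error at one grid point.
endogeneity = np.corrcoef(data["eps"], data["E"][:, 0])[0, 1]
```

A clearly nonzero `endogeneity` value is what makes a plain regression of Y on Z(t) unreliable in this toy setting and motivates the use of instruments.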

To resolve endogeneity, the authors partition the scalar covariates into four groups following Guo et al. (2018): (C) confounders affecting both outcome and exposure, (P) precision variables affecting only the outcome, (I) valid instrumental variables (IVs) affecting only the exposure, and (S) irrelevant variables. Identification hinges on two key conditions. First, the operator K defined by K B(s) = ∫ E{f(X,s)f(X,t)} B(t) dt must be injective; equivalently, its null space contains only the zero function. This requires that all eigenvalues λ_k of the kernel K(s,t)=E{f(X,s)f(X,t)} be strictly positive. The paper shows that common kernels (Ornstein‑Uhlenbeck, Brownian motion, polynomial and spline kernels) satisfy this condition. Second, in the linear exposure case Z(t)=XᵀC(t)+E(t), sufficient numbers and diversity of IVs shrink the null space of K, ensuring identifiability of B(t).
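The injectivity condition can be checked numerically on a grid. The sketch below discretizes two kernels: the Brownian-motion kernel K(s,t) = min(s,t), whose eigenvalues are all positive, and a kernel from a linear exposure model with a small number of instruments, which has finite rank and therefore a nontrivial null space. The coefficient functions C_ℓ(t) = sin(ℓπt) and the assumption of independent unit-variance instruments are hypothetical choices, not taken from the paper.

```python
import numpy as np

m = 200
t = (np.arange(m) + 0.5) / m          # midpoint grid on [0, 1]
dt = 1.0 / m

# Brownian-motion kernel K(s, t) = min(s, t), discretized as an integral operator.
K_bm = np.minimum.outer(t, t) * dt
eig_bm = np.linalg.eigvalsh(K_bm)     # eigenvalues in ascending order

def linear_exposure_kernel(num_ivs):
    """K(s, t) = sum_l C_l(s) C_l(t) for independent unit-variance instruments,
    with hypothetical coefficient functions C_l(t) = sin(l * pi * t)."""
    C = np.stack([np.sin((l + 1) * np.pi * t) for l in range(num_ivs)])
    return C.T @ C * dt

# The numerical rank equals the number of instruments: more (and more diverse)
# instruments leave a smaller null space for B(t) to hide in.
ranks = {q: int(np.linalg.matrix_rank(linear_exposure_kernel(q))) for q in (1, 3, 5)}
```

In the discretized Brownian-motion case the smallest eigenvalue stays positive and the largest is close to the known continuous value 4/π². In the linear case only the components of B(t) lying in the span of the C_ℓ are identified by the instruments.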

Variable selection is performed by the Functional Group Support Detection and Root Finding (FGS‑DAR) algorithm. FGS‑DAR first fits a functional‑on‑scalar regression of Z(t) on X within a reproducing kernel Hilbert space (RKHS) framework, applying an L₀ penalty to achieve exact sparsity without the shrinkage bias of L₁/L₂ penalties. This step simultaneously identifies the sets I (IVs) and P (precision covariates). Using the selected IVs, a predicted exposure Ẑ(t) is constructed, and the causal effect B(t) is estimated via a partial functional linear model that corrects for endogeneity.
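The support-detection idea can be illustrated with a simplified group version of the generic support detection and root finding (SDAR) iteration for L₀-constrained least squares. This is a sketch under strong assumptions, not the authors' FGS‑DAR: it treats each covariate's coefficient function on the grid as one group, assumes the number of active covariates `num_active` is known, uses a linear exposure model, and omits the RKHS smoothing entirely. The function name `group_sdar` and the simulated data are hypothetical.

```python
import numpy as np

def group_sdar(X, Z, num_active, max_iter=20):
    """SDAR-style iteration for Z = X C + E where each row C_l is one group.

    X: (n, p) scalar covariates with roughly unit-variance columns.
    Z: (n, m) functional exposure on a grid.
    Returns the selected covariate indices and the fitted coefficient rows."""
    n, p = X.shape
    C = np.zeros((p, Z.shape[1]))
    D = X.T @ Z / n                       # dual variable: scaled residual correlations
    active = np.array([], dtype=int)
    for _ in range(max_iter):
        # Support detection: keep the groups with the largest norm of C_l + D_l.
        scores = np.linalg.norm(C + D, axis=1)
        new_active = np.sort(np.argsort(scores)[-num_active:])
        if np.array_equal(new_active, active):
            break                         # the active set has stabilized
        active = new_active
        # Root finding: least squares on the active set, zeros elsewhere.
        C = np.zeros_like(C)
        C[active] = np.linalg.lstsq(X[:, active], Z, rcond=None)[0]
        D = X.T @ (Z - X @ C) / n
        D[active] = 0.0
    return active, C

rng = np.random.default_rng(1)
n, p, m = 1000, 20, 40
t = (np.arange(m) + 0.5) / m
X = rng.normal(size=(n, p))
C_true = np.zeros((p, m))                 # only the first three covariates matter
C_true[0] = np.sin(np.pi * t)
C_true[1] = np.cos(np.pi * t)
C_true[2] = 1.0 + t
Z = X @ C_true + rng.normal(size=(n, m))

selected, C_hat = group_sdar(X, Z, num_active=3)
```

Because the active rows are refit by unpenalized least squares, the selected coefficient functions carry no shrinkage, which is the property the summary contrasts with L₁/L₂ penalties.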

Theoretical contributions are extensive. The authors prove (1) selection consistency: with increasing sample size, FGS‑DAR recovers the true IV and precision sets with probability tending to one; (2) estimation rates: the L₂ error of B̂(t) and the Euclidean error of β̂ both converge at the parametric √n‑rate; (3) a null‑hypothesis test for B(t)=0, whose test statistic follows a standard normal (or χ²) asymptotic distribution; and (4) robustness of these results under high‑dimensional X and complex correlation structures.

Simulation studies vary the proportion of IVs, signal‑to‑noise ratios, and functional complexity. Across all scenarios, FGS‑DAR outperforms competing methods such as FPCA‑IV, GMM‑IV, and Tikhonov‑regularized IV estimators in terms of variable‑selection accuracy (higher F1 scores), lower bias, and higher power for testing B(t)=0.
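A toy experiment, far simpler than the paper's simulations, shows why correcting for endogeneity lowers bias. The sketch compares a naive regression on the observed exposure with a two-stage fit that first projects Z(t) onto the scalar covariates. None of the competing methods named above is implemented here; the two-function sine basis, the data-generating values, and the assumption that the roles of the covariates are already known are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2000, 100
t = (np.arange(m) + 0.5) / m
dt = 1.0 / m
Phi = np.sqrt(2.0) * np.stack([np.sin(np.pi * t), np.sin(2 * np.pi * t)])  # (2, m) basis

# Columns of X: two instruments, one confounder, one precision variable.
X = rng.normal(size=(n, 4))
U = rng.normal(size=n)                                  # unobserved source of endogeneity
E = U[:, None] * Phi.sum(axis=0) + rng.normal(size=(n, m))
Z = X[:, [0]] * Phi[0] + X[:, [1]] * Phi[1] + 0.5 * X[:, [2]] * Phi[0] + E
b_true = np.array([1.0, -0.5])
B_true = b_true @ Phi                                   # true effect function on the grid
Y = X[:, 2] + X[:, 3] + (Z * B_true).sum(axis=1) * dt + U + 0.5 * rng.normal(size=n)

def fit_B(exposure):
    """Regress Y on the outcome covariates and the basis scores of `exposure`."""
    scores = exposure @ Phi.T * dt                      # integrals of exposure * phi_k
    design = np.column_stack([X[:, 2:], scores])
    coef = np.linalg.lstsq(design, Y, rcond=None)[0]
    return coef[-2:] @ Phi                              # estimated B(t) on the grid

def l2_error(B_hat):
    """Discretized L2 distance between an estimate and the true B(t)."""
    return float(np.sqrt(((B_hat - B_true) ** 2).sum() * dt))

# Naive fit: uses the observed exposure, which is correlated with the outcome error.
err_naive = l2_error(fit_B(Z))

# Two-stage fit: replace Z by its projection onto all exogenous covariates.
C_hat = np.linalg.lstsq(X, Z, rcond=None)[0]
err_two_stage = l2_error(fit_B(X @ C_hat))
```

In this setup the naive estimate absorbs part of the unobserved term U into B̂(t), while the projected exposure depends only on the covariates, so `err_two_stage` should come out well below `err_naive`.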

The methodology is applied to UK Biobank data comprising thousands of participants with brain fMRI recordings and Alzheimer’s disease status. After adjusting for age, sex, and millions of genetic variants, FGS‑DAR selects a parsimonious set of IVs and precision covariates. The estimated causal effect B̂(t) reveals a significant positive influence of hippocampal activation patterns on Alzheimer’s risk (p < 0.001), providing the first functional‑imaging‑based causal quantification in this large cohort.

In summary, the paper delivers a comprehensive framework for causal inference with functional exposures: it establishes clear identifiability conditions for infinite‑dimensional treatments, introduces an exact‑sparsity L₀ selection algorithm, and furnishes rigorous asymptotic guarantees. This work sets a new methodological standard for integrating high‑dimensional imaging, genetics, and clinical outcomes, and opens avenues for extensions to nonlinear causal effects, multiple functional exposures, and time‑varying instruments.

