Transportability without Graphs: A Bayesian Approach to Identifying s-Admissible Backdoor Sets
Transporting causal information across populations is a critical challenge in clinical decision-making. Causal modeling provides criteria for identifiability and transportability, but these require knowledge of the causal graph, which rarely holds in practice. We propose a Bayesian method that combines observational data from the target domain with experimental data from a different domain to identify s-admissible backdoor sets, which enable unbiased estimation of causal effects across populations, without requiring the causal graph. We prove that if such a set exists, we can always find one within the Markov boundary of the outcome, narrowing the search space, and we establish asymptotic convergence guarantees for our method. We develop a greedy algorithm that reframes transportability as a feature selection problem, selecting conditioning sets that maximize the marginal likelihood of experimental data given observational data. In simulated and semi-synthetic data, our method correctly identifies transportability bias, improves causal effect estimation, and performs favorably against alternatives.
💡 Research Summary
The paper tackles the problem of transporting causal effects from a source domain (e.g., a randomized controlled trial) to a target domain (e.g., electronic health records) when the underlying causal graph is unknown. Classical transportability theory relies on a known selection diagram and the existence of an s‑admissible back‑door set (sABS) that simultaneously satisfies the back‑door criterion for identifiability and the s‑admissibility condition for transportability. In practice, however, researchers rarely have access to a fully specified causal graph, making it difficult to verify these conditions.
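For reference, the two conditions bundled into an sABS can be written out as below. This is the standard statement from the transportability literature, not a formula quoted from this summary. The notation is an assumption made here: starred quantities refer to the target domain and unstarred quantities to the source domain.

```latex
% Back-door admissibility of Z in the target domain:
P^\ast(y \mid do(x), z) = P^\ast(y \mid x, z)

% s-admissibility of Z (the stratum-specific effect is invariant across domains):
P^\ast(y \mid do(x), z) = P(y \mid do(x), z)

% Together they yield the transport formula for the target-domain effect:
P^\ast(y \mid do(x)) = \sum_{z} P(y \mid do(x), z)\, P^\ast(z)
```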
The authors propose a Bayesian framework that circumvents the need for a known graph. They define a binary latent variable \(H_Z\) indicating whether a candidate covariate set \(Z\) is an sABS. Using Bayes’ rule, they express the posterior probability \(P(H_Z = h_Z \mid D_e, D_o^\ast)\) as proportional to the marginal likelihood of the experimental data \(D_e\) given the target‑domain observational data \(D_o^\ast\) under each hypothesis. The key observation (Proposition 1) is that if \(Z\) is an sABS, then the post‑intervention distribution in the source domain, \(P(Y \mid do(X), Z, s)\), equals the conditional distribution in the target observational domain, \(P(Y \mid X, Z, s^\ast)\). Consequently, under this hypothesis both datasets can be described by the same parameters.
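Written out in full, the Bayes' rule step might look as follows. This is a sketch rather than the paper's Equation: it assumes the hypothesis takes values in {0, 1} and that the prior over the hypothesis does not depend on the observational data.

```latex
P(H_Z = h \mid D_e, D_o^\ast)
  = \frac{P(D_e \mid D_o^\ast, H_Z = h)\, P(H_Z = h)}
         {\sum_{h' \in \{0,1\}} P(D_e \mid D_o^\ast, H_Z = h')\, P(H_Z = h')}
```

With equal prior weight on the two hypotheses, the posterior is driven entirely by the ratio of the two marginal likelihoods.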
To compute the marginal likelihoods, the authors adopt a hierarchical model: under the hypothesis that \(Z\) is an sABS, the parameters governing the experimental distribution, \(\theta_e\), are identical to those governing the target observational distribution, \(\theta_o^\ast\), so the posterior learned from the observational data serves as an informative prior for the experimental data. Under the alternative hypothesis, \(\theta_e\) and \(\theta_o^\ast\) are independent, and a non‑informative prior is used. Both closed‑form solutions (for conjugate families) and a sampling‑based approximation (Algorithm 1) using MCMC are provided. This formulation naturally balances sample‑size disparities, letting a large observational dataset regularize a smaller experimental sample.
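The conjugate case can be illustrated with a minimal sketch for a binary outcome, assuming a Beta-Bernoulli model in each stratum of treatment and covariates. The function names, the uniform Beta(1, 1) base prior, and the count-dictionary layout are illustrative assumptions, not the paper's implementation.

```python
from math import lgamma, exp, log

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marginal(n1, n0, a, b):
    """Log marginal likelihood of a binary sequence with n1 ones and
    n0 zeros when the success probability has a Beta(a, b) prior."""
    return log_beta(a + n1, b + n0) - log_beta(a, b)

def posterior_sabs(exp_counts, obs_counts, prior_h1=0.5):
    """Posterior probability that the conditioning set is an sABS.

    exp_counts, obs_counts: dicts mapping a stratum (x, z) to a pair
    (count of Y=1, count of Y=0) in the experimental and the
    target observational data, respectively (hypothetical layout).
    """
    log_h1 = 0.0  # shared parameters: observational posterior is the prior
    log_h0 = 0.0  # independent parameters: uniform Beta(1, 1) prior
    for stratum, (n1, n0) in exp_counts.items():
        m1, m0 = obs_counts.get(stratum, (0, 0))
        log_h1 += log_marginal(n1, n0, 1 + m1, 1 + m0)
        log_h0 += log_marginal(n1, n0, 1, 1)
    log_odds = (log_h1 + log(prior_h1)) - (log_h0 + log(1 - prior_h1))
    # Numerically stable logistic transform of the log posterior odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    e = exp(log_odds)
    return e / (1.0 + e)

# Toy usage: one stratum, experimental outcome rate matching or
# contradicting the observational rate.
observational = {(1, 0): (300, 100)}
print(posterior_sabs({(1, 0): (30, 10)}, observational))
print(posterior_sabs({(1, 0): (10, 30)}, observational))
```

When the experimental outcome rate in a stratum agrees with the observational one, the informative prior predicts the experimental data better than the uniform prior and the posterior favors the sABS hypothesis. A clear mismatch pushes it toward zero.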
A major theoretical contribution is the proof that if an sABS exists, at least one can be found within the Markov boundary of the outcome variable \(Y\). This dramatically reduces the search space: instead of evaluating all subsets of the observed covariates, the algorithm only needs to consider subsets of the Markov boundary, the minimal set of variables that renders \(Y\) conditionally independent of all others. The authors then recast the search as a feature‑selection problem, employing a greedy strategy that iteratively adds or removes variables to maximize the posterior probability derived from Equation 4.
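A greedy add-or-remove search of this kind can be sketched as below. This is a generic hill-climbing loop, not the paper's algorithm: `greedy_select` and the `score` callable are hypothetical names, and `score` stands in for whatever quantity is being maximized (for example, a log posterior for each candidate set).

```python
def greedy_select(candidates, score):
    """Greedy forward/backward search over subsets of `candidates`.

    candidates: iterable of variable names (e.g. an estimated Markov
                boundary of the outcome).
    score:      callable mapping a frozenset of variables to a float;
                higher is better (hypothetical stand-in for the
                posterior of the sABS hypothesis).
    Returns the selected subset and its score.
    """
    candidates = list(candidates)
    current = frozenset()
    best = score(current)
    improved = True
    while improved:
        improved = False
        # Neighbors differ from the current set by exactly one variable.
        neighbors = [current | {v} for v in candidates if v not in current]
        neighbors += [current - {v} for v in current]
        for subset in neighbors:
            value = score(subset)
            if value > best:
                best, best_set, improved = value, subset, True
        if improved:
            current = best_set
    return current, best

# Toy usage: the score rewards closeness to a known target set.
target = {"Z", "W"}
toy_score = lambda s: -len(set(s) ^ target)
selected, value = greedy_select(["Z", "W", "V"], toy_score)
print(sorted(selected), value)
```

Because the score must strictly increase at every step and the number of subsets is finite, the loop always terminates, though like any greedy search it may stop at a local optimum.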
The paper includes four illustrative causal graphs (Figure 1) covering cases where an sABS exists, where none exists, and where multiple candidate sets are possible. In simulated experiments, the method correctly returns the appropriate set ({Z,W}, NaN, {Z}, or {W}) for each scenario. Further semi‑synthetic experiments compare the proposed approach against baselines that use only experimental data, only observational data, or rely on a known graph. Results show that the Bayesian sABS method achieves unbiased causal effect estimates with substantially lower mean‑squared error, especially when experimental sample sizes are limited.
Assumptions underpinning the method include (1) sABS‑faithfulness, which rules out accidental parameter configurations that could make the equality in Proposition 1 hold for non‑sABS sets, and (2) the existence of a shared causal graph between domains (the authors note this can be relaxed, with details in the supplement). Limitations are acknowledged: the Markov boundary itself is defined relative to a graph, so in practice one must approximate it, and the faithfulness assumption may be violated in pathological data.
In summary, the paper delivers a novel, graph‑free Bayesian procedure for identifying s‑admissible back‑door sets, thereby enabling unbiased transport of causal effects across domains. By limiting the search to the Markov boundary and leveraging the marginal likelihood of experimental data conditioned on observational priors, the method offers both theoretical guarantees (consistency, convergence) and practical advantages (reduced computational burden, robustness to sample‑size imbalance). This contribution advances causal inference in settings where multiple data sources are available but the causal structure is partially or wholly unknown, with immediate relevance to clinical decision‑support and broader applications in transfer learning.