On the identifiability of causal graphs with multiple environments
Causal discovery from i.i.d. observational data is known to be generally ill-posed. We demonstrate that if we have access to the distribution induced by a structural causal model, together with data from (in the best case) \textit{only two} environments whose noise statistics differ sufficiently, the causal graph is uniquely identifiable. Notably, this is the first result in the literature that guarantees recovery of the entire causal graph with a constant number of environments and arbitrary nonlinear mechanisms. Our only constraint is Gaussianity of the noise terms; however, we propose potential ways to relax this requirement. Of independent interest, we expand on the well-known duality between independent component analysis (ICA) and causal discovery. Recent advances have shown that nonlinear ICA can be solved from multiple environments, at least as many as the number of sources; we show that the same can be achieved for causal discovery with access to far less auxiliary information.
💡 Research Summary
The paper tackles the fundamental problem of causal graph identifiability for structural causal models (SCMs) that may contain arbitrary nonlinear mechanisms and multivariate noise. Classical results show that with only i.i.d. observational data the causal directed acyclic graph (DAG) is generally non‑identifiable because many DAGs can generate the same joint distribution. Existing solutions rely on hard interventions that directly modify the causal structure, or on a large number of soft interventions (different “environments”) that change the distribution of the noise terms. In the nonlinear setting, however, prior work typically requires a number of environments that grows at least linearly with the number of variables, making the approach impractical for high‑dimensional problems.
The authors propose a dramatically more efficient identifiability result: under mild assumptions, only two environments—the original observational setting and a single auxiliary environment with sufficiently different noise statistics—are enough to uniquely recover the entire causal graph, regardless of the number of variables. The key technical insight is to view an SCM as an Independent Component Analysis (ICA) model: the observed variables X are generated by a diffeomorphic mixing function f applied to independent latent noise sources S, i.e., X = f(S). In ICA, full identifiability requires recovering the exact mixing matrix (or function) at every point, which generally needs many environments. For causal discovery, however, only the support of the inverse Jacobian J_{f^{-1}} (i.e., which entries are zero) matters, because a zero entry corresponds to the absence of a causal edge. This support can be inferred from a single point where the Jacobian is faithful.
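The Jacobian-support idea can be illustrated numerically. The sketch below uses a hypothetical three-variable SCM (the mechanisms are chosen for illustration and are not from the paper), computes the Jacobian of the unmixing map f^{-1} by finite differences at a generic point, and checks that its off-diagonal zero pattern coincides with the edge set of the DAG:

```python
import numpy as np

# Toy 3-variable SCM (hypothetical mechanisms, for illustration only):
#   X1 = S1
#   X2 = sin(X1) + S2            edge 1 -> 2
#   X3 = X1**2 + 0.5*X2 + S3     edges 1 -> 3 and 2 -> 3
def f_inv(x):
    """Unmixing map s = f^{-1}(x): recover the noise from the observations."""
    x1, x2, x3 = x
    return np.array([x1, x2 - np.sin(x1), x3 - x1**2 - 0.5 * x2])

def jacobian_fd(func, x, eps=1e-6):
    """Central finite-difference Jacobian of `func` at point x."""
    n = len(x)
    J = np.zeros((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        J[:, j] = (func(x + dx) - func(x - dx)) / (2 * eps)
    return J

x0 = np.array([0.3, -1.2, 0.7])      # a generic ("faithful") evaluation point
support = (np.abs(jacobian_fd(f_inv, x0)) > 1e-4).astype(int)
print(support)
# Off-diagonal nonzeros sit exactly at the (child, parent) pairs of the DAG:
# [[1 0 0]
#  [1 1 0]
#  [1 1 1]]
```

Reading row j of the support as "variable j depends on these columns" recovers the adjacency structure directly, without ever estimating the values of the Jacobian entries.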
The paper’s formal assumptions are:
- f is a global diffeomorphism and twice differentiable.
- Each auxiliary environment rescales the original noise vector S by a diagonal matrix L_i (different variances per variable). For every variable j there exists at least one environment where the scaling factor λ_{ij}=0, effectively “turning off” the noise for that variable.
- The inverse Jacobian is faithful at the mean of S (the usual genericity condition).
- The latent noise S follows a multivariate Gaussian distribution.
Under these conditions, the authors derive the following argument. In any environment i the observed density can be written as
p_i(x) = p_θ(L_i^{-1}s)·|det L_i^{-1}|·|det J_{f^{-1}}(x)|,
where s = f^{-1}(x) and p_θ is the Gaussian density of S. Because the Gaussian log‑density depends only on the quadratic form sᵀ Σ_i^{-1} s (with Σ_i = L_i Σ L_iᵀ), the difference of log‑densities between two environments isolates the term involving Σ_i. By selecting environments where a particular λ_{ij}=0, the corresponding variance component disappears, allowing the researcher to detect which entries of J_{f^{-1}} are zero. In other words, the pattern of variance changes across the two environments directly reveals the zero‑pattern of the inverse Jacobian, which is exactly the adjacency matrix of the causal DAG (up to a permutation that can be removed because the graph is acyclic).
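The cancellation behind this argument can be checked numerically. The sketch below assumes a linear mixing matrix A as a stand-in for the general diffeomorphism f (so that f^{-1} is explicit) and verifies that the log-density difference between two environments depends on x only through the quadratic form in s = f^{-1}(x), with the |det J_{f^{-1}}| term cancelling:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.normal(size=(n, n))        # hypothetical linear mixing f(s) = A s
A_inv = np.linalg.inv(A)

def log_gauss(s, var):
    """log N(s; 0, diag(var)), normalization included."""
    return -0.5 * np.sum(s**2 / var) - 0.5 * np.sum(np.log(2 * np.pi * var))

var1 = np.array([1.0, 1.0, 1.0])   # base environment (L_1 = I)
var2 = np.array([1.0, 4.0, 0.25])  # rescaled noise variances (diag of L_2 L_2^T)

def log_p(x, var):
    s = A_inv @ x
    # change of variables: log p(x) = log p_S(s) + log|det J_{f^{-1}}|
    return log_gauss(s, var) + np.log(abs(np.linalg.det(A_inv)))

x = rng.normal(size=n)
s = A_inv @ x
diff = log_p(x, var2) - log_p(x, var1)
# The |det J_{f^{-1}}| terms cancel; only the quadratic form in s (plus a
# constant from the changed normalizers) survives:
expected = -0.5 * np.sum(s**2 * (1 / var2 - 1 / var1)) \
           - 0.5 * np.sum(np.log(var2 / var1))
assert np.isclose(diff, expected)
```

Because only the variances that actually changed contribute to the difference, zeroing a particular variance component isolates the corresponding coordinate of s, which is what lets the zero pattern of J_{f^{-1}} be read off.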
The authors also consider an alternative ICA representation (b_f, p̂_θ) that yields the same observed distribution. Defining the indeterminacy map h = b_f^{-1}∘f, they show that J_h must be a scaled permutation matrix at some point, implying that the supports of J_{f^{-1}} and J_{b_f^{-1}} coincide. Since permutations correspond to re‑ordering of variables, which does not affect the DAG structure, the causal graph is uniquely identified.
Empirically, the paper validates the theory on synthetic data. The authors generate random nonlinear mechanisms using neural networks, draw Gaussian noise, and construct two environments with distinct diagonal scaling matrices satisfying the λ_{ij}=0 condition for each variable. When the assumptions hold, the proposed method recovers the exact DAG with 100% accuracy, even in cases where previous methods fail (e.g., when only variance shifts are present and the number of environments is minimal). They also explore relaxing the Gaussian assumption using mixtures of Gaussians and higher-order moment techniques, reporting promising preliminary results.
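A minimal data-generating sketch along these lines is given below; the network widths, seed, fixed chain-like DAG, and noise scales are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, samples = 4, 1000

# Random one-hidden-layer "mechanism" per variable: an illustrative stand-in
# for the paper's neural-network mechanisms (hypothetical architecture).
W1 = [rng.normal(size=(8, j)) for j in range(n)]
w2 = [rng.normal(size=8) for _ in range(n)]

def simulate(noise_scale):
    """Ancestrally sample X under a fixed fully-ordered DAG, with
    per-variable noise standard deviations given by the diagonal scaling."""
    S = rng.normal(size=(samples, n)) * noise_scale
    X = np.zeros((samples, n))
    for j in range(n):
        parents = X[:, :j]
        mech = np.tanh(parents @ W1[j].T) @ w2[j] if j > 0 else 0.0
        X[:, j] = mech + S[:, j]
    return X

X_env1 = simulate(np.ones(n))                       # observational environment
X_env2 = simulate(np.array([0.1, 2.0, 0.5, 3.0]))   # rescaled noise variances
```

Two such datasets, differing only in the diagonal noise scaling, are exactly the input the identifiability result requires.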
The contributions are threefold: (1) a novel identifiability theorem proving that constant‑size auxiliary data (as few as two environments) suffices for full graph recovery in nonlinear SCMs; (2) a methodological bridge that leverages ICA‑causal duality but demonstrates that causal discovery is strictly easier than full ICA because only the Jacobian support is needed; (3) experimental evidence confirming the theory and outlining pathways to weaken the Gaussian noise requirement.
Overall, the work suggests that natural variations in noise variance—common in many scientific domains (e.g., sensor calibration changes, batch effects, experimental condition shifts)—can be harnessed to infer causal structure without costly interventions. This opens a new, practically feasible avenue for causal discovery in complex, high‑dimensional systems.