Parameter identification in linear non-Gaussian causal models under general confounding

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Linear non-Gaussian causal models postulate that each random variable is a linear function of parent variables and non-Gaussian exogenous error terms. We study identification of the linear coefficients when such models contain latent variables. Our focus is on the commonly studied acyclic setting, where each model corresponds to a directed acyclic graph (DAG). For this case, prior literature has demonstrated that connections to overcomplete independent component analysis yield effective criteria to decide parameter identifiability in latent variable models. However, this connection is based on the assumption that the observed variables linearly depend on the latent variables. Departing from this assumption, we treat models that allow for arbitrary non-linear latent confounding. Our main result is a graphical criterion that is necessary and sufficient for deciding the generic identifiability of direct causal effects. Moreover, we provide an algorithmic implementation of the criterion with a run time that is polynomial in the number of observed variables. Finally, we report on estimation heuristics based on the identification result and explore a generalization to models with feedback loops.


💡 Research Summary

This paper addresses the problem of identifying direct causal effects in linear non‑Gaussian structural equation models (SEMs) when latent variables induce arbitrary, potentially nonlinear confounding. Previous work on linear non‑Gaussian causal models has relied on the assumption that observed variables depend linearly on latent variables, which enables connections to overcomplete independent component analysis (OICA). The authors relax this restriction and allow the latent confounding to be completely nonlinear. The statistical model is represented by an acyclic directed mixed graph (ADMG) G = (V, E→, E↔), where each observed variable X_v satisfies X_v = Σ_{w→v} λ_{wv} X_w + ε_v. The error vector ε follows the connected‑set Markov property with respect to the bidirected subgraph G↔, which permits arbitrary dependence among errors that are linked by bidirected edges.
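To make the model class concrete, the following is a minimal simulation sketch, not the authors' code. The three-node graph (0 → 1 → 2 with a bidirected edge 1 ↔ 2), the coefficient values, and the particular nonlinear functions of the latent variable `h` are all hypothetical choices for illustration. The observed variables stay linear in their parents, while the errors of nodes 1 and 2 depend nonlinearly on a shared latent source.

```python
import math
import random

def simulate(n, seed=0):
    """Draw n samples from a hypothetical ADMG: 0 -> 1 -> 2 with 1 <-> 2."""
    rng = random.Random(seed)
    lam = {(0, 1): 0.8, (1, 2): -0.5}   # lam[(w, v)] is the coefficient on edge w -> v
    samples = []
    for _ in range(n):
        h = rng.uniform(-1.0, 1.0)                       # latent confounder of nodes 1 and 2
        eps = [
            rng.uniform(-1.0, 1.0),                      # eps_0: independent of the other errors
            math.sin(3.0 * h) + rng.uniform(-1.0, 1.0),  # eps_1: nonlinear in h
            h * h + rng.uniform(-1.0, 1.0),              # eps_2: nonlinear in h
        ]
        x = [0.0, 0.0, 0.0]
        for v in range(3):                               # 0, 1, 2 is a topological order
            x[v] = sum(c * x[w] for (w, t), c in lam.items() if t == v) + eps[v]
        samples.append((x, eps))
    return lam, samples
```

Because the graph is acyclic, the structural equations can be solved node by node in topological order, so no matrix inversion is needed in this sketch.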

The central notion introduced is generic identifiability: a parameter (or a set of parameters) is said to be generically identifiable if, for almost all choices of the coefficient matrix Λ (outside a Lebesgue‑measure‑zero set) and for any error distribution satisfying a mild non‑Gaussianity condition (Assumption 1), the observed joint distribution of X uniquely determines the parameter. Assumption 1 is a Darmois‑Skitovich‑type condition stating that any two linear combinations of the errors are independent only when the corresponding coefficient vectors have disjoint support on variables that are directly linked by a bidirected edge. This assumption holds for all non‑Gaussian distributions that are not degenerate in a specific low‑dimensional sense.

The main theoretical contribution is a necessary and sufficient graphical criterion for generic identifiability of a direct effect λ_{wv}. The authors introduce the concept of removable ancestors R_v, defined as the ancestors of v whose sibling sets contain no node outside the sibling set of v. Using this notion, they derive a linear system that characterizes all alternative coefficient matrices that could generate the same observed distribution. The key result states that λ_{wv} is generically identifiable if and only if (i) w is not a removable ancestor of v, and (ii) there exists a non‑intersecting system of directed paths from w to v that respects the ancestral relations of G (formally, a system Π ∈ P̃({w}, {v})). This condition subsumes classical instrumental‑variable (IV) criteria and the “bow‑free” condition, and it is strictly more general because it does not require the latent confounding to be linear.
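The definition of removable ancestors can be sketched in a few lines of Python. This is a literal reading of the wording in this summary (an ancestor a of v is kept when the sibling set of a is a subset of the sibling set of v), not the paper's formal definition. In particular, how v itself is treated when a ↔ v is an assumption here. The function names and the edge-list representation are hypothetical.

```python
def ancestors(directed, v):
    """Return all nodes with a directed path to v (v itself excluded in a DAG)."""
    parents = {}
    for w, t in directed:
        parents.setdefault(t, set()).add(w)
    seen, stack = set(), [v]
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def siblings(bidirected, v):
    """Return the nodes joined to v by a bidirected edge."""
    out = set()
    for a, b in bidirected:
        if a == v:
            out.add(b)
        elif b == v:
            out.add(a)
    return out

def removable_ancestors(directed, bidirected, v):
    """Ancestors of v whose sibling set is contained in the sibling set of v."""
    sib_v = siblings(bidirected, v)
    return {a for a in ancestors(directed, v) if siblings(bidirected, a) <= sib_v}

# Hypothetical example graph: 0 -> 1 -> 2 <- 3, with 1 <-> 2, 3 <-> 4, 2 <-> 4.
directed = [(0, 1), (1, 2), (3, 2)]
bidirected = [(1, 2), (3, 4), (2, 4)]
print(removable_ancestors(directed, bidirected, 2))  # {0, 3}
```

In the example, node 1 is excluded because its only sibling is node 2, which under this literal reading is not in the sibling set of node 2.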

On the algorithmic side, the paper presents a polynomial‑time procedure (roughly O(p³) where p = |V|) to test the graphical criterion. The algorithm proceeds by (a) computing the sets of removable ancestors for each node, (b) constructing candidate path systems using maximum matching in bipartite graphs, and (c) checking for the existence of non‑intersecting directed paths. The authors prove that the algorithm correctly decides generic identifiability for every edge in the ADMG.
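The path-system step can be illustrated with a standard technique that is swapped in here for exposition: counting vertex-disjoint directed paths between a source set and a sink set by maximum flow on a vertex-split graph (Menger's theorem). This is not the paper's algorithm, which the summary describes in terms of bipartite matching, and the sketch does not encode the additional ancestral restrictions of the criterion. The function name and the internal labels `"SRC"`, `"SNK"`, `"in"`, `"out"` are hypothetical.

```python
from collections import defaultdict

def max_disjoint_paths(directed, sources, sinks):
    """Maximum number of vertex-disjoint directed paths from sources to sinks."""
    cap = defaultdict(int)   # residual capacities
    adj = defaultdict(set)   # adjacency, including reverse (residual) arcs

    def add_edge(a, b):
        cap[(a, b)] += 1
        adj[a].add(b)
        adj[b].add(a)

    nodes = set(sources) | set(sinks)
    for w, v in directed:
        nodes.update((w, v))
    for v in nodes:
        add_edge(("in", v), ("out", v))      # each vertex may be used once
    for w, v in directed:
        add_edge(("out", w), ("in", v))
    for s in sources:
        add_edge("SRC", ("in", s))
    for t in sinks:
        add_edge(("out", t), "SNK")

    def augment():
        # Depth-first search for one augmenting path in the residual graph.
        stack, parent = ["SRC"], {"SRC": None}
        while stack:
            a = stack.pop()
            if a == "SNK":
                break
            for b in adj[a]:
                if b not in parent and cap[(a, b)] > 0:
                    parent[b] = a
                    stack.append(b)
        if "SNK" not in parent:
            return False
        b = "SNK"
        while parent[b] is not None:         # push one unit of flow along the path
            a = parent[b]
            cap[(a, b)] -= 1
            cap[(b, a)] += 1
            b = a
        return True

    flow = 0
    while augment():
        flow += 1
    return flow
```

A non-intersecting system linking k sources to k sinks exists exactly when the returned value equals k. Each augmentation costs time linear in the number of edges and at most p augmentations are needed, so the check is polynomial in p = |V|.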

Section 5 delves into the genericity assumption. By truncating cumulants at arbitrary order, the authors show that the set of error distributions violating Assumption 1 has lower dimensionality in two natural families: (i) distributions with finite moments up to any fixed order, and (ii) distributions arising from linear latent confounding. Hence, the genericity condition is satisfied for “almost all” non‑Gaussian error laws encountered in practice.

The paper also sketches an extension to cyclic (feedback) models. In this setting, the matrix I − Λ may not be automatically invertible; the authors assume invertibility and adapt the graphical condition accordingly. Full treatment of cyclic models is left for future work.
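The role of invertibility can be seen from the matrix form of the structural equations. The short derivation below uses only the model equation stated earlier, with Λ = (λ_{wv}) and p = |V|. The two-node cycle at the end is a hypothetical example added for illustration.

```latex
\begin{align*}
X &= \Lambda^{\top} X + \varepsilon
  \quad\Longrightarrow\quad
  X = (I - \Lambda^{\top})^{-1}\varepsilon
  \quad \text{whenever } I - \Lambda \text{ is invertible.} \\[4pt]
\text{Acyclic case:}\quad
  & \Lambda \text{ is strictly upper triangular in a topological order, so }
  \det(I - \Lambda) = 1
  \text{ and } (I - \Lambda)^{-1} = \sum_{k=0}^{p-1} \Lambda^{k}. \\[4pt]
\text{Two-node cycle } 1 \to 2 \to 1:\quad
  & \det(I - \Lambda) = 1 - \lambda_{12}\lambda_{21},
  \text{ which vanishes when } \lambda_{12}\lambda_{21} = 1.
\end{align*}
```

In the acyclic case invertibility therefore holds for every choice of coefficients, whereas with feedback loops it has to be imposed as an assumption.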

For estimation, the authors propose a heuristic that first recovers residuals (estimates of ε) using ICA‑type methods, then tests independence among these residuals to enforce the constraints implied by the identified graph. The coefficient matrix Λ is then obtained by solving a constrained optimization problem that matches the empirical covariances while respecting the identified zero‑patterns. Simulation studies on synthetic ADMGs with varying sizes, edge densities, and non‑Gaussian error distributions demonstrate that the proposed estimator attains low bias and variance, outperforming OICA‑based methods especially when latent confounding is nonlinear.

In conclusion, the paper delivers a complete graphical characterization of generic identifiability for direct causal effects in linear non‑Gaussian SEMs with arbitrary latent confounding, an efficient algorithm to verify the condition, and a practical estimation framework. It bridges a gap between the rich theory of ICA‑based identifiability (which assumes linear latent effects) and the more realistic setting where latent confounders may act nonlinearly. Future directions include a full treatment of cyclic graphs, scalable algorithms for very large networks, and refined statistical tests for the non‑Gaussianity assumption.

