Causal Network Discovery from Interventional Count Data with Latent Linear DAGs

The increasing availability of interventional data offers new opportunities for causal discovery, with gene perturbation studies providing a prominent example. Such data are typically count-valued and subject to substantial measurement error arising …

Authors: Yijiao Zhang, Hongzhe Li

Causal Network Discovery from Interventional Count Data with Latent Linear DAGs
Causal Net w ork Disco v ery from In terv en tional Coun t Data with Laten t Linear D A Gs Yijiao Zhang and Hongzhe Li ∗ Departmen t of Biostatistics, Epidemiology and Informatics Univ ersit y of P ennsylv ania Abstract The increasing a v ailability of in terven tional data offers new opp ortunities for causal disco very , with gene p erturbation studies pro viding a prominent example. Suc h data are t ypically count-v alued and sub ject to substan tial measurement error arising from tec hnical v ariability and laten t state heterogeneity . Motiv ated by these chal- lenges, w e study iden tification and estimation in laten t linear structural causal mo dels for in terven tional coun t data. W e prop ose a laten t linear Gaussian directed acyclic graph (DA G) mo del with Poisson measuremen t error that explicitly separates the laten t causal structure from the observ ed coun ts. Under a mean-shift interv ention design, w e establish p opulation-lev el identifiabilit y of the laten t causal DA G. Building on these identification results, we dev elop an estimation pro cedure based on sparse in verse matrix estimation and provide theoretical guarantees on estimation error and finite-sample causal discov ery . Sim ulation studies and applications to Perturb-seq data demonstrate the practical effectiv eness of the prop osed metho d. Keywor ds: Causal disco v ery; Measuremen t error; P oisson mo del; Perturb-seq data. ∗ Corresp onding author. Yijiao Zhang is a p ostdo ctoral fello w and Hongzhe Li is Perelman Professor in the Department of Biostatistis, Epidemiology and Informatics at the Universit y of Pennsylv ania. 1 1 In tro duction Understanding causal relationships among v ariables is a cen tral problem across many sci- en tific disciplines, including biology , medicine, and economics. Directed acyclic graphs (D A Gs) provide a widely used framew ork for represen ting causal systems. Ho w ever, ob- serv ational data alone are generally insufficient to iden tify the underlying causal D AG. In general, they determine only a Marko v equiv alence class, within which the directions of edges that are crucial for causal in terpretation are not uniquely identifiable. The increasing av ailability of in terv entional data offers new opp ortunities for identi- fying causal graphs. A prominent example arises in gene regulatory netw ork inference, where recen t CRISPR-based p erturbation technologies (Shalem et al., 2014) hav e enabled P erturb-seq exp erimen ts that combine p o oled gene p erturbations with single-cell RNA se- quencing, yielding high-throughput interv entional expression data with known targets at single-cell resolution (Replogle et al., 2022). This exp erimen tal design pro vides rich inter- v en tional information for studying causal regulatory relationships. Despite these adv antages, analyzing P erturb-seq data p oses several substantial c hal- lenges. First, gene expression measurements in single-cell p erturbation exp eriments are inheren tly coun t-v alued, reflecting sequencing read counts rather than contin uous expres- sion levels. Most existing in terven tional causal discov ery metho ds rely on Gaussian or con tin uous appro ximations and are therefore not w ell suited for such data. Additionally , observ ed coun ts exhibit substan tial technical noise, arising from library size effects and exp erimen tal batch effects, whic h induces non-negligible measurement error relative to the underlying latent expression. Ignoring this measuremen t structure can lead to biased in- ference and misleading causal conclusions (Zhang et al., 2018; Sark ar and Stephens, 2021), y et it is rarely addressed in existing in terven tional metho ds. 2 Moreo v er, P erturb-seq data may exhibit unmeasured confounding arising from latent cellular states and shared regulatory programs, which can sim ultaneously influence multiple genes. Such laten t confounding induces spurious dep endencies among genes and violates the causal sufficiency or indep endent noise assumptions commonly adopted by existing in ter- v en tional causal disco v ery metho ds, p otentially leading to biased DA G structure learning. F urthermore, in Perturb-seq experiments, the n um b er of cells av ailable for eac h individ- ual p erturbation is often limited. In con trast, identifiabilit y results for man y in terven tional- data-based metho ds are t ypically established under infinite-sample assumptions, and their practical performance relies on stable estimation of score functions. Such estimation b e- comes particularly unstable in high-dimensional settings with small p er-interv ention sample sizes, limiting the applicability of existing metho ds to realistic P erturb-seq data. Our contribution Motiv ated by these considerations, we fo cus on the problem of iden- tification and estimation of latent linear causal D A G mo dels for in terv entional count data with measuremen t errors. Our main contributions are threefold. Metho dologically , w e prop ose a laten t linear Gaussian DA G mo del with P oisson mea- suremen t error, which decouples the causal structure among laten t expression lev els from the observed measurement pro cess. Under a mean-shift interv ention design in whic h each gene is p erturb ed at least once, we establish p opulation-lev el identifiabilit y of the laten t causal D AG. Our iden tification results do not rely on the causal faithfulness assumption, whic h can b e restricted in high-dimensions (Uhler et al., 2013). The prop osed metho d w orks under soft interv entions and allow for the presence of latent confounding. Computationally , building on these identifiabilit y results, w e dev elop a new estimation pro cedure for the latent causal co efficient matrix that integrates ideas from sparse inv erse matrix estimation with explicit D AG constrain ts. The resulting optimization problem is 3 efficien tly solv ed using the alternating direction metho d of multipliers. Theoretically , we derive non-asymptotic estimation error b ounds for the prop osed es- timator of the causal co efficient matrix. Under a b eta-min type condition on the inter- v en tional strength and signal strength on the true edges, w e c haracterize the finite-sample D A G discov ery rate and quantify the probability of exact D A G reco very . Related W ork Belo w, we pro vide a non-exhaustive review of the most closely related w ork on causal disco very . F or more comprehensive surveys, see (Heinze-Deml et al., 2018; Squires and Uhler, 2023). Causal discov ery from observ ational data has b een extensively studied via c onstr aint- b ase d and sc or e-b ase d metho ds. Constr aint-b ase d metho ds, suc h as the PC algorithm (Spirtes et al., 2000), rely on conditional indep endence testing and typically rely on strong faithfulness assumptions. Sc or e-b ase d methods formulate DA G learning as the optimiza- tion of a scoring function, ranging from greedy search (Chick ering, 2002) to con tinuous optimization (Zheng et al., 2018b; Bello et al., 2022). These approaches av oid faithfulness assumptions and instead require b eta-min conditions for consistent reco v ery (v an de Geer and B ¨ uhlmann, 2013). Purely observ ational data identify a Mark ov equiv alence class, unless additional distributional assumptions are imp osed, suc h as non-gaussian errors (Shimizu et al., 2006) or equal error v ariances (P eters and B ¨ uhlmann, 2014). In terv entional data provide information to break Marko v equiv alence and enable full D A G iden tification. When in terv entional targets are kno wn, Hauser and B ¨ uhlmann (2012) prop osed a score-based approac h that iden tifies D AGs up to an in terv en tional Mark ov equiv alence class. W ang et al. (2017) developed a hybrid metho d that alternates b et ween structure optimization and conditional indep endence testing, while Xue et al. (2025) ex- tended con tin uous score-based optimization to in terven tional settings. These metho ds do 4 not allow latent confounding and rely on har d interv entions, under which the dep endence of the interv ention target on its paren ts is remov ed. Related work considering unkno wn in terv ention targets includes (Squires et al., 2020; Li et al., 2025; V arici et al., 2025). Sev eral lines of w orks study causal discov ery with latent confounding. A classical line of work studies causal disco very ov er acyclic directed mixed graphs (ADMGs), whic h en- co de conditional independence on the observ ed margin. Represen tativ e approac hes in- clude constraint-based metho ds such as FCI and its v arian ts (Spirtes et al., 2000; Colombo et al., 2012), score-based metho ds based on contin uous optimization (Bhattachary a et al., 2021), and approaches exploiting non-Gaussianit y (W ang and Drton, 2023). Another line of w ork imp oses structured assumptions on the error cov ariance and p erforms deconfounding through precision matrix decomp osition (F rot et al., 2019; Pal et al., 2025). Our metho d accommo dates laten t confounding without explicitly mo deling the confounding effects. Most existing metho ds are not tailored to count-v alued data. Sev eral works study causal disco v ery for observ ational P oisson mo dels (Park and Raskutti, 2015; P ark and P ark, 2019) and zero-inflated Poisson mo dels (Choi and Ni, 2023), but they do not accoun t for laten t confounding or measurement error. A smaller b o dy of work considers DA G learning in the presence of measurement error (Zhang et al., 2018; Dai et al., 2022; Saeed et al., 2020), whic h is largely constraint-based and restricted to observ ational settings. Organization The remainder of the pap er is organized as follows. W e introduce the mo del, identifiabilit y results, and estimation pro cedure in Sections 2 - 4. Section 5 de- v elops theoretical guarantees, including estimation error b ounds and finite-sample DA G iden tification results. Sim ulation studies are rep orted in Section 6, follo wed b y real data applications in Section 7. Section 8 concludes with a discussion. 5 2 Statistical Mo del for P erturb-seq Single Cell RNA- seq Data W e first introduce some basic notations. Denote r p s “ 1 , . . . , p. F or a matrix B “ p B j k q P R p ˆ p , w e define the elemen twise max norm } B } max “ max j,k | B j k | , the sp ectral norm } B } 2 “ sup } x } 2 “ 1 } B x } 2 , matrix ℓ 1 norm } B } L 1 “ max 1 ď j ď q ř p i “ 1 | a ij | , the element wise ℓ 1 norm } B } 1 “ ř p i “ 1 ř p j “ 1 | a ij | , and the F robenius norm } B } F “ p ř j,k B 2 j k q 1 { 2 . W e denote by } B } 0 the n um b er of nonzero en tries in B , i.e., } B } 0 “ |tp j, k q : B j k ‰ 0 u| . W e consider gene expression count data collected under m ultiple interv entional environ- men ts generated b y CRISPR p erturbations. Let X p m q “ p X p m q 1 , . . . , X p m q p q J P R p denote the observ ed gene expression v ector in environmen t m P t 0 , 1 , . . . , M u , where m “ 0 corre- sp onds to the control (non-interv ened) condition. Each interv entional environmen t m ą 0 targets a single gene for p erturbation, and we denote by I p m q “ ℓ m P r p s the index of the p erturb ed gene in en vironmen t m . T o separate the measuremen t pro cess from the underlying expression pro cess in single- cell RNA sequencing data (Sark ar and Stephens, 2021), w e mo del the observed gene expres- sion coun ts through an explicit measuremen t lay er and a latent expression la yer. Sp ecifi- cally , for eac h environmen t m , the observ ed count X p m q j of gene j is mo deled as a Poisson random v ariable conditional on the library size L P R , the cell-level cov ariates C P R q , and the laten t true expression level Z p m q j : X p m q j | L, C , Z p m q j „ Poisson ` L exp t s j p C q ` Z p m q j u ˘ , j “ 1 , . . . , p. (1) Here, L accoun ts for sequencing depth, while s j p C q P R captures the systematic effects of observ able co v ariates such as batc h effects that ma y effect the observed sequence coun ts for gene j . After adjusting for technical effects and observ able cell heterogeneity , Z p m q j 6 represen ts the latent expression level. W e assume that Z p m q K p C , L q for all m P t 0 , . . . , p u , implying that the sequencing depth and p ossible batch effects are indep endent of the ob- serv ed sequencing coun ts. T o characterize the laten t causal structure underlying gene expression, we consider a linear Gaussian structural causal mo del (SCM) on the laten t expression vector Z p m q “ p Z p m q 1 , . . . , Z p m q p q J P R p for eac h en vironmen t m P t 0 , . . . , p u . Sp ecifically , we consider Z p m q “ AZ p m q ` η p m q ` ε p m q , ε p m q „ N ` 0 , Σ p m q e ˘ , p m “ 0 , . . . , p q , (2) η p m q “ η p 0 q ` α m e ℓ m , p m “ 1 , . . . , p q (3) where A P R p ˆ p is the causal co efficient matrix, η p m q P R p is an en vironmen t-sp ecific in tercept, α m P R is a scalar and e j P R p denotes the j -th canonical basis v ector. The sample size under the m -th en vironment is denoted b y n m . The shared co efficien t matrix A in (2) induces a D AG G “ pr p s , E q on the v ertex set r p s “ t 1 , . . . , p u , where the edge set is defined as E “ tp i, j q : A j i ‰ 0 u . In particular, a nonzero entry A j i corresp onds to a directed edge i Ñ j in G . W e assume that A is suc h that G is acyclic. Th us, A represen ts the laten t causal mec hanism and is assumed to be in v arian t across en vironments. More generally , for an y matrix A P R p ˆ p , we denote b y G p A q the directed graph induced b y A under this rule. In addition, model (2) assumes a me an shift b et ween the in terven tional and control en vironmen ts, captured by the con trast of the intercepts η p m q ´ η p 0 q . The exogenous noise ε p m q is mo deled as Gaussian with en vironment-specific cov ariance Σ p m q e , which ma y b e non- diagonal, allo wing p ossible latent confounding. Equation (3) further imp oses a one-sp arse mean-shift structure: only the intercept of the targeted gene ℓ m is directly shifted b y an amoun t α m , where | α m | quan tifies the interv ention strength in en vironment m . The acyclicit y of A means there exists a p erm utation matrix P (a top ological order of 7 G ) such that P J AP is strictly low er triangular. In particular, I ´ A is in vertible and hence w e define B : “ p I ´ A q ´ 1 . Under this representation, the laten t v ariables satisfy Z p m q “ B ε p m q ` µ p m q , µ p m q : “ E r Z p m q s “ B η p m q . (4) Equation (4) shows that the one-sparse shift in η p m q can propagate through the causal system and induce dense changes in do wnstream genes. W e briefly discuss the plausibility of the mo deling assumptions ab o ve and their con- nections to the literature. The in v ariance of A across environmen ts reflects the princi- ple of in v arian t causal mechanisms under interv entions, which is a standard premise in in terv ention-based causal disco v ery (Shen et al., 2025; Li et al., 2025). Mean-shift in terven tions constitute a class of soft interv entions that mo dify the condi- tional distribution of an interv ened target through shifts in its mean while preserving its dep endence on paren t v ariables, and ha v e b een widely used in interv entional causal dis- co v ery (Liu et al., 2025; Li et al., 2025). In con trast, hard or do-interv entions commonly assumed in existing interv entional causal discov ery metho ds (Hauser and B ¨ uhlmann, 2012; W ang et al., 2017; Xue et al., 2025) completely remov e the dep endence of the target on its paren ts. Soft in terven tions are often more realistic in practice, as biological p erturbations are typically partial and noisy rather than fully disruptive. The one-sparse form in (3) can b e seen as an sp ecial case of the so-called ‘sparse mechanism shift h yp othesis’ in (Sch¨ olk opf et al., 2021) and was similarly used in (Li et al., 2025). Moreo v er, the deterministic mean-shift assumption can b e generalized to a stochastic mean shift by introducing a random v ector ˜ η p m q P R p with mean η p m q , p ossibly allowing mild correlation with the error term. Our theoretical results extend naturally to this setting, since the iden tification relies only on the first-order momen t of ˜ η p m q . This stochastic form ulation is more realistic in practice, as it captures heterogeneous p erturbation effects 8 across cells arising from v ariable p erturbation efficiency and incomplete p enetrance. 3 Iden tification of the D A G Under Unmeasured Con- founders Under linear Gaussian SCMs, observ ational data alone only identify the causal structure up to a Marko v equiv alence class. By con trast, the in v ariance of the causal mechanism A across en vironmen ts, together with mean-shifts in the latent expression mo del, enables iden tification of the underlying causal structure through cross-en vironment mean c hanges. A k ey observ ation is that the mean difference b et ween en vironment m and the control is ∆ µ p m q : “ µ p m q ´ µ p 0 q “ α m B e ℓ m , (5) whic h implies that ∆ µ p m q equals the ℓ m -th column of B up to a scalar factor. Moreo ver, under acyclicity , it is easy to verify that P J B P is unit low er triangular. This unit-diagonal constrain t fixes the scale and sign of each column of B , so that the ℓ m -th column of B is iden tifiable from ∆ µ p m q whenev er α m ‰ 0. Provided that for each no de ℓ P r p s there exists at least one environmen t m with ℓ m “ ℓ and α m ‰ 0, stac king the recov ered columns yields iden tifiabilit y of B and hence of the causal mechanism A . It therefore suffices to iden tify the laten t means t µ p m q u M m “ 0 from the observ ed data. Under the Poisson–lognormal observ ation mo del, momen ts of the observed counts X p m q admit an explicit mapping to moments of the latent expression v ariables Z p m q . Sp ecifically , letting S j p C, L q “ L exp p s j p C qq , the first- and second-order momen ts of the scaled counts Y p m q j “ X p m q j { S j p C, L q are link ed to the momen ts of the latent Gaussian v ariable Z p m q j via µ p m q j “ log ` E r Y p m q j s ˘ ´ 1 2 Σ p m q j j , (6) 9 where Σ p m q j j “ log ˜ E “ Y p m q j p Y p m q j ´ 1 q ‰ ` E r Y p m q j s ˘ 2 ¸ . These relationships connect the mean of the latent expression mo del µ p m q to the observ able momen ts of the cov ariate-adjusted scaled counts t Y p m q j u j Pr p s . W e note that the co v ariate effect s j p C q op erates solely at the measuremen t level and remains in v ariant across environ- men ts. As suc h, it can b e regarded as a n uisance that do es not affect the identification of the laten t causal structure. W e then hav e the follo wing iden tification results. Theorem 3.1. Under the mo del (1) – (3) , the c ausal me chanism A is identifiable by jointly lever aging observational and interventional data, pr ovide d that e ach no de is subje ct to at le ast one non-vanishing one-sp arse me an-shift intervention in the latent expr ession mo del. Theorem 3.1 establishes identifiabilit y of the causal mechanism A at the population level b y jointly leveraging observ ational and interv entional data. Here, a non-vanishing mean- shift in terven tion refers to a p erturbation with mean-shift strength α m ‰ 0 in the latent expression mo del. In finite samples, successful reco v ery further requires the mean-shift magnitude | α m | to exceed a problem-dep endent low er bound, whic h dep ends on factors such as the graph density and the num b er of interv entional samples. A precise ch aracterization of this requirement is provided in Section 5. Compared with existing metho ds for causal disco v ery from interv entional data, our framew ork do es not require har d or do-type in terv entions as in (Hauser and B ¨ uhlmann, 2012; W ang et al., 2017; Xue et al., 2025). DOTEARS (Xue et al., 2025) also consid- ers linear SCMs, but assumes diagonal noise co v ariance and in v arian t exogenous v ariances for non-targeted no des across en vironmen ts, whereas our metho d allo ws non-diagonal and en vironmen t-sp ecific cov ariance matrices. F or observ ational coun t data, existing identi- fiabilit y results either exclude measurement error (Park and P ark, 2019) or reco ver only 10 Mark o v equiv alence classes (Saeed et al., 2020). A k ey distinction is that our framew ork accommo dates latent confounding. In linear structural causal mo dels, latent confounding manifests as correlated errors (Pearl, 2009; Bhattac hary a et al., 2021), so in our setting directed edges are enco ded in the co efficient matrix A , while bidirected dep endence corresponds to nonzero off-diagonal entries of the noise cov ariance matrix. F rom the p ersp ectiv e of mixed graphs, Theorem 3.1 sho ws that the directed edge structure remains identifiable even in the presence of laten t confounding. Unlik e approaches based on explicit cov ariance mo deling and deconfounding via precision matrix decomp osition (F rot et al., 2019; P al et al., 2025), our metho d exploits interv entional mean shifts and the fact that, under the mean-shift assumption, differences in laten t means are unaffected b y latent confounding, th us av oiding explicit mo deling of the confounding structure and no de-wise deconfounding pro cedures. 4 Estimation of Causal Co efficien t Matrix and D A G Reco v ery In this section, we describ e the estimation pro cedure for the causal co efficien t matrix A . The procedure consists of four main steps: (i) estimating and remo ving measuremen t- la y er effects, (ii) mapping observ ed momen ts to latent means under the P oisson–lognormal mo del, (iii) reco vering the columns of B “ p I ´ A q ´ 1 via cross-en vironment mean shifts, and (iv) estimating A under sparsit y and acyclicit y constrain ts. The complete algorithm is summarized in Algorithm 1. W e begin b y estimating the co v ariate effect s j p C q in the measuremen t la yer. In practice, s j p C q can b e estimated using a P oisson regression with offset log L and predictors C . 11 Sp ecifically , for each gene j , w e can estimate the co v ariate effect s j p C q using a generalized linear mo del with log link and offset log L , log E r X p m q ij | C i , L i s “ log L i ` β J j C i ` δ p m q j , (7) where δ p m q j are en vironmen t-sp ecific intercepts that capture mean shifts and do not affect iden tification of β j . Let p β j denote the (quasi-)maximum lik eliho o d estimate, and define p s j p C i q “ C J i p β j . Using p s j p¨q , we can form the rescaled counts p Y p m q ij “ X p m q ij {p L p m q i exp p p s j p C p m q i qqq from whic h the laten t mean v ector p µ p m q is obtained by replacing the population exp ectation E r¨s in (6) with the empirical exp ectation p E r¨s as sp ecified in (9). F or each interv entional en vironment m P t 1 , . . . , p u , w e compute laten t mean con trasts ∆ p µ p m q “ p µ p m q ´ p µ p 0 q and reconstruct the ℓ -th column of B by p b ℓ “ ∆ p µ p m q { ∆ p µ p m q ℓ where l “ I p m q . Collecting these columns yields p B “ r p b 1 , . . . , p b p s P R p ˆ p . Finally , to reco v er the causal matrix A , we estimate a sparse weigh ted adjacency ma- trix A under acyclicit y constraints. Given the estimate p B « p I ´ A q ´ 1 , we consider the constrained optimization problem p A P arg min A } A } 1 s.t. h p A q “ 0 , } p B p I ´ A q ´ I } max ď λ s , (8) where h p A q “ 0 enforces the D A G constraint, and the ℓ 8 constrain t controls the appro xi- mation error of p B p I ´ A q in a CLIME-t yp e regularization framework (Cai et al., 2011). W e include the acyclicity constrain t h p¨q to ensure that B is unit-diagonal. T ypical c hoices of h p¨q include h p A q “ tr ` exp p A ˝ A q ˘ ´ p as in NOTEARS (Zheng et al., 2018b), or h p A q “ ´ log | uI ´ A | ` p log u for u P R u as in DA GMA (Bello et al., 2022). T o obtain the estimated causal graph, we then apply entrywise hard thresholding to p A and form the thresholded estimator p A τ b y p p A τ q ij : “ ˆ a ij 1 ␣ | ˆ a ij | ě τ ( , for i, j P r p s . T o handle the non-smo oth ℓ 8 constrain t, we introduce an auxiliary v ariable Z “ p B p I ´ 12 Algorithm 1 PLN-intervn: Causal disco v ery from in terven tional P oisson coun t data Input: Observed data tp X p m q , L p m q , C p m q qu M m “ 0 , interv ention targets t I p m q “ ℓ m u M m “ 1 , reg- ularization lev el λ s , threshold level τ ą 0. 1. Measurement-la y er adjustmen t: Obtain p s j p C i q by fitting (7) and compute scaled coun ts p Y p m q ij “ X p m q ij L` L p m q i exp t p s j p C p m q i qu ˘ . 2. Mapping observ ed momen ts to laten t means: Compute p µ p m q via the transfor- mation of empirical moments via p µ p m q j “ log ` p E r p Y p m q j s ˘ ´ 1 2 p Σ p m q j j , where p Σ p m q j j “ log ˜ p E r p Y p m q j p p Y p m q j ´ 1 qs p E r p Y p m q j s 2 ¸ . (9) 3. Column reco very of B : F or the environmen t m targeting ℓ “ I p m q : reconstruct the ℓ -th column of B by p b ℓ “ ∆ p µ p m q { ∆ p µ p m q ℓ . Collect the columns into a matrix to obtain p B “ r p b 1 , . . . , p b p s P R p ˆ p . 4. Sparse D A G estimation of A : Obtain p A b y solving (8). Output: Estimated causal matrix p A and recov ered graph p G “ G p p A τ q . A q ´ I and rewrite problem (8) equiv alen tly as min A,Z } A } 1 s.t. Z “ p B p I ´ A q ´ I , } Z } max ď λ s , h p A q “ 0 . (10) This reform ulation separates the non-smo oth constraint from the acyclicity constraint and facilitates efficien t optimization. W e solve (10) using an augmented Lagrangian approac h com bined with the alternating direction metho d of multipliers (ADMM). In tro ducing a dual v ariable V P R p ˆ p for the linear constrain t Z “ p B p I ´ A q ´ I , the augmented Lagrangian is giv en b y L ρ p A, Z , V q “ } A } 1 ` α h p A q ` µ 2 h p A q 2 ` ρ 2 › › Z ´ ` p B p I ´ A q ´ I ˘ ` V ρ › › 2 F ´ 1 2 ρ } V } 2 F , (11) where ρ ą 0 is the ADMM p enalt y parameter, and α, µ ą 0 are the Lagrange multiplier and quadratic p enalt y co efficient asso ciated with the acyclicit y constrain t h p A q “ 0. 13 ADMM Up dates. Given the augmen ted Lagrangian (11), the ADMM iterations pro- ceed by alternating up dates of the primal v ariables Z and A , follow ed by a dual up date of V . A t iteration t ` 1, the up dates are: Z p t ` 1 q “ Π }¨} max ď λ s ` p B ´ p B A p t q ´ I ´ 1 ρ V p t q ˘ , A p t ` 1 q “ arg min A ρ 2 } p B A ´ C p t ` 1 q } 2 F ` α h p A q ` µ 2 h p A q 2 ` } A } 1 , where C p t ` 1 q “ p B ´ I ´ Z p t ` 1 q ´ 1 ρ V p t q , V p t ` 1 q “ V p t q ` ρ ` Z p t ` 1 q ´ p p B ´ p B A p t ` 1 q ´ I q ˘ . (12) Here Π }¨} max ď λ s denotes the element wise pro jection onto the ℓ 8 ball of radius λ s , imple- men ted via element wise truncation to r´ λ s , λ s s . The up date of A p t q is con vex apart from the acyclicity constraint and is solved using a pro ximal gradien t metho d. The iteration terminates when the primal and dual residual norms, relativ e change in A , and the acyclic- it y violation | h p A q| fall b elo w presp ecified tolerances. More implementation details are pro vided in Supplemen tary Material Section A.1. Choice of λ s . The sparsit y parameter λ s is selected using a pseudo-BIC (PBIC) criterion. F or each candidate λ s , w e define the criterion PBIC p λ s q “ } p B p I ´ p A p λ s qq ´ I } 2 F } p B ´ I } 2 F ` C w ¨ } p A p λ s q} 0 p p p ´ 1 q{ 2 , (13) where C w ą 0 con trols the trade-off b etw een mo del fit and sparsit y . In our implemen tation w e set C w “ 2 and select λ s b y minimizing PBIC p λ s q o v er a grid of candidate v alues. 5 Theoretical Prop erties In this section, we inv estigate the estimation error of the causal co efficient matrix and the graph recov ery consistency for the prop osed estimator in Section 4. W e first introduce additional notations. 14 Let A ˚ P R p ˆ p denote the true co efficient matrix of the laten t Gaussian structural causal mo del, B ˚ : “ p I ´ A ˚ q ´ 1 , and G ˚ “ G p A ˚ q . Let µ p m q˚ “ B ˚ η p m q denote the true latent mean. F or τ ą 0, define the entrywise hard-thresholded matrix A τ b y p A τ q ij : “ a ij 1 ␣ | a ij | ě τ ( , 1 ď i, j ď p . Define n min “ min m Pr p sYt 0 u n m . W e make the follo wing assumptions. Assumption 5.1 ( Moments ) . Supp ose ther e exist some c onstants r, c, δ, c min ą 0 such that (i) p ď cn r , (ii) E | Y p m q j | 4 ` 8 r ` δ ď K , and (iii) E r Y p m q j s ě c min for j P r p s and m P r p s Y t 0 u . Assumption 3.1 imposes momen t conditions on the scaled observ ed P oisson coun ts to ensure w ell-b eha ved concen tration of momen t estimators. It is mild for P oisson-lognormal mo dels, as Gaussian laten t v ariables imply finite moments for the P oisson rates and the scaled counts; similar conditions are standard in related work (Cai et al., 2011; Xiao et al., 2022). The lo wer b ound E r Y p m q j s ě c min excludes degenerate v ariables and ensures stable momen t estimation. Assumption 5.2 ( Mean Shift Strength ) . L et t ą 0 b e some smal l c onstant. Assume that the str ength of the me an shift is str ong enough, that is, min m Pr p s | α m | ě ∆ min , wher e ∆ min ě c 1 a log p { n min and c 1 “ 8 a p 2 K ` 1 qp t ` 3 q . Assumption 3.2 sp ecifies a minimal in terven tion strength for finite-sample identifiabilit y , serving as a finite-sample counterpart of p opulation-level mean-shift conditions in literature of interv ention-based causal disco very . The requirement ∆ min Á a log p { n min matc hes the in trinsic noise lev el and is therefore statistically necessary . Moreov er, it is empirically v erifiable since ∆ µ p m q ℓ m can b e estimated from sample means across en vironments. The follo wing lemma establishes a uniform tail b ound for the estimation error of the sample means across all v ariables and en vironmen ts. 15 Lemma 5.1. Under Assumption 5.1, it holds that P ˜ max m,j | ˆ µ p m q j ´ µ p m q˚ j | ě c 1 4 c min c log p n min ¸ “ O p n ´ δ 4 min ` p ´ r q . Theorem 5.2 provides finite-sample error b ounds for estimating the laten t causal co- efficien t matrix A ˚ under in terv entional Poisson coun t data. The en trywise b ound } p A ´ A ˚ } max “ O ´ p c min ∆ min q ´ 1 a log p { n min ¯ is the key rate for edge-wise accuracy and supp ort reco v ery , while the sp ectral and F rob enius b ounds further quantify global estimation error under w eak row-sparsit y s 0 p p q for q P r 0 , 1 q . The app earance of ∆ min in the denominator reflects the impact of mean-shift signal strength on estimating the co efficient matrix A , as weak er mean shifts reduce the separability b et ween control and in terv entional data and necessarily inflate the estimation error. The dep endence on s 0 p p q captures the effect of graph densit y , with larger s 0 p p q corresp onding to denser underlying graphs and leading to increased estimation error in the sp ectral and F rob enius norms. Theorem 5.2 ( Estimation ) . Assume that } I ´ A ˚ } L 1 ď M for some lar ge M ą 0 and A ˚ is we akly sp arse in the sense that max j ř p k “ 1 | a ˚ j k | q ď s 0 p p q for some 0 ď q ă 1 . Under Assumptions 5.1 and 5.2, and by cho osing λ s “ c 1 M {p c min ∆ min q a log p { n min , it holds that } p A ´ A ˚ } max ď 4 c 1 M 2 c min ∆ min c log p n min , } p A ´ A ˚ } 2 ď c 2 p 4 M q 1 ´ q p c 1 M c min ∆ min c log p n min q 1 ´ q s 0 p p q , } p A ´ A ˚ } 2 F p ď c 2 p 4 M q 2 ´ q p c 1 M c min ∆ min c log p n min q 2 ´ q s 0 p p q , with pr ob ability gr e ater than 1 ´ O p n ´ δ { 4 min ` p ´ t q , wher e c 2 “ 1 ` 2 1 ´ q ` 3 1 ´ q . T o characterize the reconstructabilit y of the underlying DA G, we define the index ζ : “ ζ p A ˚ q “ inf A : G p A τ q‰ G ˚ , h p A q“ 0 , } A } 1 ď} A ˚ } 1 } B ˚ p I ´ A q ´ I } max . 16 The index ζ p A ˚ q measures the minimal separation b et ween the true mo del and any alter- nativ e co efficien t matrix whose thresholded graph differs from G ˚ , and is closely related in spirit to the degree of reconstructabilit y introduced in Y uan et al. (2019). Assumption 5.3 ( D A G Reconstructability ) . Assume that ζ ě c 3 a log p { n min , wher e c 3 “ p c 1 ` 2 ? 2 q M {p c min ∆ min q . Define a min : “ min p j,k q : A ˚ j k ‰ 0 | A ˚ j k | . A sufficient condition for Assumption 5.3 is that a min ě 2 τ and τ ě p 1 ` M q c 3 a log p { n min , where τ is the hard-threshold level. It is essen tially a b eta-min t yp e condition similarly used in (Cai et al., 2011; v an de Geer and B ¨ uhlmann, 2013; Y uan et al., 2019; P ark and P ark, 2019). Remark 5.1. We r emark that the b eta-min c ondition is not dir e ctly c omp ar able to the str ong faithfulness assumption; the former is a lo c al signal str ength c ondition c ommonly use d in high-dimensional settings, wher e as the latter is a glob al distributional assumption. Se e van de Ge er and B¨ uhlmann (2013) for detaile d discussion. Theorem 5.3 ( Graph Reco v ery ) . Denote κ “ p ζ ´ λ s q c min ∆ min { M wher e λ s is define d in The or em 5.2. Under Assumptions in The or em 5.2, it holds that P p p G “ G ˚ q ą P p sign p p A τ q “ sign p A ˚ qq ą 1 ´ c 4 exp ˜ ´ n min 4 min # κ 2 K , 3 κ d p log p q 3 n min + ` 2 log p ¸ , for some c onstant c 4 ą 4 . F urther, under Assumption 5.3 and as n min Ñ 8 , we have P p p G “ G ˚ q ą P p sign p p A τ q “ sign p A ˚ qq Ñ 1 . Theorem 5.3 establishes finite-sample recov ery of the latent causal D A G via thresholding the estimated co efficien t matrix. The con vergence probabilit y is go verned b y the degree of D AG reconstructability ζ and the mean-shift strength ∆ min through κ : larger v alues of either quan tity accelerate con vergence of P p p G “ G ˚ q to one, while w eak er reconstructabilit y or mean shifts slow down reco very . 17 6 Sim ulation Ev aluations 6.1 Setup Data generation. W e consider a mean-shift setting of the laten t linear Gaussian struc- tural mo del described in Section 2, and examine the p erformance of differen t methods under differen t graph densities and mean-shift strengths. The underlying causal mechanism A was generated from an Erd˝ os–R´ enyi mo del under a random top ological ordering to ensure acyclicit y . F or each ordered pair of nodes i ă j , a directed edge i Ñ j was added indep enden tly with probabilit y such that the exp ected a v erage no de degree of d , where d “ 2 for the low-densit y setting and d “ 2 for the high-densit y setting. Nonzero edge weigh ts w ere dra wn uniformly from ˘r 0 . 3 , 0 . 6 s . The in tercept v ector in the control data w as fixed as η p 0 q “ p´ 0 . 5 , . . . , ´ 0 . 5 q . F or in terv entional data, we consider setting the mean shift strength at differen t levels, α m P t´ 2 , ´ 4 u . Library sizes were sampled indep endently as L i „ Lognormal p log 10 , 0 . 1 2 q . F or simplicit y , no additional co v ariates C i w ere included. T o in tro duce laten t confounding, we consider p erv asive low-rank confounders by set- ting the n um b er of unmeasured factors to n uc “ 2 and generating a loading matrix U P R p ˆ n uc with i.i.d. en tries from N p 0 , σ 2 c q , where σ c con trols the confounding strength. The noise co v ariance in all environmen ts is sp ecified as Σ p 0 q e “ diag p Uniform p 0 . 5 , 0 . 8 qq ` U U J , where the lo w-rank term U U J induces residual correlations across genes. W e consider σ c P t 0 , 0 . 25 , 0 . 5 u , graph dimension p P t 50 , 100 u , control sample size n 0 “ 5000, and in terv entional sample size n m P t 100 , 200 , 500 u for eac h m P r p s . Comp eting metho ds. W e compare the prop osed metho ds with a range of existing ap- proac hes for DA G learning. NOTEARS (Zheng et al., 2018a), LiNGAM (Shimizu et al., 18 2006), and MRS-P oissonSEM (P ark and Park, 2019) estimate DA Gs from observ ational data, among which MRS-P oissonSEM is sp ecifically designed for P oisson count data, while the others rely on contin uous-v alued assumptions. DOTEARS (Xue et al., 2025), GIES (Hauser and B ¨ uhlmann, 2012), and IGSP (W ang et al., 2017) lev erage in terven tional data for causal identification. Anchor-PLN-PC (Saeed et al., 2020) accommo dates the P oisson log-normal mo del while explicitly accounting for measurement error and sample heterogene- it y . It first estimates an undirected dep endency graph and then applies the PC algorithm to recov er the corresponding Marko v equiv alence class of the underlying D AG. Finally , PLN-in tervn is the prop osed metho d that exploits in terven tional Poisson log-normal data. The prop osed metho ds and Anchor-PLN-PC explicitly mo del library size effects and w ere thus applied directly to the ra w coun t data. F or comp eting methods without an explicit library size comp onent, coun ts were first normalized b y the library size L . F urther- more, except for MRS-P oisson-SEM, metho ds based on approximate con tin uous assump- tions w ere applied to log p 1 ` x q -transformed normalized coun ts. F or all metho ds, the estimated adjacency matrix p A was thresholded at τ “ 0 . 01, yield- ing p A τ as the final estimate. Additional sim ulations examining sensitivit y to differen t thresholds are rep orted in Supplementary Material Section A.2. Ev aluation metrics. T o assess graph reco very p erformance, w e ev aluate edge-lev el accu- racy using precision, recall, and the F1 score, and structural accuracy using the Structural Hamming Distance (SHD). Let TP , FP , FN, and TN denote the n umbers of true p ositives, false p ositives, false negatives, and true negatives of the directed edge estimation. The pre- cision and recall are defined as Precision “ TP {p TP ` FP q , and Recall “ TP {p TP ` FN q . The F1 score is the harmonic mean of precision and recall, F1 “ 2 Precision ˆ Recall Precision ` Recall “ 2 TP 2 TP ` FP ` FN , 19 whic h provides an edge-level measure of graph reco very accuracy b y balancing false dis- co v eries and missed edges. The structural Hamming distance (SHD) is a global metric that counts the num b er of edge additions, deletions, and reversals needed to transform the estimated graph in to the true graph, with smaller v alues indicating more accurate reco very . 6.2 Results W e rep ort in the main pap er the F1 score and SHD under mo derate laten t confounding ( σ c “ 0 . 25) with graph dimension p “ 50. Additional results are deferred to Supplemen- tary Material Section A.2, including Precision and Recall under this setting, robustness to differen t confounding levels, sim ulations in higher dimensions, and sensitivity to threshold c hoices. Figures 1 and 2 summarize the p erformance of differen t methods across v arying graph densities, mean-shift strengths, and interv entional sample sizes, measured by F1 score and SHD, resp ectively . W e highlight sev eral key observ ations b elo w. First, the p erformance of the proposed method improv es systematically with stronger mean shifts, lo wer graph density , and larger interv entional sample sizes, as reflected by higher F1 scores and low er SHD. These trends are consistent with the theoretical results in Theorem 5.3, where larger mean-shift strength ∆ min , larger sample size n min , and smaller M “ } I ´ A ˚ } L 1 lead to faster conv ergence of the estimated graph to the true DA G. Second, compared with observ ational-data-based methods, PLN-in tervn consisten tly outp erforms NOTEARS, LiNGAM, and MRS-P oisson-SEM across nearly all regimes in terms of b oth F1 score and SHD. In more challenging settings characterized by high graph densit y and w eak mean shifts, MRS-Poisson-SEM can ac hieve slightly higher F1 scores than the prop osed metho d when the in terven tional sample size is small. How ever, this marginal adv an tage disapp ears once the in terven tional sample size b ecomes mo derate. Moreo ver, 20 sensitivit y analysis in Supplementary Material Section A.2.5 sho ws that MRS-P oisson- SEM is m uch more sensitive to the threshold τ than our metho d, and the rep orted results use a τ fav orable to MRS-Poisson-SEM. Figure 1: F1 score of different methods under different graph densities, different mean shift strengths, and v arying sample sizes of in terv entional data. Third, w e observ e that in this setting, the in terven tional-data-based metho ds, including DOTEARS, GIES, and IGSP , can ev en underp erform the observ ation-based MRS-Poisson- SEM. This phenomenon may b e attributed to tw o factors. First, these metho ds are pri- marily designed for hard or do-type in terv entions, and their p erformance may degrade under the soft mean-shift in terven tions considered here. Second, these approaches do not explicitly mo del the coun t-v alued nature of the data, whic h can lead to inferior p erfor- mance compared to MRS-Poisson-SEM, despite the latter relying only on observ ational 21 data. This observ ation further highligh ts the imp ortance of appropriately mo deling count data in causal discov ery for p erturbation studies. Figure 2: SHD of different methods under differen t graph densities, different mean shift strengths, and v arying sample sizes of in terv entional data. Finally , although Anchor-PLN-PC explicitly accounts for count-v alued data and mea- suremen t error, its F1 score remains relatively low in our simulations. A p otential expla- nation is that the PC algorithm requires estimating the co v ariance matrix of the laten t v ariables in (2), which can b e unstable in high-dimensional settings. Moreov er, ev en with a reliable co v ariance estimate, PC-based metho ds are only guaranteed to recov er a Marko v equiv alence class. In con trast, by lev eraging interv entional data, the prop osed metho d is able to identify the full causal D A G, leading to impro v ed structural reco very . 22 7 Application to P erturb-seq dataset W e analyze a single-cell CRISPR p erturbation dataset from the Perturb-seq study of Re- plogle et al. (2022), fo cusing on the K562 chronic my elogenous leukemia cell line. In this exp erimen t, K562 cells expressing a CRISPR interference (CRISPRi; dCas9–KRAB) sys- tem were sub jected to p o oled p erturbations targeting appro ximately 2,000–2,500 essential genes. These genes are defined as those required for cell viability and proliferation, and are enriched for core cellular pro cesses such as transcription, translation, RNA pro cessing, and cell cycle regulation. Single-cell RNA sequencing with guide capture w as used to link eac h p erturbation to its transcriptomic resp onse, enabling systematic characterization of gene regulatory effects at single-cell resolution. After prepro cessing to remov e genes with insufficient p erturb ed-cell counts, extreme sparsit y , or weak p erturbation effects, 794 genes remain, of which we retain the 200 most v ariable genes in con trol cells for our analysis. The resulting dataset con tains 10,691 con- trol cells and an a verage of 194 p erturb ed cells p er in terv ention, with a minimum of 101 p erturb ed cells p er gene. T o construct a biologically informed reference graph, we emplo y ed a transcription fac- tor–target interaction netw ork deriv ed from ChIP-seq data (Chev alley et al., 2025), with in teractions restricted to those relev ant to or observ ed in the K562 cell line. Within the selected 200 genes, this reference net work con tains 638 directed edges, with a graph densit y 0.016. W e compare the prop osed metho d with the comp eting approaches describ ed in Sec- tion 6. Detailed prepro cessing and implementation steps are provided in Supplemen tary Material Section B.1. The ChIP-seq–deriv ed transcription factor–target netw ork provides a useful but inher- en tly limited reference for ev aluating the learned D A G from p erturb-seq data. While it 23 offers indep enden t evidence of regulatory relationships, it reflects ph ysical TF–DNA bind- ing rather than functional or causal effects on gene expression. Thus, agreement supp orts biological plausibilit y , particularly for direct in teractions, whereas discrepancies ma y arise from con text-sp ecific regulation, indirect effects captured by p erturb-seq, or incomplete and noisy ChIP-seq cov erage. Accordingly , this comparison should b e view ed as a comple- men tary , rather than definitiv e, v alidation of causal structure. 7.1 Globle reco v ery Figure 3 presents the precision and recall of different metho ds as functions of the threshold parameter τ applied to p A τ , with τ v arying from 0.15 to 0.001 in steps of 0.001. F or constrain t-based metho ds, including IGSP and Anc hor-PLN, the significance lev el is fixed at 0.05, and the results are shown as single p oints. The left v ertical axis rep orts precision and the right vertical axis reports recall. Supplementary Material Section B.4 pro vides a complemen tary no de-lev el ev aluation, where precision and recall are computed separately for eac h source gene. Figure 3 rev eals three main findings. First, the prop osed metho d consistently achiev es the highest precision across the entire range of thresholds, with its precision curve uniformly dominating those of all comp eting metho ds. When τ ă 0 . 1, its recall also substantially exceeds that of other approaches. In contrast, comp eting metho ds fail to surpass the densit y baseline in precision and exhibit limited recall. These results suggest that the prop osed metho d more effectiv ely exploits in terv en tional information to reco v er regulatory edges. Second, among the comp eting approaches, MRS-PoissonSEM and IGSP p erform com- parativ ely b etter than others. MRS-PoissonSEM attains slightly higher precision within a narro w threshold range, whereas IGSP achiev es higher recall. LiNGAM, DOTEARS, and 24 NOTEARS exhibit mo derate precision but generally low recall, while Anc hor-PLN fails to iden tify edges on this dataset. Figure 3: Precison-Recall curves of differen t metho ds on ChIP-seq dataset. Third, the prop osed metho d demonstrates relativ ely stable b ehavior with resp ect to threshold v ariation. As the threshold decreases, recall increases substan tially and steadily while precision remains largely stable. Con versely , as the threshold increases, precision gradually impro ves, indicating that edges selected under stricter thresholds tend to b e more informativ e and reliable. In contrast, metho ds suc h as MRS-PoissonSEM and DOTEARS are relativ ely more sensitive to threshold c hoice. 7.2 Distributional shifts in downstream and upstream genes T o examine whether the inferred causal directions are supp orted b y in terv entional re- sp onses, we analyzed distributional shifts following gene p erturbations based on the es- timated causal graph G p p A τ q by differen t metho ds. F or each p erturb ed gene, w e iden tified its upstream (ancestors) and downstream (descendants) genes according to the directed 25 structure of G p p A τ q . W e use τ “ 0 . 15 for the prop osed metho d and τ “ 0 . 03 for the com- p eting methods, selected based on Figure 3 to balance precision and recall and main tain comparable net work sparsity . W e then p erformed tw o-sample Kolmogoro v–Smirnov (KS) tests comparing gene expression b et ween the corresp onding p erturbation en vironment and the control condition for all other genes, and partitioned the resulting KS p -v alues into up- stream and do wnstream groups. Under a correctly inferred causal structure, do wnstream genes are exp ected to exhibit stronger distributional shifts, reflected b y smaller KS p -v alues. Figure 4: Distributional shifts quan tified b y ´ log 10 p -v alues from t wo-sample Kolmogoro v– Smirno v (KS) tests comparing p erturb ed and con trol conditions. Upstream (ancestors) and downstream (descendan ts) genes are defined according to the estimated causal graph G p p A τ q obtained from obtained from seven different methods. The upp er panel p o ols all gene–p erturbation pairs and the lo wer panel summarizes medians at the perturbation lev el. Figure 4 presen ts the results for different metho ds at t wo complemen tary lev els. The 26 upp er panel p o ols all p erturbation–gene pairs and compares the distributions of ´ log 10 KS p -v alues b et ween do wnstream and upstream genes for eac h metho d. F or the pro- p osed metho d, do wnstream genes displa y larger v alues than upstream genes, indicating stronger p erturbation-induced shifts. In con trast, for alternative methods, the separa- tion b etw een downstream and upstream distributions is substan tially weak er. Notably , for MRS-P oissonSEM, upstream genes exhibit slightly larger ´ log 10 KS p -v alues than do wn- stream genes on av erage, suggesting rev ersal or misidentification of causal directions. This observ ation is consisten t with the netw ork structure shown in Supplementary Section B.3, where MRS-P oissonSEM tends to pro duce graphs dominated by high in-degree no des. The lo wer panel of Figure 4 further assesses p erturbation-level consistency b y summa- rizing eac h p erturbation separately . F or each metho d and p erturbation, we compare the median ´ log 10 KS p -v alues of its downstream and upstream gene sets. Eac h p oint corre- sp onds to one p erturbation. F or the prop osed metho d, most p oin ts lie ab ov e the diagonal line and are lo cated tow ard the upp er-left region of the panel, indicating that downstream genes consistently exhibit stronger distributional shifts across p erturbations. In con trast, for the comp eting metho ds, many p oin ts cluster near the origin, suggesting weak or indis- tinguishable distributional differences b etw een upstream and downstream gene sets. 7.3 Inferred gene regulatory netw ork Figure 5 visualizes the directed gene regulatory net w ork inferred from the estimated co ef- ficien t matrix p A τ obtained b y the proposed PLN-intervn metho d with threshold τ “ 0 . 2. Eac h no de represents a gene, and eac h directed edge corresp onds to a regulatory relation- ship inferred from the p erturbation data. Node size reflects the total num b er of incident edges, capturing the o v erall connectivity of eac h gene in the net work. No de color enco des 27 the relative balance betw een outgoing and incoming regulation, quan tified b y the ratio d out {p d out ` d in q , where d out and d in denote the out-degree and in-degree of the corresp ond- ing gene, resp ectively . Edges highlighted in red indicate regulatory relationships that are supp orted b y the ChIP-seq dataset. W e also pro vide inferred gene regulatory netw orks b y other candidate metho ds in Supplementary Material Section B.3. Figure 5: Gene regulatory netw ork inferred b y PLN-intervn ( τ “ 0 . 2). No de size indicates connectivit y , and no de color indicates the out-/in-degree ratio. Insp ection of the inferred netw ork rev eals a small num b er of h ub genes with markedly large out-degrees, including HSP A5, BOP1, UBTF, GA T A1, and SF3B1. Among these, UBTF and GA T A1 are well-established transcriptional regulators (F erreira et al., 2005; Sanij et al., 2015), and man y of their outgoing edges are supp orted by ChIP-seq data, lending supp ort to the biological plausibility of the inferred structure. Notably , among the 28 relativ ely few true edges identified by comp eting metho ds suc h as MRS-PoissonSEM and GIES, sev eral outgoing edges from UBTF and GA T A1 are also reco v ered. HSP A5 emerges as the most prominent h ub gene, which is biologically plausible given its cen tral role in regulating stress-resp onse and cell-cycle programs. Its strongest inferred target, HSP90B1, is lik ewise inv olved in endoplasmic reticulum protein folding, further supp orting this reg- ulatory relationship (Steiner, 2022). More detailed biological interpretation is provided in Supplemen tary Material Section B.2. 8 Discussion This pap er studies causal discov ery from in terv entional coun t data motiv ated by single-cell p erturbation experiments suc h as Perturb-seq. These data present k ey challenges for ex- isting metho ds, including discrete observ ations, substan tial measurement noise, and laten t cellular heterogeneity . Most current approaches assume con tin uous data with indep endent noise, limiting their applicabilit y in this setting. W e address these challenges by prop osing a laten t structural framew ork that separates the causal mec hanism on laten t gene expres- sion from the Poisson measuremen t pro cess. Under a mean-shift in terven tion design, we establish iden tifiability of the latent causal structure through cross-en vironment shifts in laten t means. W e further develop an estimation pro cedure that com bines momen t-based reco v ery of latent mean shifts with sparse D A G estimation, and provide finite-sample guar- an tees. Sim ulations and analysis of P erturb-seq data demonstrate impro ved p erformance when b oth coun t structure and interv entional design are prop erly accoun ted for. Sev eral extensions are p ossible. First, the framew ork is not limited to the Poisson–lognormal mo del; it can b e extended to other count mo dels, suc h as zero-inflated or drop out-aw are P oisson mo dels (Saeed et al., 2020), pro vided suitable moment relationships are a v ail- 29 able. Second, while w e focus on mean-shift interv entions, other types ma y also b e infor- mativ e. In particular, v ariance-shift interv entions that mo dify the noise co v ariance, e.g., Σ p m q e “ Σ p 0 q e ` β m e ℓ m e J ℓ m , could enable iden tification via second-order con trasts, since the leading eigen vector of Σ p m q e ´ Σ p 0 q e reco v ers the ℓ m -th column of B up to scale. This raises the question of whether identifiabilit y results analogous to Theorem 3.1 can b e established in this setting. Ho wev er, such approaches typically require larger sample sizes to estimate co v ariance structures reliably , lea ving open the question of ho w to develop metho ds suitable for small-sample regimes. Data Av ailabilit y The data that supp ort the findings of this study are op enly a v ailable at https://gwps.wi.mit.edu. Ac kno wledgemen ts This researc h is supp orted b y NIH gran ts GM129781 and HG013841. Supplemen tary Materials The supplemen tary material includes additional sim ulation details and ev aluations, detailed results for real data application and all the pro ofs. References Bello, K., B. Aragam, and P . K. Ravikumar (2022). Dagma: Learning directed acyclic graphs with gradien t-based metho ds. In Pr o c e e dings of the 39th International Confer enc e 30 on Machine L e arning (ICML) , pp. 1777–1793. PMLR. Bhattac hary a, R., T. Nagara jan, D. Malinsky , and I. Shpitser (2021). Differen tiable causal disco v ery under unmeasured confounding. In International Confer enc e on Artificial In- tel ligenc e and Statistics , pp. 2314–2322. PMLR. Cai, T. T., W. Liu, and X. Luo (2011). Constrained ℓ 1 minimization for sparse precision matrix estimation. Journal of the Americ an Statistic al Asso ciation 106 (494), 594–607. Chev alley , M., Y. H. Ro ohani, A. Mehrjou, J. Lesk ov ec, and P . Sch wab (2025). A large-scale b enc hmark for netw ork inference from single-cell p erturbation data. Communic ations Biolo gy 8 (1), 412. Chic k ering, D. M. (2002). Optimal structure identification with greedy search. Journal of machine le arning r ese ar ch 3 (No v), 507–554. Choi, J. and Y. Ni (2023). Mo del-based causal discov ery for zero-inflated coun t data. Journal of Machine L e arning R ese ar ch 24 (200), 1–32. Colom b o, D., M. H. Maath uis, M. Kalisch, and T. S. Ric hardson (2012). Learning high- dimensional directed acyclic graphs with latent and selection v ariables. The A nnals of Statistics , 294–321. Dai, H., P . Spirtes, and K. Zhang (2022). Indep endence testing-based approach to causal disco v ery under measuremen t error and linear non-gaussian mo dels. A dvanc es in Neur al Information Pr o c essing Systems 35 , 27524–27536. F erreira, R., K. Ohneda, M. Y amamoto, and S. Philipsen (2005). Gata1 function, a paradigm for transcription factors in hematop oiesis. Mole cular and Cel lular Biol- o gy 25 (4), 1215–1227. 31 F rot, B., P . Nandy , and M. H. Maathuis (2019). Robust causal structure learning with some hidden v ariables. Journal of the R oyal Statistic al So ciety Series B: Statistic al Metho dol- o gy 81 (3), 459–487. Hauser, A. and P . B ¨ uhlmann (2012). Characterization and greedy learning of in terven tional mark o v equiv alence classes of directed acyclic graphs. The Journal of Machine L e arning R ese ar ch 13 (1), 2409–2464. Heinze-Deml, C., M. H. Maathuis, and N. Meinshausen (2018). Causal structure learning. A nnual R eview of Statistics and Its Applic ation 5 (1), 371–391. Li, J., B. B. Ch u, I. F. Scheller, J. Gagneur, and M. H. Maath uis (2025). Ro ot cause disco v ery via p ermutations and cholesky decomp osition. Journal of the R oyal Statistic al So ciety Series B: Statistic al Metho dolo gy , qk af066. Liu, E., J. Zhang, and C. Uhler (2025). Learning genetic p erturbation effects with v aria- tional causal inference. bioRxiv , 2025–06. P al, S., J. O’quinn, K. Aryan, H. Pua, J. P . Long, and A. Asiaee (2025). Dag decoration: Con tin uous optimization for structure learning under hidden confounding. arXiv pr eprint arXiv:2510.02117 . P ark, G. and S. Park (2019). High-dimensional p oisson structural equation mo del learning via l 1 -regularized regression. Journal of Machine L e arning R ese ar ch 20 (95), 1–41. P ark, G. and G. Raskutti (2015). Learning large-scale p oisson dag mo dels based on o verdis- p ersion scoring. A dvanc es in neur al information pr o c essing systems 28 . P earl, J. (2009). Causality: Mo dels, R e asoning, and Infer enc e (2 ed.). Cambridge: Cam- bridge Univ ersit y Press. 32 P eters, J. and P . B ¨ uhlmann (2014). Iden tifiability of gaussian structural equation mo dels with equal error v ariances. Biometrika 101 (1), 219–228. Replogle, J. M., R. A. Saunders, A. N. P ogson, J. A. Hussmann, A. Lenail, A. Guna, L. Mascibroda, E. J. W agner, K. Adelman, G. Lithwic k-Y anai, et al. (2022). Map- ping information-ric h genot yp e-phenotype landscap es with genome-scale p erturb-seq. Cel l 185 (14), 2559–2575. Saeed, B., A. Bely aev a, Y. W ang, and C. Uhler (2020). Anc hored causal inference in the presence of measuremen t error. In Confer enc e on unc ertainty in artificial intel ligenc e , pp. 619–628. PMLR. Sanij, E., J. Diesch, A. Lesmana, G. Poortinga, N. Hein, G. Lidgerwoo d, D. P . Cameron, J. Ellul, G. J. Go o dall, L. H. W ong, A. S. Dhillon, N. Hamdane, L. I. Roth blum, R. B. P earson, I. Haviv, T. Moss, and R. D. Hannan (2015). A no vel role for the p ol i tran- scription factor ubtf in maintaining genome stabilit y through the regulation of highly transcrib ed p ol ii genes. Genome R ese ar ch 25 (2), 201–212. Sark ar, A. and M. Stephens (2021). Separating measuremen t and expression mo dels clarifies confusion in single-cell rna sequencing analysis. Natur e genetics 53 (6), 770–777. Sc h¨ olkopf, B., F. Lo catello, S. Bauer, N. R. Ke, N. Kalch brenner, A. Goy al, and Y. Bengio (2021). T ow ard causal represen tation learning. Pr o c e e dings of the IEEE 109 (5), 612–634. Shalem, O., N. E. Sanjana, E. Hartenian, X. Shi, D. A. Scott, T. S. Mikkelsen, D. Hec kl, B. L. Eb ert, D. E. Ro ot, J. G. Do enc h, et al. (2014). Genome-scale crispr-cas9 kno ck out screening in human cells. Scienc e 343 (6166), 84–87. Shen, X., P . B ¨ uhlmann, and A. T aeb (2025). Causality-orien ted robustness: exploiting 33 general noise in terven tions in linear structural causal mo dels. Journal of the A meric an Statistic al Asso ciation (just-accepted), 1–20. Shimizu, S., P . O. Ho y er, A. Hyv¨ arinen, and A. Kerminen (2006). A linear non-gaussian acyclic mo del for causal discov ery . Journal of Machine L e arning R ese ar ch 7 , 2003–2030. Spirtes, P ., C. N. Glymour, and R. Scheines (2000). Causation, pr e diction, and se ar ch . MIT press. Squires, C. and C. Uhler (2023). Causal structure learning: A com binatorial p ersp ective. F oundations of Computational Mathematics 23 (5), 1781–1815. Squires, C., Y. W ang, and C. Uhler (2020). Perm utation-based causal structure learning with unkno wn in terven tion targets. In Confer enc e on Unc ertainty in A rtificial Intel li- genc e , pp. 1039–1048. PMLR. Steiner, L. A. (2022). Helping gata1 mak e complex decisions. Blo o d 139 (24), 3457–3459. Uhler, C., G. Raskutti, P . B ¨ uhlmann, and B. Y u (2013). Geometry of the faithfulness assumption in causal inference. The A nnals of Statistics , 436–463. v an de Geer, S. and P . B ¨ uhlmann (2013). ℓ 0 -p enalized maximum likelihoo d for sparse directed acyclic graphs. The Annals of Statistics 41 (2), 536–567. V arici, B., E. Acart ¨ urk, K. Shanmugam, A. Kumar, and A. T a jer (2025). Score-based causal representation learning: Linear and general transformations. Journal of Machine L e arning R ese ar ch 26 (112), 1–90. W ang, Y., L. Solus, K. D. Y ang, and C. Uhler (2017). P ermutation-based causal inference algorithms with in terven tions. In Pr o c e e dings of the 31st International Confer enc e on 34 Neur al Information Pr o c essing Systems , NIPS’17, Red Ho ok, NY, USA, pp. 5824–5833. Curran Asso ciates Inc. W ang, Y. S. and M. Drton (2023). Causal discov ery with unobserved confounding and non-gaussian data. Journal of Machine L e arning R ese ar ch 24 (271), 1–61. Xiao, F., J. T ang, H. F ang, and R. Xi (2022). Estimating graphical mo dels for count data with applications to single-cell gene net work. A dvanc es in Neur al Information Pr o c essing Systems 35 , 29038–29050. Xue, A., J. Rao, S. Sank araraman, and H. Pimentel (2025). dotears: Scalable and con- sisten t directed acyclic graph estimation using observ ational and in terv en tional data. iScienc e 28 (2). Y uan, Y., X. Shen, W. Pan, and Z. W ang (2019). Constrained likelihoo d for reconstructing a directed acyclic gaussian graph. Biometrika 106 (1), 109–125. Zhang, K., M. Gong, J. D. Ramsey , K. Batmanghelic h, P . Spirtes, and C. Glymour (2018). Causal discov ery with linear non-gaussian mo dels under measurement error: Structural iden tifiabilit y results. In UAI , pp. 1063–1072. Zheng, X., B. Aragam, P . Ravikumar, and E. P . Xing (2018a). Dags with no tears: contin- uous optimization for structure learning. NIPS’18, Red Ho ok, NY, USA, pp. 9492–9503. Curran Asso ciates Inc. Zheng, X., B. Aragam, P . K. Ravikumar, and E. P . Xing (2018b). Dags with no tears: Con tin uous optimization for structure learning. In A dvanc es in Neur al Information Pr o- c essing Systems (NeurIPS) , V olume 31, pp. 9472–9483. 35

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment