Causal Network Discovery from Interventional Count Data with Latent Linear DAGs

Causal Net w ork Disco v ery from In terv en tional Coun t Data with Laten t Linear D A Gs Yijiao Zhang and Hongzhe Li ∗ Departmen t of Biostatistics, Epidemiology and Informatics Univ ersit y of P ennsylv ania Abstract The increasing a v ailability of in terven tional data oﬀers new opp ortunities for causal disco very , with gene p erturbation studies pro viding a prominent example. Suc h data are t ypically count-v alued and sub ject to substan tial measurement error arising from tec hnical v ariability and laten t state heterogeneity . Motiv ated by these chal- lenges, w e study iden tiﬁcation and estimation in laten t linear structural causal mo dels for in terven tional coun t data. W e prop ose a laten t linear Gaussian directed acyclic graph (DA G) mo del with Poisson measuremen t error that explicitly separates the laten t causal structure from the observ ed coun ts. Under a mean-shift interv ention design, w e establish p opulation-lev el identiﬁabilit y of the laten t causal DA G. Building on these identiﬁcation results, we dev elop an estimation pro cedure based on sparse in verse matrix estimation and provide theoretical guarantees on estimation error and ﬁnite-sample causal discov ery . Sim ulation studies and applications to Perturb-seq data demonstrate the practical eﬀectiv eness of the prop osed metho d. Keywor ds: Causal disco v ery; Measuremen t error; P oisson mo del; Perturb-seq data. ∗ Corresp onding author. Yijiao Zhang is a p ostdo ctoral fello w and Hongzhe Li is Perelman Professor in the Department of Biostatistis, Epidemiology and Informatics at the Universit y of Pennsylv ania. 1 1 In tro duction Understanding causal relationships among v ariables is a cen tral problem across many sci- en tiﬁc disciplines, including biology , medicine, and economics. Directed acyclic graphs (D A Gs) provide a widely used framew ork for represen ting causal systems. Ho w ever, ob- serv ational data alone are generally insuﬃcient to iden tify the underlying causal D AG. In general, they determine only a Marko v equiv alence class, within which the directions of edges that are crucial for causal in terpretation are not uniquely identiﬁable. The increasing av ailability of in terv entional data oﬀers new opp ortunities for identi- fying causal graphs. A prominent example arises in gene regulatory netw ork inference, where recen t CRISPR-based p erturbation technologies (Shalem et al., 2014) hav e enabled P erturb-seq exp erimen ts that combine p o oled gene p erturbations with single-cell RNA se- quencing, yielding high-throughput interv entional expression data with known targets at single-cell resolution (Replogle et al., 2022). This exp erimen tal design pro vides rich inter- v en tional information for studying causal regulatory relationships. Despite these adv antages, analyzing P erturb-seq data p oses several substantial c hal- lenges. First, gene expression measurements in single-cell p erturbation exp eriments are inheren tly coun t-v alued, reﬂecting sequencing read counts rather than contin uous expres- sion levels. Most existing in terven tional causal discov ery metho ds rely on Gaussian or con tin uous appro ximations and are therefore not w ell suited for such data. Additionally , observ ed coun ts exhibit substan tial technical noise, arising from library size eﬀects and exp erimen tal batch eﬀects, whic h induces non-negligible measurement error relative to the underlying latent expression. Ignoring this measuremen t structure can lead to biased in- ference and misleading causal conclusions (Zhang et al., 2018; Sark ar and Stephens, 2021), y et it is rarely addressed in existing in terven tional metho ds. 2 Moreo v er, P erturb-seq data may exhibit unmeasured confounding arising from latent cellular states and shared regulatory programs, which can sim ultaneously inﬂuence multiple genes. Such laten t confounding induces spurious dep endencies among genes and violates the causal suﬃciency or indep endent noise assumptions commonly adopted by existing in ter- v en tional causal disco v ery metho ds, p otentially leading to biased DA G structure learning. F urthermore, in Perturb-seq experiments, the n um b er of cells av ailable for eac h individ- ual p erturbation is often limited. In con trast, identiﬁabilit y results for man y in terven tional- data-based metho ds are t ypically established under inﬁnite-sample assumptions, and their practical performance relies on stable estimation of score functions. Such estimation b e- comes particularly unstable in high-dimensional settings with small p er-interv ention sample sizes, limiting the applicability of existing metho ds to realistic P erturb-seq data. Our contribution Motiv ated by these considerations, we fo cus on the problem of iden- tiﬁcation and estimation of latent linear causal D A G mo dels for in terv entional count data with measuremen t errors. Our main contributions are threefold. Metho dologically , w e prop ose a laten t linear Gaussian DA G mo del with P oisson mea- suremen t error, which decouples the causal structure among laten t expression lev els from the observed measurement pro cess. Under a mean-shift interv ention design in whic h each gene is p erturb ed at least once, we establish p opulation-lev el identiﬁabilit y of the laten t causal D AG. Our iden tiﬁcation results do not rely on the causal faithfulness assumption, whic h can b e restricted in high-dimensions (Uhler et al., 2013). The prop osed metho d w orks under soft interv entions and allow for the presence of latent confounding. Computationally , building on these identiﬁabilit y results, w e dev elop a new estimation pro cedure for the latent causal co eﬃcient matrix that integrates ideas from sparse inv erse matrix estimation with explicit D AG constrain ts. The resulting optimization problem is 3 eﬃcien tly solv ed using the alternating direction metho d of multipliers. Theoretically , we derive non-asymptotic estimation error b ounds for the prop osed es- timator of the causal co eﬃcient matrix. Under a b eta-min type condition on the inter- v en tional strength and signal strength on the true edges, w e c haracterize the ﬁnite-sample D A G discov ery rate and quantify the probability of exact D A G reco very . Related W ork Belo w, we pro vide a non-exhaustive review of the most closely related w ork on causal disco very . F or more comprehensive surveys, see (Heinze-Deml et al., 2018; Squires and Uhler, 2023). Causal discov ery from observ ational data has b een extensively studied via c onstr aint- b ase d and sc or e-b ase d metho ds. Constr aint-b ase d metho ds, suc h as the PC algorithm (Spirtes et al., 2000), rely on conditional indep endence testing and typically rely on strong faithfulness assumptions. Sc or e-b ase d methods formulate DA G learning as the optimiza- tion of a scoring function, ranging from greedy search (Chick ering, 2002) to con tinuous optimization (Zheng et al., 2018b; Bello et al., 2022). These approaches av oid faithfulness assumptions and instead require b eta-min conditions for consistent reco v ery (v an de Geer and B ¨ uhlmann, 2013). Purely observ ational data identify a Mark ov equiv alence class, unless additional distributional assumptions are imp osed, suc h as non-gaussian errors (Shimizu et al., 2006) or equal error v ariances (P eters and B ¨ uhlmann, 2014). In terv entional data provide information to break Marko v equiv alence and enable full D A G iden tiﬁcation. When in terv entional targets are kno wn, Hauser and B ¨ uhlmann (2012) prop osed a score-based approac h that iden tiﬁes D AGs up to an in terv en tional Mark ov equiv alence class. W ang et al. (2017) developed a hybrid metho d that alternates b et ween structure optimization and conditional indep endence testing, while Xue et al. (2025) ex- tended con tin uous score-based optimization to in terven tional settings. These metho ds do 4 not allow latent confounding and rely on har d interv entions, under which the dep endence of the interv ention target on its paren ts is remov ed. Related work considering unkno wn in terv ention targets includes (Squires et al., 2020; Li et al., 2025; V arici et al., 2025). Sev eral lines of w orks study causal discov ery with latent confounding. A classical line of work studies causal disco very ov er acyclic directed mixed graphs (ADMGs), whic h en- co de conditional independence on the observ ed margin. Represen tativ e approac hes in- clude constraint-based metho ds such as FCI and its v arian ts (Spirtes et al., 2000; Colombo et al., 2012), score-based metho ds based on contin uous optimization (Bhattachary a et al., 2021), and approaches exploiting non-Gaussianit y (W ang and Drton, 2023). Another line of w ork imp oses structured assumptions on the error cov ariance and p erforms deconfounding through precision matrix decomp osition (F rot et al., 2019; Pal et al., 2025). Our metho d accommo dates laten t confounding without explicitly mo deling the confounding eﬀects. Most existing metho ds are not tailored to count-v alued data. Sev eral works study causal disco v ery for observ ational P oisson mo dels (Park and Raskutti, 2015; P ark and P ark, 2019) and zero-inﬂated Poisson mo dels (Choi and Ni, 2023), but they do not accoun t for laten t confounding or measurement error. A smaller b o dy of work considers DA G learning in the presence of measurement error (Zhang et al., 2018; Dai et al., 2022; Saeed et al., 2020), whic h is largely constraint-based and restricted to observ ational settings. Organization The remainder of the pap er is organized as follows. W e introduce the mo del, identiﬁabilit y results, and estimation pro cedure in Sections 2 - 4. Section 5 de- v elops theoretical guarantees, including estimation error b ounds and ﬁnite-sample DA G iden tiﬁcation results. Sim ulation studies are rep orted in Section 6, follo wed b y real data applications in Section 7. Section 8 concludes with a discussion. 5 2 Statistical Mo del for P erturb-seq Single Cell RNA- seq Data W e ﬁrst introduce some basic notations. Denote r p s “ 1 , . . . , p. F or a matrix B “ p B j k q P R p ˆ p , w e deﬁne the elemen twise max norm } B } max “ max j,k | B j k | , the sp ectral norm } B } 2 “ sup } x } 2 “ 1 } B x } 2 , matrix ℓ 1 norm } B } L 1 “ max 1 ď j ď q ř p i “ 1 | a ij | , the element wise ℓ 1 norm } B } 1 “ ř p i “ 1 ř p j “ 1 | a ij | , and the F robenius norm } B } F “ p ř j,k B 2 j k q 1 { 2 . W e denote by } B } 0 the n um b er of nonzero en tries in B , i.e., } B } 0 “ |tp j, k q : B j k ‰ 0 u| . W e consider gene expression count data collected under m ultiple interv entional environ- men ts generated b y CRISPR p erturbations. Let X p m q “ p X p m q 1 , . . . , X p m q p q J P R p denote the observ ed gene expression v ector in environmen t m P t 0 , 1 , . . . , M u , where m “ 0 corre- sp onds to the control (non-interv ened) condition. Each interv entional environmen t m ą 0 targets a single gene for p erturbation, and we denote by I p m q “ ℓ m P r p s the index of the p erturb ed gene in en vironmen t m . T o separate the measuremen t pro cess from the underlying expression pro cess in single- cell RNA sequencing data (Sark ar and Stephens, 2021), w e mo del the observed gene expres- sion coun ts through an explicit measuremen t lay er and a latent expression la yer. Sp eciﬁ- cally , for eac h environmen t m , the observ ed count X p m q j of gene j is mo deled as a Poisson random v ariable conditional on the library size L P R , the cell-level cov ariates C P R q , and the laten t true expression level Z p m q j : X p m q j | L, C , Z p m q j „ Poisson ` L exp t s j p C q ` Z p m q j u ˘ , j “ 1 , . . . , p. (1) Here, L accoun ts for sequencing depth, while s j p C q P R captures the systematic eﬀects of observ able co v ariates such as batc h eﬀects that ma y eﬀect the observed sequence coun ts for gene j . After adjusting for technical eﬀects and observ able cell heterogeneity , Z p m q j 6 represen ts the latent expression level. W e assume that Z p m q K p C , L q for all m P t 0 , . . . , p u , implying that the sequencing depth and p ossible batch eﬀects are indep endent of the ob- serv ed sequencing coun ts. T o characterize the laten t causal structure underlying gene expression, we consider a linear Gaussian structural causal mo del (SCM) on the laten t expression vector Z p m q “ p Z p m q 1 , . . . , Z p m q p q J P R p for eac h en vironmen t m P t 0 , . . . , p u . Sp eciﬁcally , we consider Z p m q “ AZ p m q ` η p m q ` ε p m q , ε p m q „ N ` 0 , Σ p m q e ˘ , p m “ 0 , . . . , p q , (2) η p m q “ η p 0 q ` α m e ℓ m , p m “ 1 , . . . , p q (3) where A P R p ˆ p is the causal co eﬃcient matrix, η p m q P R p is an en vironmen t-sp eciﬁc in tercept, α m P R is a scalar and e j P R p denotes the j -th canonical basis v ector. The sample size under the m -th en vironment is denoted b y n m . The shared co eﬃcien t matrix A in (2) induces a D AG G “ pr p s , E q on the v ertex set r p s “ t 1 , . . . , p u , where the edge set is deﬁned as E “ tp i, j q : A j i ‰ 0 u . In particular, a nonzero entry A j i corresp onds to a directed edge i Ñ j in G . W e assume that A is suc h that G is acyclic. Th us, A represen ts the laten t causal mec hanism and is assumed to be in v arian t across en vironments. More generally , for an y matrix A P R p ˆ p , we denote b y G p A q the directed graph induced b y A under this rule. In addition, model (2) assumes a me an shift b et ween the in terven tional and control en vironmen ts, captured by the con trast of the intercepts η p m q ´ η p 0 q . The exogenous noise ε p m q is mo deled as Gaussian with en vironment-speciﬁc cov ariance Σ p m q e , which ma y b e non- diagonal, allo wing p ossible latent confounding. Equation (3) further imp oses a one-sp arse mean-shift structure: only the intercept of the targeted gene ℓ m is directly shifted b y an amoun t α m , where | α m | quan tiﬁes the interv ention strength in en vironment m . The acyclicit y of A means there exists a p erm utation matrix P (a top ological order of 7 G ) such that P J AP is strictly low er triangular. In particular, I ´ A is in vertible and hence w e deﬁne B : “ p I ´ A q ´ 1 . Under this representation, the laten t v ariables satisfy Z p m q “ B ε p m q ` µ p m q , µ p m q : “ E r Z p m q s “ B η p m q . (4) Equation (4) shows that the one-sparse shift in η p m q can propagate through the causal system and induce dense changes in do wnstream genes. W e brieﬂy discuss the plausibility of the mo deling assumptions ab o ve and their con- nections to the literature. The in v ariance of A across environmen ts reﬂects the princi- ple of in v arian t causal mechanisms under interv entions, which is a standard premise in in terv ention-based causal disco v ery (Shen et al., 2025; Li et al., 2025). Mean-shift in terven tions constitute a class of soft interv entions that mo dify the condi- tional distribution of an interv ened target through shifts in its mean while preserving its dep endence on paren t v ariables, and ha v e b een widely used in interv entional causal dis- co v ery (Liu et al., 2025; Li et al., 2025). In con trast, hard or do-interv entions commonly assumed in existing interv entional causal discov ery metho ds (Hauser and B ¨ uhlmann, 2012; W ang et al., 2017; Xue et al., 2025) completely remov e the dep endence of the target on its paren ts. Soft in terven tions are often more realistic in practice, as biological p erturbations are typically partial and noisy rather than fully disruptive. The one-sparse form in (3) can b e seen as an sp ecial case of the so-called ‘sparse mechanism shift h yp othesis’ in (Sch¨ olk opf et al., 2021) and was similarly used in (Li et al., 2025). Moreo v er, the deterministic mean-shift assumption can b e generalized to a stochastic mean shift by introducing a random v ector ˜ η p m q P R p with mean η p m q , p ossibly allowing mild correlation with the error term. Our theoretical results extend naturally to this setting, since the iden tiﬁcation relies only on the ﬁrst-order momen t of ˜ η p m q . This stochastic form ulation is more realistic in practice, as it captures heterogeneous p erturbation eﬀects 8 across cells arising from v ariable p erturbation eﬃciency and incomplete p enetrance. 3 Iden tiﬁcation of the D A G Under Unmeasured Con- founders Under linear Gaussian SCMs, observ ational data alone only identify the causal structure up to a Marko v equiv alence class. By con trast, the in v ariance of the causal mechanism A across en vironmen ts, together with mean-shifts in the latent expression mo del, enables iden tiﬁcation of the underlying causal structure through cross-en vironment mean c hanges. A k ey observ ation is that the mean diﬀerence b et ween en vironment m and the control is ∆ µ p m q : “ µ p m q ´ µ p 0 q “ α m B e ℓ m , (5) whic h implies that ∆ µ p m q equals the ℓ m -th column of B up to a scalar factor. Moreo ver, under acyclicity , it is easy to verify that P J B P is unit low er triangular. This unit-diagonal constrain t ﬁxes the scale and sign of each column of B , so that the ℓ m -th column of B is iden tiﬁable from ∆ µ p m q whenev er α m ‰ 0. Provided that for each no de ℓ P r p s there exists at least one environmen t m with ℓ m “ ℓ and α m ‰ 0, stac king the recov ered columns yields iden tiﬁabilit y of B and hence of the causal mechanism A . It therefore suﬃces to iden tify the laten t means t µ p m q u M m “ 0 from the observ ed data. Under the Poisson–lognormal observ ation mo del, momen ts of the observed counts X p m q admit an explicit mapping to moments of the latent expression v ariables Z p m q . Sp eciﬁcally , letting S j p C, L q “ L exp p s j p C qq , the ﬁrst- and second-order momen ts of the scaled counts Y p m q j “ X p m q j { S j p C, L q are link ed to the momen ts of the latent Gaussian v ariable Z p m q j via µ p m q j “ log ` E r Y p m q j s ˘ ´ 1 2 Σ p m q j j , (6) 9 where Σ p m q j j “ log ˜ E “ Y p m q j p Y p m q j ´ 1 q ‰ ` E r Y p m q j s ˘ 2 ¸ . These relationships connect the mean of the latent expression mo del µ p m q to the observ able momen ts of the cov ariate-adjusted scaled counts t Y p m q j u j Pr p s . W e note that the co v ariate eﬀect s j p C q op erates solely at the measuremen t level and remains in v ariant across environ- men ts. As suc h, it can b e regarded as a n uisance that do es not aﬀect the identiﬁcation of the laten t causal structure. W e then hav e the follo wing iden tiﬁcation results. Theorem 3.1. Under the mo del (1) – (3) , the c ausal me chanism A is identiﬁable by jointly lever aging observational and interventional data, pr ovide d that e ach no de is subje ct to at le ast one non-vanishing one-sp arse me an-shift intervention in the latent expr ession mo del. Theorem 3.1 establishes identiﬁabilit y of the causal mechanism A at the population level b y jointly leveraging observ ational and interv entional data. Here, a non-vanishing mean- shift in terven tion refers to a p erturbation with mean-shift strength α m ‰ 0 in the latent expression mo del. In ﬁnite samples, successful reco v ery further requires the mean-shift magnitude | α m | to exceed a problem-dep endent low er bound, whic h dep ends on factors such as the graph density and the num b er of interv entional samples. A precise ch aracterization of this requirement is provided in Section 5. Compared with existing metho ds for causal disco v ery from interv entional data, our framew ork do es not require har d or do-type in terv entions as in (Hauser and B ¨ uhlmann, 2012; W ang et al., 2017; Xue et al., 2025). DOTEARS (Xue et al., 2025) also consid- ers linear SCMs, but assumes diagonal noise co v ariance and in v arian t exogenous v ariances for non-targeted no des across en vironmen ts, whereas our metho d allo ws non-diagonal and en vironmen t-sp eciﬁc cov ariance matrices. F or observ ational coun t data, existing identi- ﬁabilit y results either exclude measurement error (Park and P ark, 2019) or reco ver only 10 Mark o v equiv alence classes (Saeed et al., 2020). A k ey distinction is that our framew ork accommo dates latent confounding. In linear structural causal mo dels, latent confounding manifests as correlated errors (Pearl, 2009; Bhattac hary a et al., 2021), so in our setting directed edges are enco ded in the co eﬃcient matrix A , while bidirected dep endence corresponds to nonzero oﬀ-diagonal entries of the noise cov ariance matrix. F rom the p ersp ectiv e of mixed graphs, Theorem 3.1 sho ws that the directed edge structure remains identiﬁable even in the presence of laten t confounding. Unlik e approaches based on explicit cov ariance mo deling and deconfounding via precision matrix decomp osition (F rot et al., 2019; P al et al., 2025), our metho d exploits interv entional mean shifts and the fact that, under the mean-shift assumption, diﬀerences in laten t means are unaﬀected b y latent confounding, th us av oiding explicit mo deling of the confounding structure and no de-wise deconfounding pro cedures. 4 Estimation of Causal Co eﬃcien t Matrix and D A G Reco v ery In this section, we describ e the estimation pro cedure for the causal co eﬃcien t matrix A . The procedure consists of four main steps: (i) estimating and remo ving measuremen t- la y er eﬀects, (ii) mapping observ ed momen ts to latent means under the P oisson–lognormal mo del, (iii) reco vering the columns of B “ p I ´ A q ´ 1 via cross-en vironment mean shifts, and (iv) estimating A under sparsit y and acyclicit y constrain ts. The complete algorithm is summarized in Algorithm 1. W e begin b y estimating the co v ariate eﬀect s j p C q in the measuremen t la yer. In practice, s j p C q can b e estimated using a P oisson regression with oﬀset log L and predictors C . 11 Sp eciﬁcally , for each gene j , w e can estimate the co v ariate eﬀect s j p C q using a generalized linear mo del with log link and oﬀset log L , log E r X p m q ij | C i , L i s “ log L i ` β J j C i ` δ p m q j , (7) where δ p m q j are en vironmen t-sp eciﬁc intercepts that capture mean shifts and do not aﬀect iden tiﬁcation of β j . Let p β j denote the (quasi-)maximum lik eliho o d estimate, and deﬁne p s j p C i q “ C J i p β j . Using p s j p¨q , we can form the rescaled counts p Y p m q ij “ X p m q ij {p L p m q i exp p p s j p C p m q i qqq from whic h the laten t mean v ector p µ p m q is obtained by replacing the population exp ectation E r¨s in (6) with the empirical exp ectation p E r¨s as sp eciﬁed in (9). F or each interv entional en vironment m P t 1 , . . . , p u , w e compute laten t mean con trasts ∆ p µ p m q “ p µ p m q ´ p µ p 0 q and reconstruct the ℓ -th column of B by p b ℓ “ ∆ p µ p m q { ∆ p µ p m q ℓ where l “ I p m q . Collecting these columns yields p B “ r p b 1 , . . . , p b p s P R p ˆ p . Finally , to reco v er the causal matrix A , we estimate a sparse weigh ted adjacency ma- trix A under acyclicit y constraints. Given the estimate p B « p I ´ A q ´ 1 , we consider the constrained optimization problem p A P arg min A } A } 1 s.t. h p A q “ 0 , } p B p I ´ A q ´ I } max ď λ s , (8) where h p A q “ 0 enforces the D A G constraint, and the ℓ 8 constrain t controls the appro xi- mation error of p B p I ´ A q in a CLIME-t yp e regularization framework (Cai et al., 2011). W e include the acyclicity constrain t h p¨q to ensure that B is unit-diagonal. T ypical c hoices of h p¨q include h p A q “ tr ` exp p A ˝ A q ˘ ´ p as in NOTEARS (Zheng et al., 2018b), or h p A q “ ´ log | uI ´ A | ` p log u for u P R u as in DA GMA (Bello et al., 2022). T o obtain the estimated causal graph, we then apply entrywise hard thresholding to p A and form the thresholded estimator p A τ b y p p A τ q ij : “ ˆ a ij 1 ␣ | ˆ a ij | ě τ ( , for i, j P r p s . T o handle the non-smo oth ℓ 8 constrain t, we introduce an auxiliary v ariable Z “ p B p I ´ 12 Algorithm 1 PLN-intervn: Causal disco v ery from in terven tional P oisson coun t data Input: Observed data tp X p m q , L p m q , C p m q qu M m “ 0 , interv ention targets t I p m q “ ℓ m u M m “ 1 , reg- ularization lev el λ s , threshold level τ ą 0. 1. Measurement-la y er adjustmen t: Obtain p s j p C i q by ﬁtting (7) and compute scaled coun ts p Y p m q ij “ X p m q ij L` L p m q i exp t p s j p C p m q i qu ˘ . 2. Mapping observ ed momen ts to laten t means: Compute p µ p m q via the transfor- mation of empirical moments via p µ p m q j “ log ` p E r p Y p m q j s ˘ ´ 1 2 p Σ p m q j j , where p Σ p m q j j “ log ˜ p E r p Y p m q j p p Y p m q j ´ 1 qs p E r p Y p m q j s 2 ¸ . (9) 3. Column reco very of B : F or the environmen t m targeting ℓ “ I p m q : reconstruct the ℓ -th column of B by p b ℓ “ ∆ p µ p m q { ∆ p µ p m q ℓ . Collect the columns into a matrix to obtain p B “ r p b 1 , . . . , p b p s P R p ˆ p . 4. Sparse D A G estimation of A : Obtain p A b y solving (8). Output: Estimated causal matrix p A and recov ered graph p G “ G p p A τ q . A q ´ I and rewrite problem (8) equiv alen tly as min A,Z } A } 1 s.t. Z “ p B p I ´ A q ´ I , } Z } max ď λ s , h p A q “ 0 . (10) This reform ulation separates the non-smo oth constraint from the acyclicity constraint and facilitates eﬃcien t optimization. W e solve (10) using an augmented Lagrangian approac h com bined with the alternating direction metho d of multipliers (ADMM). In tro ducing a dual v ariable V P R p ˆ p for the linear constrain t Z “ p B p I ´ A q ´ I , the augmented Lagrangian is giv en b y L ρ p A, Z , V q “ } A } 1 ` α h p A q ` µ 2 h p A q 2 ` ρ 2 › › Z ´ ` p B p I ´ A q ´ I ˘ ` V ρ › › 2 F ´ 1 2 ρ } V } 2 F , (11) where ρ ą 0 is the ADMM p enalt y parameter, and α, µ ą 0 are the Lagrange multiplier and quadratic p enalt y co eﬃcient asso ciated with the acyclicit y constrain t h p A q “ 0. 13 ADMM Up dates. Given the augmen ted Lagrangian (11), the ADMM iterations pro- ceed by alternating up dates of the primal v ariables Z and A , follow ed by a dual up date of V . A t iteration t ` 1, the up dates are: Z p t ` 1 q “ Π }¨} max ď λ s ` p B ´ p B A p t q ´ I ´ 1 ρ V p t q ˘ , A p t ` 1 q “ arg min A ρ 2 } p B A ´ C p t ` 1 q } 2 F ` α h p A q ` µ 2 h p A q 2 ` } A } 1 , where C p t ` 1 q “ p B ´ I ´ Z p t ` 1 q ´ 1 ρ V p t q , V p t ` 1 q “ V p t q ` ρ ` Z p t ` 1 q ´ p p B ´ p B A p t ` 1 q ´ I q ˘ . (12) Here Π }¨} max ď λ s denotes the element wise pro jection onto the ℓ 8 ball of radius λ s , imple- men ted via element wise truncation to r´ λ s , λ s s . The up date of A p t q is con vex apart from the acyclicity constraint and is solved using a pro ximal gradien t metho d. The iteration terminates when the primal and dual residual norms, relativ e change in A , and the acyclic- it y violation | h p A q| fall b elo w presp eciﬁed tolerances. More implementation details are pro vided in Supplemen tary Material Section A.1. Choice of λ s . The sparsit y parameter λ s is selected using a pseudo-BIC (PBIC) criterion. F or each candidate λ s , w e deﬁne the criterion PBIC p λ s q “ } p B p I ´ p A p λ s qq ´ I } 2 F } p B ´ I } 2 F ` C w ¨ } p A p λ s q} 0 p p p ´ 1 q{ 2 , (13) where C w ą 0 con trols the trade-oﬀ b etw een mo del ﬁt and sparsit y . In our implemen tation w e set C w “ 2 and select λ s b y minimizing PBIC p λ s q o v er a grid of candidate v alues. 5 Theoretical Prop erties In this section, we inv estigate the estimation error of the causal co eﬃcient matrix and the graph recov ery consistency for the prop osed estimator in Section 4. W e ﬁrst introduce additional notations. 14 Let A ˚ P R p ˆ p denote the true co eﬃcient matrix of the laten t Gaussian structural causal mo del, B ˚ : “ p I ´ A ˚ q ´ 1 , and G ˚ “ G p A ˚ q . Let µ p m q˚ “ B ˚ η p m q denote the true latent mean. F or τ ą 0, deﬁne the entrywise hard-thresholded matrix A τ b y p A τ q ij : “ a ij 1 ␣ | a ij | ě τ ( , 1 ď i, j ď p . Deﬁne n min “ min m Pr p sYt 0 u n m . W e make the follo wing assumptions. Assumption 5.1 ( Moments ) . Supp ose ther e exist some c onstants r, c, δ, c min ą 0 such that (i) p ď cn r , (ii) E | Y p m q j | 4 ` 8 r ` δ ď K , and (iii) E r Y p m q j s ě c min for j P r p s and m P r p s Y t 0 u . Assumption 3.1 imposes momen t conditions on the scaled observ ed P oisson coun ts to ensure w ell-b eha ved concen tration of momen t estimators. It is mild for P oisson-lognormal mo dels, as Gaussian laten t v ariables imply ﬁnite moments for the P oisson rates and the scaled counts; similar conditions are standard in related work (Cai et al., 2011; Xiao et al., 2022). The lo wer b ound E r Y p m q j s ě c min excludes degenerate v ariables and ensures stable momen t estimation. Assumption 5.2 ( Mean Shift Strength ) . L et t ą 0 b e some smal l c onstant. Assume that the str ength of the me an shift is str ong enough, that is, min m Pr p s | α m | ě ∆ min , wher e ∆ min ě c 1 a log p { n min and c 1 “ 8 a p 2 K ` 1 qp t ` 3 q . Assumption 3.2 sp eciﬁes a minimal in terven tion strength for ﬁnite-sample identiﬁabilit y , serving as a ﬁnite-sample counterpart of p opulation-level mean-shift conditions in literature of interv ention-based causal disco very . The requirement ∆ min Á a log p { n min matc hes the in trinsic noise lev el and is therefore statistically necessary . Moreov er, it is empirically v eriﬁable since ∆ µ p m q ℓ m can b e estimated from sample means across en vironments. The follo wing lemma establishes a uniform tail b ound for the estimation error of the sample means across all v ariables and en vironmen ts. 15 Lemma 5.1. Under Assumption 5.1, it holds that P ˜ max m,j | ˆ µ p m q j ´ µ p m q˚ j | ě c 1 4 c min c log p n min ¸ “ O p n ´ δ 4 min ` p ´ r q . Theorem 5.2 provides ﬁnite-sample error b ounds for estimating the laten t causal co- eﬃcien t matrix A ˚ under in terv entional Poisson coun t data. The en trywise b ound } p A ´ A ˚ } max “ O ´ p c min ∆ min q ´ 1 a log p { n min ¯ is the key rate for edge-wise accuracy and supp ort reco v ery , while the sp ectral and F rob enius b ounds further quantify global estimation error under w eak row-sparsit y s 0 p p q for q P r 0 , 1 q . The app earance of ∆ min in the denominator reﬂects the impact of mean-shift signal strength on estimating the co eﬃcient matrix A , as weak er mean shifts reduce the separability b et ween control and in terv entional data and necessarily inﬂate the estimation error. The dep endence on s 0 p p q captures the eﬀect of graph densit y , with larger s 0 p p q corresp onding to denser underlying graphs and leading to increased estimation error in the sp ectral and F rob enius norms. Theorem 5.2 ( Estimation ) . Assume that } I ´ A ˚ } L 1 ď M for some lar ge M ą 0 and A ˚ is we akly sp arse in the sense that max j ř p k “ 1 | a ˚ j k | q ď s 0 p p q for some 0 ď q ă 1 . Under Assumptions 5.1 and 5.2, and by cho osing λ s “ c 1 M {p c min ∆ min q a log p { n min , it holds that } p A ´ A ˚ } max ď 4 c 1 M 2 c min ∆ min c log p n min , } p A ´ A ˚ } 2 ď c 2 p 4 M q 1 ´ q p c 1 M c min ∆ min c log p n min q 1 ´ q s 0 p p q , } p A ´ A ˚ } 2 F p ď c 2 p 4 M q 2 ´ q p c 1 M c min ∆ min c log p n min q 2 ´ q s 0 p p q , with pr ob ability gr e ater than 1 ´ O p n ´ δ { 4 min ` p ´ t q , wher e c 2 “ 1 ` 2 1 ´ q ` 3 1 ´ q . T o characterize the reconstructabilit y of the underlying DA G, we deﬁne the index ζ : “ ζ p A ˚ q “ inf A : G p A τ q‰ G ˚ , h p A q“ 0 , } A } 1 ď} A ˚ } 1 } B ˚ p I ´ A q ´ I } max . 16 The index ζ p A ˚ q measures the minimal separation b et ween the true mo del and any alter- nativ e co eﬃcien t matrix whose thresholded graph diﬀers from G ˚ , and is closely related in spirit to the degree of reconstructabilit y introduced in Y uan et al. (2019). Assumption 5.3 ( D A G Reconstructability ) . Assume that ζ ě c 3 a log p { n min , wher e c 3 “ p c 1 ` 2 ? 2 q M {p c min ∆ min q . Deﬁne a min : “ min p j,k q : A ˚ j k ‰ 0 | A ˚ j k | . A suﬃcient condition for Assumption 5.3 is that a min ě 2 τ and τ ě p 1 ` M q c 3 a log p { n min , where τ is the hard-threshold level. It is essen tially a b eta-min t yp e condition similarly used in (Cai et al., 2011; v an de Geer and B ¨ uhlmann, 2013; Y uan et al., 2019; P ark and P ark, 2019). Remark 5.1. We r emark that the b eta-min c ondition is not dir e ctly c omp ar able to the str ong faithfulness assumption; the former is a lo c al signal str ength c ondition c ommonly use d in high-dimensional settings, wher e as the latter is a glob al distributional assumption. Se e van de Ge er and B¨ uhlmann (2013) for detaile d discussion. Theorem 5.3 ( Graph Reco v ery ) . Denote κ “ p ζ ´ λ s q c min ∆ min { M wher e λ s is deﬁne d in The or em 5.2. Under Assumptions in The or em 5.2, it holds that P p p G “ G ˚ q ą P p sign p p A τ q “ sign p A ˚ qq ą 1 ´ c 4 exp ˜ ´ n min 4 min # κ 2 K , 3 κ d p log p q 3 n min + ` 2 log p ¸ , for some c onstant c 4 ą 4 . F urther, under Assumption 5.3 and as n min Ñ 8 , we have P p p G “ G ˚ q ą P p sign p p A τ q “ sign p A ˚ qq Ñ 1 . Theorem 5.3 establishes ﬁnite-sample recov ery of the latent causal D A G via thresholding the estimated co eﬃcien t matrix. The con vergence probabilit y is go verned b y the degree of D AG reconstructability ζ and the mean-shift strength ∆ min through κ : larger v alues of either quan tity accelerate con vergence of P p p G “ G ˚ q to one, while w eak er reconstructabilit y or mean shifts slow down reco very . 17 6 Sim ulation Ev aluations 6.1 Setup Data generation. W e consider a mean-shift setting of the laten t linear Gaussian struc- tural mo del described in Section 2, and examine the p erformance of diﬀeren t methods under diﬀeren t graph densities and mean-shift strengths. The underlying causal mechanism A was generated from an Erd˝ os–R´ enyi mo del under a random top ological ordering to ensure acyclicit y . F or each ordered pair of nodes i ă j , a directed edge i Ñ j was added indep enden tly with probabilit y such that the exp ected a v erage no de degree of d , where d “ 2 for the low-densit y setting and d “ 2 for the high-densit y setting. Nonzero edge weigh ts w ere dra wn uniformly from ˘r 0 . 3 , 0 . 6 s . The in tercept v ector in the control data w as ﬁxed as η p 0 q “ p´ 0 . 5 , . . . , ´ 0 . 5 q . F or in terv entional data, we consider setting the mean shift strength at diﬀeren t levels, α m P t´ 2 , ´ 4 u . Library sizes were sampled indep endently as L i „ Lognormal p log 10 , 0 . 1 2 q . F or simplicit y , no additional co v ariates C i w ere included. T o in tro duce laten t confounding, we consider p erv asive low-rank confounders by set- ting the n um b er of unmeasured factors to n uc “ 2 and generating a loading matrix U P R p ˆ n uc with i.i.d. en tries from N p 0 , σ 2 c q , where σ c con trols the confounding strength. The noise co v ariance in all environmen ts is sp eciﬁed as Σ p 0 q e “ diag p Uniform p 0 . 5 , 0 . 8 qq ` U U J , where the lo w-rank term U U J induces residual correlations across genes. W e consider σ c P t 0 , 0 . 25 , 0 . 5 u , graph dimension p P t 50 , 100 u , control sample size n 0 “ 5000, and in terv entional sample size n m P t 100 , 200 , 500 u for eac h m P r p s . Comp eting metho ds. W e compare the prop osed metho ds with a range of existing ap- proac hes for DA G learning. NOTEARS (Zheng et al., 2018a), LiNGAM (Shimizu et al., 18 2006), and MRS-P oissonSEM (P ark and Park, 2019) estimate DA Gs from observ ational data, among which MRS-P oissonSEM is sp eciﬁcally designed for P oisson count data, while the others rely on contin uous-v alued assumptions. DOTEARS (Xue et al., 2025), GIES (Hauser and B ¨ uhlmann, 2012), and IGSP (W ang et al., 2017) lev erage in terven tional data for causal identiﬁcation. Anchor-PLN-PC (Saeed et al., 2020) accommo dates the P oisson log-normal mo del while explicitly accounting for measurement error and sample heterogene- it y . It ﬁrst estimates an undirected dep endency graph and then applies the PC algorithm to recov er the corresponding Marko v equiv alence class of the underlying D AG. Finally , PLN-in tervn is the prop osed metho d that exploits in terven tional Poisson log-normal data. The prop osed metho ds and Anchor-PLN-PC explicitly mo del library size eﬀects and w ere thus applied directly to the ra w coun t data. F or comp eting methods without an explicit library size comp onent, coun ts were ﬁrst normalized b y the library size L . F urther- more, except for MRS-P oisson-SEM, metho ds based on approximate con tin uous assump- tions w ere applied to log p 1 ` x q -transformed normalized coun ts. F or all metho ds, the estimated adjacency matrix p A was thresholded at τ “ 0 . 01, yield- ing p A τ as the ﬁnal estimate. Additional sim ulations examining sensitivit y to diﬀeren t thresholds are rep orted in Supplementary Material Section A.2. Ev aluation metrics. T o assess graph reco very p erformance, w e ev aluate edge-lev el accu- racy using precision, recall, and the F1 score, and structural accuracy using the Structural Hamming Distance (SHD). Let TP , FP , FN, and TN denote the n umbers of true p ositives, false p ositives, false negatives, and true negatives of the directed edge estimation. The pre- cision and recall are deﬁned as Precision “ TP {p TP ` FP q , and Recall “ TP {p TP ` FN q . The F1 score is the harmonic mean of precision and recall, F1 “ 2 Precision ˆ Recall Precision ` Recall “ 2 TP 2 TP ` FP ` FN , 19 whic h provides an edge-level measure of graph reco very accuracy b y balancing false dis- co v eries and missed edges. The structural Hamming distance (SHD) is a global metric that counts the num b er of edge additions, deletions, and reversals needed to transform the estimated graph in to the true graph, with smaller v alues indicating more accurate reco very . 6.2 Results W e rep ort in the main pap er the F1 score and SHD under mo derate laten t confounding ( σ c “ 0 . 25) with graph dimension p “ 50. Additional results are deferred to Supplemen- tary Material Section A.2, including Precision and Recall under this setting, robustness to diﬀeren t confounding levels, sim ulations in higher dimensions, and sensitivity to threshold c hoices. Figures 1 and 2 summarize the p erformance of diﬀeren t methods across v arying graph densities, mean-shift strengths, and interv entional sample sizes, measured by F1 score and SHD, resp ectively . W e highlight sev eral key observ ations b elo w. First, the p erformance of the proposed method improv es systematically with stronger mean shifts, lo wer graph density , and larger interv entional sample sizes, as reﬂected by higher F1 scores and low er SHD. These trends are consistent with the theoretical results in Theorem 5.3, where larger mean-shift strength ∆ min , larger sample size n min , and smaller M “ } I ´ A ˚ } L 1 lead to faster conv ergence of the estimated graph to the true DA G. Second, compared with observ ational-data-based methods, PLN-in tervn consisten tly outp erforms NOTEARS, LiNGAM, and MRS-P oisson-SEM across nearly all regimes in terms of b oth F1 score and SHD. In more challenging settings characterized by high graph densit y and w eak mean shifts, MRS-Poisson-SEM can ac hieve slightly higher F1 scores than the prop osed metho d when the in terven tional sample size is small. How ever, this marginal adv an tage disapp ears once the in terven tional sample size b ecomes mo derate. Moreo ver, 20 sensitivit y analysis in Supplementary Material Section A.2.5 sho ws that MRS-P oisson- SEM is m uch more sensitive to the threshold τ than our metho d, and the rep orted results use a τ fav orable to MRS-Poisson-SEM. Figure 1: F1 score of diﬀerent methods under diﬀerent graph densities, diﬀerent mean shift strengths, and v arying sample sizes of in terv entional data. Third, w e observ e that in this setting, the in terven tional-data-based metho ds, including DOTEARS, GIES, and IGSP , can ev en underp erform the observ ation-based MRS-Poisson- SEM. This phenomenon may b e attributed to tw o factors. First, these metho ds are pri- marily designed for hard or do-type in terv entions, and their p erformance may degrade under the soft mean-shift in terven tions considered here. Second, these approaches do not explicitly mo del the coun t-v alued nature of the data, whic h can lead to inferior p erfor- mance compared to MRS-Poisson-SEM, despite the latter relying only on observ ational 21 data. This observ ation further highligh ts the imp ortance of appropriately mo deling count data in causal discov ery for p erturbation studies. Figure 2: SHD of diﬀerent methods under diﬀeren t graph densities, diﬀerent mean shift strengths, and v arying sample sizes of in terv entional data. Finally , although Anchor-PLN-PC explicitly accounts for count-v alued data and mea- suremen t error, its F1 score remains relatively low in our simulations. A p otential expla- nation is that the PC algorithm requires estimating the co v ariance matrix of the laten t v ariables in (2), which can b e unstable in high-dimensional settings. Moreov er, ev en with a reliable co v ariance estimate, PC-based metho ds are only guaranteed to recov er a Marko v equiv alence class. In con trast, by lev eraging interv entional data, the prop osed metho d is able to identify the full causal D A G, leading to impro v ed structural reco very . 22 7 Application to P erturb-seq dataset W e analyze a single-cell CRISPR p erturbation dataset from the Perturb-seq study of Re- plogle et al. (2022), fo cusing on the K562 chronic my elogenous leukemia cell line. In this exp erimen t, K562 cells expressing a CRISPR interference (CRISPRi; dCas9–KRAB) sys- tem were sub jected to p o oled p erturbations targeting appro ximately 2,000–2,500 essential genes. These genes are deﬁned as those required for cell viability and proliferation, and are enriched for core cellular pro cesses such as transcription, translation, RNA pro cessing, and cell cycle regulation. Single-cell RNA sequencing with guide capture w as used to link eac h p erturbation to its transcriptomic resp onse, enabling systematic characterization of gene regulatory eﬀects at single-cell resolution. After prepro cessing to remov e genes with insuﬃcient p erturb ed-cell counts, extreme sparsit y , or weak p erturbation eﬀects, 794 genes remain, of which we retain the 200 most v ariable genes in con trol cells for our analysis. The resulting dataset con tains 10,691 con- trol cells and an a verage of 194 p erturb ed cells p er in terv ention, with a minimum of 101 p erturb ed cells p er gene. T o construct a biologically informed reference graph, we emplo y ed a transcription fac- tor–target interaction netw ork deriv ed from ChIP-seq data (Chev alley et al., 2025), with in teractions restricted to those relev ant to or observ ed in the K562 cell line. Within the selected 200 genes, this reference net work con tains 638 directed edges, with a graph densit y 0.016. W e compare the prop osed metho d with the comp eting approaches describ ed in Sec- tion 6. Detailed prepro cessing and implementation steps are provided in Supplemen tary Material Section B.1. The ChIP-seq–deriv ed transcription factor–target netw ork provides a useful but inher- en tly limited reference for ev aluating the learned D A G from p erturb-seq data. While it 23 oﬀers indep enden t evidence of regulatory relationships, it reﬂects ph ysical TF–DNA bind- ing rather than functional or causal eﬀects on gene expression. Thus, agreement supp orts biological plausibilit y , particularly for direct in teractions, whereas discrepancies ma y arise from con text-sp eciﬁc regulation, indirect eﬀects captured by p erturb-seq, or incomplete and noisy ChIP-seq cov erage. Accordingly , this comparison should b e view ed as a comple- men tary , rather than deﬁnitiv e, v alidation of causal structure. 7.1 Globle reco v ery Figure 3 presents the precision and recall of diﬀerent metho ds as functions of the threshold parameter τ applied to p A τ , with τ v arying from 0.15 to 0.001 in steps of 0.001. F or constrain t-based metho ds, including IGSP and Anc hor-PLN, the signiﬁcance lev el is ﬁxed at 0.05, and the results are shown as single p oints. The left v ertical axis rep orts precision and the right vertical axis reports recall. Supplementary Material Section B.4 pro vides a complemen tary no de-lev el ev aluation, where precision and recall are computed separately for eac h source gene. Figure 3 rev eals three main ﬁndings. First, the prop osed metho d consistently achiev es the highest precision across the entire range of thresholds, with its precision curve uniformly dominating those of all comp eting metho ds. When τ ă 0 . 1, its recall also substantially exceeds that of other approaches. In contrast, comp eting metho ds fail to surpass the densit y baseline in precision and exhibit limited recall. These results suggest that the prop osed metho d more eﬀectiv ely exploits in terv en tional information to reco v er regulatory edges. Second, among the comp eting approaches, MRS-PoissonSEM and IGSP p erform com- parativ ely b etter than others. MRS-PoissonSEM attains slightly higher precision within a narro w threshold range, whereas IGSP achiev es higher recall. LiNGAM, DOTEARS, and 24 NOTEARS exhibit mo derate precision but generally low recall, while Anc hor-PLN fails to iden tify edges on this dataset. Figure 3: Precison-Recall curves of diﬀeren t metho ds on ChIP-seq dataset. Third, the prop osed metho d demonstrates relativ ely stable b ehavior with resp ect to threshold v ariation. As the threshold decreases, recall increases substan tially and steadily while precision remains largely stable. Con versely , as the threshold increases, precision gradually impro ves, indicating that edges selected under stricter thresholds tend to b e more informativ e and reliable. In contrast, metho ds suc h as MRS-PoissonSEM and DOTEARS are relativ ely more sensitive to threshold c hoice. 7.2 Distributional shifts in downstream and upstream genes T o examine whether the inferred causal directions are supp orted b y in terv entional re- sp onses, we analyzed distributional shifts following gene p erturbations based on the es- timated causal graph G p p A τ q by diﬀeren t metho ds. F or each p erturb ed gene, w e iden tiﬁed its upstream (ancestors) and downstream (descendants) genes according to the directed 25 structure of G p p A τ q . W e use τ “ 0 . 15 for the prop osed metho d and τ “ 0 . 03 for the com- p eting methods, selected based on Figure 3 to balance precision and recall and main tain comparable net work sparsity . W e then p erformed tw o-sample Kolmogoro v–Smirnov (KS) tests comparing gene expression b et ween the corresp onding p erturbation en vironment and the control condition for all other genes, and partitioned the resulting KS p -v alues into up- stream and do wnstream groups. Under a correctly inferred causal structure, do wnstream genes are exp ected to exhibit stronger distributional shifts, reﬂected b y smaller KS p -v alues. Figure 4: Distributional shifts quan tiﬁed b y ´ log 10 p -v alues from t wo-sample Kolmogoro v– Smirno v (KS) tests comparing p erturb ed and con trol conditions. Upstream (ancestors) and downstream (descendan ts) genes are deﬁned according to the estimated causal graph G p p A τ q obtained from obtained from seven diﬀerent methods. The upp er panel p o ols all gene–p erturbation pairs and the lo wer panel summarizes medians at the perturbation lev el. Figure 4 presen ts the results for diﬀerent metho ds at t wo complemen tary lev els. The 26 upp er panel p o ols all p erturbation–gene pairs and compares the distributions of ´ log 10 KS p -v alues b et ween do wnstream and upstream genes for eac h metho d. F or the pro- p osed metho d, do wnstream genes displa y larger v alues than upstream genes, indicating stronger p erturbation-induced shifts. In con trast, for alternative methods, the separa- tion b etw een downstream and upstream distributions is substan tially weak er. Notably , for MRS-P oissonSEM, upstream genes exhibit slightly larger ´ log 10 KS p -v alues than do wn- stream genes on av erage, suggesting rev ersal or misidentiﬁcation of causal directions. This observ ation is consisten t with the netw ork structure shown in Supplementary Section B.3, where MRS-P oissonSEM tends to pro duce graphs dominated by high in-degree no des. The lo wer panel of Figure 4 further assesses p erturbation-level consistency b y summa- rizing eac h p erturbation separately . F or each metho d and p erturbation, we compare the median ´ log 10 KS p -v alues of its downstream and upstream gene sets. Eac h p oint corre- sp onds to one p erturbation. F or the prop osed metho d, most p oin ts lie ab ov e the diagonal line and are lo cated tow ard the upp er-left region of the panel, indicating that downstream genes consistently exhibit stronger distributional shifts across p erturbations. In con trast, for the comp eting metho ds, many p oin ts cluster near the origin, suggesting weak or indis- tinguishable distributional diﬀerences b etw een upstream and downstream gene sets. 7.3 Inferred gene regulatory netw ork Figure 5 visualizes the directed gene regulatory net w ork inferred from the estimated co ef- ﬁcien t matrix p A τ obtained b y the proposed PLN-intervn metho d with threshold τ “ 0 . 2. Eac h no de represents a gene, and eac h directed edge corresp onds to a regulatory relation- ship inferred from the p erturbation data. Node size reﬂects the total num b er of incident edges, capturing the o v erall connectivity of eac h gene in the net work. No de color enco des 27 the relative balance betw een outgoing and incoming regulation, quan tiﬁed b y the ratio d out {p d out ` d in q , where d out and d in denote the out-degree and in-degree of the corresp ond- ing gene, resp ectively . Edges highlighted in red indicate regulatory relationships that are supp orted b y the ChIP-seq dataset. W e also pro vide inferred gene regulatory netw orks b y other candidate metho ds in Supplementary Material Section B.3. Figure 5: Gene regulatory netw ork inferred b y PLN-intervn ( τ “ 0 . 2). No de size indicates connectivit y , and no de color indicates the out-/in-degree ratio. Insp ection of the inferred netw ork rev eals a small num b er of h ub genes with markedly large out-degrees, including HSP A5, BOP1, UBTF, GA T A1, and SF3B1. Among these, UBTF and GA T A1 are well-established transcriptional regulators (F erreira et al., 2005; Sanij et al., 2015), and man y of their outgoing edges are supp orted by ChIP-seq data, lending supp ort to the biological plausibility of the inferred structure. Notably , among the 28 relativ ely few true edges identiﬁed by comp eting metho ds suc h as MRS-PoissonSEM and GIES, sev eral outgoing edges from UBTF and GA T A1 are also reco v ered. HSP A5 emerges as the most prominent h ub gene, which is biologically plausible given its cen tral role in regulating stress-resp onse and cell-cycle programs. Its strongest inferred target, HSP90B1, is lik ewise inv olved in endoplasmic reticulum protein folding, further supp orting this reg- ulatory relationship (Steiner, 2022). More detailed biological interpretation is provided in Supplemen tary Material Section B.2. 8 Discussion This pap er studies causal discov ery from in terv entional coun t data motiv ated by single-cell p erturbation experiments suc h as Perturb-seq. These data present k ey challenges for ex- isting metho ds, including discrete observ ations, substan tial measurement noise, and laten t cellular heterogeneity . Most current approaches assume con tin uous data with indep endent noise, limiting their applicabilit y in this setting. W e address these challenges by prop osing a laten t structural framew ork that separates the causal mec hanism on laten t gene expres- sion from the Poisson measuremen t pro cess. Under a mean-shift in terven tion design, we establish iden tiﬁability of the latent causal structure through cross-en vironment shifts in laten t means. W e further develop an estimation pro cedure that com bines momen t-based reco v ery of latent mean shifts with sparse D A G estimation, and provide ﬁnite-sample guar- an tees. Sim ulations and analysis of P erturb-seq data demonstrate impro ved p erformance when b oth coun t structure and interv entional design are prop erly accoun ted for. Sev eral extensions are p ossible. First, the framew ork is not limited to the Poisson–lognormal mo del; it can b e extended to other count mo dels, suc h as zero-inﬂated or drop out-aw are P oisson mo dels (Saeed et al., 2020), pro vided suitable moment relationships are a v ail- 29 able. Second, while w e focus on mean-shift interv entions, other types ma y also b e infor- mativ e. In particular, v ariance-shift interv entions that mo dify the noise co v ariance, e.g., Σ p m q e “ Σ p 0 q e ` β m e ℓ m e J ℓ m , could enable iden tiﬁcation via second-order con trasts, since the leading eigen vector of Σ p m q e ´ Σ p 0 q e reco v ers the ℓ m -th column of B up to scale. This raises the question of whether identiﬁabilit y results analogous to Theorem 3.1 can b e established in this setting. Ho wev er, such approaches typically require larger sample sizes to estimate co v ariance structures reliably , lea ving open the question of ho w to develop metho ds suitable for small-sample regimes. Data Av ailabilit y The data that supp ort the ﬁndings of this study are op enly a v ailable at https://gwps.wi.mit.edu. Ac kno wledgemen ts This researc h is supp orted b y NIH gran ts GM129781 and HG013841. Supplemen tary Materials The supplemen tary material includes additional sim ulation details and ev aluations, detailed results for real data application and all the pro ofs. References Bello, K., B. Aragam, and P . K. Ravikumar (2022). Dagma: Learning directed acyclic graphs with gradien t-based metho ds. In Pr o c e e dings of the 39th International Confer enc e 30 on Machine L e arning (ICML) , pp. 1777–1793. PMLR. Bhattac hary a, R., T. Nagara jan, D. Malinsky , and I. Shpitser (2021). Diﬀeren tiable causal disco v ery under unmeasured confounding. In International Confer enc e on Artiﬁcial In- tel ligenc e and Statistics , pp. 2314–2322. PMLR. Cai, T. T., W. Liu, and X. Luo (2011). Constrained ℓ 1 minimization for sparse precision matrix estimation. Journal of the Americ an Statistic al Asso ciation 106 (494), 594–607. Chev alley , M., Y. H. Ro ohani, A. Mehrjou, J. Lesk ov ec, and P . Sch wab (2025). A large-scale b enc hmark for netw ork inference from single-cell p erturbation data. Communic ations Biolo gy 8 (1), 412. Chic k ering, D. M. (2002). Optimal structure identiﬁcation with greedy search. Journal of machine le arning r ese ar ch 3 (No v), 507–554. Choi, J. and Y. Ni (2023). Mo del-based causal discov ery for zero-inﬂated coun t data. Journal of Machine L e arning R ese ar ch 24 (200), 1–32. Colom b o, D., M. H. Maath uis, M. Kalisch, and T. S. Ric hardson (2012). Learning high- dimensional directed acyclic graphs with latent and selection v ariables. The A nnals of Statistics , 294–321. Dai, H., P . Spirtes, and K. Zhang (2022). Indep endence testing-based approach to causal disco v ery under measuremen t error and linear non-gaussian mo dels. A dvanc es in Neur al Information Pr o c essing Systems 35 , 27524–27536. F erreira, R., K. Ohneda, M. Y amamoto, and S. Philipsen (2005). Gata1 function, a paradigm for transcription factors in hematop oiesis. Mole cular and Cel lular Biol- o gy 25 (4), 1215–1227. 31 F rot, B., P . Nandy , and M. H. Maathuis (2019). Robust causal structure learning with some hidden v ariables. Journal of the R oyal Statistic al So ciety Series B: Statistic al Metho dol- o gy 81 (3), 459–487. Hauser, A. and P . B ¨ uhlmann (2012). Characterization and greedy learning of in terven tional mark o v equiv alence classes of directed acyclic graphs. The Journal of Machine L e arning R ese ar ch 13 (1), 2409–2464. Heinze-Deml, C., M. H. Maathuis, and N. Meinshausen (2018). Causal structure learning. A nnual R eview of Statistics and Its Applic ation 5 (1), 371–391. Li, J., B. B. Ch u, I. F. Scheller, J. Gagneur, and M. H. Maath uis (2025). Ro ot cause disco v ery via p ermutations and cholesky decomp osition. Journal of the R oyal Statistic al So ciety Series B: Statistic al Metho dolo gy , qk af066. Liu, E., J. Zhang, and C. Uhler (2025). Learning genetic p erturbation eﬀects with v aria- tional causal inference. bioRxiv , 2025–06. P al, S., J. O’quinn, K. Aryan, H. Pua, J. P . Long, and A. Asiaee (2025). Dag decoration: Con tin uous optimization for structure learning under hidden confounding. arXiv pr eprint arXiv:2510.02117 . P ark, G. and S. Park (2019). High-dimensional p oisson structural equation mo del learning via l 1 -regularized regression. Journal of Machine L e arning R ese ar ch 20 (95), 1–41. P ark, G. and G. Raskutti (2015). Learning large-scale p oisson dag mo dels based on o verdis- p ersion scoring. A dvanc es in neur al information pr o c essing systems 28 . P earl, J. (2009). Causality: Mo dels, R e asoning, and Infer enc e (2 ed.). Cambridge: Cam- bridge Univ ersit y Press. 32 P eters, J. and P . B ¨ uhlmann (2014). Iden tiﬁability of gaussian structural equation mo dels with equal error v ariances. Biometrika 101 (1), 219–228. Replogle, J. M., R. A. Saunders, A. N. P ogson, J. A. Hussmann, A. Lenail, A. Guna, L. Mascibroda, E. J. W agner, K. Adelman, G. Lithwic k-Y anai, et al. (2022). Map- ping information-ric h genot yp e-phenotype landscap es with genome-scale p erturb-seq. Cel l 185 (14), 2559–2575. Saeed, B., A. Bely aev a, Y. W ang, and C. Uhler (2020). Anc hored causal inference in the presence of measuremen t error. In Confer enc e on unc ertainty in artiﬁcial intel ligenc e , pp. 619–628. PMLR. Sanij, E., J. Diesch, A. Lesmana, G. Poortinga, N. Hein, G. Lidgerwoo d, D. P . Cameron, J. Ellul, G. J. Go o dall, L. H. W ong, A. S. Dhillon, N. Hamdane, L. I. Roth blum, R. B. P earson, I. Haviv, T. Moss, and R. D. Hannan (2015). A no vel role for the p ol i tran- scription factor ubtf in maintaining genome stabilit y through the regulation of highly transcrib ed p ol ii genes. Genome R ese ar ch 25 (2), 201–212. Sark ar, A. and M. Stephens (2021). Separating measuremen t and expression mo dels clariﬁes confusion in single-cell rna sequencing analysis. Natur e genetics 53 (6), 770–777. Sc h¨ olkopf, B., F. Lo catello, S. Bauer, N. R. Ke, N. Kalch brenner, A. Goy al, and Y. Bengio (2021). T ow ard causal represen tation learning. Pr o c e e dings of the IEEE 109 (5), 612–634. Shalem, O., N. E. Sanjana, E. Hartenian, X. Shi, D. A. Scott, T. S. Mikkelsen, D. Hec kl, B. L. Eb ert, D. E. Ro ot, J. G. Do enc h, et al. (2014). Genome-scale crispr-cas9 kno ck out screening in human cells. Scienc e 343 (6166), 84–87. Shen, X., P . B ¨ uhlmann, and A. T aeb (2025). Causality-orien ted robustness: exploiting 33 general noise in terven tions in linear structural causal mo dels. Journal of the A meric an Statistic al Asso ciation (just-accepted), 1–20. Shimizu, S., P . O. Ho y er, A. Hyv¨ arinen, and A. Kerminen (2006). A linear non-gaussian acyclic mo del for causal discov ery . Journal of Machine L e arning R ese ar ch 7 , 2003–2030. Spirtes, P ., C. N. Glymour, and R. Scheines (2000). Causation, pr e diction, and se ar ch . MIT press. Squires, C. and C. Uhler (2023). Causal structure learning: A com binatorial p ersp ective. F oundations of Computational Mathematics 23 (5), 1781–1815. Squires, C., Y. W ang, and C. Uhler (2020). Perm utation-based causal structure learning with unkno wn in terven tion targets. In Confer enc e on Unc ertainty in A rtiﬁcial Intel li- genc e , pp. 1039–1048. PMLR. Steiner, L. A. (2022). Helping gata1 mak e complex decisions. Blo o d 139 (24), 3457–3459. Uhler, C., G. Raskutti, P . B ¨ uhlmann, and B. Y u (2013). Geometry of the faithfulness assumption in causal inference. The A nnals of Statistics , 436–463. v an de Geer, S. and P . B ¨ uhlmann (2013). ℓ 0 -p enalized maximum likelihoo d for sparse directed acyclic graphs. The Annals of Statistics 41 (2), 536–567. V arici, B., E. Acart ¨ urk, K. Shanmugam, A. Kumar, and A. T a jer (2025). Score-based causal representation learning: Linear and general transformations. Journal of Machine L e arning R ese ar ch 26 (112), 1–90. W ang, Y., L. Solus, K. D. Y ang, and C. Uhler (2017). P ermutation-based causal inference algorithms with in terven tions. In Pr o c e e dings of the 31st International Confer enc e on 34 Neur al Information Pr o c essing Systems , NIPS’17, Red Ho ok, NY, USA, pp. 5824–5833. Curran Asso ciates Inc. W ang, Y. S. and M. Drton (2023). Causal discov ery with unobserved confounding and non-gaussian data. Journal of Machine L e arning R ese ar ch 24 (271), 1–61. Xiao, F., J. T ang, H. F ang, and R. Xi (2022). Estimating graphical mo dels for count data with applications to single-cell gene net work. A dvanc es in Neur al Information Pr o c essing Systems 35 , 29038–29050. Xue, A., J. Rao, S. Sank araraman, and H. Pimentel (2025). dotears: Scalable and con- sisten t directed acyclic graph estimation using observ ational and in terv en tional data. iScienc e 28 (2). Y uan, Y., X. Shen, W. Pan, and Z. W ang (2019). Constrained likelihoo d for reconstructing a directed acyclic gaussian graph. Biometrika 106 (1), 109–125. Zhang, K., M. Gong, J. D. Ramsey , K. Batmanghelic h, P . Spirtes, and C. Glymour (2018). Causal discov ery with linear non-gaussian mo dels under measurement error: Structural iden tiﬁabilit y results. In UAI , pp. 1063–1072. Zheng, X., B. Aragam, P . Ravikumar, and E. P . Xing (2018a). Dags with no tears: contin- uous optimization for structure learning. NIPS’18, Red Ho ok, NY, USA, pp. 9492–9503. Curran Asso ciates Inc. Zheng, X., B. Aragam, P . K. Ravikumar, and E. P . Xing (2018b). Dags with no tears: Con tin uous optimization for structure learning. In A dvanc es in Neur al Information Pr o- c essing Systems (NeurIPS) , V olume 31, pp. 9472–9483. 35

Causal Network Discovery from Interventional Count Data with Latent Linear DAGs

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment