Binary Expansion Group Intersection Network

Conditional independence is central to modern statistics, but beyond special parametric families it rarely admits an exact covariance characterization. We introduce the binary expansion group intersection network (BEGIN), a distribution-free graphica…

Authors: Sicheng Zhou, Kai Zhang

Sicheng Zhou∗ and Kai Zhang†

March 27, 2026

Abstract

Conditional independence is central to modern statistics, but beyond special parametric families it rarely admits an exact covariance characterization. We introduce the binary expansion group intersection network (BEGIN), a distribution-free graphical representation for multivariate binary data and bit-encoded multinomial variables. For arbitrary binary random vectors and bit representations of multinomial variables, we prove that conditional independence is equivalent to a sparse linear representation of conditional expectations, to a block factorization of the corresponding interaction covariance matrix, and to block diagonality of an associated generalized Schur complement. The resulting graph is indexed by the intersection of multiplicative groups of binary interactions, yielding an analogue of Gaussian graphical modeling beyond the Gaussian setting. This viewpoint treats data bits as atoms and local BEGIN molecules as building blocks for large Markov random fields. We also show how dyadic bit representations allow BEGIN to approximate conditional independence for general random vectors under mild regularity conditions. A key technical device is the Hadamard prism, a linear map that links interaction covariances to group structure.

1 Introduction

Conditional independence is a cornerstone of statistical reasoning. It underlies the interpretation of multivariate associations, the construction of graphical models, and many procedures in causal inference, variable selection, and structure learning. In classical settings, conditional independence is often studied through parametric models whose covariance structure has a direct probabilistic interpretation.
In many modern applications, however, the available data are heterogeneous, high dimensional, or only weakly modeled, so exact parametric assumptions are difficult to justify.

This tension is especially acute in distribution-free inference. On the one hand, conditional independence is one of the most natural ways to formalize "no direct association after adjustment." On the other hand, fully distribution-free conditional-independence testing is fundamentally impossible without additional structure; see, for example, Shah and Peters (2020). The challenge, then, is to identify structure that is both mathematically exact and as assumption-lean as possible.

This paper builds on the multiresolution viewpoint of binary expansion statistics (Zhang, 2019; Zhang et al., 2021; Brown et al., 2025), which treats data bits as atomic units of information. At the bit level, binary variables exhibit an exact linearity that does not persist for general variables. That observation suggests asking whether conditional independence can be characterized exactly through covariances once one works with suitable binary interaction features beyond the original variables themselves.

∗ Sicheng Zhou is an undergraduate student (E-mail: sichengz@mit.edu), Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139.
† Kai Zhang is a Professor (Corresponding author. E-mail: zhangk@email.unc.edu), Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599.

Graphical models provide the natural language for this question (Lauritzen, 1996; Wainwright and Jordan, 2008; Koller and Friedman, 2009; Drton and Maathuis, 2017). In Gaussian graphical models, conditional independence is equivalent to sparsity of the precision matrix.
That equivalence drives both the interpretation of the graph and the design of scalable estimation procedures. Outside the Gaussian family, however, zeros in the inverse covariance matrix generally do not encode conditional independence.

For discrete data, several important alternatives are available. Classical log-linear models already provide exact factorization-based characterizations of binary conditional independence, whereas Ising models and more general Markov random fields impose explicit factorization assumptions. Loh and Wainwright (2012) study generalized covariance matrices built from sufficient statistics of multinomial exponential-family models and show that their inverses can reflect graph structure under those modeling assumptions. Lauritzen et al. (2021) derive strong implications for binary distributions under total positivity constraints. Our goal here is different: an exact covariance-based characterization in an interaction basis for arbitrary multivariate binary distributions.

With the central result, Theorem 2.3, this paper makes the following four main contributions:

(i) We prove that for arbitrary binary random vectors $(\mathbf{A}, \mathbf{B}, \mathbf{C})$, conditional independence $\mathbf{A} \perp\!\!\!\perp \mathbf{C} \mid \mathbf{B}$ is equivalent to a covariance structure indexed by the intersection of multiplicative groups of binary interactions. The key object is not the inverse of the full covariance matrix, but a generalized Schur complement associated with the interaction block generated by $\mathbf{B}$.

(ii) We show that the same framework remains valid for bit-encoded multinomial variables, including rank-deficient cases created by deterministic constraints or structural zeros.

(iii) We introduce the Hadamard prism, a convenient linear map for the covariance algebra of binary interactions that clarifies the link between covariance of binary interactions, Walsh–Hadamard transforms, and Boolean Fourier analysis.
(iv) We extend the framework beyond discrete data by showing that dyadic quantizations preserve conditional independence asymptotically and yield explicit approximation bounds under Hölder-type continuity of the relevant conditional laws.

Taken together, these results yield the graph interpretation in Corollary 2.5, which continues to hold in singular multinomial encodings. To our knowledge, this corollary provides the first exact distribution-free covariance-based graphical characterization of conditional independence for arbitrary multivariate binary data. The graph that emerges is indexed not merely by the original variables but by intersections of multiplicative groups generated by their interactions. For that reason, we call the resulting representation the binary expansion group intersection network (BEGIN).

A useful way to position BEGIN is relative to Gaussian graphical models and generalized covariance constructions. BEGIN is Gaussian-like in spirit because conditional independence is read off from a sparse matrix object and the resulting structure suggests nodewise projection viewpoints. It is fundamentally non-Gaussian, however, because the relevant nodes are interaction features and the correct matrix object is a generalized Schur complement rather than an ordinary precision matrix. Compared with exponential-family generalized covariance methods, BEGIN requires weaker modeling assumptions: it does not rely on strict positivity, a prescribed clique factorization, or a particular parametric family. By viewing data bits as atoms and BEGIN as molecules, we provide examples showing how local BEGIN structures can serve as building blocks for larger Markov random fields.

Section 2 develops the BEGIN characterization for binary variables and bit-represented multinomial variables. Section 3 studies dyadic approximations for general random vectors.
Section 4 closes with brief remarks on implications and future work. Proofs are provided in the supplementary material.

Notation

We use the following notation throughout. For $y \in \mathbb{R}^p$, $\operatorname{diag}(y)$ denotes the diagonal matrix with diagonal entries $y$. For a matrix $M$, $M^{+}$ denotes the Moore–Penrose inverse. The symbol $H_p$ denotes the $2^p \times 2^p$ Hadamard matrix obtained by Sylvester's construction,

$$H_p = \underbrace{\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \otimes \cdots \otimes \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}}_{p \text{ factors}}. \tag{1.1}$$

For a binary random vector $\mathbf{X} = (X_1, \dots, X_p) \in \{\pm 1\}^p$, let $\langle \mathbf{X} \rangle$ denote the multiplicative group generated by the coordinates of $\mathbf{X}$, and for multiple binary vectors $\mathbf{X}_1, \dots, \mathbf{X}_k$, let $\langle \mathbf{X}_1, \dots, \mathbf{X}_k \rangle$ denote the group generated by the union of their coordinates. The power vector $\mathbf{X}^{\otimes} \in \{\pm 1\}^{2^p}$ collects the elements of $\langle \mathbf{X} \rangle$ via

$$\mathbf{X}^{\otimes} := \begin{pmatrix} 1 \\ X_1 \end{pmatrix} \otimes \begin{pmatrix} 1 \\ X_2 \end{pmatrix} \otimes \cdots \otimes \begin{pmatrix} 1 \\ X_p \end{pmatrix}. \tag{1.2}$$

We index the coordinates of $\mathbf{X}^{\otimes}$ by $\Lambda \in \{0, 1\}^p$ and write $\mathbf{X}^{\Lambda} := \prod_{j=1}^{p} X_j^{\Lambda_j}$. For a matrix $M$ and index sets $\mathcal{R}$, $\mathcal{C}$, $M[\mathcal{R}, \mathcal{C}]$ denotes the corresponding submatrix. For a random object $Z$, $\operatorname{Supp}(Z)$ denotes its support, that is, the set of values taken with positive probability. For a set $\mathcal{S}$, $|\mathcal{S}|$ denotes its cardinality. For probability measures $P$ and $Q$ on a common measurable space, $\operatorname{TV}(P, Q) := \sup_{S} |P(S) - Q(S)|$ denotes total variation distance.

2 Main Results

2.1 Characterization of Conditional Independence for Binary Variables

Let $\mathbf{A} \in \{\pm 1\}^r$, $\mathbf{B} \in \{\pm 1\}^s$, and $\mathbf{C} \in \{\pm 1\}^t$ be binary random vectors.

Definition 2.1. We say that $\mathbf{A}$ and $\mathbf{C}$ are conditionally independent given $\mathbf{B}$, written $\mathbf{A} \perp\!\!\!\perp \mathbf{C} \mid \mathbf{B}$, if for all $\mathbf{a}, \mathbf{b}, \mathbf{c}$ with $P(\mathbf{B} = \mathbf{b}) > 0$,

$$P(\mathbf{A} = \mathbf{a}, \mathbf{C} = \mathbf{c} \mid \mathbf{B} = \mathbf{b}) = P(\mathbf{A} = \mathbf{a} \mid \mathbf{B} = \mathbf{b}) \, P(\mathbf{C} = \mathbf{c} \mid \mathbf{B} = \mathbf{b}).$$

To connect conditional independence with covariance structure, we allow zero-probability cells in the joint distribution of $(\mathbf{A}, \mathbf{B}, \mathbf{C})$. This is essential for multinomial variables encoded by data bits.
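To make the notation concrete, the following is a minimal NumPy sketch of Sylvester's construction (1.1) and the power vector (1.2). The function names and the three-category pmf are illustrative assumptions, not taken from the paper. The sketch also encodes a hypothetical three-category variable with two bits, so that one cell of $\{\pm 1\}^2$ has probability zero, and computes the rank of the covariance matrix of its nonconstant interaction features.

```python
import itertools
import numpy as np

def sylvester_hadamard(p):
    """Return the 2^p x 2^p Hadamard matrix H_p of eq. (1.1)."""
    H2 = np.array([[1, 1], [1, -1]])
    H = np.array([[1]])
    for _ in range(p):
        H = np.kron(H, H2)
    return H

def power_vector(x):
    """Return the power vector of eq. (1.2) for one point x in {-1, +1}^p."""
    v = np.array([1])
    for xj in x:
        v = np.kron(v, np.array([1, xj]))
    return v

p = 2
H = sylvester_hadamard(p)
points = list(itertools.product([1, -1], repeat=p))  # (1,1), (1,-1), (-1,1), (-1,-1)
P = np.array([power_vector(x) for x in points])      # row k = power vector of points[k]

# With this ordering, the rows of H_p are the power vectors of all 2^p points.
rows_match = np.array_equal(P, H)

# Hypothetical 3-category variable on 2 bits: the cell (-1, -1) has probability zero.
pmf = np.array([0.5, 0.3, 0.2, 0.0])
mean = pmf @ P                                  # E[X^{(x)}]
second = (P * pmf[:, None]).T @ P               # E[X^{(x)} X^{(x)T}]
cov = (second - np.outer(mean, mean))[1:, 1:]   # drop the constant coordinate
cov_rank = np.linalg.matrix_rank(cov, tol=1e-10)
print(rows_match, cov_rank)
```

In this toy encoding the three interaction features $X_2$, $X_1$, and $X_1 X_2$ satisfy the deterministic relation $X_1 + X_2 - X_1 X_2 = 1$ on the support, which is why the $3 \times 3$ covariance matrix is singular.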
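If a multinomial variable has $m$ categories with positive probability and $2^{k-1} < m \le 2^{k}$, then it can be represented by $k$ bits, but that representation may impose deterministic constraints among interaction features and hence lead to singular covariance matrices. The following rank characterization makes that singularity transparent.

Theorem 2.2. For a binary random vector $\mathbf{X} \in \{\pm 1\}^p$, let

$$\Sigma_{\langle \mathbf{X} \rangle \setminus \{1\}} := \operatorname{Cov}\big( \langle \mathbf{X} \rangle \setminus \{1\}, \, \langle \mathbf{X} \rangle \setminus \{1\} \big)$$

be the covariance matrix of the nonconstant elements of $\langle \mathbf{X} \rangle$. Then

$$\operatorname{rank}\big( \Sigma_{\langle \mathbf{X} \rangle \setminus \{1\}} \big) = |\operatorname{Supp}(\mathbf{X})| - 1.$$

We now define the interaction index sets that govern BEGIN:

$$\mathcal{B} := \langle \mathbf{B} \rangle \setminus \{1\}, \qquad \mathcal{L} := \langle \mathbf{A}, \mathbf{B} \rangle \setminus \langle \mathbf{B} \rangle, \qquad \mathcal{R} := \langle \mathbf{B}, \mathbf{C} \rangle \setminus \langle \mathbf{B} \rangle.$$

Let $\Sigma$ be the covariance matrix of the concatenated interaction vector indexed by $\mathcal{B} \cup \mathcal{L} \cup \mathcal{R}$, ordered as $(\mathcal{B}, \mathcal{L}, \mathcal{R})$, and write $\Sigma_{\mathcal{B}} := \Sigma[\mathcal{B}, \mathcal{B}]$, $\Sigma_{\mathcal{L}} := \Sigma[\mathcal{L}, \mathcal{L}]$, and $\Sigma_{\mathcal{R}} := \Sigma[\mathcal{R}, \mathcal{R}]$ for its diagonal blocks. When the joint pmf of $(\mathbf{A}, \mathbf{B}, \mathbf{C})$ is strictly positive, $\Sigma$ is positive definite; in general it is only positive semidefinite.

Theorem 2.3 (BEGIN). The following statements are equivalent.

(a) $\mathbf{A} \perp\!\!\!\perp \mathbf{C} \mid \mathbf{B}$.

(b) Sparse conditional-expectation representation. For every $\Lambda_1 \in \{0, 1\}^r$ and $\Lambda_2 \in \{0, 1\}^t$, there exist coefficient vectors $\alpha_{\Lambda_1}, \gamma_{\Lambda_2} \in \mathbb{R}^{2^s}$ such that

$$E\big[ \mathbf{A}^{\Lambda_1} \mid \mathbf{B}, \mathbf{C} \big] = E\big[ \mathbf{A}^{\Lambda_1} \mid \mathbf{B} \big] = \alpha_{\Lambda_1}^{\top} \mathbf{B}^{\otimes}, \qquad E\big[ \mathbf{C}^{\Lambda_2} \mid \mathbf{A}, \mathbf{B} \big] = E\big[ \mathbf{C}^{\Lambda_2} \mid \mathbf{B} \big] = \gamma_{\Lambda_2}^{\top} \mathbf{B}^{\otimes}. \tag{2.1}$$

(c) Block factorization of covariance blocks. There exist matrices $M_1$ and $M_2$ such that

$$\Sigma = \begin{pmatrix} \Sigma_{\mathcal{B}} & \Sigma_{\mathcal{B}} M_1^{\top} & \Sigma_{\mathcal{B}} M_2^{\top} \\ M_1 \Sigma_{\mathcal{B}} & \Sigma_{\mathcal{L}} & M_1 \Sigma_{\mathcal{B}} M_2^{\top} \\ M_2 \Sigma_{\mathcal{B}} & M_2 \Sigma_{\mathcal{B}} M_1^{\top} & \Sigma_{\mathcal{R}} \end{pmatrix}. \tag{2.2}$$

(d) Block-diagonal generalized Schur complement. The generalized Schur complement of $\Sigma_{\mathcal{B}}$ in $\Sigma$,

$$S := \Sigma[\mathcal{L} \cup \mathcal{R}, \mathcal{L} \cup \mathcal{R}] - \Sigma[\mathcal{L} \cup \mathcal{R}, \mathcal{B}] \, \Sigma_{\mathcal{B}}^{+} \, \Sigma[\mathcal{B}, \mathcal{L} \cup \mathcal{R}],$$

is block diagonal with respect to $(\mathcal{L}, \mathcal{R})$; that is,

$$S = \begin{pmatrix} S_{\mathcal{L}} & 0 \\ 0 & S_{\mathcal{R}} \end{pmatrix}.$$

Theorem 2.3 identifies the exact covariance object behind conditional independence for binary data. Part (b) follows from the binary expansion linear effect (BELIEF) representation in Brown et al.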
(2025). The novelty here is that this interaction-level linearity is equivalent to the covariance factorization in part (c) and to the generalized Schur-complement sparsity in part (d). Under conditional independence, the conditional expectation of every interaction on the left or right depends on $(\mathbf{A}, \mathbf{B}, \mathbf{C})$ only through the $2^s$ interaction coordinates in $\mathbf{B}^{\otimes}$; without conditional independence, the corresponding representations generally require $2^{r+s}$ or $2^{s+t}$ coefficients.

Parts (c) and (d) of Theorem 2.3 provide a one-to-one characterization of conditional independence in terms of the covariance structure of $\langle \mathbf{A}, \mathbf{B} \rangle \cap \langle \mathbf{B}, \mathbf{C} \rangle$ for an arbitrary binary vector $(\mathbf{A}, \mathbf{B}, \mathbf{C})$. We emphasize that conditional independence in binary variables must be expressed through intersections of groups: for binary variables, the $\sigma$-field is determined by the group they generate. If one replaces $\mathcal{B}$ by a proper subset of the group intersection, the equivalence can fail in either direction: sparsity of the corresponding Schur complement need not imply conditional independence, and conditional independence need not imply sparsity. The supplementary material provides explicit counterexamples for both failures. Because the relevant covariance blocks are indexed by intersections of multiplicative groups of binary interactions, we call this structure the binary expansion group intersection network (BEGIN).

Theorem 2.3 also shows why the ordinary inverse covariance matrix is not the right object for describing conditional independence outside the Gaussian setting. When $\Sigma$ is singular, the Moore–Penrose inverse $\Sigma^{+}$ need not reflect the relevant sparsity pattern. BEGIN instead isolates the interaction block generated by $\mathbf{B}$ and the associated generalized Schur complement. This is motivated by Theorem 2.5 of Brown et al. (2025), which implies that the rows of $\Sigma[\mathcal{L} \cup \mathcal{R}, \mathcal{B}]$ lie in the row space of $\Sigma_{\mathcal{B}}$.
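The equivalence between parts (a) and (d) of Theorem 2.3 can be checked numerically in the smallest case $r = s = t = 1$. The sketch below is only an illustration under assumed inputs: the probabilities, the perturbation size `eps`, and the helper names are made up for this example. It builds one pmf with $A \perp\!\!\!\perp C \mid B$ and one without, orders the interaction features as $(B \mid A, AB \mid C, BC)$, and compares the $(\mathcal{L}, \mathcal{R})$ block of the generalized Schur complement in the two cases.

```python
import itertools
import numpy as np

def interaction_cov(pmf):
    """Covariance of the features (B, A, AB, C, BC) under pmf[(a, b, c)]."""
    cells = list(pmf)
    w = np.array([pmf[cell] for cell in cells])
    F = np.array([[b, a, a * b, c, b * c] for a, b, c in cells], dtype=float)
    mean = w @ F
    return (F * w[:, None]).T @ F - np.outer(mean, mean)

def schur_LR_block(Sigma):
    """(L, R) block of the generalized Schur complement of Sigma_B in Sigma."""
    B_idx, LR_idx = [0], [1, 2, 3, 4]          # ordering (B | A, AB | C, BC)
    S = (Sigma[np.ix_(LR_idx, LR_idx)]
         - Sigma[np.ix_(LR_idx, B_idx)]
         @ np.linalg.pinv(Sigma[np.ix_(B_idx, B_idx)])
         @ Sigma[np.ix_(B_idx, LR_idx)])
    return S[:2, 2:]

# Illustrative (made-up) probabilities: P(B = b), and P(. = +1 | B = b).
pB = {1: 0.6, -1: 0.4}
pA_given_B = {1: 0.7, -1: 0.2}
pC_given_B = {1: 0.3, -1: 0.8}

def bern(p_plus, value):
    """Probability of `value` in {-1, +1} when P(+1) = p_plus."""
    return p_plus if value == 1 else 1.0 - p_plus

# Joint pmf with A and C conditionally independent given B.
pmf_ci = {(a, b, c): pB[b] * bern(pA_given_B[b], a) * bern(pC_given_B[b], c)
          for a, b, c in itertools.product([1, -1], repeat=3)}

# Perturbation that keeps the (A, B) and (B, C) margins but couples A and C given B.
eps = 0.01
pmf_dep = {cell: pr + eps * cell[0] * cell[2] for cell, pr in pmf_ci.items()}

off_ci = schur_LR_block(interaction_cov(pmf_ci))
off_dep = schur_LR_block(interaction_cov(pmf_dep))
print(np.abs(off_ci).max(), np.abs(off_dep).max())
```

The off-diagonal block vanishes up to floating-point error for the conditionally independent pmf and is visibly nonzero for the perturbed one. Using `np.linalg.pinv` rather than an ordinary inverse mirrors the generalized Schur complement of part (d), although in this toy case $\Sigma_{\mathcal{B}}$ is a positive scalar.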
In matrix form, the row-space property of $\Sigma[\mathcal{L} \cup \mathcal{R}, \mathcal{B}]$ says that if we define

$$M := \Sigma[\mathcal{L} \cup \mathcal{R}, \mathcal{B}] \, \Sigma_{\mathcal{B}}^{+}, \tag{2.3}$$

then

$$\Sigma[\mathcal{L} \cup \mathcal{R}, \mathcal{B}] = M \Sigma_{\mathcal{B}}, \qquad \Sigma[\mathcal{B}, \mathcal{L} \cup \mathcal{R}] = \Sigma_{\mathcal{B}} M^{\top}. \tag{2.4}$$

This deterministic row-space identity motivates the use of the Schur–Banachiewicz inverse of Ouellette (1981), rather than the more commonly used Moore–Penrose inverse $\Sigma^{+}$.

Definition 2.4. Define the Schur–Banachiewicz generalized inverse of $\Sigma$ by

$$\Omega := \begin{pmatrix} \Sigma_{\mathcal{B}}^{+} + \Sigma_{\mathcal{B}}^{+} F S^{+} F^{\top} \Sigma_{\mathcal{B}}^{+} & -\Sigma_{\mathcal{B}}^{+} F S^{+} \\ -S^{+} F^{\top} \Sigma_{\mathcal{B}}^{+} & S^{+} \end{pmatrix}, \qquad F := \Sigma[\mathcal{B}, \mathcal{L} \cup \mathcal{R}]. \tag{2.5}$$

We use the Schur–Banachiewicz inverse rather than $\Sigma^{+}$ because its block $\Omega[\mathcal{L} \cup \mathcal{R}, \mathcal{L} \cup \mathcal{R}]$ is exactly $S^{+}$. It therefore preserves the separation structure in Corollary 2.5 induced by the generalized Schur complement, even when $\Sigma$ is singular, and yields an exact characterization for rank-deficient multinomial encodings.

Corollary 2.5. The Schur–Banachiewicz inverse $\Omega$ is symmetric and satisfies $\Sigma \Omega \Sigma = \Sigma$. Moreover, the following statements are equivalent.

(a) $\mathbf{A} \perp\!\!\!\perp \mathbf{C} \mid \mathbf{B}$.

(b) $\Omega[\mathcal{L}, \mathcal{R}] = 0$.

(c) In the undirected graph with vertex set $\mathcal{B} \cup \mathcal{L} \cup \mathcal{R}$ and an edge between two distinct vertices whenever the corresponding entry of $\Omega$ is nonzero, the set $\mathcal{B}$ separates $\mathcal{L}$ from $\mathcal{R}$.

Corollary 2.5 shows that BEGIN plays the same structural role for binary interaction features that the precision matrix plays in Gaussian graphical models. In particular, since $\Omega[\mathcal{L} \cup \mathcal{R}, \mathcal{L} \cup \mathcal{R}] = S^{+}$, the graph can be read directly from the sparsity pattern of the Schur–Banachiewicz inverse. As in Gaussian graphical modeling, this viewpoint suggests possible nodewise estimation strategies, though developing their finite-sample theory is beyond the scope of this note. Unlike the Gaussian case, however, the relevant nodes are interaction features and the underlying matrix may be singular.

It is also useful to compare BEGIN with the generalized covariance approach of Loh and Wainwright (2012) and with the Ising model.
A pairwise Ising model is a strictly positive Markov random field on the original variables, so its graph is specified by a factorized likelihood; classical log-linear models likewise provide exact factorization-based characterizations of binary conditional independence. Loh and Wainwright (2012) remain within discrete exponential-family graphical models and show that, after augmenting the covariance matrix by sufficient statistics dictated by a triangulation, its inverse is block graph-structured. BEGIN differs in that it identifies conditional independence exactly through the covariance conditions in Theorem 2.3, without assuming strict positivity, a clique factorization, or a fixed parametric likelihood. In this sense, BEGIN can be viewed as a local building block for Markov random fields over binary or multinomial variables. Section 2.2 provides examples illustrating how such local BEGIN structures can be assembled into larger Markov graphs.

We also note that when $\operatorname{Supp}(\mathbf{B}) \subsetneq \{\pm 1\}^s$, the coefficients in part (b) and the matrices in part (c) of Theorem 2.3 need not be unique. The equivalence itself is unaffected: BEGIN is a structural statement about existence, factorization, and sparsity, not about unique representations.

2.2 Examples

By incorporating interactions, the BEGIN framework can represent conditional independence structures that are difficult to display faithfully in classical ways and can serve as building blocks for Markov structures over binary variables.

1. Three binary variables. In the simplest case $r = s = t = 1$, Figure 1(a) shows BEGIN for $A \perp\!\!\!\perp C \mid B$. In addition to the original variables, BEGIN introduces the interaction nodes $AB$ and $BC$. The graph splits naturally into a left wing $\{A, AB\}$, a center $\{B\}$, and a right wing $\{C, BC\}$.
The left wing together with the center generates $\langle A, B \rangle = \{1, A, B, AB\}$, the center together with the right wing generates $\langle B, C \rangle = \{1, B, C, BC\}$, and their intersection, with the constant $1$ removed, is $\langle B \rangle \setminus \{1\} = \{B\}$.

2. A binary first-order Markov chain. Let $(A_1, \dots, A_k) \in \{\pm 1\}^k$ be a (not necessarily stationary) first-order Markov chain. BEGIN contains the chain nodes $A_1, \dots, A_k$ together with the interaction nodes $A_j A_{j+1}$ for $j = 1, \dots, k - 1$; see Figure 1(b) for $k = 4$. An unrestricted joint pmf on $\{\pm 1\}^k$ has $2^k - 1$ free parameters, whereas a nonstationary first-order Markov model has only $2k - 1$ (one for the initial distribution and two for each of the $k - 1$ transitions). BEGIN makes that reduction visible at the interaction-node level and suggests a sparse matrix representation for the chain. Moreover, Figure 1(b) is the union of the overlapping BEGIN molecules $\langle A_1, A_2 \rangle \cap \langle A_2, A_3 \rangle$ and $\langle A_2, A_3 \rangle \cap \langle A_3, A_4 \rangle$; more generally, the chain is assembled from the BEGIN molecules on $\langle A_j, A_{j+1} \rangle \cap \langle A_{j+1}, A_{j+2} \rangle$, $j = 1, \dots, k - 2$.

3. A higher-order conditioning set. Brown et al. (2025) provide an example in which

$$B \perp\!\!\!\perp (A_1, A_2, A_3) \mid (A_1 A_2, A_2 A_3, A_3 A_1).$$

This form of conditional independence is not naturally expressed by a standard graph on $A_1$, $A_2$, $A_3$, and $B$. BEGIN, by contrast, yields a direct undirected graph on interaction nodes corresponding to $\langle A_1 A_2, A_1 A_3, B \rangle \cap \langle A_1, A_2, A_3 \rangle$; see Figure 1(c).

4. A Markov random field beyond the Ising model. The BEGIN structure in the previous example can also serve as a building block for a four-node global Markov random field. Under the relabeling $X_1 = B$, $X_2 = A_1 A_2$, $X_3 = A_1 A_2 A_3$, and $X_4 = A_1 A_3$, Figure 1(c) represents $X_1 \perp\!\!\!\perp X_3 \mid (X_2, X_4)$ through the group intersection $\langle X_1, X_2, X_4 \rangle \cap \langle X_2, X_3, X_4 \rangle$.
If $(X_1, X_2, X_3, X_4) \in \{\pm 1\}^4$ further satisfies the BEGIN molecule $X_2 \perp\!\!\!\perp X_4 \mid (X_1, X_3)$, then these two statements are exactly the nontrivial separation relations of the four-cycle standard graph $X_1 - X_2 - X_3 - X_4 - X_1$. Hence the distribution satisfies the global Markov property with respect to this graph. If the joint pmf is strictly positive, then $(X_1, X_2, X_3, X_4)$ is an Ising model on the four-cycle. Without strict positivity, the same pair of BEGIN molecules still defines a global Markov random field, but not necessarily an Ising model, because zeros in the joint pmf are allowed. Thus, this pair of BEGIN molecules yields a class of four-cycle global Markov random fields that is strictly larger than the Ising family.

[Figure 1: Examples of BEGIN, where conditional independence is represented through intersections of multiplicative groups of binary interactions. (a) BEGIN for $A \perp\!\!\!\perp C \mid B$ as $\langle A, B \rangle \cap \langle B, C \rangle$. (b) BEGIN for a Markov chain $(A_1, A_2, A_3, A_4)$. (c) Left wing, center, and right wing of BEGIN for $B \perp\!\!\!\perp (A_1, A_2, A_3) \mid (A_1 A_2, A_2 A_3, A_1 A_3)$, corresponding to $\langle A_1 A_2, A_1 A_3, B \rangle \cap \langle A_1, A_2, A_3 \rangle$.]

2.3 The Hadamard Prism

The proof of Theorem 2.3, provided in the supplementary material, relies on a linear mapping from $\mathbb{R}^{2^p}$ to $\mathbb{R}^{2^p \times 2^p}$ that packages the covariance algebra into a matrix operator. This mapping is closely related to convolution on $(\mathbb{Z}_2)^p$ and is diagonalized by the Walsh–Hadamard transform. Related constructions also appear in the literature on group-circulant matrices and Boolean Fourier analysis (Terras, 1999; O'Donnell, 2014).
Because this operator is central to the covariance characterization underlying BEGIN and may also be of independent interest for future research, we give it a dedicated name.

Definition 2.6. For $y \in \mathbb{R}^{2^p}$, define the Hadamard prism of $y$ by

$$\eta_p(y) := \frac{1}{2^p} H_p \operatorname{diag}(H_p y) H_p. \tag{2.6}$$

Because $H_p$ is orthogonal up to scale ($H_p H_p^{\top} = 2^p I$), $\eta_p(y)$ is diagonalized by $H_p$ and its eigenvalues are the coordinates of $H_p y$. For binary interaction vectors, those coordinates are directly linked to cell probabilities and interaction means (Zhang, 2019). The Hadamard prism also satisfies a recursion that is useful for second-moment calculations: for $y_1, y_2 \in \mathbb{R}^{2^d}$,

$$\eta_{d+1}\!\left( \begin{pmatrix} y_1 \\ y_2 \end{pmatrix} \right) = \begin{pmatrix} \eta_d(y_1) & \eta_d(y_2) \\ \eta_d(y_2) & \eta_d(y_1) \end{pmatrix}. \tag{2.7}$$

This recursive form also suggests that the Hadamard prism may be useful beyond BEGIN for studying structured covariance patterns of binary variables.

3 Approximating Conditional Independence for General Variables

The binary and multinomial theory above can be used as a multiresolution approximation device for general random vectors. Following Zhang (2019), Zhang et al. (2021) and Brown et al. (2025), we consider binary expansion as a way to encode real-valued variables through data bits. The classical expansion

$$U = \sum_{k=1}^{\infty} \frac{A_k}{2^k}, \qquad U \in [-1, 1], \quad A_k \in \{\pm 1\},$$

suggests approximating $U$ by its $d$-bit truncation $U_d = \sum_{k=1}^{d} A_k / 2^k$. After marginal standardization, let $(\mathbf{U}, \mathbf{V}, \mathbf{W}) \in [-1, 1]^r \times [-1, 1]^s \times [-1, 1]^t$. Define the dyadic quantizer

$$Q_d(x) := -1 + 2^{-d} + 2^{1-d} \left\lfloor 2^{d-1} (x + 1) \right\rfloor \in \left\{ -1 + 2^{-d}, \; -1 + 2^{-d} + 2^{1-d}, \; \dots, \; 1 - 2^{-d} \right\},$$

and apply it componentwise to vectors. The next result shows that exact dyadic conditional independence at every resolution implies the population notion and yields an explicit approximation rate under Hölder-type continuity.

Theorem 3.1.
For $(\mathbf{U}, \mathbf{V}, \mathbf{W}) \in [-1, 1]^r \times [-1, 1]^s \times [-1, 1]^t$ and $d \ge 1$, define

$$\mathcal{U}_d := \sigma\big( Q_d(\mathbf{U}) \big), \qquad \mathcal{V}_d := \sigma\big( Q_d(\mathbf{V}) \big), \qquad \mathcal{W}_d := \sigma\big( Q_d(\mathbf{W}) \big).$$

Then the following statements hold.

(a) If $\mathcal{U}_d \perp\!\!\!\perp \mathcal{W}_d \mid \mathcal{V}_d$ for every $d \ge 1$, then $\mathbf{U} \perp\!\!\!\perp \mathbf{W} \mid \mathbf{V}$.

(b) Suppose $\mathbf{U} \perp\!\!\!\perp \mathbf{W} \mid \mathbf{V}$. Assume there exist $\alpha \in (0, 1]$ and constants $L_U, L_W < \infty$ such that for all $\mathbf{v}, \mathbf{v}' \in [-1, 1]^s$,

$$\operatorname{TV}\big( \mathcal{L}(\mathbf{U} \mid \mathbf{V} = \mathbf{v}), \, \mathcal{L}(\mathbf{U} \mid \mathbf{V} = \mathbf{v}') \big) \le L_U \| \mathbf{v} - \mathbf{v}' \|_2^{\alpha}$$

and

$$\operatorname{TV}\big( \mathcal{L}(\mathbf{W} \mid \mathbf{V} = \mathbf{v}), \, \mathcal{L}(\mathbf{W} \mid \mathbf{V} = \mathbf{v}') \big) \le L_W \| \mathbf{v} - \mathbf{v}' \|_2^{\alpha}.$$

Define

$$\Delta_d := \sup_{S \in \mathcal{U}_d, \, T \in \mathcal{W}_d} E \, \big| P(S \cap T \mid \mathcal{V}_d) - P(S \mid \mathcal{V}_d) \, P(T \mid \mathcal{V}_d) \big|.$$

Then, for every $d \ge 1$,

$$\Delta_d \le L_U L_W \, s^{\alpha} \, 2^{2\alpha(1 - d) - 2}.$$

In particular, $\Delta_d \to 0$ as $d \to \infty$.

Part (a) shows that exact dyadic conditional independence at every resolution implies the population statement. Part (b) controls the converse direction quantitatively. Without continuity assumptions, discretization can create spurious conditional associations or Simpson's paradox; see, for example, Gong and Meng (2021). Under Hölder-type regularity, however, the dyadic approximation error decays at an explicit rate. This provides theoretical support for using BEGIN on the leading data bits of continuous or mixed-type variables as a principled approximation to conditional independence.

4 Discussion

This note establishes an exact covariance-based graphical characterization of conditional independence for arbitrary multivariate binary data in the binary-expansion interaction basis, including singular multinomial encodings. The characterization is distribution-free and is expressed in objects that are natural from the perspective of multiresolution binary expansion. For multinomial and discretized continuous variables, the same viewpoint provides a principled way to relate exact bit-level statements to approximation results for more general variables.
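As a concrete companion to the dyadic construction of Section 3, the following is a minimal sketch of the quantizer $Q_d$ and of the $d$ bits $A_1, \dots, A_d \in \{\pm 1\}$ with $Q_d(x) = \sum_{k=1}^{d} A_k / 2^k$. The function names are hypothetical, and the clipping of the cell index at the endpoint $x = 1$ is an assumption added here so that the output stays on the stated grid; the source formula does not address that endpoint.

```python
import numpy as np

def dyadic_cell(x, d):
    """Index k in {0, ..., 2^d - 1} of the dyadic cell of [-1, 1] containing x."""
    k = np.floor(2.0 ** (d - 1) * (np.asarray(x, dtype=float) + 1.0)).astype(int)
    # Assumption (not stated in the source): send the endpoint x = 1 to the last cell.
    return np.clip(k, 0, 2 ** d - 1)

def dyadic_quantize(x, d):
    """Q_d(x) = -1 + 2^{-d} + 2^{1-d} * floor(2^{d-1} (x + 1)), componentwise."""
    return -1.0 + 2.0 ** (-d) + 2.0 ** (1 - d) * dyadic_cell(x, d)

def dyadic_bits(x, d):
    """Bits A_1, ..., A_d in {-1, +1} such that Q_d(x) = sum_k A_k / 2^k."""
    k = dyadic_cell(x, d)
    shifts = np.arange(d - 1, -1, -1)             # most significant bit first
    return 2 * ((k[..., None] >> shifts) & 1) - 1

d = 3
u = np.array([-1.0, -0.4, 0.1, 0.99, 1.0])
q = dyadic_quantize(u, d)
bits = dyadic_bits(u, d)
recon = (bits / 2.0 ** np.arange(1, d + 1)).sum(axis=-1)   # rebuild Q_d(u) from the bits
print(q, np.max(np.abs(u - q)))
```

Each quantized value sits at the midpoint of a cell of width $2^{1-d}$, so the quantization error is at most $2^{-d}$ in every coordinate. The rows of `bits` are the leading data bits on which a BEGIN analysis of a continuous variable would operate.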
These results suggest several directions for future work. One concerns structure learning: how should one estimate the sparse BEGIN graph efficiently from finite samples when the interaction feature space is large? The Schur-complement characterization and the BELIEF representation suggest nodewise procedures, regularized inverse problems, and screening rules tailored to interaction groups, but developing their finite-sample properties lies beyond the scope of this note. A second concerns statistical theory: high-dimensional consistency, robustness to approximate sparsity, and finite-sample guarantees remain open. A third concerns causal and scientific interpretation: the bit-level perspective suggests resolution-dependent notions of adjustment, mediation, and an approximation of causality.

Acknowledgments

Zhang's research was partially supported by NSF grants DMS-2152289 and TI-2449855, as well as BSF grant 2024055. The initial formulation and proof of Theorem 2.3 were completed while Zhou was a junior student at the Princeton International School of Mathematics and Science (PRISMS). Zhou and Zhang thank PRISMS for supporting research collaborations involving high school students. The Hadamard prism was developed during Zhang's visit to Michael Baiocchi at Stanford University. Zhang thanks Baiocchi and Stanford University for the hospitality. The authors thank Michael Baiocchi, Emmanuel Candès, Peng Ding, Fang Han, Jan Hannig, Daniel Kessler, Han Liu, Yufeng Liu, Xiao-Li Meng, Heyang Ni, Art Owen, Evan Schwartz, Chengchun Shi, Daniel Yekutieli, Wan Zhang, Yuhao Zhou, Hongtu Zhu, and José Zubizarreta for helpful comments and discussions.

References

Brown, B., K. Zhang, and X.-L. Meng (2025). BELIEF in dependence: Leveraging atomic linearity in data bits for rethinking generalized linear models. The Annals of Statistics 53(3), 1068–1094.

Drton, M. and M. H.
Maathuis (2017). Structure learning in graphical modeling. Annual Review of Statistics and Its Application 4(1), 365–393.

Gong, R. and X.-L. Meng (2021). Judicious judgment meets unsettling updating: dilation, sure loss and Simpson's paradox. Statistical Science 36(2), 169–190.

Koller, D. and N. Friedman (2009). Probabilistic graphical models: principles and techniques. MIT Press.

Lauritzen, S., C. Uhler, and P. Zwiernik (2021). Total positivity in exponential families with application to binary variables. The Annals of Statistics 49(3), 1436–1459.

Lauritzen, S. L. (1996). Graphical models. Clarendon Press.

Loh, P.-L. and M. J. Wainwright (2012). Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses. Advances in Neural Information Processing Systems 25.

O'Donnell, R. (2014). Analysis of Boolean functions. Cambridge University Press.

Ouellette, D. V. (1981). Schur complements and statistics. Linear Algebra and its Applications 36, 187–295.

Shah, R. D. and J. Peters (2020). The hardness of conditional independence testing and the generalised covariance measure. The Annals of Statistics 48(3), 1514–1538.

Terras, A. (1999). Fourier analysis on finite groups and applications. Number 43. Cambridge University Press.

Wainwright, M. J. and M. I. Jordan (2008). Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning 1(1-2), 1–305.

Zhang, K. (2019). BET on independence. Journal of the American Statistical Association 114(528), 1620–1637.

Zhang, K., W. Zhang, Z. Zhao, and W. Zhou (2021). BEAUTY powered BEAST. arXiv preprint arXiv:2103.00674.
