Detection of local geometry in random graphs: information-theoretic and computational limits

We study the problem of detecting local geometry in random graphs. We introduce a model $\mathcal{G}(n, p, d, k)$, where a hidden community of average size $k$ has edges drawn as a random geometric graph on $\mathbb{S}^{d-1}$, while all remaining edg…

Authors: Jinho Bok, Shuangping Li, Sophie H. Yu

Jinho Bok∗, Shuangping Li†, and Sophie H. Yu‡

March 25, 2026

Abstract

We study the problem of detecting local geometry in random graphs. We introduce a model $\mathcal{G}(n, p, d, k)$, where a hidden community of average size $k$ has edges drawn as a random geometric graph on $\mathbb{S}^{d-1}$, while all remaining edges follow the Erdős–Rényi model $\mathcal{G}(n, p)$. The random geometric graph is generated by thresholding inner products of latent vectors on $\mathbb{S}^{d-1}$, with each edge having marginal probability equal to $p$. This implies that $\mathcal{G}(n, p, d, k)$ and $\mathcal{G}(n, p)$ are indistinguishable at the level of the marginals, and the signal lies entirely in the edge dependencies induced by the local geometry. We investigate both the information-theoretic and computational limits of detection. On the information-theoretic side, our upper bounds follow from three tests based on signed triangle counts: a global test, a scan test, and a constrained scan test; our lower bounds follow from two complementary methods: truncated second moment via Wishart–GOE comparison, and tensorization of KL divergence. These results together settle the detection threshold at $d = \widetilde{\Theta}(k^2 \vee k^6/n^3)$ for fixed $p$, and extend the state-of-the-art bounds from the full model (i.e., $k = n$) for vanishing $p$. On the computational side, we identify a computational–statistical gap and provide evidence via the low-degree polynomial framework, as well as the suboptimality of signed cycle counts of length $\ell \geq 4$.

∗ J. Bok is with the Department of Statistics and Data Science, The Wharton School, University of Pennsylvania. Email: jinhobok@wharton.upenn.edu.
† S. Li is with the Department of Statistics and Data Science, Yale University. Email: shuangping.li@yale.edu.
‡ S. H. Yu is with the Operations, Information and Decisions Department, The Wharton School, University of Pennsylvania. Email: hysophie@wharton.upenn.edu.

Contents

1 Introduction
  1.1 Our contributions
  1.2 Related literature
  1.3 Notations
2 Main results
  2.1 Information-theoretic upper bound
  2.2 Information-theoretic lower bound
  2.3 Computational lower bound
3 Technical overview
4 Proofs for information-theoretic upper bound
  4.1 Global test
  4.2 Scan test
  4.3 Constrained scan test
5 Proofs for information-theoretic lower bound
  5.1 Truncated second moment
  5.2 Tensorization of KL divergence
6 Proofs for computational lower bound
  6.1 Low-degree lower bound
  6.2 Suboptimality of longer cycle counts
7 Discussion
References
A Auxiliary lemmas
  A.1 Concentration inequalities
  A.2 Upper bound on signed subgraph counts
B Deferred proofs in Section 4
  B.1 Variance of signed triangle count (Lemma 4.4)
  B.2 Typical behavior of signed wedge count (Lemma 4.5)
    B.2.1 Variance of signed wedge count
C Deferred proofs in Section 5
  C.1 TV distance between Wishart and spherical Wishart (Proposition 5.1)
  C.2 Decomposition of likelihood ratio (Lemma 5.3)
  C.3 Upper bound on cubed hypergeometric (Lemma 5.4)
  C.4 Typical behavior of neighborhood distributions (Lemma 5.5)
D Deferred proofs in Section 6
  D.1 Tight bounds on signed cycle counts (Proposition 6.1)
    D.1.1 Gegenbauer polynomials and spherical harmonics
    D.1.2 Tight bounds on signed cycle counts

1 Introduction

Networks across multiple domains often contain inherent structures [New10, Bar16]: communities in social networks [HLL83, GN02, For10], functional modules in biological systems [HHLM99, SM03, BO04], and anomalous subgraphs in communication networks [PCMP05, ATK15]. Detecting such structure from noisy observations is a fundamental statistical problem, which has also driven significant advances in probability theory, combinatorics, and the theory of algorithms. Prominent models for this task include the stochastic block model [HLL83, DKMZ11, Abb17], the planted clique [Jer92, Kuč95], the planted dense subgraph [ACV14, HWX15, VAC15], and the planted matching [MMX21, DWXY23], each serving as a benchmark for understanding statistical and computational phase transitions in structured random graphs.

The hidden subgraphs in these models are often assumed to be “simple”, having a distinctive combinatorial shape or an elevated edge density relative to the background. While analytically convenient, such assumptions can be misaligned with real-world networks, where the defining signature of a subgraph may lie not in its density or shape but in how its vertices relate through their latent features. In particular, edges are influenced by the similarity between those features (e.g., personal profiles, textual representations, biological summaries), which is often modeled through latent space [HRH02, Pen03, HRT07, NC16]. Under this perspective, the structure to be detected is better described by its interaction patterns that are consistent with the underlying geometry.

This distinction is especially important in settings where each vertex resembles or imitates the others. For instance, in social networks, the subgraph of interest may be a small group of genuine users among bots [FVD+16], or a set of accounts under coordination for influence [PHT+21]; similarly, in economic networks the subgraph may consist of firms under collusion in a marketplace [MO18, WK19]. In such cases, each vertex may appear to be statistically similar despite the interactions within the subgraph at the level of the latent space. Furthermore, those interactions are often inherently intricate, characterized by contextual or longitudinal features in high dimensions.

Motivated by the geometry-based signals as described, we introduce a random graph model in which a small, hidden community exhibits local geometry. We focus on the fundamental task of detection: given an observed graph $G$ on vertex set $[n]$, decide whether it was generated from a null model with no signal or from our proposed alternative model. Formally, we consider the hypothesis testing problem $\mathcal{P} := \mathcal{G}(n, p, d, k)$ vs.
$\mathcal{Q} := \mathcal{G}(n, p)$, where $\mathcal{Q}$ is the Erdős–Rényi model [ER59] and $\mathcal{P}$ is a planted version of the high-dimensional random geometric graph model [DGLU11]. Under $\mathcal{P}$, a hidden set $S$ of expected size $k$ (which we refer to as the community) carries latent feature vectors, and edges within $S$ are formed according to geometric proximity in a $d$-dimensional latent space; all remaining edges behave as in $\mathcal{G}(n, p)$. We now give the formal definition of $\mathcal{P}$.

Definition 1.1 (Random graphs with local high-dimensional geometry). A sample $G \sim \mathcal{P}$ is drawn as follows:
1. Each vertex $i \in [n]$ joins the community $S$ independently with probability $k/n$.
2. Each community vertex $i \in S$ receives a latent feature vector $U_i \overset{\text{i.i.d.}}{\sim} \mathcal{U}(\mathbb{S}^{d-1})$.
3. For any $i, j \in [n]$ with $i \neq j$, if $i, j \in S$, edge $ij$ is present iff $\langle U_i, U_j \rangle \geq \tau(p, d)$; otherwise edge $ij$ is present independently with probability $p$. The threshold $\tau(p, d)$ is chosen so that $\mathbb{P}(\langle U_i, U_j \rangle \geq \tau(p, d)) = p$.

Figure 1: Two drawings of the same graph sampled from $\mathcal{G}(n, p, d, k)$ with $n = 20$, $p = 0.18$, $k = 7$; we set $d = 2$ for visualization purposes. (a) Vertices are positioned to reflect the latent geometry: the $k = 7$ community vertices (circled, teal) have latent vectors drawn from $\mathcal{U}(\mathbb{S}^{1})$, so they lie on a circle in the latent space. The orange edges, based on geometric proximity, reveal the resulting cycle-rich structure induced by the local geometry. (b) Vertices are positioned randomly, and the planted community becomes visually indistinguishable from the Erdős–Rényi background.

We only observe the final graph $G$; neither the community $S$ nor the latent vectors $\{U_i\}$ are observed.
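
The three sampling steps of Definition 1.1 can be sketched in plain Python. This is only an illustrative sketch under stated assumptions, not code from the paper: the names `threshold` and `sample_planted_rgg`, the grid size, and the use of midpoint integration are all hypothetical choices. It relies on the standard fact that the angle between two independent uniform points on $\mathbb{S}^{d-1}$ has density proportional to $\sin^{d-2}\theta$ on $[0, \pi]$, and approximates $\tau(p, d)$ numerically from that density.

```python
import math
import random


def threshold(p, d, grid=20000):
    """Approximate tau(p, d) such that P(<U_i, U_j> >= tau) = p (assumes d >= 2, 0 < p < 1).

    The angle theta between two independent uniform points on S^{d-1} has
    density proportional to sin(theta)^(d-2) on [0, pi]; we integrate it with
    the midpoint rule and return the cosine of its p-quantile.
    """
    h = math.pi / grid
    weights = [math.sin((i + 0.5) * h) ** (d - 2) for i in range(grid)]
    total = sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        if acc + w >= p * total:
            # Linear interpolation inside grid cell i.
            theta = (i + (p * total - acc) / w) * h
            return math.cos(theta)
        acc += w
    return -1.0


def sample_planted_rgg(n, p, d, k, rng):
    """Draw an adjacency matrix and the hidden community following Definition 1.1."""
    tau = threshold(p, d)
    # Step 1: each vertex joins the community independently with probability k/n.
    community = [i for i in range(n) if rng.random() < k / n]
    # Step 2: community vertices get i.i.d. uniform latent vectors on the sphere
    # (a normalized standard Gaussian vector is uniform on S^{d-1}).
    latent = {}
    for i in community:
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in g))
        latent[i] = [x / norm for x in g]
    # Step 3: geometric edges inside the community, independent edges elsewhere.
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if i in latent and j in latent:
                inner = sum(a * b for a, b in zip(latent[i], latent[j]))
                edge = inner >= tau
            else:
                edge = rng.random() < p
            adj[i][j] = adj[j][i] = int(edge)
    return adj, set(community)
```

For $d = 2$ the angle is uniform on $[0, \pi]$, so the sketch reduces to $\tau(p, 2) = \cos(\pi p)$, which gives a simple sanity check of the numerical threshold.
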
Moreover, by construction, every edge marginally appears with probability $p$, and vertices have the same marginal neighborhood distribution regardless of whether they belong to $S$. Thus, the signal is not visible at the level of the first moment and is instead carried by the dependence structure induced by the local geometry. An illustration of a sample from $\mathcal{G}(n, p, d, k)$ is provided in Figure 1. When $k = n$, we write the distribution $\mathcal{G}(n, p, d, k)$ as $\mathcal{G}(n, p, d)$; this is the traditional high-dimensional random geometric graph [DGLU11] studied in the literature, which we refer to as the full model.

We study when detection is possible as $n \to \infty$, allowing the parameters $p, d, k$ to depend on $n$ (and hence on each other). We use the following standard notions.

Definition 1.2. For the detection problem of $\mathcal{P}$ vs. $\mathcal{Q}$, a test statistic $f(G)$ with threshold $\gamma$ achieves
(a) strong detection if $\mathbb{P}_{G \sim \mathcal{Q}}(f(G) > \gamma) + \mathbb{P}_{G \sim \mathcal{P}}(f(G) \leq \gamma) = o(1)$;
(b) weak detection if $\mathbb{P}_{G \sim \mathcal{Q}}(f(G) > \gamma) + \mathbb{P}_{G \sim \mathcal{P}}(f(G) \leq \gamma) = 1 - \Omega(1)$.

It is well known (by the Neyman–Pearson lemma) that the infimum of the sum of type I and type II errors equals $1 - d_{\mathrm{TV}}(\mathcal{P}, \mathcal{Q})$. In particular, no test can achieve weak detection if $d_{\mathrm{TV}}(\mathcal{P}, \mathcal{Q}) = o(1)$.

Intuitively, detection becomes harder as the dimension $d$ grows, since the geometric constraints induce weaker dependencies among edges in higher dimensions. Our main goal is to quantitatively characterize how large $d$ can be (as a function of $n, p, k$) while detection remains possible.

Figure 2: Phase diagram for detection, for (a) $\alpha = 0$ and (b) $\alpha = 1/3$. Note that the two plots are in different scales. In the possible & easy phase (green), strong detection can be done by an efficient test statistic.
In the possible & hard phase (yellow), strong detection can be done by an inefficient test statistic, and weak detection is impossible for low-degree polynomial algorithms. In the unknown & hard phase (grey), it is open whether strong detection is possible, but weak detection is impossible for low-degree polynomial algorithms. Finally, in the impossible phase (magenta), weak detection is impossible. The impossible phases extend to all values of $\beta > 0$ beyond those presented in the plots.

1.1 Our contributions

We characterize when the detection between $\mathcal{P}$ and $\mathcal{Q}$ is possible, tracing out both the information-theoretic and computational limits as functions of $n, p, d, k$. The following theorem summarizes our main results in the log-density setting [BCC+10]; see Figure 2 for the resulting phase diagrams.

Theorem 1.3 (Informal). Let $p = \Theta(n^{-\alpha})$, $d = \Theta(n^{\beta})$, $k = \Theta(n^{\gamma})$, where $0 \leq \alpha < 1$, $\beta > 0$, and $0 < \gamma \leq 1$.
(i) If $\beta < 6\gamma - 3\alpha - 3$, strong detection is possible with a test statistic that is efficiently computable.
(ii) If $\beta < 2\gamma - 3\alpha$, strong detection is possible with a test statistic that is inefficiently computable.
(iii) If any of the following holds, weak detection is impossible:
  • $\beta > 2\gamma \vee (6\gamma - 3)$;
  • $\beta > (2\gamma - 2\alpha) \vee (4\gamma - 2\alpha - 1)$ and $\gamma > \alpha$.
(iv) If $\beta > 6\gamma - 3\alpha - 3$, weak detection is impossible for low-degree polynomial algorithms.

Information-theoretic limits. Parts (i)–(iii) of Theorem 1.3 together characterize the information-theoretic threshold for detection. On the upper bound side, we propose three tests based on signed triangle counts: a global test (counting over the entire graph), a scan test (taking the maximum signed triangle count over all subsets of size $\approx k$), and a constrained scan test (further restricting to subsets with controlled wedge sums).
The scan test covers a complementary regime to the global test (the two together suffice for fixed $p$), and the constrained scan test strictly extends the parameter regime of the scan test when $p = \widetilde{o}(1)$. For details, see Theorems 2.1, 2.2 and 2.3. On the lower bound side, we develop two complementary approaches: truncated second moment, which captures the dependence on the average community size $k$; and tensorization of KL divergence, which captures the dependence on the edge density $p$. For details, see Theorems 2.5 and 2.6.

In the dense case ($\alpha = 0$), combining those upper and lower bounds settles the detection threshold sharply at $d = \widetilde{\Theta}(k^2 \vee k^6/n^3)$; see Figure 2(a). For $0 < \alpha < 1$, our results generalize the state-of-the-art bounds for the full model $\mathcal{G}(n, p, d)$ (the special case $k = n$, i.e., $\gamma = 1$), recovering the upper bound $d = \widetilde{o}(n^3 p^3)$ and lower bound $d = \widetilde{\Omega}(n^3 p^2)$ of [LMSY22] and extending them to our planted setting for all $k = \Theta(n^{\gamma})$ with $\gamma \in (\alpha, 1)$; see Figure 2(b). Our results leave a gap (the unknown & hard region in Figure 2(b)) between upper and lower bounds when $0 < \alpha < 1$; we conjecture that the upper bounds are tight and that this region is in fact in the impossible phase.

Computational limits. Part (iv) of Theorem 1.3 provides evidence that the regime beyond part (i) is computationally hard, based on the low-degree polynomial framework [Hop18, KWB22, Wei25] (see Sections 1.2 and 2.3 for background); note that among our proposed tests, only the global signed triangle count runs in polynomial time. Specifically, we show that no polynomial of degree at most $\lfloor (\log n / \log(\log n))^2 \rfloor$ achieves weak separation whenever $\beta > 6\gamma - 3\alpha - 3$, matching the threshold of the global test; see Theorem 2.9.
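
The regimes in parts (i)–(iv) of Theorem 1.3 can be tabulated with a small helper. The following is a minimal sketch, assuming the exponents $(\alpha, \beta, \gamma)$ are given exactly; the function name `phase` is hypothetical, the labels follow the legend of Figure 2, and the case where $\beta$ sits exactly on the efficient threshold is reported separately since the informal theorem uses strict inequalities.

```python
def phase(alpha, beta, gamma):
    """Classify exponents (alpha, beta, gamma) by the regimes of Theorem 1.3 (informal)."""
    if not (0 <= alpha < 1 and beta > 0 and 0 < gamma <= 1):
        raise ValueError("need 0 <= alpha < 1, beta > 0, 0 < gamma <= 1")

    easy = 6 * gamma - 3 * alpha - 3   # threshold in parts (i) and (iv)
    scan = 2 * gamma - 3 * alpha       # threshold in part (ii)

    # Part (i): efficient strong detection.
    if beta < easy:
        return "possible & easy"

    # Part (iii): either bullet makes weak detection impossible.
    impossible = beta > max(2 * gamma, 6 * gamma - 3) or (
        gamma > alpha
        and beta > max(2 * gamma - 2 * alpha, 4 * gamma - 2 * alpha - 1)
    )
    if impossible:
        return "impossible"

    # Exactly on the efficient threshold: neither (i) nor (iv) applies.
    if beta == easy:
        return "boundary (not covered)"

    # Parts (ii) and (iv): detectable by an inefficient test, low-degree hard.
    if beta < scan:
        return "possible & hard"

    # Part (iv) only: low-degree hard, information-theoretic status open.
    return "unknown & hard"
```

For example, with $\alpha = 1/3$ and $\gamma = 1$ the sketch returns "possible & easy" for $\beta < 2$, "unknown & hard" for $2 < \beta \leq 7/3$, and "impossible" for $\beta > 7/3$, which agrees with the tick marks of Figure 2(b).
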
We further show that no signed cycle count of length $\ell \geq 4$ improves upon the triangle count, providing additional evidence that the global signed triangle count may be the asymptotically optimal efficient test among all signed cycle counts; see Proposition 2.10.

Computational–statistical gap. Parts (i) and (ii), together with (iv), identify a regime of $6\gamma - 3\alpha - 3 < \beta < 2\gamma - 3\alpha$ (the yellow regions in Figure 2), where detection is information-theoretically possible yet no efficient algorithm is known. Interestingly, for the full model $\mathcal{G}(n, p, d)$ (i.e., $k = n$ without locality), a computational–statistical gap is not known and conjecturally does not exist [LMSY22]. The gap here thus appears to be a consequence of the local nature of the planted geometric structure: the hidden community may be placed at exponentially many possible locations, introducing a combinatorial search barrier for efficient algorithms.

1.2 Related literature

We review several lines of research that are closely related to our work.

High-dimensional random geometric graphs. A classical line of work studied random geometric graphs for fixed dimension $d$; see [Pen03] for an overview. The study of random graphs with high-dimensional (i.e., $d \to \infty$ as $n \to \infty$) geometry was initiated by [DGLU11], where the authors studied the clique number of $\mathcal{G}(n, p, d)$ and showed that for $d \geq \exp(\widetilde{\Omega}(n^2))$, the graph is indistinguishable from $\mathcal{G}(n, p)$. Ever since, there has been a growing line of work on random geometric graphs in high dimensions, investigating a variety of algorithmic and statistical phenomena.

For the detection between $\mathcal{G}(n, p, d)$ and $\mathcal{G}(n, p)$, the breakthrough work of [BDER16] settled the threshold of $d = \Theta(n^3)$ for fixed $p$.
That paper also introduced a test that counts the number of signed triangles (see Section 3 for details), which attains the best known upper bound of $d = \widetilde{o}(n^3 p^3)$ for general $p$ [LMSY22]. Several works since then have improved lower bounds for the regime of $p = o(1)$. Namely, [BBN20] showed a lower bound of $d = \widetilde{\Omega}(n^3 p \vee n^{7/2} p^2 \vee n)$ for $p = \widetilde{\Omega}(1/n^2)$. This was later improved in [LMSY22], which is the state-of-the-art result: $d = \widetilde{\Omega}(n^3 p^2)$ for $p = \Omega(1/n)$, and $d = \Omega((\log n)^{36})$ for $p = \Theta(1/n)$. Notably, it is an open problem to close the gap of a polynomial factor of $p$ between the upper and lower bounds.

Besides detection, $\mathcal{G}(n, p, d)$ has received increasing attention in recent years, in terms of its spectral property [LMSY23, ABI24, CZ25], low-degree moments [BB24b], coupling with $\mathcal{G}(n, p)$ [LMSY22, BB25b], latent estimation [MZ24], and rare events [DLW25], to name a few. Furthermore, there has been active research on models that are different from but closely related to $\mathcal{G}(n, p, d)$, where the difference lies in various factors such as edge connection rule [LR23a, LR23b, MWX26], metric of the latent geometry [BB24a, BGPS25], isotropy [EM20, BBH24], homogeneity [BKL19], cluster structure [LS23], and combinations thereof. For further discussion on high-dimensional random geometric graphs, we refer readers to a recent survey [DDC23].

Planted subgraphs. Our model can be viewed as a particular case within a class of random graphs known as planted subgraphs. In general, these can be generated by first drawing a background Erdős–Rényi graph $G \sim \mathcal{G}(n, p)$ and independently a subgraph $H$ over the complete graph from another distribution. For vertex set $S$ of $H$, the final graph is then obtained by either replacing the induced subgraph $G[S]$ with $H$, or taking the union $G \cup H$.
Planted subgraphs can be considered as graph (binary) versions of spiked random matrices [Joh01, BBAP05], which are fundamental objects in probability theory and statistics.

Within the vast landscape of random graph models with community structures, many planted subgraphs can be characterized by the existence of a single community. The literature on models with multiple communities (e.g., the stochastic block model [HLL83]) is extensive and merits a separate discussion; here we focus on the single community case. An iconic example of planted subgraphs is the planted clique [Jer92, Kuč95], where a clique (i.e., a complete graph) of small size is hidden within a graph from $\mathcal{G}(n, 1/2)$. This simple model is well known for exhibiting a computational–statistical gap, with vast connections across theoretical computer science [JP00, HK11] and high-dimensional statistics [BR13, BB20]. The literature has since been expanding, with various choices for the planted subgraph such as a dense subgraph [ACV14, HWX15, VAC15], a tree [MST19], a cycle [BDT+20, GSXY25], and a matching [MMX21, DWXY23, WM25, ABAL+26], to name a few. Besides subgraph-specific results, a recent line of work [Hul22, EH25, LPRZ25, MNWS+25, YZZ25] aims to provide a unified theory for general subgraphs.

Signals beyond mean. A common feature in planted subgraphs is that the signal is exhibited at the level of the mean. For example, in the planted dense subgraph [ACV14] the edge density is higher on average within the community than in the rest of the graph. In other words, the signal already exists at the lowest possible level (the first moment), and often there is no need to consider any interaction (e.g., higher-order moments) between the inputs. As a result, it is often the case that, if properly done, thresholding the mean suffices. This is not the case for our model, as each vertex marginally has the same distribution.

A few papers have studied settings where the hidden signals are not observed at the mean level. For example, [ACBL12, ACBL15, ACBLV18] studied detection problems where among samples of mean 0, only a small unknown subset has dependence within, such as positive correlation or Markovian structure. Another notable recent paper [KSWY25] extensively analyzed a new directed random graph model, where a small unknown subset of vertices have latent ranking. Between the ranked vertices in that model, a directed edge from a vertex of higher rank to a vertex of lower rank is more likely to be added, compared to the other direction. As the overall edge density is the same over the whole graph, there is no signal at the mean level; in particular, the model is equivalent to $\mathcal{G}(n, p)$ if the direction is ignored. As a result, the detection is done by considering the unusual consistency of pairwise orderings, rather than the edge density.¹

Random graphs with geometry-based communities. In recent years, various random graph models with both latent geometry and community structures have been studied. Often the community structures in those models are in the style of the stochastic block model, consisting of multiple communities: examples include the geometric block model [GMPS18, GMPS23], the geometric SBM [ABS21, GNW24, GJ25], the geometric hidden community model [GGNW24, GJ26], and different variants thereof [PP20, ABD21, AKL24]. Our model differs from those in that it has a single hidden community rather than multiple. One notable exception is the planted dense cycle [MWZ23, MWZ25], a latent-based model where signed triangle count is also used for detection; however, in terms of the modeling components this model is also fundamentally distinct from ours.

We highlight two models in the literature that share certain common features with our model.
In the model introduced in [BBCvdH20], a small community made of an Erdős–Rényi graph is hidden in a larger random geometric graph. Hence, at a conceptual level our model can be viewed as an inverted version of theirs, where the roles of community and non-community are flipped. In another model introduced in [BMS25], edges within a hidden community are affected by its latent geometry, whereas edges outside are formed independently. Thus, in principle we combine the community structure and the latent geometry in the same way. Despite those similarities, the mathematical details of those models are quite different from ours and hence our results cannot be directly compared to theirs. Moreover, we quantitatively characterize how high dimensionality in the geometry affects the graph, whereas the results in both of those works are either independent of $d$ or for fixed $d$.

Low-degree polynomial framework. Many high-dimensional statistical models exhibit a phenomenon known as computational–statistical gap, where in certain parameter regimes a task is information-theoretically feasible but appears to lack any polynomial-time algorithm. A prominent approach for analyzing this phenomenon is the low-degree polynomial framework [Hop18], which studies algorithms expressible as low-degree polynomials of the input. For input dimension $n$, one considers polynomials of degree at most $D$; the guiding heuristic is that $D = O(\log n)$ often matches the power of polynomial-time algorithms for many average-case problems. This heuristic is supported by many examples, as polynomials of degree $O(\log n)$ can implement or approximate various efficient algorithms including spectral methods, subgraph-counting procedures, and approximate message passing (see [Wei25, Section 6.2]). Accordingly, hardness at degree $D = \omega(\log n)$ is widely regarded as evidence of computational hardness beyond polynomial time.

For hypothesis testing, the low-degree polynomial framework examines whether low-degree polynomials can achieve separation between the null and the alternative distributions (see Definition 2.8). In practice, this is often analyzed via the low-degree likelihood ratio, i.e., the norm of the projection of the likelihood ratio onto the space of polynomials of degree at most $D$; see, e.g., [KWB22] for details. Understanding the rigorous implications of this criterion has recently attracted significant attention and is an active area of research [HW21, BHJK25, HKK+26, JV26]; we refer to a recent survey [Wei25] for a further discussion.

¹ We mention that despite the apparent differences between the models, interestingly, our analysis shares certain key technical results with theirs (e.g., Lemma 5.4).

1.3 Notations

We denote $[n] := \{1, \ldots, n\}$, and $\binom{[n]}{k}$ to be the set of all size-$k$ subsets of $[n]$. For graph $H$, $V(H)$ and $E(H)$ respectively denote the set of its vertices and edges; $v(H)$ and $e(H)$ denote their respective cardinalities. $K_n$ denotes the complete graph on $[n]$. We use $\mathbb{S}^{d-1}$ to denote the unit sphere in $\mathbb{R}^d$, and $\mathcal{U}(\mathbb{S}^{d-1})$ to denote the uniform distribution over $\mathbb{S}^{d-1}$ (i.e., the Haar measure); throughout, $U_i \overset{\text{i.i.d.}}{\sim} \mathcal{U}(\mathbb{S}^{d-1})$ for $i \in [n]$. For symmetric matrix $A$, $\|A\|_{\mathrm{op}}$ denotes its operator norm. All logarithms are with base $\exp(1)$, and $\log^a b$ denotes $(\log b)^a$.

For asymptotics, we always assume $n \to \infty$ with $k = k(n) \to \infty$, and $d$ to be sufficiently large. We use standard big-$O$ notation, where for any $a_n, b_n$, $a_n = O(b_n)$ and $a_n \lesssim b_n$ denote $a_n \leq C b_n$ for some absolute constant $C > 0$; $a_n = \Omega(b_n)$ and $a_n \gtrsim b_n$ denote $b_n = O(a_n)$; $a_n = \Theta(b_n)$ denotes $a_n = O(b_n)$ and $a_n = \Omega(b_n)$. Also, $a_n = o(b_n)$ and $a_n \ll b_n$ denote $\lim_{n \to \infty} (a_n / b_n) = 0$; $a_n = \omega(b_n)$ and $a_n \gg b_n$ denote $b_n = o(a_n)$.
For each of those we use $\widetilde{O}, \widetilde{\Omega}, \widetilde{\Theta}, \widetilde{o}, \widetilde{\omega}$ to hide $\mathrm{polylog}(n)$ factors.

2 Main results

In this section, we present the information-theoretic upper and lower bounds, and the computational lower bound for the detection problem. Throughout this section, we assume $p \leq 1/2$; for any fixed $p \in (1/2, 1)$, it can be readily deduced (e.g., following [BDER16, Lemmas 3 & 4]) that our detection threshold for fixed $p \in (0, 1/2]$ extends.

2.1 Information-theoretic upper bound

We present three different tests for the detection between $\mathcal{P}$ and $\mathcal{Q}$. First, we consider the global test for counting signed triangles, whose test statistic is defined as
$$f_{\mathrm{tri}}(G) := \sum_{i < j < \ell} (G_{ij} - p)(G_{j\ell} - p)(G_{i\ell} - p).$$

Theorem 2.1. There exists a constant $C_{2.1} > 0$ such that if
$$\frac{1}{k} \leq p \leq \frac{1}{2} \quad \text{and} \quad C_{2.1} \vee (5 \log(1/p))^4 \leq d \ll \frac{k^6 p^3}{n^3} \log^3(1/p),$$
the testing error satisfies
$$\mathbb{P}_{G \sim \mathcal{Q}}(f_{\mathrm{tri}}(G) > \gamma_{\mathrm{tri}}) + \mathbb{P}_{G \sim \mathcal{P}}(f_{\mathrm{tri}}(G) \leq \gamma_{\mathrm{tri}}) = o(1),$$
where the threshold is chosen as²
$$\gamma_{\mathrm{tri}} := \frac{1}{2} \mathbb{E}_{G \sim \mathcal{P}}[f_{\mathrm{tri}}(G)] = \frac{1}{2} \binom{n}{3} \left(\frac{k}{n}\right)^3 \mathbb{E}_{G \sim \mathcal{G}(n, p, d)}[(G_{12} - p)(G_{23} - p)(G_{13} - p)].$$

² The constant factor of $1/2$ in front of $\gamma_{\mathrm{tri}}$ is arbitrary and can be replaced with any fixed constant in $(0, 1)$. This also applies to the thresholds in the scan test and the constrained scan test. The equality between expressions in $\mathbb{E}_{G \sim \mathcal{P}}$ and $\mathbb{E}_{G \sim \mathcal{G}(n, p, d)}$ follows from Lemma 4.1.

In the global test, we essentially compare the signed triangle count within the community with the fluctuation of the signed triangle count over the entire graph. This fluctuation can be quite large, in particular when $n$ is much larger than $k$. It is thus natural to consider the scan test, where we instead consider the individual fluctuations of subgraphs of similar size. In particular, let
$$f_{\mathrm{scan}}(G) := \max_{A \subseteq [n],\, |A| = k_-} \sum_{\substack{i < j < \ell \\ i, j, \ell \in A}} (G_{ij} - p)(G_{j\ell} - p)(G_{i\ell} - p).$$

Theorem 2.2. There exists a constant $C_{2.2} > 0$ such that if $p \leq \frac{1}{2}$, $k \geq C_{2.2} \log^2 n$ and
$$C_{2.2} \vee (5 \log(1/p))^4 \leq d \leq \frac{k^2 p^6 \log^6(1/p)}{C_{2.2} \log n},$$
the testing error satisfies
$$\mathbb{P}_{G \sim \mathcal{Q}}(f_{\mathrm{scan}}(G) > \gamma_{\mathrm{scan}}) + \mathbb{P}_{G \sim \mathcal{P}}(f_{\mathrm{scan}}(G) \leq \gamma_{\mathrm{scan}}) = o(1), \tag{2.3}$$
where the threshold is chosen as
$$\gamma_{\mathrm{scan}} := \frac{1}{2} \binom{k_-}{3} \mathbb{E}_{G \sim \mathcal{G}(n, p, d)}[(G_{12} - p)(G_{23} - p)(G_{13} - p)]. \tag{2.4}$$

As we will see later, the global test and the scan test are sufficient for any fixed $p$ in that there exists a matching lower bound (up to a logarithmic factor). However, the detection threshold provided by the scan test quickly degrades as $p \to 0$. To improve upon this, we add certain constraints on top of the scan test, which we call the constrained scan test. To be specific, among the subgraphs of size $\approx k$, we only consider those that satisfy additional conditions on (signed) wedge counts. Formally, let
$$\widetilde{f}_{\mathrm{scan}}(G) := \max_{A \in \mathcal{C}(G),\, |A| = k_-} \sum_{\substack{i < j < \ell \\ i, j, \ell \in A}} (G_{ij} - p)(G_{j\ell} - p)(G_{i\ell} - p),$$
where $\mathcal{C}(G)$ denotes the collection of subsets satisfying those wedge-count conditions.

Theorem 2.3. Assume that there exists a constant $\delta > 0$ such that $d \geq n^{\delta}$, $n^{-1+\delta} \leq p \leq 1/2$. Then there exists a constant $C_{2.3} = C_{2.3}(\delta) > 0$ such that if
$$d \leq \frac{1}{C_{2.3}} \cdot \frac{k^2 p^3 \log^3(1/p)}{(\log^2 n)(\log^2 k)},$$
(2.3) holds with $\widetilde{f}_{\mathrm{scan}}(G)$ in place of $f_{\mathrm{scan}}(G)$, that is,
$$\mathbb{P}_{G \sim \mathcal{Q}}\big(\widetilde{f}_{\mathrm{scan}}(G) > \gamma_{\mathrm{scan}}\big) + \mathbb{P}_{G \sim \mathcal{P}}\big(\widetilde{f}_{\mathrm{scan}}(G) \leq \gamma_{\mathrm{scan}}\big) = o(1).$$

Remark 2.4 (Comparison between tests). As noted earlier, the global test (Theorem 2.1) and the scan test (Theorem 2.2) together suffice for characterizing the detection threshold for any fixed $p$, up to a logarithmic factor. In terms of the performance guarantees from the theorems, the global test is better if $k \gg (n^3/\log n)^{1/4}$, whereas the scan test is better if $k \ll (n^3/\log n)^{1/4}$. When $p = o(1)$, the constrained scan test (Theorem 2.3) is better than the scan test except for a very limited regime.
In fact, it directly follows from our analysis that for $p = O((\log n)^{-1} (\log^2 k)^{-1})$, the constrained scan test succeeds if $d = O(k^2 p^3 \log^3(1/p) / \log n)$, strictly improving the scan test; see (4.8) and the surrounding arguments there. As in the dense case, whether the global test or the constrained scan test has the better performance guarantee depends on how $k$ compares to $\widetilde{\Theta}(n^{3/4})$.

2.2 Information-theoretic lower bound

We present different thresholds for the impossibility of detection, based on two different approaches. First, we focus on capturing the dependence on $k$. For this, we consider calculating the truncated second moment between certain random matrices that generate the random graphs.

Theorem 2.5 (Lower bound via truncated second moment). There exists a constant $C_{2.5} > 0$ such that the following holds: if $k \leq \frac{n}{5}$, $d \geq C_{2.5} \log(1/p)$ and
$$d \gg k^2 \vee \frac{k^6}{n^3},$$
no test achieves weak detection.³

By combining this lower bound with the upper bounds (Theorems 2.1 and 2.2), one can conclude that for any fixed $p$ the detection threshold is given as
$$d = \widetilde{\Theta}\left(k^2 \vee \frac{k^6}{n^3}\right).$$

On the other hand, the threshold in Theorem 2.5 essentially has no dependence on $p$. A common key feature in recent works [BBN20, LMSY22, LR23b] that consider $p = o(1)$ is to leverage the tensorization property (i.e., chain rule) of KL divergence, which allows “local” comparison between the models; for further details, see the technical overview in Section 3. Our next result refines such approaches for our setting, which in addition has a community structure.

Theorem 2.6 (Lower bound via tensorization). There exists a constant $C_{2.6} > 0$ such that the following holds: if
$$\frac{C_{2.6} \log n}{k} \leq p \leq \frac{1}{2} \quad \text{and} \quad d \geq C_{2.6} \left(k^2 p^2 \vee \frac{k^4 p^2}{n}\right) \log^2(k/p) \log^2(1/p) \log^3 n,$$
no test achieves weak detection.
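
As a quick consistency check, the thresholds of Theorems 2.5 and 2.6 can be translated into the exponents of part (iii) of Theorem 1.3. The computation below is a sketch that ignores constants and polylogarithmic factors and uses the scaling $p = \Theta(n^{-\alpha})$, $k = \Theta(n^{\gamma})$; the symbol $\asymp$ (equality up to constants) is notation assumed here rather than taken from the text.

```latex
\begin{align*}
k^2 \vee \frac{k^6}{n^3}
  &\asymp n^{2\gamma \vee (6\gamma - 3)}, \\
k^2 p^2 \vee \frac{k^4 p^2}{n}
  &\asymp n^{(2\gamma - 2\alpha) \vee (4\gamma - 2\alpha - 1)}, \\
\left. k^2 p^2 \vee \frac{k^4 p^2}{n} \right|_{k = n}
  &= n^2 p^2 \vee n^3 p^2 = n^3 p^2 .
\end{align*}
```

In the same scaling, the requirement $p \geq C_{2.6} \log n / k$ of Theorem 2.6 corresponds to the side condition $\gamma > \alpha$ in Theorem 1.3.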
In terms of the dependence on $p$, we obtain a polynomial factor of $p^{2}$ for the threshold, which matches and extends (by considering $k = n$) the state-of-the-art results of $d = \tilde\Omega(n^{3}p^{2})$ [LMSY22, BGPS25].

Remark 2.7 (Comparison between lower bounds). For lower bounds, we focus on the regime of $p \ge \tilde O(1/k)$; for the sparse regime $p = \Theta(1/k)$, $d_{\mathrm{TV}}(\mathcal{P},\mathcal{Q}) = o(1)$ already holds for $d = \Omega(\mathrm{polylog}(k))$ [LMSY22]. We choose not to pursue the case of $p = o(1/k)$, as the average degree within the community is already $o(1)$ there. Theorem 2.6 does not strictly extend Theorem 2.5 in terms of its dependence on $k/n$. Indeed, it can be checked that depending on the size of $k$, Theorem 2.5 covers a wider regime: specifically, when $n^{1/2} \le k = \tilde O(n^{3/4})$ and $p = \tilde\Omega(\sqrt{n}/k)$, or $k = \tilde\Omega(n^{3/4})$ and $p = \tilde\Omega(k/n)$. This mainly comes from the differences in their underlying approaches; see Section 3 for a detailed discussion. In brief, the proof of Theorem 2.5 essentially proceeds by bounding TV distance with $\chi^{2}$ divergence, which seems to be essential for capturing the tight dependence on the community size. This cannot be directly adapted to the proof of Theorem 2.6: that comes at the cost of losing the chain-rule structure of KL divergence, which is essential for all existing approaches that capture dependence on $p$. We believe that improving the dependence on $k/n$ for $p = o(1)$ would require substantially new ideas, which we leave as an open question.

$^{3}$ If $k > n/5$, no test achieves weak detection if $d \gg k^{3} = \Theta(n^{3})$, even when the community location is known.

2.3 Computational lower bound

While the global signed triangle count can clearly be calculated in polynomial time, the scan-based tests in general seem to require superpolynomial time as brute-force algorithms.
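The contrast in running time can be made concrete with a minimal Python sketch. It assumes that the signed triangle count over a vertex set is the sum of $(G_{ij}-p)(G_{jl}-p)(G_{il}-p)$ over triples $i<j<l$ in that set, matching the summand in (2.4), and that the scan statistic is the maximum of this sum over all vertex subsets of a given size. The parameter `size` and all function names are hypothetical; the exact subset size and the wedge-count constraints of the constrained scan test are not reproduced here.

```python
from itertools import combinations


def signed_triangle_sum(adj, p, vertices):
    """Sum of (G_ij - p)(G_jl - p)(G_il - p) over triples i < j < l in `vertices`."""
    total = 0.0
    for i, j, l in combinations(vertices, 3):
        total += (adj[i][j] - p) * (adj[j][l] - p) * (adj[i][l] - p)
    return total


def global_statistic(adj, p):
    """Global signed triangle count: O(n^3) arithmetic operations."""
    return signed_triangle_sum(adj, p, range(len(adj)))


def scan_statistic(adj, p, size):
    """Brute-force scan: maximize over all C(n, size) vertex subsets."""
    n = len(adj)
    return max(
        signed_triangle_sum(adj, p, subset)
        for subset in combinations(range(n), size)
    )


if __name__ == "__main__":
    # Toy graph: a triangle on vertices 0, 1, 2 plus an isolated vertex 3.
    adj = [
        [0, 1, 1, 0],
        [1, 0, 1, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
    ]
    print(global_statistic(adj, 0.25))
    print(scan_statistic(adj, 0.25, 3))
```

The global statistic loops over $\binom{n}{3}$ triples, whereas the brute-force scan enumerates $\binom{n}{\texttt{size}}$ subsets, which grows superpolynomially in $n$ once `size` grows with $n$.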
This suggests the existence of a computational–statistical gap for our detection problem; we claim that this is indeed the case. Our analysis is based on the low-degree polynomial framework [Hop18, KWB22, Wei25], which considers the following criterion for polynomials as test statistics.

Definition 2.8. Let $f$ be a polynomial. A test statistic $f(G)$ achieves

(a) strong separation if $\mathbb{E}_{G\sim\mathcal{P}}[f(G)] - \mathbb{E}_{G\sim\mathcal{Q}}[f(G)] = \omega\Big(\sqrt{\mathrm{Var}_{G\sim\mathcal{Q}}[f(G)]} \vee \sqrt{\mathrm{Var}_{G\sim\mathcal{P}}[f(G)]}\Big)$;

(b) weak separation if $\mathbb{E}_{G\sim\mathcal{P}}[f(G)] - \mathbb{E}_{G\sim\mathcal{Q}}[f(G)] = \Omega\Big(\sqrt{\mathrm{Var}_{G\sim\mathcal{Q}}[f(G)]} \vee \sqrt{\mathrm{Var}_{G\sim\mathcal{P}}[f(G)]}\Big)$.

In the low-degree polynomial framework, a negative result for this criterion with degree $\omega(\log n)$ serves as evidence that no polynomial-time algorithms exist (for background, see Section 1.2). Recall from Theorem 2.1 that detection can be done efficiently for $d = \tilde o(k^{6}p^{3}/n^{3})$. The following result complements this, showing that no low-degree polynomial can significantly improve that threshold even by weak separation.

Theorem 2.9 (Low-degree lower bound). Assume that there exists a constant $\delta > 0$ such that $d \ge n^{\delta}$, $n^{-1+\delta} \le p \le 1/2$. If there exists any constant $\varepsilon > 0$ such that $d \ge \frac{k^{6}}{n^{3-\varepsilon}}\,p^{3}$, no degree-$\lfloor(\log n/\log(\log n))^{2}\rfloor$ polynomial achieves weak separation.

A related question is whether there are efficient algorithms other than the global signed triangle count. A natural extension of the signed triangle count is the class of signed cycle counts, frequently appearing in latent geometry detection [BB24a, BB25a]. In the following proposition, we provide a negative answer, showing that any longer cycle count is strictly less powerful than the triangle count.

Proposition 2.10 (Suboptimality of longer cycle counts). Let $\ell \ge 3$ and $d$ be sufficiently large with $d \ge (5\log(1/p))^{4}$.
If the global signed count of length-$\ell$ cycles achieves strong separation, then

$$d \ll \Big(\frac{k^{2}\,p\log(1/p)}{n}\Big)^{\ell/(\ell-2)}.$$

In this proposition, the signed triangle count succeeds for the largest range of $d$, as the right-hand side is maximized at $\ell = 3$. This suggests that the global signed triangle count may be the asymptotically optimal efficient test.

3 Technical overview

Information-theoretic upper bound. A key feature of random geometric graphs is homophily: adjacent vertices share similar latent vectors, making their common neighbors more likely to be adjacent as well. As a result, geometric graphs contain more triangles than an Erdős–Rényi graph with the same edge density. Our test statistics are based on the signed triangle count $\sum_{i}$ …
