The Overlap Gap Property in Principal Submatrix Recovery
We study support recovery for a $k \times k$ principal submatrix with elevated mean $\lambda/N$, hidden in an $N \times N$ symmetric mean zero Gaussian matrix. Here $\lambda > 0$ is a universal constant, and we assume $k = N\rho$ for some constant $\rho \in (0,1)$. We establish that there exists a constant $C > 0$ such that the MLE recovers a constant proportion of the hidden submatrix if $\lambda \geq C\sqrt{\frac{1}{\rho}\log\frac{1}{\rho}}$, while such recovery is information theoretically impossible if $\lambda = o\big(\sqrt{\frac{1}{\rho}\log\frac{1}{\rho}}\big)$. The MLE is computationally intractable in general, and in fact, for $\rho > 0$ sufficiently small, this problem is conjectured to exhibit a statistical-computational gap. To provide rigorous evidence for this, we study the likelihood landscape for this problem, and establish that for some $\varepsilon > 0$ and $\sqrt{\frac{1}{\rho}\log\frac{1}{\rho}} \ll \lambda \ll \frac{1}{\rho^{1/2+\varepsilon}}$, the problem exhibits a variant of the Overlap-Gap-Property (OGP). As a direct consequence, we establish that a family of local MCMC based algorithms do not achieve optimal recovery. Finally, we establish that for $\lambda > 1/\rho$, a simple spectral method recovers a constant proportion of the hidden submatrix.
Authors: David Gamarnik, Aukosh Jagannath, Subhabrata Sen
1. Introduction

In this paper, we study support recovery for a planted principal submatrix in a large symmetric Gaussian matrix. Formally, we observe a symmetric matrix $A = (A_{ij}) \in \mathbb{R}^{N \times N}$,
$$ A_{ij} = \theta_{ij} + W_{ij}. \qquad (1.1) $$
Throughout, we assume that $W$ is a GOE random matrix; in other words, $\{W_{ij} : i \leq j\}$ are independent Gaussian random variables, with $\{W_{ij} : i < j\}$ i.i.d. $N(0, 1/N)$, and $\{W_{ii} : 1 \leq i \leq N\}$ i.i.d. $N(0, 2/N)$. Regarding the mean matrix $\Theta = (\theta_{ij})$, we assume that there exists $U \subset [N] := \{1, 2, \dots, N\}$, $|U| = k$, such that
$$ \theta_{ij} = \begin{cases} \frac{\lambda}{N} & \text{if } i, j \in U, \\ 0 & \text{otherwise}, \end{cases} $$
where $\lambda > 0$ is a constant independent of $N$.
Equivalently, the observed matrix $A$ may be re-written as
$$ A = \frac{\lambda}{N} v v^{T} + W, \qquad (1.2) $$
where $v = (v_i) \in \{0,1\}^N$, with $\sum_i v_i = k$. Throughout the subsequent discussion, we will denote the set of such boolean vectors as $\Sigma_N(\frac{k}{N})$. In the setting introduced above, the following statistical questions are natural.

(1) (Detection) Can we detect the presence of the planted submatrix, i.e., can we consistently test $H_0 : \lambda = 0$ vs. $H_1 : \lambda > 0$?

(2) (Recovery) How large should $\lambda$ be, such that the support of $v$ can be recovered approximately?

(3) (Efficient Recovery) When can support recovery be accomplished using a computationally feasible procedure?

Date: December 15, 2020.
2010 Mathematics Subject Classification. Primary: 68Q87, 60C05; Secondary: 82B44, 68Q25, 62H25.
Key words and phrases. Submatrix recovery, overlap gap property, spin glasses.

Here, we study support recovery in the special case $k = N\rho$, for some $\rho \in (0,1)$. To ensure that this problem is well-defined for all $N$, we work along a sequence $\rho_N \to \rho$ such that $N\rho_N \in \mathbb{N}$ and $N\rho \in (N\rho_N - 1/2, N\rho_N + 1/2]$. Note that in this case, the corresponding submatrix detection question [25] is trivial, and a test which rejects for large values of the sum of the matrix entries consistently detects the underlying submatrix for any $\lambda > 0$. Motivated by the sparse PCA problem, we will study support recovery in this setup in the double limit $\rho \to 0$, following $N \to \infty$. Deshpande-Montanari [34] initiated a study of the problem (1.2), and established that Bayes optimal recovery of the matrix $\Theta$ can be accomplished using an Approximate Message Passing based algorithm, whenever $\rho > 0$ is sufficiently large (specifically, $\rho > 0.041$).
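The observation model (1.1)-(1.2) can be simulated directly. The following sketch (our own illustration; the function name and all concrete choices are ours) samples $A = \frac{\lambda}{N} v v^T + W$ with the GOE normalization used above, off-diagonal variance $1/N$ and diagonal variance $2/N$.

```python
import numpy as np

# Illustrative sampler for the planted principal submatrix model (1.2).
# Writing W = (G + G^T)/sqrt(2N) with G i.i.d. standard Gaussian gives
# off-diagonal variance 1/N and diagonal variance 2/N, as in the paper.

def sample_planted_submatrix(N, rho, lam, rng):
    k = int(round(N * rho))
    support = rng.choice(N, size=k, replace=False)
    v = np.zeros(N)
    v[support] = 1.0
    G = rng.normal(size=(N, N)) / np.sqrt(2 * N)
    W = G + G.T  # symmetric GOE-normalized noise
    A = (lam / N) * np.outer(v, v) + W
    return A, v

rng = np.random.default_rng(0)
A, v = sample_planted_submatrix(200, 0.1, 5.0, rng)
```

Here the support $U$ is drawn uniformly; the planted rank-one mean has largest eigenvalue $\lambda\rho$, to be compared with the noise spectral edge of order 2.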
In [57, 58], the authors analyze optimal Bayesian estimation in the $\rho \to 0$ regime, and based on the behavior of the fixed points of a state-evolution system, conjecture the existence of an algorithmically hard phase in this problem; specifically, they conjecture that the minimum signal $\lambda$ required for accurate support recovery using feasible algorithms should be significantly higher than the information theoretic minimum. This conjecture has been repeatedly quoted in various follow up works [36, 55, 56], but to the best of our knowledge, it has not been rigorously established in the prior literature. In this paper, we study the likelihood landscape of this problem, and provide rigorous evidence for the veracity of this conjecture.

From a purely conceptual viewpoint, the existence of a computationally hard phase in problem (1.2) is particularly striking. In the context of rank one matrix estimation contaminated with additive Gaussian noise (1.2), it is known that if the spike $v$ is sampled uniformly at random from the unit sphere, PCA recovers the underlying signal whenever its detection is possible [65]. In contrast, for rank one tensor estimation under additive Gaussian noise [74], there exists an extensive gap between the threshold for detection [49] and the threshold where tractable algorithms are successful [17, 46, 81]. Thus at first glance, the matrix and tensor problems appear to behave very differently. However, as the present paper establishes, once the planted spike is sufficiently sparse, a hard phase re-appears in the matrix problem.

This problem has natural connections to the planted clique problem [5], sparse PCA [6], bi-clustering [21, 39, 76], and community detection [1, 63, 66].
All these problems are expected to exhibit a statistical-computational gap—there are regimes where optimal statistical performance might be impossible to achieve using computationally feasible statistical procedures. The potential existence of such fundamental computational barriers has attracted significant attention recently in Statistics, Machine Learning, and Theoretical Computer Science. A number of widely distinct approaches have been used to understand this phenomenon better—prominent ones include average case reductions [20, 23, 24, 27, 44, 60], convex relaxations [14, 28, 35, 45, 59, 61], query lower bounds [38, 75], and direct analysis of specific algorithms [13, 30, 54]. The submatrix recovery problem itself has been investigated from a number of distinct perspectives. We defer an in-depth discussion of these approaches to the end of the Introduction.

In comparison to the approaches originating from Computer Science and Optimization, a completely different perspective on this problem comes from statistical physics, particularly the study of spin glasses. This approach seeks to understand algorithmic hardness in random optimization problems as a consequence of the underlying geometry of the problem—specifically, the structure of the near optimizers. The Overlap Gap Property (OGP) has emerged as a recurrent theme in this context (for an illustration, see Fig 1.3). At a high level, the Overlap Gap Property (OGP) looks at approximate optimizers in a problem, and establishes that any two such approximate optimizers must either be close to each other, or far away. In other words, the one-dimensional set of distances between the near optimal states is disconnected.
This property has been established for various problems arising from theoretical computer science and combinatorial optimization, for instance random constraint satisfaction [2, 40, 62], Max Independent Set [39, 73], and a maxcut problem on hypergraphs [29]. Further, OGP has been shown to act as a barrier to the success of a family of "local algorithms" on sparse random graphs [31, 29, 39, 40]. This perspective has been introduced to the study of inference problems arising in Statistics and Machine Learning [17, 39, 41, 42, 43] in recent works by the first two authors, which have yielded new insights into algorithmic barriers in these problems. As an aside, we note that exciting new developments in the study of mean field spin glasses [3, 64, 79] establish that in certain problems without OGP, it is actually possible to obtain approximate optimizers using appropriately designed polynomial time algorithms. This lends further credence to the belief that OGP captures a fundamental algorithmic barrier in random optimization problems—understanding this phenomenon in greater depth remains an exciting direction for future research.

1.1. Results. We initiate the study of approximate support recovery in the setting of (1.2). To introduce this notion, let us begin with the following definitions and observations. For $v \in \{0,1\}^N$, define the support of $v$ to be the subset $S(v) \subset [N]$ such that $S(v) = \{k \in [N] : v_k = 1\}$. To estimate the support, it is evidently equivalent to produce an estimator $\hat{v}$ that takes values in the Boolean hypercube $\{0,1\}^N$. Observe that if $\hat{v} \in \Sigma_N(\rho)$ is drawn uniformly at random, then the intersection of the support of $\hat{v}$ and $v$ satisfies
$$ \frac{1}{N} |S(v) \cap S(\hat{v})| = \frac{1}{N} (\hat{v}, v) \to \rho^2, $$
where $(\cdot, \cdot)$ denotes the usual Euclidean inner product.
For an estimator $\hat{v}$ of $v$, in the following, we call the normalized inner product $(\hat{v}, v)/N$ the overlap of the estimator with $v$, or simply the overlap. We are interested in the statistical and algorithmic feasibility of recovering a non-trivial fraction of the support. To this end, we introduce the notion of approximate recovery, which is defined as follows.

Definition 1.1 (Approximate Recovery). A sequence of estimators $\hat{v}_N = \hat{v}_N(A) \in \Sigma_N(\rho_N)$ is said to recover the support of $v$ approximately if there exists $c > 0$, independent of $N, \rho$, such that $\langle v, \hat{v} \rangle > c\rho N$ with high probability as $N \to \infty$.

Observe that since $\hat{v}, v \in \Sigma_N(\rho_N)$, $\hat{v}$ recovers the support approximately if and only if it achieves a non-trivial Hamming loss, i.e., $d_H(v, \hat{v}) \leq 2N\rho(1-c)$ for some $c > 0$ with high probability. (Here $d_H$ is the Hamming distance.) We study here the question of approximate support recovery in this context, and exhibit a regime of parameters $(\lambda, \rho)$ where this problem exhibits OGP. This provides rigorous evidence for the existence of a computationally hard phase in this problem, as suggested in [57, 58]. To substantiate this claim, we establish that OGP acts as a barrier to the success of certain Markov-chain based algorithms. Finally, we show that for very large signal strengths $\lambda$, approximate support recovery is easy, and can be achieved by rounding the largest eigenvector of $A$.

To state our results in detail, we first introduce the Maximum Likelihood Estimator (MLE) in this setting. Let
$$ H_N(x) = (x, Wx), \qquad (1.3) $$
and consider the MLE for $v$,
$$ \hat{v}_{ML} = \operatorname{argmax}_{x \in \Sigma_N(\rho_N)} (x, Ax) = \operatorname{argmax}_{x \in \Sigma_N(\rho_N)} \Big\{ H_N(x) + \frac{\lambda}{N}(x, v)^2 \Big\}. \qquad (1.4) $$

Figure 1. An illustration of OGP.
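The equivalence between overlap and Hamming loss noted after Definition 1.1 rests on the identity $d_H(x, y) = 2(k - \langle x, y \rangle)$ for $x, y \in \Sigma_N(k/N)$, and the baseline overlap of a uniformly random guess concentrates near $\rho^2$. The following sketch (our own, with illustrative names) checks both numerically.

```python
import numpy as np

# Check of the overlap / Hamming-loss relation: for x, y in {0,1}^N with
# the same support size k, d_H(x, y) = 2(k - <x, y>), so the overlap
# condition <v, vhat> > c*rho*N is equivalent to d_H < 2*N*rho*(1 - c).

def overlap(x, y):
    return float(x @ y)

def hamming(x, y):
    return int(np.sum(x != y))

rng = np.random.default_rng(1)
N, k = 1000, 100

def random_support_vector():
    z = np.zeros(N)
    z[rng.choice(N, size=k, replace=False)] = 1.0
    return z

v, vhat = random_support_vector(), random_support_vector()
assert hamming(v, vhat) == 2 * (k - int(overlap(v, vhat)))
```

Averaging the overlap of independent uniform guesses over many draws recovers the limit $\rho^2 = (k/N)^2$ stated above.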
Observe that under the conditions of Definition 1.3, there exists a certain threshold energy $E_{\mathrm{threshold}}$ such that the set of possible overlaps $\{\frac{1}{N}(x, v) : x \in \Sigma_N(\rho), (x, Ax) > N E_{\mathrm{threshold}}\}$ has a gap.

Our first result is information theoretic, and derives the minimum signal strength $\lambda$ required for approximate support recovery.

Theorem 1.2. If $\lambda = o\big(\sqrt{\frac{1}{\rho}\log\frac{1}{\rho}}\big)$, approximate recovery is information theoretically impossible. On the other hand, for any $\varepsilon > 0$, if $\lambda > (2+\varepsilon)\sqrt{\frac{1}{\rho}\log\frac{1}{\rho}}$, the MLE recovers the support approximately.

Thus for any $\varepsilon > 0$ and $\lambda > (2+\varepsilon)\sqrt{\frac{1}{\rho}\log\frac{1}{\rho}}$, there exists at least one estimator, namely the MLE, which performs approximate recovery. However, the MLE is computationally intractable in general. Our next result analyzes the likelihood landscape for this problem, and establishes that the problem exhibits OGP for certain $(\lambda, \rho)$ parameters in this phase. To this end, we introduce a version of the overlap gap property in this context. Consider the constrained maximum likelihood,
$$ E_N(q; \rho, \lambda) = \frac{1}{N} \max_{\substack{x \in \Sigma_N(\rho_N) \\ (x, v) = Nq}} H_N(x) + \lambda q^2, \qquad (1.5) $$
which denotes the maximum likelihood subject to the additional constraint of achieving overlap $q$ with $v$. For any $q \in [0, \rho]$, fix a sequence $q_N \to q$ such that $N q_N \in \mathbb{N}$ and $Nq \in [N q_N - 1/2, N q_N + 1/2)$. We establish in Theorem 2.1 below that for all $q \in [0, \rho]$, we have that $E_N(q_N; \rho_N, \lambda) \to E(q; \rho, \lambda)$ as $N \to \infty$, and that $E(q; \rho, \lambda)$ can be computed by a deterministic Parisi-type [68] variational problem. In the subsequent, we refer to $E(q; \rho, \lambda)$ as the constrained ground state energy, or simply the constrained energy. We are now in the position to define the overlap gap property.

Definition 1.3.
For some $\varepsilon > 0$, we say that the model (1.2) with sparsity $\rho$ and signal-to-noise ratio $\lambda > 0$ exhibits the $\varepsilon$-overlap gap property ($\varepsilon$-OGP) if there exist three points $w < x < y < \rho$ such that:

(1) $w < \rho^2 < x$, $y < \rho^{1+\varepsilon}$,
(2) $\max\{E(w; \rho, \lambda), E(y; \rho, \lambda)\} < E(x; \rho, \lambda)$,
(3) $\sup_{(w, \varepsilon\rho]} E(q; \rho, \lambda) < \sup_{[\varepsilon\rho, \rho]} E(q; \rho, \lambda)$.

Our main result establishes that the planted submatrix problem (1.2) admits the overlap gap property in the limit of high sparsity and moderate signal-to-noise ratios.

Theorem 1.4. For any $\alpha < 2 - \sqrt{2} \approx 0.586$ and $C_1 > 2$, there exist $\varepsilon > 0$ and $C_2$ such that for $\rho$ sufficiently small, for all $C_1 \sqrt{\frac{1}{\rho}\log\frac{1}{\rho}} < \lambda < \frac{1}{\rho^\alpha}$, the planted sub-matrix problem has the $\varepsilon$-overlap gap property.

Note that by Theorem 1.2, approximate recovery is possible in the entire regime covered by Theorem 1.4; however, the likelihood landscape exhibits OGP in this part of the parameter space. Put simply, the overlap gap property states that the MLE achieves an overlap $q_2$ that is substantially better than $\rho^2$, but in the interval $[0, q_2]$, the constrained energy has another local maximum. Heuristically, this suggests that a local optimization procedure, if initialized uniformly at random (and thus starting at overlap $\rho^2$), will get trapped at a local maximum $q_1$ which is sub-optimal as compared to the true global optimum in terms of both the likelihood and the overlap. We illustrate this notion visually in Figure 1.1. Observe that the hard phase becomes more prominent as $\rho \to 0$.

Finally, we establish that the OGP established above acts as a barrier to a family of local MCMC type algorithms. To this end, consider a Gibbs distribution on the configuration space $\Sigma_N(\rho_N)$ with
$$ \pi_\beta(\{x\}) \propto \exp(\beta(x, Ax)), $$
for some $\beta > 0$ and $A$ defined as in (1.2). Note that for any fixed $N$, as $\beta \to \infty$, the sample from $\pi_\beta$ approximates the MLE (1.4).
Thus a simple proxy for the MLE seeks to sample from the distribution $\pi_\beta$ for $\beta$ sufficiently large. It is natural to use local Markov chains to sample from this distribution. Specifically, construct a graph with vertices being the elements of $\Sigma_N(\rho_N)$, and add an edge between two states $x, x' \in \Sigma_N(\rho_N)$ if they are at Hamming distance 2. Finally, let $Q_x$ denote a nearest-neighbor Markov chain on this graph started from $X_0 = x$ that is reversible with respect to the stationary distribution $\pi_\beta$. The following theorem establishes hitting time bounds for any such Markov chain.

Theorem 1.5. For $(\lambda, \rho)$ as in Theorem 1.4, there exist points $a < \rho^2 < b$ with $b/\rho \to 0$ as $\rho \to 0$ and an $h > 0$ such that for some $c > 0$, with probability $1 - O(e^{-cN})$, if $I = (a, b)$, then the exit time of $I$, $\tau_{I^c}$, satisfies
$$ \int Q_x(\tau_{I^c} \leq T)\, d\pi_\beta(x \mid I) \leq T e^{-chN}. $$

The proof of this result immediately follows by combining Theorem 1.4 with Corollary 3.4. Thus OGP indeed acts as a barrier to these algorithms, and furnishes rigorous evidence of a hard phase in this problem. As an aside, we also note that Theorem 2.1 implies
$$ \lim_{N \to \infty} \frac{1}{N} \max_{x \in \Sigma_N(\rho_N)} H_N(x) = \max_{q \in [0, \rho]} E(q; \rho, 0) \quad \text{a.s.} $$
Our proof of Theorem 1.4 establishes that $\{E(q; \rho, 0) : q \in [0, \rho]\}$ is maximized at $q = \rho^2$. Thus, observing that $W \overset{d}{=} (G + G^T)/\sqrt{2N}$, where $G = (G_{ij}) \in \mathbb{R}^{N \times N}$ is a matrix of i.i.d. $N(0,1)$ random variables, we have
$$ \lim_{N \to \infty} \frac{1}{N^{3/2}} \max_{x \in \Sigma_N(\rho_N)} (x, Gx) = \frac{1}{\sqrt{2}} E(\rho^2; \rho, 0) \quad \text{a.s.} \qquad (1.6) $$
To each $N\rho_N \times N\rho_N$ principal submatrix of the matrix $G$, assign a score, which corresponds to the sum of its entries. The LHS of (1.6) represents the largest score, as we scan over all $N\rho_N \times N\rho_N$ principal submatrices of $G$. In [21], the authors derive upper and lower bounds on the maximum average value of $k \times k$ submatrices in an iid Gaussian matrix, for $k \leq \exp(o(\log N))$.
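For concreteness, one local reversible chain of the kind described above is a Metropolis swap dynamics: propose exchanging an occupied site with an empty one (a move of Hamming distance 2) and accept with the usual ratio for $\pi_\beta \propto \exp(\beta(x, Ax))$. This is a minimal sketch under our own naming, not the specific chain analyzed in the paper.

```python
import numpy as np

# One step of a nearest-neighbour Metropolis chain on Sigma_N(rho_N),
# reversible w.r.t. pi_beta(x) ∝ exp(beta * (x, A x)).  A swap preserves
# the support size k.  For symmetric A and the move x_i: 1 -> 0, x_j: 0 -> 1,
# the energy change of (x, A x) is
#   delta = 2*((Ax)_j - (Ax)_i) + A_jj + A_ii - 2*A_ij.

def metropolis_step(x, A, beta, rng):
    ones = np.flatnonzero(x == 1)
    zeros = np.flatnonzero(x == 0)
    i = rng.choice(ones)   # site to vacate
    j = rng.choice(zeros)  # site to occupy
    Ax = A @ x
    delta = 2 * (Ax[j] - Ax[i]) + A[j, j] + A[i, i] - 2 * A[i, j]
    if np.log(rng.uniform()) < beta * delta:  # accept w.p. min(1, e^{beta*delta})
        x = x.copy()
        x[i], x[j] = 0.0, 1.0
    return x
```

Since proposals are symmetric, this acceptance rule satisfies detailed balance with respect to $\pi_\beta$, matching the reversibility assumption in Theorem 1.5.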
Our corollary extends the results of [21] to the case of principal submatrices with size $N\rho_N \times N\rho_N$, and provides a tight first order characterization of the maximum. We note that in spin-glass terminology, the score of the largest $N\rho_N \times N\rho_N$ submatrix (not necessarily principal) corresponds to the ground state of a bipartite spin-glass model [16], which is out of reach of current techniques.

Returning to the planted model (1.2), we complete our discussion by establishing that when the signal $\lambda$ is appropriately large, approximate support recovery is easily achieved using a spectral algorithm. To this end, we introduce the following two-step estimation algorithm, which rounds the leading eigenvector of $A$.

(1) Let $\hat{v} = (\hat{v}_i)$ denote the eigenvector corresponding to the largest eigenvalue of the matrix $A$, with $\|\hat{v}\|_2^2 = N\rho$. Denote $\tilde{S} = \{i \in [N] : \hat{v}_i \geq \delta/2\}$.
(2) If $|\tilde{S}| < \rho N$, sample $(\rho N - |\tilde{S}|)$ elements at random from $[N] \setminus \tilde{S}$ and augment the set $\tilde{S}$ in order to construct $\hat{S}$.
(3) Otherwise, sample $\rho N$ elements uniformly at random from $\tilde{S}$, and denote this set as $\hat{S}$.
(4) Finally, construct $v_{\hat{S}} = 1_{\hat{S}}$, i.e., a vector with ones corresponding to the entries in $\hat{S}$, and zeros otherwise.

Observe that the set $\hat{S}$ depends on $\delta$. We keep this dependence implicit for notational convenience. Then we have the following lemma.

Lemma 1.6. For any $\varepsilon > 0$ and $\lambda > (1+\varepsilon)\frac{1}{\rho}$, there exists $\delta := \delta(\varepsilon)$ such that with high probability as $N \to \infty$, $v_{\hat{S}}$ recovers the support approximately.

1.2. Comparison with existing results. In this section, we compare our approach with existing results, and summarize our main technical contributions. The information theoretic limits for exact recovery of planted submatrices under Gaussian additive noise were derived in [26].
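A minimal numerical version of the two-step spectral procedure might look as follows. The normalization, threshold, and padding/subsampling follow steps (1)-(3) above, but the concrete choices (the value of $\delta$, the sign fix for the eigenvector, and all names) are our own illustration.

```python
import numpy as np

# Illustrative spectral rounding in the spirit of Lemma 1.6: take the
# leading eigenvector of A, rescale so ||vhat||^2 = N*rho, keep entries
# above delta/2, then pad or subsample to a support of size exactly rho*N.

def spectral_support_estimate(A, rho, delta, rng):
    N = A.shape[0]
    k = int(round(N * rho))
    evals, evecs = np.linalg.eigh(A)
    vhat = evecs[:, -1] * np.sqrt(N * rho)   # leading eigenvector, rescaled
    if vhat.sum() < 0:                       # resolve the global sign ambiguity
        vhat = -vhat
    S_tilde = np.flatnonzero(vhat >= delta / 2)
    if len(S_tilde) < k:                     # pad with random outside indices
        extra = rng.choice(np.setdiff1d(np.arange(N), S_tilde),
                           size=k - len(S_tilde), replace=False)
        S_hat = np.concatenate([S_tilde, extra])
    else:                                    # subsample down to size k
        S_hat = rng.choice(S_tilde, size=k, replace=False)
    out = np.zeros(N)
    out[S_hat] = 1.0
    return out
```

When $\lambda\rho$ is large compared to the noise spectral edge, the leading eigenvector localizes on the planted support, so the rounded estimate achieves a macroscopic overlap.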
[27] studies the submatrix localization problem under additive sub-Gaussian noise, and characterizes the information theoretic threshold for exact recovery. Further, they derive a computational lower bound via an average case reduction to the planted clique problem. Their results establish a crucial distinction between the submatrix detection and localization problems—in contrast to the detection problem, there always exists a gap between the information theoretic threshold and the computational one in the localization problem. [13, 54] study the performance of specific thresholding approaches for the submatrix localization problem. We emphasize that this line of work focuses on exact recovery, and thus these results are not directly comparable with our results.

Closer in spirit to our work are the recent results in [37]—they study exact recovery for the planted Wigner model (1.2). Their results correspond to planted $k \times k$ principal submatrices with $k = o(N)$, and thus do not compare directly with our results. Based on the low degree likelihood ratio method, they suggest the existence of a hard regime if
$$ \sqrt{\frac{k}{N}} \leq \lambda\rho \leq \min\Big\{1, \frac{k}{\sqrt{N}}\Big\}. $$
Note that while their conjecture is restricted to the regime $k = o(N)$, their thresholds formally agree with our conclusions upon setting $k = N\rho$. In particular, it suggests that the threshold for the spectral algorithm derived in Lemma 1.6 is the computational threshold for this problem. Improving our OGP result to the entire regime $\sqrt{\frac{1}{\rho}\log\frac{1}{\rho}} \ll \lambda \ll \frac{1}{\rho}$ is an intriguing open problem. We leave this for future study. Motivated by these conjectures, [8] have studied the regime $k = o(N)$ using the lens of OGP, and have provided rigorous structural evidence for the existence of a hard phase for exact recovery. The results of [56] derive the sharp (up to constant) threshold for approximate recovery in a related Bayesian problem.
Assuming a product prior on the spike $v$ in (1.2), they characterize the limiting (normalized) mutual information. We note that in this Bayesian setting, the analysis is substantially simplified by the "Nishimori Identity"—in statistical physics parlance, the problem is "replica symmetric", and thus the limiting mutual information is expressed in terms of a simple scalar optimization problem. In contrast, the MLE corresponds to the ground state of the model, and in general it exhibits "full replica symmetry breaking". To analyze its performance, we need to leverage recent advances in the study of mean field spin glasses, specifically related to the analysis of Parisi type formulas. We believe that our approach can also be useful in analyzing the landscape of the posterior distribution in high dimensional inference problems—we leave this for future study. Finally, the limiting (normalized) mutual information has been derived in a version of this problem, corresponding to the $k = o(N)$ setting [15]. These results have been derived by an appropriate extension of the adaptive interpolation method.

1.3. Technical Contributions. Let us now discuss the main technical contributions of our paper. The proof of Theorem 1.4 is in two parts: we develop a variational formula for the large $N$ limit of the maximum likelihood constrained to a given overlap, and analyze this variational problem. We begin by viewing $\Sigma_N(\rho_N)$ as the configuration space of a spin system with energy $H_N$. We compute the free energy of this spin system on the subspace of fixed overlaps. On this subspace, $H_N$ can be viewed as a two species generalized Sherrington-Kirkpatrick model, and we compute the free energy via Guerra interpolation and the Aizenman–Sims–Starr scheme [4].
Due to the special symmetries of the problem considered here, we can develop a Parisi-type formula for the free energy as the sum of two Parisi type functionals and a Lagrange multiplier term in the spirit of [69], with modifications to account for the change in alphabet, the additional constraints, and the "multiple species". The key observation is that, due to these symmetries, we can decouple the two species through a Lagrange multiplier argument. With the positive temperature free energy in hand, it remains to compute the zero temperature limit. We do so using the results of [50], which computed the zero-temperature Γ-limit of the non-linear term in general Parisi-type functionals.

Let us pause to comment on the relation between this approach and prior work. Free energies of spin glasses with multiple species and non-symmetric alphabets have been studied in other works [70, 72, 71, 48], and typically require substantially more sophisticated tools than what we use here. In particular, prior results crucially use Panchenko's synchronization mechanism. We emphasize that the free energy we consider here cannot be directly obtained from these works. In principle, one might compute it by extending the work of [48] to more general alphabets—we expect this would require significantly more advanced machinery from the theory of spin glasses and variational calculus. Furthermore, the variational problems obtained by this approach are typically intractable. Our approach bypasses these issues, and identifies a simple formula in this special case.

Let us now turn to the analysis of the "ground state energy" functional. We establish the overlap gap property by analyzing the sign changes of the first derivative of the restricted ground state energy functional. This analysis requires a precise understanding of the scaling properties of the derivative as $\rho \to 0$.
To this end, we utilize the natural connection of Parisi functionals with a family of SDEs, which, in turn, can be explicitly analyzed in the regime $\rho \to 0$.

Outline. The remainder of this paper is structured as follows. Theorem 1.2 is established in Section 7. The first part follows by a data processing argument, and the second follows by a Slepian-type bound. Theorem 1.4 is the main contribution of this paper, and is established in Section 2. The proof depends crucially on Theorem 2.1, which derives a Parisi type variational problem for the restricted energy $E(\cdot; \rho, \lambda)$. In turn, to derive Theorem 2.1, we first establish a finite temperature Parisi formula (Proposition 4.1) in Section 6, and subsequently compute a limit of this formula as the temperature converges to zero (see Sections 4 and 5). Similar zero temperature formulae have been instrumental in establishing properties of mean field spin glasses in the low-temperature regime (see e.g. [9, 10, 52, 50]). We emphasize that even with Theorem 2.1, the proof of Theorem 1.4 is subtle, and critically depends on understanding the scaling of the variational formula as $\rho \to 0$. Lemma 1.6 is established in Section 8. Finally, Section 3 studies the effect of OGP on this problem, and establishes that certain local Markov Chain based recovery algorithms are stymied by this structural barrier.

Acknowledgments. The authors thank an anonymous referee for pointing out a substantial improvement to Theorem 1.2 as well as for several constructive comments that have improved the exposition of this paper. SS thanks Yash Deshpande for introducing him to the problem. DG gratefully acknowledges the support of ONR grant N00014-17-1-2790. AJ gratefully acknowledges NSERC [RGPIN-2020-04597, DGECR-2020-00199] and the partial support of NSF grant NSF OISE-1604232.
This research was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC).

2. Proof of the overlap gap property

To establish Theorem 1.4, we first require the following variational formula for the limiting constrained energy (1.5). Let $\mathcal{M}([0, \rho])$ denote the space of non-negative Radon measures on $[0, \rho]$ equipped with the weak-* topology. Let $\mathcal{A}$ denote the set
$$ \mathcal{A} = \{\nu \in \mathcal{M}([0, \rho]) : \nu = m(s)\,ds + c\delta_\rho,\ m(s) \geq 0 \text{ non-decreasing}\}. $$
For $\nu \in \mathcal{A}$ and $\Lambda = (\Lambda_1, \Lambda_2) \in \mathbb{R}^2$, let $u^i_\nu : [0, \rho] \times \mathbb{R} \to \mathbb{R}$ solve the Cauchy problem
$$ \begin{cases} \partial_t u^i_\nu + 2\big(\partial_x^2 u^i_\nu + m(t)(\partial_x u^i_\nu)^2\big) = 0, & (t, x) \in [0, \rho) \times \mathbb{R}, \\ u^i_\nu(\rho, x) = (x + \Lambda_i + 2c)_+, & x \in \mathbb{R}, \end{cases} \qquad (2.1) $$
where $(\cdot)_+$ denotes the positive part, and $\partial_x^2$ is the Laplace operator. Note that any $\nu \in \mathcal{A}$ is locally absolutely continuous with respect to the Lebesgue measure on $(0, \rho)$, so that $m$ is almost surely well-defined. As explained in [50, Appendix A], there is a unique weak solution to this PDE. Consider then the functional
$$ P(\nu, \Lambda) = \rho u^1_\nu(0, 0) + (1-\rho) u^2_\nu(0, 0) - \Lambda_1 q - \Lambda_2(\rho - q) - 2\int s\, d\nu(s). $$
We then have the following.

Theorem 2.1. For $q_N, \rho_N$ as above, we have that
$$ E(q; \rho, \lambda) := \lim_{N \to \infty} E_N(q_N; \rho_N, \lambda) = \lambda q^2 + \min_{\nu \in \mathcal{A}, \Lambda \in \mathbb{R}^2} P(\nu, \Lambda), $$
almost surely.

We defer the proof of this result to Section 4. Let us now complete the proof of the overlap gap property, Theorem 1.4, assuming Theorem 2.1.

Proof of Theorem 1.4. By Theorem 2.1, we have that $E(q; \rho, \lambda) = \lambda q^2 + \min_{\nu \in \mathcal{A}, \Lambda} P(\nu, \Lambda)$. Observe that setting $c = \nu(\{\rho\})$, one may make the linear transformation $\Lambda \mapsto \Lambda + 2c$ without changing the value of the functional. Thus it suffices to consider the problem
$$ \min_{\nu \in \mathcal{A}_0, \Lambda \in \mathbb{R}^2} P(\nu, \Lambda), \qquad (2.2) $$
where $\mathcal{A}_0$ are those $\nu \in \mathcal{A}$ with $\nu(\{\rho\}) = c = 0$. By Lemma B.2, we have that $P$ is jointly strictly convex in $(m, \Lambda)$ when restricted to this subspace.
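As a sanity check on the Cauchy problem (2.1), note that in the special case of a constant $m(t) \equiv m$ (a simplification of ours, not the general setting of the paper), the Cole-Hopf substitution $v = e^{mu}$ linearizes the PDE into the backward heat equation $\partial_t v + 2\partial_x^2 v = 0$, whose solution is an expectation over the diffusion $dX = 2\,dB$. This gives $u(0,0) = \frac{1}{m}\log\mathbb{E}\big[\exp\big(m(Z + \Lambda + 2c)_+\big)\big]$ with $Z \sim N(0, 4\rho)$. A Monte Carlo sketch:

```python
import numpy as np
from math import erf, pi

# Monte Carlo evaluation of u(0,0) for (2.1) with constant m (our
# constant-m reduction via Cole-Hopf; illustrative only).

def u_at_origin(m, Lam, c, rho, n_samples=200_000, seed=0):
    rng = np.random.default_rng(seed)
    Z = rng.normal(scale=2 * np.sqrt(rho), size=n_samples)
    payoff = np.maximum(Z + Lam + 2 * c, 0.0)
    return np.log(np.mean(np.exp(m * payoff))) / m

def closed_form_mean_positive_part(a, sigma):
    # E[(Z + a)_+] for Z ~ N(0, sigma^2) equals sigma*phi(a/sigma) + a*Phi(a/sigma);
    # this is the m -> 0 limit of u_at_origin, used as a cross-check.
    z = a / sigma
    phi = np.exp(-z * z / 2) / np.sqrt(2 * pi)
    Phi = 0.5 * (1 + erf(z / np.sqrt(2)))
    return sigma * phi + a * Phi
```

For small $m$ the Monte Carlo value should match the closed form for $\mathbb{E}[(Z + \Lambda + 2c)_+]$, which is one quick way to validate a numerical treatment of (2.1) before attempting non-constant $m$.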
Thus the minimum $(\nu^*, \Lambda^*)$ is unique. As such, we can restrict this problem further to the compact set $S = \{\nu : \|\nu\|_{TV} \leq \|\nu^*\|\} \times B_\delta(\Lambda^*)$ for some $\delta > 0$. As $\lambda q^2 + P(\nu, \Lambda)$ is convex in $q$, we may use Danskin's envelope theorem to show that $E(q; \rho, \lambda) = \lambda q^2 + \min_S P(\nu, \Lambda)$ is differentiable in $q$ with derivative
$$ \frac{\partial E}{\partial q} = 2\lambda q + (\Lambda_2^* - \Lambda_1^*), \qquad (2.3) $$
where $\Lambda_i^*$ denote the maximizers of (2.2) corresponding to $q$. On the other hand, one can show that $u^i_\nu$ is classically differentiable in its final time data, and thus in $\Lambda_i$ for $t < \rho$, by a standard differentiable dependence argument (see, e.g., [18, Lemma A.5] for a similar argument). Thus we may differentiate in $\Lambda$ to obtain the following fixed point equation for the optimal $\Lambda^*$:
$$ \frac{\partial}{\partial \Lambda_1} u^1_\nu(0, 0) = \frac{q}{\rho}, \qquad \frac{\partial}{\partial \Lambda_2} u^2_\nu(0, 0) = \frac{\rho - q}{1 - \rho}. \qquad (2.4) $$
Recall that $u^i_\nu$ solves the PDE (2.1), and thus $\partial_{\Lambda_i} u^i_\nu$ is given by the solution $w^i = \partial_{\Lambda_i} u^i_\nu$ to
$$ \begin{cases} \partial_t w^i + L_i w^i = 0, \\ w^i(\rho, x) = 1_{x + \Lambda_i \geq 0}, \end{cases} \qquad (2.5) $$
where $L_i = 2\partial_x^2 + 4m(t)\, \partial_x u^i_\nu\, \partial_x$ is an elliptic operator. Observe that $L_i$ is the infinitesimal generator of the diffusion $X^i_t$, given by the solution to the stochastic differential equation
$$ dX^i_t = 2\, dB_t + 4m(t)\, \partial_x u^i_\nu(t, X^i_t)\, dt. \qquad (2.6) $$
By Ito's lemma we have, for $t \in (0, \rho)$, $x \in \mathbb{R}$, the stochastic representation formula for $\partial_{\Lambda_i} u^i_\nu$,
$$ \partial_{\Lambda_i} u^i_\nu(t, x) = P(X^i_\rho + \Lambda_i \geq 0 \mid X^i_t = x). $$
In particular, for $t = x = 0$, we have
$$ \partial_{\Lambda_i} u^i_\nu(0, 0) = P(X^i_\rho + \Lambda_i \geq 0 \mid X^i_0 = 0). \qquad (2.7) $$
Thus (2.4) and (2.3) identify $\Lambda_i^*$ with quantiles of $X^i_\rho$. Next, we derive bounds on $\Lambda_1^*, \Lambda_2^*$ by analyzing the diffusion itself. Since $\partial_x u^i_\nu$ is weakly differentiable, we see that it weakly solves (2.5) as well. Thus
$$ \partial_x u^i_\nu(t, x) = P(X^i_\rho + \Lambda_i \geq 0 \mid X^i_t = x) \qquad (2.8) $$
as well. In particular we obtain the maximum principle $0 \leq \partial_x u^i_\nu \leq 1$.
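The representation (2.7) can be simulated directly by an Euler-Maruyama discretization of (2.6). In the sketch below we take $m \equiv 0$ (our simplification, for validation only), so the drift vanishes, $X_\rho \sim N(0, 4\rho)$, and the hitting probability has the closed form $\Phi(\Lambda/(2\sqrt{\rho}))$ against which the simulation can be checked.

```python
import numpy as np
from math import erf

# Euler-Maruyama sketch of the diffusion (2.6),
#   dX_t = 2 dB_t + 4 m(t) d_x u(t, X_t) dt,
# estimating P(X_rho + Lambda >= 0 | X_0 = 0) as in (2.7).  The drift
# callbacks m and dxu are placeholders; m ≡ 0 recovers X_rho ~ N(0, 4*rho).

def simulate_hit_prob(Lam, rho, m=lambda t: 0.0, dxu=lambda t, x: 0.0,
                      n_paths=100_000, n_steps=200, seed=0):
    rng = np.random.default_rng(seed)
    dt = rho / n_steps
    X = np.zeros(n_paths)
    for step in range(n_steps):
        t = step * dt
        drift = 4 * m(t) * dxu(t, X)
        X += drift * dt + 2 * rng.normal(scale=np.sqrt(dt), size=n_paths)
    return np.mean(X + Lam >= 0)

def std_normal_cdf(z):
    return 0.5 * (1 + erf(z / 2 ** 0.5))
```

With a nontrivial $m$ one would feed in a numerical $\partial_x u^i_\nu$; the drift bound $0 \leq \partial_x u^i_\nu \leq 1$ is exactly what yields the sandwich used in the next display.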
Finally we require the following estimate on $\int_0^\rho m(t)\, dt$ for the unique minimizer $m(s)$.

Lemma 2.2. We have that $\int_0^\rho m(t)\, dt \leq \frac{1}{2}\sqrt{\rho \log(1/\rho)}$.

We defer the proof of this estimate to Section 5 and complete the proof assuming this estimate. Using $0 \leq \partial_x u^i_\nu \leq 1$ to control the drift term in (2.6), we bound the above probability as
$$ P(2B_\rho \geq -\Lambda_i \mid B_0 = 0) \leq P(X^i_\rho \geq -\Lambda_i \mid X^i_0 = 0) \leq P\Big(2B_\rho \geq -\Lambda_i - 4\int_0^\rho m(t)\, dt\Big). $$
Combining this with the stationary point conditions (2.4) for $\Lambda_j^*$ and Lemma 2.2, we have, setting $\Phi$ as the CDF of a standard Gaussian random variable,
$$ 2\sqrt{\rho}\, \Phi^{-1}\Big(\frac{q}{\rho}\Big) - 2\sqrt{\rho \log\frac{1}{\rho}} \leq \Lambda_1^* \leq 2\sqrt{\rho}\, \Phi^{-1}\Big(\frac{q}{\rho}\Big), $$
$$ 2\sqrt{\rho}\, \Phi^{-1}\Big(\frac{\rho - q}{1 - \rho}\Big) - 2\sqrt{\rho \log\frac{1}{\rho}} \leq \Lambda_2^* \leq 2\sqrt{\rho}\, \Phi^{-1}\Big(\frac{\rho - q}{1 - \rho}\Big). \qquad (2.9) $$
Armed with these bounds on the optimizers $\Lambda_1^*, \Lambda_2^*$, we can complete the proof in a relatively straightforward manner. First, observe that at $q = \rho^2$, (2.4) ensures that $\Lambda_1^* = \Lambda_2^*$, and consequently, (2.3) implies
$$ \frac{\partial}{\partial q} E(\rho^2; \rho, \lambda) = 2\lambda\rho^2 > 0. $$
Finally, we evaluate $\partial_q E(q; \rho, \lambda)$ at $q = \rho^{2-\delta}$. In particular, since $\alpha < 2 - \sqrt{2}$ and $\lambda \leq \rho^{-\alpha}$, choose $\delta$ such that $\alpha < \frac{3 - 2\delta}{2} < 2 - \sqrt{2}$. In this case, we again have, using (2.3),
$$ \partial_q E(\rho^{2-\delta}; \rho, \lambda) = 2\rho^{2-\delta}\lambda + (\Lambda_2^* - \Lambda_1^*). \qquad (2.10) $$
Again, using the bounds derived in (2.9), we have
$$ \partial_q E(\rho^{2-\delta}; \rho, \lambda) \leq 2\rho^{2-\delta}\lambda + 2\sqrt{\rho}\, \Phi^{-1}\Big(\frac{\rho - \rho^{2-\delta}}{1-\rho}\Big) - 2\sqrt{\rho}\, \Phi^{-1}(\rho^{1-\delta}) + 2\sqrt{\rho\log\frac{1}{\rho}} $$
$$ \leq 2\lambda\rho^{2-\delta} - 2\sqrt{2\rho\log\frac{1}{\rho}}\,(1 + o_\rho(1)) + \big(2\sqrt{1-\delta} + \sqrt{2}\big)\sqrt{2\rho\log\frac{1}{\rho}} $$
$$ \leq \sqrt{2\rho\log\frac{1}{\rho}}\left( \lambda\sqrt{\frac{\rho^{3-2\delta}}{\log(1/\rho)}} - 2 + 2\sqrt{1-\delta} + \sqrt{2} + o(1) \right). $$
If we take $\rho$ sufficiently small, we obtain $\partial_q E(\rho^{2-\delta}; \rho, \lambda) < 0$ by our choice of $\delta$. The above calculation establishes that there exists an $\varepsilon_0 > 0$ such that for any $C_1 > 0$, $\rho > 0$ sufficiently small, and
$$ \lambda \in \Big( C_1\sqrt{\frac{\log\frac{1}{\rho}}{\rho}},\ \rho^{-\alpha} \Big), $$
there exists $\rho^2 < \tilde{q}_0 < \rho^{1+\varepsilon_0}$ such that
$$ \partial_q E(\rho^2; \rho, \lambda) > 0, \qquad \partial_q E(\tilde{q}_0; \rho, \lambda) < 0. $$
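The choice of $\delta$ above can be sanity-checked numerically: the bracket $-2 + 2\sqrt{1-\delta} + \sqrt{2}$ in the last display is negative precisely when $\delta > \sqrt{2} - 1/2 \approx 0.914$, which is compatible with $\alpha < (3-2\delta)/2$ exactly because $\alpha < 2-\sqrt{2}$. A two-line check (ours):

```python
from math import sqrt

# The sign of this bracket drives the conclusion d_q E(rho^{2-delta}) < 0.
# It vanishes at delta = sqrt(2) - 1/2, since then
# sqrt(1 - delta) = sqrt(3/2 - sqrt(2)) = 1 - 1/sqrt(2).

def bracket(delta):
    return -2 + 2 * sqrt(1 - delta) + sqrt(2)

delta_lo = sqrt(2) - 0.5  # boundary value, approximately 0.9142
```

Any admissible $\delta$ therefore lies in the window $(\sqrt{2} - 1/2,\, 1)$, which is non-empty; this is the arithmetic reason the exponent threshold $2 - \sqrt{2}$ appears in Theorem 1.4.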
Thus there exist $w < \rho^2 < x < y = \tilde{q}_0$ such that
$$\max\{E(w; \rho, \lambda), E(y; \rho, \lambda)\} < E(x; \rho, \lambda).$$
Next, we claim that if we choose $C_1 > 2$, then
$$\max_{(w, \theta\rho)} E(q; \rho, \lambda) < \max_{(\theta\rho, \rho)} E(q; \rho, \lambda)$$
for $0 < \theta < \theta_0$, for some $\theta_0 = \theta_0(C_1) > 0$. Assuming this claim, if we take $\varepsilon = \varepsilon_0 \wedge \theta_0$, we see that the points $w < x < y$ satisfy (i)-(iii) in Definition 1.3. In particular, $\varepsilon$-OGP holds as desired. As the proof of this claim is subsumed by our proof of Part 2 of Theorem 1.2, we end by proving said theorem. (Specifically, see (2.13) below and take $\theta_0(C_1) = c(C_1 - 2)$ there.)

Proof of Part 2 of Theorem 1.2. Fix $\varepsilon > 0$, and let $0 < c = c(\varepsilon) < 1$ be specified later. Recall again that Slepian's comparison inequality [22], comparing $H_N$ to an i.i.d. process with the same variance, yields
$$\mathbb{E}\max_{x \in A} H_N(x) \le \sqrt{2\max_x \mathrm{Var}(H_N(x))\log|A|} \qquad (2.11)$$
for any $A \subset \Sigma_N$. Applying (2.11) in the case $A = \Sigma_N(\rho_N)$ yields
$$\lim_{N\to\infty}\frac{1}{N}\mathbb{E}\left[\max_{x \in \Sigma_N(\rho_N)}(x, Wx)\right] \le 2\sqrt{\rho^3\log\frac{1}{\rho}}(1 + o_\rho(1)). \qquad (2.12)$$
By (2.11) and the concentration of Gaussian maxima [22, Theorem 5.8], with high probability as $N \to \infty$,
$$\frac{1}{N}\max_{x \in \Sigma_N(\rho_N) : (x, v) \le cN\rho}\left[\frac{\lambda}{N}(x, v)^2 + H_N(x)\right] \le \lambda c^2\rho^2 + 2\sqrt{\rho^3\log\frac{1}{\rho}}(1 + o_\rho(1)) + o_N(1).$$
On the other hand, plugging in $x = v$,
$$\frac{1}{N}\max_{x \in \Sigma_N(\rho_N) : (x, v) > cN\rho}\left[\frac{\lambda}{N}(x, v)^2 + H_N(x)\right] \ge \lambda\rho^2 + o_N(1).$$
Thus, selecting $c$ such that $(2 + \varepsilon)(1 - c^2) > 2$, we have
$$\frac{1}{N}\max_{x \in \Sigma_N(\rho_N) : (x, v) \le cN\rho}\left[\frac{\lambda}{N}(x, v)^2 + H_N(x)\right] < \frac{1}{N}\max_{x \in \Sigma_N(\rho_N) : (x, v) > cN\rho}\left[\frac{\lambda}{N}(x, v)^2 + H_N(x)\right] \qquad (2.13)$$
for all small enough $\rho > 0$, implying $(\hat{x}_{MLE}, v) \ge cN\rho$ with high probability as $N \to \infty$. We conclude that the MLE recovers the support approximately in this regime.
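The comparison bound (2.11) is the standard $\sqrt{2\sigma^2\log|A|}$ estimate for the expected maximum of centered Gaussians; a Monte Carlo sanity check on i.i.d. variables (where the bound is tight up to lower-order terms, sizes illustrative):

```python
import math
import random

def mean_max_gaussian(n_vars, sigma, n_trials=200, seed=1):
    """Monte Carlo estimate of E[max of n_vars iid N(0, sigma^2)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        total += max(rng.gauss(0.0, sigma) for _ in range(n_vars))
    return total / n_trials

n, sigma = 1000, 1.0
est = mean_max_gaussian(n, sigma)
bound = math.sqrt(2.0 * sigma**2 * math.log(n))  # sqrt(2 sigma^2 log |A|)
print(est, bound)
```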
3. From Overlap Gap Property to Free Energy Wells

In this section, we will show that the overlap gap property implies a hardness-type result for Markov chain Monte Carlo. To this end, let us first define the relevant dynamics. Fix $\beta > 0$ and consider the Gibbs distribution
$$\pi_\beta(dx) \propto e^{\beta(x, Ax)}\, dx,$$
where $dx$ is the counting measure on $\Sigma_N(\rho)$. Construct a graph $G_N$ with vertices $\Sigma_N(\rho)$, and add an edge between $x, x' \in \Sigma_N(\rho)$ if and only if their Hamming distance is exactly 2. Let $(X_t)_{t \ge 0}$ denote any nearest-neighbor Markov chain on $G_N$, reversible with respect to the stationary distribution $\pi_\beta$. By this we mean that the transition matrix $Q$ for $X_t$ satisfies detailed balance with respect to $\pi_\beta$ and $Q(x, y) = 0$ if $x$ and $y$ are not connected by an edge. We show here that when OGP holds, if we run the Markov chain with initial data $\pi_\beta(dx \mid \mathcal{I})$, where $\mathcal{I}$ is as in Theorem 1.5 and $\beta$ is sufficiently large, it takes at least exponential time for the chain to hit the region of order-$\rho$ overlap.

3.1. Free energy wells. Let $G = (V, E)$ be a finite graph and $\nu$ denote a probability measure on $V$. For $a \in \mathbb{R}$ and an $\epsilon > 0$, let $B_\epsilon(a) = [a - \epsilon, a + \epsilon]$. For any function $f : V \to \mathbb{R}$, consider the following "rate function",
$$I_f(a; \epsilon) = -\log\nu(\{x : f(x) \in B_\epsilon(a)\}).$$
For any two vertices $x, y \in V$, we say that $x \sim y$ if $x$ and $y$ are connected by an edge. We say that a function $f : V \to \mathbb{R}$ is $K$-Lipschitz if
$$\max_{x \sim y}|f(x) - f(y)| \le K.$$

Definition 3.1. We say that $f$ has an $\epsilon$-free energy well of depth $h$ in $[a, b]$ if there exists a $c \in [a, b]$ such that $B_\epsilon(a)$, $B_\epsilon(b)$ and $B_\epsilon(c)$ are disjoint and
$$\min\{I_f(a; \epsilon), I_f(b; \epsilon)\} - I_f(c; \epsilon) \ge h.$$
We then have the following, whose proof is an adaptation of [17, Theorem 7.4] to this setting. See also [43].

Theorem 3.2. Let $X_t$ denote a nearest-neighbor Markov chain on $G$ which is reversible with respect to $\nu$.
If $f$ is $\epsilon$-Lipschitz and has an $\epsilon$-free energy well of depth $h$ in $[a, b]$, then for $A = f^{-1}([a, b])$, the exit time of $A$, denoted $\tau_{A^c}$, satisfies
$$\int Q_x(\tau_{A^c} \le T)\, d\nu(x \mid A) \le Te^{-h}$$
for any $T$, where $Q_x$ is the law of $X_t$ started from $x$.

Proof. In the following, let $A_- = f^{-1}([a, b] \setminus (B_\epsilon(a) \cup B_\epsilon(b)))$, $A = f^{-1}([a, b])$, and $B = A \setminus A_-$. Let us define the boundary of $A$ to be the set
$$\partial A = \{x \in A : \exists y \in A^c : x \sim y\}.$$
Observe that since $f$ is $\epsilon$-Lipschitz, $\partial A \subset B$. Let $\bar{X}_t$ be the Markov chain defined on $A$ which is $X_t$ reflected at the boundary of $A$. That is, $\bar{X}_t$ has transition matrix $(\bar{Q}(x, y))$ which is identical to $Q$ if $x \in A \setminus \partial A$ and $y \in A$, and for $x \in \partial A$,
$$\bar{Q}(x, y) \propto \begin{cases} Q(x, y) & y \in A \\ 0 & \text{else}. \end{cases}$$
Note that by detailed balance, $\bar{X}_t$ is reversible with respect to $\tilde{\nu} = \nu(\cdot \mid A)$, the invariant measure of $X_t$ conditioned on $A$. Let $\tau_{\partial A}$ denote the first time either $X_t$ or $\bar{X}_t$ hits $\partial A$. Note that for $t \le \tau_{\partial A}$, the Markov chains $X_t$ and $\bar{X}_t$, started from a common state in $A$, follow the same trajectory. As a result,
$$\int_A Q_x(\tau_{\partial A} < T)\, d\tilde{\nu}(x) = \int_A \bar{Q}_x(\tau_{\partial A} < T)\, d\tilde{\nu}(x).$$
We now estimate the right-hand side. Since $\partial A \subset B$,
$$\int_A \bar{Q}_x(\tau_{\partial A} < T)\, d\tilde{\nu} \le \int_A \bar{Q}_x(\exists i \le T : X_i \in B)\, d\tilde{\nu} \le \sum_{i \le T}\int_A \bar{Q}_x(X_i \in B)\, d\tilde{\nu} = T\tilde{\nu}(B) \le Te^{-h},$$
where the equality follows by stationarity and the last inequality follows by the assumption that $f$ has an $\epsilon$-free energy well of depth $h$.

3.2. From the Overlap Gap Property to free energy wells at low temperature. We now establish that if the overlap gap property holds, then the overlap $m(x) = \frac{1}{N}(x, v)$ has a free energy well for $\beta > 0$ sufficiently large. In the following, let
$$\Sigma_N(\rho, q) = \left\{x \in \Sigma_N(\rho) : \sum_{i=1}^{N\rho} x_i = Nq,\ \sum_{i=N\rho+1}^{N} x_i = N(\rho - q)\right\}.$$
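Theorem 3.2 can be illustrated on a toy chain: take a hypothetical Gaussian-weight measure on a path graph, so that $f(x) = x$ has a deep free energy well, and check that the empirical exit probability obeys the $Te^{-h}$ bound. (The measure, well, and parameters below are illustrative and not from the paper.)

```python
import math
import random

V = list(range(21))
weight = [math.exp(-((x - 10) ** 2) / 2.0) for x in V]   # toy measure nu(x) ∝ exp(-(x-10)^2/2)
Z = sum(weight)
a, b, c, eps = 4, 16, 10, 1                              # well [a, b] with center c, f(x) = x

def rate(v):
    """Rate function I_f(v; eps) = -log nu({x : |x - v| <= eps})."""
    mass = sum(weight[x] for x in V if abs(x - v) <= eps)
    return -math.log(mass / Z)

h = min(rate(a), rate(b)) - rate(c)                      # free energy well depth

rng = random.Random(3)

def step(x):
    """One move of a Metropolis nearest-neighbor chain, reversible w.r.t. nu."""
    y = x + rng.choice([-1, 1])
    if y < 0 or y > 20:
        return x
    return y if rng.random() < min(1.0, weight[y] / weight[x]) else x

A = [x for x in V if a <= x <= b]
T, n_traj, exits = 50, 2000, 0
for _ in range(n_traj):
    x = rng.choices(A, weights=[weight[s] for s in A])[0]  # start from nu(. | A)
    for _ in range(T):
        x = step(x)
        if x < a or x > b:
            exits += 1
            break
print(exits / n_traj, T * math.exp(-h))
```

The well here is deep ($h \approx 13$), so the chain essentially never leaves $[a, b]$ within $T$ steps, matching the bound.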
(3.1)

In defining the set $\Sigma_N(\rho, q)$, we implicitly use the distributional invariance of $A$ under row/column permutations to assume, without loss of generality, that $v$ is of the form
$$v = (1, \ldots, 1, 0, \ldots, 0), \qquad (3.2)$$
where the first $N\rho_N$ entries are 1 and the remaining are 0. For $\lambda, \beta > 0$, let
$$F_N(\lambda, \beta, \rho, q) = \frac{1}{N}\mathbb{E}\log\sum_{x \in \Sigma_N(\rho, q)}\exp\left(\beta\left(\lambda Nq^2 + H_N(x)\right)\right), \qquad (3.3)$$
where $H_N$ is defined by (1.3) and $\Sigma_N(\rho, q)$ is defined in (3.1). Let $F_N(\lambda, \beta, \rho_N)$ be defined similarly, except with the sum running over the set $\Sigma_N(\rho_N)$. Using [50, Lemma 2.6], we have
$$\left|\frac{1}{\beta}F_N(\lambda, \beta, \rho_N, q_N) - \mathbb{E}\left[\frac{1}{N}\max_{x \in \Sigma_N(\rho_N, q_N)}(x, Ax)\right]\right| \le \frac{\log|\Sigma_N(\rho_N, q_N)|}{N\beta} \le \frac{C(\rho, q)}{\beta}, \qquad (3.4)$$
for some $C(\rho, q) > 0$ independent of $N$. Observe that combining this bound with (1.5) implies that if the limit $F(\lambda, \beta, \rho, q) = \lim F_N(\lambda, \beta, \rho_N, q_N)$ exists, then
$$E(q; \rho, \lambda) = \lim_{\beta\to\infty}\frac{1}{\beta}F(\lambda, \beta, \rho, q).$$

Theorem 3.3. For any $\varepsilon, \lambda, \rho > 0$, if the $\varepsilon$-overlap gap property holds, then there are points $w < \rho^2 < x < y < \rho$ with $y = o_\rho(\rho)$ such that for $N$ sufficiently large and any $\delta > 0$, the overlap $m(x)$ has a $\delta$-free energy well of depth $Nh$ in $[w, y]$ with probability $1 - O(e^{-cN})$.

Proof. By definition of the $\varepsilon$-overlap gap property, we may take $w < \rho^2 < x < y = o_\rho(\rho)$ such that
$$\max\{E(w; \rho, \lambda), E(y; \rho, \lambda)\} < E(x; \rho, \lambda).$$
Furthermore, by continuity of the map $q \mapsto E(q; \rho, \lambda)$ (see (2.3)), we may assume that there are $\delta > 0$ and $h > 0$ such that
$$\max_{q \in B_\delta(w) \cup B_\delta(y)} E(q; \rho, \lambda) - \min_{q' \in B_\delta(x)} E(q'; \rho, \lambda) \le -h,$$
and such that $B_\delta(w)$, $B_\delta(y)$, and $B_\delta(x)$ are pairwise disjoint. By (3.4), we then have that
$$\max_{q \in B_\delta(w) \cup B_\delta(y)} F_N(\lambda, \beta, q) - \min_{q' \in B_\delta(x)} F_N(\lambda, \beta, q') \le -\frac{h}{2}$$
for $\beta$ sufficiently large.
For a set $E \subset \Sigma_N$, let $Z_N(E) = \sum_{x \in E} e^{\beta(x, Ax)}$, and let $\mathcal{A}_N = m(\Sigma_N(\rho))$ denote the image of the overlap function. Note that $|\mathcal{A}_N \cap B_\delta(a)| \le C\cdot N\cdot\delta$ for some constant $C > 0$. By a union bound, we have
$$\left|\frac{1}{N}\log Z_N(B_\delta(a)) - \max_{q \in B_\delta(a) \cap \mathcal{A}_N}\frac{1}{N}\log Z_N(\Sigma_N(\rho, q))\right| \le C\cdot\delta.$$
Recall that by Gaussian concentration of measure [68, Theorem 1.2], there is a $C > 0$ such that for $N \ge 1$ and $\eta > 0$,
$$\max_{q \in \mathcal{A}_N}\frac{1}{N}\log P\left(\left|F_N(\lambda, \beta, q) - \frac{1}{N}\log Z_N(\Sigma_N(\rho, q))\right| \ge \eta\right) \le -C\eta^2,$$
$$\frac{1}{N}\log P\left(\left|F_N(\lambda, \beta) - \frac{1}{N}\log Z_N(\Sigma_N(\rho))\right| \ge \eta\right) \le -C\eta^2.$$
If we take $\delta$ and $\eta$ sufficiently small, we see that
$$\max\left\{\frac{1}{N}\log Z_N(B_\delta(w)),\ \frac{1}{N}\log Z_N(B_\delta(y))\right\} - \frac{1}{N}\log Z_N(B_\delta(x)) \le -\frac{h}{4}$$
with probability $1 - O(e^{-cN})$, where we have combined the above concentration bounds with a union bound, using again the fact that $|\mathcal{A}_N| \le C\cdot N$. Setting $I_m$ as the rate function corresponding to the overlap $m$ with respect to the measure $\pi_\beta \propto \exp(\beta(x, Ax))$ on $\Sigma_N(\rho)$, and subtracting the above from $\frac{1}{N}\log Z_N(\Sigma_N(\rho))$, we have
$$\min\{I_m(a; \delta), I_m(b; \delta)\} - I_m(q_1; \delta) \ge \frac{Nh}{4}$$
with probability $1 - O(e^{-cN})$. This yields the desired result.

Observe that $m(x)$ is $1/N$-Lipschitz. Combining Theorem 3.2 with Theorem 3.3 then immediately yields the following corollary.

Corollary 3.4. For any $\varepsilon, \lambda, \rho > 0$, if the $\varepsilon$-overlap gap property holds, then there are points $w < \rho^2 < x < y < \rho$ with $y = o_\rho(\rho)$ and an $h > 0$ such that with probability $1 - O(e^{-c_0 N})$, if $\mathcal{I} = (a, b)$, then the exit time of $\mathcal{I}$, $\tau_{\mathcal{I}^c}$, satisfies
$$\int Q_x(\tau_{\mathcal{I}^c} \le T)\, d\pi(x \mid \mathcal{I}) \le Te^{-chN}$$
for some $c > 0$.

4. Variational formula for constrained energy

We establish Theorem 2.1 in this section. To begin, we introduce a relaxation of the optimization problem (1.4), called the "positive temperature free energy" of the problem.
Recall that we may assume, without loss of generality, that $v$ is of the form
$$v = (1, \ldots, 1, 0, \ldots, 0), \qquad (4.1)$$
where the first $N\rho_N$ entries are 1 and the remaining are 0. Recall that $\rho_N$ is a sequence such that $N\rho_N \in \mathbb{N}$ and $\rho_N \to \rho$. For any $q \in [0, \rho]$, fix a sequence $q_N \to q$ such that $Nq_N \in \mathbb{N}$ and $Nq \in [Nq_N - 1/2, Nq_N + 1/2)$. In the subsequent, we will refer to a sequence $(\rho_N, q_N) \to (\rho, q)$ that satisfies these conditions as an admissible sequence. We begin by deriving the following formula for the limiting free energy, $F(\beta, \rho, q) = \lim F_N(0, \beta, \rho_N, q_N)$, where $F_N$ is as in (3.3) and $\beta > 0$ is fixed. To this end, let $\Lambda_1, \Lambda_2 \in \mathbb{R}$, and $\mu \in \mathcal{M}_1([0, q])$, where $\mathcal{M}_1([0, q])$ is the space of probability measures on $[0, q]$ equipped with the weak-* topology. Let $u^i_{\mu,\beta}$ be the unique weak solutions to the Cauchy problem
$$\begin{cases} \partial_t u^i_{\mu,\beta} + 2\left(\partial_x^2 u^i_{\mu,\beta} + \beta\mu([0, t])(\partial_x u^i_{\mu,\beta})^2\right) = 0 & (t, x) \in [0, \rho) \times \mathbb{R} \\ u^i_{\mu,\beta}(\rho, x) = \frac{1}{\beta}\log(1 + \exp(\beta(x + \Lambda_i))). \end{cases} \qquad (4.2)$$
For a definition of weak solution, as well as well-posedness of the Cauchy problem, see [51, Sec. 2] (alternatively, see [18, Appendix A]). Consider the functional
$$P_\beta(\mu, \Lambda_1, \Lambda_2; q) = \frac{\log 2}{\beta} - \Lambda_1 q - \Lambda_2(\rho - q) + \rho u^1_{\mu,\beta}(0, 0) + (1 - \rho)u^2_{\mu,\beta}(0, 0) - 2\int_0^\rho s\beta\mu([0, s])\, ds. \qquad (4.3)$$
We then have the following.

Proposition 4.1. For $\beta > 0$ and any admissible sequence $(\rho_N, q_N) \to (\rho, q)$, we have that $F(\beta, \rho, q) = \lim_{N\to\infty} F_N(0, \beta, \rho_N, q_N)$ exists and satisfies
$$\frac{1}{\beta}F(\beta, \rho, q) = \min_{\mu \in \mathcal{M}_1([0, q]),\ \Lambda_1, \Lambda_2 \in \mathbb{R}} P_\beta(\mu, \Lambda_1, \Lambda_2; q). \qquad (4.4)$$
In particular, this minimum is achieved.

We defer the proof of this result to Section 6.

4.1. Proof of variational formula. We now compute the zero-temperature limit of the positive temperature problem.

Theorem 4.2.
We have that
$$\lim_{\beta\to\infty}\frac{1}{\beta}F(\beta, \rho, q) = \min_{\nu \in \mathcal{A},\ \Lambda \in \mathbb{R}^2} P(\nu, \Lambda; q).$$
Recall that by (3.4), this immediately implies Theorem 2.1. To this end, we study the convergence of the above variational problem as $\beta \to \infty$. Let us first recall the notion of sequential $\Gamma$-convergence.

Definition 4.3. Let $X$ be a topological space. We say that a sequence of functionals $F_n : X \to [-\infty, \infty]$ sequentially $\Gamma$-converges to $F : X \to [-\infty, \infty]$ if:
(1) The $\Gamma$-$\liminf$ inequality holds: for every $x$ and every sequence $x_n \to x$,
$$\liminf_{n\to\infty} F_n(x_n) \ge F(x).$$
(2) The $\Gamma$-$\limsup$ inequality holds: for every $x$, there exists a sequence $x_n \to x$ such that
$$\limsup_{n\to\infty} F_n(x_n) \le F(x).$$
For a sequence of functionals $F_\beta$ indexed by a real parameter $\beta$, we say that $F_\beta$ sequentially $\Gamma$-converges to $F$ if for any sequence $\beta_n \to \infty$, the sequence $F_{\beta_n}$ sequentially $\Gamma$-converges to $F$.

Recall that in [50, Theorem 3.2] it was shown that the functionals
$$F_\beta(\nu, \Lambda_i) = \begin{cases} u^i_{\mu,\beta}(0, 0) & \nu = \beta\mu([0, s])\, ds,\ \mu \in \mathcal{M}_1([0, q]) \\ +\infty & \text{otherwise} \end{cases}$$
sequentially $\Gamma$-converge to the solution of (2.1),
$$F_\beta(\nu, \Lambda_i) \xrightarrow{\Gamma} u^i_\nu(0, 0). \qquad (4.5)$$
Let
$$G_\beta(\nu, \Lambda_1, \Lambda_2) = \rho F_\beta(\nu, \Lambda_1) + (1 - \rho)F_\beta(\nu, \Lambda_2), \qquad G(\nu, \Lambda) = \rho u^1_\nu(0, 0) + (1 - \rho)u^2_\nu(0, 0).$$
Furthermore, let
$$E_\beta(\nu, \Lambda) = G_\beta(\nu, \Lambda_1, \Lambda_2) - \Lambda_1 q - \Lambda_2(\rho - q) - \int s\, d\nu(s),$$
$$E(\nu, \Lambda) = G(\nu, \Lambda) - \Lambda_1 q - \Lambda_2(\rho - q) - \int s\, d\nu(s).$$
The preceding results (with minor modification) yield the following.

Lemma 4.4. We have that $E_\beta \xrightarrow{\Gamma} E$.

Proof. By (4.5), we have the $\Gamma$-$\liminf$ inequality: for any sequence $(\nu^\beta, \Lambda^\beta)$,
$$\liminf G_\beta(\nu^\beta, \Lambda_1^\beta, \Lambda_2^\beta) \ge \rho\liminf F_\beta(\nu^\beta, \Lambda_1^\beta) + (1 - \rho)\liminf F_\beta(\nu^\beta, \Lambda_2^\beta) = G(\nu, \Lambda).$$
It remains to prove the $\Gamma$-$\limsup$ upper bound. This will follow since the recovery sequence in the $\Gamma$-$\limsup$ upper bound of (4.5) does not depend on the functional itself. More precisely, fix $(\nu, \Lambda)$.
Consider the sequence $(\nu^\beta, \Lambda^\beta)$ with $\Lambda^\beta = \Lambda$, and $\nu^\beta$ constructed as in [52, Lemma 2.1.2] (see alternatively [50, Lemma 3.4]). Consequently, we have that
$$\lim G_\beta(\nu^\beta, \Lambda) = \rho\lim F_\beta(\nu^\beta, \Lambda_1) + (1 - \rho)\lim F_\beta(\nu^\beta, \Lambda_2) = G(\nu, \Lambda)$$
along this sequence, as each of the summands converges by applying [50, Lemma 3.3] termwise.

Lemma 4.5. Any sequence of minimizers of $E_\beta$ is pre-compact. Furthermore, any limit point of such a sequence is a minimizer of $E$ and $\lim\min E_\beta = \min E$.

In fact, this sequence is unique; however, we will not require this. The compactness of $\nu^\beta$ is established in the following lemma, whose proof is deferred to Section 5.

Lemma 4.6. For every $\beta > 0$, $\|\nu^\beta\|_{TV} \le \frac{1}{2}\sqrt{\rho\log(1/\rho)}$.

Proof of Lemma 4.5. The compactness of $\nu^\beta$ follows from Lemma 4.6. On the other hand, by Lemma 5.1 below, the $\Lambda_i^\beta$ lie in a uniformly bounded set. Thus the sequence is pre-compact. The second half of the result follows by the fundamental theorem of $\Gamma$-convergence.

Proof of Theorem 4.2. This follows by combining Lemmas 4.4-4.5 and Proposition 4.1.

Proof of Theorem 2.1. This follows immediately upon combining (3.4) with Theorem 4.2.

5. Bounds on optimal measures

In this section, we prove Lemmas 2.2 and 4.6. To this end, we need the following useful notation. Let $\Lambda = (\Lambda_1, \Lambda_2)$, and let us abuse notation to denote by $d\rho$ the two-atomic measure on $\{1, 2\}$,
$$d\rho = \rho\delta_1 + (1 - \rho)\delta_2.$$
Let $v^i(t, x)$ be given by the change of variables
$$v^i(t, x) = \beta u^i_{\mu,\beta}(t, x/\beta; \Lambda_i/\beta). \qquad (5.1)$$
Note that $v^i$ solves
$$\begin{cases} \partial_t v + 2\beta^2\left(\partial_x^2 v + \mu([0, t])(\partial_x v)^2\right) = 0 & (t, x) \in [0, \rho) \times \mathbb{R} \\ v(\rho, x) = \log(1 + \exp(x + \Lambda)), \end{cases} \qquad (5.2)$$
with $\Lambda = \Lambda_i$. It is then helpful to rewrite the functional $\tilde{P}_\beta = \beta P_\beta$ in the following form,
$$\tilde{P}_\beta(\mu, \Lambda_1, \Lambda_2; q) = -q\Lambda_1 - (\rho - q)\Lambda_2 + \int v^i(0, 0)\, d\rho(i) - 2\beta^2\int s\,\mu([0, s])\, ds.$$
In the following, it will be useful to note the first-order optimality conditions for this functional. To this end, let $\hat{X}^i_s$ solve the stochastic differential equation
$$d\hat{X}^i_s = 4\beta^2\mu([0, s])\,\partial_x v^i(s, \hat{X}^i_s)\, ds + 2\beta\, dB_s \qquad (5.3)$$
with initial data $\hat{X}^i_0 = 0$, and let
$$G_{\mu,\Lambda}(t) = \int_t^\rho\left(4\beta^2\int\mathbb{E}(\partial_x v^i)^2(s, \hat{X}^i_s)\, d\rho(i) - s\right)ds.$$

Lemma 5.1. We have the following.
(1) For every $\beta > 0$ there is a unique minimizing triple $(\mu, \Lambda)$ of $\tilde{P}_\beta$. Furthermore, the set of optimizers $\{\Lambda^\beta\}_\beta$ lies in a compact subset of $\mathbb{R}^2$.
(2) This triple satisfies the optimality conditions
$$\mu\left(G_{\mu,\Lambda}(s) = \min_s G_{\mu,\Lambda}(s)\right) = 1, \qquad (5.4)$$
$$\int\partial_{\Lambda_i} v^i(0, 0)\, d\rho(i) = \rho. \qquad (5.5)$$
(3) In particular, for any $q \in \mathrm{supp}(\mu)$,
$$q = \int\mathbb{E}(\partial_x v^i)^2(q, \hat{X}^i_q)\, d\rho(i). \qquad (5.6)$$

Proof. We begin with item (1), and first prove the uniqueness of the minimizing pair. To see this, note that by [50, Lemma 4.2], the map $(\mu, \Lambda) \mapsto w(0, 0)$, where $w$ weakly solves (5.2), is strictly convex. Thus $\tilde{P}_\beta$ is strictly convex. To show existence of a minimizing pair, note that since $\mathcal{M}_1([0, q])$ is weak-* compact, it suffices to show that $\Lambda$ lives in a compact subset of $\mathbb{R}^2$. By the parabolic comparison principle (see [50, Lemma 4.6] in this case), it follows that, for any $\mu$,
$$v^i(0, 0) \ge \log(1 + \exp(\Lambda_i)).$$
Thus
$$\tilde{P}_\beta(\mu, \Lambda_1, \Lambda_2; q) \ge \log(1 + \exp(\Lambda_1)) - q\Lambda_1 + \log(1 + \exp(\Lambda_2)) - (\rho - q)\Lambda_2 - \int s\, ds,$$
which diverges to infinity as $\Lambda_1, \Lambda_2 \to \pm\infty$, from which the compactness result follows. In fact, this shows that the set is $\beta$-independent. It remains to derive the optimality conditions. For item (3), note that the fixed point equation (5.6) for the support follows upon differentiating $G$ and applying (5.4). To obtain (5.5), first note that the $v^i$ are differentiable in $\Lambda_i$; this follows by a classical differentiable dependence argument, see, e.g., [18, Lemma A.5].
Explicitly differentiating the functional in $\Lambda_i$, (5.5) then follows upon observing the relation
$$\rho\cdot\frac{q}{\rho} + (1 - \rho)\cdot\frac{\rho - q}{1 - \rho} = \rho.$$
The first-order stationarity condition (5.4) for $G$ then follows by first fixing $\Lambda_i$ and computing the first variation of the maps $\mu \mapsto v^i(0, 0; \Lambda_i)$. This has been done in [50, Lemma 4.3], following [53, Lemma 3.2.1]. In particular, this yields the following first variation formula for $\tilde{P}_\beta$: if $\mu_t$ is a weak-* right-differentiable path in $\mathcal{M}_1([0, q])$ ending at $\mu$, in the sense that $\dot{\mu} = \lim_{t\to 0^+}\frac{\mu_t - \mu}{t}$ exists weak-*, then
$$\frac{d}{dt}\Big|_{t=0}\tilde{P}_\beta(\mu_t, \Lambda; q) = \int G_{\mu,\Lambda}(s)\, d\dot{\mu}.$$
By the first-order optimality condition for convex functions, the right-hand side is non-negative for all such paths if and only if we choose $\mu_0$ to be the optimizer of $\tilde{P}_\beta(\cdot, \Lambda; q)$. If we then take the path $\mu_t = t\delta_{s_0} + (1 - t)\mu$, we see that
$$G_{\mu,\Lambda}(s_0) \ge \int G_{\mu,\Lambda}\, d\mu,$$
from which (5.4) follows.

Recall the function $F$ introduced in Proposition 4.1. Armed with Lemma 5.1, we next establish a formula for the $\beta$-derivative of $F$.

Lemma 5.2. We have that
$$\partial_\beta F(\beta, \rho, q) = 2\beta\int(\rho^2 - q^2)\, d\mu(q).$$

Proof. We start with (4.3) and (4.4), and observe that we can equivalently express
$$F(\beta, \rho, q) = \log 2 + \min_{\Lambda_i, \mu}\tilde{P}_\beta(\mu, \Lambda_1, \Lambda_2; q).$$
By the same argument as [50, Theorem 4.1], we see that
$$\partial_\beta v^i(0, 0) = \frac{\beta}{4}\left(\rho\,\mathbb{E}\left[\partial_x^2 v^i(\rho, \hat{X}^i_\rho) + (\partial_x v^i)^2(\rho, \hat{X}^i_\rho)\right] - 4\int_0^\rho s\,\mathbb{E}(\partial_x v^i)^2(s, \hat{X}^i_s)\, d\mu\right), \qquad (5.7)$$
where $\hat{X}^i$ solves the stochastic differential equation (5.3). [18, Lemma A.5] implies that $w^i = \partial_{\Lambda_i} v(t, x)$ weakly solves
$$\begin{cases} \partial_t w^i + L_i w = 0 & (t, x) \in [0, \rho) \times \mathbb{R} \\ w^i(T, x) = \partial_{\Lambda_i}\log(1 + \exp(x + \Lambda_i)) & x \in \mathbb{R}, \end{cases}$$
where $L_i$ is the infinitesimal generator of $\hat{X}^i$. Therefore,
$$\mathbb{E}\left[\partial_{\Lambda_i} v^i(\rho, \hat{X}^i_\rho) \mid \hat{X}^i_0 = 0\right] = \partial_{\Lambda_i} v^i(0, 0).$$
By direct computation and (5.2),
$$\partial_x^2 v^i(\rho, x) + (\partial_x v^i)^2(\rho, x) = \partial_{\Lambda_i} v^i(\rho, x).$$
Combining these observations with (5.7), and using the optimality conditions (5.5) and (5.6), yields
$$\partial_\beta F(\beta, \rho, q) = \rho\,\partial_\beta v^1(0, 0) + (1 - \rho)\,\partial_\beta v^2(0, 0) - 4\beta\int_0^\rho s\,\mu([0, s])\, ds$$
$$= \frac{\beta}{4}\left(\rho\cdot\rho - 4\int_0^\rho s^2\, d\mu(s)\right) - 2\beta\int_0^\rho(\rho^2 - s^2)\, d\mu(s) = 2\beta\int(\rho^2 - s^2)\, d\mu(s),$$
as desired.

We now turn to the proofs of Lemmas 2.2 and 4.6.

Proofs of Lemmas 2.2 and 4.6. Let us begin by observing that Lemma 2.2 follows from Lemma 4.6. To see this, recall the functional $P_\beta$ from (4.4) and let $\mu_\beta$ denote the corresponding minimizer. Then, by Lemma 4.5, there is a limit point of the sequence $\nu_\beta = \beta\mu_\beta([0, s])\, ds$, call it $\nu = \tilde{m}(t)\, dt + c\delta_\rho$, and any such limit point is a minimizer of $P$. Observe that, by the reduction from (2.2) and the strict convexity of the corresponding problem, $\tilde{m} = m$. We now observe that along any subsequence converging to $\nu$,
$$\int_0^\rho m(t)\, dt \le \lim_{\beta\to\infty}\beta\int_0^\rho\mu_\beta([0, t])\, dt. \qquad (5.8)$$
The desired bound then follows from Lemma 4.6. Let us now turn to the proof of Lemma 4.6. By Fubini's theorem, we have that
$$\beta\int_0^\rho\mu_\beta([0, t])\, dt = \beta\int_0^\rho(\rho - q)\, d\mu_\beta(q) \le \frac{1}{2\rho}\beta\int_0^\rho(\rho^2 - q^2)\, d\mu_\beta(q) = \frac{1}{4\rho}\partial_\beta F(\beta, \rho, q),$$
where the last equality follows using Lemma 5.2. Next, recall $F_N(\beta, \rho_N, q_N)$ from (3.3). By differentiation, observe that $F_N(\beta, \rho_N, q_N)$ is convex in $\beta$, and thus Proposition 4.1 implies that $F(\beta, \rho, q)$ is convex as well. Thus $\partial_\beta F_N(\beta, \rho_N, q_N) \to \partial_\beta F(\beta, \rho, q)$ by Griffiths' lemma for convex functions. Consequently, we have
$$\beta\int_0^\rho\mu_\beta([0, t])\, dt \le \frac{1}{4\rho}\lim_{N\to\infty}\partial_\beta F_N(\beta, \rho_N, q_N).$$
Finally, note that
$$\partial_\beta F_N(\beta, \rho_N, q_N) = \frac{1}{N}\mathbb{E}[\langle H_N\rangle] \le \frac{1}{N}\mathbb{E}\max_{\Sigma_N(\rho_N, q_N)} H_N \le 2\sqrt{\rho^3\log\frac{1}{\rho}}(1 + o_N(1)),$$
where $\langle\cdot\rangle$ denotes integration with respect to the Gibbs measure $\pi(\{\sigma\}) \propto \exp(-H_N(\sigma))\mathbf{1}_{\Sigma_N(\rho_N, q_N)}$; the first equality follows by Gaussian integration by parts [68, Lemma 1.1], and the last bound follows by bounding $\langle H_N\rangle$ by the maximum and applying Slepian's comparison inequality (2.11) with $A = \Sigma_N(\rho_N, q_N)$. Thus we obtain
$$\int_0^\rho\beta\mu_\beta([0, t])\, dt \le \frac{1}{2}\sqrt{\rho\log(1/\rho)},$$
as desired.

6. Proof of free energy formula

In this section, we aim to prove that for $(\rho_N, q_N) \to (\rho, q)$ admissible,
$$\lim_{N\to\infty} F_N(\beta, \rho_N, q_N) = \inf_{\mu \in \mathcal{M}_1([0, \rho]),\ \Lambda \in \mathbb{R}^2}\tilde{P}_\beta(\mu, \Lambda; q). \qquad (6.1)$$
Note that this agrees with the statement of Proposition 4.1 after dividing through by $\beta$, after noting that by Lemma 5.1 the infimum is actually achieved. To this end, first write $v$ as in (3.2). We may then view $x \in \{0, 1\}^N$ as $x \in \{0, 1\}^{N_1} \times \{0, 1\}^{N_2}$, where $N_1 = N\rho_N$. We call $x$ the configuration, and $x_i$ the spin of the $i$-th particle. We call $I_1 = [N_1]$ the first species of particles and $I_2 = [N] \setminus I_1$ the second species of particles. We may then view the log-likelihood $H_N$ as the Hamiltonian of a two-species spin glass model,
$$H_N(x) = \frac{\sqrt{2}}{\sqrt{N}}\sum_{i,j} g_{ij} x_i x_j,$$
where the $g_{ij}$ are i.i.d. $N(0, 1)$ (and in particular, $(g_{ij})$ is not symmetric). Thus our goal is to compute the free energy of this two-species spin glass model constrained to certain classes of configurations. In the following, it will be useful to define the overlap between two points $\sigma^1, \sigma^2 \in \Sigma_N(\rho)$ by
$$R(\sigma^1, \sigma^2) = \frac{1}{N}\sum_{i=1}^N\sigma^1_i\sigma^2_i.$$
When the notation is unambiguous, we will also denote $R(\sigma^i, \sigma^j) = R_{ij}$. (In particular, we let $R_{11} = R(\sigma^1, \sigma^1)$.)
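Since the $(g_{ij})$ here are i.i.d. over all ordered pairs, for any fixed $x \in \{0,1\}^N$ one has $\mathrm{Var}(H_N(x)) = \frac{2}{N}\left(\sum_i x_i\right)^2$. A small Monte Carlo check of this (the size $N$ and configuration are illustrative):

```python
import math
import random
from statistics import pvariance

rng = random.Random(4)
N = 10
x = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]   # a configuration with 4 ones

def H(g):
    """Two-species Hamiltonian sqrt(2/N) * sum_{i,j} g_ij x_i x_j, g non-symmetric."""
    s = sum(g[i][j] * x[i] * x[j] for i in range(N) for j in range(N))
    return math.sqrt(2.0 / N) * s

samples = []
for _ in range(10000):
    g = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(N)]
    samples.append(H(g))

theory = 2.0 / N * sum(x) ** 2       # = 2 (sum_i x_i)^2 / N
print(pvariance(samples), theory)
```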
It will also be useful to define the intra-species overlaps,
$$R^{(1)}(\sigma^1, \sigma^2) = \frac{1}{N_1}\sum_{i=1}^{N_1}\sigma^1_i\sigma^2_i, \qquad R^{(2)}(\sigma^1, \sigma^2) = \frac{1}{N_2}\sum_{i=N_1+1}^{N}\sigma^1_i\sigma^2_i.$$
Note that for $i, j = 1, 2$, we have $R_{ij} = \rho_N R^{(1)}_{ij} + (1 - \rho_N)R^{(2)}_{ij}$.

6.1. The upper bound. Let $\mathcal{A}_r$ denote the rooted tree of depth $r$ where each non-leaf vertex has countably many children, i.e., the first $r$ levels of the Ulam-Harris tree. Note that we may view the leaves of this tree as the set $\mathbb{N}^r$, where $\alpha = (\alpha_1, \ldots, \alpha_r)$ denotes the root-to-leaf path $\emptyset \to \alpha$. We denote this path by $p(\alpha)$. For more on this notation, see Appendix A.

Theorem 6.1. Suppose that $(\rho, q)$ is such that $\Sigma_N(\rho, q)$ is non-empty. Then for $N \ge 1$, we have that
$$\mathbb{E}F_N(\beta, \rho, q) \le \inf_{\mu, \Lambda}\tilde{P}_\beta(\mu, \Lambda; q). \qquad (6.2)$$

Proof. Let us begin with the case where $\mu$ has finite support. To see this, let $(v_\alpha)_{\alpha \in \mathbb{N}^r}$ denote a Ruelle probability cascade (RPC) corresponding to parameters $(\mu_\ell)$ (for a definition of Ruelle probability cascades, see Appendix A). Let $(Z(\alpha))$ denote the centered Gaussian process on $\mathbb{N}^r$ with covariance
$$\mathbb{E}Z(\alpha)Z(\gamma) = 4\beta^2 Q_{|\alpha\wedge\gamma|},$$
where $|\alpha\wedge\gamma|$ denotes the depth of the least common ancestor of $\alpha$ and $\gamma$ in $\mathcal{A}_r$, and let $(Z_i(\alpha))$ denote i.i.d. copies of this process. Similarly, let $(Y(\alpha))$ be the centered Gaussian process on $\mathbb{N}^r$ with covariance
$$\mathbb{E}Y(\alpha)Y(\gamma) = 2\beta^2 Q^2_{|\alpha\wedge\gamma|}.$$
Take the processes $Y$ and $Z$ to be independent of each other and of $H_N$. For $t \in [0, 1]$, define the interpolating Hamiltonian $H_t(\sigma, \alpha) : \Sigma_N(\rho, q) \times \mathbb{N}^r \to \mathbb{R}$ given by
$$H_t(\sigma, \alpha) = \sqrt{t}\left(\beta H_N(\sigma) + \sqrt{N}Y(\alpha)\right) + \sqrt{1 - t}\left(\sum_{i=1}^N Z_i(\alpha)\sigma_i\right).$$
Finally, let
$$\varphi(t) = \frac{1}{N}\mathbb{E}\log\sum_{\alpha,\sigma} v_\alpha\exp(H_t(\sigma, \alpha)).$$
To control this, let us recall Gaussian integration by parts for Gibbs measures (see, e.g., [68, Lemma 1.1]):

Lemma.
(Integration by parts for Gibbs measures.) Let $X$ be at most countable and let $(u(x))$, $(v(x))$ be centered Gaussian processes on $X$ with mutual covariance $C(x, y) = \mathbb{E}u(x)v(y)$. Let $G$ be the probability measure with $G(\{x\}) \propto \exp(v(x))$. We have the identity
$$\mathbb{E}\int_X u(x)\, dG(x) = \mathbb{E}\iint_{X^2}\left(C(x^1, x^1) - C(x^1, x^2)\right)dG^{\otimes 2}. \qquad (6.3)$$
If we let $(\sigma^\ell, \alpha^\ell)$ denote independent draws from the Gibbs measure $\pi_t(\{\sigma, \alpha\}) \propto \exp(H_t(\sigma, \alpha))$, then, integrating by parts, we have that
$$\varphi'(t) = \frac{1}{N}\mathbb{E}\langle\partial_t H_t(\sigma, \alpha)\rangle = \frac{1}{N}\mathbb{E}\langle C_{11} - C_{12}\rangle,$$
where
$$C_{ij} = \mathbb{E}\left[\partial_t H_t(\sigma^i, \alpha^i)\cdot H_t(\sigma^j, \alpha^j)\right] = \beta^2\left(R_{ij} - Q_{|\alpha^i\wedge\alpha^j|}\right)^2, \qquad i, j = 1, 2.$$
In the case $i = j$, observe that, by definition of $\Sigma_N(\rho, q)$, $R_{11} = \rho = Q_r$, so that $C_{11} = 0$. Thus $\varphi' \le 0$. The result then follows by comparing the boundary conditions. In particular, rearranging the inequality $\varphi(1) \le \varphi(0)$ yields
$$\mathbb{E}F_N \le \frac{1}{N}\mathbb{E}\log\sum_\alpha v_\alpha\sum_{\sigma \in \Sigma_N(\rho, q)}\exp\left(\sum_i Z_i(\alpha)\sigma_i\right) - \frac{1}{N}\mathbb{E}\log\sum_\alpha v_\alpha\exp(\sqrt{N}Y(\alpha)). \qquad (6.4)$$
By Corollary A.3, the second term is equal to
$$\frac{1}{N}\mathbb{E}\log\sum_\alpha v_\alpha e^{\sqrt{N}Y(\alpha)} = \frac{1}{2}\int 4\beta^2 s\,\mu([0, s])\, ds.$$
It remains to upper-bound the first term. To this end, observe that the set $\Sigma_N(\rho, q)$ is defined via a constraint on the intra-species overlaps. If we add Lagrange multipliers, $\Lambda_1$ and $\Lambda_2$, for the constraints that $R^{(1)}_{11} = \rho$ and $R^{(2)}_{11} = \rho - q$, the first term in (6.4) is equal to
$$\frac{1}{N}\mathbb{E}\log\sum_\alpha v_\alpha\sum_{\sigma \in \Sigma_N(\rho, q)}\exp\left(\sum_i Z_i(\alpha)\sigma_i + \Lambda_1 N_1 R^{(1)}_{11} + \Lambda_2 N_2 R^{(2)}_{11} - \Lambda_1 q - \Lambda_2(\rho - q)\right)$$
$$\le \frac{1}{N}\mathbb{E}\log\sum_\alpha v_\alpha\sum_{\sigma \in \Sigma_N}\exp\left(\sum_i Z_i(\alpha)\sigma_i + \Lambda_1 N_1 R^{(1)}_{11} + \Lambda_2 N_2 R^{(2)}_{11} - \Lambda_1 q - \Lambda_2(\rho - q)\right), \qquad (6.5)$$
where the inequality follows by the set containment $\Sigma_N(\rho, q) \subseteq \Sigma_N = \{0, 1\}^N$. Observe that the summation in $\sigma$ is now over a product space, and that $R^{(1)}_{11} = \frac{1}{N_1}\sum_{i \le N_1}\sigma_i$ (and similarly for $R^{(2)}_{11}$), since $\sigma_i \in \{0, 1\}$.
Thus the first term in this display is of the form
$$\frac{1}{N}\mathbb{E}\log\sum_\alpha v_\alpha\prod_{i=1}^{N_1}(1 + \exp(Z_i(\alpha) + \Lambda_1))\prod_{i=N_1+1}^{N}(1 + \exp(Z_i(\alpha) + \Lambda_2)).$$
Applying (both parts of) Theorem A.2, we see that this is given by
$$\rho\,\mathbb{E}\log\sum_\alpha v_\alpha(1 + \exp(Z(\alpha) + \Lambda_1)) + (1 - \rho)\,\mathbb{E}\log\sum_\alpha v_\alpha(1 + \exp(Z(\alpha) + \Lambda_2)) = \rho\,\varphi^1_\mu(0, 0) + (1 - \rho)\,\varphi^2_\mu(0, 0), \qquad (6.6)$$
where we used that $N = N_1 + N_2 = \rho N + (1 - \rho)N$, and $\varphi^i_\mu = v^i$ for $v^i$ as in (5.1). This gives the desired upper bound,
$$\mathbb{E}F_N(\beta, \rho, q) \le \tilde{P}_\beta(\mu, \Lambda; q) \qquad (6.7)$$
for $\mu$ of finite support. We obtain the result for general $\mu$ by continuity. Specifically, recall that $u^i_\mu$ depends continuously on $\mu$ (see [51, Sec 2.4]) and the last term in the definition of $\tilde{P}_\beta$ is a bounded linear functional of $\mu$. (Alternatively, we may use Lemma 6.5 below for both terms.) Thus $\tilde{P}_\beta$ is weak-* continuous in $\mu$. Since the set of measures of finite support is weak-* dense in $\mathcal{M}_1([0, \rho])$, we obtain (6.7) for all $\mu \in \mathcal{M}_1([0, \rho])$ and all $\Lambda$. Minimizing yields the desired bound.

6.2. Lower bound via the Aizenman-Sims-Starr scheme. It now remains to prove the matching lower bound. In the following, let $Z_N(E) = \sum_{\sigma \in E}\exp(H(\sigma))$. We begin with the following inequality: for any $M \ge 1$,
$$\liminf_{N\to\infty}\frac{1}{N}\mathbb{E}\log Z_N(\Sigma_N(\rho_N, q_N)) \ge \liminf_{N\to\infty}\frac{1}{M}\left[\mathbb{E}\log Z_{N+M}(\Sigma_{N+M}(\rho_{N+M}, q_{N+M})) - \mathbb{E}\log Z_N(\Sigma_N(\rho_N, q_N))\right].$$
We call these new $M$ coordinates cavity coordinates. Let us take $M = M_1 + M_2$, where we add $M_1$ cavity coordinates to the first species and $M_2$ to the second. Throughout the following, we will take the admissible sequence to be such that
$$|\rho_N - \rho|,\ |q_N - q| \le \frac{C}{N}. \qquad (6.8)$$
(The bound for $q_N$ follows by definition of admissibility.) Let us now decompose the Hamiltonian into the part induced by the cavity coordinates and the rest.
We set
$$H'_N(\sigma) = \frac{\sqrt{2}\beta}{\sqrt{N+M}}\sum_{i,j=1}^N g_{ij}\sigma_i\sigma_j, \quad z_{N,i}(\sigma) = \frac{\sqrt{2}\beta}{\sqrt{N+M}}\sum_{j=1}^N(g_{ij} + g_{ji})\sigma_j, \quad y_N(\sigma) = \frac{\sqrt{2}\beta}{\sqrt{N(N+M)}}\sum_{i,j=1}^N g'_{ij}\sigma_i\sigma_j,$$
where the $g'_{ij}$ are independent of the $g_{ij}$. It then follows that
$$\beta H_{N+M}(\sigma) = H'_N(\sigma) + \sum_{i=N+1}^{N+M} z_{N,i}(\sigma)\sigma_i + r_1(\sigma), \qquad \beta H_N(\sigma) = H'_N(\sigma) + \sqrt{M}y_N(\sigma) + r_2(\sigma),$$
where $r_i$ is a centered Gaussian process with $\mathrm{Var}(r_i) = O(\frac{1}{N})$. Then for $J = \{N+1, \ldots, N+M\}$,
$$\mathbb{E}\log Z_{N+M}(\Sigma_{N+M}(\rho_{N+M}, q_{N+M})) = \mathbb{E}\log\sum_{\sigma \in \Sigma_{N+M}(\rho_{N+M}, q_{N+M})}\exp(H'_N(\sigma))\cdot\prod_{i \in J}\exp(z_{N,i}(\sigma)\sigma_i) + o(1), \qquad (6.9)$$
$$\mathbb{E}\log Z_N(\Sigma_N(\rho_N, q_N)) = \mathbb{E}\log\sum_{\sigma \in \Sigma_N(\rho_N, q_N)}\exp(H'_N(\sigma) + \sqrt{M}y_N(\sigma)) + o(1), \qquad (6.10)$$
where we have eliminated the $r_i$ dependence on the right-hand side by the following lemma.

Lemma 6.2. Let $\mathcal{X}$ be a finite set. Let $r : \mathcal{X} \to \mathbb{R}$ be a centered Gaussian process on $\mathcal{X}$, and let $H(x)$ be a Gaussian process independent of $r$. Then
$$\left|\mathbb{E}\log\sum_{x \in \mathcal{X}}\exp(H(x) + r(x)) - \mathbb{E}\log\sum_{x \in \mathcal{X}}\exp(H(x))\right| \le \mathrm{Var}(r).$$

Proof. Let $Y_t(x) = H(x) + \sqrt{t}r(x)$ and $\psi(t) = \mathbb{E}\log\sum_{\mathcal{X}}\exp Y_t$. If we let $G_t$ be the probability measure on $\mathcal{X}$ with $G_t(\{x\}) \propto e^{Y_t}$, then we have
$$\partial_t\psi(t) = \mathbb{E}\int\partial_t Y_t\, dG_t = \mathbb{E}\iint\left(C(x^1, x^1) - C(x^1, x^2)\right)dG_t^{\otimes 2},$$
where
$$C(x, y) = \mathbb{E}\left[\partial_t Y_t(x)\, Y_t(y)\right] = \frac{1}{2}\mathbb{E}r(x)r(y) \le \frac{1}{2}\mathrm{Var}(r).$$
Consequently, $|\partial_t\psi(t)| \le \mathrm{Var}(r)$, from which it follows that $|\psi(1) - \psi(0)| \le \mathrm{Var}(r)$. Evaluating $\psi$ at 0 and 1 yields the desired bound.

6.2.1. Reduction to continuous functionals. Our goal now is to compute the limit of the difference of (6.9) and (6.10). We begin by showing that their difference can be related to the difference of two continuous functionals on an appropriate space of probability measures.
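Lemma 6.2 bounds the free-energy shift from an independent Gaussian perturbation by its variance; a Monte Carlo check on a small set (the sizes and variances below are illustrative):

```python
import math
import random

rng = random.Random(6)
n_states, var_r, n_samples = 4, 0.25, 50000

def log_sum_exp(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(t - m) for t in vals))

diff_sum = 0.0
for _ in range(n_samples):
    H = [rng.gauss(0.0, 1.0) for _ in range(n_states)]               # base process
    r = [rng.gauss(0.0, math.sqrt(var_r)) for _ in range(n_states)]  # independent perturbation
    diff_sum += log_sum_exp([h + s for h, s in zip(H, r)]) - log_sum_exp(H)

print(abs(diff_sum / n_samples), var_r)
```

Sharing the same draw of $H$ across both log-partition functions keeps the Monte Carlo variance of the difference small.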
To this end, we first claim that, upon passing to a subsequence in $N$, we may choose a sequence $(r_M, u_M) \to (\rho, q)$ such that
$$\Sigma_{N+M}(\rho_{N+M}, q_{N+M}) \supset \Sigma_N(\rho_N, q_N)\times\Sigma_M(r_M, u_M),$$
where here we have abused notation and view the cavity coordinates $\Sigma_M = \Sigma_{M_1}\times\Sigma_{M_2}$ in this product as being distributed between the species in such a way that $N_1 + M_1$ is the size of the first species. To see that such a sequence exists, observe that if we define $r_M(N)$ and $u_M(N)$ by
$$Mr_M(N) = (N+M)\rho_{N+M} - N\rho_N, \qquad Mu_M(N) = (N+M)q_{N+M} - Nq_N,$$
then by the choice of the sequence $(\rho_N, q_N)$ (see (6.8)), these are both integers bounded by $M$. As such, we may pass to a subsequence along which the pair converges. In particular, eventually along this subsequence in $N$, $(Mr_M, Mu_M)$ will be constant. The desired properties of $(r_M, u_M)$ follow immediately by definition of $(\rho_N, q_N)$. Using this sequence, we may lower bound (6.9) by
$$\mathbb{E}\log\sum_{\sigma \in \Sigma_{N+M}(\rho_{N+M}, q_{N+M})}\exp(\beta H_{N+M}(\sigma)) \ge \mathbb{E}\log\sum_{\sigma \in \Sigma_N(\rho_N, q_N)}\exp(H'(\sigma))\cdot\sum_{\epsilon \in \Sigma_M(r_M, u_M)}\prod_{i \in J}\exp(z_{N,i}(\sigma)\epsilon_i) + o(1). \qquad (6.11)$$
Furthermore, since
$$\mathbb{E}y_N(\sigma^1)y_N(\sigma^2) = \frac{2\beta^2}{N(N+M)}\left(\sigma^1\cdot\sigma^2\right)^2 = \frac{2\beta^2 N}{N+M}R_{12}^2 = 2\beta^2 R_{12}^2 + o(1),$$
if we let $(y'(\sigma))$ denote the Gaussian process with covariance $\mathbb{E}y'(\sigma^1)y'(\sigma^2) = 2\beta^2 R_{12}^2$, then we may apply Lemma 6.2 to express (6.10) as
$$\mathbb{E}\log Z_N(\Sigma_N(\rho_N, q_N)) = \mathbb{E}\log\sum_{\sigma \in \Sigma_N(\rho_N, q_N)}\exp(H'_N(\sigma) + \sqrt{M}y'(\sigma)) + o(1). \qquad (6.12)$$
Define $G'$ to be the Gibbs distribution corresponding to $H'$ on the subset $\Sigma_N(\rho_N, q_N)$,
$$G'(\{\sigma\}) \propto \exp(H'(\sigma))\mathbf{1}_{\sigma \in \Sigma_N(\rho_N, q_N)},$$
where we keep the dependence on $N$ implicit. Let $\langle\cdot\rangle_{G'}$ denote expectation with respect to this measure.
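The covariance computation for $y_N$ can be checked empirically: for fixed binary $\sigma^1, \sigma^2$ one should find $\mathrm{Cov}(y_N(\sigma^1), y_N(\sigma^2)) = \frac{2\beta^2 N}{N+M}R_{12}^2$. A small Monte Carlo sketch with illustrative sizes:

```python
import math
import random

rng = random.Random(7)
N, M, beta = 8, 2, 1.0
s1 = [1, 1, 1, 1, 0, 0, 0, 0]
s2 = [1, 1, 0, 0, 1, 1, 0, 0]
R12 = sum(a * b for a, b in zip(s1, s2)) / N          # overlap R(s1, s2)

def y(g, s):
    """y_N(s) = sqrt(2) beta / sqrt(N(N+M)) * sum_{i,j} g'_ij s_i s_j."""
    tot = sum(g[i][j] * s[i] * s[j] for i in range(N) for j in range(N))
    return math.sqrt(2.0) * beta / math.sqrt(N * (N + M)) * tot

pairs = []
for _ in range(20000):
    g = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(N)]
    pairs.append((y(g, s1), y(g, s2)))

mean1 = sum(p[0] for p in pairs) / len(pairs)
mean2 = sum(p[1] for p in pairs) / len(pairs)
emp_cov = sum((p[0] - mean1) * (p[1] - mean2) for p in pairs) / len(pairs)
theory = 2.0 * beta**2 * N / (N + M) * R12 ** 2
print(emp_cov, theory)
```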
Subtracting (6.12) from (6.11), we may lower bound the difference of (6.9) and (6.10), up to an $o(1)$ correction, by
$$\mathbb{E}\log\Big\langle\sum_{\epsilon\in\Sigma_M(r_M,u_M)}\exp\Big(\sum_{i\in J} z_{N,i}(\sigma)\epsilon_i\Big)\Big\rangle_{G^0} - \mathbb{E}\log\Big\langle\exp\big(\sqrt{M}\,y_N(\sigma)\big)\Big\rangle_{G^0}.$$
We now provide alternative representations for these two terms. To this end, let $\mathcal{R}$ be the space of infinite Gram arrays with entries bounded by $1$. Let $\mathcal{M}_{\mathrm{exch}}$ be the subspace of probability measures on $\mathcal{R}$ that are weakly exchangeable, i.e., if $R$ is drawn from some $\mathcal{R}^*\in\mathcal{M}_{\mathrm{exch}}$, then
$$(R_{\ell\ell'}) \stackrel{(d)}{=} (R_{\pi(\ell)\pi(\ell')})$$
for any permutation $\pi:\mathbb{N}\to\mathbb{N}$ which permutes finitely many numbers. By standard arguments, one can show that there are weak-* continuous functionals $F_{i,M}:\mathcal{M}_{\mathrm{exch}}\to\mathbb{R}$ such that if we let $R^N$ denote the law of the Gram array formed by the overlaps of $(\sigma^\ell)\sim(G^0)^{\otimes\infty}$, called the overlap distribution corresponding to $G^0$, then
$$F_{1,M}(R^N) = \frac1M\,\mathbb{E}\log\Big\langle\sum_{\epsilon\in\Sigma_M(r_M,u_M)}\exp\Big(\sum_{i\in J} z_{N,i}(\sigma)\epsilon_i\Big)\Big\rangle_{G^0},\qquad F_{2,M}(R^N) = \frac1M\,\mathbb{E}\log\Big\langle\exp\big(\sqrt{M}\,y_N(\sigma)\big)\Big\rangle_{G^0}.$$
This follows from the following general continuity theorem for functionals of this form. Let $S\subset\Sigma_M(\rho)$, and let $(w_\alpha)_{\alpha\in A}$ be the weights of a (possibly random) probability measure on a countable set $A$; denote the measure by $\Gamma(\{\alpha\}) = w_\alpha$. Let $R^A = (R_{\alpha_1,\alpha_2})_{\alpha_1,\alpha_2\in A}$ be a doubly infinite, positive semidefinite matrix. We think of the matrix $R^A$ as the collection of allowed values for an overlap and call it an abstract overlap structure (weighted by $w_\alpha$). Consider the functionals
$$f_{1,M} = \frac1M\,\mathbb{E}\log\Big[\sum_{\alpha\in A} w_\alpha\sum_{\sigma\in S}\exp\Big(\sum_{i=1}^M Z_i(\alpha)\sigma_i\Big)\Big], \quad (6.13)$$
$$f_{2,M} = \frac1M\,\mathbb{E}\log\Big[\sum_{\alpha\in A} w_\alpha\exp\big(\sqrt{M}\,Y(\alpha)\big)\Big]. \quad (6.14)$$
Here the $Z_i(\alpha)$ are i.i.d. copies of a centered Gaussian process $Z(\alpha)$ with covariance
$$\operatorname{Cov}(Z(\alpha_1),Z(\alpha_2)) = C_Z(R_{\alpha_1\alpha_2}),$$
for some continuous function $C_Z$.
Similarly, $Y(\alpha)$ is a centered Gaussian process with covariance $\operatorname{Cov}(Y(\alpha_1),Y(\alpha_2)) = C_Y(R_{\alpha_1\alpha_2})$ for some continuous function $C_Y$. Let $(\alpha(\ell))_{\ell\ge1}$ be i.i.d. draws from $\Gamma$, and for $n\ge1$ consider the random matrix
$$R^n = \big(R_{\alpha(\ell),\alpha(\ell')}\big)_{\ell,\ell'\in[n]}.$$
We then have the following result from [72, Lemma 8].

Theorem. For any $\epsilon>0$ there are continuous bounded functions $g_Z, g_Y:\mathbb{R}^{n^2}\to\mathbb{R}$ such that
$$|f_{1,M} - \mathbb{E}\,g_Z(R^n)| \le \epsilon, \qquad |f_{2,M} - \mathbb{E}\,g_Y(R^n)| \le \epsilon.$$
Furthermore, these functions depend only on $M$, $S$, $C_Z$, $C_Y$, $\rho$, and $\epsilon$.

Before continuing, we note that since the diagonal terms of $R^n$ are always $\rho$, these functionals depend on the diagonal only through $\rho$. Let us now show that we can slightly modify $R^N$ in such a way that we can compute these limits explicitly.

6.2.2. Perturbation of the overlap distribution. Before computing this limit, we begin by observing that, up to a perturbation, we may assume these measures satisfy the Ghirlanda-Guerra identities, defined as follows. Let $R = (R_{\ell\ell'})_{\ell,\ell'\ge1}$ satisfy $R\sim\mathcal{R}^*$ for some $\mathcal{R}^*\in\mathcal{M}_{\mathrm{exch}}$. We call such a matrix a Gram-de Finetti array.

Definition 6.3. Let $R = (R_{\ell,\ell'})$ be a Gram-de Finetti array with $|R_{ij}|\le1$. We say that the law of $(R_{\ell,\ell'})$ satisfies the Ghirlanda-Guerra identities if for every $n\ge1$, $f\in L^\infty(\mathbb{R}^{n^2})$, and $p\ge1$,
$$\mathbb{E}\big[f(R^n)\,R^p_{1,n+1}\big] = \frac1n\Big[\mathbb{E} f(R^n)\cdot\mathbb{E} R^p_{12} + \sum_{\ell=2}^{n}\mathbb{E}\big[f(R^n)\,R^p_{1\ell}\big]\Big].$$

We begin by observing that we may perturb the Hamiltonian $H$ so that it satisfies the Ghirlanda-Guerra identities.

Lemma 6.4. Let
$$h_N(\sigma) = \sum_p \frac{x_p}{2^p}\,g_p(\sigma), \quad\text{where } (x_p)\in[1,2]^{\mathbb{N}}, \qquad g_p(\sigma) = \frac{1}{N^{p/2}}\sum g_{i_1\dots i_p}\sigma_{i_1}\cdots\sigma_{i_p},$$
where the $(g_{i_1\dots i_p})$ are i.i.d. standard Gaussians independent of $W$. Let $s_N\to\infty$ with $N^{1/4}\ll s_N\ll N^{1/2}$.
There is a sequence of choices of parameters $(x^N_p)$ such that
$$\lim\Big|\frac1N\,\mathbb{E}\log\sum\exp\big(\beta H_N(\sigma) + s_N\beta h_N(\sigma)\big) - \frac1N\,\mathbb{E}\log\sum\exp(\beta H(\sigma))\Big| = 0 \quad (6.15)$$
and such that if $\mathcal{R}^*$ denotes the limiting law of the Gram array formed by overlaps of i.i.d. samples from the Gibbs measure
$$\pi_{\mathrm{pert}}(\{\sigma\}) \propto \exp\big(\beta H_N(\sigma) + s_N\beta h_N(\sigma)\big),$$
then $\mathcal{R}^*$ satisfies the Ghirlanda-Guerra identities.

Proof sketch. Results of this type are standard in the spin glass literature; for a textbook presentation see [68, Chap. 3]. We only sketch the key points here and how they differ from [68]. For any sequence of choices of parameters, (6.15) holds by an application of Jensen's inequality, using the condition on $s_N$. One can then show that if we choose the parameters $(x_p)$ to be drawn i.i.d. from the uniform measure on $[1,2]$, then
$$\mathbb{E}_x\,\mathbb{E}\big\langle|g_p - \mathbb{E}\langle g_p\rangle|\big\rangle \to 0.$$
Consequently, for any choice of $n\ge1$ and $f\in L^\infty(\mathbb{R}^{n^2})$, we obtain, conditionally on $x$,
$$\mathbb{E}\langle f(R^n)\,g_p\rangle = \mathbb{E}\langle f(R^n)\rangle\cdot\mathbb{E}\langle g_p\rangle + o(1).$$
Applying Gaussian integration by parts, we obtain
$$\mathbb{E}\Big\langle f(R^n)\Big(\sum_{\ell=1}^{n} R^p_{1\ell} - n\,R^p_{1,n+1}\Big)\Big\rangle = \mathbb{E}\langle f\rangle\,\mathbb{E}\big\langle R^p_{11} - R^p_{12}\big\rangle + o(1),$$
which yields, upon rearrangement,
$$\frac1n\Big(\mathbb{E}\langle f R^p_{11}\rangle - \mathbb{E}\langle f\rangle\,\mathbb{E}\langle R^p_{11}\rangle\Big) + \frac1n\Big(\mathbb{E}\langle f\rangle\,\mathbb{E}\langle R^p_{12}\rangle + \sum_{\ell=2}^{n}\mathbb{E}\langle f\,R^p_{1\ell}\rangle\Big) = \mathbb{E}\big\langle f\,R^p_{1,n+1}\big\rangle + o(1).$$
The main issue in settings where the Gibbs measure is not on the discrete hypercube $\{-1,1\}^N$ is the terms involving the self-overlap $R_{11}$. This, however, is not an issue in our setting, as $R_{11}$ is constant, so that the first term above vanishes identically for all $N$. The remaining argument is then unchanged from [68, Sec. 3.2]. In particular, by this vanishing, this argument shows that the error between the left and right hand sides of the Ghirlanda-Guerra identity vanishes for any $p$. The existence of a suitable sequence then follows by the probabilistic method, as in [68, Lemma 3.3]. □
As a consequence of this, it suffices to evaluate the limits of $F_{i,M}$ on the overlap distribution $R^0_N$ corresponding to the perturbed Gibbs distribution $G^0_{\mathrm{pert},N}(\{\sigma\}) \propto \exp(\beta H_N(\sigma) + s_N\beta h_N(\sigma))$, where $h_N$ is obtained from the preceding lemma (and we still restrict to $\Sigma_N(\rho_N,q_N)$). As $\mathcal{R}$ is compact, any weak-* limit point of this sequence satisfies the Ghirlanda-Guerra identities.

Let us now briefly recall the structure of the subspace of $\mathcal{M}_{\mathrm{exch}}$ consisting of laws which satisfy these identities. To this end, we first recall the construction of overlap distributions corresponding to Ruelle probability cascades. Fix $r\ge1$ and a pair of sequences
$$0 = Q_0 < \dots < Q_r = \rho, \qquad 0 = \mu_{-1} < \mu_0 < \dots < \mu_r = 1.$$
Let $(v_\alpha)_{\alpha\in\mathbb{N}^r}$ denote the weights of a Ruelle probability cascade corresponding to the parameters $(\mu_\ell)_{\ell=-1}^r$. Corresponding to this sequence, we may define the (random) probability measure on the ball of $\ell^2$ of radius $\sqrt{\rho}$ given by
$$\pi = \sum v_\alpha\,\delta_{s_\alpha}, \qquad s_\alpha = \sum_{\gamma\in p(\alpha)}\sqrt{Q_{|\gamma|} - Q_{|\gamma|-1}}\;e_\gamma, \quad (6.16)$$
where $(e_\alpha)_{\alpha\in\mathcal{A}_r\setminus\{\emptyset\}}$ are orthonormal. Let $(\sigma^\ell)\sim\pi^{\otimes\infty}$ and let $\mathcal{R}^*$ denote the law of their corresponding Gram matrix. If we let $\mu\in\mathcal{M}_1([0,\rho])$ be defined by $\mu(\{Q_\ell\}) = \mu_\ell - \mu_{\ell-1}$, then by Lemma A.1, $\mu$ is the law of the first off-diagonal entry of the Gram array. In particular, $\mu$ is a sufficient statistic for the family of such laws $\mathcal{R}^*$, and we denote a law of this type by $\mathcal{R}(\mu)$. Let $\mathcal{M}_f\subseteq\mathcal{M}_1([0,\rho])$ be the set of measures of finite support. As a consequence of Panchenko's ultrametricity theorem [67], the Baffioni-Rossati theorem [11], and [80, Theorem 15.3.6], the space of Gram-de Finetti arrays satisfying the Ghirlanda-Guerra identities is given by the closure of the set $\mathcal{G} = \{\mathcal{R}(\mu) : \mu\in\mathcal{M}_f\}$ in the weak-* topology.
In particular, if $(\mathcal{R}_n)\subset\mathcal{G}$ is a sequence of laws in this closure, then the law of the off-diagonals of $\mathcal{R}_n$ converges to the law of the off-diagonals of $\mathcal{R}^*$ if and only if the laws $\mu_n(\cdot) = \mathcal{R}_n(R_{12}\in\cdot)$ converge weakly to $\mu(\cdot) = \mathcal{R}^*(R_{12}\in\cdot)$. For a precise statement of these two results, see [68, Theorem 2.13, Theorem 2.17]. (As we will always use this result in the case that the diagonals are constant, this convergence is effectively for the entire array for our purposes.) Thus it suffices to compute these limiting functionals on Ruelle cascades and then take their limits (provided they exist). To carry out this program, it will be important to note that, restricted to the space of Ruelle probability cascades, these functionals are in fact Lipschitz.

6.2.3. Lipschitz continuity. Let us restrict our attention to functionals of the form $F_{i,M}$ on the space of Ruelle probability cascades. If we let $\mathcal{R}^*$ denote the law of the overlap array corresponding to a Ruelle cascade as above, then by construction
$$F_{1,M}(\mathcal{R}^*) = \frac1M\,\mathbb{E}\log\sum_\alpha v_\alpha\sum_{\sigma\in\Sigma_M(\rho,q)}\exp\Big(\sum_i Z_i(\alpha)\sigma_i\Big), \qquad F_{2,M}(\mathcal{R}^*) = \frac1M\,\mathbb{E}\log\sum_\alpha v_\alpha\exp\big(\sqrt{M}\,Y(\alpha)\big).$$
Let us now understand these functionals in more detail. To this end, let $R^{\mathbb{N}^r} = (s_\alpha\cdot s_{\alpha'})_{\alpha,\alpha'\in\mathbb{N}^r}$ denote the collection of overlaps defined by the $s_\alpha$ from (6.16). We may define the functional $f_{1,M}(\mu;S)$ to be the functional as in (6.13) with weights given by the $v_\alpha$ and abstract overlap structure $R^{\mathbb{N}^r}$, for some choice of $S\subset\Sigma_M(\rho)$; define $f_{2,M}(\mu;S)$ similarly, using (6.14). In this case, if we let $\mathcal{R}^*$ denote such an overlap distribution and let $\mu$ denote the law of its first off-diagonal entry, observe that we have
$$f_{i,M}(\mu;S) = F_{i,M}(\mathcal{R}^*), \qquad i = 1,2.$$
In particular, this functional depends on $\mathcal{R}^*$ only through $\mu$. Let us study the regularity of the map $\mu\mapsto f_{1,M}(\mu;S)$.
To this end, equip $\mathcal{M}_1([0,\rho])$ with the Kantorovich metric,
$$d(\mu,\nu) = \int_0^1|\mu^{-1}(s) - \nu^{-1}(s)|\,ds = \int_0^\rho|\mu([0,s]) - \nu([0,s])|\,ds,$$
where for a probability measure $\mu$ we let $\mu^{-1}$ denote its quantile function. Recall that $d$ metrizes the weak-* topology on $\mathcal{M}_1([0,\rho])$. We then have the following.

Lemma 6.5. The following holds for any $M\ge1$.
(1) For any $S\subset\Sigma_M(\rho)$, the map $\mu\mapsto f_{1,M}(\mu;S)$ is Lipschitz (uniformly in $M$, $\rho$).
(2) The map $\mu\mapsto f_{2,M}(\mu)$ is Lipschitz (uniformly in $M$, $\rho$).
In particular, these functionals are well-defined and Lipschitz (uniformly in $M$, $\rho$) on all of $\mathcal{M}_1([0,\rho])$.

Proof. We begin with the first claim. To this end, fix two measures $\mu,\tilde\mu$ of finite support and consider their quantile functions $\mu^{-1}$ and $\tilde\mu^{-1}$. Observe that we may view these as monotone paths from $0$ to $\rho$ indexed by $[0,1]$. In particular, we may parameterize these paths (and thus the measures) as follows. Since these paths have finitely many jumps, there is some $r\ge1$ and some sequence $0 = x_{-1} < x_0 < \dots < x_r = 1$ such that the jumps in either path occur at an increasing subset of the times $(x_\ell)$. Furthermore, we may parameterize the supports of $\mu$ and $\tilde\mu$ by two finite sequences
$$0 = Q_0 \le \dots \le Q_r = \rho, \qquad 0 = \tilde Q_0 \le \dots \le \tilde Q_r = \rho$$
(here we allow repetition) that satisfy $Q_k = \mu^{-1}(x_k)$ and $\tilde Q_k = \tilde\mu^{-1}(x_k)$. Let $(v_\alpha)$ denote the Ruelle probability cascade corresponding to $(x_\ell)$, let
$$H_t(\sigma,\alpha) = \sqrt{t}\sum_i Z_i(\alpha)\sigma_i + \sqrt{1-t}\sum_i\tilde Z_i(\alpha)\sigma_i,$$
where $Z$ corresponds to the sequence $Q$ and $\tilde Z$ to $\tilde Q$, and set
$$\Phi(t) = \frac1M\,\mathbb{E}\log\sum_{\alpha\in\mathbb{N}^r} v_\alpha\sum_{\sigma\in S} e^{H_t(\sigma,\alpha)}.$$
Since the measure induced by $x$ and the sequence $Q$ is $\mu$, and likewise for $x$, $\tilde Q$, and $\tilde\mu$, we have that
$$\Phi(0) = f_{1,M}(\tilde\mu;S), \qquad \Phi(1) = f_{1,M}(\mu;S).$$
Differentiating in time, we find that
$$\Phi'(t) = \frac1M\,\mathbb{E}\langle\partial_t H_t(\sigma,\alpha)\rangle_t,$$
where $\langle\cdot\rangle_t$ denotes integration with respect to the measure $G_t(\sigma,\alpha)\propto v_\alpha\exp(H_t(\sigma,\alpha))$.
Observe that
$$\frac1M\,\mathbb{E}\,\partial_t H_t(\sigma,\alpha)\,H_t(\sigma',\alpha') = 2\beta^2\big(Q_{|\alpha\wedge\alpha'|} - \tilde Q_{|\alpha\wedge\alpha'|}\big)\,R(\sigma,\sigma').$$
Since $|R(\sigma,\sigma')|\le\rho$ and $Q_r = \tilde Q_r = \rho$, we see by Gaussian integration by parts (6.3) that
$$|\Phi'(t)| \le 2\beta^2\rho\,\mathbb{E}\Big\langle\big|Q_{|\alpha^1\wedge\alpha^2|} - \tilde Q_{|\alpha^1\wedge\alpha^2|}\big|\Big\rangle_t,$$
where $\alpha^1,\alpha^2$ are (the second coordinates of) two independent draws from $G_t$. Now, since the law of $H_t(\sigma,\alpha)$ is invariant for $0\le t\le1$, we may apply the Bolthausen-Sznitman invariance, specifically Theorem A.2, to find that the marginal of $G_t$ on $\mathbb{N}^r$ is given by $(v_\alpha)$ for all $t$. Consequently,
$$\mathbb{E}\Big\langle\big|Q_{|\alpha^1\wedge\alpha^2|} - \tilde Q_{|\alpha^1\wedge\alpha^2|}\big|\Big\rangle = \mathbb{E}\sum_{\alpha,\alpha'} v_\alpha v_{\alpha'}\big|Q_{|\alpha\wedge\alpha'|} - \tilde Q_{|\alpha\wedge\alpha'|}\big| = \sum_{1\le k\le r}|Q_k - \tilde Q_k|\,\mathbb{E}\sum_{|\alpha\wedge\alpha'|=k} v_\alpha v_{\alpha'} = \sum_{1\le k\le r}|Q_k - \tilde Q_k|(x_k - x_{k-1}) = \int_0^1|\mu^{-1}(x) - \tilde\mu^{-1}(x)|\,dx,$$
where the second-to-last equality follows by Lemma A.1. Combining this with the preceding yields
$$|f_{1,M}(\mu;S) - f_{1,M}(\tilde\mu;S)| \le 2\beta^2\rho\,d(\mu,\tilde\mu),$$
which yields the desired claim since $\rho\le1$.

We now turn to the second claim, which follows by direct calculation. Here we may apply Theorem A.2 and Corollary A.3 to find that
$$f_{2,M}(\mu) = 2\beta^2\int_0^\rho t\,\mu([0,t])\,dt,$$
from which it follows that
$$|f_{2,M}(\mu) - f_{2,M}(\tilde\mu)| \le 2\beta^2\rho\int_0^\rho|\mu([0,t]) - \tilde\mu([0,t])|\,dt = 2\beta^2\rho\,d(\mu,\tilde\mu),$$
which yields the desired claim as $\rho\le1$. □

6.2.4. Computing limits. Let $\mu$ denote the limiting law of $R_{12}$ with respect to $G^0_{\mathrm{pert}}$, and let $(\mu_r)$ denote a sequence of discretizations of $\mu$ with finitely many atoms such that $\mu_r\to\mu$ weak-* and $\mu_r(\{\rho\})>0$. Since $\rho$ is always charged along this sequence, the overlap distribution $\mathcal{R}(\mu_r)$ has diagonal equal to $\rho$. Thus, if $\mathcal{R}^*$ is the limiting overlap distribution corresponding to $G^0_{\mathrm{pert}}$, then, as explained at the end of Section 6.2.2, the laws $\mathcal{R}(\mu_r)\to\mathcal{R}^*$ weak-*, since $\mu$ determines the off-diagonal of $\mathcal{R}^*$ and the diagonals are constant and equal to $\rho$.
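The Kantorovich distance $d(\mu_r,\mu)$ that controls the approximation errors here (cf. Lemma 6.5) can be computed exactly for finitely supported measures. A small sketch, with toy atoms and weights of our own choosing, checks the two equivalent formulas for $d$ against each other:

```python
import numpy as np

def kantorovich_cdf(atoms_mu, w_mu, atoms_nu, w_nu, rho):
    """d(mu, nu) = int_0^rho |mu([0,s]) - nu([0,s])| ds, exact for atomic measures."""
    pts = np.unique(np.concatenate([atoms_mu, atoms_nu, [0.0, rho]]))
    total = 0.0
    for a, b in zip(pts[:-1], pts[1:]):
        s = 0.5 * (a + b)              # both CDFs are constant on (a, b)
        F_mu = w_mu[atoms_mu <= s].sum()
        F_nu = w_nu[atoms_nu <= s].sum()
        total += abs(F_mu - F_nu) * (b - a)
    return total

def kantorovich_quantile(atoms_mu, w_mu, atoms_nu, w_nu, n=100_000):
    """d(mu, nu) = int_0^1 |mu^{-1}(s) - nu^{-1}(s)| ds, via a midpoint Riemann sum."""
    s = (np.arange(n) + 0.5) / n
    q_mu = atoms_mu[np.searchsorted(np.cumsum(w_mu), s)]
    q_nu = atoms_nu[np.searchsorted(np.cumsum(w_nu), s)]
    return float(np.mean(np.abs(q_mu - q_nu)))

# Illustrative two-atom measures on [0, rho] with rho = 0.3 (our toy parameters).
rho = 0.3
mu_atoms, mu_w = np.array([0.1, 0.3]), np.array([0.4, 0.6])
nu_atoms, nu_w = np.array([0.05, 0.3]), np.array([0.7, 0.3])

d_cdf = kantorovich_cdf(mu_atoms, mu_w, nu_atoms, nu_w, rho)
d_quant = kantorovich_quantile(mu_atoms, mu_w, nu_atoms, nu_w)
print(d_cdf, d_quant)  # the two formulas agree
```

For these measures both formulas give $d(\mu,\nu) = 0.095$: the CDF form integrates exactly over the finitely many constancy intervals, while the quantile form is approximated by a Riemann sum.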
We first observe that $F_2$ may be handled as previously: we have that
$$\lim_{N\to\infty} F_{2,M}(R^0_N) = \lim_{r\to\infty}\frac1M\,\mathbb{E}\log\sum v_\alpha\,e^{\sqrt{M}\,Y(\alpha)} = \lim_{r\to\infty}\frac12\int_0^\rho 4\beta^2 s\,\mu_r([0,s])\,ds = 2\beta^2\int_0^\rho s\,\mu([0,s])\,ds,$$
where the first equality follows by continuity of $F_{2,M}$, the second by Corollary A.3, and the final by weak-* continuity of bounded linear functionals.

It remains to consider the first term, $F_{1,M}$. Let
$$A_M = \lim_{N\to\infty}\frac1M\,\mathbb{E}\log\Big\langle\sum_{\epsilon\in\Sigma_M(r_M,u_M)}\exp\Big(\sum_i z_{N,i}(\sigma)\epsilon_i\Big)\Big\rangle_{G^0_{\mathrm{pert}}}.$$
Then by Lemma 6.5, it follows that
$$A_M \ge \frac1M\,\mathbb{E}\log\sum_{\alpha\in\mathbb{N}^r} v_\alpha\sum_{\epsilon\in\Sigma_M(r_M,u_M)}\exp\Big(\sum_i Z_i(\alpha)\epsilon_i\Big) - K\,d(\mu_r,\mu),$$
where $(v_\alpha)$ are the weights of the Ruelle probability cascade with parameters corresponding to $\mu_r$, and $K$ does not depend on $M$. Recall from the proof of Theorem 6.1 that we were able to produce an upper bound for this term in which the $\Lambda_i$ play the role of Lagrange multipliers for the constraints defining $\Sigma_M(r_M,u_M)$. One can also produce a matching lower bound by minimizing over the choice of Lagrange multipliers; the proof is deferred to the next section.

Theorem 6.6. Let $\mu\in\mathcal{M}_1([0,\rho])$ have finite support, and let $(\rho_M,q_M)\to(\rho,q)$ be admissible. We have that
$$\lim_{M\to\infty} f_{1,M}(\mu,\Sigma_M(\rho_M,q_M)) = \inf_{\Lambda_1,\Lambda_2}\Big\{\rho\,\varphi^1_\mu(0,0) + (1-\rho)\,\varphi^2_\mu(0,0) - \Lambda_1 q - \Lambda_2(\rho-q)\Big\}.$$

In particular, we have that, for each $r\ge1$,
$$\lim_{M\to\infty} F_{1,M}(\mu_r) = \lim_{M\to\infty} f_{1,M}(\mu_r,\Sigma_M(\rho_M,q_M)) = \inf_{\Lambda_1,\Lambda_2}\Big\{\rho\,\varphi^1_{\mu_r}(0,0) + (1-\rho)\,\varphi^2_{\mu_r}(0,0) - \Lambda_1 q - \Lambda_2(\rho-q)\Big\}.$$
As the functionals $\mu\mapsto f_{1,M}(\mu,\Sigma_M(\rho_M,q_M))$ are uniformly Lipschitz by Lemma 6.5, so is their limit. In particular, we see that
$$\lim_{M\to\infty} A_M \ge \inf_{\Lambda_1,\Lambda_2}\Big\{\rho\,\varphi^1_\mu(0,0) + (1-\rho)\,\varphi^2_\mu(0,0) - \Lambda_1 q - \Lambda_2(\rho-q)\Big\} - 2K\,d(\mu_r,\mu)$$
for some $K$. Sending $r\to\infty$, we see that
$$\lim_{N\to\infty}\tilde F_N \ge \inf_{\Lambda_1,\Lambda_2} P_\beta(\mu,\Lambda_1,\Lambda_2). \quad (6.17)$$
This yields the desired lower bound.
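As a quick cross-check of the limiting value $2\beta^2\int_0^\rho s\,\mu([0,s])\,ds$ obtained for $F_{2,M}$ above, one can evaluate the integral in closed form when $\mu$ is atomic, since $\mu([0,s])$ is then piecewise constant. The sketch below (all parameter values are toy choices of ours) compares the closed form against a quadrature:

```python
import numpy as np

def F2_exact(atoms, weights, rho, beta):
    """2*beta^2 * int_0^rho s * mu([0,s]) ds, exact for atomic mu on [0, rho]."""
    order = np.argsort(atoms)
    Q = np.asarray(atoms, float)[order]
    w = np.asarray(weights, float)[order]
    pts = np.concatenate([[0.0], Q, [rho]])
    cdf = np.concatenate([[0.0], np.cumsum(w)])   # mu([0,s]) on [pts[k], pts[k+1])
    return 2 * beta**2 * sum(F * (b**2 - a**2) / 2
                             for F, a, b in zip(cdf, pts[:-1], pts[1:]))

def F2_quadrature(atoms, weights, rho, beta, n=200_000):
    """Same quantity by a midpoint rule, as an independent check."""
    s = (np.arange(n) + 0.5) * rho / n
    cdf = (s[:, None] >= np.asarray(atoms, float)[None, :]).astype(float) \
          @ np.asarray(weights, float)
    return 2 * beta**2 * float(np.mean(s * cdf)) * rho

# mu = 0.5*delta_{0.1} + 0.5*delta_{rho}, rho = 0.3, beta = 1 (toy values).
exact = F2_exact([0.1, 0.3], [0.5, 0.5], rho=0.3, beta=1.0)
approx = F2_quadrature([0.1, 0.3], [0.5, 0.5], rho=0.3, beta=1.0)
print(exact, approx)
```

For this two-atom $\mu$ the integral is $0.5\cdot(0.3^2-0.1^2)/2 = 0.02$, so the limiting value is $0.04$; an atom placed at $\rho$ itself contributes nothing, consistent with $\mu_r(\{\rho\})>0$ being harmless above.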
Proof of Proposition 4.1. Upon combining (6.2) with (6.17), we obtain (6.1). Recalling the equality $P_\beta = \beta\,\mathcal{P}_\beta$ and dividing by $\beta$ yields the result, except with an infimum. We then recall Lemma 5.1 to conclude that this infimum is actually achieved. □

6.3. Removing constraints via Lagrange multipliers. We now turn to the proof of Theorem 6.6. The lemmas mentioned in this proof will be proved in the following section. Let $S = \{(x,y)\in[0,1]^2 : y\le x\}$.

Proof of Theorem 6.6. Observe that, as in (6.5), we may obtain the corresponding upper bound. The question is then to show that the infimum is achieved. We follow a strategy similar to [72, Sec. 3]. In the following, it will be useful to recall that the quantities we wish to compute are invariant under permutations of the entries of the vector $v$. For a point $x = (\rho,q)\in S$ and $\epsilon>0$, consider the set
$$\Sigma^\epsilon_M(x) = \Big\{\sigma\in\Sigma_M : \frac1M\sum_{i=1}^M\sigma_i\in[\rho-\epsilon,\rho+\epsilon],\ \frac1M\sum_{i=1}^{\lceil M\rho\rceil}\sigma_i\in[q-\epsilon,q+\epsilon]\Big\}.$$
We begin by observing that the error caused by this epsilon dilation is negligible. Let $S_M\subseteq S$ be the set of those $x\in S$ such that $\Sigma_M(x) = \Sigma_M(x_1,x_2)$ is non-empty.

Lemma 6.7. There is a $C>0$ such that for $\epsilon>\delta>0$ sufficiently small and any $M\ge1$, we have
$$\sup_{x\in S_M}|f_{1,M}(\mu,\Sigma^\epsilon_M(x)) - f_{1,M}(\mu,\Sigma_M(x))| \le C\sqrt{\epsilon},$$
$$\sup_{x\in S}|f_{1,M}(\mu,\Sigma^\epsilon_M(x)) - f_{1,M}(\mu,\Sigma^\delta_M(x))| \le C\sqrt{\epsilon-\delta}.$$

Next, we observe that on such sets the limits of these functionals exist and are concave.

Lemma 6.8. For each $\epsilon>0$, the limit $f_1(\mu;z,\epsilon) = \lim_{M\to\infty} f_{1,M}(\mu,\Sigma^\epsilon_M(z))$ exists and is concave.

Combining these claims, we find that
$$f_1(\mu;\rho,q) = \lim_{M\to\infty} f_{1,M}(\mu,\Sigma_M(\rho_M,q_M)) = \lim_{\epsilon\to0} f_1(\mu;(\rho,q),\epsilon) \quad (6.18)$$
exists and is concave in $(\rho,q)$. Furthermore, applying the first claim again, we find that it is $1/2$-Hölder. In particular, it is uniformly continuous on $S$.
It remains for us to relate $f_1(\mu;\rho,q)$ to Parisi-type functionals. To this end, we observe the following. For any $\Lambda = (\Lambda_1,\Lambda_2)\in\mathbb{R}^2$, let
$$F(\Lambda) = \rho\,\varphi^1_\mu(0,0;\Lambda_1) + (1-\rho)\,\varphi^2_\mu(0,0;\Lambda_2)$$
(here we have made the dependence of $\varphi^i$ on $\Lambda_i$ explicit for clarity). We then have the following.

Lemma 6.9. For any $(\Lambda_1,\Lambda_2)$, we have that
$$F(\Lambda) = \max_{x=(\rho',q')\in S}\big\{f_1(\mu,x) + \Lambda_1 q' + \Lambda_2(\rho'-q')\big\}. \quad (6.19)$$

To complete the proof, we recall that $f_1(\mu,x)$ is concave and continuous in $x$. Thus, we may take its Legendre transform and apply (6.19) to find that
$$f_1(\mu,x) = \inf_\Lambda\big\{F(\Lambda) - \Lambda_1 q - \Lambda_2(\rho-q)\big\},$$
as desired. □

6.4. Proofs of lemmas used in Theorem 6.6.

Proof of Lemma 6.7. We prove the first bound; the second follows by the same argument, upon noting that $\Sigma^\delta_M(\rho,q)\subseteq\Sigma^\epsilon_M(\rho,q)$. Let $\pi:\Sigma^\epsilon_M(x)\to\Sigma_M(x)$ map each point to a closest point of $\Sigma_M(x)$ with respect to the Hamming distance, and note that $d(\sigma,\pi(\sigma))\le C\epsilon M$ in this distance. (Throughout this proof, $C$ denotes a constant that may change from line to line.) Let $Z(\alpha) = (Z_i(\alpha))$, and let $\tilde Z(\alpha)$ be an independent copy of $Z$. Let
$$Z_t(\alpha,\sigma) = \sqrt{t}\,Z(\alpha)\cdot\sigma + \sqrt{1-t}\,\tilde Z(\alpha)\cdot\pi(\sigma)$$
and consider
$$\phi(t) = \frac1M\,\mathbb{E}\log\sum_\alpha v_\alpha\sum_{\sigma\in\Sigma^\epsilon_M(x)} e^{Z_t(\alpha,\sigma)}.$$
Since
$$\frac1M\,\mathbb{E}\,\partial_t Z_t(\alpha,\sigma)\,Z_t(\alpha',\sigma') = 2\beta^2\,Q_{|\alpha\wedge\alpha'|}\big(R(\sigma,\sigma') - R(\pi(\sigma),\pi(\sigma'))\big),$$
we have $|\phi'|\le C\epsilon$. Furthermore,
$$\phi(0) = \frac1M\,\mathbb{E}\log\sum_\alpha v_\alpha\sum_{\sigma\in\Sigma_M(x)}\operatorname{card}(\pi^{-1}(\sigma))\,e^{Z(\alpha)\cdot\sigma}.$$
Now, by a standard counting argument,
$$\frac1M\log\max_\sigma\operatorname{card}(\pi^{-1}(\sigma)) \le \log2 - J(1-C\epsilon) \le C\sqrt{\epsilon}, \qquad\text{where}\quad J(x) = \frac{1+x}{2}\log(1+x) + \frac{1-x}{2}\log(1-x),$$
and the last inequality holds for $\epsilon$ sufficiently small. Thus
$$f_{1,M}(\mu,\Sigma_M(x)) \le \phi(0) \le f_{1,M}(\mu,\Sigma_M(x)) + C\sqrt{\epsilon}.$$
On the other hand, by the preceding, $|\phi(1)-\phi(0)|\le C\epsilon$.
Combining these inequalities and recalling that $\phi(1) = f_{1,M}(\mu,\Sigma^\epsilon_M(x))$ then yields the claim. □

Proof of Lemma 6.8. For any $M_1,M_2\ge1$, let $M = M_1+M_2$ and $\theta = M_1/M$. For two points $x,y\in S$, let $z = \theta x + (1-\theta)y$. Let $i:\Sigma_{M_1}(x)\times\Sigma_{M_2}(y)\to\Sigma_M$ be the map that takes a point $(u,v)$ to the point $\pi(u,v)$ whose first $\lceil M_1 x_1\rceil$ coordinates are those of $u$, whose next $\lceil M_2 y_1\rceil$ coordinates are those of $v$, and whose remaining coordinates are given by those of $u$ and $v$, in that order. Evidently, $i$ is an injection. In fact, its image is contained in $\Sigma^{\epsilon+1/M}_M(z)$ (see Lemma ... below). Therefore, applying the second inequality from Lemma 6.7, we see that
$$f_{1,M}(\mu,\Sigma^{\epsilon+1/M}_M(z)) \ge \theta f_{1,M_1}(\mu,\Sigma^\epsilon_{M_1}(x)) + (1-\theta) f_{1,M_2}(\mu,\Sigma^\epsilon_{M_2}(y)) - C\sqrt{1/M}.$$
Taking $x = y$, we see that the sequence $a_M = M f_{1,M}(\mu,\Sigma^\epsilon_M(z))$ satisfies
$$a_{m_1+m_2} + \phi(m_1+m_2) \ge a_{m_1} + a_{m_2}$$
for $\phi(t) = C\sqrt{t}$. As $\phi$ is increasing and satisfies $\int_1^\infty\phi(t)t^{-2}\,dt<\infty$, we may apply the de Bruijn-Erdős superadditivity lemma (see below) to find that the desired limit exists. Furthermore, by taking limits in the case $x\ne y$, we see that the limit is concave. We have used here the following classical result of de Bruijn and Erdős [33, Theorem 23] (see also [77, Theorem 1.9.2]).

Theorem. Let $(a_n)$ be a real sequence and $\phi$ a non-decreasing function with $\int_1^\infty\phi(t)t^{-2}\,dt<\infty$. If $(a_n)$ satisfies the superadditivity criterion
$$a_{n+m} + \phi(n+m) \ge a_n + a_m, \qquad \frac12 n \le m \le 2n,$$
then $a_n/n$ converges to $\sup_n a_n/n$.

Proof of Lemma 6.9. This will follow by calculations involving Ruelle probability cascades, along with a standard covering argument. Consider the functional
$$F(\Lambda,S) = \frac1M\,\mathbb{E}\log\sum_\alpha v_\alpha\sum_{\sigma\in S}\exp\Big(\sum_i Z_i(\alpha)\sigma_i + \Lambda_1 M_1 R^{(1)}_{11} + \Lambda_2 M_2 R^{(2)}_{11}\Big).$$
Then $F(\Lambda,\Sigma_M) = F(\Lambda)$ and
$$F(\Lambda,\Sigma_M(\rho,q)) = f_{1,M}(\mu,\Sigma_M(\rho,q)) + \Lambda_1 q + \Lambda_2(\rho-q),$$
where the first equality again follows by (6.6). Since $f_1$ from (6.18) is uniformly continuous on $S$, it thus suffices to show that
$$F(\Lambda) - \max_{x\in S_M} F(\Lambda,\Sigma_M(x)) \to 0.$$
To this end, we begin by presenting a formula for $F(\Lambda;S)$; its proof follows from an abstract version of Theorem A.2. Let $(z_p)_{p\le r}$ denote a collection of independent centered Gaussian random variables with
$$\mathbb{E} z_p^2 = 4\beta^2(Q_p - Q_{p-1}),$$
and let $(z_{p,i})_{p\le r,\,i\in[M]}$ be i.i.d. copies of this process. Consider
$$X_r(\Lambda,S) = \log\sum_{\sigma\in S}\exp\Big(\sum_{p,i} z_{p,i}\sigma_i + \Lambda_1 M_1 R^{(1)}_{11} + \Lambda_2 M_2 R^{(2)}_{11}\Big).$$
Define now the sequence of random variables $(X_\ell)_{\ell=0}^r$ by
$$X_\ell = \frac{1}{\mu_\ell}\log\mathbb{E}_{\ell+1}\exp(\mu_\ell X_{\ell+1}),$$
where $\mathbb{E}_{p}$ denotes expectation in the variables $(z_{p,i})_i$ and $\mu_\ell = \mu([0,Q_\ell])$. As a consequence of [68, Theorem 2.9], we have that $F(\Lambda,S) = \frac1M X_0$. Let us now prove the desired limit. By construction,
$$\exp(X_r(\Lambda,\Sigma_M)) = \sum_{x\in S_M}\exp(X_r(\Lambda,\Sigma_M(x))).$$
Since $\mu_\ell\le1$ and $\mu_\ell/\mu_{\ell+1}\le1$, and $(\sum a_i)^s \le \sum a_i^s$ for $s\le1$ and $a_i\ge0$, we have by an inductive argument that
$$\exp\big(\mu_p X_p(\Lambda,\Sigma_M)\big) \le \sum_{x\in S_M}\exp\big(\mu_p X_p(\Lambda,\Sigma_M(x))\big).$$
Applying this in the case $p=0$, we see that
$$F(\Lambda) = \frac1M X_0(\Lambda,\Sigma_M) \le \frac{1}{M\mu_0}\log\sum_{x\in S_M}\exp\big(\mu_0 X_0(\Lambda,\Sigma_M(x))\big) \le \frac{1}{M\mu_0}\log\operatorname{card}(S_M) + \max_{x\in S_M} F(\Lambda,\Sigma_M(x)).$$
On the other hand, by (6.2) we have that $F(\Lambda)\ge\max_{x\in S_M} F(\Lambda,\Sigma_M(x))$. Since $\operatorname{card}(S_M)$ grows at most polynomially in $M$, the result follows by the squeeze theorem. □

7. Statistical feasibility of estimation: proof of Theorem 1.2

We prove Theorem 1.2 in two parts. Recall that the second part was already proved in Section 2. It remains to show the first part of Theorem 1.2, regarding the impossibility of approximate support recovery.
In the following, for $a\in(0,1)$, let $h(a) = -a\log a - (1-a)\log(1-a)$ denote the binary entropy.

Proof of part 1 of Theorem 1.2. We will establish the following: for any $0<c<1$, if $\lambda < \frac{c^{3/2}}{2}\sqrt{\frac1\rho\log\frac1\rho}$, then
$$\lim_N\frac{1}{N\rho}\sup_{\hat v\in\Sigma_N(\rho)}\inf_{v\in\Sigma_N(\rho)}\mathbb{E}(\hat v,v) \le c.$$
Observe that $\inf_{v\in\Sigma_N(\rho)}\mathbb{E}(\hat v,v) \le \mathbb{E}_{v\sim U(\Sigma_N(\rho))}(\hat v,v)$, where $\mathbb{E}_{v\sim U(\Sigma_N(\rho))}[\cdot]$ denotes expectation in $A$ and $v$, with $v\sim U(\Sigma_N(\rho))$ now random instead of fixed. It thus suffices to establish the upper bound
$$\lim_N\frac{1}{N\rho}\sup_{\hat v\in\Sigma_N(\rho)}\mathbb{E}_{v\sim U(\Sigma_N(\rho))}(\hat v,v) \le c. \quad (7.1)$$
We do so by an information-theoretic argument. For two random variables $X,Y$, let $H(X)$ denote the Shannon entropy, $H(X|Y)$ the conditional entropy, and $I(X;Y) = H(X) - H(X|Y)$ their mutual information.

To begin, observe that $v$ and $\hat v$ are conditionally independent given $A$, and thus, by the data-processing inequality [32], $I(v;\hat v)\le I(v;A)$. Similarly, note that $A$ and $v$ are conditionally independent given $\frac{\lambda}{N}vv^T$, and in fact $v$ is uniquely determined given $\frac{\lambda}{N}vv^T$; thus $I(v;A)\le I(\frac{\lambda}{N}vv^T;A)$. Combining, we have that $I(v;\hat v)\le I(\frac{\lambda}{N}vv^T;A)$. Now, note that given $\{\frac{\lambda}{N}v_iv_j : i<j\}$, the collection $\{A_{ij} : i<j\}$ has a product distribution, and each entry is a Gaussian channel with variance $1/N$ whose mean is $\lambda/N$ when both endpoints are planted (which has probability at most $\rho^2$) and $0$ otherwise. Thus
$$I\Big(\frac{\lambda}{N}vv^T;A\Big) \le \sum_{i<j} I\Big(\frac{\lambda}{N}v_iv_j;A_{ij}\Big) \le \binom{N}{2}\cdot\rho^2\cdot\frac{\lambda^2}{2N} \le \frac{N\lambda^2\rho^2}{4}. \quad (7.2)$$
Assume now, for the sake of contradiction, that (7.1) fails, so that along a subsequence there are estimators $\hat v$ with $\mathbb{E}_{v\sim U(\Sigma_N(\rho))}[(v,\hat v)]\ge cN\rho$. Set $f(\hat v) = \mathbb{E}_{v\sim U(\Sigma_N(\rho))}[(v,\hat v)\mid\hat v]$. By the Paley-Zygmund inequality,
$$\mathbb{P}_{v\sim U(\Sigma_N(\rho))}\Big[f(\hat v)\ge\frac{cN\rho}{2}\Big] \ge \frac14\,\frac{\big(\mathbb{E}_{v\sim U(\Sigma_N(\rho))}[f(\hat v)]\big)^2}{\mathbb{E}_{v\sim U(\Sigma_N(\rho))}[(f(\hat v))^2]} \ge \frac{c^2}{4},$$
which is bounded away from zero, uniformly in $N$ and $\rho$. The inequality above follows from the observations that $0\le f(\hat v)\le N\rho$ and $\mathbb{E}_{v\sim U(\Sigma_N(\rho))}[f(\hat v)]\ge cN\rho$.
We set $E = \{f(\hat v)\ge\frac{cN\rho}{2}\}$ to obtain
$$H(v|\hat v) = \mathbb{E}_{v\sim U(\Sigma_N(\rho))}\big[\mathbf{1}_E\,g(\hat v)\big] + \mathbb{E}_{v\sim U(\Sigma_N(\rho))}\big[\mathbf{1}_{E^c}\,g(\hat v)\big],$$
where we use $g(x)$ to denote the entropy of the conditional distribution of $v$ given $\hat v = x$. On the event $E^c$, we use the trivial bound $g(\hat v)\le\log\binom{N}{N\rho}$. On the event $E$, we upper bound the conditional entropy as follows. Fix $x\in E$. Using the sub-additivity of conditional entropy, we have
$$g(x) \le \sum_{i:x_i=1} H(v_i|\hat v=x) + \sum_{i:x_i=0} H(v_i|\hat v=x) \le N\rho\log2 + N(1-\rho)\cdot\frac{1}{N(1-\rho)}\sum_{i:x_i=0} H(v_i|\hat v=x) \le N\rho\log2 + N(1-\rho)\,h\Big(\frac{1}{N(1-\rho)}\sum_{i:x_i=0}\mathbb{E}_{v\sim U(\Sigma_N(\rho))}[v_i|\hat v=x]\Big),$$
where the last inequality follows by the concavity of the binary entropy $h$. Observe that on the event $E$, $\mathbb{E}_{v\sim U(\Sigma_N(\rho))}[\sum_{i:x_i=0} v_i\mid\hat v=x]\le N\rho(1-c/2)$. For $\rho$ sufficiently small, $\rho(1-c/2)/(1-\rho)<1/2$, and thus, by monotonicity of the binary entropy on $[0,1/2]$, we obtain the improved upper bound
$$g(x) \le N\rho\log2 + N(1-\rho)\,h\Big(\frac{\rho(1-c/2)}{1-\rho}\Big).$$
Combining, we have
$$H(v|\hat v) \le \mathbb{P}_{v\sim U(\Sigma_N(\rho))}(E)\Big[N\rho\log2 + N(1-\rho)\,h\Big(\frac{\rho(1-c/2)}{1-\rho}\Big)\Big] + \mathbb{P}_{v\sim U(\Sigma_N(\rho))}(E^c)\log\binom{N}{N\rho}.$$
Since $v$ is uniform on $\Sigma_N(\rho)$, $H(v) = \log\binom{N}{N\rho}$, and therefore
$$I(v;\hat v) \ge \log\binom{N}{N\rho} - \Big[\mathbb{P}_{v\sim U(\Sigma_N(\rho))}(E)\Big(N\rho\log2 + N(1-\rho)\,h\Big(\frac{\rho(1-c/2)}{1-\rho}\Big)\Big) + \mathbb{P}_{v\sim U(\Sigma_N(\rho))}(E^c)\log\binom{N}{N\rho}\Big] \ge \mathbb{P}_{v\sim U(\Sigma_N(\rho))}(E)\log\binom{N}{N\rho} - N\,\mathbb{P}_{v\sim U(\Sigma_N(\rho))}(E)\Big[\rho\Big(1-\frac{c}{2}\Big)\log\frac1\rho + o\Big(\rho\log\frac1\rho\Big)\Big] \ge \frac{c^3}{16}\,N\rho\log\frac1\rho$$
for $N$ sufficiently large and $\rho$ sufficiently small. Combining this with (7.2), we obtain
$$\frac{c^3}{16}\,N\rho\log\frac1\rho \le \frac{N\lambda^2\rho^2}{4} \implies \lambda^2 \ge \frac{c^3}{4}\cdot\frac1\rho\log\frac1\rho.$$
This contradicts our choice of $\lambda$. Thus the upper bound (7.1) holds whenever $\lambda\le c'\sqrt{\frac1\rho\log\frac1\rho}$ for some $c'<c^{3/2}/2$. This completes the proof. □
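The arithmetic driving the contradiction, the Gaussian-channel bound (7.2) against the entropy lower bound, can be traced in a few lines. All parameter values below are illustrative choices of ours:

```python
import numpy as np

def kl_gaussian(mu0, mu1, var):
    """KL( N(mu1, var) || N(mu0, var) ): the per-entry information cost."""
    return (mu1 - mu0) ** 2 / (2 * var)

# Illustrative parameters (ours, not from the paper).
N, rho, c = 10_000, 0.01, 0.5
lam = 2.0

# Each off-diagonal entry is a Gaussian channel with variance 1/N whose mean
# is lambda/N when both endpoints are planted (probability at most rho^2).
per_entry = rho ** 2 * kl_gaussian(0.0, lam / N, 1.0 / N)   # = rho^2 lam^2 / (2N)
bound_72 = N * (N - 1) / 2 * per_entry                      # (7.2): <= N lam^2 rho^2 / 4

# Contradiction threshold: c^3/16 * N rho log(1/rho) <= N lam^2 rho^2 / 4
# forces lam^2 >= (c^3/4) (1/rho) log(1/rho).
lam_threshold = np.sqrt(c ** 3 / (4 * rho) * np.log(1 / rho))
print(f"(7.2) bound: {bound_72:.3f}, impossibility threshold for lambda: {lam_threshold:.3f}")
```

With these toy numbers $\lambda = 2.0$ sits below the threshold, so recovery at level $c = 0.5$ is ruled out by the argument above; the code only reproduces the bookkeeping, not the proof.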
8. A simple rounding scheme when $\lambda > \frac1\rho$

In this section, we establish that if the SNR is sufficiently large, it is algorithmically easy to approximately recover the support of the hidden principal submatrix. Specifically, we establish that for $\lambda>1/\rho$, there exists a simple spectral algorithm which approximately recovers the hidden support. To this end, we start with a simple lemma, which will be critical in the subsequent analysis.

Lemma 8.1. Let $(X,Y)$ be jointly distributed random variables, with $X\in\{0,1\}$. Further, assume $\mathbb{P}[X=1]=\rho$, $\mathbb{E}[Y^2]=\rho$, and $\mathbb{E}[XY]\ge\delta\rho$ for some universal constant $\delta>0$. Then
$$\frac{\delta^2}{4}\rho \le \mathbb{P}\Big[Y>\frac\delta2\Big] \le \Big(1+\frac{4}{\delta^2}\Big)\rho \qquad\text{and}\qquad \mathbb{P}\Big[X=1\,\Big|\,Y>\frac\delta2\Big] > \frac{\delta^4}{16}.$$

Proof. First, note that $\mathbb{E}[XY]\ge\delta\rho$ implies $\mathbb{E}[Y|X=1]\ge\delta$. Further, $\mathbb{E}[Y^2]=\rho$ implies $\mathbb{E}[Y^2|X=1]\le1$. By the Paley-Zygmund inequality,
$$\mathbb{P}\Big[Y>\frac\delta2\,\Big|\,X=1\Big] \ge \mathbb{P}\Big[Y>\frac12\mathbb{E}[Y|X=1]\,\Big|\,X=1\Big] \ge \frac14\,\frac{(\mathbb{E}[Y|X=1])^2}{\mathbb{E}[Y^2|X=1]} \ge \frac{\delta^2}{4}.$$
This implies the lower bound
$$\mathbb{P}\Big[Y>\frac\delta2\Big] \ge \rho\,\mathbb{P}\Big[Y>\frac\delta2\,\Big|\,X=1\Big] \ge \rho\,\frac{\delta^2}{4}.$$
On the other hand, Chebyshev's inequality implies
$$\mathbb{P}\Big[Y>\frac\delta2\,\Big|\,X=0\Big] \le \frac{4}{\delta^2}\,\mathbb{E}[Y^2|X=0] \le \frac{4}{\delta^2}\cdot\frac{\rho}{1-\rho}.$$
Thus,
$$\mathbb{P}\Big[Y>\frac\delta2\Big] = (1-\rho)\,\mathbb{P}\Big[Y>\frac\delta2\,\Big|\,X=0\Big] + \rho\,\mathbb{P}\Big[Y>\frac\delta2\,\Big|\,X=1\Big] \le \rho\Big(1+\frac{4}{\delta^2}\Big).$$
Finally, using Chebyshev's inequality again, $\mathbb{P}[Y>\frac\delta2] \le \frac{4\rho}{\delta^2}$. Bayes' theorem then implies
$$\mathbb{P}\Big[X=1\,\Big|\,Y>\frac\delta2\Big] = \frac{\rho\,\mathbb{P}[Y>\frac\delta2\,|\,X=1]}{\mathbb{P}[Y>\frac\delta2]} \ge \frac{\delta^4}{16}.$$
This completes the proof. □

Armed with Lemma 8.1, we turn to the proof of Lemma 1.6.

Proof of Lemma 1.6. Recall that
$$A = \lambda\rho\,\frac{v}{\sqrt{N\rho}}\Big(\frac{v}{\sqrt{N\rho}}\Big)^T + W,$$
and thus, for $\lambda\rho>1+\epsilon$, the celebrated BBP phase transition [12, 19] implies that there exists a universal constant $\delta := \delta(\epsilon)>0$ such that, w.h.p. as $N\to\infty$,
$$\frac1N(v,\hat v) \ge \delta\rho.$$
In the subsequent discussion, we condition on this good event. Consider the two-dimensional empirical measure $\mu_N = \frac1N\sum_{i=1}^N\delta_{(v_i,\hat v_i)}$.
Set $(X,Y)\sim\mu_N$ and note that $(X,Y)$ satisfies the hypotheses of Lemma 8.1. Lemma 8.1 implies that
$$\frac{\delta^2}{4}\,\rho N \le |\tilde S| \le \Big(1+\frac{4}{\delta^2}\Big)\rho N.$$
On the event $|\tilde S|<\rho N$, we have
$$\frac1N(v,v_{\hat S}) \ge \frac{\delta^2}{4}\rho\cdot\frac{\delta^4}{16} = \frac{\delta^6}{64}\,\rho.$$
On the other hand, if $|\tilde S|>N\rho$, then
$$(v,v_{\hat S}) \sim \mathrm{Hyp}\big(|\tilde S|,\,|\tilde S\cap\{i:v_i=1\}|,\,N\rho\big).$$
Thus we have
$$\mathbb{E}[(v,v_{\hat S})] = N\rho\,\frac{|\tilde S\cap\{i:v_i=1\}|}{|\tilde S|} = N\rho\,\mathbb{P}\Big[X=1\,\Big|\,Y>\frac\delta2\Big] \ge N\rho\,\frac{\delta^4}{16}.$$
Further, direct computation reveals
$$\operatorname{Var}[(v,v_{\hat S})] = N\rho\cdot pq\cdot\frac{|\tilde S|-N\rho}{|\tilde S|-1},$$
where we denote $p := \frac{|\tilde S\cap\{i:v_i=1\}|}{|\tilde S|} = 1-q$. Upon observing that $pq\le1/4$, we have $\operatorname{Var}[(v,v_{\hat S})]\le N\rho/4$. Thus, by Chebyshev's inequality,
$$\mathbb{P}\Big[(v,v_{\hat S}) < N\rho\,\frac{\delta^4}{32}\Big] \le \frac{N\rho}{\big(N\rho\,\frac{\delta^4}{32}\big)^2} = O\Big(\frac1N\Big).$$
This implies that $(v,v_{\hat S})\ge N\rho\,\frac{\delta^4}{32}$ w.h.p. over the sampling process, and establishes that the constructed estimator recovers the support approximately. This completes the proof. □

Appendix A. Ruelle probability cascades

For the convenience of the reader, we briefly review here basic properties of Ruelle probability cascades (RPCs), sometimes called Derrida-Ruelle probability cascades, used throughout this paper.

A.1. Construction and basic properties. Let us begin by recalling the construction of RPCs and some basic properties; see, e.g., [68] or [47, Sec. 3.3]. Fix $r\ge1$ and let $\mathcal{A}_r$ be as in Section 6.1. We label the vertices of this tree as $\mathcal{A}_r = \mathbb{N}^0\cup\mathbb{N}^1\cup\dots\cup\mathbb{N}^r$, where a vertex at depth $k$ has label $\alpha = (\alpha_1,\dots,\alpha_k)$, which corresponds to the root-vertex path
$$\emptyset\to\alpha_1\to(\alpha_1,\alpha_2)\to\dots\to(\alpha_1,\dots,\alpha_k).$$
As above, we denote this path by $p(\alpha)$. Denote the depth of a vertex by $|\alpha|$ and let $\partial\mathcal{A}_r$ denote the leaves of $\mathcal{A}_r$. For $r\ge1$ and a fixed sequence $0=\mu_{-1}<\mu_0<\dots<\mu_r=1$, we construct the corresponding RPC as follows. Let $m_\theta(dx) = \theta x^{-\theta-1}\,dx$.
For each non-leaf vertex $\alpha\in\mathcal{A}_r\setminus\partial\mathcal{A}_r$, we assign an independent copy of the Poisson point process $\mathrm{PPP}(m_{\mu_{|\alpha|}}(dx))$, arranged in decreasing order, and we assign to each child of $\alpha$ the term in the point process of corresponding rank. This yields a collection $(u_\alpha)_{\alpha\in\mathcal{A}_r}$ of random variables. Let $w_\alpha = \prod_{\gamma\in p(\alpha)} u_\gamma$, and finally consider the normalized collection $(v_\alpha)_{\alpha\in\mathcal{A}_r}$ given by
$$v_\alpha = \frac{w_\alpha}{\sum_{|\beta|=|\alpha|} w_\beta}.$$
The Ruelle probability cascade with parameters $(\mu_k)_{k=-1}^r$ is the stochastic process $(v_\alpha)_{\alpha\in\partial\mathcal{A}_r}$.

It will also be helpful to note the following. Let $\mu\in\mathcal{M}_1([0,\rho])$ have finite support and consider the overlap distribution $\mathcal{R}(\mu)$ defined as in (6.16). We note here the following elementary consequence of the definition; for a proof see, e.g., [68, Eq. 2.82].

Lemma A.1. Let $\mu\in\mathcal{M}_1([0,\rho])$ be of finite support and consider $\pi$ as defined in (6.16). Then
$$\mathbb{E}\,\pi^{\otimes2}(R_{12}=q) = \mu(\{q\}).$$

A.2. Calculating expectations and Parisi PDEs. Let us now recall the following well-known result connecting Ruelle probability cascades to Parisi-type PDEs. (Recall again that we may view $\mathbb{N}^r = \partial\mathcal{A}_r$.) Results of this type appear in different notations throughout the spin glass literature and are sometimes referred to as consequences of the Bolthausen-Sznitman invariance of RPCs. The following results are taken from [7].

Theorem A.2 (Theorem 6 from [7]). Fix $r\ge1$, $T>0$, and sequences
$$0 = q_0 < q_1 < \dots < q_r = T, \qquad 0 = \mu_{-1} < \mu_0 < \dots < \mu_r = 1.$$
Let $\psi\in C^1([0,T])$ be non-negative and increasing, and let $(g_\psi(\alpha))_{\alpha\in\mathbb{N}^r}$ denote the centered Gaussian process with covariance
$$\mathbb{E}\,g_\psi(\alpha)g_\psi(\beta) = \psi(q_{|\alpha\wedge\beta|}).$$
Finally, let $(v_\alpha)_{\alpha\in\mathbb{N}^r}$ be a Ruelle probability cascade with parameters $(\mu_k)$. Then we have the following.
(1) For any smooth $f$ of at most linear growth, we have that
$$\mathbb{E} \log \sum_{\alpha \in \mathbb{N}^r} v_\alpha \exp[f(g_\psi(\alpha))] = \phi_\mu(0, 0),$$
where $\phi$ is the unique solution to
$$\begin{cases} \partial_t \phi + \dfrac{\psi'}{2}\big(\Delta \phi + \mu([0, t])(\partial_x \phi)^2\big) = 0 \\ \phi(T, x) = f(x), \end{cases}$$
and $\mu \in \mathcal{M}_1([0, T])$ is given by $\mu(\{q_k\}) = \mu_k - \mu_{k-1}$.

(2) If $(g^i_\psi)_{i=1}^M$ are i.i.d. copies of $g_\psi$ and the $(f_i)$ are of at most linear growth, then
$$\mathbb{E} \log \sum_{\alpha} v_\alpha \exp\Big(\sum_i f_i(g^i_\psi(\alpha))\Big) = \sum_i \mathbb{E} \log \sum_{\alpha} v_\alpha \exp\big(f_i(g_\psi(\alpha))\big).$$

We note here the following corollary, which has appeared more or less verbatim in many papers and follows by applying item (1) above with $f(x) = x$ and the Cole–Hopf iteration.

Corollary A.3 (Proposition 7 from [7]). We have that
$$\mathbb{E} \log \sum_{\alpha \in \mathbb{N}^r} v_\alpha \exp[g_\psi(\alpha)] = \frac{1}{2}\int_0^T \psi'(s)\,\mu([0, s])\, ds.$$
The simplicity of the formula in this case follows from noting that the heat equation with initial data $e^x$ is exactly solvable.

Appendix B. Strict convexity

To prove strict convexity of $\mathcal{P}$, let us introduce the following notation. For the sake of clarity, we make the dependence of $u^i_\nu$ on $\Lambda_i$ explicit by writing $u_{\nu, \Lambda_i}(t, x) = u^i_\nu(t, x)$. Furthermore, as (2.1) is invariant under a spatial translation, we see that
$$u_{\nu, \Lambda}(t, x) = u_{\nu', 0}\big(t,\, x + \Lambda + 2\nu(\{\rho\})\big),$$
where $\nu' = \nu - \nu(\{\rho\})\delta_\rho$. It will also be helpful to recall the dynamic programming principle for $u_{\nu, \Lambda_i}(t, x)$ from [50, Lemma 3.5].

Lemma. For any $(\nu, \lambda) \in \mathcal{A} \times \mathbb{R}$ of the form $d\nu = m\, dt + c\,\delta_\rho$, we have that for any $t < t_0 \le \rho$,
$$u_{\nu, \lambda}(t, x) = \sup_{\alpha \in \mathcal{B}_{t_0}} \mathbb{E}\Big[u_{\nu, \lambda}(t_0, X^\alpha_{t_0}) - \int_t^{t_0} m(s)\,\alpha_s^2\, ds\Big],$$
where $X^\alpha$ solves
$$\begin{cases} dX_s = 2m(s)\,\alpha_s\, ds + \sqrt{2}\, dW_s \\ X_t = x, \end{cases}$$
$W_s$ is a standard Brownian motion, and $\mathcal{B}_{t_0}$ is the space of bounded stochastic processes that are progressively measurable with respect to the filtration of $(W_s)_{s \le t_0}$. Furthermore, any optimal control $\alpha^*_s$ satisfies $m(s)\alpha^*_s = m(s)\,\partial_x u_{\nu, \lambda}(s, X_s)$ a.s.
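The control representation above can be checked against the underlying PDE by computing the Hamiltonian. The following is a sketch of that standard verification, under the assumption (not reproduced in this appendix) that (2.1) takes the form $\partial_t u + \partial_{xx} u + m(t)(\partial_x u)^2 = 0$:

```latex
% Verification sketch; assumes (2.1) reads
%   \partial_t u + \partial_{xx} u + m(t)(\partial_x u)^2 = 0.
% For the controlled diffusion dX_s = 2 m(s)\alpha_s\,ds + \sqrt{2}\,dW_s
% with running cost m(s)\alpha_s^2, the HJB equation is
\[
  \partial_t u + \partial_{xx} u
    + \sup_{a}\bigl( 2 m(s)\, a\, \partial_x u - m(s)\, a^2 \bigr) = 0 .
\]
% The supremum is attained at a^* = \partial_x u and equals m(s)(\partial_x u)^2,
% recovering the PDE together with the optimality condition
%   m(s)\,\alpha^*_s = m(s)\,\partial_x u(s, X_s).
```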
The proof will begin with the following observation.

Lemma B.1. For any $\nu \in \mathcal{A}_0$ and any $t \in [0, \rho)$, $u_{\nu, 0}(t, x)$ is strictly convex in $x$.

Proof. Fix distinct $x_0, x_1 \in \mathbb{R}$ and $\theta \in (0, 1)$, and let $x_\theta = \theta x_0 + (1 - \theta)x_1$. Let $(X^\theta_s)$ denote the optimal trajectory corresponding to $u_{\nu, 0}(t, x_\theta)$, and similarly let $\alpha^\theta_s = \partial_x u_{\nu, 0}(s, X^\theta_s)$ denote a corresponding optimal control. Observe that if we let
$$G_s = \int_0^s \sqrt{2}\, dW_{s_1} + \int_0^s 2m\,\alpha^\theta_{s_1}\, ds_1,$$
then $X^\theta_s = G_s + x_\theta$.

We first claim that the law of $G_\rho$ charges any interval $(a, b) \subseteq \mathbb{R}$. As it is possible that $\int_0^\rho m^2(s)\, ds = \infty$, Novikov's condition does not apply, so we cannot apply Girsanov's theorem directly to $G_\rho$. We circumvent this by a localization argument as follows. Fix $0 < s < \rho$. Since $0 \le \sup_s |\alpha^\theta_s| \le 1$ by (2.8), the drift $b_t = 2m(t)\alpha^\theta_t$ satisfies $\sup_{0 \le t \le s} |b_t| < C(s)$ for some non-random $C(s) > 0$. By Girsanov's theorem [78, Lemma 6.4.1] there is a tilt of the law of $G_s$ such that the law under this tilted measure is Gaussian. Thus $\mathbb{P}[G_s \in I] > 0$ for any interval $I$.

Now fix an interval $I = (a, b)$. Note that
$$G_\rho = G_{s_0} + \int_{s_0}^\rho \sqrt{2}\, dW_{s_1} + \int_{s_0}^\rho 2m\,\alpha^\theta_{s_1}\, ds_1,$$
and thus
$$|G_\rho - G_{s_0}| \le 2\int_{s_0}^\rho m\, ds_1 + \Big|\int_{s_0}^\rho \sqrt{2}\, dW_{s_1}\Big|.$$
Further, $\int_{s_0}^\rho \sqrt{2}\, dW_{s_1} \sim N(0, 2(\rho - s_0))$. Fix $C_0 > 0$, and choose $(c, d) \subset I$ such that
$$\min\{c - a,\, b - d\} > 2\int_{s_0}^\rho m\, ds_1 + C_0\sqrt{2(\rho - s_0)}.$$
Such an interval always exists once $\rho - s_0$ is sufficiently small. This implies
$$\mathbb{P}[G_\rho \in I] \ge \mathbb{P}\Big[G_{s_0} \in (c, d),\ |G_\rho - G_{s_0}| \le 2\int_{s_0}^\rho m\, ds_1 + C_0\sqrt{2(\rho - s_0)}\Big] > 0.$$
This, in turn, establishes that for any interval $I$, $\mathbb{P}[G_\rho \in I] > 0$.

Let $Y = G_\rho + x_1$ and $Z = G_\rho + x_0$, so that $X^\theta_\rho = \theta Z + (1 - \theta)Y$. Then we have that
$$u_{\nu, 0}(t, x_\theta) = \mathbb{E}\Big[(X^\theta_\rho)_+ - \int_t^\rho m(s)(\alpha^\theta_s)^2\, ds\Big] < \theta\, \mathbb{E}\Big[Z_+ - \int_t^\rho m(s)(\alpha^\theta_s)^2\, ds\Big] + (1 - \theta)\, \mathbb{E}\Big[Y_+ - \int_t^\rho m(s)(\alpha^\theta_s)^2\, ds\Big] \le \theta\, u_{\nu, 0}(t, x_0) + (1 - \theta)\, u_{\nu, 0}(t, x_1),$$
where in the second line we use that if $a < 0 < b$ then $(\theta a + (1 - \theta)b)_+ < \theta a_+ + (1 - \theta)b_+$, and that
$$\mathbb{P}(YZ < 0) = \mathbb{P}\big((x_0 + G_\rho)(x_1 + G_\rho) < 0\big) > 0.$$

Lemma B.2. The functional $\mathcal{P}$ is strictly convex on $\mathcal{A}_0 \times \mathbb{R}$.

Remark B.3. It will be easy to see from the proof that it is also convex on $\mathcal{A} \times \mathbb{R}$. Strict convexity, however, fails on this larger domain due to the invariance of the functional under the map $(m\, dt + c\,\delta_\rho, \Lambda_1, \Lambda_2) \mapsto (m\, dt, \Lambda_1 + 2c, \Lambda_2 + 2c)$.

Proof. This proof follows the approach of [51, Theorem 20]. Fix $(\nu_1, \lambda_1), (\nu_2, \lambda_2) \in \mathcal{A}_0 \times \mathbb{R}$ and $\theta \in [0, 1]$, let
$$(\nu_\theta, \lambda_\theta) = \theta(\nu_1, \lambda_1) + (1 - \theta)(\nu_2, \lambda_2),$$
and write $\nu_1 = m_1\, dt$, $\nu_2 = m_2\, dt$, $\nu_\theta = m_\theta\, dt$ with $m_\theta = \theta m_1 + (1 - \theta)m_2$. Let $X^\theta_s$ denote the optimal trajectory for the stochastic control problem for $u_{\nu_\theta, \lambda_\theta}$ and let $\alpha^\theta_s = \partial_x u_{\nu_\theta, \lambda_\theta}(s, X^\theta_s)$ be the corresponding control. Finally, let $Y^\theta, Z^\theta$ solve the SDEs
$$dY^\theta = 2m_1\,\alpha^\theta_s\, ds + \sqrt{2}\, dW_s, \qquad dZ^\theta = 2m_2\,\alpha^\theta_s\, ds + \sqrt{2}\, dW_s,$$
with $Y_0 = Z_0 = 0$. Now fix some $0 < t < \rho$. Then by the dynamic programming principle,
$$u_{\nu_\theta, \lambda_\theta}(0, 0) = \mathbb{E}\Big[u_{\nu_\theta, \lambda_\theta}(t, X^\theta_t) - \int_0^t m_\theta(s)(\alpha^\theta_s)^2\, ds\Big].$$
Since the equation (2.1) is invariant under translations of space, we see that $u_{\nu, \lambda}(t, x) = u_{\nu, 0}(t, x + \lambda)$ for any $(\nu, \lambda)$. Thus we may rewrite the above as
$$u_{\nu_\theta, \lambda_\theta}(0, 0) = \mathbb{E}\Big[u_{\nu_\theta, 0}(t, X^\theta_t + \lambda_\theta) - \int_0^t m_\theta(s)(\alpha^\theta_s)^2\, ds\Big] \le \theta\, \mathbb{E}\Big[u_{\nu_\theta, 0}(t, Y_t + \lambda_1) - \int_0^t m_1(s)(\alpha^\theta_s)^2\, ds\Big] + (1 - \theta)\, \mathbb{E}\Big[u_{\nu_\theta, 0}(t, Z_t + \lambda_2) - \int_0^t m_2(s)(\alpha^\theta_s)^2\, ds\Big] \le \theta\, u_{\nu_1, \lambda_1}(0, 0) + (1 - \theta)\, u_{\nu_2, \lambda_2}(0, 0),$$
where in the first inequality we have used the convexity of $u_{\nu_\theta, 0}(t, \cdot)$ in space. Note that in fact the first inequality is strict, provided
$$\mathbb{P}\big((Y_t + \lambda_1) \ne (Z_t + \lambda_2)\big) > 0.$$
In particular, it suffices to show that $\mathrm{Var}(Y_t - Z_t) + |\lambda_1 - \lambda_2| > 0$. Thus if $\lambda_1 \ne \lambda_2$ we are done.
If they are equal, then we know that $m_1 \ne m_2$. In this case, by right continuity and monotonicity, there must be some $s < \tau < \rho$ such that $m_1(t') \ne m_2(t')$ on $[s, \tau]$ (that we can take $\tau < \rho$ follows from the fact that if $\nu_1 \ne \nu_2$ then $m_1$ and $m_2$ must differ on a set of positive Lebesgue measure). In particular, we choose $t = \tau$ from now on.

Note that by Ito's lemma, our choice of $\alpha^\theta_s$ is a martingale, with
$$\alpha^\theta_s - \alpha^\theta_0 = \int_0^s \sqrt{2}\,\partial_{xx} u_{\nu, 0}(s_1, X^\theta_{s_1})\, dW_{s_1}.$$
Thus by Ito's isometry, if we let $\Delta_s = 2(m_1 - m_2)$,
$$\mathrm{Var}(Y_t - Z_t) = \mathbb{E}\Big[\Big(\int_0^t \Delta_s(\alpha^\theta_s - \alpha^\theta_0)\, ds\Big)^2\Big] = \iint_{[0, t]^2} \Delta_{s_1}\Delta_{s_2}\, K(s_1, s_2)\, ds_1\, ds_2,$$
where
$$K(s, t) = \mathbb{E}(\alpha_s - \alpha_0)(\alpha_t - \alpha_0) = \int_0^{t \wedge s} 2\,\mathbb{E}\big[\partial_{xx} u_{\nu, 0}(s_1, X^\theta_{s_1})^2\big]\, ds_1 = p(t \wedge s) = p(t) \wedge p(s),$$
with $p(t) = \int_0^t 2\,\mathbb{E}\big[\partial_{xx} u_{\nu, 0}(s, X^\theta_s)^2\big]\, ds$. Notice that since $t < \rho$, we have $\Delta \in L^2([0, t])$. Thus, to show positivity of the variance, it suffices to show that $K$ is positive definite. Since $u$ is $C^2([0, t + \epsilon] \times \mathbb{R})$ for some $\epsilon > 0$ small enough, and $u$ is strictly convex, we have that $\partial_{xx} u(t, x) > 0$ for Lebesgue-a.e. $x$. Thus $p(t)$ is strictly increasing, so this kernel corresponds to a monotone time change of a Brownian motion, and hence it is positive definite.

References

[1] Emmanuel Abbe. Community detection and stochastic block models: recent developments. The Journal of Machine Learning Research, 18(1):6446–6531, 2017.
[2] Dimitris Achlioptas, Amin Coja-Oghlan, and Federico Ricci-Tersenghi. On the solution-space geometry of random constraint satisfaction problems. Random Structures & Algorithms, 38(3):251–268, 2011.
[3] Louigi Addario-Berry and Pascal Maillard. The algorithmic hardness threshold for continuous random energy models. arXiv preprint arXiv:1810.05129, 2018.
[4] Michael Aizenman, Robert Sims, and Shannon L. Starr.
Extended variational principle for the Sherrington–Kirkpatrick spin-glass model. Physical Review B, 68(21):214403, 2003.
[5] Noga Alon, Michael Krivelevich, and Benny Sudakov. Finding a large hidden clique in a random graph. Random Structures & Algorithms, 13(3-4):457–466, 1998.
[6] Arash A. Amini and Martin J. Wainwright. High-dimensional analysis of semidefinite relaxations for sparse principal components. In 2008 IEEE International Symposium on Information Theory, pages 2454–2458. IEEE, 2008.
[7] Louis-Pierre Arguin. Spin glass computations and Ruelle's probability cascades. J. Stat. Phys., 126(4-5):951–976, 2007.
[8] Gérard Ben Arous, Alexander S. Wein, and Ilias Zadik. Free energy wells and overlap gap property in sparse PCA. In Conference on Learning Theory, pages 479–482. PMLR, 2020.
[9] Antonio Auffinger and Wei-Kuo Chen. Parisi formula for the ground state energy in the mixed p-spin model. The Annals of Probability, 45(6B):4617–4631, Nov 2017.
[10] Antonio Auffinger, Wei-Kuo Chen, and Qiang Zeng. The SK model is full-step replica symmetry breaking at zero temperature. arXiv preprint arXiv:1703.06872, 2017.
[11] Francesco Baffioni and Francesco Rosati. Some exact results on the ultrametric overlap distribution in mean field spin glass models (I). The European Physical Journal B - Condensed Matter and Complex Systems, 17(3):439–447, 2000.
[12] Jinho Baik, Gérard Ben Arous, and Sandrine Péché. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. The Annals of Probability, 33(5):1643–1697, 2005.
[13] Sivaraman Balakrishnan, Mladen Kolar, Alessandro Rinaldo, Aarti Singh, and Larry Wasserman. Statistical and computational tradeoffs in biclustering. In NeurIPS 2011 Workshop on Computational Trade-offs in Statistical Learning, volume 4, 2011.
[14] Boaz Barak, Samuel Hopkins, Jonathan Kelner, Pravesh K. Kothari, Ankur Moitra, and Aaron Potechin.
A nearly tight sum-of-squares lower bound for the planted clique problem. SIAM Journal on Computing, 48(2):687–735, 2019.
[15] Jean Barbier, Nicolas Macris, and Cynthia Rush. All-or-nothing statistical and computational phase transitions in sparse spiked matrix estimation. arXiv preprint arXiv:2006.07971, 2020.
[16] Adriano Barra, Giuseppe Genovese, and Francesco Guerra. Equilibrium statistical mechanics of bipartite spin systems. Journal of Physics A: Mathematical and Theoretical, 44(24):245002, 2011.
[17] Gérard Ben Arous, Reza Gheissari, and Aukosh Jagannath. Algorithmic thresholds for tensor PCA. arXiv preprint arXiv:1808.00921, 2018.
[18] Gérard Ben Arous and Aukosh Jagannath. Spectral gap estimates in mean field spin glasses. Communications in Mathematical Physics, 361(1):1–52, 2018.
[19] Florent Benaych-Georges and Raj Rao Nadakuditi. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics, 227(1):494–521, 2011.
[20] Quentin Berthet and Philippe Rigollet. Complexity theoretic lower bounds for sparse principal component detection. In Conference on Learning Theory, pages 1046–1066, 2013.
[21] Shankar Bhamidi, Partha S. Dey, and Andrew B. Nobel. Energy landscape for large average submatrix detection problems in Gaussian random matrices. Probability Theory and Related Fields, 168(3-4):919–983, 2017.
[22] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
[23] Matthew Brennan, Guy Bresler, and Wasim Huleihel. Reducibility and computational lower bounds for problems with planted sparse structure. arXiv preprint arXiv:1806.07508, 2018.
[24] Matthew Brennan, Guy Bresler, and Wasim Huleihel. Universality of computational lower bounds for submatrix detection. arXiv preprint arXiv:1902.06916, 2019.
[25] Cristina Butucea and Yuri I. Ingster. Detection of a sparse submatrix of a high-dimensional noisy matrix. Bernoulli, 19(5B):2652–2688, 2013.
[26] Cristina Butucea, Yuri I. Ingster, and Irina A. Suslina. Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix. ESAIM: Probability and Statistics, 19:115–134, 2015.
[27] T. Tony Cai, Tengyuan Liang, and Alexander Rakhlin. Computational and statistical boundaries for submatrix localization in a large noisy matrix. The Annals of Statistics, 45(4):1403–1430, 2017.
[28] Venkat Chandrasekaran and Michael I. Jordan. Computational and statistical tradeoffs via convex relaxation. Proceedings of the National Academy of Sciences, 110(13):E1181–E1190, 2013.
[29] Wei-Kuo Chen, David Gamarnik, Dmitry Panchenko, and Mustazee Rahman. Suboptimality of local algorithms for a class of max-cut problems. The Annals of Probability, 47(3):1587–1618, 2019.
[30] Yudong Chen and Jiaming Xu. Statistical-computational tradeoffs in planted problems and submatrix localization with a growing number of clusters and submatrices. The Journal of Machine Learning Research, 17(1):882–938, 2016.
[31] Amin Coja-Oghlan, Amir Haqshenas, and Samuel Hetterich. Walksat stalls well below satisfiability. SIAM Journal on Discrete Mathematics, 31(2):1160–1173, 2017.
[32] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. John Wiley & Sons, New York, 1991.
[33] N. G. de Bruijn and P. Erdős. Some linear and some quadratic recursion formulas. II. Nederl. Akad. Wetensch. Proc. Ser. A. 55 = Indagationes Math., 14:152–163, 1952.
[34] Yash Deshpande and Andrea Montanari. Information-theoretically optimal sparse PCA. In 2014 IEEE International Symposium on Information Theory, pages 2197–2201. IEEE, 2014.
[35] Yash Deshpande and Andrea Montanari. Improved sum-of-squares lower bounds for hidden clique and hidden submatrix problems.
In Conference on Learning Theory, pages 523–562, 2015.
[36] Mohamad Dia, Nicolas Macris, Florent Krzakala, Thibault Lesieur, and Lenka Zdeborová. Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula. In Advances in Neural Information Processing Systems, pages 424–432, 2016.
[37] Yunzi Ding, Dmitriy Kunisky, Alexander S. Wein, and Afonso S. Bandeira. Subexponential-time algorithms for sparse PCA. arXiv preprint, 2019.
[38] Vitaly Feldman, Elena Grigorescu, Lev Reyzin, Santosh S. Vempala, and Ying Xiao. Statistical algorithms and a lower bound for detecting planted cliques. Journal of the ACM (JACM), 64(2):8, 2017.
[39] David Gamarnik and Quan Li. Finding a large submatrix of a Gaussian random matrix. The Annals of Statistics, 46(6A):2511–2561, 2018.
[40] David Gamarnik and Madhu Sudan. Performance of sequential local algorithms for the random NAE-k-SAT problem. SIAM Journal on Computing, 46(2):590–619, 2017.
[41] David Gamarnik and Ilias Zadik. High dimensional regression with binary coefficients. Estimating squared error and a phase transition. In Conference on Learning Theory, pages 948–953, 2017.
[42] David Gamarnik and Ilias Zadik. Sparse high-dimensional linear regression. Algorithmic barriers and a local search algorithm. arXiv preprint arXiv:1711.04952, 2017.
[43] David Gamarnik and Ilias Zadik. The landscape of the planted clique problem: Dense subgraphs and the overlap gap property. arXiv preprint arXiv:1904.07174, 2019.
[44] Chao Gao, Zongming Ma, and Harrison H. Zhou. Sparse CCA: Adaptive estimation and computational barriers. The Annals of Statistics, 45(5):2074–2101, 2017.
[45] Samuel B. Hopkins, Pravesh Kothari, Aaron Henry Potechin, Prasad Raghavendra, and Tselil Schramm. On the integrality gap of degree-4 sum of squares for planted clique. ACM Transactions on Algorithms (TALG), 14(3):28, 2018.
[46] Samuel B. Hopkins, Tselil Schramm, Jonathan Shi, and David Steurer. Fast spectral algorithms from sum-of-squares proofs: tensor decomposition and planted sparse vectors. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, pages 178–191. ACM, 2016.
[47] Aukosh Jagannath. Approximate ultrametricity for random measures and applications to spin glasses. Comm. Pure Appl. Math., 70(4):611–664, 2017.
[48] Aukosh Jagannath, Justin Ko, and Subhabrata Sen. Max κ-cut and the inhomogeneous Potts spin glass. The Annals of Applied Probability, 28(3):1536–1572, 2018.
[49] Aukosh Jagannath, Patrick Lopatto, and Léo Miolane. Statistical thresholds for tensor PCA. Ann. Appl. Probab., 30(4):1910–1933, 2020.
[50] Aukosh Jagannath and Subhabrata Sen. On the unbalanced cut problem and the generalized Sherrington–Kirkpatrick model. Ann. Inst. Henri Poincaré D, to appear.
[51] Aukosh Jagannath and Ian Tobasco. A dynamic programming approach to the Parisi functional. Proceedings of the American Mathematical Society, 144(7):3135–3150, 2016.
[52] Aukosh Jagannath and Ian Tobasco. Low temperature asymptotics of spherical mean field spin glasses. Communications in Mathematical Physics, 352(3):979–1017, 2017.
[53] Aukosh Jagannath and Ian Tobasco. Some properties of the phase diagram for mixed p-spin glasses. Probability Theory and Related Fields, 167(3-4):615–672, 2017.
[54] Mladen Kolar, Sivaraman Balakrishnan, Alessandro Rinaldo, and Aarti Singh. Minimax localization of structural information in large noisy matrices. In Advances in Neural Information Processing Systems, pages 909–917, 2011.
[55] Florent Krzakala, Jiaming Xu, and Lenka Zdeborová. Mutual information in rank-one matrix estimation. In 2016 IEEE Information Theory Workshop (ITW), pages 71–75. IEEE, 2016.
[56] Marc Lelarge and Léo Miolane. Fundamental limits of symmetric low-rank matrix estimation.
Probability Theory and Related Fields, 173(3-4):859–929, 2019.
[57] Thibault Lesieur, Florent Krzakala, and Lenka Zdeborová. Phase transitions in sparse PCA. In 2015 IEEE International Symposium on Information Theory (ISIT), pages 1635–1639. IEEE, 2015.
[58] Thibault Lesieur, Florent Krzakala, and Lenka Zdeborová. Constrained low-rank matrix estimation: Phase transitions, approximate message passing and applications. Journal of Statistical Mechanics: Theory and Experiment, 2017(7):073403, 2017.
[59] Tengyu Ma and Avi Wigderson. Sum-of-squares lower bounds for sparse PCA. In Advances in Neural Information Processing Systems, pages 1612–1620, 2015.
[60] Zongming Ma and Yihong Wu. Computational barriers in minimax submatrix detection. The Annals of Statistics, 43(3):1089–1116, 2015.
[61] Raghu Meka, Aaron Potechin, and Avi Wigderson. Sum-of-squares lower bounds for planted clique. In Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, pages 87–96. ACM, 2015.
[62] Marc Mézard, Thierry Mora, and Riccardo Zecchina. Clustering of solutions in the random satisfiability problem. Physical Review Letters, 94(19):197205, 2005.
[63] Andrea Montanari. Finding one community in a sparse graph. Journal of Statistical Physics, 161(2):273–299, 2015.
[64] Andrea Montanari. Optimization of the Sherrington–Kirkpatrick Hamiltonian. arXiv preprint arXiv:1812.10897, 2018.
[65] Andrea Montanari, Daniel Reichman, and Ofer Zeitouni. On the limitation of spectral methods: From the Gaussian hidden clique problem to rank-one perturbations of Gaussian tensors. In Advances in Neural Information Processing Systems, pages 217–225, 2015.
[66] Cristopher Moore. The computer science and physics of community detection: Landscapes, phase transitions, and hardness. arXiv preprint arXiv:1702.00467, 2017.
[67] Dmitry Panchenko. The Parisi ultrametricity conjecture. Ann. of Math.
(2), 177(1):383–393, 2013.
[68] Dmitry Panchenko. The Sherrington-Kirkpatrick Model. Springer, 2013.
[69] Dmitry Panchenko. The Parisi formula for mixed p-spin models. Ann. Probab., 42(3):946–958, 2014.
[70] Dmitry Panchenko. The free energy in a multi-species Sherrington-Kirkpatrick model. Ann. Probab., 43(6):3494–3513, 2015.
[71] Dmitry Panchenko. Free energy in the mixed p-spin models with vector spins. The Annals of Probability, 46(2):865–896, 2018.
[72] Dmitry Panchenko. Free energy in the Potts spin glass. The Annals of Probability, 46(2):829–864, 2018.
[73] Mustazee Rahman and Bálint Virág. Local algorithms for independent sets are half-optimal. The Annals of Probability, 45(3):1543–1577, 2017.
[74] Emile Richard and Andrea Montanari. A statistical model for tensor PCA. In Advances in Neural Information Processing Systems, pages 2897–2905, 2014.
[75] Benjamin Rossman. Average-Case Complexity of Detecting Cliques. PhD thesis, Massachusetts Institute of Technology, 2010.
[76] Andrey A. Shabalin, Victor J. Weigman, Charles M. Perou, and Andrew B. Nobel. Finding large average submatrices in high dimensional data. The Annals of Applied Statistics, 3(3):985–1012, 2009.
[77] J. Michael Steele. Probability Theory and Combinatorial Optimization, volume 69 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997.
[78] Daniel W. Stroock and S. R. Srinivasa Varadhan. Multidimensional Diffusion Processes. Classics in Mathematics. Springer-Verlag, Berlin, 2006. Reprint of the 1997 edition.
[79] Eliran Subag. Following the ground-states of full-RSB spherical spin glasses. arXiv preprint arXiv:1812.04588, 2018.
[80] Michel Talagrand. Mean Field Models for Spin Glasses. Volume II, volume 55 of Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge.
A Series of Modern Surveys in Mathematics [Results in Mathematics and Related Areas. 3rd Series. A Series of Modern Surveys in Mathematics]. Springer, Heidelberg, 2011. Advanced replica-symmetry and low temperature.
[81] Alexander S. Wein, Ahmed El Alaoui, and Cristopher Moore. The Kikuchi hierarchy and tensor PCA. arXiv preprint arXiv:1904.03858, 2019.

Sloan School of Management, Massachusetts Institute of Technology, gamarnik@mit.edu

Department of Statistics and Actuarial Sciences and Department of Applied Mathematics, University of Waterloo, a.jagannath@uwaterloo.ca

Department of Statistics, Harvard University, subhabratasen@fas.harvard.edu