On semidefinite relaxations for the block model

Arash A. Amini and Elizaveta Levina
Department of Statistics, UCLA and Department of Statistics, University of Michigan
March 17, 2016

Abstract
The stochastic block model (SBM) is a popular tool for community detection in networks, but fitting it by maximum likelihood (MLE) involves a computationally infeasible optimization problem. We propose a new semidefinite programming (SDP) solution to the problem of fitting the SBM, derived as a relaxation of the MLE. We put ours and previously proposed SDPs in a unified framework, as relaxations of the MLE over various subclasses of the SBM, which also reveals a connection to the well-known problem of sparse PCA. Our main relaxation, which we call SDP-1, is tighter than other recently proposed SDP relaxations, and thus previously established theoretical guarantees carry over. Moreover, we show that SDP-1 exactly recovers true communities over a wider class of SBMs than those covered by current results. In particular, the assumption of strong assortativity of the SBM, implicit in consistency conditions for previously proposed SDPs, can be relaxed to weak assortativity for our approach, thus significantly broadening the class of SBMs covered by the consistency results. We also show that strong assortativity is indeed a necessary condition for exact recovery for previously proposed SDP approaches and not an artifact of the proofs. Our analysis of SDPs is based on primal-dual witness constructions, which provide some insight into the nature of the solutions of various SDPs. In particular, we show how to combine features from SDP-1 and already available SDPs to achieve the most flexibility in terms of both assortativity and block-size constraints, as our relaxation has the tendency to produce communities of similar sizes.
This tendency makes it the ideal tool for fitting network histograms, a method gaining popularity in the graphon estimation literature, as we illustrate on an example of a social network of dolphins. We also provide empirical evidence that SDPs outperform spectral methods for fitting SBMs with a large number of blocks.

1 Introduction
Community detection, one of the fundamental problems in network analysis, has attracted a lot of attention in a number of fields, including computer science, statistics, physics, and sociology. The stochastic block model (SBM) [27] is a well-established and widely used model for community detection. It is attractive for its analytical tractability and its connections to fundamental properties of random graphs [5, 11, 35]. Fitting it to data, however, is a challenge, because one must optimize over the $K^n$ assignments of $n$ nodes to $K$ communities. Many fitting methods have been proposed, including profile likelihood [11], MCMC [43, 38], variational approaches [3, 15, 12], belief propagation [21], and pseudo-likelihood [7], the latter two being more or less the current state of the art in speed and accuracy. However, all these methods rely on a good initial value and can be sensitive to starting points. In contrast, spectral clustering methods do not require an initial value, are fast, and have also been popular in community detection [42, 16, 31, 41]. Spectral clustering works reasonably well in dense networks with balanced communities but fails on sparse networks [30]. Regularization can help [16, 7, 28], but even regularized spectral clustering does not achieve the accuracy of likelihood-based methods when they are given a good initial value [7].
Recently, semidefinite programming (SDP) approaches to fitting the SBM have appeared in the literature [17, 18, 14]. These rely on an SDP relaxation of the computationally infeasible likelihood optimization problem. They are attractive for two reasons. First, they solve a global optimization problem and require no initial value. Second, they are still based on maximizing the likelihood, so one can hope for better performance than from generic methods like spectral clustering, which do not use the likelihood in any way. As global optimization methods, they are also easier to analyze than iterative methods that depend on a starting value. It also appears that SDP relaxations in themselves have a regularization effect, which makes their solutions more robust to noise and outliers (see Remark 3.1). One drawback of SDP methods is the higher computational cost of SDP solvers. However, by formulating the problem as an SDP, we can benefit from continuing advances in solving large-scale SDPs, an active area of research in optimization.

In this paper, we propose a new SDP relaxation of the likelihood optimization problem, which is tighter than any of the previously proposed SDP relaxations [17, 18, 14]. We also put all these relaxations into a unified framework, by viewing them as versions of the MLE restricted to different parameter spaces, and show their connection to the well-studied problem of sparse PCA. Empirically, the tighter relaxation gives better results, and we derive a first-order SDP implementation via ADMM which keeps computing costs reasonable. On the theoretical side, our focus for the most part will be on balanced models, i.e., those with equal community sizes. We obtain sufficient conditions on the parameters of the block model for strong consistency (i.e., exact recovery of communities) of our relaxation, SDP-1.
These conditions guarantee success over a wider class of SBMs than in previous literature. Current conditions for the success of SDP relaxations implicitly impose what we will call strong assortativity, whereas our SDP succeeds for any weakly assortative SBM (cf. Definition 4.1) when the expected degree grows as $\Omega(\log n)$. We also show that the requirement of strong assortativity is necessary for the success of previous SDP relaxations (SDP-2 and SDP-3 in Table 1), and that it is not an artifact of proof techniques (Section 5). Our proof of the success of SDP-1 is based on a primal-dual witness construction, which has already been used successfully in the context of sparse recovery problems; see for example [46, 8]. In the context of SDP relaxations for the SBM, however, the only instance of this approach that we know of is the recent work of [1], for the case of the $K = 2$ SBM. Our approach can be viewed as a non-trivial extension of [1] to general $K$ and to a more complex SDP, with a doubly nonnegative cone constraint and more equality constraints. As a by-product, we also recover the current results for SDP-2 for the class of strongly assortative SBMs. Our results suggest that the greater divide for SDP relaxations is not between strongly and weakly assortative SBMs, but between purely assortative (or disassortative) models and mixed models, those with both assortative and disassortative communities.

SDP-1, in its basic form, tends to partition the network into blocks of similar sizes. In practice this is sometimes an unwelcome feature and sometimes a desirable one, since very large and very small communities are generally difficult to interpret. If this feature is not desirable, SDP-1 can be modified to allow for different block sizes, as discussed in Section 6.
Equal-sized blocks are especially suitable for constructing network histograms, a method for graphon estimation proposed by [39]. Viewing the SBM as a nonparametric approximation to a general, reasonably smooth mean function of the adjacency matrix (the graphon) is analogous to constructing a histogram to approximate a general smooth density function. A number of other methods for graphon estimation have been proposed recently [4, 47, 49]. A histogram is appealing because it is controlled by a single parameter, the number of bins (blocks) $K$, which can be chosen to balance fitting the data against robustness to noise. In this setting it is particularly appropriate to fit blocks of equal or similar sizes, just as in the usual histogram. We show empirically in Section 8 that, compared to less tight SDP relaxations and to generic methods like spectral clustering, our SDP relaxation is the best tool for histogram estimation and generally produces cleaner solutions.

The rest of the paper is organized as follows. In Section 2, we introduce the SBM and its submodels. In Section 3, we derive a general blueprint for MLE relaxations, introduce our proposed SDP and compare it with the ones existing in the literature, including a brief discussion of the connection with sparse PCA. Section 4 presents our consistency results for balanced block models, along with an overview of the proofs. A result showing the failure of SDP-2 in the absence of strong assortativity (which is not needed for our relaxation) appears in Section 5. The extension to unbalanced communities is discussed in Section 6. Section 7 presents an application of SDP-1 to graphon estimation via fitting network histograms. Section 8 compares several SDPs numerically, and we conclude with a discussion in Section 9.
Technical details of the proofs and a brief discussion of a first-order method for implementing SDP-1 can be found in the Appendices.

Notation. We use $\otimes$ to denote the Kronecker product of matrices, and $\circ$ to denote the Schur (element-wise) product of matrices. $\mathbb{S}^n$ denotes the set of symmetric $n \times n$ matrices, and $\langle A, X\rangle := \operatorname{tr}(AX)$ the corresponding inner product. $(\mathbb{S}^n_+, \succeq)$ denotes the cone of positive semidefinite (PSD) $n \times n$ matrices together with its natural partial order, namely $A \succeq B$ iff $A - B \in \mathbb{S}^n_+$. $E_{n,m}$ is the $n \times m$ matrix of all ones and $E_n$ is the $n \times n$ matrix of all ones. The $n \times 1$ vector of all ones is denoted $1_n$. $\|u\| = \|u\|_2$ is the $\ell_2$ norm of a vector $u$, and $|||A||| = |||A|||_2$ is the $\ell_2 \to \ell_2$ operator norm of a matrix $A$. $\ker(A)$ and $\operatorname{range}(A)$ denote the kernel (null space) and the range (column space) of $A$. $\operatorname{diag}: \mathbb{R}^{n\times n} \to \mathbb{R}^n$ acts on square matrices and extracts the diagonal. $\operatorname{diag}^*: \mathbb{R}^n \to \mathbb{R}^{n\times n}$ is the adjoint of $\operatorname{diag}$; it acts on vectors and produces the natural diagonal matrix. For a matrix $X$, let $\operatorname{supp}(X) := \{(i,j) : X_{ij} \neq 0\}$ be its support. More specialized notation is introduced in Section 4.2.

2 The stochastic block model
We now formally introduce the SBM. The network data (nodes and the edges connecting them) are represented by a simple undirected graph on $n$ nodes via its $n \times n$ adjacency matrix $A$, a binary symmetric matrix with $A_{ij} = 1$ if there is an edge between nodes $i$ and $j$, and $0$ otherwise. Each node belongs to exactly one community, specified by its membership vector $z_i \in \{0,1\}^K$. This vector has exactly one nonzero entry, $z_{ik} = 1$, indicating that node $i$ belongs to community $k$. The vectors $z_i$ are not observed. The SBM is parametrized through the symmetric probability matrix $\Psi \in [0,1]^{K\times K}$, where $\Psi_{kr}$ is the probability of an edge forming between a pair of nodes from communities $k$ and $r$.
For simplicity, we assume $n$ is a multiple of $K$. Given $z_i$ and $\Psi$, the entries $\{A_{ij}, i < j\}$ are drawn independently as Bernoulli random variables with $\mathbb{E}[A_{ij} \mid z_i, z_j] = z_i^T \Psi z_j$. Let $Z$ be the $n \times K$ matrix with rows $z_1^T, \dots, z_n^T$. Then we can write the model as
$$M_Z := \mathbb{E}[A \mid Z] = Z \Psi Z^T. \tag{2.1}$$
Note that the $A_{ii}$'s are so far undefined. They can be defined based on convenience, but we will always assume that they are defined so that (2.1) holds over all elements. (For example, one possibility is to set $A_{ii} := [M_Z]_{ii}$.) We do not treat $\{A_{ii}\}$ as part of the observed data.

$M_Z$ is a block-constant, rank-$K$ matrix, and we can think of the operation $\Psi \mapsto Z\Psi Z^T$ as a block-constant embedding of a $K \times K$ matrix into the space of $n \times n$ matrices. This provides us with a simple but useful property. For any matrix $M$ and function $f$ on $\mathbb{R}$, let $f \circ M$ be the pointwise application of $f$ to the entries of $M$, $[f \circ M]_{ij} = f(M_{ij})$. Then we have
$$f \circ (Z\Psi Z^T) = Z (f \circ \Psi) Z^T. \tag{2.2}$$
Using (2.2), we can write the log-likelihood of the SBM in a compact form. First note that, with $M = M_Z$,
$$\ell(Z, \Psi) = \sum_{i<j} \big[ A_{ij}\log M_{ij} + (1 - A_{ij})\log(1 - M_{ij}) \big] = \sum_{i<j} \big[ A_{ij} f(M_{ij}) + g(M_{ij}) \big], \qquad f(x) := \log\frac{x}{1-x}, \quad g(x) := \log(1-x).$$
Writing $\langle \cdot, \cdot\rangle_0$ for the inner product that omits the diagonal terms, (2.2) gives
$$2\ell(Z, \Psi) = \langle A, Z(f\circ\Psi)Z^T\rangle_0 + \langle E_n, Z(g\circ\Psi)Z^T\rangle_0. \tag{2.3}$$
We consider the following submodels of the SBM.
(PP) The planted partition model, $\mathrm{PP}(p,q)$, in which
$$\Psi = (p - q) I_K + q E_K, \qquad p > q. \tag{2.4}$$
Note that (2.4) simply means that the diagonal elements are $p$ and the off-diagonal elements are $q$.
(PP$^{\rm bal}$) The balanced planted partition model, $\mathrm{PP}^{\rm bal}(p,q)$, which is $\mathrm{PP}(p,q)$ with the additional assumption that the blocks have equal sizes.

For $\mathrm{PP}(p,q)$, the likelihood greatly simplifies, since $f\circ\Psi$ and $g\circ\Psi$ take only two values: $f\circ\Psi = f(q)E_K + [f(p) - f(q)]I_K$, and similarly for $g\circ\Psi$. Since $Z E_K Z^T = E_n$, (2.3) becomes
$$2\ell(Z,\Psi) = [f(p) - f(q)]\,\langle A, ZZ^T\rangle_0 + [g(p) - g(q)]\,\langle E_n, ZZ^T\rangle_0 + \text{const},$$
where the constant term does not depend on $Z$. With the condition $p > q$, we have $f(p) > f(q)$ and $g(p) < g(q)$.
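The sketch below is a minimal pure-Python illustration of the ingredients just introduced. It samples from the model, checks the entrywise identity (2.2), and checks the planted-partition decomposition of $2\ell$. It assumes the Bernoulli log-likelihood written with $f(x) = \log\frac{x}{1-x}$ and $g(x) = \log(1-x)$, which satisfy $f(p) > f(q)$ and $g(p) < g(q)$ for $p > q$. It sets $A_{ii} = 0$ for concreteness, and all helper names are hypothetical. The decomposition check asks that $2\ell(Z,\Psi) - [f(p)-f(q)]\langle A, ZZ^T\rangle_0 - [g(p)-g(q)]\langle E_n, ZZ^T\rangle_0$ take the same value for every labeling.

```python
import math
import random


def membership_matrix(labels, K):
    """Z: the n x K matrix whose i-th row is the one-hot vector z_i^T."""
    return [[1 if lab == k else 0 for k in range(K)] for lab in labels]


def mean_matrix(Z, Psi):
    """M_Z = Z Psi Z^T as in (2.1); entry (i, j) equals Psi[k_i][k_j]."""
    K = len(Psi)
    Z_Psi = [[sum(z[t] * Psi[t][r] for t in range(K)) for r in range(K)] for z in Z]
    return [[sum(row[r] * z[r] for r in range(K)) for z in Z] for row in Z_Psi]


def sample_sbm(Z, Psi, rng):
    """A_ij ~ Bernoulli([M_Z]_ij) independently for i < j, then symmetrized.
    The diagonal is set to 0 here, an assumption made only for concreteness."""
    M = mean_matrix(Z, Psi)
    n = len(Z)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            A[i][j] = A[j][i] = int(rng.random() < M[i][j])
    return A


def f(x):
    """Assumed log-odds coefficient of A_ij in the Bernoulli log-likelihood."""
    return math.log(x / (1 - x))


def g(x):
    """Assumed remaining term, log(1 - x)."""
    return math.log(1 - x)


def two_loglik(A, labels, p, q):
    """2 * log-likelihood of PP(p, q): sum of A_ij f(M_ij) + g(M_ij) over ordered pairs i != j."""
    n, total = len(A), 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                m = p if labels[i] == labels[j] else q
                total += A[i][j] * f(m) + g(m)
    return total


def pp_remainder(A, labels, p, q):
    """2*loglik minus [f(p)-f(q)] <A, ZZ^T>_0 and [g(p)-g(q)] <E_n, ZZ^T>_0."""
    n = len(A)
    same = [(i, j) for i in range(n) for j in range(n) if i != j and labels[i] == labels[j]]
    inner_A = sum(A[i][j] for i, j in same)
    return (two_loglik(A, labels, p, q)
            - (f(p) - f(q)) * inner_A - (g(p) - g(q)) * len(same))


# Identity (2.2) with f = log applied entrywise, on a 3-block example.
Psi = [[0.6, 0.1, 0.2], [0.1, 0.5, 0.1], [0.2, 0.1, 0.7]]
Z = membership_matrix([0, 0, 1, 1, 2, 2], 3)
lhs = [[math.log(x) for x in row] for row in mean_matrix(Z, Psi)]
rhs = mean_matrix(Z, [[math.log(x) for x in row] for row in Psi])
assert all(abs(a - b) < 1e-12 for ra, rb in zip(lhs, rhs) for a, b in zip(ra, rb))

# PP(p, q): the remainder should be the same for very different labelings.
p, q = 0.7, 0.2
A = sample_sbm(membership_matrix([0] * 4 + [1] * 4, 2), [[p, q], [q, p]], random.Random(1))
labelings = ([0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 0, 1, 0, 1, 1, 0], [0, 0, 1, 2, 2, 2, 1, 0])
rems = [pp_remainder(A, lab, p, q) for lab in labelings]
assert max(rems) - min(rems) < 1e-9
```

The third labeling even uses three groups. The remainder equals $f(q)\sum_{i\neq j}A_{ij} + n(n-1)g(q)$ regardless of the labeling, which is the constant absorbed in the display above.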
Dividing by $f(p) - f(q) > 0$, we then obtain
$$\frac{2\ell(Z,\Psi)}{f(p) - f(q)} = \langle A, ZZ^T\rangle - \lambda\,\langle E_n, ZZ^T\rangle + \text{const}, \qquad \lambda := \frac{g(q) - g(p)}{f(p) - f(q)} > 0. \tag{2.5}$$
Note that we have safely replaced $\langle\cdot,\cdot\rangle_0$ with $\langle\cdot,\cdot\rangle$, possibly changing the constant, since $[ZZ^T]_{ii} = 1$ for all $i$ regardless of $Z$. A similar calculation appears in [14], albeit in a slightly different form.

3 Relaxing the maximum likelihood estimator (MLE)
Given the adjacency matrix $A$, the MLE for $(Z, \Psi)$ is obtained by maximizing the likelihood of the SBM. It is known to have desirable consistency and, in some sense, optimality properties [11], but the exact computation of the MLE is in general NP-hard, due to the optimization over $Z$. However, it can be relaxed to computationally feasible convex problems. We can obtain a class of MLEs by varying the domain over which the likelihood (2.5) is maximized. That is, we have the general estimator
$$\widehat Z := \operatorname*{argmax}_{Z \in \mathcal Z}\; \langle A, ZZ^T\rangle - \lambda\,\langle E_n, ZZ^T\rangle. \tag{3.1}$$
Each $Z$ corresponds to a clustering matrix $X = ZZ^T \in \{0,1\}^{n\times n}$, where $X_{ij} = 1$ if $i$ and $j$ belong to the same community, and $X_{ij} = 0$ otherwise. Any subset $\mathcal Z$ of the $Z$-space induces a corresponding subset $\mathcal X$ of the $X$-space. We can therefore consider estimators of $X$, and our blueprint for deriving different relaxations will be to vary the space $\mathcal X$ in the optimization problem
$$\widehat X := \operatorname*{argmax}_{X \in \mathcal X}\; \langle A, X\rangle - \lambda\,\langle E_n, X\rangle. \tag{3.2}$$

3.1 Our relaxation: SDP-1
Our relaxation corresponds to the balanced model $\mathrm{PP}^{\rm bal}(p,q)$, in which each community has size $n/K$. In this case, all admissible $Z$'s can be obtained by permutation of any fixed admissible $Z_0 = I_K \otimes 1_{n/K}$, and we can take the feasible set $\mathcal Z$ in (3.1) to be
$$\mathcal Z_{\rm orbit}(Z_0) := \{ P Z_0 Q : P, Q \text{ are permutation matrices} \},$$
where $\otimes$ is the Kronecker product and $1_{n/K}$ is the vector of all ones of length $n/K$.
This choice of $Z_0$ is made for convenience and corresponds to assigning nodes consecutively to communities $1$ through $K$. Recalling $X = ZZ^T$, the corresponding feasible set in the $X$-space is
$$\mathcal X_{\rm orbit}(X_0) := \{ P X_0 P^T : P \text{ is a permutation matrix} \}, \qquad X_0 = I_K \otimes E_{n/K}. \tag{3.3}$$
Note that $X_0$ is block-diagonal with all the diagonal blocks equal to $E_{n/K}$.

In order to relax $\mathcal X_{\rm orbit}(X_0)$, we first note that any $X$ in this set is clearly positive semidefinite (PSD), denoted $X \succeq 0$, since $X = (PZ_0)(PZ_0)^T$. In addition, $0 \le X_{ij} \le 1$ for all $i, j$, which we write as $0 \le X \le 1$, and $\operatorname{diag}(X) = 1_n$. Note that this last condition, together with $X \succeq 0$ and $X \ge 0$, already implies $X \le 1$. The reason is that $1 - X_{ij}^2 = X_{ii}X_{jj} - X_{ij}^2 \ge 0$, since this is a $2\times 2$ principal minor of $X$, which gives $X_{ij} = |X_{ij}| \le 1$. Finally, it is easy to see that each row of $X$ should sum to $n/K$, i.e., $X 1_n = (n/K) 1_n$. This implies that we can remove the term $\lambda\langle E_n, X\rangle$ from the objective function in (3.2), since
$$\langle X, E_n\rangle = \operatorname{tr}(X 1_n 1_n^T) = 1_n^T X 1_n = 1_n^T (n/K) 1_n = n^2/K, \tag{3.4}$$
which is a constant. Thus we arrive at our proposed relaxation,
$$\text{SDP-1}:\quad \operatorname*{argmax}_X\; \langle A, X\rangle \quad\text{subject to}\quad X 1_n = (n/K) 1_n,\;\; \operatorname{diag}(X) = 1_n,\;\; X \succeq 0,\;\; X \ge 0. \tag{3.5}$$

3.2 Other relaxations: SDP-2 and SDP-3
Two other interesting SDP relaxations have recently appeared in the literature. First, we consider the relaxation of Chen & Xu [18]; see also [17]. They essentially work with the same $\mathrm{PP}^{\rm bal}(p,q)$, although their model is slightly more general (see Remark 3.1). The main relaxation proposed in [18] is via constraining the nuclear norm of $X$, a common heuristic for constraining the rank. Since $X$ is PSD, we obtain $|||X|||_* = \operatorname{tr}(X) = n$. In addition, they impose a single affine constraint, namely (3.4). Thus, their main focus is on the relaxation which replaces $\mathcal X_{\rm orbit}(X_0)$ with $\{X : |||X|||_* \le n,\; \langle X, E_n\rangle = n^2/K,\; 0 \le X \le 1\}$.
However, Chen & Xu also briefly mention a much tighter SDP relaxation, which imposes positive semidefiniteness directly. This is what we have called SDP-2, shown in Table 1. Note that $X \succeq 0$ and $\operatorname{tr}(X) = n$ imply $|||X|||_* = n$, which is much tighter than $|||X|||_* \le n$. The main difference between SDP-2 and our relaxation is that we impose the constraint $\langle E_n, X\rangle = n^2/K$ more restrictively, by breaking it into $n$ separate affine constraints. We also break $\operatorname{tr}(X) = n$ into $n$ pieces, but that does not seem to make much of a difference.

Next, we consider the relaxation of Cai & Li [14], though in a slightly different form. This relaxation works for the more general model $\mathrm{PP}(p,q)$. In this case, we are looking at the feasible set
$$\mathcal X_{\rm free} = \{ X = ZZ^T : Z \text{ is an admissible membership matrix} \}. \tag{3.6}$$
For $X \in \mathcal X_{\rm free}$, we still have $X \succeq 0$ and $X_{ij} \in \{0,1\}$. Thus, one can simply relax to the problem denoted SDP-3 in Table 1. Note that $\lambda\langle E_n, X\rangle$ remains in the objective, since there are no constraints that make it constant. We cannot enforce an affine constraint involving $\langle E_n, X\rangle$ directly for $\mathcal X_{\rm free}$ without knowing the block sizes. Indeed, let $\mathbf n = (n_1, \dots, n_K)$ be the vector of block sizes, and let $E_{\mathbf n} := \operatorname{diag}^*(E_{n_1}, \dots, E_{n_K})$ be the block-diagonal matrix whose diagonal blocks are all-ones matrices with sizes given by $\mathbf n$.

Table 1: SDP relaxations
SDP-1: maximize $\langle A, X\rangle$ subject to $X 1_n = (n/K)1_n$, $\operatorname{diag}(X) = 1_n$, $X \ge 0$, $X \succeq 0$. Model: $\mathrm{PP}^{\rm bal}(p,q) \equiv \mathcal X_{\rm orbit}(X_0)$.
SDP-2: maximize $\langle A, X\rangle$ subject to $\langle E_n, X\rangle = n^2/K$, $\operatorname{tr}(X) = n$, $0 \le X \le 1$, $X \succeq 0$. Model: $\mathrm{PP}^{\rm bal}(p,q) \equiv \mathcal X_{\rm orbit}(X_0)$.
SDP-3: maximize $\langle A, X\rangle - \lambda\langle E_n, X\rangle$ subject to $\operatorname{tr}(X) = n$, $0 \le X \le 1$, $X \succeq 0$. Model: $\mathrm{PP}(p,q) \equiv \mathcal X_{\rm free}$.
EVT: maximize $\langle A, X\rangle$ subject to $\operatorname{tr}(X) = n$, $|||X|||_2 \le n/K$, $X \succeq 0$. Model: $\mathrm{PP}^{\rm bal}(p,q) \equiv \mathcal X_{\rm orbit}(X_0)$.
It is easy to see that $\mathcal X_{\rm free}$ is the union of the orbits of all possible $E_{\mathbf n}$,
$$\mathcal X_{\rm free} = \bigcup_{\mathbf n:\, \|\mathbf n\|_1 = n} \mathcal X_{\rm orbit}(E_{\mathbf n}) = \bigcup_{\|\mathbf n\|_1 = n} \{ P E_{\mathbf n} P^T : P \text{ is a permutation matrix} \}, \tag{3.7}$$
from which it follows that $\langle E_n, X\rangle = \|\mathbf n\|_2^2 = \sum_j n_j^2$ for $X \in \mathcal X_{\rm orbit}(E_{\mathbf n})$, a function of the unknown $\{n_j\}$.

The optimal value of the parameter $\lambda$, assuming the model is $\mathrm{PP}(p,q)$, is given in (2.5) as a function of $p$ and $q$. However, one can also think of $\lambda$ as a general regularization parameter controlling the sparseness of $X$, noting that $\langle E_n, X\rangle = \|X\|_1$ since $X \ge 0$. It is well known that the $\ell_1$ norm is a good convex surrogate for a cardinality constraint when enforcing sparseness, which leads to a link with sparse PCA, discussed in Section 3.3.

Remark 3.1. Both [18] and [14] consider the effect of outliers on their SDPs. Cai & Li [14] derive the SDP for the model we described, but they modify it by penalizing the trace, which is justified by their theory for a fairly general model of outliers. Chen & Xu [18] start with a generalized version of $\mathrm{PP}^{\rm bal}(p,q)$, which allows for a subset of nodes that belong to no community, and relax that model. Our relaxation SDP-1 can also work for this generalized model if we replace $X 1_n = (n/K)1_n$ with the inequality version $X 1_n \le (n/K)1_n$. This has an advantage over Chen & Xu's approach, since one does not need to know the number of outliers a priori.

3.3 Connection with nonnegative sparse PCA
Representation (3.7) suggests another natural direction in which to restrict the parameter space. Note that $\|\mathbf n\|_\infty = \max_j n_j \in [n/K, n]$, as a consequence of $\|\mathbf n\|_1 = n$. The closer $\|\mathbf n\|_\infty$ is to $n/K$, the more balanced the communities are. This suggests the following class,
$$\mathcal X^\gamma_{\rm free} := \bigcup \big\{ \mathcal X_{\rm orbit}(E_{\mathbf n}) : \|\mathbf n\|_1 = n,\; \|\mathbf n\|_\infty \le \gamma (n/K) \big\}, \tag{3.8}$$
where $\gamma \in [1, K]$ measures the deviation from completely balanced communities.
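The orbit description (3.7) and the balance parameter in (3.8) are easy to explore numerically. The hypothetical sketch below builds $P E_{\mathbf n} P^T$ for a block-size vector $\mathbf n$ and a random permutation. It certifies positive semidefiniteness through the factorization $X = WW^T$, with $W$ a permuted membership matrix, rather than by computing eigenvalues. It then checks four things: that $\langle E_n, X\rangle = \|\mathbf n\|_2^2$, that the row-sum constraint of SDP-1 holds only for equal blocks, that $\|\mathbf n\|_2^2 \ge n^2/K$ (Cauchy–Schwarz, with equality only in the balanced case of (3.4)), and that $\gamma = K\|\mathbf n\|_\infty/n$ lies in $[1, K]$.

```python
import random


def orbit_point(sizes, perm):
    """P E_n P^T for block sizes `sizes`: (P E_n P^T)_ij = (E_n)_{perm[i], perm[j]}."""
    block = [k for k, s in enumerate(sizes) for _ in range(s)]  # block of each node in E_n
    lab = [block[perm[i]] for i in range(len(block))]
    X = [[1 if lab[i] == lab[j] else 0 for j in range(len(lab))] for i in range(len(lab))]
    W = [[1 if lab[i] == k else 0 for k in range(len(sizes))] for i in range(len(lab))]
    return X, W


def describe(sizes, seed=0):
    n, K = sum(sizes), len(sizes)
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    X, W = orbit_point(sizes, perm)
    gram = [[sum(a * b for a, b in zip(W[i], W[j])) for j in range(n)] for i in range(n)]
    return {
        "psd_certificate": gram == X,                      # X = W W^T, hence X is PSD
        "diag_ones": all(X[i][i] == 1 for i in range(n)),
        "mass": sum(map(sum, X)),                          # <E_n, X>
        "norm2_sq": sum(s * s for s in sizes),             # ||n||_2^2
        "sdp1_rows": all(sum(row) * K == n for row in X),  # X 1_n = (n/K) 1_n
        "gamma": max(sizes) * K / n,                       # balance factor in (3.8)
    }


for sizes in ([4, 4, 4], [6, 3, 3], [10, 1, 1]):
    info = describe(sizes)
    n, K = sum(sizes), len(sizes)
    assert info["psd_certificate"] and info["diag_ones"]
    assert info["mass"] == info["norm2_sq"] >= n * n / K
    assert 1 <= info["gamma"] <= K
    print(sizes, info)
```

Only the balanced vector satisfies the SDP-1 row-sum constraint. The unbalanced ones still satisfy every constraint of SDP-3, which is why SDP-3 keeps the penalty $\lambda\langle E_n, X\rangle$ in its objective.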
For $X \in \mathcal X^\gamma_{\rm free}$, note that $|||X|||_2 = |||E_{\mathbf n}|||_2 = \max_j |||E_{n_j}|||_2 = \|\mathbf n\|_\infty \le \gamma(n/K)$. As before, we have $\operatorname{tr}(X) = n$, $\|X\|_1 = \langle E_n, X\rangle$, and $X \in \mathcal N^n_+ := \{X : X \succeq 0,\; X \ge 0\}$, the doubly nonnegative cone. Letting $\widetilde X = (K/n) X$, we have
$$\operatorname*{argmax}_{\widetilde X}\; \langle A, \widetilde X\rangle - \lambda\|\widetilde X\|_1 \quad\text{subject to}\quad |||\widetilde X|||_2 \le \gamma,\;\; \operatorname{tr}(\widetilde X) = K,\;\; \widetilde X \succeq 0,\;\; \widetilde X \ge 0. \tag{3.9}$$
Apart from the nonnegativity constraint $\widetilde X \ge 0$ (which can be removed to obtain a further relaxation), this is a generalization of the SDP relaxation for sparse PCA. Specifically, $\gamma = 1$ corresponds to the now well-known relaxation for recovering a sparse $K$-dimensional leading eigenspace of $A$. The corresponding solution $\widetilde X$ can be considered a generalized projection onto this subspace; see for example [45, 19], and note that the conditions $\widetilde X \succeq 0$, $|||\widetilde X|||_2 \le 1$ are equivalent to $0 \preceq \widetilde X \preceq I$. We will not pursue this direction here, but it opens up possibilities for leveraging sparse PCA results in network models.

3.4 Connection with adjacency-based spectral clustering
The first step in spectral clustering based on the adjacency matrix is the truncation of $A$ to its $K$ largest eigenvalues, which we call eigenvalue truncation (EVT). The associated matrix $\widetilde X$, namely the orthogonal projection onto the span of the $K$ leading eigenvectors of $A$, is a solution of the SDP that maximizes $\langle A, \widetilde X\rangle$ subject to $\widetilde X \succeq 0$, $\operatorname{tr}(\widetilde X) = K$, $|||\widetilde X|||_2 \le 1$. We can consider $X := (n/K)\widetilde X$ as the EVT estimate of the cluster matrix. The resulting SDP appears in Table 1, and it will be our surrogate when comparing the other SDPs to this particular version of spectral clustering. We should note that the more common form of spectral clustering, based on truncation to the $K$ largest eigenvalues in absolute value, is equivalent to applying EVT to $|A| = \sqrt{A^2}$. The SDP formulation of EVT can be considered a relaxation of the MLE in $\mathrm{PP}^{\rm bal}$, similar to the other SDPs we have considered.
It is enough to note that $|||X|||_2 = |||X_0|||_2 = n/K$ for any $X \in \mathcal X_{\rm orbit}(X_0)$. Also note that SDP-1 is a strictly tighter relaxation than EVT. To see this, take any $X$ that is feasible for SDP-1, and note that $X 1_n = (n/K)1_n$ means that $1_n$ is an eigenvector of $X$ associated with the eigenvalue $n/K$. Since $X \ge 0$ and this eigenvector is positive, the Perron–Frobenius theorem implies that $n/K$ is the spectral radius of $X$, so $|||X|||_2 \le n/K$. Together with $\operatorname{tr}(X) = n$, which follows from $\operatorname{diag}(X) = 1_n$, this shows that $X$ is feasible for EVT.

4 Strong consistency results
In this section, we provide consistency results for SDP-1 and for a variant of SDP-2, which we will call SDP-2′. This version is obtained from SDP-2 by replacing $\operatorname{tr}(X) = n$ with $\operatorname{diag}(X) = 1_n$ and removing the now redundant condition $X \le 1$. This modification allows us to unify the treatment of the two SDPs. For example, the optimality conditions in Section 4.2.2 are derived for a general blueprint (4.9), which includes both SDP-1 and SDP-2′ as special cases.

The consistency results will go beyond the $\mathrm{PP}^{\rm bal}(p,q)$ model originally used in deriving them. Consider a general balanced block model, denoted $\mathrm{BM}^{\rm bal}(\Psi) = \mathrm{BM}^{\rm bal}_m(\Psi)$, with block size $m$ and probability matrix $\Psi \in [0,1]^{K\times K}$. Note the relationship $n = mK$ between the number of nodes $n$, the block size $m$, and the number of blocks $K$. For notational consistency, we denote the diagonal and off-diagonal entries of $\Psi$ differently,
$$p_k := \Psi_{kk}, \qquad q_{k\ell} := \Psi_{k\ell}, \quad k \ne \ell. \tag{4.1}$$
The balanced planted partition model $\mathrm{PP}^{\rm bal}(p,q) = \mathrm{PP}^{\rm bal}_{m,K}(p,q)$ is the special case of $\mathrm{BM}^{\rm bal}_m(\Psi)$ in which $p_k = p$ and $q_{k\ell} = q$ for all $k \ne \ell$.

We start by defining two notions of assortativity that will be key in our results. Let
$$q^*_k := \max_{r = k,\; s \ne k} q_{rs} = \max_{r \ne k,\; s = k} q_{rs}. \tag{4.2}$$
Definition 4.1 (Strong and weak assortativity). Consider the balanced block model $\mathrm{BM}^{\rm bal}_m(\Psi)$ determined by (4.1).
• The model is strongly assortative (SA) if $\min_k p_k > \max_k q^*_k$.
• The model is weakly assortative (WA) if $p_k > q^*_k$ for all $k$.
An alternative way to state strong assortativity is $\min_k p_k > \max_{(k,\ell):\, k \ne \ell} q_{k\ell}$. Strong assortativity implies weak assortativity. See (8.1) for an example where weak assortativity holds but strong assortativity does not. These definitions apply to general block models, since they are stated only in terms of the edge probability matrix $\Psi$. We also define a partial order among balanced block models, which reflects the hardness of recovering the underlying cluster matrix $X$.

Definition 4.2 (Strong assortativity (SA) ordering). The collection $\{\mathrm{BM}^{\rm bal}_m(\Psi) : \Psi \in [0,1]^{K\times K} \cap \mathbb S^K\}$ is partially ordered by
$$\mathrm{BM}^{\rm bal}_m(\widetilde\Psi) \ge \mathrm{BM}^{\rm bal}_m(\Psi) \iff \widetilde p_k \ge p_k \text{ and } \widetilde q_{k\ell} \le q_{k\ell} \quad \text{for all } k \text{ and all } \ell \ne k. \tag{4.3}$$
This ordering, or the one it induces on matrices in $[0,1]^{K\times K} \cap \mathbb S^K$, is referred to as the SA-ordering. Intuitively, for assortative models, $\mathrm{BM}^{\rm bal}_m(\widetilde\Psi) \ge \mathrm{BM}^{\rm bal}_m(\Psi)$ implies that cluster recovery is easier in $\mathrm{BM}^{\rm bal}_m(\widetilde\Psi)$ than in $\mathrm{BM}^{\rm bal}_m(\Psi)$. This will be made precise in Corollary 4.2 in Section 4.2.1. For example, consider a strongly assortative model $\mathrm{BM}^{\rm bal}(\Psi)$ with
$$p_- := \min_k p_k > \max_{(k,\ell):\, k \ne \ell} q_{k\ell} =: q_+. \tag{4.4}$$
Then it is easy to see that $\mathrm{BM}^{\rm bal}(\Psi) \ge \mathrm{PP}^{\rm bal}(p_-, q_+)$, which roughly means that fitting $\mathrm{BM}^{\rm bal}(\Psi)$ is not harder than fitting $\mathrm{PP}^{\rm bal}(p_-, q_+)$.

In order to study consistency, we always condition on the true cluster matrix, which is taken to be $X_0 := I_K \otimes E_m$ without loss of generality. Let $S_0 := \operatorname{supp}(X_0)$ be the index set of the nonzero elements of $X_0$. We write $\mathrm{SDP}_{\rm sol}(A)$ for the solution set of the SDP with input $A$, where SDP is any of SDP-1, SDP-2 or SDP-3, as will be clear from the context. In this notation, $\mathrm{SDP}_{\rm sol}(A) = \{X_0\}$ means that $X_0$ is the unique solution of the SDP, in which case we say that the SDP is strongly consistent for cluster matrices.

Remark 4.1.
Our notion of consistency here is stronger than what is commonly called strong consistency in the literature [50, 10, 7, 37]. For an algorithm that outputs a set of community labels, strong consistency usually means exact recovery of the labels, up to a permutation of communities, with high probability. Viewing SDPs as algorithms that output a cluster matrix $\widehat X$, by strong consistency we mean here the exact recovery of $X_0$, which immediately implies exact label recovery. We note, however, that in some regimes one can recover the labels exactly even when the output $\widehat X$ of an SDP is not exact. For example, one can run a community detection algorithm, say spectral clustering, on $\widehat X$; see Algorithm 1 in Section 7. If, instead, the labels are inferred directly from $\widehat X$ (when $\widehat X$ corresponds to a graph with $K$ disjoint connected components, output the labels implied by the components; otherwise, output random labels), then our notion of strong consistency matches the standard one in the literature.

A key ingredient in our results will be the following matrix concentration inequality, which has been noted recently by many authors; see, for example, [31, 18, 44] and the references therein. Results of this type are often based on the refined discretization argument of [22].

Proposition 4.1. Let $A = (A_{ij}) \in \{0,1\}^{n\times n}$ be a symmetric binary matrix with independent lower triangle and zero diagonal. There are universal positive constants $(C, C', c, r)$ such that if $\max_{ij}\operatorname{Var}(A_{ij}) \le \sigma^2$ and $n\sigma^2 \ge C'\log n$, then with probability at least $1 - c\,n^{-r}$,
$$|||A - \mathbb E A||| \le C\sigma\sqrt n.$$
In what follows, $(C, C', c, r)$ will always refer to the constants in this proposition. Our first result establishes consistency of SDP-2′ for the balanced planted partition models.
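Returning briefly to Proposition 4.1: the proposition leaves the universal constant $C$ unspecified, but the $\sigma\sqrt n$ scale is easy to see in simulation. The hypothetical sketch below centers a dense single-block (Erdős–Rényi-type) adjacency matrix with zero diagonal and estimates $|||A - \mathbb E A|||$ by power iteration. Power iteration is a swapped-in numerical method, chosen only to keep the example dependency-free. For a symmetric matrix $W$ and a normalized iterate $v$, the estimate $\|Wv\|$ is a lower bound on $|||W|||$ that does not decrease from one iteration to the next. Ratios close to 2 are what classical Wigner-matrix theory predicts in this dense regime. The sketch illustrates scale only and does not estimate the constants of the proposition.

```python
import math
import random


def centered_adjacency(n, p, rng):
    """A - E A for a one-block model: off-diagonal entries Bernoulli(p) - p, zero diagonal."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = (1.0 if rng.random() < p else 0.0) - p
    return W


def spectral_norm_lower(W, iters=100, seed=0):
    """Power iteration on symmetric W: returns ||W v|| for the last normalized iterate v,
    a lower bound on |||W||| that is nondecreasing in the number of iterations."""
    rng = random.Random(seed)
    n = len(W)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    est = 0.0
    for _ in range(iters):
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        w = [sum(Wi[j] * v[j] for j in range(n)) for Wi in W]
        est = math.sqrt(sum(x * x for x in w))
        v = w
    return est


rng = random.Random(42)
p = 0.3
sigma = math.sqrt(p * (1 - p))  # max_ij Var(A_ij) = p (1 - p) here
ratios = []
for n in (50, 100):
    W = centered_adjacency(n, p, rng)
    ratios.append(spectral_norm_lower(W) / (sigma * math.sqrt(n)))
print([round(r, 3) for r in ratios])
```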
We will work with two rescaled versions of $p$, namely
$$\bar p := pm = p\,\frac{n}{K}, \qquad \widetilde p := \frac{\bar p}{\log n}, \tag{4.5}$$
and similarly define $\bar q$ and $\widetilde q$ from $q$.

Theorem 4.1 (Consistency of SDP-2′). Let $A$ be drawn from $\mathrm{PP}^{\rm bal}_{m,K}(p,q)$. For any $c_1, c_2 > 0$, let $C_1 := C' \vee \frac{4}{9}(c_1 + 1)$ and $C_2 := C + \big(\sqrt{4(c_1+1)} \vee 3\sqrt{4(c_2+1)}\big)$. Assume $\widetilde p \ge C_1$. Then, if
$$\widetilde p - \widetilde q > C_2\big(\sqrt{\widetilde p} + \sqrt{\widetilde q K}\big), \tag{4.6}$$
SDP-2′ is strongly consistent with probability at least $1 - c(Km^{-r} + n^{-r}) - n^{-c_1} - 2m^{-1}n^{-c_2}$.

As a consequence, we get consistency for strongly assortative block models. More precisely, Theorem 4.1 combined with Corollary 4.2 in Section 4.2.1 gives the following.

Corollary 4.1 (Consistency of SDP-2′ for the strongly assortative case). Let $A$ be drawn from a strongly assortative $\mathrm{BM}^{\rm bal}_m(\Psi)$. Then the conclusion of Theorem 4.1 holds with $(p,q)$ replaced by $(p_-, q_+)$ as defined in (4.4).

Note that Theorem 4.1 and its corollary automatically apply to SDP-1, because SDP-1 is a tighter relaxation of the MLE than SDP-2′. However, SDP-1 succeeds for the much larger class of weakly assortative block models, as reflected in our main result, Theorem 4.2 below. Recall the notation $q^*_k$ defined in (4.2) and write $q^*_{\max} = \max_k q^*_k = \max_{k\ne\ell} q_{k\ell}$. The scaled versions $\widetilde q^*_k, \bar q^*_k$ and $\widetilde q^*_{\max}, \bar q^*_{\max}$ are defined from $q^*_k$ and $q^*_{\max}$ as in (4.5).

Theorem 4.2 (Consistency of SDP-1). Let $A$ be drawn from a weakly assortative $\mathrm{BM}^{\rm bal}_m(\Psi)$. For any $c_1, c_2 > 0$, let $C_1 := C' \vee \frac{4}{9}(c_1 + 1)$ and $C_2 := \big(\sqrt{4(c_1+1)} + C\big) \vee \big(6\sqrt{2(c_2+1)}\big)$. Assume $\min_k \widetilde p_k \ge C_1$. Then, if
$$\min_k \Big[ (\widetilde p_k - \widetilde q^*_k) - C_2\big(\sqrt{\widetilde p_k} + \sqrt{\widetilde q^*_k}\big) \Big] > C\sqrt{\frac{\widetilde q^*_{\max} K}{\log n}}, \tag{4.7}$$
SDP-1 is strongly consistent with probability at least $1 - c(Km^{-r} + n^{-r}) - n^{-c_1} - 2m^{-1}n^{-c_2}$.
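The hypotheses of Theorems 4.1 and 4.2 can be inspected for a concrete $\Psi$. The hypothetical sketch below tests strong and weak assortativity (Definition 4.1) via $q^*_k$ from (4.2), and evaluates both sides of (4.7) under the scaling (4.5). The constants $C$ and $C_2$ are caller-supplied placeholders, since the theorem only asserts that suitable constants exist. The value 1.0 used below is purely illustrative, and so is the matrix $\Psi$ (it is not the example (8.1)). The side condition $\min_k \widetilde p_k \ge C_1$ is not checked.

```python
import math


def q_star(Psi, k):
    """q*_k from (4.2): the largest off-diagonal entry in row k of the symmetric Psi."""
    return max(Psi[k][s] for s in range(len(Psi)) if s != k)


def weakly_assortative(Psi):
    """Definition 4.1 (WA): p_k > q*_k for every k."""
    return all(Psi[k][k] > q_star(Psi, k) for k in range(len(Psi)))


def strongly_assortative(Psi):
    """Definition 4.1 (SA): min_k p_k > max_k q*_k."""
    K = len(Psi)
    return min(Psi[k][k] for k in range(K)) > max(q_star(Psi, k) for k in range(K))


def condition_47(Psi, m, C, C2):
    """Return (lhs, rhs) of (4.7) for BM^bal_m(Psi) with n = m K and the scaling (4.5).
    C and C2 are placeholders supplied by the caller."""
    K = len(Psi)
    log_n = math.log(m * K)
    p_t = [Psi[k][k] * m / log_n for k in range(K)]
    q_t = [q_star(Psi, k) * m / log_n for k in range(K)]
    lhs = min((p_t[k] - q_t[k]) - C2 * (math.sqrt(p_t[k]) + math.sqrt(q_t[k]))
              for k in range(K))
    rhs = C * math.sqrt(max(q_t) * K / log_n)
    return lhs, rhs


# Hypothetical weakly (but not strongly) assortative model.
Psi = [[0.5, 0.3, 0.05],
       [0.3, 0.6, 0.05],
       [0.05, 0.05, 0.2]]
print(weakly_assortative(Psi), strongly_assortative(Psi))
for m in (10, 5000):
    lhs, rhs = condition_47(Psi, m, C=1.0, C2=1.0)  # illustrative constants only
    print(m, round(lhs, 2), round(rhs, 2), lhs > rhs)
```

With these placeholder constants, (4.7) fails for blocks of size 10 and holds for blocks of size 5000. This mirrors the remark that follows the theorem: the condition holds for large $n$ when $K$ and $\Psi$ are fixed.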
Note that for any weakly assortative $\mathrm{BM}^{\rm bal}_m(\Psi)$ with fixed $K$ and constant entries of $\Psi$, condition (4.7) holds for large $n$, and hence SDP-1 is strongly consistent. We show in Section 5 that SDP-2′ fails in general outside the class of strongly assortative block models.

Remark 4.2. Our result for SDP-2′ can be slightly strengthened by stating (4.6) in the form of (4.7), with $\widetilde p_k \equiv \widetilde p$ and $\widetilde q_{k\ell} \equiv \widetilde q$. This gives a better threshold in the case where $\widetilde q \to \infty$ but $\widetilde q/\log n \to 0$.

Remark 4.3. One can define strong and weak disassortativity by replacing $p_k$ and $q_{k\ell}$ with $-p_k$ and $-q_{k\ell}$, respectively, in Definition 4.1. The results then hold in the disassortative case if one applies the SDPs to $-A$.

Remark 4.4. Another way to express the conditions of Theorem 4.1 is in terms of the alternative parametrization $(d, \beta)$, where $d := \bar p + (K-1)\bar q$ is the expected node degree and $\beta := q/p = \bar q/\bar p$ is the out-in ratio. A slight weakening of condition (4.6), using $(a+b)^2 \le 2(a^2 + b^2)$, gives
$$(\bar p - \bar q)^2 \gtrsim (\bar p + \bar q K)\log n \iff d \gtrsim \Big(\frac{1 + K\beta}{1 - \beta}\Big)^2 \log n, \tag{4.8}$$
where we have used $d \asymp \bar p + K\bar q$. We also need $\bar p \gtrsim \log n$, which translates to $d \gtrsim (1 + K\beta)\log n$ and is implied by (4.8). In particular, for fixed $\beta$, it is enough to have $d = \Omega(K^2\log n)$ for SDP-2′ (and hence SDP-1) to be strongly consistent.

The proof of Theorem 4.2 appears in Section 4.3, with some of the more technical details deferred to the appendices. The proof of Theorem 4.1 is similar and appears in Appendix D.1.

4.1 Comparison with other consistency results
Rigorous results about the phase transition in the so-called reconstruction problem for the 2-block balanced PP model have appeared in [35, 36, 33], after being originally conjectured in [20]. The reconstruction problem asks for a labeling positively correlated with the truth, and these results concern the sparse regime where $d = O(1)$.
For $K = 2$, the problem of exact recovery in PP has recently been studied in [1, 37], where the exact recovery threshold is obtained when $d = \Omega(\log n)$, the minimal degree growth required for exact recovery. [37] also discusses exact thresholds for weak consistency, i.e., the fraction of misclassified labels going to zero. [1] also analyzed the MAX CUT SDP, showing a consistency threshold within a constant factor of the optimal one. Since the earlier draft of our manuscript, more refined analyses of SDPs for balanced PP have appeared in [25, 26], as well as [2], which obtains the exact threshold for a general SBM by a two-stage approach with no SDP involved. In [25], the argument in [1] is refined to show that the MAX CUT SDP achieves the threshold of exact recovery with the optimal constant, for the case $K = 2$. In [26], the analysis is extended to general $K$, for an SDP which, interestingly, is equivalent to what we have called SDP-1, showing that it achieves the optimal exact recovery threshold. This threshold is equivalent, up to constants, to that obtained in [18], and hence to (4.8), as will be discussed below. The analysis in [26] also provides the exact constant and an extension to the unbalanced case.

For the PP model with general $K$, [18] provides sufficient conditions for strong consistency of their nuclear norm relaxation of the MLE. These conditions automatically apply to SDP-2′ and SDP-1, since these are tighter relaxations. More precisely, their model, in the zero-outliers case, coincides with $\mathrm{PP}^{\mathrm{bal}}(p,q)$, and their sufficient conditions translate to $(p-q)^2 (n/K)^2 \gtrsim p(n/K)\log n + qn$. A slightly weaker version, obtained by replacing $q$ with $q\log n$, reads $(\bar p - \bar q)^2 \gtrsim (\bar p + \bar q K)\log n$, which is the condition we obtained in (4.8) as a consequence of Theorem 4.1. The stronger version also follows from our proof; see Remark 4.2.
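The equivalence in (4.8) holds up to constants. As a small numerical check (ours, not from the paper), the sketch below evaluates both sides of (4.8), each divided by its own right-hand side, for $\bar q = \beta\bar p$ and $d = \bar p + (K-1)\bar q$. Their ratio works out to $(1 + (K-1)\beta)/(1 + K\beta)$, which lies in $[K/(K+1), 1]$ for $\beta \in [0, 1)$, so the two forms differ by at most a bounded factor. The function names are illustrative.

```python
import math


def snr_pbar_form(pbar, qbar, K, n):
    """(pbar - qbar)^2 / ((pbar + K qbar) log n): left form of (4.8), normalized."""
    return (pbar - qbar) ** 2 / ((pbar + K * qbar) * math.log(n))


def snr_degree_form(d, beta, K, n):
    """d / (((1 + K beta) / (1 - beta))^2 log n): right form of (4.8), normalized."""
    return d / (((1 + K * beta) / (1 - beta)) ** 2 * math.log(n))


def form_ratio(pbar, beta, K, n):
    """Ratio of the degree form to the pbar form when qbar = beta * pbar."""
    qbar = beta * pbar
    d = pbar + (K - 1) * qbar   # expected node degree
    return snr_degree_form(d, beta, K, n) / snr_pbar_form(pbar, qbar, K, n)
```

The $\log n$ factors cancel in the ratio, so only $K$ and $\beta$ matter; as $\beta$ approaches 1 the ratio decreases towards $K/(K+1)$.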
Interestingly, exactly the same condition (4.8) is established in [14] for SDP-3, when specialized to $\mathrm{PP}^{\mathrm{bal}}(p,q)$, the case with zero outliers. In other words, results of the form predicted by Theorem 4.1 already exist for SDP relaxations of the block model, albeit obtained with different proof techniques. On the other hand, we are not aware of any result like Theorem 4.2, which guarantees the success of SDP-1 for weakly assortative block models.

A somewhat different condition, amounting to $(\bar p - \bar q)^2 \gtrsim mK^2\bar p$ and $np = K\bar p \ge \log n$, is implied by the results of [31] for spectral clustering based on the adjacency matrix, which we have called eigenvalue truncation (EVT). We note that, among other things, its dependence on $K$ is worse than in the SDP results. This is corroborated empirically in Section 8, which shows that the SDPs outperform EVT for larger values of $K$.

We should point out that there is a somewhat parallel line of work on relaxations for clustering problems. For example, a variant of SDP-1 (with $\mathrm{diag}(X) = 1_n$ replaced with $\mathrm{tr}(X) = n$) has been proposed as a relaxation of the $K$-means or normalized $K$-cut problems [48, 40]. However, theoretical analysis of SDPs in the clustering context has only recently begun; see, for example, [9] for a recent analysis using a probabilistic model of clusters. An earlier line of work reformulates the clustering problem as instances of the planted partition model and analyzes an SDP relaxation for cluster recovery [6, 34]. The planted $K$-disjoint-clique model in [6] and the fully random model of [34] can both be considered special cases of the planted partition model. The analysis in [34] is particularly interesting, since it treats an SDP with triangle-inequality-type constraints and provides approximation bounds relative to the optimal combinatorial solution.
Recently, a very interesting paper [24] analyzed the performance of SDP relaxations in the sparse regime where $d = O(1)$. They showed that as long as the feasible region is contained in the so-called Grothendieck set $\{X \succeq 0, \mathrm{diag}(X) \le 1\}$, the SDPs can achieve arbitrary accuracy with high probability, assuming that $(\bar p - \bar q)^2/(\bar p + \bar q K)$ is sufficiently large. These results are complementary to ours and show that all the SDPs in Table 1 are capable of approximate recovery in the sparse regime.

4.2 Some useful general results

Here we collect some general observations on solutions of SDPs which will be useful in proving Theorems 4.1 and 4.2. Let $S_k$ be the indices of the $k$th community, so that $|S_k| = m$. Let $X_{S_kS_j}$ be the submatrix of $X$ on indices $S_k \times S_j$, and $X_{S_k} := X_{S_kS_k}$. Let $1_{S_k} \in \mathbb{R}^n$ be the indicator vector of $S_k$, equal to one on $S_k$ and zero elsewhere. $E_{S_0} \in \{0,1\}^{n\times n}$ denotes the indicator matrix of $S_0 \subset [n]^2$. Let $e^n_k$, or simply $e_k$, be the $k$th unit vector of $\mathbb{R}^n$. Let $\mathrm{span}\{1_{S_k}\}$ and $\mathrm{span}\{1_{S_k}\}^\perp$ denote the subspace spanned by $\{1_{S_1}, 1_{S_2}, \dots, 1_{S_K}\}$ and its orthogonal complement. Let $d(S_k) \in \mathbb{R}^n$ be the vector of node degrees relative to the subgraph induced by $S_k$, that is, $d(S_k) = A1_{S_k}$. Note that $[d(S_k)]_{S_k} = A_{S_k}1_m \in \mathbb{R}^m$ is the subvector of $d(S_k)$ on indices $S_k$.

4.2.1 SDPs respect SA-ordering

The following lemma formalizes an intuitive fact about how SDPs interact with the SA-ordering of Definition 4.2. The proof is given in Appendix C.

Lemma 4.1. Let $\tilde A \in \mathbb{S}^n$ be obtained from $A$ by setting some elements off $S_0$ to zero and some elements on $S_0$ to one. Then, for either SDP-1 or SDP-2′,
$$\mathrm{SDP}_{\mathrm{sol}}(A) = \{X_0\} \implies \mathrm{SDP}_{\mathrm{sol}}(\tilde A) = \{X_0\}.$$

The lemma generalizes to any optimization problem that maximizes $X \mapsto \langle A, X\rangle$ and has its feasible region included in $\{X : 0 \le X \le 1\}$.
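A minimal NumPy sketch of the notation above and of the elementary inequality behind Lemma 4.1: if $\tilde A$ is obtained as in the lemma from an $A$ with entries in $[0,1]$, then $\langle \tilde A - A, X - X_0\rangle \le 0$ for every $X$ with $0 \le X \le 1$, because on $S_0$ the first factor is nonnegative and the second nonpositive, while off $S_0$ the signs are reversed. This is our illustration of why the lemma is plausible, not the proof in Appendix C. The helper names are hypothetical, and the blocks are assumed to be contiguous, as in $X_0 = I_K \otimes E_m$.

```python
import numpy as np


def cluster_matrix(K, m):
    """True cluster matrix X0 = I_K kron E_m for K contiguous blocks of size m."""
    return np.kron(np.eye(K), np.ones((m, m)))


def block_degrees(A, labels, K):
    """n x K matrix whose k-th column is d(S_k) = A 1_{S_k}."""
    indicators = np.stack([(labels == k).astype(float) for k in range(K)], axis=1)
    return A @ indicators


def lemma_4_1_modification(A, X0, rng):
    """Hypothetical helper: zero a random half of the entries off S0 and set a
    random half of the entries on S0 to one, keeping the result symmetric."""
    U = rng.random(A.shape)
    U = np.triu(U) + np.triu(U, 1).T     # symmetric random mask
    on_S0 = X0 == 1
    A_t = A.copy()
    A_t[~on_S0 & (U < 0.5)] = 0.0
    A_t[on_S0 & (U < 0.5)] = 1.0
    return A_t
```

Summing the columns of `block_degrees` recovers the ordinary degree vector $A1_n$. The eigenvalues of `cluster_matrix(K, m)` are $m$ with multiplicity $K$ and zero otherwise, matching the eigenstructure of $X_0$ used in Section 4.2.3.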
An immediate consequence is the following probabilistic version for SBMs, stated conditionally on the true cluster matrix $X_0$.

Corollary 4.2. Assume $\mathrm{BM}^{\mathrm{bal}}_m(\tilde\Psi) \ge \mathrm{BM}^{\mathrm{bal}}_m(\Psi)$, and let $\tilde A \sim \mathrm{BM}^{\mathrm{bal}}_m(\tilde\Psi)$ and $A \sim \mathrm{BM}^{\mathrm{bal}}_m(\Psi)$. Then, for either SDP-1 or SDP-2′,
$$\mathbb{P}\big(\mathrm{SDP}_{\mathrm{sol}}(\tilde A) = \{X_0\}\big) \ge \mathbb{P}\big(\mathrm{SDP}_{\mathrm{sol}}(A) = \{X_0\}\big).$$

This corollary allows us to transfer consistency results for SDPs regarding a particular SBM to any SBM that dominates it. It also allows us to inflate the off-diagonal entries of $\Psi$ for a general $\mathrm{BM}^{\mathrm{bal}}(\Psi)$ without loss of generality. More precisely, we will assume in the course of the proof that the off-diagonal entries of $\Psi$ satisfy certain lower bounds to ensure concentration. These lower bounds can then be safely discarded at the end by Corollary 4.2.

4.2.2 Optimality conditions

Consider the following general SDP:
$$\max\ \langle A, X\rangle \quad \text{s.t.} \quad \mathrm{diag}(X) = 1_n,\quad \mathcal{L}_2(X) = b_2,\quad X \succeq 0,\quad X \ge 0, \tag{4.9}$$
where $\mathcal{L}_2$ is a linear map from $\mathbb{S}^n$ to $\mathbb{R}^s$ for some integer $s$, and $b_2 \in \mathbb{R}^s$. This is a blueprint for both SDP-1 and SDP-2′. Let $\mathcal{L}_1(X) := \mathrm{diag}(X)$ and $b_1 = 1_n$. Then $\mathcal{L}(X) := (\mathcal{L}_1(X), \mathcal{L}_2(X)) = (b_1, b_2) =: b$ summarizes the linear constraints of the SDP. The dual problem is
$$\min\ \langle \mu, b_2\rangle + \sum_i \nu_i \quad \text{s.t.} \quad \mathcal{L}_2^*(\mu) + \mathrm{diag}^*(\nu) \succeq A + \Gamma,\quad \Gamma \ge 0,$$
where $\mu \in \mathbb{R}^s$, $\nu \in \mathbb{R}^n$ and $\Gamma \in \mathbb{S}^n$, and the minimization is over the triple $(\mu, \nu, \Gamma)$ of dual variables. Here $\mathcal{L}_2^*$ is the adjoint of $\mathcal{L}_2$ and $\mathrm{diag}^*$ is the adjoint of $\mathrm{diag}$. Letting
$$\Lambda := \Lambda(\mu, \nu, \Gamma) := \mathcal{L}_2^*(\mu) + \mathrm{diag}^*(\nu) - A - \Gamma, \tag{4.10}$$
the KKT optimality conditions are
Primal feas.: $X \succeq 0$, $X \ge 0$, $\mathcal{L}(X) = b$,
Dual feas.: $\Lambda \succeq 0$, $\Gamma \ge 0$,
Comp. slackness (a): $\Gamma_{ij}X_{ij} = 0$ for all $i, j$, (CSa)
Comp. slackness (b): $\langle \Lambda, X\rangle = 0$. (CSb)
Another way to state (CSa) is $\Gamma \circ X = 0$, where $\circ$ denotes the Schur (element-wise) product of matrices.
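Why the KKT conditions certify optimality can be seen from a one-line identity: for any primal-feasible $X$ (so that $\mathcal L(X) = b$) and any $(\mu, \nu, \Gamma)$, the duality gap satisfies $\langle \mu, b_2\rangle + \sum_i\nu_i - \langle A, X\rangle = \langle \Lambda, X\rangle + \langle \Gamma, X\rangle$. Under dual feasibility both terms are nonnegative, and the gap is zero exactly when (CSa) and (CSb) hold. The NumPy sketch below checks this identity numerically for the SDP-1 instance of (4.9), with $\mathcal L_2(X) = (\langle X, \Phi_i\rangle)_i$ and $b_2 = 2m1_n$ as in Section 4.3. The function names are our own.

```python
import numpy as np


def phi_matrix(i, n):
    """Phi_i = e_i 1_n^T + 1_n e_i^T from Section 4.3."""
    e = np.zeros(n)
    e[i] = 1.0
    one = np.ones(n)
    return np.outer(e, one) + np.outer(one, e)


def sdp1_lambda(A, mu, nu, Gamma):
    """Lambda(mu, nu, Gamma) of (4.10), using L2*(mu) = sum_i mu_i Phi_i."""
    n = A.shape[0]
    L2_adj = sum(mu[i] * phi_matrix(i, n) for i in range(n))
    return L2_adj + np.diag(nu) - A - Gamma


def duality_gap(A, X, mu, nu, m):
    """Dual objective <mu, b2> + sum(nu) minus primal objective <A, X>, with b2 = 2m 1_n."""
    return 2 * m * mu.sum() + nu.sum() - np.sum(A * X)
```

Here $X_0$ is a convenient feasible point, since $\mathrm{diag}(X_0) = 1_n$ and $X_01_n = m1_n$. The identity holds for arbitrary $(\mu, \nu, \Gamma)$, feasible or not.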
The primal-dual witness approach that we will use in the proofs is based on finding a pair of primal and dual solutions that simultaneously satisfy the KKT conditions. The pair then witnesses strong duality between the primal and dual problems, implying that it is an optimal pair.

4.2.3 Sufficient conditions for exact recovery

We would like to obtain sufficient conditions under which the true cluster matrix $X_0 = I_K \otimes E_m$ is the unique solution of the primal SDP. Complementary slackness (a), or (CSa), implies that we need $\Gamma_{S_k} = 0$ for all $k$, while we are free to choose $\Gamma_{S_kS_j}$ for $j \ne k$, using the submatrix notation. Since both $X_0$ and $\Lambda$ are PSD, (CSb) is equivalent to $\Lambda X_0 = 0$, which is in turn equivalent to $\mathrm{range}(X_0) \subset \ker(\Lambda)$. Note that $X_0$ has $K$ nonzero eigenvalues, all equal to $m$, corresponding to the eigenvectors $\{1_{S_k}\}_{k=1}^K$, where $1_{S_k} \in \mathbb{R}^n$ is the indicator vector of $S_k$. Hence, $\mathrm{range}(X_0) = \mathrm{span}\{1_{S_k}\}$, and (CSb) for $X_0$ is equivalent to $\mathrm{span}\{1_{S_k}\} \subset \ker(\Lambda)$.

The following lemma, proved in Appendix D, gives conditions for $X_0$ to be the unique optimal solution.

Lemma 4.2. Assume that $\Gamma$ is dual feasible (i.e., $\Gamma \ge 0$), and that for some $\mu \in \mathbb{R}^s$ and $\nu \in \mathbb{R}^n$,
(A1) $\ker\big(\Lambda(\mu, \nu, \Gamma)\big) = \mathrm{span}\{1_{S_k}\}$ and $\Lambda(\mu, \nu, \Gamma) \succeq 0$,
(A2) $\Gamma_{S_k} = 0$ for all $k$,
(A3) each $\Gamma_{S_kS_\ell}$, $k \ne \ell$, has at least one nonzero element.
Then $X_0$ is the unique primal optimal solution, and $(\mu, \nu, \Gamma)$ is dual optimal.

Note that condition (A1) is satisfied if, for some $\varepsilon > 0$,
$$\Lambda 1_{S_k} = 0 \quad \forall k, \tag{4.11}$$
$$u^T\Lambda u \ge \varepsilon\|u\|_2^2 \quad \forall u \in \mathrm{span}\{1_{S_k}\}^\perp. \tag{4.12}$$

4.3 Proof of Theorem 4.2: primal-dual witness for SDP-1

Let $\Phi_i = e_i1_n^T + 1_ne_i^T \in \mathbb{S}^n$, where $e_i = e^n_i$ is the $i$th standard basis vector in $\mathbb{R}^n$. We note that $\langle X, \Phi_i\rangle = \mathrm{tr}(X\Phi_i) = 2(X1_n)_i$. Thus, SDP-1 is an instance of (4.9) with $\mathcal{L}_2(X) = \big(\langle X, \Phi_i\rangle\big)_{i=1}^n$ and $b_2 = 2m1_n$.
The corresponding adjoint operator is $\mathcal{L}_2^*(\mu) = \sum_{i=1}^n \mu_i\Phi_i = \mu1_n^T + 1_n\mu^T$. Thus (cf. (4.10)),
$$\Lambda = \Lambda(\mu, \nu, \Gamma) = (\mu1_n^T + 1_n\mu^T) + \mathrm{diag}^*(\nu) - A - \Gamma. \tag{4.13}$$
The following summarizes our primal-dual construction in this case:
$$\nu_{S_k} = [d(S_k)]_{S_k} - \phi_k m 1_m, \qquad \mu_{S_k} := \tfrac{1}{2}\phi_k 1_m, \tag{4.14}$$
$$\begin{aligned} \Gamma_{S_k} &:= 0, \\ \Gamma_{S_kS_\ell} &:= \mu_{S_k}1_m^T + 1_m\mu_{S_\ell}^T + P_{1_m^\perp}A_{S_kS_\ell}P_{1_m^\perp} - A_{S_kS_\ell} \\ &= \tfrac{1}{2}(\phi_k + \phi_\ell)E_m + P_{1_m^\perp}A_{S_kS_\ell}P_{1_m^\perp} - A_{S_kS_\ell}, \qquad k \ne \ell, \end{aligned} \tag{4.15}$$
for some numbers $\{\phi_k\}_{k=1}^K$ to be determined later. Note that $\mu$ is chosen to be constant over blocks, but these constants can vary between blocks. We have the following analogue of Lemma D.2. Recall that $E_{S_0^c}$ is the indicator matrix of $S_0^c$, where $S_0$ is the support of $X_0$.

Lemma 4.3. Let $(\mu, \nu, \Gamma)$ be as defined in (4.14)–(4.15). Then $\Gamma$ verifies (A2), and (4.11) holds. In addition:
(a) $\Gamma$ is dual feasible, i.e., $\Gamma \ge 0$, if for all $i \in S_k$, $j \in S_\ell$, $\ell \ne k$,
$$\tfrac{1}{2}(\phi_k + \phi_\ell)m \ge d_i(S_\ell) + d_j(S_k) - d_{\mathrm{av}}(S_k, S_\ell), \tag{4.16}$$
and it satisfies (A3) if at least one inequality is strict for each pair $k \ne \ell$.
(b) $(\mu, \nu, \Gamma)$ verifies (4.12) if, for $\rho_k := \min_{i \in S_k} d_i(S_k)/m$,
$$\min_k \big[(\rho_k - \phi_k)m - |||\Delta_{S_k}|||\big] > |||E_{S_0^c} \circ \Delta|||. \tag{4.17}$$

This lemma amounts to a set of deterministic conditions for the success of SDP-1. To complete the proof of Theorem 4.2, we develop a probabilistic analogue by choosing $\phi_k$ so that $\phi_k m \approx \bar q^*_k$ and using the key inequality $\bar q_{k\ell} \le \frac{1}{2}(\bar q^*_k + \bar q^*_\ell)$. See Appendix B for details.

5 Failure of SDP-2′ in the absence of strong assortativity

We now show that strong assortativity is a necessary condition for exact recovery by SDP-2′. For this purpose, it is enough to focus on the noiseless case, i.e., when the input to the SDP is the mean matrix of the block model.
If SDP-2′ fails to exactly recover the true clusters from the population mean, there is no hope of recovering them from its noisy version, i.e., the adjacency matrix. The following result is deterministic and non-asymptotic. In particular, it holds without any constraints on the expected degrees (besides those imposed by the assortativity assumptions). We will state it in a slightly more general form than is needed here, including the case of general block sizes. Keeping consistency with earlier notation, we let $E_{S_kS_\ell} \in \{0,1\}^{n\times n}$ be the indicator matrix of the set $S_k \times S_\ell$, and $E_{S_k} := E_{S_kS_k}$. Similarly, $I_{S_k}$ is the $n \times n$ identity matrix with elements outside $S_k \times S_k$ set to zero. Note that $E_{S_kS_\ell}$ is not a submatrix of $E_n$, but a masked version of it.

Proposition 5.1. Let $\mathbb{E}[A]$ be the mean matrix of a weakly assortative block model. Assume that the blocks are indexed by $S_k \subset [n]$, where $|S_k| = n_k$, for $k = 1, \dots, K$. For some $I \subset [K] = \{1, \dots, K\}$, to be determined, consider a solution of the form
$$X = \sum_{k\in I}\sum_{\ell\in I} \alpha_{k\ell}E_{S_kS_\ell} + \sum_{k\notin I}\big(\beta_kE_{S_k} + (1-\beta_k)I_{S_k}\big), \qquad \alpha_{k\ell} = \alpha_{\ell k}, \tag{5.1}$$
with $\alpha_{kk} = 1$ for $k \in I$ and $\beta_k \in [0,1)$ for $k \notin I$. Then the following holds:
(a) Assume that $\arg\max_{k\ne\ell} q_{k\ell} = \{(k_0, \ell_0)\}$ and let $I := \{k : p_k \ge q_{k_0\ell_0}\}$. Furthermore, let $m := \min_k n_k$, $\xi_k := n_k/m$ and
$$\alpha^*_{k_0\ell_0} := \frac{1}{2\xi_{k_0}\xi_{\ell_0}}\Big[\Big(1 - \frac{1}{m}\Big)\sum_{k\notin I}\xi_k - \sum_{k\in I}\xi_k(\xi_k - 1)\Big]. \tag{5.2}$$
If $\alpha^*_{k_0\ell_0} \in [0,1]$, then SDP-2′, applied with $A = \mathbb{E}[A]$ and $m = \min_k n_k$, has (5.1) as a solution, with $\alpha_{k\ell} = \alpha^*_{k_0\ell_0}1\{\{k,\ell\} = \{k_0,\ell_0\}\}$ for $k \ne \ell$ and $\beta_k = 0$ for all $k \notin I$.
(b) Assume that the given block model is balanced and let $I^c := [K]\setminus I = \{k : p_k < q_{k_0\ell_0}\}$, where $I$ and $(k_0, \ell_0)$ are defined in part (a). If $|I^c| \le 2$, then the conclusion of part (a) holds with $\alpha^*_{k_0\ell_0} = \frac{1}{2}(1 - 1/m)|I^c|$.
(c) Assume that the given block model is balanced and weakly but not strongly assortative. Let $\mathrm{SDP}_{\mathrm{sol}}(\cdot)$ be the solution set of SDP-2′. Then $\mathrm{SDP}_{\mathrm{sol}}(\mathbb{E}[A]) \ne \{X_0\}$.

We note that part (c) establishes the failure of SDP-2′ once strong assortativity is violated. We prove the proposition in Appendix H. The conclusions of both parts (a) and (b) hold even in the strongly assortative case. However, in that case the set $I^c$ is empty and the conditions in part (a) cannot be met, whereas part (b) gives the expected result $X = X_0$. The interesting case occurs when strong assortativity is violated, which gives a nonempty set $I^c$. Since $\beta_k = 0$ for $k \in I^c$, this shows that SDP-2′ fails to recover those blocks. The condition $|I^c| \le 2$ in the balanced case might seem restrictive, but it is enough for our purpose of establishing part (c). In general, i.e., with no assumption on $|I^c|$, SDP-2′ still misses the blocks violating strong assortativity, though the nonzero-block portion of $X$, namely $(\alpha_{k\ell})_{k,\ell\in I}$, takes a more complicated form. In parts (a) and (b), at most one off-diagonal element of $(\alpha_{k\ell})$ is nonzero, whereas in general several such elements will be nonzero. These ideas are illustrated in Figure 1, with a more detailed discussion in Appendix 6.1, in particular with an application of part (a) in the unbalanced case with $|I^c| > 2$.

6 Extensions to the unbalanced case

Let us discuss how our results can be extended to the unbalanced case. Recall that, in general, $n = (n_1, \dots, n_K)$ denotes the vector of block sizes. One could argue that as long as $(\min_k n_k)/n \ge C$ for some constant $C > 0$, i.e., $\{n_k/n\}_k$ is bounded away from zero, the problem of block model recovery is not inherently more difficult than that of the balanced case.
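Proposition 5.1(a) already allows unequal block sizes. As an arithmetic sanity check on (5.2), the NumPy sketch below builds the candidate solution (5.1) with $\beta_k = 0$ off $I$ and a single off-diagonal coefficient $\alpha^*_{k_0\ell_0}$, and verifies that its entries sum to $nm$ with $m = \min_k n_k$. This assumes that the constraint of SDP-2′ that pins down $\alpha^*$ is the total-sum constraint $\langle E_n, X\rangle = nm$. SDP-2′ is defined earlier in the paper, so this reading is ours; we chose it because it reproduces both (5.2) and the balanced value in part (b). The sketch does not check positive semidefiniteness or optimality, and block indices are 0-based.

```python
import numpy as np


def alpha_star(sizes, I, k0, l0):
    """alpha*_{k0 l0} from (5.2); I is the set of (0-based) blocks with p_k >= q_{k0 l0}."""
    m = min(sizes)
    xi = np.asarray(sizes, dtype=float) / m
    in_I = np.isin(np.arange(len(sizes)), list(I))
    bracket = (1 - 1 / m) * xi[~in_I].sum() - (xi[in_I] * (xi[in_I] - 1)).sum()
    return bracket / (2 * xi[k0] * xi[l0])


def candidate_solution(sizes, I, k0, l0, alpha):
    """X of the form (5.1): all-ones blocks on I, identity blocks off I (beta_k = 0),
    and the single off-diagonal pair (k0, l0) filled with alpha."""
    bounds = np.cumsum([0] + list(sizes))
    blocks = [slice(bounds[k], bounds[k + 1]) for k in range(len(sizes))]
    X = np.zeros((bounds[-1], bounds[-1]))
    for k, s in enumerate(blocks):
        X[s, s] = 1.0 if k in I else np.eye(sizes[k])
    X[blocks[k0], blocks[l0]] = alpha
    X[blocks[l0], blocks[k0]] = alpha
    return X
```

For example, with sizes $(6, 5, 5, 5)$, $I = \{0, 1, 3\}$ and $(k_0, \ell_0) = (0, 1)$, the formula gives $\alpha^* = (0.8 - 0.24)/2.4 \approx 0.233$, and the entries of $X$ then sum to $21 \cdot 5 = 105$.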
To simplify our discussion, we focus on the noiseless case, from which the results can be extended to the aforementioned bounded block-size regime. We will show that in a weakly assortative (W.A.) block model, SDP-1 applied with $m = \min_k n_k$ recovers all the blocks, albeit some imperfectly. We also consider the following mixture of SDP-1 and SDP-3, which we will call SDP-13:
$$\max_X\ \langle A, X\rangle - \mu\langle E_n, X\rangle \quad \text{s.t.} \quad \mathrm{diag}(X) = 1_n,\quad X1_n \ge m1_n,\quad X \succeq 0,\quad X \ge 0, \tag{6.1}$$
and show that, when applied with $m \le \min_k n_k$ and an appropriate choice of $\mu$, it too recovers the blocks, with improvements over SDP-1. Without loss of generality, let us sort the blocks so that $p_1 \ge p_2 \ge \dots \ge p_K$.

[Figure 1 panels, left to right: E[A], Ideal, SDP-2′, SDP-1, SDP-3, SDP-13.]
Figure 1: Illustration of Propositions 6.1 and 5.1. This block model is weakly but not strongly assortative and has unequal block sizes n = (10, 10, 5, 20, 10, 10). The leftmost column is the population mean, and the remaining columns are the results of various SDPs, with m = min_k n_k and equal regularization parameters in the case of SDP-3 and SDP-13 (λ = μ). The ideal cluster matrix is also shown for comparison. See Appendix K for more details.

Proposition 6.1. Let $\mathbb{E}[A]$ be the mean matrix of a weakly assortative block model, with blocks indexed by $S_k \subset [n]$, where $|S_k| = n_k$, for $k = 1, \dots, K$. Consider a solution of the form
$$X = \sum_{k=1}^K \big(\alpha_kE_{S_k} + (1 - \alpha_k)I_{S_k}\big), \qquad \alpha_k \in (0, 1]. \tag{6.2}$$
The following holds:
(a) SDP-1, applied with $A = \mathbb{E}[A]$ and $m \le \min_k n_k$, has (6.2) as a solution with $\alpha_k = (m-1)/(n_k - 1)$ for all $k$.
(b) Consider $I := \{k : n_k > m\}$ and $I_1(k) := \{r \in I : r \le k\}$. Let $J_k := \bigcap_{r=1}^k [q^*_r, p_r]$.
Define $k_0 := \max\{k : J_k \ne \emptyset\}$. Then SDP-13, applied with $A = \mathbb{E}[A]$, $m \le \min_k n_k$ and $\mu \in J_{k_0} \cap [p_{k_0+1}, 1]$ (with $p_{K+1} := 0$), has (6.2) as a solution with
$$\alpha_k = \begin{cases} 1, & k \in I_1(k_0), \\ (m-1)/(n_k-1), & \text{otherwise}. \end{cases}$$

The key difference between the solution presented in Proposition 6.1 and that of Proposition 5.1 is that in the former all $\alpha_k$ are guaranteed to be nonzero, whereas in the latter the coefficients $\alpha_k \equiv \beta_k$ corresponding to blocks violating strong assortativity are zero. Let us call the blocks in (6.2) for which $\alpha_k \in (0,1)$ imperfectly recovered, and those with $\alpha_k = 1$ perfectly recovered. The result of Proposition 6.1 can be summarized as follows: both SDP-1 and SDP-13, with properly set parameters, recover all the blocks at least imperfectly, while SDP-13 has the potential to recover more blocks perfectly. In particular, we always have $k_0 \ge 1$ in part (b), implying that SDP-13 recovers at least one more block perfectly relative to SDP-1. In the special case of a strongly assortative block model, we have $\emptyset \ne [\max_k q^*_k, \min_k p_k] \subset \bigcap_{r=1}^K [q^*_r, p_r]$, hence $k_0 = K$ and SDP-13 recovers all the blocks perfectly. It is also interesting to note that both SDP-1 and SDP-13 recover the smallest blocks (i.e., those in $\{k : n_k = m\}$) perfectly when we set $m = \min_k n_k$ (which is the optimal choice if the minimum is known). These observations are illustrated in Figure 1. The proof of Proposition 6.1 appears in the appendices, along with more details on Figure 1.

7 Application to network histograms

A balanced block model is ideally suited for computing network histograms as defined by [39], which have been proposed as nonparametric estimators of graphons.
They have been shown to do well empirically, and recent results of [29, Section 2.4] suggest rate-optimality of the balanced models for reasonably sparse graphs; see also [23]. A graphon is a bivariate symmetric function $f : [0,1]^2 \to [0,1]$. The corresponding network model can be written as $\mathbb{E}[A_{ij} \mid \xi] = f(\xi_i, \xi_j)$, where $\xi = (\xi_1, \dots, \xi_n) \in [0,1]^n$ are (unobserved) latent node positions. Without loss of generality, $(\xi_i)$ can be assumed to be i.i.d. uniform on $[0,1]$. The goal is to recover (a version of) $f$ given $A$. In general, $f$ is identifiable only up to a measure-preserving transformation $\sigma$ of $[0,1]$ onto itself, since $f^\sigma = f\big(\sigma(\cdot), \sigma(\cdot)\big)$ produces the same network model as $f$. Let $\{I_1, I_2, \dots, I_K\}$ be a partition of $[0,1]$ into equal-sized blocks, i.e., $|I_k| = 1/K$ for $k \in [K]$. We associate to each node a label $z_i$ by letting $z_i := k$ if $\xi_i \in I_k$. With some abuse of notation, we identify $z_i$ with an element $(z_{ik})_k$ of $\{0,1\}^K$ as before, and let $Z = (z_{ik})_{ik}$. Then $M_Z := \mathbb{E}[A \mid Z]$ follows a block model as in (2.1) with $[\Psi]_{kk} = |I_k|^{-1}\int_{I_k} f(\xi, \xi)\,d\xi$ and $[\Psi]_{k\ell} = (|I_k||I_\ell|)^{-1}\int_{I_k}\int_{I_\ell} f(\xi, \xi')\,d\xi\,d\xi'$ for $k \ne \ell$.

Algorithm 1: Graphon estimation by fitting $\mathrm{PP}^{\mathrm{bal}}(p,q)$
Input: estimated cluster matrix $\hat X$ and number of blocks $K$.
Output: graphon estimator $\hat M_{\hat Z}$.
1: Compute the eigendecomposition $\hat X = \hat U\hat\Lambda\hat U^T$ and set $\hat U_K = \hat U(:, 1{:}K)$.
2: Apply $K$-means to the rows of $\hat U_K$ to get a label vector $e \in [K]^n$. Set $\hat Z(i, e(i)) = 1$, and all other entries to 0.
3: Set $\hat\Psi_{rk} = \frac{1}{n^2}\sum_{e_i = r,\, e_j = k} A_{ij}$ for $r \ne k$, and $\hat\Psi_{rr} = \frac{1}{n(n-1)}\sum_{e_i = e_j = r} A_{ij}$.
4: Change $\hat\Psi$ to $Q\hat\Psi Q^T$ so that its diagonal is decreasing, and update $\hat Z$ to $\hat ZQ^T$.
5: Change $\hat Z$ to $P\hat Z$ so that the corresponding labels are in increasing order.
6: Set $\hat M_{\hat Z} = \hat Z\hat\Psi\hat Z^T$.
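The steps of Algorithm 1 can be sketched in NumPy as follows. This is an illustration under our own assumptions, not the authors' code:
- a plain Lloyd's $K$-means with farthest-point initialization stands in for whichever $K$-means routine is used in step 2;
- step 1 keeps the eigenvectors of the $K$ largest eigenvalues;
- in step 3 each block sum is divided by the number of ordered node pairs in that block pair ($n_rn_k$ off the diagonal, $n_r(n_r-1)$ on it), which is our reading of the intended normalization.

The node permutation of step 5 is returned so that $A$ can be reordered the same way.

```python
import numpy as np


def lloyd_kmeans(Y, K, n_iter=100, seed=0):
    """Plain Lloyd's K-means on the rows of Y with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centers = [Y[rng.integers(len(Y))]]
    for _ in range(1, K):
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([Y[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                        for k in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def network_histogram(A, X_hat, K):
    """Sketch of Algorithm 1: returns (M_hat, Psi_hat, perm)."""
    n = A.shape[0]
    # Step 1: eigenvectors of the K largest eigenvalues of X_hat.
    vals, vecs = np.linalg.eigh(X_hat)
    U_K = vecs[:, np.argsort(vals)[::-1][:K]]
    # Step 2: K-means on the rows of U_K, then the one-hot membership matrix Z.
    e = lloyd_kmeans(U_K, K)
    Z = np.zeros((n, K))
    Z[np.arange(n), e] = 1.0
    # Step 3: block densities (normalization by ordered node pairs: our assumption).
    sizes = Z.sum(axis=0)
    pairs = np.outer(sizes, sizes) - np.diag(sizes)
    Psi = (Z.T @ A @ Z) / np.maximum(pairs, 1.0)
    # Step 4: permute blocks so that the diagonal of Psi is decreasing.
    order = np.argsort(-np.diag(Psi), kind="stable")
    Psi = Psi[np.ix_(order, order)]
    Z = Z[:, order]
    # Step 5: permute nodes so that their labels are increasing.
    perm = np.argsort(Z.argmax(axis=1), kind="stable")
    Z = Z[perm]
    # Step 6: block-constant estimate of the mean matrix.
    return Z @ Psi @ Z.T, Psi, perm
```

Given an estimate `X_hat` from any of the SDPs, `A[np.ix_(perm, perm)]` and the returned `M_hat` correspond to the two rows of panels shown for the dolphins network in Section 8.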
Asymptotically, as $n \to \infty$, this block model is very close to being balanced. It provides an approximation of $f$ via the mapping that sends $\Psi$ to a block-constant graphon $\tilde f$, defined as $\tilde f(\xi, \xi') = [\Psi]_{k\ell}$ if $\xi \in I_k$, $\xi' \in I_\ell$. One can show that under regularity assumptions (e.g., smoothness) on $f$, $\tilde f$ approximates $f$ as $K \to \infty$, for example in the quotient norm $\inf_\sigma \|f - \tilde f^\sigma\|_{L^2}$. Alternatively, one can consider the mean matrix $M_f := (f(\xi_i, \xi_j))_{ij} \in [0,1]^{n\times n}$ of the graphon model as an empirical version of $f$, in which case the mean matrix $M_Z$ of the aforementioned block model serves as an approximation to $M_f$, for example in the quotient norm $\inf_P |||M_f - PM_ZP^T|||_F$, where $P$ runs through permutation matrices. This is the approach we take here and, with some abuse of terminology, we call $M_f$ the "graphon".

Graphon estimation via block model approximation requires estimating the mean matrix $M_Z$, which is fairly straightforward once we have a good estimate of the cluster matrix $X$. Algorithm 1 details the procedure based on eigenvalue truncation and $K$-means (that is, spectral clustering), leading to the estimate $\hat M_{\hat Z}$ of $M_f$. We call $\hat M_{\hat Z}$ a network histogram or a graphon estimator, and note that it can be computed from any estimate $\hat X$. However, in practice, SDP-1 has advantages over other ways of estimating $\hat X$ in this context. Likelihood-based estimators have no way of enforcing an equal number of nodes in each block, whereas our empirical results in Section 8 show that SDP-1 has a strong tendency to form equal-sized blocks, more so than SDP-2, making it an ideal choice for histograms. SDP-3 is not well suited for this task since it enforces neither a particular number of blocks nor a particular block size.
It is more flexible due to the tuning parameter $\lambda$, but that flexibility is a disadvantage when the goal is to construct a histogram.

[Figure 2: three panels of NMI relative to chance versus K for SDP-1, SDP-2, SDP-3 and EVT, with n = 120, β = 0.05, and d = 7.0, 5.0 and 3.0, respectively.]
Figure 2: Bias-corrected NMI vs. K in a balanced planted partition model, for various values of the average degree d, with n = 120 and β = 0.05.

[Figure 3 panels: (a) SDP-1, $p_3 = 0.7$; (b) SDP-2, $p_3 = 0.7$; (c) SDP-1, $p_3 = 0.05$; (d) SDP-2, $p_3 = 0.05$.]
Figure 3: Mean estimated cluster matrices $\hat X$ for SDP-1 and SDP-2, for the weakly but not strongly assortative model (8.1) with $p_3 = 0.7$ and $p_3 = 0.05$. SDP-2 fails to recover one block at $p_3 = 0.7$ and two blocks at $p_3 = 0.05$.

8 Numerical Results

In this section we present some experimental results comparing SDP-1 with SDP-2, SDP-3, and EVT, which amounts to spectral clustering on the adjacency matrix $A$. We chose EVT rather than a version of spectral clustering based on the graph Laplacian because the SDPs also all operate on $A$ itself. For SDP-3, in simulations we set the tuning parameter $\lambda$ to the optimal value given in (2.5); a data-driven choice is given in [14].

We first consider the balanced symmetric model $\mathrm{PP}^{\mathrm{bal}}(p,q)$, reparametrized in terms of the average expected degree $d = p\big(\frac{n}{K} - 1\big) + q\frac{n}{K}(K-1)$ and the out-in-ratio $\beta = q/p < 1$. Estimation becomes harder when $d$ decreases (fewer edges) and when $\beta$ increases (communities are less well separated).
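For reproducing this kind of experiment, the reparametrization can be inverted: since $q = \beta p$, we get $p = d / \big((n/K - 1) + \beta (n/K)(K-1)\big)$. The sketch below (an illustration, not the code used for Figure 2) computes $(p, q)$ from $(d, \beta)$ and samples a symmetric adjacency matrix without self-loops from $\mathrm{PP}^{\mathrm{bal}}(p, q)$, assuming $K$ divides $n$. The function names are hypothetical.

```python
import numpy as np


def pp_params(d, beta, n, K):
    """Invert d = p (n/K - 1) + q (n/K)(K - 1) with q = beta p."""
    m = n / K
    p = d / ((m - 1) + beta * m * (K - 1))
    return p, beta * p


def sample_pp(n, K, p, q, rng):
    """Sample A from the balanced planted partition model (K must divide n)."""
    labels = np.repeat(np.arange(K), n // K)
    P = np.where(labels[:, None] == labels[None, :], p, q)
    upper = np.triu(rng.random((n, n)) < P, 1)   # independent edges above the diagonal
    A = (upper | upper.T).astype(float)
    return A, labels
```

For example, $n = 120$, $K = 10$, $d = 5$ and $\beta = 0.05$ give $p \approx 0.305$ and $q \approx 0.015$.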
As $K$ increases, however, estimation becomes harder up to a certain point and then becomes relatively easier in some settings. Figure 2 shows the agreement of the estimated labels with the truth, as measured by the normalized mutual information (NMI), versus the number of communities $K$, averaged over 25 Monte Carlo replications. NMI takes values between 0 and 1, with higher values representing a better match. The labels are estimated from $\hat X$ by Algorithm 1. As expected, the SDPs rank according to the tightness of the relaxation, with SDP-1 dominating the other two, and all SDPs outperforming EVT. In Figure 2, the NMI is bias-adjusted, so that random guessing maps to NMI = 0. Without the adjustment, the NMI of random guessing increases as $K$ approaches $n$, leading to a "dip" in the plots. See Figure 6 in the appendices and the discussion that follows it for more details.

Next, we consider a more general balanced block model $\mathrm{BM}^{\mathrm{bal}}(\Psi)$ with $K = 4$, to investigate the predictions of the theorems of Section 4. We consider the probability matrix
$$\Psi = \begin{pmatrix} .7 & .4 & .05 & .2 \\ .4 & .6 & .05 & .2 \\ .05 & .05 & p_3 & .05 \\ .2 & .2 & .05 & .4 \end{pmatrix} \tag{8.1}$$
and we vary $p_3$ from 0.7 down to 0.05. This model never satisfies the strong assortativity assumption over the range of $p_3$, because of the last row. However, it is at the boundary of strong assortativity if $p_3 > 0.4$, since $\Psi_{44} = \max_{k\ne\ell}\Psi_{k\ell}$ and $\Psi_{jj} > \max_{k\ne\ell}\Psi_{k\ell}$ for $j \ne 4$. Its deviation from strong assortativity increases once $p_3$ falls below 0.4, and again once it crosses below 0.2.

[Figure 4: two panels against $p_3$ from 0.05 to 0.7, showing normalized mutual information (left) and $\|X - X_0\|$ normalized by $\|X_0\|$ (right), for SDP-1 and SDP-2.]
Figure 4: NMI and relative error of $\hat X$ versus $p_3$ for the model with probability matrix (8.1).
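The assortativity pattern of (8.1) can be checked mechanically. Definition 4.2 is not reproduced in this section, so the sketch below assumes the reading used throughout:
- strong assortativity means $\min_k p_k > \max_{k\ne\ell} q_{k\ell}$;
- weak assortativity means $p_k > q^*_k = \max_{\ell\ne k} q_{k\ell}$ for every $k$.

It also lists the blocks with $p_k < \max_{k\ne\ell} q_{k\ell}$, i.e., the set $I^c$ of Proposition 5.1, using 0-based indices.

```python
import numpy as np


def psi_81(p3):
    """Probability matrix (8.1)."""
    return np.array([[0.7, 0.4, 0.05, 0.2],
                     [0.4, 0.6, 0.05, 0.2],
                     [0.05, 0.05, p3, 0.05],
                     [0.2, 0.2, 0.05, 0.4]])


def is_strongly_assortative(Psi):
    """min_k p_k > max_{k != l} q_kl."""
    off = Psi[~np.eye(len(Psi), dtype=bool)]
    return bool(np.diag(Psi).min() > off.max())


def is_weakly_assortative(Psi):
    """p_k > q*_k = max_{l != k} q_kl for every k."""
    masked = np.where(np.eye(len(Psi), dtype=bool), -np.inf, Psi)
    q_star = masked.max(axis=1)
    return bool(np.all(np.diag(Psi) > q_star))


def blocks_below_max_offdiag(Psi):
    """The set I^c = {k : p_k < max_{k != l} q_kl} of Proposition 5.1."""
    off_max = Psi[~np.eye(len(Psi), dtype=bool)].max()
    return [k for k in range(len(Psi)) if Psi[k, k] < off_max]
```

With these definitions, (8.1) is never strongly assortative and is weakly assortative for every $p_3 > 0.05$. Its set $I^c$ is $\{2\}$ (the third block) whenever $p_3 < 0.4$, because $\Psi_{44} = 0.4$ is not strictly below the maximal off-diagonal entry.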
However, except for the boundary value $p_3 = 0.05$, the model always remains weakly assortative. Figure 3 shows the results of Monte Carlo simulations with 25 replications for SDP-1 and SDP-2. The mean cluster matrices $\hat X$ obtained for the two SDPs are shown at the boundary points $p_3 = 0.05$ and $0.7$. SDP-2 has difficulty recovering the fourth block in both cases, and completely fails to recover the third block when $p_3 = 0.05$. The performance of SDP-1, however, remains more or less the same, surprisingly even at $p_3 = 0.05$. This can be clearly seen in Figure 4, which shows the relative errors $|||\hat X - X_0|||_F/|||X_0|||_F$ for the cluster matrices and the NMI for the labels reconstructed by Algorithm 1. Note how SDP-2 degrades as $p_3$ decreases to 0.05, with a sharp drop around 0.2, while SDP-1 behaves more or less the same. Note also that for larger values of $p_3$, while SDP-2 does not reconstruct $X_0$ exactly, as seen from the relative error plot, the resulting labels are nearly always exactly the truth, as seen from the NMI plot. This may be due to the EVT truncation of $\hat X$ implicit in Algorithm 1.

Finally, we apply the various SDPs to graphon estimation for the dolphins network [32], with $n = 62$ nodes. Figure 5 shows the results for the three SDPs and the EVT with $K = 3$ and $K = 10$. For SDP-3, we used the median connectivity to set $\lambda$, as suggested in [14]. The adjacency matrices in the first row and the graphon estimators in the second row are both permuted according to the ordering from Algorithm 1. The SDPs again provide a much cleaner picture of the communities in the data than the EVT. The blocks found by SDP-1 are similar in size and well separated from each other compared to those of the other two SDPs. We also applied the algorithms with $K = 2$ to compare to the partition suggested by Fig.
1(b) in [32], which can be considered the ground truth for a two-community structure. SDP-1, SDP-2, SDP-3, and EVT misclassify 7, 1, 4, and 11 nodes, respectively, out of 62. Since this partition into two blocks has unbalanced blocks (20 and 42 nodes), we expect SDP-1 not to match it as well. However, if we replace the equality constraints with inequality ones, as discussed in Remark 3.1, SDP-1 misclassifies only 2 nodes. It is worth noting that the ground truth in this case is only one possible way to describe the network, taken from one scientific paper focused on the dolphins' split, and there may well be more than two communities in the data. The nine strong (and one weak) clusters found by SDP-1 may be of interest for further understanding of this network.

[Figure 5 panels: (a) SDP-1, (b) SDP-2, (c) SDP-3, (d) EVT for K = 3; (e) SDP-1, (f) SDP-2, (g) SDP-3, (h) EVT for K = 10.]
Figure 5: Results for the dolphins network for K = 3 (a–d) and K = 10 (e–h). Row 1: adjacency matrix sorted according to the permutation of Algorithm 1. Row 2: graphon estimator $\hat M_{\hat Z}$ of Algorithm 1.

9 Discussion

In this paper, we have put several SDP relaxations of the MLE into a unified framework (Table 1) by treating them as relaxations over different parameter spaces. SDP-1, the tighter relaxation we proposed, was shown to empirically dominate previous relaxations. While all the SDPs we considered are strongly consistent on the strongly assortative class of block models, we showed that SDP-1 is strongly consistent on the much larger class of weakly assortative models, whereas SDP-2 fails outside the strongly assortative class. We proposed a mixture of SDP-1 and SDP-3 which combines the flexibilities of both, namely, consistency in weakly assortative and unbalanced models.
It remains an open question whether an SDP relaxation can work for mixed networks with both assortative and disassortative communities. There are some indications that one can tackle mixed networks by applying SDPs to $|A| = \sqrt{A^2}$, the positive square root of $A^2$. We also note that SDP-3 is harder to compare directly to SDP-1 or SDP-2 because it depends on a tuning parameter $\lambda$. However, Lagrange duality implies that for every $A$ and $K$ there exists a $\lambda$ that makes SDP-3 equivalent to SDP-2. In general, SDP-3 is more flexible than SDP-2 because of the continuous parameter $\lambda$, but this also makes it unsuitable for certain tasks such as histogram estimation. Empirically, the SDPs outperformed adjacency-based spectral clustering (EVT), especially for a large number of communities $K$. This is reflected in current theoretical guarantees, where the conditions for the SDPs have a better dependence on $K$ than those available for EVT. In addition, the SDP formulation of EVT shows it to be a looser relaxation than, say, SDP-1 for the balanced planted partition model. The three SDPs also seem to be inherently more robust to noise than EVT, perhaps due to the implicit regularization effect of the doubly nonnegative cone.

A Proof of Lemma 4.3

As in the proof of Theorem 4.1, and in accordance with condition (A2), we set $\Gamma_{S_k} := 0$. Then,
$$\Lambda_{S_k} = \mu_{S_k}1_m^T + 1_m\mu_{S_k}^T + \mathrm{diag}^*(\nu_{S_k}) - A_{S_k}, \qquad \Lambda_{S_k^cS_k} = \mu_{S_k^c}1_m^T + 1_{n-m}\mu_{S_k}^T - (A + \Gamma)_{S_k^cS_k}. \tag{A.1}$$
Recalling that $[d(S_k)]_{S_k} = A_{S_k}1_m$, we can rewrite (4.11) as
$$\Lambda_{S_k}1_m = 0 \iff \mu_{S_k}m + 1_m\mu_{S_k}^T1_m + \nu_{S_k} - [d(S_k)]_{S_k} = 0, \tag{A.2}$$
$$\Lambda_{S_\ell S_k}1_m = 0 \iff \big(\mu_{S_\ell}1_m^T + 1_m\mu_{S_k}^T - (A+\Gamma)_{S_\ell S_k}\big)1_m = 0, \qquad k \ne \ell. \tag{A.3}$$
As in the case of SDP-2′ (cf. the appendices), (A.3) is equivalent to $\mu_{S_\ell}1_m^T + 1_m\mu_{S_k}^T - (A+\Gamma)_{S_\ell S_k} = -B_{S_\ell S_k}$ for some $B_{S_\ell S_k}$ acting on $\mathrm{span}\{1_m\}^\perp$.
As before, we set $B_{S_\ell S_k} := P_{1_m^\perp}A_{S_\ell S_k}P_{1_m^\perp}$, and note that
$$\Delta := A - \mathbb{E}A, \qquad [\mathbb{E}A]_{S_k} = p_kE_m, \qquad [\mathbb{E}A]_{S_kS_\ell} = q_{k\ell}E_m,\ k\neq\ell, \tag{A.4}$$
so that $B_{S_\ell S_k} = P_{1_m^\perp}\Delta_{S_\ell S_k}P_{1_m^\perp}$. Now, take $u\in\mathrm{span}\{1_{S_k}\}^\perp$. Then $u = \sum_ku_{S_k} = \sum_ke_k\otimes u_k$ for some $\{u_k\}\subset\mathrm{span}\{1_m\}^\perp$. We will work with the expansion of $u^T\Lambda u$ obtained in Appendix E (Eq. (E.5)). Using $A_{S_k} = p_kE_m + \Delta_{S_k}$ and (A.1), we have
$$u_k^T\Lambda_{S_k}u_k = u_k^T\big(\mu_{S_k}1_m^T + 1_m\mu_{S_k}^T - p_kE_m + \mathrm{diag}^*(\nu_{S_k}) - \Delta_{S_k}\big)u_k = u_k^T\big(\mathrm{diag}^*(\nu_{S_k}) - \Delta_{S_k}\big)u_k,$$
using $1_m^Tu_k = 0$. Let us now choose $\mu$ to be constant over blocks, that is, $\mu_{S_k} := \tfrac12\phi_k1_m$ for all $k$, for some numbers $\{\phi_k\}$ to be determined later. Then (A.2) reads $\phi_km1_m + \nu_{S_k} - [d(S_k)]_{S_k} = 0$, or equivalently
$$\mathrm{diag}^*(\nu_{S_k}) = \mathrm{diag}^*([d(S_k)]_{S_k}) - \phi_kmI_m. \tag{A.5}$$
On the other hand, for $k\neq\ell$, we have $u_k^T\Lambda_{S_kS_\ell}u_\ell = -u_k^TB_{S_kS_\ell}u_\ell = -u_k^T\Delta_{S_kS_\ell}u_\ell$, since $\{u_k\}\subset\mathrm{span}\{1_m\}^\perp$. We arrive at
$$u^T\Lambda u = \sum_ku_k^T\big(\mathrm{diag}^*([d(S_k)]_{S_k}) - \phi_kmI_m - \Delta_{S_k}\big)u_k - \sum_{k\neq\ell}u_k^T\Delta_{S_kS_\ell}u_\ell. \tag{A.6}$$

Proof of (a) and (b). To verify dual feasibility, recall that $P_{1_m^\perp}e_j = e_j - \frac1m1_m$. Then,
$$[\Gamma_{S_kS_\ell}]_{ij} = e_i^T\Gamma_{S_kS_\ell}e_j = \tfrac12(\phi_k+\phi_\ell) + \big(e_i - \tfrac1m1_m\big)^TA_{S_kS_\ell}\big(e_j - \tfrac1m1_m\big) - A_{ij} = \tfrac12(\phi_k+\phi_\ell) - \tfrac1m\big(d_i(S_\ell) + d_j(S_k) - d_{av}(S_k,S_\ell)\big) \ge 0.$$
To verify (4.12), we recall representation (A.6). By assumption, $\mathrm{diag}^*([d(S_k)]_{S_k})\succeq\rho_kmI_m$ for all $k$. From (A.5) and (A.6) it follows that, for $u\in\mathrm{span}\{1_{S_k}\}^\perp$,
$$\begin{aligned} u^T\Lambda u &\ge \sum_ku_k^T\big(\rho_kmI_m - \phi_kmI_m - \Delta_{S_k}\big)u_k - \sum_{k\neq\ell}u_k^T\Delta_{S_kS_\ell}u_\ell \\ &\ge \sum_k\big((\rho_k-\phi_k)m - |||\Delta_{S_k}|||\big)\|u_k\|^2 - u^T(E_{S_0^c}\circ\Delta)u \\ &\ge \min_k\big((\rho_k-\phi_k)m - |||\Delta_{S_k}|||\big)\|u\|^2 - |||E_{S_0^c}\circ\Delta|||\,\|u\|^2. \end{aligned}$$
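The construction in this proof can be checked numerically. The sketch below assumes equal block sizes with blocks laid out contiguously, an illustrative 3-block probability matrix, and arbitrary values of $\phi_k$ (all hypothetical choices, not from the paper). It builds $\mu$, $\nu$ from (A.5) and $\Gamma$ as in the proof, then checks that $\Lambda 1_{S_k} = 0$ and that the final lower bound on $u^T\Lambda u$ holds for a random $u\in\mathrm{span}\{1_{S_k}\}^\perp$. The bound is deterministic given the construction, so it should hold for any sample.

```python
import numpy as np

def lemma43_witness(A, K, m, phi):
    """Dual witness of Lemma 4.3 for equal blocks S_k = {k*m, ..., (k+1)*m - 1}:
    mu_{S_k} = phi_k/2 * 1_m, nu from (A.5), Gamma_{S_k} = 0 and
    Gamma_{S_k S_l} = (phi_k + phi_l)/2 * E_m + P A_{S_k S_l} P - A_{S_k S_l}."""
    n = K * m
    blocks = [slice(k * m, (k + 1) * m) for k in range(K)]
    P = np.eye(m) - np.ones((m, m)) / m              # projection onto span{1_m}^perp
    labels = np.repeat(np.arange(K), m)
    d_own = np.array([A[i, blocks[labels[i]]].sum() for i in range(n)])  # d_i(S_k), i in S_k
    mu = 0.5 * phi[labels]                            # mu_{S_k} = (phi_k / 2) 1_m
    nu = d_own - phi[labels] * m                      # (A.5)
    Gamma = np.zeros((n, n))                          # Gamma_{S_k} = 0 by (A2)
    for k in range(K):
        for l in range(K):
            if k != l:
                Akl = A[blocks[k], blocks[l]]
                Gamma[blocks[k], blocks[l]] = 0.5 * (phi[k] + phi[l]) + P @ Akl @ P - Akl
    Lam = np.outer(mu, np.ones(n)) + np.outer(np.ones(n), mu) + np.diag(nu) - A - Gamma
    return Lam, d_own

rng = np.random.default_rng(0)
K, m = 3, 20
n = K * m
Psi = np.array([[0.8, 0.1, 0.2],
                [0.1, 0.7, 0.1],
                [0.2, 0.1, 0.6]])                     # illustrative block probabilities
EA = np.kron(Psi, np.ones((m, m)))
T = np.triu(rng.random((n, n)) < EA, 1).astype(float)
A = T + T.T                                           # symmetric adjacency, zero diagonal
phi = np.array([0.25, 0.2, 0.3])                      # arbitrary; the bound holds for any phi

Lam, d_own = lemma43_witness(A, K, m, phi)
Delta = A - EA
blocks = [slice(k * m, (k + 1) * m) for k in range(K)]
off = Delta.copy()
for b in blocks:
    off[b, b] = 0.0                                   # E_{S_0^c} o Delta
rho_m = np.array([d_own[b].min() for b in blocks])    # rho_k * m
lower = min(rho_m[k] - phi[k] * m - np.linalg.norm(Delta[blocks[k], blocks[k]], 2)
            for k in range(K)) - np.linalg.norm(off, 2)

u = rng.standard_normal(n)
u -= np.repeat(u.reshape(K, m).mean(axis=1), m)       # u in span{1_{S_k}}^perp
print(u @ Lam @ u, ">=", lower * (u @ u))
```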
B Probabilistic bounds for $\mathrm{BM}^{\mathrm{bal}}$

We will complete the construction of $(\mu,\nu,\Gamma)$ in (4.14)–(4.15) for $\mathrm{BM}^{\mathrm{bal}}_m(\Psi)$ by specifying $\{\phi_k\}$, finishing the proof of Theorem 4.2. The following is the analogue of Lemma F.1 in Appendix F for $\mathrm{BM}^{\mathrm{bal}}_m(\Psi)$; the proof is similar and is omitted.

Lemma B.1. Let $\gamma_k := \sqrt{(4c_1\log n)/\bar p_k}$ and $\zeta_{k\ell} := \sqrt{(4c_2\log n)/\bar q_{k\ell}}$. Assume $\gamma_k,\zeta_{k\ell}\in[0,3]$. Then $d_i(S_k)\ge\bar p_k(1-\gamma_k)$ for all $i\in S_k$ and all $k$, w.p. at least $1-n^{-(c_1-1)}$, and $|d_i(S_\ell)-\bar q_{k\ell}|\le\zeta_{k\ell}\bar q_{k\ell}$ for all $i\in S_k$ and all $k\neq\ell$, w.p. at least $1-2m^{-1}n^{-(c_2-2)}$.

We also have the following corollary of Proposition 4.1 for $\mathrm{BM}^{\mathrm{bal}}_m(\Psi)$. Recall the chain of definitions and equivalences
$$\bar q^*_{\max} := \max_k\bar q^*_k = \max_{k\neq\ell}\bar q_{k\ell} = m\big(\max_{k\neq\ell}q_{k\ell}\big) =: m\,q_{\max}.$$

Corollary B.1. Let $A\in\{0,1\}^{n\times n}$ be distributed as $\mathrm{BM}^{\mathrm{bal}}_m(\Psi)$ and $\Delta := A-\mathbb{E}A$. Assume that $p_k\ge(C'\log m)/m$ for all $k$ and $q_{\max}\ge(C'\log n)/n$. Then,
• $|||\Delta_{S_k}|||\le C\sqrt{\bar p_k}$ for all $k$, w.p. at least $1-cKm^{-r}$;
• $|||E_{S_0^c}\circ\Delta|||\le C\sqrt{\bar q^*_{\max}K}$, w.p. at least $1-cn^{-r}$.

Proof. The assertion about the diagonal blocks follows as in Corollary F.1 in Appendix F. For the second assertion, we note that $E_{S_0^c}\circ\Delta$ is an $n\times n$ matrix whose entries have variance at most $\max_{k\neq\ell}q_{k\ell} = \max_{k\neq\ell}\bar q_{k\ell}/m = \bar q^*_{\max}/m$; hence, w.p. at least $1-cn^{-r}$, $|||E_{S_0^c}\circ\Delta|||\le C\sqrt{(\bar q^*_{\max}/m)\,n} = C\sqrt{\bar q^*_{\max}K}$, using $n = mK$.

According to Lemma B.1, for sufficiently small $\gamma_k$ and $\zeta_{k\ell}$, we have w.h.p. that $d_{av}(S_k,S_\ell)$ also lies in $[\bar q_{k\ell}(1-\zeta_{k\ell}),\ \bar q_{k\ell}(1+\zeta_{k\ell})]$ for $k\neq\ell$, so that
$$d_i(S_\ell)+d_j(S_k)-d_{av}(S_k,S_\ell)\le\bar q_{k\ell}(1+3\zeta_{k\ell}) = \bar q_{k\ell}+3\sqrt{4c_2\bar q_{k\ell}\log n},\quad(i,j)\in S_k\times S_\ell.$$
Note that the right-hand side is increasing in $\bar q_{k\ell}$.
We also note the following key inequality, $\bar q_{k\ell}\le\frac12(\bar q^*_k+\bar q^*_\ell)$, obtained by summing the two inequalities
$$\tfrac12\bar q_{k\ell}\le\tfrac12\max_{r=k,\,s\neq k}\bar q_{rs},\qquad\tfrac12\bar q_{k\ell}\le\tfrac12\max_{r\neq\ell,\,s=\ell}\bar q_{rs},$$
which hold for $k\neq\ell$. Hence,
$$\bar q_{k\ell}+3\sqrt{4c_2\bar q_{k\ell}\log n}\le\tfrac12(\bar q^*_k+\bar q^*_\ell)+3\sqrt{2c_2(\bar q^*_k+\bar q^*_\ell)\log n}\le\tfrac12\Big(\bar q^*_k+6\sqrt{2c_2\bar q^*_k\log n}\Big)+\tfrac12\Big(\bar q^*_\ell+6\sqrt{2c_2\bar q^*_\ell\log n}\Big),$$
where we have used the monotonicity noted above and $\sqrt{x+y}\le\sqrt x+\sqrt y$ for $x,y\ge0$. Thus, taking
$$\phi_k := \tfrac1m\bar\phi_k,\qquad\bar\phi_k := \bar q^*_k+6\sqrt{2c_2\bar q^*_k\log n}$$
satisfies (4.16). We also have $m\rho_k := \min_{i\in S_k}d_i(S_k)\ge\bar p_k-\sqrt{4c_1\bar p_k\log n}$ and, from Corollary B.1, $|||\Delta_{S_k}|||\le C\sqrt{\bar p_k}$ for all $k$. It follows that
$$\begin{aligned}(\rho_k-\phi_k)m-|||\Delta_{S_k}||| &\ge \bar p_k-\sqrt{4c_1\bar p_k\log n}-\Big(\bar q^*_k+6\sqrt{2c_2\bar q^*_k\log n}\Big)-C\sqrt{\bar p_k} \\ &\ge (\bar p_k-\bar q^*_k)-(C+\sqrt{4c_1})\sqrt{\bar p_k\log n}-6\sqrt{2c_2\bar q^*_k\log n},\end{aligned}$$
where the last step uses $\log n\ge1$. By Corollary B.1, $|||E_{S_0^c}\circ\Delta|||\le C\sqrt{\bar q^*_{\max}K}$. Thus, to satisfy (4.17), it is enough to have
$$\min_k\Big[(\bar p_k-\bar q^*_k)-(C+\sqrt{4c_1})\sqrt{\bar p_k\log n}-6\sqrt{2c_2\bar q^*_k\log n}\Big]>C\sqrt{\bar q^*_{\max}K},$$
which is implied, for a suitable constant $C_2$, by
$$\min_k\Big[(\bar p_k-\bar q^*_k)-C_2\Big(\sqrt{\bar p_k\log n}+\sqrt{\bar q^*_k\log n}\Big)\Big]>C\sqrt{\bar q^*_{\max}K}. \tag{B.1}$$
The auxiliary conditions we needed on $\bar p_k$ and $\bar q_{k\ell}$ were $\bar p_k\ge(4c_1/9)\log n$ and $\bar q_{k\ell}\ge(4c_2/9)\log n$ from Lemma B.1, and $\bar p_k\ge C'\log m$ and $nq_{\max}>C'\log n$ from Corollary B.1. As before, we can drop the lower bounds on $\{q_{k\ell}\}_{k\neq\ell}$ due to Corollary 4.2. The lower bounds on $\bar p_k$ are implied by $\bar p_k\ge(C'\vee(4c_1/9))\log n$. This completes the proof. To get to the form in which the theorem is stated, replace $c_1$ with $c_1+1$ and $c_2$ with $c_2+2$, and divide (B.1) by $\log n$.
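The two deterministic inequalities used to choose $\phi_k$ (the key inequality $\bar q_{k\ell}\le\frac12(\bar q^*_k+\bar q^*_\ell)$ and the resulting bound by $\frac12(\bar\phi_k+\bar\phi_\ell)$) can be spot-checked on a random symmetric block matrix. The sizes $K$, $m$, $n$ and the constant $c_2$ below are illustrative assumptions only.

```python
import numpy as np

def q_star(Q):
    """q*_k = max_{l != k} Q[k, l] for a symmetric K x K matrix Q."""
    off = np.array(Q, dtype=float)
    np.fill_diagonal(off, -np.inf)
    return off.max(axis=1)

rng = np.random.default_rng(6)
K, m, n, c2 = 6, 50, 300, 1.0             # illustrative sizes and constant
B = rng.random((K, K)) * 0.3
Q = (B + B.T) / 2                         # symmetric off-diagonal probabilities q_{kl}
qbar = m * Q                              # bar q_{kl} = m q_{kl}
qs = q_star(qbar)                         # bar q*_k
phibar = qs + 6 * np.sqrt(2 * c2 * qs * np.log(n))

pairs = [(k, l) for k in range(K) for l in range(K) if k != l]
key_ok = all(qbar[k, l] <= 0.5 * (qs[k] + qs[l]) + 1e-12 for k, l in pairs)
chain_ok = all(qbar[k, l] + 3 * np.sqrt(4 * c2 * qbar[k, l] * np.log(n))
               <= 0.5 * (phibar[k] + phibar[l]) + 1e-9 for k, l in pairs)
print(key_ok, chain_ok)                   # True True
```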
The following appendices contain proofs of the remaining results, a detailed description of the implementation of an ADMM solver for SDP-1, and additional details on simulations.

C Proofs of Section 4.2.1

Proof of Lemma 4.1. Let $S_0 := \mathrm{supp}(X_0)$. We proceed in two steps, first changing the entries on $S_0$ and then those on $S_0^c$. More precisely, let $\tilde A_1 = \tilde A$ on $S_0$ (meaning that $[\tilde A_1]_{ij} = \tilde A_{ij}$ for $(i,j)\in S_0$) and $\tilde A_1 = A$ on $S_0^c$. Let $X$ be any feasible solution other than $X_0$, so that $0\le X\le1$. We will use the notation $\langle A,X\rangle_{S_0} := \sum_{(i,j)\in S_0}A_{ij}X_{ij}$. By (unique) optimality of $X_0$ for $A$, we have $\langle A,X\rangle<\langle A,X_0\rangle$. Then,
$$\langle\tilde A_1,X-X_0\rangle_{S_0^c} = \langle A,X-X_0\rangle_{S_0^c} < \langle A,X_0-X\rangle_{S_0} \le \langle\tilde A_1,X_0-X\rangle_{S_0},$$
where the first equality is by construction and the last inequality follows from $A\le\tilde A_1$ on $S_0$ and $X_0-X\ge0$ on $S_0$. (Note that $X_0$ equals one on $S_0$ and $X\le1$ everywhere.) Hence $\langle\tilde A_1,X\rangle<\langle\tilde A_1,X_0\rangle$, that is, the conclusion of the lemma holds for $\tilde A_1$. Now we can write
$$\langle\tilde A,X\rangle\le\langle\tilde A_1,X\rangle<\langle\tilde A_1,X_0\rangle = \langle\tilde A,X_0\rangle.$$
The first inequality is by nonnegativity of $X$ and $\tilde A\le\tilde A_1$ everywhere. The second inequality is by (unique) optimality of $X_0$ for $\tilde A_1$. The last equality holds because $\tilde A = \tilde A_1$ on $S_0$ and $X_0$ vanishes on $S_0^c$.

Proof of Corollary 4.2. We construct a coupling between $A$ and $\tilde A$. Recall that $S_k$ denotes the indices of nodes in community $k$. Draw $A\sim\mathrm{BM}^{\mathrm{bal}}_m(\Psi)$ and, independently of $A$, draw
$$R_{ij}\sim\mathrm{Bern}\Big(\frac{\tilde p_k-p_k}{1-p_k}\Big),\ (i,j)\in S_k\times S_k,\ \forall k;\qquad R_{ij}\sim\mathrm{Bern}(\tilde q_{k\ell}/q_{k\ell}),\ (i,j)\in S_k\times S_\ell,\ \forall k<\ell.$$
Extend $R$ symmetrically by setting $R_{S_kS_\ell} = R_{S_\ell S_k}^T$ for $k>\ell$. Let
$$\tilde A_{ij} := 1-(1-A_{ij})(1-R_{ij}),\ (i,j)\in S_k\times S_k,\ \forall k;\qquad\tilde A_{ij} := A_{ij}R_{ij},\ (i,j)\in S_k\times S_\ell,\ \forall k<\ell,$$
and extend symmetrically. It is easy to verify that $\tilde A$ has distribution $\mathrm{BM}^{\mathrm{bal}}_m(\tilde\Psi)$: on the diagonal blocks, $\mathbb{P}(\tilde A_{ij}=1) = 1-(1-p_k)\big(1-\frac{\tilde p_k-p_k}{1-p_k}\big) = \tilde p_k$, and off the diagonal blocks, $\mathbb{P}(\tilde A_{ij}=1) = q_{k\ell}\cdot\tilde q_{k\ell}/q_{k\ell} = \tilde q_{k\ell}$.
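A quick Monte Carlo illustration of the coupling used for Corollary 4.2: entries in a diagonal block are OR-ed with an independent $\mathrm{Bern}((\tilde p-p)/(1-p))$ variable, and entries in an off-diagonal block are AND-ed with a $\mathrm{Bern}(\tilde q/q)$ variable. The numeric values of $p,\tilde p,q,\tilde q$ are illustrative assumptions; the coupling requires $\tilde p\ge p$ and $\tilde q\le q$.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000                      # number of independent (i, j) entries simulated
p, p_tilde = 0.3, 0.45           # diagonal-block probabilities, p_tilde >= p
q, q_tilde = 0.2, 0.08           # off-diagonal probabilities, q_tilde <= q

# Diagonal block: A_tilde = 1 - (1 - A)(1 - R) = A OR R, so A_tilde >= A.
A = rng.random(N) < p
R = rng.random(N) < (p_tilde - p) / (1 - p)
A_tilde_diag = A | R

# Off-diagonal block: A_tilde = A * R = A AND R, so A_tilde <= A.
A_off = rng.random(N) < q
R_off = rng.random(N) < q_tilde / q
A_tilde_off = A_off & R_off

print(A_tilde_diag.mean(), A_tilde_off.mean())   # close to p_tilde and q_tilde
```

The monotonicity ($\tilde A\ge A$ on the diagonal blocks, $\tilde A\le A$ elsewhere) holds for every draw, which is exactly what Lemma 4.1 needs; only the marginal probabilities are random.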
Moreover, by construction, $\tilde A\ge A$ on $\mathrm{supp}(X_0)$ and $\tilde A\le A$ on $\mathrm{supp}(X_0)^c$. The result now follows from Lemma 4.1.

D Proof of Lemma 4.2

To prove the lemma, we need the following intermediate result.

Lemma D.1. Let $X\in\mathbb{S}^n$ with $\mathrm{range}(X)\subset\mathrm{span}\{1_{S_k}\}$. Then $X = B\otimes E_m$ for some $B\in\mathbb{S}^K$, that is, $X$ is block-constant.

Proof. Note that $1_{S_k} = e_k\otimes1_m$, where $e_k = e^K_k$ is the $k$-th basis vector of $\mathbb{R}^K$. An eigenvector $v_j$ of $X$ with nonzero eigenvalue $\beta_j$ lies in $\mathrm{range}(X)$, so it is of the form $v_j = \sum_k\alpha_{jk}1_{S_k} = (\sum_k\alpha_{jk}e_k)\otimes1_m = u_j\otimes1_m$ for some $u_j\in\mathbb{R}^K$. Then,
$$X = \sum_j\beta_jv_jv_j^T = \sum_j\beta_j(u_j\otimes1_m)(u_j\otimes1_m)^T = \sum_j\beta_j(u_ju_j^T)\otimes(1_m1_m^T) = \Big(\sum_j\beta_ju_ju_j^T\Big)\otimes E_m.$$

Proof of Lemma 4.2. Conditions (A1) and (A2) together guarantee (CSa) and (CSb) for $X_0$ and $(\mu,\nu,\Gamma)$, in addition to dual feasibility. Hence, $X_0$ is an optimal solution of the primal problem. To show uniqueness, let $X$ be any optimal primal solution. Then $X$ and the specific triple $(\mu,\nu,\Gamma)$ assumed in the statement of the lemma must together satisfy the optimality conditions. (CSb) for $X$ (and the triple) implies $\mathrm{range}(X)\subset\ker(\Lambda(\mu,\nu,\Gamma)) = \mathrm{span}\{1_{S_k}\}$ by (A1), which in turn implies $X = B\otimes E_m$ for some $B = (b_{k\ell})\in\mathbb{S}^K$ by Lemma D.1. This means $X_{S_kS_\ell} = b_{k\ell}E_m$. Now, (CSa) for $X$ implies $0 = X_{S_kS_\ell}\circ\Gamma_{S_kS_\ell} = b_{k\ell}\Gamma_{S_kS_\ell}$ for $k\neq\ell$, using $E_m\circ D = D$ for any $D$. But since $\Gamma_{S_kS_\ell}$ is not identically zero by (A3), we must have $b_{k\ell} = 0$ for $k\neq\ell$. On the other hand, primal feasibility of $X$, in particular $X_{ii} = 1$, implies $b_{kk} = 1$. That is, $B = I_K$, hence $X = I_K\otimes E_m = X_0$.

D.1 Proof of Theorem 4.1: primal-dual witness for SDP-2′

For SDP-2′, the linear condition $\mathcal{L}_2(X) = b_2$ is just the scalar equation $\langle E_n,X\rangle = n^2/K = nm$. The dual variable $\mu$ is a scalar in this case, and we have $b_2 = mn$ and $\mathcal{L}_2^*(\mu) = \mu E_n$. Hence, we have (cf. (4.10))
$$\Lambda = \Lambda(\mu,\nu,\Gamma) = \mu E_n+\mathrm{diag}^*(\nu)-A-\Gamma. \tag{D.1}$$
Let $d(S_k) = A1_{S_k}$ be the vector of node degrees relative to the subgraph $S_k$; we denote its $i$th element by $d_i(S_k) = \sum_{j\in S_k}A_{ij}$. Let $P_{1_m^\perp} := I_m-\frac1mE_m$ be the projection onto $\mathrm{span}\{1_m\}^\perp$. The following summarizes our primal-dual construction, modulo the choice of $\mu$:
$$\nu_i := d_i(S_k)-\mu m,\quad i\in S_k, \tag{D.2}$$
$$\Gamma_{S_k} := 0\ \ \forall k,\qquad\Gamma_{S_kS_\ell} := \mu E_m+P_{1_m^\perp}A_{S_kS_\ell}P_{1_m^\perp}-A_{S_kS_\ell}, \tag{D.3}$$
for all $k\neq\ell$. Note that $\Gamma$ is symmetric. Let
$$\Delta := A-\mathbb{E}[A],\qquad d_{av}(S_k,S_\ell) := \frac1m\sum_{i\in S_k}d_i(S_\ell) = \frac1m\sum_{j\in S_\ell}d_j(S_k). \tag{D.4}$$
The following lemma, proved in Section E, verifies the validity of this construction.

Lemma D.2. Let $(\mu,\nu,\Gamma)$ be as in (D.2) and (D.3). Then $\Gamma$ verifies (A2), and (4.11) holds for all $\mu$. In addition:
(a) $\Gamma$ is dual feasible, i.e., $\Gamma\ge0$, if for all $i\in S_k$, $j\in S_\ell$, $k\neq\ell$,
$$\mu m\ge d_i(S_\ell)+d_j(S_k)-d_{av}(S_k,S_\ell), \tag{D.5}$$
and it satisfies (A3) if at least one inequality is strict for each pair $k\neq\ell$.
(b) $\Gamma$ verifies (4.12) if
$$(\rho-\mu)m>|||\Delta|||,\quad\text{where }\rho := \min_k\min_{i\in S_k}d_i(S_k)/m. \tag{D.6}$$
We note that choosing $\mu m$ to be the maximum of the right-hand side of (D.5), i.e.,
$$\max_{k\neq\ell}\ \max_{i\in S_k,\,j\in S_\ell}\big(d_i(S_\ell)+d_j(S_k)-d_{av}(S_k,S_\ell)\big),$$
together with (D.6), gives a deterministic condition for the success of SDP-2′. In Section F, we give a probabilistic version of this condition, which completes the proof of Theorem 4.1.

E Proof of Lemma D.2

We start by seeing how far the KKT conditions determine the dual variables and how much freedom in choosing them is left. In accordance with condition (A2), we set $\Gamma_{S_k} := 0$.
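The witness (D.2)–(D.3), with $\mu m$ set to the maximum right-hand side of (D.5), can be built explicitly for a sample from the balanced planted partition model. The sketch below (equal contiguous blocks and the values of $K$, $m$, $p$, $q$ are illustrative assumptions) checks the deterministic parts of Lemma D.2: $\Gamma\ge0$, $\Gamma$ symmetric, and $\Lambda1_{S_k} = 0$. Whether $\Lambda\succeq0$ depends on (D.6) and the random sample, so the smallest eigenvalue is only printed.

```python
import numpy as np

def sdp2_witness(A, K, m):
    """Witness (D.2)-(D.3) for SDP-2' with equal blocks S_k = {k*m, ..., (k+1)*m - 1},
    with mu*m set to the maximum right-hand side of (D.5)."""
    n = K * m
    blocks = [slice(k * m, (k + 1) * m) for k in range(K)]
    labels = np.repeat(np.arange(K), m)
    P = np.eye(m) - np.ones((m, m)) / m
    d = np.stack([A[:, b].sum(axis=1) for b in blocks], axis=1)    # d[i, l] = d_i(S_l)
    mu_m = -np.inf
    for k in range(K):
        for l in range(K):
            if k != l:
                d_av = d[blocks[k], l].sum() / m                      # d_av(S_k, S_l)
                rhs = d[blocks[k], l][:, None] + d[blocks[l], k][None, :] - d_av
                mu_m = max(mu_m, rhs.max())
    mu = mu_m / m
    nu = d[np.arange(n), labels] - mu * m                             # (D.2)
    Gamma = np.zeros((n, n))                                          # Gamma_{S_k} = 0
    for k in range(K):
        for l in range(K):
            if k != l:
                Akl = A[blocks[k], blocks[l]]
                Gamma[blocks[k], blocks[l]] = mu + P @ Akl @ P - Akl  # (D.3)
    Lam = mu * np.ones((n, n)) + np.diag(nu) - A - Gamma              # (D.1)
    return mu, Gamma, Lam

rng = np.random.default_rng(1)
K, m, p, q = 3, 25, 0.7, 0.1
n = K * m
labels = np.repeat(np.arange(K), m)
probs = np.where(labels[:, None] == labels[None, :], p, q)
T = np.triu(rng.random((n, n)) < probs, 1).astype(float)
A = T + T.T
mu, Gamma, Lam = sdp2_witness(A, K, m)
ones_blocks = np.kron(np.eye(K), np.ones((m, 1)))                     # columns are 1_{S_k}
print("mu =", mu, " min eigenvalue of Lambda =", np.linalg.eigvalsh(Lam).min())
```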
Then, (4.11) holds if and only if $\Lambda_{S_k}1_m = 0$ and $\Lambda_{S_k^cS_k}1_m = 0$, or equivalently,
$$\Lambda_{S_k}1_m = \big(\mu E_m+\mathrm{diag}^*(\nu_{S_k})-A_{S_k}\big)1_m = \mu m1_m+\nu_{S_k}-A_{S_k}1_m = 0, \tag{E.1}$$
$$\Lambda_{S_k^cS_k}1_m = \big[\mu E_{n-m,m}-(A+\Gamma)_{S_k^cS_k}\big]1_m = \mu m1_{n-m}-(A+\Gamma)_{S_k^cS_k}1_m = 0. \tag{E.2}$$
Let $d(S_k) = A1_{S_k}$ be the vector of node degrees relative to the community/subgraph $S_k$; we denote its $i$th element by $d_i(S_k) = \sum_{j\in S_k}A_{ij}$. Note also that $A_{S_k}1_m = [d(S_k)]_{S_k}$. Then, setting $\nu_i := d_i(S_k)-\mu m$ for $i\in S_k$ verifies (E.1). To verify (E.2), we need
$$(A+\Gamma)_{S_kS_\ell}1_m = \mu m1_m\quad\text{for all }\ell\neq k. \tag{E.3}$$
The same must hold for $(A+\Gamma)_{S_\ell S_k}$. That is, every row and column of $(A+\Gamma)_{S_kS_\ell}$, $k\neq\ell$, must sum to the same constant ($=\mu m$). In other words, $1_m$ is a right and left eigenvector of $(A+\Gamma)_{S_kS_\ell}$ associated with the eigenvalue $\mu m$. By the spectral theorem (i.e., the SVD), we must have
$$(A+\Gamma)_{S_kS_\ell} = \mu E_m+B_{S_kS_\ell}, \tag{E.4}$$
where $B_{S_kS_\ell}$ acts on $\mathrm{span}\{1_m\}^\perp$. To satisfy (4.12), we first note that
$$\mathrm{span}\{1_{S_k}\}^\perp = \Big\{u = \sum_ke_k\otimes u_k:\ u_k\in\mathbb{R}^m,\ 1_m^Tu_k = 0,\ \forall k\Big\},$$
where $e_k = e^{(K)}_k$. In other words, $\mathrm{span}\{1_{S_k}\}^\perp$ is the set of vectors $u$ such that each sub-vector $u_{S_k}$ sums to zero. Now, take $u\in\mathrm{span}\{1_{S_k}\}^\perp$. Then $u = \sum_ku_{S_k} = \sum_ke_k\otimes u_k$ for some $\{u_k\}\subset\mathrm{span}\{1_m\}^\perp$, and we have
$$u^T\Lambda u = \sum_{k,\ell}u_{S_k}^T\Lambda u_{S_\ell} = \sum_{k,\ell}u_k^T\Lambda_{S_kS_\ell}u_\ell = \sum_ku_k^T\Lambda_{S_k}u_k+\sum_{k\neq\ell}u_k^T\Lambda_{S_kS_\ell}u_\ell. \tag{E.5}$$
Recall that $\Delta := A-\mathbb{E}[A]$, where $[\mathbb{E}A]_{S_k} = pE_m$ and $[\mathbb{E}A]_{S_kS_\ell} = qE_m$, $k\neq\ell$. Then $A_{S_k} = pE_m+\Delta_{S_k}$ and, from (D.1), $\Lambda_{S_k} = \mu E_m+\mathrm{diag}^*(\nu_{S_k})-A_{S_k}$. It follows that
$$u_k^T\Lambda_{S_k}u_k = u_k^T\big((\mu-p)E_m+\mathrm{diag}^*(\nu_{S_k})-\Delta_{S_k}\big)u_k = u_k^T\big(\mathrm{diag}^*(\nu_{S_k})-\Delta_{S_k}\big)u_k,$$
using the fact that $u_k^TE_mu_k = (1_m^Tu_k)^2 = 0$.
We also note from (D.2) that $\mathrm{diag}^*(\nu_{S_k}) = \mathrm{diag}^*([d(S_k)]_{S_k})-\mu mI_m$. On the other hand, for $k\neq\ell$, we have $\Lambda_{S_kS_\ell} = \mu E_m-(A+\Gamma)_{S_kS_\ell} = -B_{S_kS_\ell}$ from (D.1) and (E.4). To summarize,
$$u^T\Lambda u = \sum_ku_k^T\big(\mathrm{diag}^*([d(S_k)]_{S_k})-\Delta_{S_k}\big)u_k-\mu m\sum_k\|u_k\|^2-\sum_{k\neq\ell}u_k^TB_{S_kS_\ell}u_\ell. \tag{E.6}$$
To satisfy (4.12), we want $u^T\Lambda u$ to be large, which is the case if both $\mu$ and $\{B_{S_kS_\ell}\}$ are small. We are free to choose them subject to the dual feasibility constraint $\Gamma\ge0$, which translates to $\mu E_m+B_{S_kS_\ell}\ge A_{S_kS_\ell}$. Our construction of $\Gamma_{S_kS_\ell}$ in (D.3) corresponds to $B_{S_kS_\ell} := P_{1_m^\perp}A_{S_kS_\ell}P_{1_m^\perp}$. See also Remark E.1 for a discussion of the trade-off involved.

Proof of part (a). To verify dual feasibility, we use $P_{1_m^\perp}e_j = e_j-\frac1m1_m$ to write
$$\begin{aligned}[\Gamma_{S_kS_\ell}]_{ij} = e_i^T\Gamma_{S_kS_\ell}e_j &= \mu+\big(e_i-\tfrac1m1_m\big)^TA_{S_kS_\ell}\big(e_j-\tfrac1m1_m\big)-A_{ij} \\ &= \mu-\tfrac1me_i^TA_{S_kS_\ell}1_m-\tfrac1m1_m^TA_{S_kS_\ell}e_j+\tfrac1{m^2}1_m^TA_{S_kS_\ell}1_m \\ &= \mu-\tfrac1m\big(d_i(S_\ell)+d_j(S_k)-d_{av}(S_k,S_\ell)\big)\ge0.\end{aligned}$$
Condition (A3) holds if $[\Gamma_{S_kS_\ell}]_{ij}>0$ for at least one $(i,j)\in S_k\times S_\ell$, for each pair $k\neq\ell$, which is equivalent to the stated condition.

Proof of part (b). To verify (4.12), we recall representation (E.6). By assumption, $\mathrm{diag}^*([d(S_k)]_{S_k})\succeq\rho mI_m$ for all $k$. Also, using $B_{S_kS_\ell} = P_{1_m^\perp}A_{S_kS_\ell}P_{1_m^\perp}$, we have
$$u_k^TB_{S_kS_\ell}u_\ell = u_k^TP_{1_m^\perp}(qE_m+\Delta_{S_kS_\ell})P_{1_m^\perp}u_\ell = u_k^TP_{1_m^\perp}\Delta_{S_kS_\ell}P_{1_m^\perp}u_\ell = u_k^T\Delta_{S_kS_\ell}u_\ell$$
for $\{u_k\}\subset\mathrm{span}\{1_m\}^\perp$. From (E.6) it follows that
$$u^T\Lambda u\ge\rho m\sum_k\|u_k\|^2-\sum_ku_k^T\Delta_{S_k}u_k-\mu m\sum_k\|u_k\|^2-\sum_{k\neq\ell}u_k^T\Delta_{S_kS_\ell}u_\ell = (\rho-\mu)m\|u\|^2-u^T\Delta u\ge\big((\rho-\mu)m-|||\Delta|||\big)\|u\|^2.$$

Remark E.1.
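The entrywise formula in the proof of part (a) is a double-centering identity and can be checked directly on a single off-diagonal block. The block size, edge density and value of $\mu$ below are arbitrary illustrative choices. The sketch also confirms that every row and column of $(A+\Gamma)_{S_kS_\ell}$ sums to $\mu m$, as required by (E.3).

```python
import numpy as np

rng = np.random.default_rng(3)
m = 7
A_kl = (rng.random((m, m)) < 0.3).astype(float)    # an off-diagonal block A_{S_k S_l}
mu = 0.5                                            # any scalar
P = np.eye(m) - np.ones((m, m)) / m                 # P_{1_m^perp}

Gamma_kl = mu * np.ones((m, m)) + P @ A_kl @ P - A_kl   # (D.3)

d_i = A_kl.sum(axis=1)          # d_i(S_l) for i in S_k
d_j = A_kl.sum(axis=0)          # d_j(S_k) for j in S_l
d_av = A_kl.sum() / m           # d_av(S_k, S_l)
closed_form = mu - (d_i[:, None] + d_j[None, :] - d_av) / m
print(np.abs(Gamma_kl - closed_form).max())         # ~ 0 up to rounding
```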
The trade-off in choosing $\mu$ and $B_{S_kS_\ell}$ can be abstracted away in the following subproblem:
$$h(\mu) := \min\Big\{\|\tilde B\|:\ \mu E_m+\tilde B\ge\tilde A,\ \mathrm{range}(\tilde B)\subset\mathrm{span}\{1_m\}^\perp\Big\},$$
where $\tilde A\in\{0,1\}^{m\times m}$ is a non-symmetric adjacency matrix (say, of a directed Erdős–Rényi graph with connection probability $q$). If $\mu = 1$, one can take $\tilde B = 0$, hence $h(1) = 0$. As one decreases $\mu$ from 1, the feasible set of the problem shrinks until the problem becomes infeasible for some $\mu_0\in(0,1)$, if $\tilde A\neq0$. We have chosen $\tilde B = P_{1_m^\perp}\tilde AP_{1_m^\perp}$, essentially the largest choice, to make $\mu$ as small as possible. This might not be optimal in general, and it would be interesting to study $h(\mu)$ more carefully. For example, another choice is $\tilde B = P_V\tilde AP_V$, where $V$ is a proper subspace of $\mathrm{span}\{1_m\}^\perp$ of low dimension. This increases $\mu$ but decreases $h(\mu)$, helping us to better control the contributions of the off-diagonal blocks in (E.6).

F Probabilistic conditions for $\mathrm{PP}^{\mathrm{bal}}$

We will show when the construction of $(\mu,\nu,\Gamma)$ in (D.2) and (D.3) works for the balanced planted partition model, completing the proof of Theorem 4.1. We start with a consequence of Proposition 4.1.

Corollary F.1. Let $A = (A_{ij})\in\{0,1\}^{n\times n}$ be drawn from $\mathrm{PP}^{\mathrm{bal}}(p,q)$ with $p\ge(C'\log m)/m$ and $q\ge(C'\log n)/n$. Then, w.p. at least $1-c(Km^{-r}+n^{-r})$,
$$|||A-\mathbb{E}A|||\le C\big(\sqrt{pm}+\sqrt{qn}\big).$$

Proof. Let $\Delta := A-\mathbb{E}A$ and decompose it into its diagonal and off-diagonal blocks. In particular, let $S_0 := \mathrm{supp}(X_0) = \bigcup_kS_k\times S_k$ and let $S_0^c$ be its complement. Then
$$|||\Delta|||\le|||E_{S_0}\circ\Delta|||+|||E_{S_0^c}\circ\Delta||| = \max_k|||\Delta_{S_k}|||+|||E_{S_0^c}\circ\Delta|||.$$
$E_{S_0^c}\circ\Delta$ is an $n\times n$ matrix whose entries have variance at most $q$; hence $|||E_{S_0^c}\circ\Delta|||\le C\sqrt{qn}$ w.p. at least $1-cn^{-r}$.
Each $\Delta_{S_k}$ is an $m\times m$ matrix whose entries have variance bounded by $p$; hence $|||\Delta_{S_k}|||\le C\sqrt{pm}$ w.p. at least $1-cm^{-r}$, for each $k$. The result follows from a union bound.

The following consequence of Bernstein's inequality summarizes the concentration of $d(S_k)$ around its mean. For simplicity, we will assume that the diagonal of $A$ is also filled with $\mathrm{Bern}(p)$ variates. This has no effect on the optimal primal solution, due to the diagonal constraints $X_{ii} = 1$. Recall that $mK = n$, $\bar p := pm$ and $\bar q := qm$.

Lemma F.1. Let $\gamma := \sqrt{(4c_1\log n)/\bar p}$ and $\zeta := \sqrt{(4c_2\log n)/\bar q}$, and assume $\gamma,\zeta\in[0,3]$. Then $d_i(S_k)\ge\bar p(1-\gamma)$ for all $i\in S_k$ and all $k$, w.p. at least $1-n^{-(c_1-1)}$, and $|d_i(S_\ell)-\bar q|\le\zeta\bar q$ for all $i\in S_k$ and all $k\neq\ell$, w.p. at least $1-2m^{-1}n^{-(c_2-2)}$.

The proof is deferred to Appendix G. Assume now that the conditions of Lemma F.1 (on $\gamma$ and $\zeta$) are met. Then, w.h.p., $d_{av}(S_k,S_\ell)$ also lies in $[\bar q(1-\zeta),\bar q(1+\zeta)]$ for $k\neq\ell$, so that
$$d_i(S_\ell)+d_j(S_k)-d_{av}(S_k,S_\ell)\le2\bar q(1+\zeta)-\bar q(1-\zeta) = \bar q(1+3\zeta).$$
Thus, to satisfy (D.5), it is enough to have $\mu m\ge\bar q(1+3\zeta)$. On the other hand, Lemma F.1 implies that $m\rho := \min_k\min_{i\in S_k}d_i(S_k)\ge\bar p-\bar p\gamma$. Then, taking $\mu m = \bar q(1+3\zeta)$,
$$(\rho-\mu)m\ge\bar p-\bar p\gamma-\bar q-3\bar q\zeta = \bar p-\bar q-\sqrt{4c_1\bar p\log n}-3\sqrt{4c_2\bar q\log n}.$$
By Corollary F.1, w.h.p. $|||\Delta|||\le C(\sqrt{\bar p}+\sqrt{\bar qK})$, where we have used $qn = \bar qK$. Then, to satisfy (D.6), it is enough to have
$$\bar p-\bar q-\sqrt{4c_1\bar p\log n}-3\sqrt{4c_2\bar q\log n}>C\big(\sqrt{\bar p}+\sqrt{\bar qK}\big),$$
which is implied by
$$\bar p-\bar q>(C+\sqrt{4c_1})\sqrt{\bar p\log n}+(C+3\sqrt{4c_2})\sqrt{\bar qK\log n},$$
in turn implied, for a suitable constant $C_2$, by
$$\bar p-\bar q>C_2\big(\sqrt{\bar p\log n}+\sqrt{\bar qK\log n}\big). \tag{F.1}$$
The auxiliary conditions we needed on $\bar p$ and $\bar q$ were $\bar p\ge(4c_1/9)\log n$ and $\bar q\ge(4c_2/9)\log n$ from Lemma F.1, and $\bar p\ge C'\log m$ and $nq>C'\log n$ from Corollary F.1. We can drop the lower bounds on $q$ due to Corollary 4.2. The lower bounds on $\bar p$ are implied by $\bar p\ge(C'\vee(4c_1/9))\log n$. This completes the proof. To get to the form in which the theorem is stated, replace $c_1$ with $c_1+1$ and $c_2$ with $c_2+2$, and divide (F.1) by $\log n$.

G Proof of Lemma F.1

We recall the following version of Bernstein's inequality.

Proposition G.1 (Bernstein). Let $\{X_i\}$ be independent zero-mean random variables with $|X_i|\le1$ almost surely, and let $v := \sum_i\mathbb{E}[X_i^2]$. Then
$$\mathbb{P}\Big(\sum_{i=1}^nX_i>vt\Big)\le\exp[-v\,\varphi(t)],\quad t>0,\qquad\text{where }\varphi(t) := \frac{t^2}{2(1+t/3)}.$$

For the first assertion, note that for $i\in S_k$, $d_i(S_k) = \sum_{j\in S_k}A_{ij}$ is a binomial random variable with mean $mp$ and variance $mp(1-p)\le mp$. Then, applying Bernstein's inequality with $v = mp$ and $t = \gamma$, we have
$$\mathbb{P}[d_i(S_k)-mp<-mp\gamma]\le\exp(-mp\,\varphi(\gamma)).$$
It follows from the union bound that
$$\mathbb{P}\Big[\min_k\min_{i\in S_k}d_i(S_k)\ge\bar p(1-\gamma)\Big]\ge1-mK\exp(-\bar p\,\varphi(\gamma)).$$
For $\gamma\in[0,3]$, we have $\varphi(\gamma)\ge\gamma^2/4$. It follows that $mK\exp(-\bar p\,\varphi(\gamma))\le n\exp(-\bar p\gamma^2/4)\le n\cdot n^{-c_1}$, proving the first assertion. The second assertion follows similarly, by noting that $d_i(S_\ell)$ is binomial with mean $\bar q = qm$ for $i\in S_k$, $k\neq\ell$. It follows from the two-sided Bernstein inequality and the union bound that
$$\mathbb{P}\Big[\max_{k\neq\ell}\max_{i\in S_k}|d_i(S_\ell)-\bar q|\le\zeta\bar q\Big]\ge1-\Big(2\binom K2m\Big)\,2\exp(-\bar q\,\varphi(\zeta))\ge1-(mK)^2m^{-1}\,2\exp(-\bar q\zeta^2/4).$$
The rest of the argument follows as before.

H Proof of Proposition 5.1

The implication (a) ⟹ (b) follows since in the balanced case $\xi_k = 1$ for all $k$. Let us explain how part (b) implies part (c). Let $M := \mathbb{E}[A]$ for a weakly but not strongly assortative block model.
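As a sanity check on Proposition G.1 as it is used above, the exact binomial lower tail $\mathbb{P}[\mathrm{Bin}(m,p)<mp(1-\gamma)]$ can be compared with the bound $\exp(-mp\,\varphi(\gamma))$, and the inequality $\varphi(\gamma)\ge\gamma^2/4$ on $[0,3]$ can be checked on a grid. The $(m,p,\gamma)$ triples are arbitrary illustrative values; only the Python standard library is used.

```python
import math

def phi(t):
    """Bernstein rate function phi(t) = t^2 / (2 (1 + t/3)) from Proposition G.1."""
    return t * t / (2.0 * (1.0 + t / 3.0))

def binom_lower_tail(m, p, x):
    """Exact P[Bin(m, p) < x]."""
    return sum(math.comb(m, j) * p**j * (1 - p)**(m - j)
               for j in range(m + 1) if j < x)

checks = []
for m, p, gamma in [(50, 0.3, 0.5), (200, 0.1, 0.8), (100, 0.5, 0.3), (400, 0.05, 1.0)]:
    exact = binom_lower_tail(m, p, m * p * (1 - gamma))
    bound = math.exp(-m * p * phi(gamma))
    checks.append((exact, bound))
    print(f"m={m}, p={p}, gamma={gamma}: exact={exact:.3e}, bound={bound:.3e}")

# phi(t) >= t^2 / 4 on [0, 3], with equality at t = 3
grid_ok = all(phi(t) >= t * t / 4 - 1e-12 for t in [i * 0.01 for i in range(301)])
```

Here the variance bound $v = mp$ is used in place of the exact variance $mp(1-p)$, which only weakens the exponent, so the bound remains valid.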
Let $\tilde M$ be the matrix obtained from $M$ by setting all the diagonal blocks identically equal to one, except for one of the blocks that violates strong assortativity. Part (b) applies to $\tilde M$ with $|I^c| = 1$, and hence $\mathrm{SDP}_{\mathrm{sol}}(\tilde M)\neq\{X_0\}$. It then follows from Lemma 4.1 that $\mathrm{SDP}_{\mathrm{sol}}(M)\neq\{X_0\}$, which is the desired result.

The remainder of this section is devoted to proving part (a). We take the dual variable $\Lambda$ to be of the following form:
$$\Lambda = \sum_{k\in I}\lambda_k(-E_{S_k}+n_kI_{S_k}),\quad\text{with }\lambda_k\ge0. \tag{H.1}$$
In order to satisfy $\Lambda X = 0$, it is enough to have $\Lambda1_{S_r} = 0$ for $r\in I$. This holds for the form given in (H.1): $\Lambda1_{S_r} = \lambda_r(-E_{S_r}+n_rI_{S_r})1_{S_r} = \lambda_r(-n_r1_{S_r}+n_r1_{S_r}) = 0$. We also assume the following form for $\Gamma$:
$$\Gamma = \sum_{k\neq\ell}\rho_{k\ell}E_{S_kS_\ell}+\sum_{k\notin I}\big(\gamma_kE_{S_k}-\gamma_kI_{S_k}\big). \tag{H.2}$$
For $k\in I$, we have $\alpha_{kk}\neq0$, and (CSa) implies $\Gamma_{S_k} = 0$. We also have $X_{ii} = 1$ for all $i$, hence $\Gamma_{ii} = 0$ for all $i$. The form given in (H.2) respects these conditions. Let $M := \mathbb{E}[A]$ and recall that $\Lambda = \mu E_n+\mathrm{diag}^*(\nu)-(M+\Gamma)$; hence (H.1) is equivalent to
$$\mu E_{n_k,n_\ell}-(M+\Gamma)_{S_kS_\ell} = 0,\quad k\neq\ell, \tag{H.3}$$
$$\mu E_{n_k}+\mathrm{diag}^*(\nu_{S_k})-M_{S_k} = -\lambda_kE_{n_k}+\lambda_kn_kI_{n_k},\quad k\in I, \tag{H.4}$$
$$\mu E_{n_k}+\mathrm{diag}^*(\nu_{S_k})-(M+\Gamma)_{S_k} = 0,\quad k\notin I. \tag{H.5}$$
Recall that $M_{S_kS_\ell} = q_{k\ell}E_{n_k,n_\ell}$ for $k\neq\ell$, and $M_{S_k} = p_kE_{n_k}$. Hence, (H.3) is equivalent to $\rho_{k\ell} = \mu-q_{k\ell}$ for all $k\neq\ell$. Let us now simplify condition (H.4). Looking at the diagonal, we have $\mu+\nu_i-p_k = (n_k-1)\lambda_k$, $i\in S_k$, $k\in I$. Looking at the off-diagonal, we have $\lambda_k = p_k-\mu$. Hence, $\nu_i = n_k\lambda_k$ for $i\in S_k$, $k\in I$. Now consider (H.5). For $k\notin I$, the diagonal gives $\mu+\nu_i-p_k = 0$, since $M_{ii} = p_k$ and $\Gamma_{ii} = 0$ (because $X_{ii} = 1$). Hence, $\nu_i = p_k-\mu$ for $i\in S_k$, $k\notin I$. The off-diagonal gives $\gamma_k = \mu-p_k$.
The following table summarizes these relationships:
$$\begin{array}{c|c|c} k\in I & k\notin I & \forall(k\neq\ell) \\ \hline \lambda_k = p_k-\mu & \nu_i = p_k-\mu,\ i\in S_k & \rho_{k\ell} = \mu-q_{k\ell} \\ \nu_i = n_k\lambda_k,\ i\in S_k & \gamma_k = \mu-p_k & \end{array}$$
(CSa) implies $\beta_k\gamma_k = 0$ for all $k\notin I$, and $\rho_{k\ell}\alpha_{k\ell} = 0$ for all $k,\ell\in I$, $k\neq\ell$. Together with dual feasibility, namely $\lambda_k\ge0$, $\gamma_k\ge0$ and $\rho_{k\ell}\ge0$, we obtain the following restrictions on $\mu$:
$$\begin{array}{c|c|c} k\in I & k\notin I & \forall(k\neq\ell) \\ \hline \mu\le p_k & \mu\ge p_k & \mu\ge q_{k\ell} \\ & \beta_k(\mu-p_k) = 0 & \alpha_{k\ell}(\mu-q_{k\ell}) = 0,\ k,\ell\in I \end{array} \tag{H.6}$$
It is interesting to note that when $\{q_{k\ell},\ k\neq\ell\}$ are distinct (which is not assumed here), at most one of $\{\alpha_{k\ell}:\ k<\ell,\ k,\ell\in I\}$ is nonzero. This is enforced by the condition in the last row and column of (H.6), since $\mu$ can be equal to at most one of the $q_{k\ell}$. Recall that $(k_0,\ell_0) := \mathrm{argmax}_{k\neq\ell}q_{k\ell}$, which is assumed to be unique, and $I := \{k:\ p_k\ge\max_{k\neq\ell}q_{k\ell}\}$. By the assumption of weak assortativity, $\{k_0,\ell_0\}\subset I$. Let $\mu = q_{k_0\ell_0}$. Then, the condition in the last row and column of (H.6) implies that $\alpha_{k\ell} = 0$ for all $k,\ell\in I$, $k\neq\ell$, with $\{k,\ell\}\neq\{k_0,\ell_0\}$; that is, only $\alpha_{k_0\ell_0} = \alpha_{\ell_0k_0}$ can be nonzero. Also, note that with this choice of $\mu$, the first row of (H.6) is satisfied. Since $\mu>p_k$ for $k\in I^c := [K]\setminus I = \{k:\ p_k<q_{k_0\ell_0}\}$, the condition in the second row and column of (H.6) implies that $\beta_k = 0$ for $k\notin I$. It remains to verify primal feasibility, namely $\langle X,E_n\rangle = mn$. Recall that $n_k = \xi_km$ with $\xi_k\ge1$ (since $m = \min_kn_k$). We need
$$\langle X,E_n\rangle = \sum_{k\in I}n_k^2+2\alpha_{k_0\ell_0}n_{k_0}n_{\ell_0}+\sum_{k\notin I}n_k = nm.$$
Writing $n = \sum_kn_k$ and dividing by $m^2$, this is equivalent to
$$\sum_{k\in I}\xi_k^2+2\alpha_{k_0\ell_0}\xi_{k_0}\xi_{\ell_0}+\frac1m\sum_{k\notin I}\xi_k = \sum_k\xi_k.$$
Some algebra gives the expression $\alpha_{k_0\ell_0} = \alpha^*_{k_0\ell_0}$, where the latter is given in (5.2).
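The last display is linear in $\alpha_{k_0\ell_0}$, and rearranging it gives
$\alpha_{k_0\ell_0} = \big[(1-\tfrac1m)\sum_{k\notin I}\xi_k-\sum_{k\in I}\xi_k(\xi_k-1)\big]/(2\xi_{k_0}\xi_{\ell_0})$. The sketch below solves the equation in exact rational arithmetic and evaluates it on the example of Appendix K ($n = (10,10,5,20,10,10)$, $I = \{1,2,3\}$, $(k_0,\ell_0) = (2,3)$), where the reported value is 0.6. The function name `alpha_star` is only illustrative.

```python
from fractions import Fraction

def alpha_star(xi, I, k0, l0, m):
    """Solve the primal-feasibility equation of Appendix H for alpha_{k0 l0}:
        sum_{k in I} xi_k^2 + 2 alpha xi_{k0} xi_{l0} + (1/m) sum_{k not in I} xi_k = sum_k xi_k.
    Blocks are 1-indexed, as in the text; exact arithmetic via Fraction."""
    xi = {k: Fraction(x) for k, x in enumerate(xi, start=1)}
    outside = [k for k in xi if k not in I]
    # Rearranged: (1 - 1/m) sum_{k not in I} xi_k - sum_{k in I} xi_k (xi_k - 1)
    numerator = (1 - Fraction(1, m)) * sum(xi[k] for k in outside) \
        - sum(xi[k] * (xi[k] - 1) for k in I)
    return numerator / (2 * xi[k0] * xi[l0])

# Example of Appendix K
n_sizes = [10, 10, 5, 20, 10, 10]
m = min(n_sizes)
xi = [Fraction(nk, m) for nk in n_sizes]           # (2, 2, 1, 4, 2, 2)
I = {1, 2, 3}
a = alpha_star(xi, I, 2, 3, m)
print(a, float(a))                                  # 3/5 0.6

# The value satisfies the original (unrearranged) equation exactly.
lhs = sum(xi[k - 1] ** 2 for k in I) + 2 * a * xi[1] * xi[2] \
    + Fraction(1, m) * sum(xi[k - 1] for k in range(1, 7) if k not in I)
```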
Under the stated assumption, $\alpha_{k_0\ell_0}\in[0,1]$, so the constructed solution is primal feasible and the proof is complete.

I Proof of Proposition 6.1

We start by proving part (b). Part (a) then follows by simple modifications to the argument. Throughout, we mainly have the case $m = \min_kn_k$ in mind, which adds to the complexity of the construction of the primal-dual witness. When $m<\min_kn_k$, the set $I^c$ that appears below will be empty and the argument simplifies.

Let $\mathcal{L}_2(X) = X1_n$ and $b_2 = m1_n$. The dual of problem (6.1) is
$$\max_{\Gamma,\rho,\nu,\mu}\ -\langle\rho,b_2\rangle+\langle\nu,1_n\rangle\quad\text{s.t.}\quad\Lambda := \mathrm{diag}^*(\nu)-\mathcal{L}_2^*(\rho)-(A-\mu E_n+\Gamma)\succeq0,\quad\Gamma\ge0,\ \rho\ge0.$$
Besides primal and dual feasibility, we have the following complementary slackness conditions:
(CSa) $\Gamma_{ij}X_{ij} = 0$ for all $i,j$;
(CSb) $\Lambda X = 0$;
(CSc) $\rho_i[\mathcal{L}_2(X)-b_2]_i = 0$ for all $i$.
Consider the potential primal solution given in (6.2) and note that it always satisfies $X_{ii} = 1$, $X\succeq0$ and $X\ge0$. We can use (CSc) to make a reasonable choice of $\{\alpha_k\}$. For $i\in S_k$, $[\mathcal{L}_2(X)]_i = (X1_n)_i = \alpha_kn_k+(1-\alpha_k)$; hence, with $\rho_{S_k} = \frac12\phi_k1_{S_k}$ (the choice made below), $\rho_i[\mathcal{L}_2(X)-b_2]_i = 0$ together with the corresponding primal feasibility condition translates to
$$\phi_k[(\alpha_kn_k+1-\alpha_k)-m] = 0, \tag{I.1}$$
$$\alpha_kn_k+1-\alpha_k\ge m. \tag{I.2}$$
Note that if $n_k>m$, then setting $\alpha_k = 1$ forces $\phi_k = 0$; hence we lose the flexibility associated with $\phi_k$. This suggests that we should avoid setting $\alpha_k = 1$ as much as possible, unless $n_k = m$. For simplicity, let $I_1 := I_1(k_0)$, and recall that $I := \{k:\ n_k>m\}$ and $I_1\subset I$. Let $I_2 := I\setminus I_1$ and note that $\alpha_k$ given in part (b) of the proposition can be written as
$$\alpha_k := \begin{cases}\dfrac{m-1}{n_k-1}<1, & k\in I_2,\\[4pt] 1, & k\in I^c\cup I_1,\end{cases}$$
where $I^c := [K]\setminus I = \{k:\ n_k = m\}$. Note that this choice frees $\phi_k$, for all $k$, to be any nonnegative number, except for $k\in I_1$, where we need $\phi_k = 0$. We now turn to the dual variables.
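The choice of $\alpha_k$ in part (b) is easy to tabulate. The sketch below (the helper name `alpha_choice` is hypothetical) evaluates it on the block sizes of the Appendix K example, for part (b) with $I_1(k_0) = \{1,2\}$ and for part (a), where $I_1 = \emptyset$. It also checks (I.2) and that the row sums equal $m$ exactly on $I_2\cup I^c$, which is what frees $\phi_k$ there via (I.1).

```python
from fractions import Fraction

def alpha_choice(n_sizes, I1, m=None):
    """alpha_k of Proposition 6.1(b): (m-1)/(n_k-1) on I_2 = I - I_1, and 1 on I^c and I_1.
    Blocks are 1-indexed; I = {k : n_k > m}."""
    if m is None:
        m = min(n_sizes)
    alphas = []
    for k, nk in enumerate(n_sizes, start=1):
        if nk > m and k not in I1:            # k in I_2
            alphas.append(Fraction(m - 1, nk - 1))
        else:                                 # k in I^c or I_1
            alphas.append(Fraction(1))
    return alphas

n_sizes = [10, 10, 5, 20, 10, 10]             # block sizes of the Appendix K example
m = min(n_sizes)
alpha_b = alpha_choice(n_sizes, I1={1, 2})    # part (b) with I_1(k_0) = {1, 2}
alpha_a = alpha_choice(n_sizes, I1=set())     # part (a): I_1 empty, I_2 = I
row_sums = [a * nk + 1 - a for a, nk in zip(alpha_b, n_sizes)]   # (X 1_n)_i on block k
print([round(float(a), 3) for a in alpha_b])  # [1.0, 1.0, 1.0, 0.211, 0.444, 0.444]
print([round(float(a), 3) for a in alpha_a])  # [0.444, 0.444, 1.0, 0.211, 0.444, 0.444]
```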
Let us take $\Lambda$ to be of the form
$$\Lambda = \sum_{k=1}^K\lambda_k(-E_{S_k}+n_kI_{S_k}),\quad\lambda_k\ge0.$$
$\Lambda$ is block diagonal, and its $k$th block has eigenvalues $\lambda_k(0,n_k,n_k,\dots,n_k)$. We will choose $\rho_{S_k} = \frac12\phi_k1_{S_k}$. Note that $\mathcal{L}_2^*(\rho) = \rho1_n^T+1_n\rho^T$; hence $[\mathcal{L}_2^*(\rho)]_{S_kS_\ell} = \frac12(\phi_k+\phi_\ell)E_{n_k,n_\ell}$ for all $k,\ell$. With $M := \mathbb{E}[A]$, the following has to hold:
$$\mu E_{n_k,n_\ell}-\tfrac12(\phi_k+\phi_\ell)E_{n_k,n_\ell}-(M+\Gamma)_{S_kS_\ell} = 0,\quad k\neq\ell,$$
$$\mu E_{n_k}+\mathrm{diag}^*(\nu_{S_k})-\phi_kE_{n_k}-M_{S_k} = \lambda_k(-E_{n_k}+n_kI_{n_k}). \tag{I.3}$$
In deriving (I.3), we have used $\Gamma_{S_k} = 0$ for all $k$, which follows from the particular choice of $X$ in (6.2) and (CSa). Let $\psi_k := \mu-\phi_k$. Using $M_{S_kS_\ell} = q_{k\ell}E_{n_k,n_\ell}$, $k\neq\ell$, and $M_{S_k} = p_kE_{n_k}$, we arrive at
$$\Gamma_{S_kS_\ell} = \Big[\tfrac12(\psi_k+\psi_\ell)-q_{k\ell}\Big]E_{n_k,n_\ell},\qquad\psi_k+\nu_i-p_k = (n_k-1)\lambda_k,\ i\in S_k,\qquad\psi_k-p_k = -\lambda_k,$$
where the last two equalities are obtained by considering the diagonal and off-diagonal elements in (I.3). It follows that $\lambda_k = p_k-\psi_k$ and $\nu_i = n_k\lambda_k$, $i\in S_k$. Dual feasibility implies
$$\phi_k = \mu-\psi_k\ge0,\quad\forall k, \tag{I.4}$$
$$\tfrac12(\psi_k+\psi_\ell)-q_{k\ell}\ge0,\quad\forall k\neq\ell, \tag{I.5}$$
$$\lambda_k = p_k-\psi_k\ge0,\quad\forall k. \tag{I.6}$$
(CSb), namely $\Lambda X = 0$, translates to $\lambda_k(1-\alpha_k) = 0$, since $E_{S_k}(-E_{S_k}+n_kI_{S_k}) = 0$ implies $\Lambda X = \sum_k\lambda_k(1-\alpha_k)(-E_{S_k}+n_kI_{S_k})$. In particular, for $k\in I_2$, we have $\alpha_k<1$, hence $\lambda_k = 0$; otherwise, $\lambda_k$ is free to be any nonnegative number. To summarize, (CSb) and (CSc) impose the following restrictions on the dual variables:
$$(\forall k\in I_1)\ \phi_k = 0,\qquad(\forall k\in I_2)\ \lambda_k = 0. \tag{I.7}$$
Recall the inequality $q_{k\ell}\le\frac12(q^*_k+q^*_\ell)$ for all $k\neq\ell$. It follows that by choosing $\psi_k\ge q^*_k$ for all $k$, we can satisfy (I.5). To satisfy (I.4), (I.6) and (I.7), we need
$$(\forall k\in I_1)\ \psi_k = \mu\le p_k,\qquad(\forall k\in I_2)\ \psi_k = p_k\le\mu,\qquad(\forall k\in I^c)\ \psi_k\le\min\{p_k,\mu\}, \tag{I.8}$$
where we note that $I_1$, $I_2$ and $I^c$ form a partition of $[K]$.
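Two facts used above, the spectrum of a block $\lambda_k(-E_{n_k}+n_kI_{n_k})$ and the identity behind (CSb), can be verified on a single block. The block size and the values of $\lambda_k$, $\alpha_k$ are illustrative, and the diagonal block of $X$ is assumed to be $\alpha_kE_{n_k}+(1-\alpha_k)I_{n_k}$, which is consistent with $X_{ii} = 1$ and the row sums $\alpha_kn_k+1-\alpha_k$ used in (I.1).

```python
import numpy as np

nk, lam, alpha = 6, 0.7, 0.4            # illustrative block size, lambda_k, alpha_k
E = np.ones((nk, nk))
I = np.eye(nk)
Lam_k = lam * (-E + nk * I)             # k-th diagonal block of Lambda
X_k = alpha * E + (1 - alpha) * I       # assumed k-th diagonal block of X in (6.2)

eigs = np.sort(np.linalg.eigvalsh(Lam_k))
print(eigs)                             # should be close to lam * (0, nk, ..., nk)
```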
Thus, it is enough to have
$$(\forall k\in I_1)\ \mu\in[q^*_k,p_k],\qquad(\forall k\in I_2)\ \mu\ge p_k,\qquad(\forall k\in I^c)\ q^*_k\le\min\{p_k,\mu\}. \tag{I.9}$$
Since $\mu\in J_{k_0}\subset\bigcap_{k\in I_1(k_0)}[q^*_k,p_k]$, we have $\mu\in[q^*_k,p_k]$ for all $k\in I_1 = I_1(k_0)$. Since we have taken $\mu\ge p_{k_0+1}$, we have $\mu\ge p_k$ for all $k\ge k_0+1$, due to the assumed ordering of $\{p_k\}$. In particular, $\mu\ge p_k$ for all $k\in I_2$. For $k\in I^c$, either $k\le k_0$, in which case $\mu\in[q^*_k,p_k]$, i.e., $q^*_k\le\mu = \min\{p_k,\mu\}$, or $k\ge k_0+1$, in which case $\mu\ge p_k$, hence $q^*_k\le p_k = \min\{\mu,p_k\}$. Thus, all the conditions in (I.9) are met and the proof is complete.

Proof of part (a). The argument here is similar to that of part (b). In addition to setting $\mu = 0$, the main difference is that (CSc) and the dual feasibility condition $\rho_i\ge0$, $\forall i$, are replaced by the single primal feasibility condition $\mathcal{L}_2(X)-b_2 = 0$. Note that there is no nonnegativity assumption on $\rho$ anymore. The argument goes through if we take $I_1 = \emptyset$ and $I_2 = I$, which ensures that $X1_n-m1_n = 0$. We now have $\psi_k = -\phi_k$, and the dual feasibility conditions reduce to (I.5) and (I.6). Furthermore, (I.7) simplifies to $(\forall k\in I)\ \lambda_k = 0$, since only (CSb) is present. Thus, it is enough to have $\psi_k\ge q^*_k$ for all $k$ and
$$(\forall k\in I)\ \psi_k = p_k,\qquad(\forall k\in I^c)\ \psi_k\le p_k. \tag{I.10}$$
Since $q^*_k\le p_k$ for all $k$ by assumption, it is clearly possible to choose $\psi_k$ to satisfy these conditions.

J Implementation of SDP-1

It is straightforward to adapt a first-order method to solve the SDP-1 problem (3.5). We briefly discuss the implementation of an ADMM solver [13]. We start by rewriting the problem as
$$\inf_{X,Y,Z}\ \Big(-\langle A,X\rangle+\delta_{\{\tilde{\mathcal L}(X) = \tilde b\}}+\delta_{\{Z\ge0\}}+\delta_{\{Y\succeq0\}}\Big)\quad\text{s.t.}\quad X = Z,\ X = Y,$$
where $\delta_S$ is the indicator of the set $S$, defined by $\delta_S(x) = 0$ if $x\in S$ and $\delta_S(x) = \infty$ otherwise, and $\tilde{\mathcal L}:\mathbb{R}^{n\times n}\to\mathbb{R}^{2n}$ is a linear operator such that $\tilde{\mathcal L}(X) = \tilde b$ collects the affine constraints in (3.5). More precisely, for $i = 1,\dots,n$, we take $[\tilde{\mathcal L}(X)]_i = \langle X,H_i\rangle$ and $[\tilde{\mathcal L}(X)]_{i+n} = \langle X,F_i\rangle$. Here, $H_i$ is the symmetric matrix with 1 in the off-diagonal elements of the $i$-th column and row, and 0 everywhere else; $F_i$ is the matrix with element $(i,i)$ equal to 1 and 0 everywhere else. Finally, $\tilde b_i = 2((n/K)-1)$ for $i = 1,\dots,n$ and $\tilde b_i = 1$ otherwise. ($\tilde{\mathcal L}$ is a variation of the operator $\mathcal L$ that appears in Section 4.2.2, chosen so that $H_i$ is orthogonal to $F_j$ for all $i,j$. However, $\{\tilde{\mathcal L}(X) = \tilde b\}$ and $\{\mathcal L(X) = b\}$ describe the same affine subspace.)

The only real work in deriving the ADMM updates is to find the projection operator $\Pi_{\mathcal A}$ onto $\mathcal A := \{X:\ \tilde{\mathcal L}(X) = \tilde b\}$. For any $Y$, this projection is given by
$$\Pi_{\mathcal A}(Y) := Y-\tilde{\mathcal L}^*(\tilde{\mathcal L}\tilde{\mathcal L}^*)^{-1}[\tilde{\mathcal L}(Y)-\tilde b]. \tag{J.1}$$
Note that $\langle H_i,F_j\rangle = 0$ for all $i,j = 1,\dots,n$. Hence, $\tilde{\mathcal L}\tilde{\mathcal L}^*$ is block diagonal with two blocks, $(\langle H_i,H_j\rangle) = 2[(n-2)I_n+1_n1_n^T]$ and $(\langle F_i,F_j\rangle) = I_n$. It follows that
$$(\tilde{\mathcal L}\tilde{\mathcal L}^*)^{-1} = \mathrm{diag}\Big(\frac{1}{2(n-2)}\Big(I_n-\frac{1_n1_n^T}{2n-2}\Big),\ I_n\Big).$$
We also have $\tilde{\mathcal L}^*(\tilde\mu,\nu) = \sum_i\tilde\mu_iH_i+\sum_i\nu_iF_i = (\tilde\mu_i+\tilde\mu_j)_{i\neq j}+\mathrm{diag}(\nu)$, which gives a complete recipe to compute $\Pi_{\mathcal A}(Y)$. Due to the simplicity of $(\tilde{\mathcal L}\tilde{\mathcal L}^*)^{-1}$ and $\tilde{\mathcal L}^*$, implementing this projection has essentially the same computational cost as projecting onto an affine set with two constraints, $\{X:\ \mathrm{tr}(X) = n,\ \langle E_n,X\rangle = n^2/K\}$, which is needed for implementing SDP-2.
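The projection (J.1) and an ADMM loop for the splitting $X = Z$, $X = Y$ can be sketched in numpy as follows. This is a minimal illustration under stated assumptions, not the authors' solver: the penalty $\rho = 1$, zero initialization, the fixed iteration count and the absence of a stopping rule are arbitrary choices. In scaled form, the $X$-step minimizes $-\langle A,X\rangle+\frac\rho2\|X-Z+U\|^2+\frac\rho2\|X-Y+V\|^2$ over $\mathcal A$, which reduces to projecting $\frac12(Z-U+Y-V+A/\rho)$ onto $\mathcal A$. The $Z$-step clips at zero, and the $Y$-step truncates negative eigenvalues.

```python
import numpy as np

def L_tilde(X):
    """[L(X)]_i = <X, H_i> = sum_{j != i} (X_ij + X_ji);  [L(X)]_{i+n} = X_ii."""
    off = X - np.diag(np.diag(X))
    return np.concatenate([off.sum(axis=1) + off.sum(axis=0), np.diag(X)])

def L_tilde_adj(mu, nu):
    """L*(mu, nu) = (mu_i + mu_j)_{i != j} + diag(nu)."""
    M = mu[:, None] + mu[None, :]
    np.fill_diagonal(M, 0.0)
    return M + np.diag(nu)

def project_affine(Y, K):
    """Projection (J.1) onto {X : L(X) = b}, using the closed form of (L L*)^{-1}."""
    n = Y.shape[0]
    b = np.concatenate([np.full(n, 2.0 * (n / K - 1.0)), np.ones(n)])
    r = L_tilde(Y) - b
    r_h, r_f = r[:n], r[n:]
    w_h = (r_h - r_h.sum() / (2 * n - 2)) / (2 * (n - 2))   # H-block of (L L*)^{-1}
    return Y - L_tilde_adj(w_h, r_f)                           # F-block is the identity

def project_psd(Y):
    """Projection onto the PSD cone: truncate negative eigenvalues at zero."""
    w, Vec = np.linalg.eigh((Y + Y.T) / 2)
    return (Vec * np.maximum(w, 0.0)) @ Vec.T

def admm_sdp1(A, K, rho=1.0, iters=500):
    """Scaled-form ADMM for: max <A, X> s.t. X in the affine set, X >= 0, X PSD."""
    n = A.shape[0]
    X = np.zeros((n, n)); Z = np.zeros((n, n)); Y = np.zeros((n, n))
    U = np.zeros((n, n)); V = np.zeros((n, n))
    for _ in range(iters):
        X = project_affine(0.5 * (Z - U + Y - V + A / rho), K)
        Z = np.maximum(0.0, X + U)
        Y = project_psd(X + V)
        U = U + X - Z
        V = V + X - Y
    return X, Z, Y

# Usage on an illustrative planted partition sample (K = 3 blocks of size 10).
rng = np.random.default_rng(4)
K, m = 3, 10
n = K * m
labels = np.repeat(np.arange(K), m)
probs = np.where(labels[:, None] == labels[None, :], 0.8, 0.1)
T = np.triu(rng.random((n, n)) < probs, 1).astype(float)
A = T + T.T
X, Z, Y = admm_sdp1(A, K)
X0 = np.kron(np.eye(K), np.ones((m, m)))
print("max |X - X0| =", np.abs(X - X0).max())
```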
The ADMM updates are easily derived to be
$$\begin{aligned}
X^{k+1} &= \Pi_{\mathcal A}\Big( \tfrac{1}{2}\big( Z^k - U^k + Y^k - V^k + \tfrac{1}{\rho} A \big) \Big), \\
Z^{k+1} &= \max\{0,\ X^{k+1} + U^k\}, \\
Y^{k+1} &= \Pi_{S^n_+}\big( X^{k+1} + V^k \big), \\
U^{k+1} &= U^k + X^{k+1} - Z^{k+1}, \\
V^{k+1} &= V^k + X^{k+1} - Y^{k+1},
\end{aligned}$$
where the maximum is taken entrywise and $\Pi_{S^n_+}$ is the projection onto the PSD cone $S^n_+$, which can be computed by truncating the negative eigenvalues to zero. The ADMM updates for SDP-2 and SDP-3 can be derived similarly, as in [14].

K Details on Figure 1: comparing the theoretical predictions with empirical results

Figure 1 provides an illustration of the content of Propositions 5.1 and 6.1. The results are obtained by numerically solving the SDPs; here, we explain how they match our theoretical results. The leftmost panel corresponds to the mean matrix $M = \mathbb{E}[A]$ of a weakly assortative block model, randomly picked among such models. The specific edge probability matrix is as follows:
$$\begin{pmatrix}
0.670 & 0.072 & 0.020 & 0.023 & 0.186 & 0.187 \\
0.072 & 0.570 & 0.521 & 0.016 & 0.360 & 0.107 \\
0.020 & 0.521 & 0.555 & 0.048 & 0.311 & 0.188 \\
0.023 & 0.016 & 0.048 & 0.494 & 0.081 & 0.137 \\
0.186 & 0.360 & 0.311 & 0.081 & 0.475 & 0.031 \\
0.187 & 0.107 & 0.188 & 0.137 & 0.031 & 0.195
\end{pmatrix}.$$
There are six blocks, of sizes $n = (10, 10, 5, 20, 10, 10)$. The parameters $p_k$ and $q^*_k = \max_{\ell \ne k} q_{k\ell}$ for each of the $K = 6$ blocks are as follows:
$$\begin{array}{c|cccccc}
k & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
q^*_k & 0.187 & 0.521 & 0.521 & 0.137 & 0.360 & 0.188 \\
p_k & 0.670 & 0.570 & 0.555 & 0.494 & 0.475 & 0.195
\end{array} \tag{K.1}$$
The overall maximum of the off-diagonal entries is $\max_k q^*_k = 0.521$. It is clear that the last three blocks violate strong assortativity, since $p_k < 0.521$ for $k = 4, 5, 6$.

We can use part (a) of Proposition 5.1 to predict the behavior of SDP-2′. Note that for this example, $(k_0, \ell_0) := \operatorname{argmax}_{k < \ell} q_{k\ell} = (2, 3)$. We have $m = \min_k n_k = 5$, hence $(\xi_k) = (2, 2, 1, 4, 2, 2)$, where $\xi_k = n_k/m$; moreover, $I = \{k : p_k \ge q_{k_0\ell_0} = 0.521\} = \{1, 2, 3\}$, $\xi_{k_0} = 2$ and $\xi_{\ell_0} = 1$. It then follows that
$$\alpha^*_{k_0\ell_0} = \frac{1}{2(2 \cdot 1)} \Big[ \Big(1 - \frac{1}{5}\Big)(4 + 2 + 2) - (2 \cdot 1 + 2 \cdot 1 + 1 \cdot 0) \Big] = \frac{1}{4}(6.4 - 4) = 0.6.$$
Since $\alpha^*_{k_0\ell_0} \in [0, 1]$, the conditions of part (a) of Proposition 5.1 are met and the solution is of the form (5.2) with $\alpha_1 = \alpha_2 = \alpha_3 = 1$, $\alpha_{23} = \alpha_{32} = 0.6$ and $\beta_k = 0$ for $k = 4, 5, 6$.

Let us now apply Proposition 6.1 to predict the behavior of SDP-13. Recall that $J_k = \bigcap_{r=1}^{k} [q^*_r, p_r]$, which in this case gives
$$\begin{array}{c|cccccc}
k & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
J_k & [0.187, 0.670] & [0.521, 0.570] & [0.521, 0.555] & \emptyset & \emptyset & \emptyset
\end{array}$$
We have $k_0 = \max\{k : J_k \ne \emptyset\} = 3$, and we can apply SDP-13 with any $\mu \in J_3 \cap [p_4, 1] = [0.521, 0.555] \cap [0.494, 1] = [0.521, 0.555]$. The images in Figure 1 are generated with $\mu = 0.55$ and $m = \min_k n_k = 5$. We have $I = \{k : n_k > m\} = \{k : \xi_k > 1\} = \{1, 2, 4, 5, 6\}$, hence $I_1(k_0) = I_1(3) = \{1, 2\}$. Proposition 6.1(b) now applies, and we have a solution of the form (6.2) with
$$\begin{array}{c|cccccc}
k & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
\alpha_k & 1 & 1 & 1 & 0.211 & 0.444 & 0.444
\end{array}$$
Note that we have three perfectly recovered blocks, in the sense discussed after Proposition 6.1. Finally, part (a) of Proposition 6.1 predicts that SDP-1, applied with $m = 5$, has a solution of the form (6.2) with
$$\begin{array}{c|cccccc}
k & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
\alpha_k & 0.444 & 0.444 & 1 & 0.211 & 0.444 & 0.444
\end{array}$$
All the above predictions match what is empirically reported in Figure 1. We also note that although the behavior of SDP-3 is not covered by Propositions 5.1 and 6.1, its solution can also be predicted by careful examination of the proofs.

L Details on Figure 2: the bias of normalized mutual information for large K

The top left panel in Figure 6 shows the average (empirical) NMI of random guessing as a function of $K$. Contrary to popular belief, the empirical NMI does not automatically adjust so that random guessing corresponds to zero NMI, unless $K$ is small.
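To see this effect numerically, the sketch below (our illustration, not the code behind Figure 6) estimates the average NMI between a balanced true labeling and uniformly random guesses. It assumes the arithmetic-mean normalization $\mathrm{NMI} = I(a; b) / \{(H(a) + H(b))/2\}$, which is one of several conventions in use; the normalization behind Figure 6 is not specified here, and the names `nmi` and `chance_nmi` are hypothetical.

```python
import numpy as np


def nmi(a, b):
    """Empirical NMI, normalized by the arithmetic mean of the two entropies (assumed convention)."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    counts = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(counts, (ai, bi), 1)         # confusion matrix of label pairs
    P = counts / len(a)
    pa, pb = P.sum(axis=1), P.sum(axis=0)  # empirical marginals (all positive)
    nz = P > 0
    mi = np.sum(P[nz] * np.log(P[nz] / np.outer(pa, pb)[nz]))
    ha, hb = -np.sum(pa * np.log(pa)), -np.sum(pb * np.log(pb))
    denom = 0.5 * (ha + hb)
    return mi / denom if denom > 0 else 1.0


def chance_nmi(n, K, reps=200, seed=0):
    """Average NMI between balanced true labels and uniformly random guesses."""
    rng = np.random.default_rng(seed)
    truth = np.arange(n) % K
    return float(np.mean([nmi(truth, rng.integers(K, size=n)) for _ in range(reps)]))


if __name__ == "__main__":
    for K in (2, 10, 30, 60):
        print(K, round(chance_nmi(120, K), 3))
```

With $n = 120$ fixed, the printed chance-level NMI should increase with $K$, qualitatively in line with the top left panel of Figure 6; subtracting such an estimate from a raw NMI curve is the kind of chance adjustment discussed in this appendix.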
The NMI is designed to do so based on population quantities, but for the population quantities to be accurately approximated by empirical ones, one needs concentration of the counts in the confusion matrix around their means, which does not happen unless $n/K$ is sufficiently large. Figure 6 shows the plots of Figure 2 after adjusting for random guessing by subtracting the corresponding average NMI. As one would expect, the dip in the curves goes away after the adjustment.

[Figure 6 panels: top left, NMI of chance versus K (n = 120); other panels, NMI versus K for n = 120, β = 0.05 and d = 7.0, 5.0, 3.0, comparing SDP-1, SDP-2, SDP-3 and EVT.]

Figure 6: Top left: average NMI of random guessing (or chance). The other three plots correspond to those of Figure 2 but with raw NMI (no adjustment for chance).

Acknowledgements

We would like to thank Karl Rohe and David Choi for interesting discussions regarding the possibility of extending SDPs to mixed models. This research has been partially supported by NSF grants DMS-1106772, DMS-1159005, and DMS-1521551.

References

[1] E. Abbe, A. S. Bandeira, and G. Hall. Exact recovery in the stochastic block model. IEEE Transactions on Information Theory, 62(1):471–487, May 2016.
[2] E. Abbe and C. Sandon. Community detection in general stochastic block models: fundamental limits and efficient recovery algorithms. arXiv preprint arXiv:1503.00609, Feb. 2015.
[3] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing. Mixed membership stochastic blockmodels. J. Machine Learning Research, 9:1981–2014, 2008.
[4] E. M. Airoldi, T. B. Costa, and S. H. Chan. Stochastic blockmodel approximation of a graphon: Theory and consistent estimation. In Advances in NIPS 26, pages 692–700, 2013.
[5] D. J. Aldous. Representations for partially exchangeable arrays of random variables. J. Multivariate Analysis, 11:581–598, 1981.
[6] B. P. W. Ames and S. A. Vavasis. Convex optimization for the planted k-disjoint-clique problem. Mathematical Programming, 143(1):299–337, Aug. 2014.
[7] A. A. Amini, A. Chen, P. Bickel, and E. Levina. Fitting community models to large sparse networks. Annals of Statistics, 41(4):2097–2122, 2013.
[8] A. A. Amini and M. J. Wainwright. High-dimensional analysis of semidefinite relaxations for sparse principal components. The Annals of Statistics, 37(5B):2877–2921, Oct. 2009.
[9] P. Awasthi, A. S. Bandeira, M. Charikar, R. Krishnaswamy, S. Villar, and R. Ward. Relax, no need to round: integrality of clustering formulations. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science, pages 191–200, Aug. 2015.
[10] P. J. Bickel and A. Chen. A nonparametric view of network models and Newman-Girvan and other modularities. Proceedings of the National Academy of Sciences of the United States of America, 106(50):21068–73, Dec. 2009.
[11] P. J. Bickel and A. Chen. A nonparametric view of network models and Newman-Girvan and other modularities. Proc. Natl. Acad. Sci. USA, 106:21068–21073, 2009.
[12] P. J. Bickel, D. Choi, X. Chang, and H. Zhang. Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. Annals of Statistics, 41:1922–1943, 2013.
[13] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2010.
[14] T. Cai and X. Li. Robust and computationally feasible community detection in the presence of arbitrary outlier nodes. The Annals of Statistics, 43(3):1027–1059, 2015.
[15] A. Celisse, J.-J. Daudin, and L. Pierre. Consistency of maximum-likelihood and variational estimators in the stochastic block model. Electronic Journal of Statistics, 6:1847–1899, 2012.
[16] K. Chaudhuri, F. Chung, and A. Tsiatas. Spectral clustering of graphs with general degrees in the extended planted partition model. JMLR Workshop and Conference Proceedings, 23:35.1–35.23, 2012.
[17] Y. Chen, S. Sanghavi, and H. Xu. Clustering Sparse Graphs. In NIPS, pages 2204–2212, 2012.
[18] Y. Chen and J. Xu. Statistical-Computational Tradeoffs in Planted Problems and Submatrix Localization with a Growing Number of Clusters and Submatrices. arXiv preprint arXiv:1402.1267, 2014.
[19] A. D'Aspremont, L. El Ghaoui, M. I. Jordan, and G. R. Lanckriet. A direct formulation for sparse PCA using semidefinite programming. SIAM Review, 49(3):434–448, 2007.
[20] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84(6):066106, Sept. 2011.
[21] A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová. Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Physical Review E, 84:066106, 2012.
[22] U. Feige and E. Ofek. Spectral techniques applied to sparse random graphs. Random Structures & Algorithms, 27(2):251–275, 2005.
[23] C. Gao, Y. Lu, and H. H. Zhou. Rate-optimal Graphon Estimation. The Annals of Statistics, 43(6):2624–2652, 2015.
[24] O. Guédon and R. Vershynin. Community detection in sparse networks via Grothendieck's inequality. Nov. 2014.
[25] B. Hajek, Y. Wu, and J. Xu. Achieving Exact Cluster Recovery Threshold via Semidefinite Programming. arXiv preprint arXiv:1412.6156, Nov. 2014.
[26] B. Hajek, Y. Wu, and J. Xu. Achieving Exact Cluster Recovery Threshold via Semidefinite Programming: Extensions. arXiv preprint arXiv:1502.07738, Feb. 2015.
[27] P. W. Holland, K. B. Laskey, and S. Leinhardt. Stochastic blockmodels: first steps. Social Networks, 5(2):109–137, 1983.
[28] A. Joseph and B. Yu. Impact of regularization on Spectral Clustering. arXiv preprint arXiv:1312.1733, 2013.
[29] O. Klopp, A. B. Tsybakov, and N. Verzelen. Oracle inequalities for network models and sparse graphon estimation. arXiv preprint arXiv:1507.04118, Jul. 2015.
[30] C. M. Le, E. Levina, and R. Vershynin. Sparse random graphs: regularization and concentration of the Laplacian. arXiv preprint arXiv:1502.03049, Feb. 2015.
[31] J. Lei and A. Rinaldo. Consistency of spectral clustering in sparse stochastic block models. 2013. arXiv:1312.2050.
[32] D. Lusseau and M. E. J. Newman. Identifying the role that animals play in their social networks. Proceedings of the Royal Society, Ser. B: Biological Sciences, 271:S477–S481, 2004.
[33] L. Massoulié. Community detection thresholds and the weak Ramanujan property. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, pages 694–703, Nov. 2014.
[34] C. Mathieu and W. Schudy. Correlation clustering with noisy input. In Proceedings of the twenty-first annual ACM, pages 712–728, 2010.
[35] E. Mossel, J. Neeman, and A. Sly. Stochastic block models and reconstruction. arXiv:1202.1499, 2012.
[36] E. Mossel, J. Neeman, and A. Sly. A proof of the block model threshold conjecture. arXiv preprint arXiv:1311.4115, Nov. 2013.
[37] E. Mossel, J. Neeman, and A. Sly. Consistency Thresholds for Binary Symmetric Block Models. arXiv preprint arXiv:1407.1591, July 2014.
[38] K. Nowicki and T. A. B. Snijders. Estimation and prediction for stochastic blockstructures. Journal of the American Statistical Association, 96(455):1077–1087, 2001.
[39] S. C. Olhede and P. J. Wolfe. Network histograms and universality of blockmodel approximation. Proceedings of the National Academy of Sciences, 111(41):14722–14727, 2014.
[40] J. Peng and Y. Wei. Approximating k-means-type clustering via semidefinite programming. SIAM Journal on Optimization, 18(1):186–205, 2007.
[41] T. Qin and K. Rohe. Regularized spectral clustering under the degree-corrected stochastic blockmodel. 2013. arXiv:1309.4111.
[42] K. Rohe, S. Chatterjee, and B. Yu. Spectral clustering and the high-dimensional stochastic block model. Annals of Statistics, 39(4):1878–1915, 2011.
[43] T. Snijders and K. Nowicki. Estimation and prediction for stochastic block-structures for graphs with latent block structure. Journal of Classification, 14:75–100, 1997.
[44] D.-C. Tomozei and L. Massoulié. Distributed user profiling via spectral methods. In ACM SIGMETRICS Performance Evaluation Review, pages 383–384, 2010.
[45] V. Vu, J. Cho, J. Lei, and K. Rohe. Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA. In NIPS, pages 1–9, 2013.
[46] M. J. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using constrained quadratic programming (Lasso). IEEE Trans. Info. Theory, 55(5):2183–2202, 2009.
[47] P. J. Wolfe and S. C. Olhede. Nonparametric graphon estimation. arXiv preprint arXiv:1309.5936, 2013.
[48] E. P. Xing and M. I. Jordan. On semidefinite relaxations for normalized k-cut and connections to spectral clustering. Technical report, UC Berkeley, 2003.
[49] Y. Zhang, E. Levina, and J. Zhu. Estimating network edge probabilities by neighborhood smoothing. arXiv preprint arXiv:1509.08588, Sep. 2015.
[50] Y. Zhao, E. Levina, and J. Zhu. Consistency of community detection in networks under degree-corrected stochastic block models. Annals of Statistics, 40(4):2266–2292, 2012.