Compressive Network Analysis
Modern data acquisition routinely produces massive amounts of network data. Though many methods and models have been proposed to analyze such data, the research of network data is largely disconnected with the classical theory of statistical learning…
Authors: Xiaoye Jiang, Yuan Yao, Han Liu
Xiaoye Jiang¹, Yuan Yao², Han Liu³, Leonidas Guibas¹
¹ Stanford University; ² Peking University; ³ Johns Hopkins University

October 17, 2018

Abstract

Modern data acquisition routinely produces massive amounts of network data. Though many methods and models have been proposed to analyze such data, research on network data is largely disconnected from the classical theory of statistical learning and signal processing. In this paper, we present a new framework for modeling network data, which connects two seemingly different areas: network data analysis and compressed sensing. From a nonparametric perspective, we model an observed network using a large dictionary. In particular, we consider the network clique detection problem and show connections between our formulation and a new algebraic tool, namely Radon basis pursuit in homogeneous spaces. Such a connection allows us to identify rigorous recovery conditions for clique detection problems. Though this paper is mainly conceptual, we also develop practical approximation algorithms for solving empirical problems and demonstrate their usefulness on real-world datasets.

Keywords: network data analysis, compressive sensing, Radon basis pursuit, restricted isometry property, clique detection.

I. Introduction

In the past decade, research on network data has increased dramatically. Examples include scientific studies involving web data or hypertext documents connected via hyperlinks, social networks or user profiles connected via friend links, co-authorship and citation networks connected by collaboration or citation relationships, gene or protein networks connected by regulatory relationships, and much more. Such data appear frequently in modern application domains and have led to numerous high-impact applications.
For instance, detecting anomalies in ad-hoc information networks is vital for corporate and government security; exploring hidden community structures helps us better conduct online advertising and marketing; inferring large-scale gene regulatory networks is crucial for new drug design and disease control. Due to the increasing importance of network data, principled analytical and modeling tools are crucially needed.

Towards this goal, researchers from the network modeling community have proposed many models to explore and predict network data. These models roughly fall into two categories: static and dynamic models. For a static model, only one single snapshot of the network is observed. In contrast, dynamic models can be applied to analyze datasets that contain many snapshots of the network indexed by different time points. Examples of static network models include the Erdös-Rényi-Gilbert random graph model (Erdös and Rényi, 1959, 1960), the $p_1$ (Holland and Leinhardt, 1981), $p_2$ (Duijn et al., 2004) and more general exponential random graph (or $p^*$) model (Wasserman and Pattison, 1996), latent space model (Hoff et al., 2001), block model (Lorrain and White, 1971), stochastic blockmodel (Wasserman and Anderson, 1987), and mixed membership stochastic blockmodel (Airoldi et al., 2008). Examples of dynamic network models include the preferential attachment model (Barabasi and Albert, 1999), the small-world model (Watts and Strogatz, 1998), duplication-attachment model (Kleinberg et al., 1999; Kumar et al., 2000), continuous time Markov model (Snijders, 2005), and dynamic latent space model (Sarkar and Moore, 2005). A comprehensive review of these models is provided in Goldenberg et al. (2010).
Though many methods and models have been proposed, research on network data analysis is largely disconnected from the classical theory of statistical learning and signal processing. The main reason is that, unlike the usual scientific data for which independent measurements can be repeatedly collected, network data are in general collected in one single realization and the nodes within the network are highly relational due to the existence of many linkages. Such a disconnection prevents us from directly exploiting state-of-the-art statistical learning methods and theory to analyze network data.

To bridge this gap, we present a novel framework to model network data. Our framework assumes that the observed network has a sparse representation with respect to some dictionary (or basis space). Once the dictionary is given, we formulate the network modeling problem as a compressed sensing problem. Compressed sensing, also known as compressive sensing and compressive sampling, is a technique for finding sparse solutions to underdetermined linear systems. In statistical machine learning, it is related to reconstructing a signal which has a sparse representation in a large dictionary. The field of compressed sensing has existed for decades, but recently it has exploded due to the important contributions of Candès and Tao (2005, 2007); Candès (2008); Tsaig and Donoho (2006). By viewing the observed network adjacency matrix as the output of an underlying function evaluated on a discrete domain of network nodes, we can formulate the network modeling problem as a compressed sensing problem.

Specifically, we consider the network clique detection problem within this novel framework.
By considering a generative model in which the observed adjacency matrix is assumed to have a sparse representation in a large dictionary where each basis corresponds to a clique, we connect our framework with a new algebraic tool, namely Radon basis pursuit in homogeneous spaces. Our problem can be regarded as an extension of the work in Jagabathula and Shah (2008), which studies sparse recovery of functions on permutation groups, while we reconstruct functions on $k$-sets (cliques), often called the homogeneous space associated with a permutation group in the literature (Diaconis, 1988). It turns out that the discrete Radon basis becomes the natural choice instead of the Fourier basis considered in Jagabathula and Shah (2008). This leaves us with a new challenge: addressing noiseless exact recovery and stable recovery with noise. Unfortunately, the greedy algorithm for exact recovery in Jagabathula and Shah (2008) cannot be applied to noisy settings, and in general the Radon basis does not satisfy the Restricted Isometry Property (RIP) (Candès, 2008), which is crucial for universal recovery. In this paper, we develop new theories and algorithms which guarantee exact, sparse, and stable recovery under the choice of Radon basis. These theories have deep roots in Basis Pursuit (Chen et al., 1999) and its extensions with uniformly bounded noise. Though this paper is mainly conceptual, showing the connection between network modeling and compressed sensing, we also provide some rigorous theoretical analysis and practical algorithms for the clique recovery problem to illustrate the usefulness of our framework.

The main content of this paper can be summarized as follows. Section 2 presents the general framework of compressive network analysis.
In Sections 3, 4 and 5, we consider the clique detection problem under the compressive network analysis framework. A polynomial time approximation algorithm is provided in Section 6 for the clique detection problem. We also demonstrate successful application examples in Section 7. Section 8 concludes the paper.

II. Main Idea

In this section we present the general framework of compressive network analysis with a nonparametric view. We start by introducing notation: let $u = (u_1, \ldots, u_d)^T \in \mathbb{R}^d$ be a vector and $I(\cdot)$ be the indicator function. We denote

\[
\|u\|_0 \equiv \sum_{j=1}^{d} I(u_j \neq 0), \qquad
\|u\|_2 \equiv \sqrt{\sum_{j=1}^{d} u_j^2}, \qquad
\|u\|_\infty \equiv \max_j |u_j|. \tag{2.1}
\]

We also denote by $\langle \cdot, \cdot \rangle$ the Euclidean inner product and $\mathrm{sign}(u) = (\mathrm{sign}(u_1), \ldots, \mathrm{sign}(u_d))^T$, where

\[
\mathrm{sign}(u_j) =
\begin{cases}
+1 & u_j > 0 \\
0 & u_j = 0 \\
-1 & u_j < 0.
\end{cases} \tag{2.2}
\]

We represent a network as a graph $G = (V, E)$, where $V = \{1, \ldots, n\}$ is the set of nodes and $E \subset V \times V$ is the set of edges. Let $B \in \mathbb{R}^{n \times n}$ be the adjacency matrix of the observed network, where $B_{ij}$ represents a quantity associated with nodes $i$ and $j$. Without loss of generality, we assume that $B$ is symmetric: $B = B^T$ and $\mathrm{diag}(B) = 0$. With these assumptions, to model $B$ we only need to model its upper triangle. For notational simplicity, we squeeze $B$ into a vector $b \in \mathbb{R}^M$, where $M = n(n-1)/2$ is the number of upper-triangle elements in $B$. Let $f(V) \in \mathbb{R}^M$ be an unknown vector-valued function defined on $V$. We assume a generative model of the observed adjacency matrix $B$ (or equivalently, $b$):

\[
b = f(V) + z, \tag{2.3}
\]

where $z \in \mathbb{R}^M$ is a noise vector. We can view $f(V)$ as evaluating a possibly infinite-dimensional function $f$ on a discrete set $V$; thus the model (2.3) is intrinsically nonparametric and can model any static network.
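To make the vectorization step concrete, the following Python sketch flattens the strict upper triangle of a symmetric adjacency matrix into a vector of length $M = n(n-1)/2$. The function name `vectorize_upper_triangle` and the lexicographic ordering of node pairs are assumptions made here for illustration; the text does not fix an ordering.

```python
from itertools import combinations


def vectorize_upper_triangle(B):
    """Flatten the strict upper triangle of a symmetric matrix B (list of lists).

    Node pairs (i, j) with i < j are visited in lexicographic order,
    which is an illustrative choice, not one prescribed by the paper.
    """
    n = len(B)
    return [B[i][j] for i, j in combinations(range(n), 2)]


# A toy 4-node network: nodes 0, 1, 2 interact pairwise; node 3 is isolated.
B = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
n = len(B)
b = vectorize_upper_triangle(B)
```

Any fixed ordering of the pairs works, as long as the same ordering is used to index the rows of the dictionary introduced below.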
Without further regularity conditions or constraints, there is no hope for us to reliably estimate $f$. In our framework, we assume that $f$ has a sparse representation with respect to an $M$-by-$N$ dictionary $A = [\phi_1(V), \ldots, \phi_N(V)]$, where each $\phi_j(V) \in \mathbb{R}^M$ is a basis function, i.e., there exists a subset $S \subset \{1, \ldots, N\}$ with cardinality $|S| \ll N$, such that

\[
f(V) = \sum_{q \in S} x_q \phi_q(V). \tag{2.4}
\]

In the sequel, we denote by $A_{pq}$ the element on the $p$-th row and $q$-th column of $A$. Here $p$ indexes a pair of different nodes and $q$ indexes a basis $\phi_q(V)$. To estimate $f$, we only need to reconstruct $x = (x_1, \ldots, x_N)^T$. Given the dictionary $A$, we can estimate $f$ by solving the following program:

\[
(P_0) \qquad \min \|x\|_0 \quad \text{s.t.} \quad \|b - Ax\|_z \le \delta \tag{2.5}
\]

where $\|\cdot\|_z$ is a vector norm constructed using the knowledge of $z$. The problem in (2.5) is non-convex. In the sparse learning literature, a convex relaxation of (2.5) can be written as

\[
(P_1) \qquad \min \|x\|_1 \quad \text{s.t.} \quad \|b - Ax\|_z \le \delta. \tag{2.6}
\]

One thing to note is that the dictionary $A$ can either be constructed based on domain knowledge or be learned from empirical data. For simplicity, we always assume $A$ is pre-given in this paper. In the following sections, we use the clique detection problem as a case study to illustrate the usefulness of this framework.

III. Clique Detection

In network data analysis, the problem of identifying communities or cliques¹ based on partial information arises frequently in many applications, including identity management (Guibas, 2008), statistical ranking (Diaconis, 1988; Jagabathula and Shah, 2008), and social networks (Leskovec et al., 2010). In these applications we are typically given a network with its nodes representing players, items, or characters, and edge weights summarizing the observed pairwise interactions.
The basic problem is to determine communities or cliques within the network by observing the frequencies of low order interactions, since in reality such low order interactions are often governed by a considerably smaller number of high order communities or cliques. Therefore the clique detection problem can be formulated as compressed sensing of cliques in large networks. To solve this problem, one has to answer two questions: (i) what is the suitable representation basis, and (ii) what is the reconstruction method?

Before rigorously formulating the problem, we provide three motivating examples as a glimpse of typical situations which can be addressed within the framework in this paper.

Example 1 (Tracking Team Identities) We consider the scenario of multiple targets moving in an environment monitored by sensors. We assume every moving target has an identity and that each belongs to some team or group. However, we can only obtain partial interaction information due to the measurement structure. For example, watching a grey-scale video of a basketball game (when it may be hard to tell apart the two teams), sensors may observe ball passes or collaborative offensive/defensive interactions between teammates. The observations are partial due to the fact that players mostly exhibit low order interactions to sensors in basketball games. It is difficult to observe a single event which involves all team members. Our objective is to infer membership information (which team the players belong to) from such partially observed interactions.

Example 2 (Inferring High Order Partial Rankings) The problem of clique identification also arises in ranking problems. Consider a collection of items which are to be ranked by a set of users.
Each user can propose the set of his or her $j$ most favorite items (say top 3 items) but without specifying a relative preference within this set. We then wish to infer the top $k > j$ most favorite items (say top 5 items). This problem requires us to infer high order partial rankings from low order observations.

Example 3 (Detecting Communities in Social Networks) Detecting communities in social networks is of extraordinary importance. It can be used to understand the organization or collaboration structure of a social network. However, we do not have direct mechanisms to sense social communities. Instead, we have partial, low order interaction information. For example, we observe pairwise or triple-wise co-appearance among people who hang out for some leisure activities together. We hope to detect those social communities in the network from such partially observed data.

¹ A clique means a complete subgraph of the network.

In these examples we are typically given a network with some nodes representing players, items, or characters, and edge weights summarizing the observed pairwise interactions. Triple-wise and other low order information can be further exploited if we consider complete sub-graphs or cliques in the networks. The basic problem is to determine common interest groups or cliques within the network by observing the frequency of low order interactions, since in reality such low order interactions are often governed by a considerably smaller number of high order communities. In this sense we shall formulate our problem as compressed sensing of cliques in networks.

The problem we are going to address has a close relationship with community detection in social networks. Community structures are ubiquitous in social networks. However, there is no consistent definition of a “community”.
In the majority of research studies, community detection is based on partitions of nodes in a network. Among these works, the most famous one is based on the modularity (Newman, 2006) of a partition of the nodes in a group. A shortcoming of partition-based methods is that they do not allow overlapping communities, which occur frequently in practice. Recently there has been growing interest in studying overlapping community structures (Lancichinetti and Fortunato, 2009). The relevance of cliques to overlapping communities was probably first addressed in the clique percolation method (Palla et al., 2005). In that work, communities were modeled as maximal connected components of cliques in a graph, where two $k$-cliques are said to be connected if they share $k - 1$ nodes.

In this paper, we use the same definition as in Palla et al. (2005) but are more interested in identifying cliques. We pursue a compressive representation of signals or functions on networks based on clique information, which in turn sheds light on multiple aspects of community structure. Roughly speaking, we assume that there is a frequency function defined on complete low order subsets. For example, in some social networks edge weights are bivariate functions defined on pairs of nodes reflecting the strength of pairwise interactions. We also assume that there is another latent frequency function defined on complete high order subsets, which we hope to infer. Intuitively, the interaction frequency of a particular low order subset should be the sum of the frequencies of the high order subsets to which it belongs.
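The summation rule just described can be illustrated with a small Python sketch. The clique weights below are made-up values, and the helper name `low_order_frequencies` is hypothetical; the sketch only shows how weights on high order subsets add up on the low order subsets they contain.

```python
from collections import defaultdict
from itertools import combinations


def low_order_frequencies(clique_weights, j=2):
    """Add each clique's weight onto every j-subset contained in that clique."""
    freq = defaultdict(float)
    for clique, weight in clique_weights.items():
        for subset in combinations(sorted(clique), j):
            freq[subset] += weight
    return dict(freq)


# Hypothetical latent weights on two overlapping 3-node groups.
clique_weights = {
    frozenset({0, 1, 2}): 2.0,
    frozenset({1, 2, 3}): 1.0,
}
pair_freq = low_order_frequencies(clique_weights, j=2)
```

The pair (1, 2) lies in both groups, so its frequency is the sum of the two weights, while a pair such as (0, 3) that lies in neither group does not appear at all.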
Hence we consider a generative mechanism in which there exists a linear mapping from frequencies on high order subsets (usually sparsely distributed) to low order subsets. One can typically collect data on low order subsets, while the task is to find those few dominant high order subsets. This problem naturally fits into the general compressive network analysis framework we introduced in the previous section. Below we demonstrate that the Radon basis is an appropriate representation for our purpose, which allows sparse recovery by a simple linear programming reconstruction approach.

IV. Radon Basis Pursuit

A. Mathematical Formulation

Under the general framework in (2.3), we formulate the clique detection problem as a compressed sensing problem named Radon Basis Pursuit. For this, we construct a dictionary $A$ so that each column of $A$ corresponds to one clique. The intuition behind such a construction is that we assume there are several hidden cliques within the network, which are perhaps of different sizes and may have overlaps. Every clique has a certain weight. The observed adjacency matrix $B$ (or equivalently, its vectorized version $b$) is a linear combination of many clique bases contaminated by a noise vector $z$.

For simplicity, we first restrict ourselves to the case that all the cliques are of the same size $k < n$. The case with mixed sizes will be discussed later. Let $C_1, C_2, \ldots, C_N$ be all the cliques of size $k$, with each $C_j \subset V$. We have $N = \binom{n}{k}$. For each $q \in \{1, \ldots, N\}$, we construct the dictionary $A$ as follows:

\[
A_{pq} =
\begin{cases}
1 & \text{if the $p$-th pair of nodes both lie in } C_q \\
0 & \text{otherwise.}
\end{cases}
\]

The matrix $A$ constructed above is related to discrete Radon transforms.
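The construction of $A$ can be sketched in a few lines of Python. The function name `radon_dictionary`, the lexicographic ordering of subsets, and the extra parameter `j` (set to 2 for the pairwise case defined above) are illustrative assumptions rather than part of the paper.

```python
from itertools import combinations
from math import comb


def radon_dictionary(n, j, k):
    """Build the 0/1 inclusion matrix between j-sets (rows) and k-sets (columns).

    A[p][q] is 1 when the p-th j-set is contained in the q-th k-set.
    Subsets of {0, ..., n-1} are enumerated in lexicographic order.
    """
    rows = list(combinations(range(n), j))
    cols = [frozenset(c) for c in combinations(range(n), k)]
    A = [[1 if c.issuperset(r) else 0 for c in cols] for r in rows]
    return rows, cols, A


# Pairs (j = 2) against 5-cliques on n = 10 nodes.
rows, cols, A = radon_dictionary(n=10, j=2, k=5)
num_rows, num_cols = len(A), len(A[0])
# Every column should contain the same number of ones: the pairs inside a 5-clique.
ones_per_column = {sum(A[p][q] for p in range(num_rows)) for q in range(num_cols)}
```

Enumerating all $\binom{n}{k}$ columns is only feasible for small $n$ and $k$; the sketch is meant to make the definition tangible, not to scale.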
In fact, up to a constant and column scaling, the transpose matrix $A^*$ is called the discrete Radon transform for two suitably defined homogeneous spaces (Diaconis, 1988). Our usage here is to exploit the transpose matrix of the Radon transform to construct an over-complete dictionary, so that the observed output $b$ has a sparse representation with respect to it. A more technical discussion of Radon transforms is beyond the scope of this paper.

The above formulation can be generalized to the case where $b$ is a vector of length $\binom{n}{j}$ ($j \ge 2$), with the $p$-th entry of $b$ characterizing a quantity associated with a $j$-set (a set with cardinality $j$). The dictionary $A$ will then be a binary matrix $R^{j,k}$ with entries indicating whether a $j$-set is a subset of a $k$-clique (a clique with $k$ nodes), i.e.,

\[
R^{j,k}_{pq} =
\begin{cases}
1 & \text{if the $p$-th $j$-set of nodes all lie in the $k$-clique } C_q \\
0 & \text{otherwise.}
\end{cases}
\]

Therefore, the case where $b$ is the vector of length $\binom{n}{2}$ corresponds to the special case $A = R^{2,k}$. Our algorithms and theory hold for general $R^{j,k}$ with $j < k$.

Now we provide two concrete reconstruction programs for the clique identification problem:

\[
(P_1) \qquad \min \|x\|_1 \quad \text{s.t.} \quad b = Ax
\]

\[
(P_{1,\delta}) \qquad \min \|x\|_1 \quad \text{s.t.} \quad \|Ax - b\|_\infty \le \delta.
\]

$P_1$ is known as Basis Pursuit (Chen et al., 1999), where we consider the ideal case in which the noise level is zero. For robust reconstruction against noise, we consider the relaxed program $P_{1,\delta}$. The program $P_{1,\delta}$ differs from the Dantzig selector (Candès and Tao, 2007), which uses a constraint of the form $\|A^*(Ax - b)\|_\infty \le \delta$. The reason for our choice of $P_{1,\delta}$ lies in the fact that a more natural noise model for network data is bounded noise rather than Gaussian noise. Moreover, our linear programming formulation of $P_{1,\delta}$ enables practical computation for large scale problems.

B. Intuition

Let $G = (V, E)$ be the network we are trying to model.
The set of vertices $V$ represents individual identities such as people in the social network. Each edge in $E$ is associated with some weights which represent interaction frequency information.

We assume that there are several common interest groups or communities within the network, represented by cliques (or complete sub-graphs) within graph $G$, which are perhaps of different sizes and may have overlaps. Every community has a certain interaction frequency which can be viewed as a function on cliques. However, we only receive partial measurements consisting of low order interaction frequencies on subsets in a clique. For example, in the simplest case we may only observe pairwise interactions represented by edge weights. Our problem is to reconstruct the function on cliques from such partially observed data. A graphical illustration of this idea is provided in Figure 1, in which we see an observed network can be written as a linear combination of several overlapped cliques.

Figure 1: An illustrative example of the main idea.

One application scenario is to identify two basketball teams from pairwise interactions among players. Suppose we have $x_0$, which is a signal on all 5-sets of a 10-player set. We assume it is sparsely concentrated on two 5-sets which correspond to the two teams with nonzero weights. Assume we have observations $b$ of pairwise interactions $b = Ax_0 + z$, where $z$ is uniform random noise defined on $[-\epsilon, \epsilon]$. We solve $P_{1,\delta}$ with $\delta = \epsilon$, which is a linear program over $x \in \mathbb{R}^{\binom{10}{5}} = \mathbb{R}^{252}$ with parameters $A \in \mathbb{R}^{\binom{10}{2} \times \binom{10}{5}} = \mathbb{R}^{45 \times 252}$ and $b \in \mathbb{R}^{45}$.

C. Connection with Radon Basis

Let $V_j$ denote the set of all $j$-sets of $V = \{1, \cdots, n\}$ and $M^j$ be the set of real-valued functions on $V_j$. The observed interaction frequencies $b$ on all $j$-sets can be viewed as a function in $M^j$.
We build a matrix $\tilde{R}^{j,k} : M^k \to M^j$ ($j < k$) as a mapping from functions on all $k$-sets of $V$ to functions on all $j$-sets of $V$. In this setup, each row represents a $j$-set and each column represents a $k$-set. The entries of $\tilde{R}^{j,k}$ are either 0 or 1, indicating whether the $j$-set is a subset of the $k$-set. Note that every column of $\tilde{R}^{j,k}$ has $\binom{k}{j}$ ones. Lacking a priori information, we assume that every $j$-set of a particular $k$-set has equal interaction probability, whence we choose the same constant 1 for each column. We further normalize $\tilde{R}^{j,k}$ to $R^{j,k}$ so that the $\ell_2$ norm of each column of $R^{j,k}$ is 1. To summarize, we have

\[
R^{j,k}(\sigma, \tau) =
\begin{cases}
\dfrac{1}{\sqrt{\binom{k}{j}}}, & \text{if } \sigma \subset \tau; \\
0, & \text{otherwise,}
\end{cases}
\]

where $\sigma$ is a $j$-set and $\tau$ is a $k$-set. As we will see, this construction leads to a canonical basis associated with the discrete Radon transform. The size of the matrix $R^{j,k}$ clearly depends on the total number of items $n = |V|$. We omit $n$ as its meaning will be clear from the context.

The matrix $R^{j,k}$ constructed above is related to discrete Radon transforms on the homogeneous space $M^k$. In fact, up to a constant, the adjoint operator $(R^{j,k})^*$ is called the discrete Radon transform from homogeneous space $M^j$ to $M^k$ in Diaconis (1988). Here all the $k$-sets form a homogeneous space. The collection of all row vectors of $R^{j,k}$ is called the $j$-th Radon basis for $M^k$. Our usage here is to exploit the transpose matrix of the Radon transform to construct an over-complete dictionary for $M^j$, so that the observation $b$ can be represented by a possibly sparse function $x \in M^k$ ($k \ge j$).

The Radon basis was proposed as an efficient way to study partially ranked data in Diaconis (1988), where it was shown that by looking at low order Radon coefficients of a function on $M^k$, we usually get useful and interpretable information. The approach here adds a reversal of this perspective, i.e. the reconstruction of sparse high order functions from low order Radon coefficients. We will discuss this in the sequel with a connection to compressive sensing (Chen et al., 1999; Candès and Tao, 2005).

V. Mathematical Theory

One advantage of our new framework for compressive network analysis is that it enables rigorous theoretical analysis of the corresponding convex programs.

A. Failure of Universal Recovery

Recently it was shown by Candès and Tao (2005) and Candès (2008) that $P_1$ has a unique sparse solution $x_0$ if the matrix $A$ satisfies the Restricted Isometry Property (RIP), i.e. for every subset of columns $T \subset \{1, \ldots, N\}$ with $|T| \le s$, there exists a certain universal constant $\delta_s \in [0, \sqrt{2} - 1)$ such that

\[
(1 - \delta_s) \|x\|_2^2 \le \|A_T x\|_2^2 \le (1 + \delta_s) \|x\|_2^2, \qquad \forall x \in \mathbb{R}^{|T|},
\]

where $A_T$ is the sub-matrix of $A$ with columns indexed by $T$. Then exact recovery holds for all $s$-sparse signals $x_0$ (i.e. $x_0$ has at most $s$ non-zero components), whence it is called universal recovery. Unfortunately, in our construction of the basis matrix $A$, RIP is not satisfied unless $s$ is very small. The following theorem illustrates the failure of universal recovery in our case.

Theorem 5.1. Let $n > k + j + 1$ and $A = R^{j,k}$ with $j < k$. Unless $s < \binom{k+j+1}{k}$, there does not exist a $\delta_s < 1$ such that the inequalities

\[
(1 - \delta_s) \|x\|_2^2 \le \|A_T x\|_2^2 \le (1 + \delta_s) \|x\|_2^2, \qquad \forall x \in \mathbb{R}^{|T|}
\]

hold universally for every $T \subset \{1, \ldots, N\}$ with $|T| \le s$, where $N = \binom{n}{k}$.

Note that $\binom{k+j+1}{k}$ does not depend on the network size $n$, which is problematic: we can only recover a constant number of cliques no matter how large the network is. The main reason for such a negative result is that the RIP tries to guarantee exact recovery for arbitrary signals with a sparse representation in $A$.
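The counting that underlies Theorem 5.1 can be checked numerically on a small case. The Python sketch below takes all $k$-sets drawn from $k + j + 1$ nodes as columns and counts the $j$-sets (rows) on which those columns can be nonzero; when there are fewer such rows than columns, the columns must be linearly dependent. The helper name `support_counts` and the choice $j = 2$, $k = 3$ are illustrative assumptions.

```python
from itertools import combinations
from math import comb


def support_counts(j, k):
    """Count nonzero rows and columns for the k-sets of a (k + j + 1)-node ground set.

    Columns are all k-subsets of {0, ..., k + j}; a row (a j-set) is nonzero
    when it is contained in at least one of those k-subsets.
    """
    ground = range(k + j + 1)
    columns = list(combinations(ground, k))
    nonzero_rows = {r for c in columns for r in combinations(c, j)}
    return len(nonzero_rows), len(columns)


# Pairs (j = 2) and triangles (k = 3) on a ground set of 6 nodes.
rows_2_3, cols_2_3 = support_counts(j=2, k=3)
```

With more columns than nonzero rows, some nonzero combination of these columns vanishes, which is exactly the obstruction to a restricted isometry constant below 1.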
For many applications, such a condition is too strong to be realistic. Instead of studying such “universal” conditions, in this paper we seek conditions that secure exact recovery of a collection of sparse signals $x_0$ whose sparsity pattern satisfies certain conditions more appropriate to our setting. Such conditions could be more natural in reality, and will be shown in the sequel to amount simply to requiring bounded overlaps between cliques.

Remark 5.2. Recall that the matrix $A$ has altogether $N = \binom{n}{k}$ columns. Each column in fact corresponds to a $k$-clique. Therefore, we could also use a $k$-clique to index a column of $A$. In this sense, let $T = \{i_1, \ldots, i_k\} \subset \{1, \ldots, N\}$ be a subset of size $k$. An equivalent notation is to represent $T$ as a class of sets: $T = \{\tau_1, \ldots, \tau_k\}$ where each $\tau_i \subset \{1, \ldots, n\}$ and $|\tau_i| = k$.

Proof. We can extract a set of columns $T = \{\tau : \tau \subset \{1, 2, \cdots, k + j + 1\} \text{ and } |\tau| = k\}$ ($\tau$ is interpreted as a $k$-set) and form a submatrix $A_T$. Recall that $A$ has altogether $\binom{n}{j}$ rows. Combining the condition $n > k + j + 1$ with the fact that the number of nonzero rows of $A_T$ is exactly $\binom{k+j+1}{j}$, we know that there must exist rows in $A_T$ which contain only zeros. By discarding zero rows, it is easy to show that the rank of $A_T$ is at most $\binom{k+j+1}{j}$. To see that this is less than the number of columns, we exploit the fact that $j < k$, and therefore

\[
\binom{k+j+1}{j} < \binom{k+j+1}{k}, \tag{5.1}
\]

from which we see that the number of nonzero rows of $A_T$ is smaller than the number of columns. Thus, the columns in $A_T$ must be linearly dependent. In other words, there exists a nonzero vector $h \in \mathbb{R}^N$ with $\mathrm{supp}(h) \subset T$ such that $Ah = 0$. When $s \ge \binom{k+j+1}{k}$, since $|\mathrm{supp}(h)| \le |T| \le s$, we cannot expect universal sparse recovery for all $s$-sparse signals.

B. Exact Recovery Conditions

Here we present our exact recovery conditions for $x_0$ from the observed data $b$ by solving the linear program $P_1$. Suppose $A$ is an $M$-by-$N$ matrix and $x_0$ is a sparse signal. Let $T = \mathrm{supp}(x_0)$, let $T^c$ be the complement of $T$, and let $A_T$ (or $A_{T^c}$) be the submatrix of $A$ where we only extract column set $T$ (or $T^c$, respectively). The following proposition from Candès and Tao (2005) characterizes the conditions under which $P_1$ has a unique solution. To make this paper self-contained, we also include the proof in this section.

Proposition 5.3. (Candès and Tao, 2005) Let $x_0 = (x_{01}, \ldots, x_{0N})^T$. Assume that $A_T^* A_T$ is invertible and there exists a vector $w \in \mathbb{R}^M$ such that

1. $\langle A_j, w \rangle = \mathrm{sign}(x_{0j}), \ \forall j \in T$;
2. $|\langle A_j, w \rangle| < 1, \ \forall j \in T^c$.

Then $x_0$ is the unique solution of $P_1$.

Proof. The necessity of the two conditions comes from the KKT conditions of $P_1$. Consider an equivalent form of $P_1$,

\[
\min \mathbf{1}^T \xi \quad \text{subject to} \quad Ax - b = 0, \quad -\xi \le x \le \xi, \quad \xi \ge 0,
\]

whose Lagrangian is

\[
L(x, \xi; \gamma, \lambda, \mu) = \mathbf{1}^T \xi + \gamma^T (Ax - b) - \lambda_+^T (\xi - x) - \lambda_-^T (\xi + x) - \mu^T \xi.
\]

Here $\gamma \in \mathbb{R}^M$, $\lambda_+ = (\lambda_+(1), \ldots, \lambda_+(N))^T \in \mathbb{R}^N_+$, $\lambda_- = (\lambda_-(1), \ldots, \lambda_-(N))^T \in \mathbb{R}^N_+$, and $\mu \in \mathbb{R}^N_+$ are the Lagrange multipliers. Then the KKT conditions give

1. $A^* \gamma + (\lambda_+ - \lambda_-) = 0$,
2. $\mathbf{1} - (\lambda_+ + \lambda_-) - \mu = 0$,

with $\lambda, \mu \ge 0$ and $\lambda_+(j) \lambda_-(j) = 0$ for all $j$. Clearly $T = \mathrm{supp}(x_0) = \{j : \xi_j > 0\}$. Let $w = -\gamma$. By the Strictly Complementary Theorem for linear programming in Ye (1997), there exist $\mu$ and $\xi$ such that $1 > \mu_j > 0$ for all $j \in T^c$ with $\xi_j = 0$, and $\mu_j = 0$ for all $j \in T$ with $\xi_j > 0$. Thus, the first equation leads to

\[
\langle w, A_j \rangle = \lambda_+(j) - \lambda_-(j) = \mathrm{sign}(x_{0j}), \qquad j \in T;
\]

the second equation leads to

\[
|\langle w, A_j \rangle| = |\lambda_+(j) - \lambda_-(j)| = 1 - \mu_j < 1, \qquad j \in T^c.
\]
Therefore, the two conditions are necessary for $x_0$ to be the unique solution of $P_1$.

To prove that these two conditions are sufficient to guarantee that $x_0$ is the unique minimizer of $P_1$, we need to show that any minimizer $y_0$ of the problem $P_1$ must be equal to $x_0$. Since $x_0$ obeys the constraint $Ax_0 = b$, we must have $\|y_0\|_1 \le \|x_0\|_1$. Now take a $w$ obeying the two conditions; we then compute

\begin{align*}
\|y_0\|_1 &= \sum_{j \in T} |x_{0j} + (y_{0j} - x_{0j})| + \sum_{j \notin T} |y_{0j}| \\
&\ge \sum_{j \in T} \mathrm{sign}(x_{0j}) \bigl( x_{0j} + (y_{0j} - x_{0j}) \bigr) + \sum_{j \notin T} y_{0j} \langle w, A_j \rangle \\
&= \sum_{j \in T} |x_{0j}| + \sum_{j \in T} (y_{0j} - x_{0j}) \langle w, A_j \rangle + \sum_{j \notin T} y_{0j} \langle w, A_j \rangle \\
&= \sum_{j \in T} |x_{0j}| + \Bigl\langle w, \sum_{j \in T \cup T^c} y_{0j} A_j - \sum_{j \in T} x_{0j} A_j \Bigr\rangle \\
&= \|x_0\|_1 + \langle w, b - b \rangle \\
&= \|x_0\|_1.
\end{align*}

Thus, the inequalities in the above computation must in fact be equalities. Since $|\langle w, A_j \rangle|$ is strictly less than 1 for all $j \notin T$, this in particular forces $y_{0j} = 0$ for all $j \notin T$. Thus

\[
\sum_{j \in T} (y_{0j} - x_{0j}) A_j = b - b = 0.
\]

Since all columns in $A_T$ are independent, we must have $y_{0j} = x_{0j}$ for all $j \in T$. Thus $x_0 = y_0$. This concludes the proof of the proposition.

The above proposition gives a necessary and sufficient condition under which, in the noise-free setting, $P_1$ exactly recovers the sparse signal $x_0$. The necessity and sufficiency come from the KKT conditions in convex optimization theory (Candès and Tao, 2005). However, this condition is difficult to check due to the presence of $w$. If we further assume that $w$ lies in the column span of $A_T$, the condition in Proposition 5.3 reduces to the following condition.
Irrepresentable Condition (IRR). The matrix $A$ satisfies the IRR condition with respect to $T = \operatorname{supp}(x_0)$ if $A_T^* A_T$ is invertible and
$$\| A_{T^c}^* A_T (A_T^* A_T)^{-1} \|_\infty < 1,$$
or, equivalently,
$$\| (A_T^* A_T)^{-1} A_T^* A_{T^c} \|_1 < 1,$$
where $\| \cdot \|_\infty$ stands for the matrix sup-norm, i.e., $\|A\|_\infty := \max_i \sum_j |A_{ij}|$, and $\|A\|_1 := \max_j \sum_i |A_{ij}|$.

Proposition 5.4. By restricting $w$ to lie in the image of $A_T$, the conditions in Proposition 5.3 reduce to the IRR condition.

Proof. Since $w$ lies in the image of $A_T$, we can write $w = A_T v$. For the first condition in Proposition 5.3 to hold, we must have $v = (A_T^* A_T)^{-1} \operatorname{sign}(x_0)$, where $\operatorname{sign}(x_0)$ is restricted to $T$, so
$$w = A_T (A_T^* A_T)^{-1} \operatorname{sign}(x_0).$$
Requiring the second condition in Proposition 5.3 to hold for every sign pattern on $T$, it can be equivalently written as
$$\| A_{T^c}^* A_T (A_T^* A_T)^{-1} \|_\infty < 1,$$
which is exactly the IRR condition.

Intuitively, the IRR condition requires that, for the true sparse signal $x_0$, the relevant bases $A_T$ are not highly correlated with the irrelevant bases $A_{T^c}$. Note that this condition depends only on $A$ and $x_0$, which makes it easier to check. The assumption that $w$ lies in the column span of $A_T$ is mild; it is actually a necessary condition for $x_0$ to be reconstructed by the Lasso (Tibshirani, 1996) or the Dantzig selector (Candès and Tao, 2007), even under Gaussian-like noise assumptions (Zhao and Yu, 2006; Yuan and Lin, 2007).

C. Detecting Cliques of Equal Size

In this subsection, we present sufficient conditions for IRR which can be easily verified. We consider the case $A = R^{j,k}$ with $j < k$. Given data $b$ about all $j$-sets, we want to infer important $k$-cliques. Suppose $x_0$ is a sparse signal on all $k$-cliques. We have the following theorem, which is a direct result of Lemma 5.6.

Theorem 5.5.
Let $T = \operatorname{supp}(x_0)$. If we enforce the overlaps among $k$-cliques in $T$ to be no larger than $r$, then $r \le j - 2$ guarantees the IRR condition.

Lemma 5.6. Let $T = \operatorname{supp}(x_0)$ and $j \ge 2$. Suppose that for any $\sigma_1, \sigma_2 \in T$, the two cliques corresponding to $\sigma_1$ and $\sigma_2$ have overlaps no larger than $r$. Then:
1. If $r \le j - 2$, then $\| A_{T^c}^* A_T (A_T^* A_T)^{-1} \|_\infty < 1$;
2. If $r = j - 1$, then $\| A_{T^c}^* A_T (A_T^* A_T)^{-1} \|_\infty \le 1$, where equality holds for certain examples;
3. If $r = j$, there are examples such that $\| A_{T^c}^* A_T (A_T^* A_T)^{-1} \|_\infty > 1$.

Note that Theorem 5.5 is only an easy-to-verify condition based on a worst-case analysis, which is sufficient but not necessary. What really matters is the IRR condition. Theorem 5.5 uses a simple characterization of allowed clique overlaps which guarantees the IRR condition. Specifically, clique overlaps no larger than $j - 2$ are sufficient to guarantee exact sparse recovery by $P_1$, while larger overlaps may violate the IRR condition. Since this theorem is based on a worst-case analysis, in real applications one may encounter examples which have overlaps larger than $j - 2$ while $P_1$ still works.

In summary, IRR is sufficient and almost necessary to guarantee exact recovery. Theorem 5.5 tells us that the intuition behind the IRR is that overlaps among cliques must be small enough, which is easier to check. In the next subsection, we show that IRR is also sufficient to guarantee stable recovery with noise.

Proof. To prove Lemma 5.6, given any $\tau \in T^c$, we define
$$\mu_\tau \equiv \sum_{\sigma \in T} \frac{\binom{|\tau \cap \sigma|}{j}}{\binom{k}{j}} .$$
The intuition behind this definition is that
$$\sup_{\tau \in T^c} \mu_\tau = \| A_{T^c}^* A_T \|_\infty . \qquad (5.2)$$
As we will see in the following proofs, we essentially try to bound $\mu_\tau$ for $\tau \in T^c$.
Before we present the detailed technical proof, we first introduce the high-level idea. Our main purpose is to bound $\| A_{T^c}^* A_T (A_T^* A_T)^{-1} \|_\infty$. Each entry of the matrix $A_T^* A_T$ is indexed by two $k$-sets, and the value of this entry represents how many $j$-sets are contained in the intersection of these two $k$-sets (normalized by $\binom{k}{j}$). Under the condition that $r \le j - 1$, it is straightforward that the matrix $A_T^* A_T$ is the identity. Therefore, bounding $\| A_{T^c}^* A_T (A_T^* A_T)^{-1} \|_\infty$ is equivalent to bounding $\| A_{T^c}^* A_T \|_\infty$, which is exactly $\sup_{\tau \in T^c} \mu_\tau$.

Proof of the case under Condition 1. Under Condition 1, any $\sigma_1, \sigma_2 \in T$ satisfy $|\sigma_1 \cap \sigma_2| \le j - 2$, hence any two columns in $T$ are orthogonal. This implies that $A_T^* A_T$ is an identity matrix. Now, given $\tau \in T^c$, we will prove $\mu_\tau < 1$ under Condition 1. If this is true, then
$$\sup_{\tau \in T^c} \mu_\tau = \| A_{T^c}^* A_T \|_\infty = \| A_{T^c}^* A_T (A_T^* A_T)^{-1} \|_\infty < 1 .$$
Let $T = \{ \sigma_1, \sigma_2, \ldots, \sigma_{|T|} \}$, where the $\sigma_i$ ($1 \le i \le |T|$) are $k$-sets. We need to prove
$$\mu_\tau = \sum_{i=1}^{|T|} \frac{\binom{|\tau \cap \sigma_i|}{j}}{\binom{k}{j}} < 1$$
for all $\tau \in T^c$. Let $\mathcal{M}_i = \{ \rho : |\rho| = j,\ \rho \subset \tau \cap \sigma_i \}$, so $\mathcal{M}_i$ is the collection of $j$-sets of $\tau \cap \sigma_i$ (if $|\tau \cap \sigma_i| < j$, then $\mathcal{M}_i$ is simply the empty set). Obviously, we have $|\mathcal{M}_i| = \binom{|\tau \cap \sigma_i|}{j}$, so
$$\sum_{i=1}^{|T|} \binom{|\tau \cap \sigma_i|}{j} = \sum_{i=1}^{|T|} |\mathcal{M}_i| .$$
Now we note that for any $1 \le i \ne l \le |T|$, we have $\mathcal{M}_i \cap \mathcal{M}_l = \emptyset$. Otherwise, suppose $\rho \in \mathcal{M}_1 \cap \mathcal{M}_2$, which means $\rho$ is a $j$-set belonging to both $\mathcal{M}_1$ and $\mathcal{M}_2$. Hence $\rho \subset \tau \cap \sigma_1$ and $\rho \subset \tau \cap \sigma_2$, which implies that
$$|\sigma_1 \cap \sigma_2| \ge |(\tau \cap \sigma_1) \cap (\tau \cap \sigma_2)| \ge |\rho| = j .$$
This contradicts the condition that the $\sigma_i$'s ($1 \le i \le |T|$) have overlaps at most $j - 2$. So the $\mathcal{M}_i$ must be pairwise disjoint. Hence
$$\sum_{i=1}^{|T|} \binom{|\tau \cap \sigma_i|}{j} = \sum_{i=1}^{|T|} |\mathcal{M}_i| = \Bigl| \bigcup_{i=1}^{|T|} \mathcal{M}_i \Bigr| .$$
For any $1 \le i \le |T|$, every $\rho \in \mathcal{M}_i$ is a $j$-set of $\tau \cap \sigma_i$. Hence $\rho$ is of course a $j$-set of $\tau$.
The set $\tau$ is of size $k$. So if we let $\mathcal{M}_0 = \{ \rho : |\rho| = j,\ \rho \subset \tau \}$, which is the collection of all $j$-sets of $\tau$, then we have $\bigcup_{i=1}^{|T|} \mathcal{M}_i \subset \mathcal{M}_0$. So
$$\Bigl| \bigcup_{i=1}^{|T|} \mathcal{M}_i \Bigr| \le |\mathcal{M}_0| = \binom{k}{j} .$$
So far, we have proved $\mu_\tau \le 1$. The above proof of $\mu_\tau \le 1$ for any $\tau \in T^c$ remains valid under Condition 2.

Next, we prove that if any $\sigma_i, \sigma_l \in T$ satisfy $|\sigma_i \cap \sigma_l| \le j - 2$, then equality cannot hold. Without loss of generality, we assume $|\sigma_1 \cap \tau| \ge j$; otherwise, if none of the $\sigma_i$'s satisfies $|\sigma_i \cap \tau| \ge j$, then $\mu_\tau = 0$, which finishes the proof. To show that equality does not hold, we only need to find one $j$-set of $\tau$ that does not belong to $\bigcup_i \mathcal{M}_i$. In this case, we can let $\tau = \{1, 2, \ldots, k\}$ and $\sigma_1 = \{1, 2, \ldots, s, k+1, k+2, \ldots, 2k - s\}$, where $j \le s \le k - 1$ ($s \le k - 1$ because otherwise $\sigma_1 = \tau$, which contradicts the fact that $\sigma_1 \in T$ and $\tau \in T^c$).

Now we show that $\rho_0 = \{1, 2, \ldots, j - 1, s + 1\}$ is not a member of $\bigcup_{i=1}^{|T|} \mathcal{M}_i$. Clearly $\rho_0$ is not a member of $\mathcal{M}_1$ because $s + 1 \notin \sigma_1$. It remains to show that $\rho_0$ is not a member of any $\mathcal{M}_i$ ($2 \le i \le |T|$). If this were not true, say $\rho_0 \in \mathcal{M}_2$, then $\rho_0 \subset (\tau \cap \sigma_2) \subset \sigma_2$, so $\{1, 2, \ldots, j - 1\} \subset \sigma_1 \cap \sigma_2$, which contradicts the condition that $|\sigma_1 \cap \sigma_2| \le j - 2$. Since $\rho_0 \in \mathcal{M}_0$, this means $\bigcup_{i=1}^{|T|} \mathcal{M}_i$ is a proper subset of $\mathcal{M}_0$. So
$$\Bigl| \bigcup_{i=1}^{|T|} \mathcal{M}_i \Bigr| < \binom{k}{j},$$
which means $\mu_\tau < 1$.

Proof of the case under Condition 2. Under Condition 2, the proof is almost the same as the proof under Condition 1. We have that $A_T^* A_T$ is an identity matrix and $\mu_\tau \le 1$. However, one cannot show $\mu_\tau < 1$ in this case. We have the following example where, if $n$ is large enough, $\mu_\tau$ is exactly equal to one. Let $\tau = \{1, 2, \ldots, k\} \in T^c$. Denote all the $j$-sets of $\tau$ by $\rho_1, \rho_2, \ldots, \rho_{\binom{k}{j}}$.
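When $n$ is large enough, we choose $\binom{k}{j}$ disjoint $(k - j)$-sets of $\{k + 1, k + 2, \ldots, n\}$, denoted by $\omega_1, \omega_2, \ldots, \omega_{\binom{k}{j}}$. Let $T = \{ \sigma_1, \sigma_2, \ldots, \sigma_{|T|} \}$, where $\sigma_i = \rho_i \cup \omega_i$. Hence $|T| = \binom{k}{j}$ and the $\sigma_i$'s satisfy $|\sigma_i \cap \sigma_l| \le j - 1$ for $i \ne l$. But
$$\sum_{i=1}^{|T|} \frac{\binom{|\tau \cap \sigma_i|}{j}}{\binom{k}{j}} = \sum_{i=1}^{|T|} \frac{1}{\binom{k}{j}} = 1 .$$

Proof of the case under Condition 3. Under Condition 3, we can construct examples where
$$\| A_{T^c}^* A_T (A_T^* A_T)^{-1} \|_\infty > 1 .$$
Let $\rho_1, \rho_2, \ldots, \rho_{\binom{k}{j}}$ be all $j$-sets of $\{1, 2, \ldots, k\}$. For large enough $n$, it is possible to choose $\binom{k}{j} + 1$ disjoint $(k - j)$-sets of $\{k + 1, k + 2, \ldots, n\}$, say $\omega_0, \omega_1, \omega_2, \ldots, \omega_{\binom{k}{j}}$. Let $\sigma_i = \rho_i \cup \omega_i$ for $1 \le i \le \binom{k}{j}$ and $\sigma_0 = \rho_1 \cup \omega_0$. Define $T = \{ \sigma_0, \sigma_1, \sigma_2, \ldots, \sigma_{\binom{k}{j}} \}$, which is of size $|T| = \binom{k}{j} + 1$. In this case, $|\sigma_i \cap \sigma_l| \le j - 1$ for any $1 \le i \ne l \le \binom{k}{j}$, $|\sigma_0 \cap \sigma_1| = j$, and $|\sigma_0 \cap \sigma_i| \le j - 1$ for any $2 \le i \le \binom{k}{j}$. Then $A_T^* A_T$ is the $\bigl( \binom{k}{j} + 1 \bigr)$-by-$\bigl( \binom{k}{j} + 1 \bigr)$ matrix shown below, with rows and columns corresponding to $\{ \sigma_0, \sigma_1, \ldots, \sigma_{\binom{k}{j}} \}$:
$$A_T^* A_T = \begin{pmatrix}
1 & \epsilon & 0 & 0 & \cdots & 0 \\
\epsilon & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1
\end{pmatrix}.$$
Here $\epsilon = 1 / \binom{k}{j}$. The inverse of the matrix is
$$(A_T^* A_T)^{-1} = \begin{pmatrix}
\frac{1}{1 - \epsilon^2} & \frac{-\epsilon}{1 - \epsilon^2} & 0 & 0 & \cdots & 0 \\
\frac{-\epsilon}{1 - \epsilon^2} & \frac{1}{1 - \epsilon^2} & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1
\end{pmatrix}.$$
Consider $\tau = \{1, 2, \ldots, k\} \in T^c$. The row of $A_{T^c}^* A_T$ corresponding to $\tau$ is a vector of length $|T| = \binom{k}{j} + 1$ with each entry equal to $\epsilon = 1 / \binom{k}{j}$. So the row of $A_{T^c}^* A_T (A_T^* A_T)^{-1}$ corresponding to $\tau$ is the vector of length $\binom{k}{j} + 1$
$$\Bigl[ \tfrac{\epsilon}{1 + \epsilon},\ \tfrac{\epsilon}{1 + \epsilon},\ \epsilon,\ \epsilon,\ \ldots,\ \epsilon \Bigr].$$
This vector has row sum
$$\frac{2 \epsilon}{1 + \epsilon} + \Bigl( \binom{k}{j} - 1 \Bigr) \epsilon = \frac{2 \epsilon}{1 + \epsilon} + \Bigl( \frac{1}{\epsilon} - 1 \Bigr) \epsilon = \frac{1 + 2 \epsilon - \epsilon^2}{1 + \epsilon} > \frac{1 + 2 \epsilon - \epsilon}{1 + \epsilon} = 1 .$$
Hence in this example $\| A_{T^c}^* A_T (A_T^* A_T)^{-1} \|_\infty > 1$.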
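The quantity $\mu_\tau$ used throughout the proof of Lemma 5.6 is easy to evaluate by brute force on tiny instances. The following Python sketch is only an illustration under stated assumptions: the function names `mu` and `sup_mu` are hypothetical, the search enumerates all $k$-subsets (so it is only feasible for small $n$), and $\sup_\tau \mu_\tau$ coincides with the IRR quantity only when $A_T^* A_T = I$, i.e., under Conditions 1 and 2.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def mu(tau, support, j, k):
    """mu_tau = sum over sigma in T of C(|tau ∩ sigma|, j) / C(k, j)."""
    shared_j_sets = sum(comb(len(tau & sigma), j) for sigma in support)
    return Fraction(shared_j_sets, comb(k, j))


def sup_mu(n, support, j, k):
    """Brute-force sup of mu_tau over all k-subsets tau of {1..n} outside T."""
    support = [frozenset(sigma) for sigma in support]
    in_support = set(support)
    candidates = (frozenset(c) for c in combinations(range(1, n + 1), k))
    return max(mu(tau, support, j, k) for tau in candidates if tau not in in_support)


j, k = 2, 3
# Condition 1 (overlaps <= j - 2 = 0): two disjoint 3-cliques.
disjoint = [{1, 2, 3}, {4, 5, 6}]
# Condition 2 (overlaps <= j - 1 = 1): sigma_i = rho_i ∪ omega_i as in the proof,
# with tau = {1, 2, 3} and omega_i = {4}, {5}, {6}.
touching = [{1, 2, 4}, {1, 3, 5}, {2, 3, 6}]

print(sup_mu(7, disjoint, j, k))  # stays strictly below 1
print(sup_mu(6, touching, j, k))  # reaches 1 at tau = {1, 2, 3}
```

Exact rational arithmetic is used so that the boundary case $\mu_\tau = 1$ is not blurred by floating-point rounding.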
In the following, we construct explicit conditions which allow large overlaps while the IRR still holds, as long as such heavy overlaps do not occur too often among the cliques in $T$. The existence of a partition of $T$ in the next theorem is a reasonable assumption in network settings where network hierarchies exist. In social networks, it has been observed by Girvan and Newman (2002) that communities themselves also join together to form meta-communities. The assumptions in the next theorem characterize such a scenario: we allow relatively larger overlaps between communities from the same meta-community and relatively smaller overlaps between communities from different meta-communities.

Theorem 5.7. Assume $(k + 1)/2 \le j < k$. Let $T = \operatorname{supp}(x_0)$. Suppose there exists a partition $T = T_1 \cup T_2 \cup \cdots \cup T_m$, with each $T_i$ satisfying $|T_i| \le K$, such that
• for any $\sigma_i, \sigma_l$ belonging to the same part, $|\sigma_i \cap \sigma_l| \le r$;
• for any $\sigma_i, \sigma_l$ belonging to different parts, $|\sigma_i \cap \sigma_l| \le 2j - k - 1$.
If $K$ satisfies
$$(K - 1) \binom{r}{j} \Big/ \binom{k}{j} < 1/4, \qquad \left( \binom{k - 1}{j} + (K - 1) \binom{(k + r)/2}{j} \right) \Big/ \binom{k}{j} \le 3/4,$$
then IRR holds.

Proof. We will show that the following two inequalities hold:
$$\| A_T^* A_T - I \|_\infty \le (K - 1) \binom{r}{j} \Big/ \binom{k}{j}, \qquad \| A_{T^c}^* A_T \|_\infty \le \left( \binom{k - 1}{j} + (K - 1) \binom{(k + r)/2}{j} \right) \Big/ \binom{k}{j} .$$
We first bound the sup-norm of $A_T^* A_T - I$. Note that when $\sigma_i$ and $\sigma_l$ belong to different parts of $T$, the corresponding entry of $A_T^* A_T$ is zero, because their overlap is no larger than $2j - k - 1$, which is strictly smaller than $j$. So $A_T^* A_T$ is a block diagonal matrix with block sizes $|T_1|, |T_2|, \ldots, |T_m|$, and each diagonal entry of $A_T^* A_T$ is one. Thus, for any $\sigma \in T$, only cliques from the same part as $\sigma$ may have overlaps with $\sigma$ of size at least $j$. Thus, the row sum of $A_T^* A_T - I$ can be bounded by $(K - 1) \binom{r}{j} / \binom{k}{j}$.
So the first inequality is now established.

To prove the second inequality, we observe that for a fixed $\tau \in T^c$, $|\tau \cap \sigma_i| \ge j$ and $|\tau \cap \sigma_l| \ge j$ cannot hold at the same time for any $\sigma_i$ and $\sigma_l$ belonging to different parts. Otherwise, we would have
$$|\tau| \ge |\tau \cap (\sigma_i \cup \sigma_l)| = |\tau \cap \sigma_i| + |\tau \cap \sigma_l| - |\tau \cap \sigma_i \cap \sigma_l| \ge j + j - (2j - k - 1) = k + 1 .$$
Thus, all $\sigma$'s whose intersections with a fixed $\tau$ have size no less than $j$ must lie in the same part of $T$. For the same reason, we can show that for a fixed $\tau \in T^c$, $|\tau \cap \sigma_i| \ge (k + r + 1)/2$ and $|\tau \cap \sigma_l| \ge (k + r + 1)/2$ cannot hold at the same time for $\sigma_i$ and $\sigma_l$ belonging to the same part of $T$. Otherwise, we would have
$$|\tau| \ge |\tau \cap (\sigma_i \cup \sigma_l)| = |\tau \cap \sigma_i| + |\tau \cap \sigma_l| - |\tau \cap \sigma_i \cap \sigma_l| \ge (k + r + 1)/2 + (k + r + 1)/2 - r = k + 1 .$$
Thus the maximum row sum of $A_{T^c}^* A_T$ is bounded from above by
$$\left( \binom{k - 1}{j} + (K - 1) \binom{(k + r)/2}{j} \right) \Big/ \binom{k}{j} .$$
Now if $K$ further satisfies
$$(K - 1) \binom{r}{j} \Big/ \binom{k}{j} < 1/4, \qquad \left( \binom{k - 1}{j} + (K - 1) \binom{(k + r)/2}{j} \right) \Big/ \binom{k}{j} \le 3/4,$$
then we have
$$\| A_T^* A_T - I \|_\infty < 1/4, \qquad \| A_{T^c}^* A_T \|_\infty \le 3/4 .$$
Thus,
$$\begin{aligned}
\| A_{T^c}^* A_T (A_T^* A_T)^{-1} \|_\infty &\le \| A_{T^c}^* A_T \|_\infty \, \| (A_T^* A_T)^{-1} \|_\infty \\
&\le \| A_{T^c}^* A_T \|_\infty \Bigl( 1 + \sum_{i=1}^{\infty} \| A_T^* A_T - I \|_\infty^i \Bigr) \\
&< \frac{3}{4} \Bigl( 1 + \sum_{i=1}^{\infty} (1/4)^i \Bigr) = 1 .
\end{aligned}$$
So IRR holds under our conditions.

The basis matrix $A = R^{j,k}$ has $\binom{n}{k}$ bases, which is not polynomial with respect to $k$. As we will see in later sections, a practical implementation of Radon basis pursuit for the clique detection problem works on a subset of bases among all $\binom{n}{k}$ bases. In that case, we are actually solving $P_1$ and $P_{1,\delta}$ with the basis matrix $\bar{A}$, which is only a submatrix of $A$ with a subset of column bases extracted. We have the following theorem regarding this scenario.

Theorem 5.8.
Denote the set of all cliques for columns in $\bar{A}$ by $S$, where $\bar{A}$ is a submatrix of $A$. Assume any two $k$-cliques in $\bar{A}$ have intersections of size at most $r$, i.e., $\forall \sigma_i, \sigma_l \in T \cup T^c$, $|\sigma_i \cap \sigma_l| \le r$, where $T = \operatorname{supp}(x_0) \subset S$ and $T^c$ is the complement of $T$ with respect to $S$. Then IRR holds if
$$r \le \left( \frac{1}{|T| (1 + \sqrt{|T|})} \right)^{1/j} k . \qquad (5.3)$$

Proof. Note that
$$\| \bar{A}_{T^c}^* \bar{A}_T (\bar{A}_T^* \bar{A}_T)^{-1} \|_\infty \le \| \bar{A}_{T^c}^* \bar{A}_T \|_\infty \, \| (\bar{A}_T^* \bar{A}_T)^{-1} \|_\infty \le \| \bar{A}_{T^c}^* \bar{A}_T \|_\infty \cdot \sqrt{|T|} \, \| (\bar{A}_T^* \bar{A}_T)^{-1} \|_2 .$$
So it suffices to show $\| \bar{A}_{T^c}^* \bar{A}_T \|_\infty \cdot \sqrt{|T|} \, \| (\bar{A}_T^* \bar{A}_T)^{-1} \|_2 < 1$ under condition (5.3). Firstly,
$$\| \bar{A}_{T^c}^* \bar{A}_T \|_\infty = \max_{\tau \in T^c} \sum_{\sigma \in T} \frac{\binom{|\tau \cap \sigma|}{j}}{\binom{k}{j}} \le |T| \frac{\binom{r}{j}}{\binom{k}{j}},$$
since $|\tau \cap \sigma| \le r$. At the very least we need
$$|T| \binom{r}{j} \Big/ \binom{k}{j} < 1 . \qquad (5.4)$$
Secondly, let $K = \bar{A}_T^* \bar{A}_T$. Then $K_{ii} = 1$ and, since $|\sigma_i \cap \sigma_l| \le r$ for all $\sigma_i, \sigma_l \in T$, we have $K_{il} \le \binom{r}{j} / \binom{k}{j}$ for $l \ne i$. Under condition (5.4), $K$ is diagonally dominant, i.e.,
$$K_{ii} > \sum_{l \ne i} |K_{il}| .$$
Then by the Gershgorin Circle Theorem,
$$\lambda_{\min} \ge 1 - \sum_{l \ne i} |K_{il}| \ge 1 - (|T| - 1) \binom{r}{j} \Big/ \binom{k}{j} \ge 1 - |T| \binom{r}{j} \Big/ \binom{k}{j} .$$
Therefore it suffices to have
$$|T| \frac{\binom{r}{j}}{\binom{k}{j}} \cdot \frac{\sqrt{|T|}}{1 - |T| \binom{r}{j} / \binom{k}{j}} < 1,$$
which gives
$$\binom{r}{j} < \frac{1}{|T| (1 + \sqrt{|T|})} \binom{k}{j} .$$
To satisfy this, it suffices to assume
$$r < \left( \frac{1}{|T| (1 + \sqrt{|T|})} \right)^{1/j} k .$$

D. Stable Recovery Theorems

In applications, one always encounters examples with noise, for which exact sparse recovery is impossible. In this setting, $P_{1,\delta}$ is a good replacement for $P_1$ as a robust reconstruction program. Here we present a stable recovery theorem for $P_{1,\delta}$ with bounded noise.

Theorem 5.9. Under the general framework (2.3), assume that $\|z\|_\infty \le \epsilon$, $|T| = s$, and the IRR
$$\| A_{T^c}^* A_T (A_T^* A_T)^{-1} \|_\infty \le \alpha < 1/s .$$
Then the following error bound holds for any solution $\hat{x}_\delta$ of $P_{1,\delta}$:
$$\| \hat{x}_\delta - x_0 \|_1 \le \frac{2 s (\epsilon + \delta)}{1 - \alpha s} \| A_T (A_T^* A_T)^{-1} \|_1 . \qquad (5.5)$$

Proof.
Let $h = \hat{x}_\delta - x_0$. Note that $\| A \hat{x}_\delta - b \|_\infty \le \delta$ and $z = b - A x_0$ with $\|z\|_\infty \le \epsilon$. Then
$$\| A h \|_\infty = \| A \hat{x}_\delta - A x_0 \|_\infty = \| A \hat{x}_\delta - b + b - A x_0 \|_\infty \le \| A \hat{x}_\delta - b \|_\infty + \|z\|_\infty \le \delta + \epsilon . \qquad (5.6)$$
We denote by $\hat{x}_\delta|_T$ the restriction of $\hat{x}_\delta$ to the support $T$, i.e., all the entries of $\hat{x}_\delta$ corresponding to $T^c$ are set to zero. From the optimization problem $P_{1,\delta}$, we know that $\|x_0\|_1 \ge \|\hat{x}_\delta\|_1$, so
$$\|h_T\|_1 = \| x_0 - \hat{x}_\delta|_T \|_1 \ge \|x_0\|_1 - \| \hat{x}_\delta|_T \|_1 \ge \|\hat{x}_\delta\|_1 - \| \hat{x}_\delta|_T \|_1 = \| \hat{x}_\delta|_{T^c} \|_1 = \|h_{T^c}\|_1 . \qquad (5.7)$$
Therefore,
$$\begin{aligned}
|\langle A h, A_T (A_T^* A_T)^{-1} h_T \rangle| &= |\langle A_T h_T, A_T (A_T^* A_T)^{-1} h_T \rangle + \langle A_{T^c} h_{T^c}, A_T (A_T^* A_T)^{-1} h_T \rangle| \\
&\ge \|h_T\|_2^2 - |\langle h_{T^c}, A_{T^c}^* A_T (A_T^* A_T)^{-1} h_T \rangle| \\
&\ge \|h_T\|_2^2 - \|h_{T^c}\|_1 \| A_{T^c}^* A_T (A_T^* A_T)^{-1} h_T \|_\infty \\
&\ge \frac{1}{s} \|h_T\|_1^2 - \alpha \|h_{T^c}\|_1 \|h_T\|_\infty \\
&\ge \frac{1}{s} \|h_T\|_1^2 - \alpha \|h_{T^c}\|_1 \|h_T\|_1 \\
&\ge \Bigl( \frac{1}{s} - \alpha \Bigr) \|h_T\|_1^2,
\end{aligned}$$
where the last step is due to $\|h_T\|_1 \ge \|h_{T^c}\|_1$ in inequality (5.7). On the other hand,
$$|\langle A h, A_T (A_T^* A_T)^{-1} h_T \rangle| \le \| A h \|_\infty \| A_T (A_T^* A_T)^{-1} h_T \|_1 \le (\delta + \epsilon) \| A_T (A_T^* A_T)^{-1} \|_1 \|h_T\|_1,$$
using (5.6). Combining these two inequalities yields
$$\|h_T\|_1 \le \frac{s (\delta + \epsilon)}{1 - \alpha s} \| A_T (A_T^* A_T)^{-1} \|_1 .$$
Since $\|h\|_1 = \|h_T\|_1 + \|h_{T^c}\|_1 \le 2 \|h_T\|_1$ by (5.7), the bound (5.5) follows, as desired.

In the special case where $k = j + 1$, we have:

Corollary 5.10. Let $k = j + 1$, $|T| = s$, and suppose that for any $\sigma_1, \sigma_2 \in T$, the two cliques corresponding to $\sigma_1$ and $\sigma_2$ have overlaps no larger than $r \le j - 2$. Then we have
$$\| A_{T^c}^* A_T (A_T^* A_T)^{-1} \|_\infty \le 1/(j + 1),$$
and thus the following error bound for a solution $\hat{x}_\delta$ of $P_{1,\delta}$ holds:
$$\| \hat{x}_\delta - x_0 \|_1 \le \frac{2 s (\epsilon + \delta)}{1 - \frac{s}{j + 1}} \sqrt{j + 1}, \qquad s < j + 1 .$$

Proof. This corollary follows from Lemma 5.6 and Theorem 5.9. Note that when the conditions in Theorem 5.5 hold, $A_T^* A_T = I$ and $\|A_T\|_1 \le \sqrt{\binom{k}{j}} = \sqrt{j + 1}$.
Now it suffices to establish the fact that in this special case we have
$$\| A_{T^c}^* A_T (A_T^* A_T)^{-1} \|_\infty \le \frac{1}{j + 1} < 1 .$$
Note that since any $\sigma_1, \sigma_2 \in T$ satisfy $|\sigma_1 \cap \sigma_2| \le j - 2$, $A_T^* A_T$ is an identity matrix. So
$$\| A_{T^c}^* A_T (A_T^* A_T)^{-1} \|_\infty = \| A_{T^c}^* A_T \|_\infty .$$
Now assume $\tau \in T^c$ and let $S_\tau = \{ \sigma : |\sigma \cap \tau| \ge j,\ \sigma \in T \}$. Then $|S_\tau| \le 1$. Otherwise, suppose $\{ \sigma_1, \sigma_2 \} \subset S_\tau$, so that $|S_\tau| \ge 2$. Then we have
$$|\tau| \ge |\tau \cap (\sigma_1 \cup \sigma_2)| = |\tau \cap \sigma_1| + |\tau \cap \sigma_2| - |\tau \cap \sigma_1 \cap \sigma_2| \ge j + j - (j - 2) = j + 2,$$
which contradicts the fact that $\tau$ is a $(j + 1)$-set. So there exists at most one $\sigma_0 \in T$ such that $|\tau \cap \sigma_0| \ge j$. Let $v_\tau$ be the row vector of $A_{T^c}^* A_T$ with row index corresponding to $\tau$. Then $\|v_\tau\|_\infty \le \binom{j}{j} / \binom{j + 1}{j} = \frac{1}{j + 1} < 1$, and since $v_\tau$ has at most one nonzero entry, its row sum obeys the same bound.

E. Identifying Cliques with Mixed Sizes

In general settings, we need to identify high order cliques of mixed sizes, i.e., cliques of sizes $k_1, k_2, \ldots, k_\ell$ ($k_1 < k_2 < \cdots < k_\ell$), based on the observed data $b$ on all $j$-sets. One way to construct the basis matrix $A$ is by concatenating $R^{j,k}$ with different $k$'s satisfying $k > j$. We can then solve $P_1$ and $P_{1,\delta}$ for exact recovery and stable recovery with this newly concatenated basis matrix $A$. We have the following theorem:

Theorem 5.11. Suppose $x_0$ is a sparse signal on cliques of sizes $k_1, k_2, \ldots, k_\ell$ ($j \le k_1 < k_2 < \cdots < k_\ell \le k$) and $b = A x_0$. Let $T = \operatorname{supp}(x_0)$.
1. If the cliques in $T$ have no overlaps, then they can be identified by $P_1$.
2. Moreover, if the data $b = A x_0 + z$ is contaminated by the noise $z$, $P_{1,\delta}$ provides an estimate of $x_0$ for which the inequality in (5.5) still holds.

Proof. We prove that, under the condition that any $\sigma_1, \sigma_2 \in T$ satisfy $|\sigma_1 \cap \sigma_2| = 0$, solving $P_1$ exactly identifies $x_0$.
For simplicity, given any $\tau \in T^c$, we define
$$\mu_\tau = \sum_{\sigma \in T} \frac{1}{\sqrt{\binom{|\tau|}{j} \binom{|\sigma|}{j}}} \binom{|\tau \cap \sigma|}{j} .$$
Note that the intersection of $\sigma_1$ and $\sigma_2$ being empty implies that $A_T^* A_T = I$; moreover, given $\tau \in T^c$, the sets in the collection $\{ \tau \cap \sigma \mid \sigma \in T \}$ are disjoint. If there is only one $\sigma_0$ satisfying $|\tau \cap \sigma_0| \ge j$, then
$$\mu_\tau = \frac{1}{\sqrt{\binom{|\tau|}{j} \binom{|\sigma_0|}{j}}} \binom{|\tau \cap \sigma_0|}{j} < 1,$$
because it is the inner product of the two unit-norm columns of $A$ corresponding to $\tau$ and $\sigma_0$, and no two columns of $A$ are identical. Now suppose there are at least two $\sigma$'s satisfying $|\tau \cap \sigma| \ge j$. Then we have
$$\mu_\tau = \sum_{\sigma \in T} \frac{1}{\sqrt{\binom{|\tau|}{j} \binom{|\sigma|}{j}}} \binom{|\tau \cap \sigma|}{j} \le \sum_{\sigma \in T,\ |\tau \cap \sigma| \ge j} \frac{1}{\sqrt{\binom{|\tau|}{j} \binom{|\tau \cap \sigma|}{j}}} \binom{|\tau \cap \sigma|}{j} = \sum_{\sigma \in T,\ |\tau \cap \sigma| \ge j} \frac{\sqrt{\binom{|\tau \cap \sigma|}{j}}}{\sqrt{\binom{|\tau|}{j}}} .$$
Since the sets in the collection $\{ \tau \cap \sigma \mid \sigma \in T \}$ are disjoint, if we can prove
$$\sqrt{\binom{|\tau \cap \sigma_1|}{j}} + \sqrt{\binom{|\tau \cap \sigma_2|}{j}} < \sqrt{\binom{|\tau \cap (\sigma_1 \cup \sigma_2)|}{j}},$$
then we know that
$$\mu_\tau \le \sum_{\sigma \in T,\ |\tau \cap \sigma| \ge j} \frac{\sqrt{\binom{|\tau \cap \sigma|}{j}}}{\sqrt{\binom{|\tau|}{j}}} < \sqrt{\binom{\bigl| \tau \cap \bigl( \bigcup_{\sigma \in T,\ |\tau \cap \sigma| \ge j} \sigma \bigr) \bigr|}{j}} \Big/ \sqrt{\binom{|\tau|}{j}} \le 1 .$$
So we only need to prove the following inequality: for $j \ge 2$, $n_1 \ge j$, and $n_2 \ge j$,
$$\sqrt{\binom{n_1}{j}} + \sqrt{\binom{n_2}{j}} < \sqrt{\binom{n_1 + n_2}{j}} .$$
The case $j = 2$ can be verified directly. For $j \ge 3$, we square both sides, so we only need to prove
$$\binom{n_1}{j} + \binom{n_2}{j} + 2 \sqrt{\binom{n_1}{j} \binom{n_2}{j}} < \binom{n_1 + n_2}{j} .$$
Since
$$\binom{n_1 + n_2}{j} = \sum_{s=0}^{j} \binom{n_1}{j - s} \binom{n_2}{s},$$
we only need to prove
$$2 \sqrt{\binom{n_1}{j} \binom{n_2}{j}} < n_2 \binom{n_1}{j - 1} + n_1 \binom{n_2}{j - 1} .$$
Since
$$n_2 \binom{n_1}{j - 1} + n_1 \binom{n_2}{j - 1} \ge 2 \sqrt{n_1 n_2 \binom{n_1}{j - 1} \binom{n_2}{j - 1}},$$
we only need to verify
$$n_1 \binom{n_1}{j - 1} > \binom{n_1}{j},$$
which can be easily verified by writing out both sides explicitly.

The above theorem provides a sufficient condition to guarantee exact sparse recovery with concatenated bases, and the stable recovery theory is also established.

VI.
A Polynomial Time Approximation Algorithm

In practical applications, we have pairwise interaction data in a network with $n$ nodes and we wish to infer high order cliques up to size $k$. Directly constructing $A$ by concatenating the Radon basis matrices $R^{j,j}, R^{j,j+1}, \ldots, R^{j,k}$ and solving $P_{1,\delta}$ would incur exponential complexity, since $A$ has exponentially many columns with respect to $k$. This would be intractable for inferring high order cliques in large networks. In this section, we describe a polynomial time (with respect to both $n$ and $k$) approximation algorithm for solving $P_{1,\delta}$.

Recall that the primal and dual programs $P_{1,\delta}$ and $D_{1,\delta}$ are:
$$(P_{1,\delta}) \quad \min \|x\|_1 \quad \text{s.t.} \quad \| A x - b \|_\infty \le \delta,$$
$$(D_{1,\delta}) \quad \max\ -\delta \|\gamma\|_1 - b^* \gamma \quad \text{s.t.} \quad \| A^* \gamma \|_\infty \le 1 .$$

Proposition 6.1. The problem $(D_{1,\delta})$ is the dual of $(P_{1,\delta})$.

Proof. Consider an alternative form of $P_{1,\delta}$,
$$\min\ \mathbf{1}^T \xi \quad \text{subject to} \quad -\delta \cdot \mathbf{1} \le A x - b \le \delta \cdot \mathbf{1}, \quad -\xi \le x \le \xi, \quad \xi \ge 0,$$
whose Lagrangian is
$$L(x, \xi; \gamma, \lambda, \mu) = \mathbf{1}^T \xi - \gamma_+^T (\delta \cdot \mathbf{1} - A x + b) - \gamma_-^T (A x - b + \delta \cdot \mathbf{1}) - \lambda_+^T (\xi - x) - \lambda_-^T (\xi + x) - \mu^T \xi .$$
Here, if we assume $A$ is a matrix of size $M$ by $N$, then $\gamma_+ = (\gamma_+(1), \ldots, \gamma_+(M))^T \in \mathbb{R}^M_+$, $\gamma_- = (\gamma_-(1), \ldots, \gamma_-(M))^T \in \mathbb{R}^M_+$, $\lambda_+ = (\lambda_+(1), \ldots, \lambda_+(N))^T \in \mathbb{R}^N_+$, $\lambda_- = (\lambda_-(1), \ldots, \lambda_-(N))^T \in \mathbb{R}^N_+$, and $\mu \in \mathbb{R}^N_+$ are the Lagrange multipliers. The KKT conditions give
1. $A^* (\gamma_+ - \gamma_-) + (\lambda_+ - \lambda_-) = 0$,
2. $\mathbf{1} - (\lambda_+ + \lambda_-) - \mu = 0$,
with $\gamma, \lambda, \mu \ge 0$ and $\gamma_+(\tau) \gamma_-(\tau) = \lambda_+(\tau) \lambda_-(\tau) = 0$ for all $\tau$. Writing $\gamma = \gamma_+ - \gamma_-$, we can see that the dual function of $\mathbf{1}^T \xi$ is
$$-\delta (\gamma_+^T + \gamma_-^T) \cdot \mathbf{1} - (\gamma_+^T - \gamma_-^T) b,$$
which is
$$-\delta \|\gamma\|_1 - b^* \gamma,$$
while the constraint on $\gamma$ is
$$\| A^* \gamma \|_\infty \le 1 .$$

The key to our algorithm is that we use a polynomial number of variables and constraints to approximate both programs, yielding an approximate solution for $P_{1,\delta}$.
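As a small numerical companion to Proposition 6.1, the following Python sketch evaluates the primal and dual objectives for a feasible pair $(x, \gamma)$ and reports the duality gap; by weak duality the dual value never exceeds the primal value. This is a toy illustration, not the solver used in the paper: the helper names (`matvec`, `transpose`, `primal_value`, `dual_value`) and the $2 \times 2$ identity instance are assumptions made for the example.

```python
from fractions import Fraction


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def primal_value(A, b, x, delta):
    """Return ||x||_1 if ||Ax - b||_inf <= delta, else None (infeasible)."""
    residual = [r - bi for r, bi in zip(matvec(A, x), b)]
    if max(abs(r) for r in residual) > delta:
        return None
    return sum(abs(v) for v in x)


def dual_value(A, b, gamma, delta):
    """Return -delta*||gamma||_1 - b^T gamma if ||A^T gamma||_inf <= 1, else None."""
    if max(abs(v) for v in matvec(transpose(A), gamma)) > 1:
        return None
    return -delta * sum(abs(g) for g in gamma) - sum(bi * g for bi, g in zip(b, gamma))


# Toy instance (hypothetical): A is the 2x2 identity, exact rational data.
A = [[1, 0], [0, 1]]
b = [Fraction(1), Fraction(0)]
delta = Fraction(1, 10)

x = [Fraction(9, 10), Fraction(0)]    # primal feasible point
gamma = [Fraction(-1), Fraction(0)]   # dual feasible point

p = primal_value(A, b, x, delta)
d = dual_value(A, b, gamma, delta)
gap = p - d
print(p, d, gap)  # a zero gap certifies that both points are optimal
```

A nonnegative gap of this kind is what a stopping rule based on duality gaps would monitor; here the gap is zero, so the pair is optimal for this toy instance.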
More precisely , we apply a sequen tial primal-dual interior p o in t metho d t o solve the relaxed programs: ( P 1 ,δ, T ) min k x k 1 s.t. k A T x − b k ∞ ≤ δ ( D 1 ,δ, T ) max − δ k γ k 1 − b ∗ γ s.t. k A ∗ T γ k ∞ ≤ 1 . Here A T is a submatrix of A where w e extract a subset of columns T . W e approx imate the solution to the orig inal programs by solving the ab ov e relaxed pro g rams where we only use p olynomially many columns indexed b y T . In particular, w e wan t to find an in terior p oin t γ for D 1 ,δ,T whic h is also feasible for D 1 ,δ . With this γ av aila ble, w e can use duality gaps to c hec k con ve rgence because the curren t dual ob jectiv e provide s a low er b ound f o r D 1 ,δ and an y interior p o in t for P 1 ,δ,T pro vides an upp er b ound for P 1 ,δ . 26 Let A i b e t he i -th column of A . W e need to sequen t ia lly up date the column set T . When w e hav e a solution γ (whic h is called the approxim at e analytic cen ter) fo r the r elaxed pro- gram D 1 ,δ,T , we need to find a new column A i ( i ∈ T c ) whic h is not feasible in D 1 ,δ,T . By incorp orating A i in to T , the feasible region of D 1 ,δ,T is reduced to b etter approximate tha t of D 1 ,δ . Whe n the curren t solution γ ha s no violated constrain t, i.e., γ is feasible for D 1 ,δ , w e use in terior p oint metho ds to find a series of interior p oin ts whic h con v erge to the solution of D 1 ,δ,T . Ho we ve r, we ma y obtain a new in terior p oint γ whic h is not feasible f or D 1 ,δ . W e then go bac k and add violated constrain ts. A formal description is prov ided in Algorithm 1. Algorithm 1 Cutting Plane Metho d for Solving P 1 ,δ Initialize A = I , x = b , γ = (1 , 1 , · · · , 1 ) t . while TR UE do if ∃ | A ∗ i γ | > 1 where i ∈ T c then T ← T ∪ { i } , formulate new D 1 ,δ,T and P 1 ,δ,T . Find new in terior p oin ts γ and x for D 1 ,δ,T and P 1 ,δ,T resp ectiv ely . else if the dualit y gap is small then Get the dual solution b x and stop. 
else Find a new in terior p oin t γ for D 1 ,δ,T , whic h optimizes the dual ob jectiv e. end if end while In Algorithm 1, the first IF statemen t inv olves a problem of finding a violated dual constraint for the curren t relaxed progra m. In the sp ecial case where γ are dual v aria bles a sso ciated with edges, the problem b ecomes the maximum e dge w e ight clique pr oblem , whic h is kno wn to b e NP-ha r d. W e use a simple greedy heuristic algo r it hm, whic h iterativ ely adds new no des in order t o maximize s ummation of edge w eigh ts to solve this problem (Lueke r , 1978), whic h runs in O ( nk 2 ) time and can return a 0 . 94 - appro ximate solution in t he av erage case. Note that, if γ is feasible for the dual relaxation problem with no additional violated cons tra ints, then 0 . 94 γ m ust b e feasible for D 1 ,δ whose ob jectiv e is discoun ted by 0 . 94. Th us, w e will terminate with an 0 . 94 -appro ximate solution. Let η b e t he threshold t o c hec k the duality gap. Algorithm 1 can also b e understo o d as the column generation metho d (D a n tzig and W olfe, 19 6 0), since a dding a new inequalit y constrain t in the dual prog ram adds a v ariable to the primal pro g ram and th us adds a column to the basis matrix. F or more details of the algor it hm, see Mitc hell (2003) and Y e (1997). Theoretically , if one is able to find a violated constraint in constant time and uses in terior p oint metho ds to lo cat e approximate cen ters of the primal-dual feasible regions, 27 then Algorithm 1 has computational complexit y O ( M /η 2 ), where M is the num b er of dual v ariables (Mitc hell , 20 03; Y e, 1997). In our cas e, M ≍ n 2 and find a violated constrain t has complexit y O ( nk 2 ), th us a lg orithm 1 has complexit y O ( n 3 k 2 /η 2 ). 
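The greedy search for a heavy clique described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation or Lueker's exact procedure: the function name `greedy_heavy_clique`, the dense list-of-lists weight matrix, and the choice to seed the search with the heaviest edge are all assumptions of this sketch.

```python
def greedy_heavy_clique(w, k):
    """Greedily grow a k-subset of nodes with large total internal edge weight.

    w is a symmetric n-by-n list of lists of edge weights (diagonal ignored).
    Seeding with the heaviest edge is an assumption of this sketch.
    """
    n = len(w)
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= number of nodes")

    # Seed with the heaviest edge (ties broken by first occurrence).
    u0, v0 = max(
        ((u, v) for u in range(n) for v in range(u + 1, n)),
        key=lambda e: w[e[0]][e[1]],
    )
    clique = {u0, v0}
    total = w[u0][v0]

    # gain[x] = total weight between node x and the current clique.
    gain = [w[x][u0] + w[x][v0] for x in range(n)]

    while len(clique) < k:
        # Add the outside node that increases the internal weight the most.
        best = max((x for x in range(n) if x not in clique), key=gain.__getitem__)
        total += gain[best]
        clique.add(best)
        for x in range(n):
            gain[x] += w[x][best]

    return sorted(clique), total


# Toy weights (hypothetical): nodes 0, 1, 2 form a heavy triangle.
weights = [
    [0, 10, 10, 1, 1],
    [10, 0, 10, 1, 1],
    [10, 10, 0, 1, 1],
    [1, 1, 1, 0, 1],
    [1, 1, 1, 1, 0],
]
print(greedy_heavy_clique(weights, 3))
```

With edge weights taken from $\gamma$, and assuming the column of a $k$-clique has entries $1 / \sqrt{\binom{k}{2}}$ on its edges, the returned total divided by $\sqrt{\binom{k}{2}}$ would be the value $A_i^* \gamma$ to compare against 1; running the same search on $-\gamma$ would cover the other sign. In this sketch the seed step costs $O(n^2)$ and the growth step $O(nk)$.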
Finally, we note that other iterative algorithms, e.g., Bregman iterations, which have guaranteed convergence rates (Cai et al., 2009), can be used to find solutions of the linear program relaxations in our algorithms. We also note that, in practice, we never need to explicitly construct the matrix $A$, because there are many combinatorial structures within the basis matrix to exploit. For example, inner products between the bases can be evaluated efficiently by directly comparing two sets.

VII. Application Examples

In this section, we provide four application examples to illustrate the effectiveness of the framework proposed in this paper. As we will see, our clique-based model can deal with overlaps between cliques, which gives us more community structural information than purely clustering methods and the state-of-the-art clique percolation method.

In these examples, we use the clique volume and conductance, which arguably are the simplest evaluation criteria of clustering quality, to evaluate different algorithms. The clique volume is the sum of edge weights inside the clique, while the clique conductance is the ratio between the total weight of edges leaving the clique and the clique volume (Leskovec et al., 2010). More precisely, let $B_{uv}$ be the element in the $u$-th row and $v$-th column of the adjacency matrix $B$. The conductance $\phi(S)$ of a set of nodes $S$ is defined as
$$\phi(S) = \frac{\sum_{\{(u,v) :\, u \in S,\, v \notin S\}} B_{uv}}{\min(\operatorname{Vol}(S), \operatorname{Vol}(V \setminus S))},$$
and the volume is $\operatorname{Vol}(S) = \sum_{\{u, v \in S\}} B_{uv}$.

A. Basketball Team Detection

Detecting two basketball teams from pairwise interactions among players is an ideal scenario since the two teams do not overlap.
Suppose $x_0$ is the true signal indicating the two teams among all 5-sets of the 10-player set, i.e., it is sparsely concentrated on the two 5-sets which correspond to the two teams, with magnitudes both equal to one. Assume we have observations $b$ of pairwise interactions, i.e., $b = A x_0 + z$, where $z$ is bounded random noise uniformly distributed in $[-\epsilon, \epsilon]$. We solve $P_{1,\delta}$ with $\delta = \epsilon$, which is a linear programming search over $x \in \mathbb{R}^{\binom{10}{5}} = \mathbb{R}^{252}$ with a parameter matrix $A \in \mathbb{R}^{\binom{10}{2} \times \binom{10}{5}} = \mathbb{R}^{45 \times 252}$ and $b \in \mathbb{R}^{45}$.

The results are shown in Figure 2. In Figure 2-(a), we see that the two basketball teams are perfectly detected, as expected, since the two 5-sets corresponding to the two teams have no overlap and hence satisfy the Irrepresentable Condition (IRR). In Figure 2-(b), we try to detect the two teams under different noise levels $\epsilon \in [0, 1]$. The two basketball teams can be detected under fairly large noise levels.

[Figure 2: Detecting Basketball Teams with Noise. (a) Two teams in a virtual basketball game, with intra-team interaction 1 and cross-team interaction noise no more than $\epsilon$; (b) Under a large noise level $\epsilon < 0.9$, the two teams are identifiable. For each noise level, we run 100 simulations, and the error bar plot of weights on cliques is shown. Panel (b) plots values on 5-cliques against the noise level $\epsilon$ for Team 1, Team 2, and alternative 5-subsets.]

This example can also be handled using spectral clustering techniques, where we normalize the pairwise interaction data to get the transition matrix, followed by spectral clustering on eigenspaces. We observed that both our method and spectral clustering work very well under noise levels less than 0.8 (i.e., $|\epsilon| < 0.8$).

B.
The So cial Network of L es Mis ` er ables W e consider the so cial net w ork of 3 3 c hara cters in Victor Hugo’s no v el Les Mis` erables (Knuth, 1993). W e represen t this so cial netw ork using a w eighted graph (Figure 3-(a ) ) . The edge w eigh ts are the co-app earance frequencies of the t w o corresp onding c haracters. T able 1 illustrates sev eral so cial comm unities formed b y relationships including frie ndships, str e et gangs, kinships , etc. The underlying s o cial comm unity , regarded as the ground truth for the data, is summarize d in Figure 3-( a) where sev eral so cial comm unities arise. Figure 3-(b) sho ws the sp ectral clustering result in whic h the first three red cuts are reasonable while the next three blue cuts destro y ed a lot of comm unit y structures within the net w or k. 29 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 Street Gang Friendship Friendship Friendship Student Union Kinship Dramatic Conflict 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 (a) (b) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 (c) (d) Figure 3 : Decomp osition of Les Miserables so cial net work. (a) So cial netw ork of characters in Les Miserables ; (b) Sp ectral clustering res ult; (c) The iden t ified 3-cliques; (d) The iden tified 4-cliques. W e compare our metho d with the clique p ercolation metho d, 23 and 19 cliques w ere iden tified resp ectiv ely where our approa c h can iden tify more meaningful cliques – see Figure 3 and T able 1 where we v erified the ground t ruth fr o m the no v el. 
For example, our method can correctly identify two separate cliques {4, 15, 22} and {20, 21, 22}, while the clique percolation method treats {4, 15, 20, 21, 22} as a single clique. The interaction frequencies among those characters, however, show that cross-community interactions are relatively small, so those two 3-cliques should be separated. Figures 3-(c) and 3-(d) depict important 3-cliques and 4-cliques identified by our algorithm. The sparsity patterns of those cliques satisfy the irrepresentable condition, since overlaps between them are generally not large. However, they do not necessarily satisfy the condition in Lemma 5.6, which is based on a worst-case analysis. In Figure 4, we also compare both methods in terms of clique conductances and volumes and see that the cliques identified by Radon basis pursuit have slightly lower conductances and larger volumes, which demonstrates advantages of our approach.

[Figure 4 plots omitted. Panels: (a) Radon Basis Pursuit, clique conductances; (b) Clique Percolation, clique conductances; (c) Radon Basis Pursuit, clique volumes; (d) Clique Percolation, clique volumes. x-axis: clique sizes (3 to 9); y-axes: $\phi$ (conductance) and clique volume.]

Figure 4: Les Misérables social network: box plots of clique conductances and volumes for the clique percolation method and our approach. Cliques identified by our approach have smaller conductances and larger volumes.

Table 1: Social Networks of Les Misérables

Cliques            Names of Characters                           Relationships        Perco.  Radon
{1, 2, 3}          Myriel, Mlle Baptistine, Mme Magloire         Friendship           N       N
{4, 13, 14}        Valjean, Mme Thenardier, Thenardier           Dramatic Conflicts   N       Y
{4, 15, 22}        Valjean, Cosette, Marius                      Dramatic Conflicts   N       Y
{20, 21, 22}       Gillenormand, Mlle Gillenormand, Marius       Kinship              N       Y
{5, 6, 7, 8}       Tholomyes, Listolier, Fameuil, Blacheville    Friendship           Y       Y
{9, 10, 11, 12}    Favourite, Dahlia, Zephine, Fantine           Friendship           Y       Y
{14, 31, 32, 33}   Thenardier, Gueulemer, Babet, Claquesous      Street Gang          N       Y

In summary, our method obtains richer social structure information than the competing techniques. We also obtain social communities with overlaps, which is impossible for clustering methods. We note that some simple schemes will not work well. For example, one may think of scoring each large clique by the mean scores of the small cliques it contains. In this example, since two or three key characters appear very frequently, we would end up finding that the top high-order cliques always contain them. In fact, among the top ten 3-cliques, seven contain node 4 and six contain node 15, which does not give good results.

C. Coauthorships in Network Science

We also studied a medium-size coauthorship network with a total of 1,589 scientists who come from a broad variety of fields. Part of this network is shown in Figure 5-(a). Our approach and the clique percolation method identify 136 and 166 cliques, respectively. We also compare the two methods in terms of clique conductances and volumes. From Figure 6-(a),(b), we see that the cliques identified by Radon basis pursuit have smaller conductances than, and clique volumes comparable to, those of the clique percolation method. Our approach scales very well. In this example, it can identify the cliques up to size 9 in 564 seconds.
This application example thus shows that our approach can be used to identify cliques in social networks with hundreds or even thousands of nodes.

Finally, we note that clustering techniques, e.g., spectral clustering, combined with our algorithm can provide a more refined analysis of the network. We can look at the persistence of identified cliques in the binary tree decomposition of bipartite spectral clustering of the network in a bottom-up way. Cliques which persist through more levels give us meaningful information about community structure.

In Figure 5-(b), a small fraction of the binary tree decomposition of bipartite spectral clustering is depicted, where the child nodes are the spectral bipartition of the parent node. We can detect cliques within the child nodes. Once cliques within clusters C and D are identified, we backtrack to the parent nodes B and A to see whether the identified cliques still persist.

[Figure 5 plots omitted. Panel (b) shows tree nodes labeled A, B, C, D.]

Figure 5: (a) Coauthorships in Network Science; only a part of the network is shown; (b) Important cliques identified within clusters behave in a persistent way. Clustering node B is exactly the blue part in (a).

We can identify 3 cliques ($c_1$ = {Kumar, Raghavan, Rajagopalan, Tomkins}, $c_2$ = {Kumar. S, Raghavan, Rajagopalan}, $c_3$ = {Raghavan, Rajagopalan, Tomkins, Kumar. S}) within C and 3 cliques ($d_1$ = {Flake. G, Lawrence. S, Giles. C, Coetzee. F}, $d_2$ = {Flake. G, Lawrence. S, Giles. C, Pennock. D, Glover. E}, $d_3$ = {Flake. G, Lawrence. S, Giles. C}) within D which persist to parents B and A. We can identify papers whose authors are exactly those cliques. Clustering alone will not produce this result because those cliques overlap heavily.
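The bottom-up persistence check described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' procedure: the bipartition uses the sign of the Fiedler vector of the unnormalized Laplacian $L = D - W$ (the text does not say which Laplacian is used), and `top_triangles` is a hypothetical stand-in for the LP-based clique detector. The 8-node weight matrix is invented for the example.

```python
from itertools import combinations

import numpy as np


def spectral_bipartition(W, nodes):
    """Split `nodes` by the sign of the Fiedler vector of L = D - W (assumed Laplacian)."""
    idx = np.array(nodes)
    sub = W[np.ix_(idx, idx)]
    lap = np.diag(sub.sum(axis=1)) - sub
    _, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    fiedler = vecs[:, 1]
    left = [int(n) for n, v in zip(idx, fiedler) if v >= 0]
    right = [int(n) for n, v in zip(idx, fiedler) if v < 0]
    return left, right


def top_triangles(W, nodes, m=1):
    """Hypothetical detector: the m heaviest triangles inside `nodes`."""
    scored = []
    for a, b, c in combinations(sorted(nodes), 3):
        if W[a, b] > 0 and W[a, c] > 0 and W[b, c] > 0:
            scored.append((W[a, b] + W[a, c] + W[b, c], (a, b, c)))
    scored.sort(reverse=True)
    return [tri for _, tri in scored[:m]]


def persistence(clique, leaf_to_root, W, detect):
    """Number of consecutive clusters, from the leaf upward, where `detect` finds `clique`."""
    target = frozenset(clique)
    levels = 0
    for nodes in leaf_to_root:
        if target not in {frozenset(c) for c in detect(W, nodes)}:
            break
        levels += 1
    return levels


# Invented toy network: two 4-node groups joined by one weak edge.
W = np.zeros((8, 8))
def add_edge(i, j, w):
    W[i, j] = W[j, i] = w

for group in (range(0, 4), range(4, 8)):
    for i, j in combinations(group, 2):
        add_edge(i, j, 1.0)
for i, j in combinations((0, 1, 2), 2):
    add_edge(i, j, 3.0)  # strong triangle in the first group
for i, j in combinations((4, 5, 6), 2):
    add_edge(i, j, 2.0)  # weaker triangle in the second group
add_edge(3, 4, 0.1)

root = list(range(8))
left, right = spectral_bipartition(W, root)
for child in (left, right):
    for tri in top_triangles(W, child):
        print(tri, persistence(tri, [child, root], W, top_triangles))
```

In this toy, the triangle (0, 1, 2) is still the top-ranked clique at the root and so persists through both levels, while (4, 5, 6) is found only inside its own cluster. With the paper's detector in place of `top_triangles`, the same loop would mirror the backtracking from C and D to B and A.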
In Figure 5-(b), for simplicity, we only show two persistent cliques: $c_1$ = {Kumar, Raghavan, Rajagopalan, Tomkins} and $d_1$ = {Flake. G, Lawrence. S, Giles. C, Coetzee. F}, which are the most important cliques (having the largest weights when solving the LP) in clusters C and D, respectively. These two cliques are also the two most important cliques in cluster B, and if we backtrack further to cluster A, they are still ranked first and third in terms of weights among all cliques identifiable in A.

[Figure 6 plots omitted. Panels: (a) Radon Basis Pursuit, clique conductances; (b) Clique Percolation, clique conductances; (c) Radon Basis Pursuit, clique volumes; (d) Clique Percolation, clique volumes. x-axis: clique sizes (3 to 9); y-axes: $\phi$ (conductance) and clique volume.]

Figure 6: Coauthorship network: box plots of clique conductances and volumes for the clique percolation method and our approach. Cliques identified by our approach have smaller conductances and larger volumes.

D. Inferring high order ranking

The Jester dataset (Goldberg et al., 2001) contains about 24,000 users who give ratings on 100 jokes. The ratings are real-valued, ranging from −10.00 to +10.00. We extract the top 20 jokes from the entire dataset according to mean scores. Among those 20 jokes, we count each user's vote for a top 5-joke set and view these counts as the ground truth. Figure 7-(a) shows that there is a top 5-set, {27, 29, 35, 36, 50}, with overwhelmingly more votes than the others. Now suppose we only know the top-3 counts of the jokes and wonder whether we can identify the most popular 5-joke group.
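To make the measurement model concrete: under the Radon-basis view, each vote for a 5-set is spread onto the ten 3-subsets it contains, so 3-subset counts play the role of $b = Ax$. The sketch below computes such counts from a hypothetical tally (the vote numbers are invented, not the Jester data) and reports the size of the inference problem for 20 jokes. Whether the paper forms its "top 3 counts" in exactly this way is an assumption.

```python
from collections import Counter
from itertools import combinations
from math import comb


def marginal_counts(votes_on_k_sets, j):
    """Add the votes of every k-set onto each of its j-subsets (the map x -> A x)."""
    counts = Counter()
    for k_set, votes in votes_on_k_sets.items():
        for sub in combinations(sorted(k_set), j):
            counts[sub] += votes
    return counts


# Hypothetical tallies on 5-joke sets; only the first ID set appears in the text.
votes = {
    (27, 29, 35, 36, 50): 100,
    (27, 29, 35, 36, 49): 10,
    (29, 32, 35, 36, 50): 5,
}

top3 = marginal_counts(votes, 3)
n_observed = comb(20, 3)   # number of 3-subsets of 20 jokes
n_unknown = comb(20, 5)    # number of candidate 5-subsets

print(top3[(29, 35, 36)], n_observed, n_unknown)
```

With 20 jokes there are 1,140 observed 3-subset counts against 15,504 candidate 5-sets, which is why a sparsity-seeking program is needed to pick out the dominant group.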
By solving $P_{1,\delta}$ along the whole regularization path obtained by varying $\delta$, we are able to detect this subset (Figure 7-(b)) in a robust way.

[Figure 7 plots omitted. Panel (a): "Distribution of votes on top 5 subsets"; x-axis: sorted top 5 subsets; y-axis: number of votes. Panel (b): "Solution Path of $P_{1,\delta}$ on Inferring Top 5 Jokes"; x-axis: $\delta$; y-axis: magnitude of top $x_\sigma$.]

Figure 7: (a) There is a significant top-5 joke set (in red) whose ID is {27, 29, 35, 36, 50}; (b) Regularization path where the top curve (red) selects this top group over $\delta \in [50, 130]$. Note that the second curve from the top (green) also identifies the fourth 5-set in a persistent way.

VIII. Conclusions

In this work, we present a novel approach to connect two seemingly different areas: network data analysis and compressive sensing. By adopting a new algebraic tool, Radon basis pursuit in homogeneous spaces, we formulate the network clique detection problem as a compressed sensing problem. This formulation allows us to construct rigorous conditions that characterize network clique recovery problems. Instead of providing another heuristic method, we aim to contribute at the foundational level to network data analysis. We hope that our work can build a bridge connecting the research communities of network modeling and compressive sensing, so that research results and tools from one area can be ported to the other to create more exciting results.

To illustrate the usefulness of this new framework, we present a novel approach to identify overlapped communities as cliques in social networks, based on compressed sensing with a new algebraic method, i.e., Radon basis pursuit in homogeneous spaces associated with permutation groups.
Our approach starts from a general problem of compressive representation of low-order interactive information from high-order cliques, which first arises in identity management, statistical ranking, etc. Applied specifically to social networks, this approach studies bivariate functions defined on pairs of nodes and looks for compressive representations of such functions based on clique information in networks. It turns out that the sparse representation under the Radon basis may disclose community structures, typically overlapped, in social networks. We have shown that noiseless exact recovery and stable recovery with uniformly bounded noise hold under some natural conditions. Though this paper is mainly methodological and theoretical, we also develop a polynomial-time approximation algorithm for solving empirical problems and demonstrate the usefulness of the proposed approach on real-world networks.

IX. Acknowledgments

Xiaoye Jiang and Leonidas Guibas wish to acknowledge the support of ARO grants W911NF-10-1-0037 and W911NF-07-2-0027, as well as NSF grant CCF 1011228 and a gift from the Google Corporation. Y. Yao acknowledges support from the National Basic Research Program of China (973 Program 2011CB809105), NSFC (61071157), Microsoft Research Asia, and a professorship in the Hundred Talents Program at Peking University. The authors also thank Zongming Ma, Minyu Peng, Michael Saunders, and Yinyu Ye for very helpful discussions and comments. Han Liu is thankful for a faculty supporting package from Johns Hopkins University.

References

Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels. Journal of Machine Learning Research 9 1981–2014.
Barabasi, A. L. and Albert, R. (1999). Emergence of scaling in random networks. Science 286 509–512.
Cai, J., Osher, S.
and Shen, Z. (2009). Linearized Bregman iterations for compressed sensing. Mathematics of Computation 78(267) 1515–1536.
Candès, E. J. (2008). The restricted isometry property and its implications for compressed sensing. Comptes Rendus de l'Académie des Sciences, Paris, Série I 346 589–592.
Candès, E. J. and Tao, T. (2005). Decoding by linear programming. IEEE Transaction on Information Theory 51 4203–4215.
Candès, E. J. and Tao, T. (2007). The Dantzig selector: statistical estimation when p is much larger than n. Annals of Statistics 35(6) 2313–2351.
Chen, S., Donoho, D. L. and Saunders, M. A. (1999). Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing 20 33–61.
Dantzig, G. and Wolfe, P. (1960). Decomposition principle for linear programs. Operations Research 8 101–111.
Diaconis, P. (1988). Group Representations in Probability and Statistics. Institute of Mathematical Statistics.
Duijn, M. A. J. V., Snijders, T. A. B. and Zijlstra, B. J. H. (2004). $p_2$: a random effects model with covariates for directed graphs. Statistica Neerlandica 59 234–254.
Erdös, P. and Rényi, A. (1959). On random graphs, I. Publicationes Mathematicae 6 290–297.
Erdös, P. and Rényi, A. (1960). On the evolution of random graphs. Publication of the Mathematical Institute of the Hungarian Academy of Science 5 17–61.
Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences of the United States of America 99 7821–7826.
Goldberg, K., Roeder, T., Gupta, D. and Perkins, C. (2001). Eigentaste: A constant time collaborative filtering algorithm. Information Retrieval 4(2) 133–151.
Goldenberg, A., Zheng, A. X., Fienberg, S. E. and Airoldi, E. M. (2010). A survey of statistical network models. Foundations and Trends in Machine Learning 2.
Guibas, L. J. (2008). The identity management problem: a short survey. In International Conference on Information Fusion.
Hoff, P. D., Raftery, A. E., Handcock, M. S. and H, M. S. (2001). Latent space approaches to social network analysis. Journal of the American Statistical Association 97 1090–1098.
Holland, P. W. and Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association 76 33–50.
Jagabathula, S. and Shah, D. (2008). Inferring rankings under constrained sensing. In Neural Information Processing Systems (NIPS).
Kleinberg, J. M., Kumar, R., Raghavan, P., Rajagopalan, S. and Tomkins, A. (1999). The web as a graph: measurements, models, and methods. In International Computing and Combinatorics Conference.
Knuth, D. E. (1993). The Stanford GraphBase: A Platform for Combinatorial Computing. Addison-Wesley.
Kumar, R., Raghavan, P., Rajagopalan, S., Sivakumar, D., Tomkins, A. and Upfal, E. (2000). Stochastic models for the Web graph. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science.
Lancichinetti, A. and Fortunato, S. (2009). Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Physical Review E 80(1) 16118.
Leskovec, J., Lang, K. and Mahoney, M. (2010). Empirical comparison of algorithms for network community detection. In ACM WWW International Conference on World Wide Web (WWW).
Lorrain, F. and White, H. (1971). Structural equivalence of individuals in social networks. Journal of Mathematical Sociology 1 49–80.
Lueker, G. S. (1978). Maximization problems on graphs with edge weights chosen from a normal distribution. In ACM Symposium on Theory of Computing.
Mitchell, J. E. (2003).
Polynomial interior point cutting plane methods. Optimization Methods and Software 18 2003.
Newman, M. E. J. (2006). Modularity and community structure in networks. Proceedings of National Academy of Sciences 103(23) 8577–8582.
Palla, G., Derényi, I., Farkas, I. and Vicsek, T. (2005). Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043) 814.
Sarkar, P. and Moore, A. (2005). Dynamic social network analysis using latent space models. SIGKDD Explorations: Special Edition on Link Mining.
Snijders, T. A. B. (2005). Models for longitudinal network data. In Models and Methods in Social Network Analysis. University Press.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58(1) 267–288.
Tsaig, Y. and Donoho, D. L. (2006). Compressed sensing. IEEE Transaction on Information Theory 52 1289–1306.
Wasserman, S. and Anderson, C. (1987). Stochastic a posterior block models: Construction and assessment. Social Networks 9 1–36.
Wasserman, S. and Pattison, P. (1996). Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and $p^*$. Psychometrika 61 401–425.
Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of 'small-world' networks. Nature 393 409–10.
Ye, Y. (1997). Interior Point Algorithms: Theory and Analysis. Wiley.
Yuan, M. and Lin, Y. (2007). On the nonnegative garrote estimator. Journal of the Royal Statistical Society, Series B 69(2) 143–161.
Zhao, P. and Yu, B. (2006). On model selection consistency of lasso. Journal of Machine Learning Research 7 2541–2563.