Maximum likelihood estimation in the $\beta$-model

We study maximum likelihood estimation for the statistical model for undirected random graphs, known as the $\beta$-model, in which the degree sequences are minimal sufficient statistics. We derive necessary and sufficient conditions, based on the po…

Authors: Alessandro Rinaldo, Sonja Petrović, Stephen E. Fienberg

The Annals of Statistics 2013, Vol. 41, No. 3, 1085–1110. DOI: 10.1214/12-AOS1078. © Institute of Mathematical Statistics, 2013

MAXIMUM LIKELIHOOD ESTIMATION IN THE $\beta$-MODEL$^1$

By Alessandro Rinaldo, Sonja Petrović and Stephen E. Fienberg

Carnegie Mellon University, Pennsylvania State University and Carnegie Mellon University

We study maximum likelihood estimation for the statistical model for undirected random graphs, known as the $\beta$-model, in which the degree sequences are minimal sufficient statistics. We derive necessary and sufficient conditions, based on the polytope of degree sequences, for the existence of the maximum likelihood estimator (MLE) of the model parameters. We characterize in a combinatorial fashion sample points leading to a nonexistent MLE, and nonestimability of the probability parameters under a nonexistent MLE. We formulate conditions that guarantee that the MLE exists with probability tending to one as the number of nodes increases.

Received September 2011; revised December 2012.

$^1$ Supported in part by Grant FA9550-12-1-0392 from the U.S. Air Force Office of Scientific Research (AFOSR) and the Defense Advanced Research Projects Agency (DARPA), NSF Grant DMS-06-31589, and by a grant from the Singapore National Research Foundation (NRF) under the Interactive & Digital Media Programme Office to the Living Analytics Research Centre (LARC).

AMS 2000 subject classifications. 62F99.

Key words and phrases. $\beta$-model, polytope of degree sequences, random graphs, maximum likelihood estimator.

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Statistics, 2013, Vol. 41, No. 3, 1085–1110. This reprint differs from the original in pagination and typographic detail.

1. Introduction. Many statistical models for the representation and analysis of network data rely on information contained in the degree sequence, the vector of node degrees of the observed graph. Node degrees not only quantify the overall connectivity of the network, but also reveal other potentially more refined features of interest. The study of the degree sequences and, in particular, of the degree distributions of real networks is a classic topic in network analysis, which has received extensive treatment in the statistical literature [see, e.g., Holland and Leinhardt (1981), Fienberg and Wasserman (1981a), Fienberg, Meyer and Wasserman (1985)], the physics literature [see, e.g., Newman, Strogatz and Watts (2001), Albert and Barabási (2002), Newman (2003), Park and Newman (2004), Newman, Barabási and Watts (2006), Foster et al. (2007), Willinger, Alderson and Doyle (2009)] as well as in the social network literature [see, e.g., Robins et al. (2007), Goodreau (2007), Handcock and Morris (2007) and references therein]. See also the monograph by Goldenberg et al. (2010) and the books by Kolaczyk (2009), Cohen and Havlin (2010) and Newman (2010).

The simplest instance of a statistical network model based exclusively on the node degrees is the exponential family of probability distributions for undirected random graphs with the degree sequence as its natural sufficient statistic. This is in fact a simpler, undirected version of the broader class of statistical models for directed networks known as the $p_1$-models, introduced by Holland and Leinhardt (1981). We will refer to this model as the beta model (henceforth the $\beta$-model), a name recently coined by Chatterjee, Diaconis and Sly (2011), and refer to Blitzstein and Diaconis (2010) for details and extensive references.
Despite its apparent simplicity and popularity, the $\beta$-model, much like most network models, exhibits nonstandard statistical features, since its complexity, measured by the dimension of the parameter space, increases with the size of the graph. Lauritzen (2003, 2008) characterized $\beta$-models as the natural models for representing exchangeable binary arrays that are weakly summarized, that is, random arrays whose distribution only depends on the row and column totals. More recently, Chatterjee, Diaconis and Sly (2011) conducted an analysis of the asymptotic properties of the $\beta$-model, including existence and consistency of the maximum likelihood estimator (MLE) as the dimension of the network increases, and provided a simple algorithm for estimating the natural parameters. They also characterized the graph limits, or graphons [see Lovász and Szegedy (2006)], corresponding to a sequence of $\beta$-models with given degree sequences [for a connection between the theory of graphons and exchangeable arrays see Diaconis and Janson (2008)]. Concurrently, Barvinok and Hartigan (2010) explored the asymptotic behavior of sequences of random graphs with given degree sequences, and studied a different mode of stochastic convergence. Among other things, they show that, as the size of the network increases and under a “tameness” condition, the number of edges of a uniform graph with given degree sequence converges in probability to the number of edges of a random graph drawn from a $\beta$-model parametrized by the MLE corresponding to the degree sequence. Yan and Xu (2012) and Yan, Xu and Yang (2012) derived asymptotic conditions for uniform consistency and asymptotic normality of the MLE of the $\beta$-model, and asymptotic normality of the likelihood ratio test for homogeneity of the model parameters.
Perry and Wolfe (2012) consider a general class of models for network data parametrized by node-specific parameters, of which the $\beta$-model is a special case. The authors derive nonasymptotic conditions under which the MLEs of model parameters exist and can be well approximated by simple estimators.

In an attempt to avoid the reliance on asymptotic methods, whose applicability to network models remains largely unclear [see, e.g., Haberman (1981)], several researchers have turned to exact inference for the $\beta$-model, which hinges upon the nontrivial task of sampling from the set of graphs with a given degree sequence. Blitzstein and Diaconis (2010) developed and analyzed a sequential importance sampling algorithm for generating a random graph with the prescribed degree sequence [see also Viger and Latapy (2005) for a different algorithm]. Hara and Takemura (2010) and Ogawa, Hara and Takemura (2013) tackled the same task using more abstract algebraic methods, and Petrović, Rinaldo and Fienberg (2010) studied Markov bases for the more general $p_1$ model.

In this article we study the existence of the MLE for the parameters of the $\beta$-model under a more general sampling scheme in which each edge is observed a fixed number of times (instead of just once, as in previous works) and for increasing network sizes. We view the issue of existence of the MLE as a natural measure of the intrinsic statistical difficulty of the $\beta$-model for two reasons. First, existence of the MLE is a natural minimum requirement for feasibility of statistical inference in discrete exponential families, such as the $\beta$-model: nonexistence of the MLE is in fact equivalent to nonestimability of the model parameters, as illustrated in Fienberg and Rinaldo (2012).
Thus, establishing conditions for existence of the MLE amounts to specifying the conditions under which statistical inference for these models is fully possible. Second, under the asymptotic scenario of growing network sizes, existence of the MLE will provide a natural measure of sample complexity of the $\beta$-model and will indicate the asymptotic scaling of the model parameters for which statistical inference is viable.

Though Chatterjee, Diaconis and Sly (2011) and Barvinok and Hartigan (2010)$^2$ also considered the existence of the MLE, our analysis differs substantially from theirs in that it is rooted in the statistical theory of discrete linear exponential families and relies in a fundamental way on the geometric properties of these families [see, in particular, Rinaldo, Fienberg and Zhou (2009), Geyer (2009)].

$^2$ In the analysis of Barvinok and Hartigan (2010), the maximum entropy matrix associated to a degree sequence is in fact exactly the MLE corresponding to the observed degree sequence. This is a well-known property of linear exponential families; see, for example, Cover and Thomas (1991), Chapter 11.

Our contributions are as follows:

• We provide explicit necessary and sufficient conditions for existence of the MLE for the $\beta$-model that are based on the polytope of degree sequences, a well-studied polytope arising in the study of threshold graphs; see Mahadev and Peled (1995). In contrast, the conditions of Chatterjee, Diaconis and Sly (2011) are only sufficient. We then show that nonexistence of the MLE is brought on by certain forbidden patterns of extremal network configurations, which we characterize in a combinatorial way. Furthermore, when the MLE does not exist, we can identify exactly which probability parameters are estimable.
• We use the properties of the polytope of degree sequences to formulate geometric conditions that allow us to derive finite sample bounds on the probability that the MLE does not exist. Our asymptotic results improve analogous results of Chatterjee, Diaconis and Sly (2011) and our proof is both simpler and more direct. Furthermore, we show that the tameness condition of Barvinok and Hartigan (2010) is stronger than our conditions for existence of the MLE.

• Our analysis is not specific to the $\beta$-model but, in fact, follows a principled way for detecting nonexistence of the MLE and identifying nonestimable parameters that is based on polyhedral geometry and applies more generally to discrete models. We illustrate this point by analyzing other network models that are variations or generalizations of the $\beta$-model: the $\beta$-model with random numbers of edges, the Rasch model, the Bradley–Terry model and the $p_1$ model. Due to space limitations, the details of these additional analyses are contained in the supplementary material [Rinaldo, Petrović and Fienberg (2013)].

While this is a self-contained article, the results derived here are best understood as applications of the geometric and combinatorial properties of log-linear models under product-multinomial sampling schemes, as detailed in Fienberg and Rinaldo (2012) and its supplementary material, to which we refer the reader for further details as well as for practical algorithms.

The article is organized in the following way. Section 2 introduces the $\beta$-model and establishes the exponential family parametrization that is key to our analysis. In Section 3 we derive necessary and sufficient conditions for existence of the MLE of the $\beta$-model parameters and characterize parameter estimability under a nonexistent MLE. These results are further discussed with examples in Section 4.
In Section 5 we provide sufficient conditions on the expected degree sequence guaranteeing that, with high probability as the network size increases, the MLE exists. Finally, in Section 6 we indicate possible extensions of our work and briefly discuss some of the computational issues directly related to detecting nonexistence of the MLE and parameter estimability.

We will assume throughout some familiarity with basic concepts from polyhedral geometry [see, e.g., Schrijver (1986)] and the theory of exponential families; see, for example, Barndorff-Nielsen (1978), Brown (1986).

2. The (generalized) $\beta$-model. In this section we describe the exponential family parametrization of a simple generalization of the $\beta$-model, which, with slight abuse of notation, we will refer to as the $\beta$-model as well.

We are concerned with modeling the occurrence of edges in a simple undirected random graph with node set $\{1, \ldots, n\}$. The statistical experiment consists of recording, for each pair of nodes $(i,j)$ with $i < j$, the number of edges appearing in $N_{i,j}$ i.i.d. samples, where the integers $\{N_{i,j}, i < j\}$ are deterministic and positive (we can relax both the nonrandomness and positivity assumptions). Thus, in our setting we allow for the possibility that each edge in the network be sampled a different number of times, a realistic feature that makes the model more flexible. For $i < j$, we denote by $x_{i,j}$ the number of times we observe the edge $(i,j)$ and, accordingly, by $x_{j,i}$ the number of times edge $(i,j)$ is missing. Thus, for all $(i,j)$, $x_{i,j} + x_{j,i} = N_{i,j}$. We model the observed edge counts $\{x_{i,j}, i < j\}$ as draws from mutually independent binomial distributions, with $x_{i,j} \sim \mathrm{Bin}(N_{i,j}, p_{i,j})$, where $p_{i,j} \in (0,1)$ for each $i < j$.
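The sampling scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_edge_counts`, the dictionary layout keyed by node pairs, and the particular values of $N_{i,j}$ and $p_{i,j}$ are all hypothetical choices made for the example, not part of the article.

```python
import random

def sample_edge_counts(N, p, rng):
    """Draw edge counts under independent binomial sampling.

    N and p are dicts keyed by pairs (i, j) with i < j (nodes numbered from 1).
    The returned dict x satisfies, for i < j:
      x[i, j] = number of times edge (i, j) was observed,
      x[j, i] = N[i, j] - x[i, j] = number of times it was missing.
    """
    x = {}
    for (i, j), trials in N.items():
        # A Bin(trials, p) draw as a sum of independent Bernoulli(p) draws.
        successes = sum(rng.random() < p[i, j] for _ in range(trials))
        x[i, j] = successes
        x[j, i] = trials - successes
    return x

n = 4
pairs = [(i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)]
N = {pair: 3 for pair in pairs}    # illustrative: every pair sampled 3 times
p = {pair: 0.5 for pair in pairs}  # illustrative edge probabilities
x = sample_edge_counts(N, p, random.Random(0))
```

Storing both $x_{i,j}$ and $x_{j,i}$ mirrors the identity $x_{i,j} + x_{j,i} = N_{i,j}$, although only the entries with $i < j$ carry information.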
Data arising from such an experiment has a representation in the form of an $n \times n$ contingency table with empty diagonal cells and whose $(i,j)$th cell contains the count $x_{i,j}$, $i \neq j$. For modeling purposes, however, we need only consider the upper-triangular part of this table. Indeed, since, given $x_{i,j}$, the value of $x_{j,i}$ is determined by $N_{i,j} - x_{i,j}$, we can represent the sample space more parsimoniously as the following subset of $\mathbb{N}^{\binom{n}{2}}$:

$$S_n := \{ x_{i,j} : i < j \text{ and } x_{i,j} \in \{0, 1, \ldots, N_{i,j}\} \}.$$

We index the coordinates $\{(i,j) : i < j\}$ of any point in $S_n$ lexicographically.

In the $\beta$-model, we parametrize the $\binom{n}{2}$ edge probabilities by points $\beta \in \mathbb{R}^n$ as follows. For each $\beta \in \mathbb{R}^n$, the probability parameters are uniquely determined as

$$p_{i,j} = \frac{e^{\beta_i + \beta_j}}{1 + e^{\beta_i + \beta_j}} \quad\text{and}\quad p_{j,i} = 1 - p_{i,j} = \frac{1}{1 + e^{\beta_i + \beta_j}} \qquad \forall i < j \tag{1}$$

or, equivalently, in terms of log-odds,

$$\log \frac{p_{i,j}}{1 - p_{i,j}} = \beta_i + \beta_j \qquad \forall i < j. \tag{2}$$

The magnitude and sign of $\beta_i$ quantify the propensity of node $i$ to have ties: the degree of node $i$ is expected to be large (small) if $\beta_i$ is positive (negative) and of large magnitude. Thus the $\beta$-model is the natural heterogeneous version of the well-known Erdős–Rényi random graph model [Erdős and Rényi (1959)]. For a discussion of this model and its generalizations see Goldenberg et al. (2010).

For a given choice of $\beta$, the probability of observing the vector of edge counts $x \in S_n$ is

$$p_\beta(x) = \prod_{i<j} \binom{N_{i,j}}{x_{i,j}} p_{i,j}^{x_{i,j}} (1 - p_{i,j})^{N_{i,j} - x_{i,j}} \tag{3}$$

[…]

$$d_i = \sum_{j<i} x_{j,i} + \sum_{j>i} x_{i,j}, \qquad i = 1, \ldots, n, \tag{5}$$

and the log-partition function is $\psi(\beta) = \sum_{i<j} N_{i,j} \log(1 + e^{\beta_i + \beta_j})$ […]

$$\tilde{d}_i = \sum_{j<i} \tilde{p}_{j,i} + \sum_{j>i} \tilde{p}_{i,j}, \qquad i = 1, \ldots, n, \tag{6}$$

with $\tilde{p}_{i,j} = x_{i,j}/N_{i,j}$, a rescaled version of the sufficient statistics (5), normalized by the number of observations. In particular, for the random graph model, $\tilde{d} = d$.

Theorem 3.1. Let $x \in S_n$ be the observed vector of edge counts. The MLE exists if and only if $\tilde{d}(x) \in \mathrm{int}(P_n)$.
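As a hedged illustration of the parametrization (1) and of the degree-type statistics in (5) and (6), the sketch below computes edge probabilities from a vector $\beta$ and the statistics from a table of counts. It assumes that $d_i$ is the sum of the observed counts of all edges incident to node $i$ and that $\tilde{d}_i$ is the same sum with each count divided by its $N_{i,j}$; the function names and the small data set are hypothetical.

```python
import math

def edge_probabilities(beta):
    """Edge probabilities p[i, j], i < j, from node parameters beta as in (1)."""
    n = len(beta)
    return {
        # e^b / (1 + e^b) written as 1 / (1 + e^(-b)), with b = beta_i + beta_j
        (i, j): 1.0 / (1.0 + math.exp(-(beta[i - 1] + beta[j - 1])))
        for i in range(1, n + 1)
        for j in range(i + 1, n + 1)
    }

def degree_statistics(x, N, n):
    """Return (d, d_tilde) for counts x[i, j], i < j, with N[i, j] trials each.

    Assumption: d_i sums the observed counts of edges incident to node i,
    and d_tilde_i sums the normalized counts x[i, j] / N[i, j].
    """
    d = [0] * n
    d_tilde = [0.0] * n
    for (i, j), trials in N.items():
        for node in (i, j):
            d[node - 1] += x[i, j]
            d_tilde[node - 1] += x[i, j] / trials
    return d, d_tilde

beta = [-1.0, 0.5, 2.0]                  # illustrative node parameters
p = edge_probabilities(beta)
N = {(1, 2): 3, (1, 3): 3, (2, 3): 3}    # illustrative numbers of trials
x = {(1, 2): 0, (1, 3): 1, (2, 3): 3}    # illustrative observed counts
d, d_tilde = degree_statistics(x, N, n=3)
```

When every $N_{i,j}$ equals 1 the two returned vectors coincide, which is the random graph case $\tilde{d} = d$.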
Theorem 3.1 verifies the conjecture contained in Addendum A in Chatterjee, Diaconis and Sly (2011) for the random graph model: the MLE exists if and only if the degree sequence belongs to the interior of $P_n$. This result follows from the standard properties of exponential families; see Theorem 9.13 in Barndorff-Nielsen (1978) or Theorem 5.5 in Brown (1986). It also confirms the observation made by Chatterjee, Diaconis and Sly (2011) that the MLE never exists if $n = 3$: indeed, since $P_3$ has exactly 8 vertices, as many as the possible graphs on 3 nodes, no degree sequence can be inside $P_3$.

We conclude by taking note that, by representing the sufficient statistics as a linear mapping $d = Ax$, we can recast the $\beta$-model as a log-linear model with design matrix $A^\top$ and product-multinomial scheme, with $\binom{n}{2}$ sampling constraints, one for each edge. This simple yet far-reaching observation allows us, among other things, to design algorithms for detecting nonexistence of the MLE and identifying estimable parameters under a nonexistent MLE, as explained in the supplementary material to this article.

3.1. Parameter estimability under a nonexistent MLE. The geometric nature of Theorem 3.1 has important consequences. First, it allows us to identify the patterns of observed edge counts that cause nonexistence of the MLE; that is, the sample points for which the MLE is undefined. Second, it yields a complete description of estimability of the edge probability parameters under a nonexistent MLE, a key issue for correct evaluation of degrees of freedom of the model. The next result addresses the last two points.

Lemma 3.2.
A point $y$ belongs to the interior of some face $F$ of $P_n$ if and only if there exists a set $\mathcal{F} \subset \{(i,j), i < j\}$ such that

$$y = Ap, \tag{7}$$

where $p = \{p_{i,j} : i < j, p_{i,j} \in [0,1]\} \in \mathbb{R}^{\binom{n}{2}}$ is such that $p_{i,j} \in \{0,1\}$ if $(i,j) \notin \mathcal{F}$ and $p_{i,j} \in (0,1)$ if $(i,j) \in \mathcal{F}$. The set $\mathcal{F}$ is uniquely determined by the face $F$ and is the maximal set for which (7) holds.

Following Geiger, Meek and Sturmfels (2006) and Fienberg and Rinaldo (2012), we refer to any such set $\mathcal{F}$ as a facial set of $P_n$ and to its complement, $\mathcal{F}^c = \{(i,j) : i < j\} \setminus \mathcal{F}$, as a co-facial set. Facial sets form a lattice that is isomorphic to the face lattice of $P_n$ [Fienberg and Rinaldo (2012), Lemma 5]. Thus the faces of $P_n$ are in one-to-one correspondence with the facial sets of $P_n$ and, for any pair of faces $F$ and $F'$ of $P_n$ with associated facial sets $\mathcal{F}$ and $\mathcal{F}'$, $F \cap F' = \emptyset$ if and only if $\mathcal{F} \cap \mathcal{F}' = \emptyset$, and $F \subset F'$ if and only if $\mathcal{F} \subset \mathcal{F}'$. In detail, for a point $x \in S_n$, $d(x) = Ax$ belongs to the interior of a face $F$ of $P_n$ if and only if there exists a nonnegative $p$ such that $d(x) = Ap$, where $\mathcal{F} = \{(i,j) : 0 < p_{i,j} < 1\}$ is the facial set corresponding to $F$. By the same token, $y \in \mathrm{int}(P_n)$ if and only if $y = Ap$ for a vector $p$ with coordinates strictly between 0 and 1.

Facial sets have statistical relevance for two reasons. First, nonexistence of the MLE can be described combinatorially in terms of co-facial sets, that is, patterns of edge counts that are either 0 or $N_{i,j}$. In particular, the MLE does not exist if and only if the set $\{(i,j) : i < j, x_{i,j} = 0 \text{ or } N_{i,j}\}$ contains a co-facial set. Second, apart from exhausting all possible patterns of forbidden entries in the table leading to a nonexistent MLE, facial sets specify which probability parameters are estimable.
In fact, inspection of the likelihood function (3) reveals that, for any observable set of counts $\{x_{i,j} : i < j\}$, there always exists a unique maximizer $\hat{p} = \{\hat{p}_{i,j}, i < j\}$ which, by strict concavity, is uniquely determined by the first order optimality conditions

$$\tilde{d}(x) = A\hat{p},$$

also known as the moment equations. Existence of the MLE is then equivalent to $0 < \hat{p}_{i,j} < 1$ for all $i < j$. When the MLE does not exist, that is, when $\tilde{d}$ is on the boundary of $P_n$, the moment equations still hold, but the entries of the optimizer $\{\hat{p}_{i,j}, i < j\}$, known as the extended MLE, are no longer strictly between 0 and 1. Instead, by Lemma 3.2, the extended MLE is such that $\hat{p}_{i,j} = \tilde{p}_{i,j} \in \{0,1\}$ for all $(i,j) \in \mathcal{F}^c$. Furthermore, it is possible to show [see, e.g., Morton (2013)] that $\hat{p}_{i,j} \in (0,1)$ for all $(i,j) \in \mathcal{F}$. Therefore, when the MLE does not exist, only the probabilities $\{p_{i,j}, (i,j) \in \mathcal{F}\}$ are estimable by the extended MLE. We refer the reader to Barndorff-Nielsen (1978), Brown (1986), Fienberg and Rinaldo (2012) and references therein, for details about the theory of extended exponential families and extended maximum likelihood estimation in log-linear models.

To summarize, while co-facial sets encode the patterns of table entries leading to a nonexistent MLE, facial sets indicate which probability parameters are estimable. A similar, though more involved, interpretation holds for the estimability of the natural parameters, for which the reader is referred to Fienberg and Rinaldo (2012). Further, for a given sample point $x$, the realized facial set and its cardinality are both random, as they depend on the actual value of the observed sufficient statistics $Ax$. This implies that, with a nonexistent MLE, the set of estimable parameters is itself random.

4. The boundary of $P_n$.
Theorem 3.1 and Lemma 3.2 show that the boundary of the polytope $P_n$ plays a fundamental role in determining the existence of the MLE for the $\beta$-model and in specifying which parameters are estimable. In particular, the larger the number of faces (i.e., facial sets) of $P_n$, the higher the complexity of the $\beta$-model as measured by the number of possible patterns of edge counts for which the MLE does not exist. Therefore, gaining even a basic understanding of the number and of the types of co-facial patterns will provide valuable insights into the behavior of the $\beta$-model. Below we further elaborate on the consequences of the results established in Section 3 and present a small selection of examples of co-facial sets associated to the facets of $P_n$.

Though the discussion and examples of this section will reveal a number of subtle issues, we believe that the key message is two-fold. First, the combinatorial complexity of $P_n$, measured by both the number and the types of co-facial sets, grows very fast with $n$, with the co-facial sets associated to node degrees bounded away from 0 and $n - 1$ vastly outnumbering the easily detectable cases of minimal or maximal degree. Second, since complete enumeration of the faces of $P_n$ is impractical, it is important to devise algorithms for detecting a nonexistent MLE and identifying the facial sets of estimable parameters. Both these issues become more severe in large and sparse networks, where it is expected that the exploding number of possible nontrivial co-facial sets renders estimation of the model parameters more difficult. Later in Section 5, we will derive conditions, based on the geometry of $P_n$, that prevent this from happening, with large probability for large $n$.

4.1. The combinatorial complexity of $P_n$.
Mahadev and Peled (1995) describe the facet-defining inequalities of $P_n$, for all $n \geq 4$ (when $n \leq 3$ the problem is of little interest), a result we use later in Section 5. Let $\mathcal{P}$ be the set of all pairs $(S, T)$ of disjoint nonempty subsets of $\{1, \ldots, n\}$, such that $|S \cup T| \in \{2, \ldots, n-3, n\}$. For any $(S, T) \in \mathcal{P}$ and $y \in P_n$, let

$$g(S, T, y, n) := |S|(n - 1 - |T|) - \sum_{i \in S} y_i + \sum_{i \in T} y_i. \tag{8}$$

Theorem 4.1 [Theorem 3.3.17 in Mahadev and Peled (1995)]. Let $n \geq 4$ and $y \in P_n$. The facet-defining inequalities of $P_n$ are:
(i) $y_i \geq 0$, for $i = 1, \ldots, n$;
(ii) $y_i \leq n - 1$, for $i = 1, \ldots, n$;
(iii) $g(S, T, y, n) \geq 0$, for all $(S, T) \in \mathcal{P}$.

Even with the exhaustive characterization of $P_n$ provided by Theorem 4.1, understanding the combinatorial complexity of $P_n$ (i.e., the collection of all its faces and their inclusion relations) is far from trivial. Stanley (1991) studied the number of faces of the polytope of degree sequences $P_n$ and derived an expression for computing the entries of the $f$-vector of $P_n$. The $f$-vector of an $n$-dimensional polytope is the vector of length $n$ whose $i$th entry contains the number of $i$-dimensional faces, $i = 0, \ldots, n-1$. For example, the $f$-vector of $P_8$ is the 8-dimensional vector

(334,982, 1,726,648, 3,529,344, 3,679,872, 2,074,660, 610,288, 81,144, 3322).

Thus, $P_8$ is an 8-dimensional polytope with 334,982 vertices, 1,726,648 edges and so on, up to 3322 facets. Also, according to Stanley's formula, the numbers of facets of $P_4$, $P_5$, $P_6$ and $P_7$ are 22, 60, 224 and 882, respectively [these numbers correspond to the numbers we obtained with the software polymake, using the methods described in the supplementary material to this article; see Gawrilow and Joswig (2000)].
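The facet description in Theorem 4.1 lends itself to a small brute-force sketch, given here only as an illustration for small $n$. It enumerates the pairs $(S, T) \in \mathcal{P}$, evaluates $g$ from (8), counts the inequalities of types (i) to (iii), and tests whether a vector satisfies all of them strictly. The function names and the tolerance `tol` are assumptions made for the example; the enumeration is exponential in $n$ and is not the algorithm referred to in the supplementary material.

```python
from itertools import combinations

def facet_pairs(n):
    """Yield the pairs (S, T) in the set P of Theorem 4.1 (assumes n >= 4)."""
    nodes = range(1, n + 1)
    allowed_sizes = list(range(2, n - 2)) + [n]   # |S ∪ T| in {2, ..., n-3, n}
    for size in allowed_sizes:
        for union in combinations(nodes, size):
            # Split the union into a nonempty S and a nonempty remainder T.
            for s_size in range(1, size):
                for S in combinations(union, s_size):
                    T = tuple(v for v in union if v not in S)
                    yield S, T

def g(S, T, y, n):
    """The function g(S, T, y, n) of equation (8); nodes are numbered from 1."""
    return (len(S) * (n - 1 - len(T))
            - sum(y[i - 1] for i in S)
            + sum(y[i - 1] for i in T))

def number_of_facet_inequalities(n):
    """Count the inequalities (i), (ii) and (iii) of Theorem 4.1."""
    return 2 * n + sum(1 for _ in facet_pairs(n))

def strictly_inside(y, tol=1e-9):
    """True if y satisfies every inequality of Theorem 4.1 strictly."""
    n = len(y)
    if any(not (tol < yi < n - 1 - tol) for yi in y):
        return False
    return all(g(S, T, y, n) > tol for S, T in facet_pairs(n))

counts = [number_of_facet_inequalities(n) for n in (4, 5, 6, 7, 8)]
```

For $n = 4, \ldots, 8$ the counts agree with the facet numbers quoted above (22, 60, 224, 882 and 3322). As a further check, the vector $(1, 1, 2, 2)$ gives $g = 0$ for $S = \{3, 4\}$ and $T = \{1, 2\}$, so `strictly_inside([1, 1, 2, 2])` is false even though every coordinate lies strictly between 0 and 3.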
Stanley's analysis showed that the combinatorial complexity of $P_n$ is extraordinarily large, with both the number of vertices and the number of facets growing at least exponentially in $n$, and consequently, the tasks of identifying points on the boundary of $P_n$ and the associated facial set are far from trivial. For instance, computing directly the number of vertices of $P_{10}$ is prohibitively expensive, even using one of the best known algorithms, such as the one implemented in the software minksum; see Weibel (2010).

To overcome these problems we have devised an algorithm for detecting boundary points and the associated facial sets that can handle networks with up to hundreds of nodes. We report on this algorithm, which is based on a log-linear model reparametrization and is equivalent to what is known in computational geometry as the “Cayley trick,” in the supplementary material. Using the methods described there, we were able to identify a few interesting cases in which the MLE does not exist, most of which have gone unrecognized in the statistical literature. Below we describe some of our computations for the purpose of elucidating the results derived in Section 3.

4.2. Some examples of co-facial sets. Recall that we can represent the data as an $n \times n$ table of counts with structural zero diagonal elements and where the $(i,j)$th entry of the table indicates the number of times, out of $N_{i,j}$, in which we observed the edge $(i,j)$. In our examples, empty cells correspond to facial sets and may contain arbitrary count values, in contrast to the cells in the co-facial sets that contain either a zero value or a maximal value, namely $N_{i,j}$. Lemma 3.2 implies that extreme count values of this nature are precisely what leads to the nonexistence of the MLE.
The pattern shown on the left of Table 1 provides an instance of a co-facial set, which corresponds to a facet of $P_4$. Assume for simplicity that the empty cells contain counts bounded away from 0 and $N_{i,j}$. Then the sufficient statistics $\tilde{d}$ are also bounded away from 0 and $n - 1$, and so are the row and column sums of the normalized counts $\{x_{i,j}/N_{i,j} : i \neq j\}$, yet the MLE does not exist. This is further illustrated in Table 1, center, which shows an instance of data with $N_{i,j} = 3$ for all $i \neq j$, satisfying the above pattern and, on the right, the probability values maximizing the log-likelihood function. Notice that, because the MLE does not exist, the supremum of the log-likelihood under the natural parametrization is attained in the limit by any sequence of natural parameters $\{\beta^{(k)}\}$ of the form $\beta^{(k)} = (-c_k, -c_k, c_k, c_k)$, where $c_k \to \infty$ as $k \to \infty$. As a result, some of these probability values are 0 and 1.

Table 1. Left: co-facial set leading to a nonexistent MLE. Center: an example of data exhibiting the pattern of counts consistent with the co-facial set on the left when $N_{i,j} = 3$ for all $i \neq j$. Right: table of the extended MLE of the estimated probabilities

$$\begin{array}{cccc} \times & 0 & & \\ N_{1,2} & \times & & \\ & & \times & N_{3,4} \\ & & 0 & \times \end{array} \qquad \begin{array}{cccc} \times & 0 & 1 & 2 \\ 3 & \times & 2 & 1 \\ 2 & 1 & \times & 3 \\ 1 & 2 & 0 & \times \end{array} \qquad \begin{array}{cccc} \times & 0 & 0.5 & 0.5 \\ 1 & \times & 0.5 & 0.5 \\ 0.5 & 0.5 & \times & 1 \\ 0.5 & 0.5 & 0 & \times \end{array}$$

Table 2. Examples of a co-facial set leading to a nonexistent MLE. Left: $\tilde{d}_2 = 0$. Right: example where the degrees are all bounded away from 0 and 3, the MLE does not exist

$$\begin{array}{cccc} \times & 0 & & \\ N_{1,2} & \times & 0 & 0 \\ & N_{3,2} & \times & \\ & N_{4,2} & & \times \end{array} \qquad \begin{array}{cccc} \times & 0 & & 0 \\ N_{1,2} & \times & & 0 \\ & & \times & \\ N_{4,1} & N_{4,2} & & \times \end{array}$$

The order of the pattern is crucial. In Table 2 we show, on the left, another example of a co-facial set that is easy to detect, since it corresponds to a value of 0 for the normalized sufficient statistic $\tilde{d}_2$.
Indeed, from cases (i) and (ii) of Theorem 4.1, the MLE does not exist if $\tilde{d}_i = 0$ or $\tilde{d}_i = n - 1$, for some $i$. On the right, we show a co-facial set that is instead compatible with normalized sufficient statistics being bounded away from 0 and $n - 1$. Finally, in Table 3 we list all 22 co-facial sets associated with the facets of $P_4$, including the cases already shown.

In general, there are $2n$ facets of $P_n$ that are determined by one $\tilde{d}_i$ equal to 0 or $n - 1$. Thus, just by inspecting the row sums or the observed sufficient statistics, we can detect only $2n$ co-facial sets associated to as many facets of $P_n$. Comparing this number to the entries of the $f$-vector calculated in Stanley (1991), however, and as our computations confirm, most of the facets of $P_n$ do not yield co-facial sets of this form. Since the number of facets appears to grow exponentially in $n$, we conclude that most of the co-facial sets do not appear to arise in this fashion. Thus, at least combinatorially, patterns of data counts leading to the nonexistence of MLEs but with the normalized degree bounded away from 0 and $n - 1$ are much more frequent, especially in larger networks.

4.3. The random graph case. In the special case of $N_{i,j} = 1$ for all $i < j$, which is equivalent to a model for random undirected graphs, points on the boundary of $P_n$ are, by construction, degree sequences and have a direct graph-theoretical interpretation. We say that a subset of a set of nodes of a given graph is stable if it induces a subgraph with no edges and a clique if it induces a complete subgraph.
Table 3. All possible co-facial sets for $P_4$ corresponding to the facets of $P_4$ (empty cells indicate arbitrary entry values). Each $4 \times 4$ pattern is listed row by row; × marks the diagonal cell of each row and empty cells are omitted.
(1) × 0 $N_{1,2}$ × × $N_{3,4}$ 0 ×
(2) × 0 $N_{1,2}$ × 0 0 $N_{3,2}$ × $N_{4,2}$ ×
(3) × 0 0 $N_{1,2}$ × 0 × $N_{4,1}$ $N_{4,2}$ ×
(4) × 0 0 0 $N_{1,2}$ × $N_{1,3}$ × $N_{4,1}$ ×
(5) × 0 0 $N_{1,2}$ × 0 $N_{1,3}$ $N_{2,3}$ × ×
(6) × 0 0 × $N_{1,3}$ × 0 $N_{1,4}$ $N_{3,4}$ ×
(7) × 0 × 0 $N_{1,3}$ $N_{2,3}$ × 0 $N_{3,4}$ ×
(8) × $N_{1,2}$ $N_{1,3}$ $N_{1,4}$ 0 × 0 × 0 ×
(9) × $N_{1,3}$ $N_{1,4}$ × 0 × $N_{3,4}$ 0 0 ×
(10) × $N_{1,2}$ $N_{1,3}$ 0 × $N_{2,3}$ 0 0 × ×
(11) × $N_{1,3}$ × $N_{2,3}$ 0 0 × $N_{3,4}$ 0 ×
(12) × $N_{1,2}$ $N_{1,4}$ 0 × $N_{2,4}$ × 0 0 ×
(13) × $N_{1,4}$ × $N_{2,4}$ × $N_{3,4}$ 0 0 0 ×
(14) × 0 × 0 $N_{1,3}$ $N_{2,3}$ × 0 $N_{3,4}$ ×
(15) × $N_{1,2}$ 0 × 0 0 $N_{2,3}$ × $N_{2,4}$ ×
(16) × × 0 0 $N_{2,3}$ × 0 $N_{2,4}$ $N_{3,4}$ ×
(17) × 0 × 0 × 0 $N_{1,4}$ $N_{2,4}$ $N_{3,4}$ ×
(18) × $N_{1,2}$ 0 × × 0 $N_{3,4}$ ×
(19) × 0 × $N_{2,4}$ $N_{1,3}$ × 0 ×
(20) × $N_{1,3}$ × 0 0 × $N_{2,4}$ ×
(21) × $N_{1,4}$ × 0 $N_{2,3}$ × 0 ×
(22) × 0 × $N_{2,3}$ 0 × $N_{1,4}$ ×

Lemma 4.2 [Lemma 3.3.13 in Mahadev and Peled (1995)]. Let $d$ be a degree sequence of a graph $G$ that lies on the boundary of $P_n$. Then either $d_i = 0$, or $d_i = n - 1$ for some $i$, or there exist nonempty and disjoint subsets $S$ and $T$ of $\{1, \ldots, n\}$ such that:
(1) $S$ is a clique of $G$;
(2) $T$ is a stable set of $G$;
(3) every vertex in $S$ is adjacent to every vertex in $(S \cup T)^c$ in $G$;
(4) no vertex of $T$ is adjacent to any vertex of $(S \cup T)^c$ in $G$.

Fig. 1. Examples of random graphs on 4 (left), 5 (center) and 6 (right) nodes with node degrees bounded away from 0 and $n - 1$ and for which the MLE is not defined. Lemma 4.2 applies with $S = \{3, 4\}$ and $T = \{1, 2\}$ (left), with $S = \{2, 3, 4\}$ and $T = \{1, 5\}$ (center) and with $S = \{1, 2, 6\}$ and $T = \{3, 4, 5\}$ (right).

Using Lemma 4.2, we can create virtually any example of a random graph whose node degree sequence lies on the boundary of $P_n$.
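As a rough sketch of how conditions (1) through (4) of Lemma 4.2 can be checked for one given candidate pair $(S, T)$, consider the Python fragment below. The function names are hypothetical, and the 4-node edge list is an assumed example chosen to be compatible with $S = \{3, 4\}$ and $T = \{1, 2\}$; it is not necessarily the graph drawn in Figure 1. The fragment checks a single pair only and does not search over all pairs.

```python
def satisfies_conditions(edges, n, S, T):
    """Check conditions (1)-(4) of Lemma 4.2 for one candidate pair (S, T)."""
    edge_set = {frozenset(e) for e in edges}
    S, T = set(S), set(T)
    if not S or not T or S & T:
        return False                      # S and T must be nonempty and disjoint
    rest = set(range(1, n + 1)) - S - T   # the complement of S ∪ T

    def adjacent(u, v):
        return frozenset((u, v)) in edge_set

    is_clique = all(adjacent(u, v) for u in S for v in S if u < v)       # (1)
    is_stable = not any(adjacent(u, v) for u in T for v in T if u < v)   # (2)
    s_sees_rest = all(adjacent(u, v) for u in S for v in rest)           # (3)
    t_avoids_rest = not any(adjacent(u, v) for u in T for v in rest)     # (4)
    return is_clique and is_stable and s_sees_rest and t_avoids_rest

def degree_sequence(edges, n):
    """Node degrees of a simple undirected graph on nodes 1, ..., n."""
    d = [0] * n
    for u, v in edges:
        d[u - 1] += 1
        d[v - 1] += 1
    return d

def g(S, T, y, n):
    """The function g(S, T, y, n) of equation (8)."""
    return (len(S) * (n - 1 - len(T))
            - sum(y[i - 1] for i in S)
            + sum(y[i - 1] for i in T))

n = 4
edges = [(3, 4), (1, 3), (2, 4)]   # assumed example graph
S, T = (3, 4), (1, 2)
d = degree_sequence(edges, n)
ok = satisfies_conditions(edges, n, S, T)
slack = g(S, T, d, n)
```

In this assumed example every degree lies strictly between 0 and 3, the four conditions hold, and $g(S, T, d, n) = 0$, so the degree sequence meets the corresponding inequality of Theorem 4.1 with equality.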
In particular, we note that having node degrees bounded away from 0 and $n-1$ is not a sufficient condition for the existence of the MLE, although its violation implies nonexistence of the MLE; see the examples of Figure 1. Nonetheless, Lemma 4.2 is of little or no practical use when it comes to detecting boundary points and the associated co-facial sets, since checking for the existence of a pair $(S,T)$ of subsets of nodes satisfying conditions (1) through (4) is algorithmically impractical. In the supplementary material to this article, we describe alternative procedures that can be used in large networks.

Figure 1 shows three examples of graphs on 4, 5 and 6 nodes for which the MLE of the β-model is undefined even though the node degrees are bounded away from 0 and $n-1$ in all cases. All the examples were constructed directly using Lemma 4.2, as explained in the caption. To the best of our knowledge, even these very small examples of nonexistent MLEs are unknown to practitioners and no available software for fitting the MLE is able to detect nonexistence, much less identify the relevant facial set.

For the case $n = 4$, our computations show that there are 14 distinct co-facial sets associated to the facets of $P_n$. Eight of them correspond to degree sequences containing a 0 or a 3, and the remaining six are shown in Table 4, which we computed numerically using the procedure described in the supplementary material. Notice that the three tables on the second row are obtained from the first three tables by switching zeros with ones. Furthermore, the number of the co-facial sets we found is smaller than the number of facets of $P_n$, which is 22, as shown in Table 3.
This is a consequence of the fact that the only observed counts in the random graph model are 0's or 1's: it is in fact easy to see in Table 3 that any co-facial set containing three zero counts and three maximal counts $N_{i,j}$ is equivalent, in the random graph case, to a node having degree zero or 3. However, as soon as $N_{i,j} \ge 2$, the number of possible co-facial sets matches the number of faces of $P_n$. Therefore, the condition $N_{i,j} = 1$ is not inconsequential, as it appears to reduce the numbers of observable patterns leading to a nonexistent MLE, though we do not know the extent of the impact of such reduction in general.

Table 4. Patterns of zeros and ones yielding random graphs with nonexistent MLE (empty cells indicate that the entry could be a 0 or a 1)

First row of tables:
× 0 1 × × 1 0 ×
× 0 × 1 × 1 0 ×
× 1 × 0 1 × 0 ×

Second row of tables:
× 1 0 × × 0 1 ×
× 1 × 0 × 0 1 ×
× 0 × 1 0 × 1 ×

5. Existence of the MLE: Finite sample bounds. In this section we exploit the geometry of the boundary of $P_n$ from Lemma 4.2 to derive sufficient conditions that imply the existence of the MLE with large probability as the size of the network $n$ grows. These conditions essentially guarantee that the probability of observing any of the super-exponentially many (in $n$) co-facial sets of $P_n$ is polynomially small in $n$. Unlike in previous analyses, our result does not require the network to be dense.

We make the simplifying assumption that $N_{i,j} = N$, for all $i$ and $j$, where $N = N(n) \ge 1$ could itself depend on $n$. Recall the random vector $\tilde d$, whose coordinates are given in (6), and let $d = E[\tilde d] \in \mathbb{R}^n$ be its expected value under the β-model. Then
$$d_i = \sum_{j<i} p_{j,i} + \sum_{j>i} p_{i,j}, \qquad i = 1,\ldots,n.$$
We formulate sufficient conditions for the existence of the MLE in terms of the entries of the vector $d$.

Theorem 5.1.
Assume that, for all $n \ge \max\{4,\, 2\sqrt{c\,n \log n / N} + 1\}$, the vector $d$ satisfies the conditions:

(i) $\min_i \min\{d_i,\, n-1-d_i\} \ge 2\sqrt{c\,n \log n / N} + C$,
(ii) $\min_{(S,T) \in \mathcal{P}} g(S,T,d,n) > |S \cup T| \sqrt{c\,n \log n / N} + C$,

where $c > 1/2$ and $C \in \bigl(0,\, \frac{n-1}{2} - \sqrt{c\,n \log n / N}\bigr)$. Then, with probability at least $1 - \frac{2}{n^{2c-1}}$, the MLE exists.

When $N$ is constant, for example, when $N = 1$ as in the random graph case, we can relax the conditions of Theorem 5.1 by requiring condition (ii) to hold only over subsets $S$ and $T$ of cardinality of order $\Omega(\sqrt{n \log n})$. While we present this result in greater generality by assuming only that $n \ge N$, we do not expect it to be sharp in general when $N$ grows with $n$.

Corollary 5.2. Let $n \ge \max\{N,\, 4,\, 2\sqrt{c\,n \log n} + 1\}$, $c > 1$ and $C \in \bigl(0,\, \frac{n-1}{2} - \sqrt{c\,n \log n}\bigr)$. Assume the vector $d$ satisfies the conditions:

(i′) $\min_i \min\{d_i,\, n-1-d_i\} \ge 2\sqrt{c\,n \log n} + C$,
(ii′) $\min_{(S,T) \in \mathcal{P}_n} g(S,T,d,n) > |S \cup T| \sqrt{c\,n \log n} + C$,

where
$$\mathcal{P}_n := \bigl\{(S,T) \in \mathcal{P} : \min\{|S|, |T|\} > \sqrt{c\,n \log n} + C\bigr\},$$
and the set $\mathcal{P}$ was defined before Theorem 4.1. Then the MLE exists with probability at least $1 - \frac{2}{n^{2c-2}}$. If $N = 1$, it is sufficient to have $c > 1/2$, and the MLE exists with probability larger than $1 - \frac{2}{n^{2c-1}}$.

Discussion and comparison with previous work. Since $|S \cup T| \le n$, one could replace assumption (ii) of Theorem 5.1 with the simpler but stronger condition
$$\min_{(S,T) \in \mathcal{P}_n} g(S,T,d,n) > n^{3/2} \sqrt{c \log n} + C n.$$
Then, if we assume for simplicity that $N$ is a constant, as in Corollary 5.2, the MLE exists with probability tending to one at a rate that is polynomial in $n$ whenever
$$\min_i \min\{d_i,\, n-1-d_i\} = \Omega(\sqrt{n \log n})$$
and, for all pairs $(S,T) \in \mathcal{P}$,
$$g(S,T,d,n) > \Omega(n^{3/2} \sqrt{\log n}).$$
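The conditions of Theorem 5.1 can be evaluated numerically for a single small $n$, which may help in reading them. The sketch below rests on two assumptions that are not stated in this section: that $g(S,T,d,n) = |S|(n-1-|T|) - \sum_{i \in S} d_i + \sum_{i \in T} d_i$, the expression that appears in the comparison with Chatterjee, Diaconis and Sly (2011), and that $\mathcal{P}$ consists of all pairs of nonempty disjoint subsets. The function names and the input values are hypothetical, and the theorem itself is a statement about sequences of networks rather than one fixed $n$.

```python
import math
from itertools import combinations


def facet_gap(S, T, d, n):
    # Assumed form of g(S, T, d, n): slack of the facet inequality of P_n
    # indexed by the pair (S, T).
    return (len(S) * (n - 1 - len(T))
            - sum(d[i] for i in S) + sum(d[i] for i in T))


def disjoint_pairs(n):
    # All ordered pairs of nonempty, disjoint subsets of {0, ..., n-1};
    # used here as a stand-in for the index set P (an assumption).
    nodes = range(n)
    for s_size in range(1, n):
        for S in combinations(nodes, s_size):
            rest = [v for v in nodes if v not in S]
            for t_size in range(1, len(rest) + 1):
                for T in combinations(rest, t_size):
                    yield S, T


def check_theorem_5_1(d, N, c, C):
    """Return (conditions_hold, probability_lower_bound) for one fixed n."""
    n = len(d)
    r = math.sqrt(c * n * math.log(n) / N)
    # Requirements on n, c and C stated in the theorem.
    constants_ok = (c > 0.5 and n >= max(4, 2 * r + 1)
                    and 0 < C < (n - 1) / 2 - r)
    # Condition (i): expected degrees stay away from 0 and n - 1.
    cond_i = min(min(di, n - 1 - di) for di in d) >= 2 * r + C
    # Condition (ii): every facet gap exceeds |S u T| * r + C.
    cond_ii = all(facet_gap(S, T, d, n) > (len(S) + len(T)) * r + C
                  for S, T in disjoint_pairs(n))
    bound = 1 - 2 / n ** (2 * c - 1)
    return constants_ok and cond_i and cond_ii, bound


# Hypothetical input: n = 10 nodes, all edge probabilities 1/2 (d_i = 4.5),
# N = 100 repeated observations per dyad, c = 2, C = 0.5.
ok, bound = check_theorem_5_1([4.5] * 10, N=100, c=2, C=0.5)

# Same expected degrees with N = 1: n = 10 is too small for the theorem.
ok_single, _ = check_theorem_5_1([4.5] * 10, N=1, c=2, C=0.5)
```

With these inputs the conditions hold for $N = 100$ but not for $N = 1$, which illustrates how the factor $1/N$ inside the square root loosens the requirements. The enumeration over pairs is exponential in $n$, so this is a reading aid only.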
For the case $N = 1$, we can compare Corollary 5.2 with Theorem 3.1 in Chatterjee, Diaconis and Sly (2011), which also provides sufficient conditions for the existence of the MLE with probability no smaller than $1 - \frac{1}{n^{2c-1}}$ (for all $n$ large enough). Their result appears to be stronger than ours, but that is actually not the case as we now explain. In fact, their conditions require that, for some constants $c_1$, $c_2$ and $c_3$ in $(0,1)$, $c_1(n-1) < d_i < c_2(n-1)$ for all $i$ and
$$|S|(|S|-1) - \sum_{i \in S} d_i + \sum_{i \notin S} \min\{d_i, |S|\} > c_3 n^2 \tag{9}$$
for all sets $S$ such that $|S| > (c_1)^2 n^2$. For any nonempty subsets $S \subset \{1,\ldots,n\}$ and $T \subset \{1,\ldots,n\} \setminus S$,
$$\sum_{i \notin S} \min\{d_i, |S|\} \le \sum_{i \in T} d_i + |S|\,|(S \cup T)^c|,$$
which implies that
$$|S|(n-1-|T|) - \sum_{i \in S} d_i + \sum_{i \in T} d_i > |S|(|S|-1) - \sum_{i \in S} d_i + \sum_{i \notin S} \min\{d_i, |S|\},$$
where we have used the equality $n = |S| + |T| + |(S \cup T)^c|$. Thus if (9) holds for some nonempty $S \subset \{1,\ldots,n\}$, it satisfies the facet conditions implied by all the pairs $(S,T)$, for any nonempty set $T \subset \{1,\ldots,n\} \setminus S$. As a result, for any subset $S$, condition (9) is stronger than any of the facet conditions of $P_n$ specified by $S$.

In addition, we weakened significantly the requirements in Chatterjee, Diaconis and Sly (2011) that $c_1(n-1) < d_i < c_2(n-1)$ for all $i$ to $\min_i \min\{d_i,\, n-1-d_i\} \ge 2\sqrt{c\,n \log n} + C$. As a direct consequence of this weakening, we only need $|S| > \sqrt{c\,n \log n} + C$ as opposed to $|S| > (c_1)^2 n^2$. Overall, in our setting, the vector of expected degrees of the sequence of networks is allowed to lie much closer to the boundary of $P_n$.
As we explain next, such weakening is significant, since the setting of Chatterjee, Diaconis and Sly (2011) only allows us to estimate an increasing number of probability parameters (the edge probabilities) that are uniformly bounded away from 0 and 1, while our assumptions allow for these probabilities to become degenerate as the network size grows, and therefore hold even in nondense network settings.

The nondegenerate case. We now briefly discuss the case of sequences of networks for which $N = 1$ and the edge probabilities are uniformly bounded away from 0 and 1, that is,
$$\delta < p_{i,j} < 1 - \delta \qquad \forall i, j \tag{10}$$
for some $\delta \in (0,1)$ independent of $n$. In this scenario, the number of probability parameters to be estimated grows with $n$, but their values are guaranteed to be nondegenerate. It immediately follows from the nondegenerate assumption (10) that $d \in \mathrm{int}(P_n)$ and
$$\delta(n-1) < d_i < (1-\delta)(n-1), \qquad i = 1,\ldots,n. \tag{11}$$
Then, the same arguments we used in the proof of Corollary 5.2 imply that the MLE exists with high probability. We provide a sketch of the proof. First, we note that, with high probability, $g(S,T,\tilde d,n) \ge g(S,T,d,n) - |S \cup T|\,\Omega(\sqrt{n \log n})$, for each pair $(S,T) \in \mathcal{P}$. Furthermore, because of (11), it is enough to consider only pairs $(S,T)$ of disjoint subsets of $\{1,\ldots,n\}$ of sizes of order $\Omega(n)$. For each such pair, the condition on $d_i$ further yields that $g(S,T,d,n)$ is of order $\Omega(n^2)$, and, by Theorem 8 the MLE exists with high probability.

In fact, the boundedness assumption of Chatterjee, Diaconis and Sly (2011) that $\|\beta\|_\infty < L$ with $L$ independent of $n$, is equivalent to the nondegenerate assumption (10), as we see from equation (1).
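A short worked step may help to see why the two assumptions are equivalent. The sketch below assumes that equation (1), which is not reproduced in this section, is the usual logistic parametrization of the β-model. The explicit value of $\delta$ and the constant $3/2$ in the converse are choices made for this illustration only.

```latex
% Assumed form of equation (1):
\[
  p_{i,j} = \frac{e^{\beta_i + \beta_j}}{1 + e^{\beta_i + \beta_j}}, \qquad i \neq j .
\]
% If \|\beta\|_\infty < L, then |\beta_i + \beta_j| < 2L, and monotonicity gives
\[
  \frac{1}{1 + e^{2L}} < p_{i,j} < \frac{e^{2L}}{1 + e^{2L}} = 1 - \frac{1}{1 + e^{2L}},
\]
% so (10) holds with \delta = 1 / (1 + e^{2L}).
% Conversely, (10) gives |\beta_i + \beta_j| < \log\frac{1-\delta}{\delta}
% for all i \neq j, and for n \geq 3 and distinct i, j, k,
\[
  \beta_i = \tfrac{1}{2}\bigl[(\beta_i + \beta_j) + (\beta_i + \beta_k) - (\beta_j + \beta_k)\bigr]
  \quad\Longrightarrow\quad
  |\beta_i| < \tfrac{3}{2} \log\frac{1-\delta}{\delta} .
\]
```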
Unlike Chatterjee, Diaconis and Sly (2011), who focus on the nondegenerate case, our results hold under weaker scaling, as we only require, for instance, that $d_i$ be of order $\Omega(\sqrt{n \log n})$ for all $i$. Relatedly, we note that the tameness condition of Barvinok and Hartigan (2010) is equivalent to $\delta < \hat p_{i,j} < 1 - \delta$ for all $i$ and $j$ and a fixed $\delta \in (0,1)$, where $\hat p_{i,j}$ is the MLE of $p_{i,j}$. Therefore, the tameness condition is stronger than the existence of the MLE. In fact, using again Theorem 1.3 in Chatterjee, Diaconis and Sly (2011), for all $n$ sufficiently large, the tameness condition is equivalent to the boundedness condition of Chatterjee, Diaconis and Sly (2011).

We conclude this section with two useful remarks. First, Theorem 1.3 in Chatterjee, Diaconis and Sly (2011) demonstrates that, when the MLE exists,
$$\max_i |\hat\beta_i - \beta_i| = O\bigl(\sqrt{\log n / n}\bigr),$$
with probability at least $1 - \frac{2}{n^{2c-1}}$. Combined with our Corollary 5.2, this implies that the MLE is a consistent estimator under a growing network size and with edge probabilities approaching the degenerate values of 0 and 1.

Second, after the submission of this article we learned about the interesting asymptotic results of Yan and Xu (2012), Yan, Xu and Yang (2012), who claim that, based on a modification of the arguments of Chatterjee, Diaconis and Sly (2011), it is possible to show the MLE of the β-model exists and is uniformly consistent if $L = o(\log n)$ and $L = o(\log \log n)$, respectively, where $L = \max_i |\beta_i|$.

6. Discussion and extensions. We have used polyhedral geometry to analyze the conditions for existence of the MLE of a generalized version of the β-model and to derive finite sample bounds for the probability associated with the existence of the MLE. Our results offer a novel and explicit characterization of the patterns of edge counts leading to nonexistent MLEs.
The problem of nonexistence occurs in numbers and with a complexity that was not previously known. Our results allow us to sharpen conditions for existence of the MLE. Our analysis in particular highlights the fact that requiring node degrees equal to 0 or $n-1$ is only a sufficient condition for nonexistence of the MLE and nonestimability of the edge probabilities. We show that we need to account for many more edge patterns. We note that the use of polyhedral geometry in statistical models for discrete data is a hallmark of the theory of exponential families, but its considerable potential for use and applications in the analysis of log-linear and network models has only recently begun to be investigated; see Fienberg and Rinaldo (2012), Rinaldo, Fienberg and Zhou (2009).

Our generalization of the β-model allows for Poisson and binomial, not simply Bernoulli, distributions for edges. Email databases and others involving repeated transactions among pairs of parties provide the simplest examples of situations for networks where edges can occur multiple times. These are often analyzed as weighted networks but that may not necessarily make as much sense as using a Poisson for random numbers of occurrences.

As our results indicate, the nonexistence of the MLE is equivalent to nonestimability of a subset of the parameters of the model, but by no means does it imply that no statistical inference can take place. In fact, when the MLE does not exist, there always exists a "restricted" β-model that is specified by the appropriate facial set, and for which all parameters are estimable.
Thus, for such a small model, traditional statistical tasks such as hypothesis testing and assessment of parameter uncertainty are possible, even though it becomes necessary to adjust the number of degrees of freedom for the nonestimable parameters. A complete description of this approach, which is rooted in the theory of extended exponential families, is beyond the scope of the article. See Fienberg and Rinaldo (2012) for details.

We can extend our study of the β-model in a number of ways. In the supplementary material to this article, we consider various generalizations of the β-model setting, including the β-model with random numbers of edges, the Rasch model from item response theory, the Bradley–Terry paired comparisons model and the $p_1$ network model. For most of these models we were able to carry out a fairly explicit analysis based on the underlying geometry, but for the full $p_1$ model the complexity of the model polytope appears to make such a direct analysis very difficult [this is reflected in the high complexity of the Markov basis for the $p_1$ model, of which we give a full account in Petrović, Rinaldo and Fienberg (2010)]. Another interesting extension of our results of Section 5 would be to translate our conditions, which are formulated in terms of expected degree sequences, into conditions on the $p_{i,j}$'s themselves, for instance, by establishing appropriate bounds for $\min_i$ … $C > 0$. (18)

Thus, we have shown that (17) and (18) hold, provided that the event $O_n$ is true and assuming (i) and (ii). Therefore, by Theorem 4.1 the MLE exists. □

Proof of Corollary 5.2. Using the same setting and notation of Theorem 5.1, we will assume throughout the proof that the event
$$O'_n := \Bigl\{ \max_k \max_i |d_i^{(k)} - d_i| \le \sqrt{c\,n \log n} \Bigr\}$$
holds true.
By Hoeffding's inequality, the union bound and the inequality $\log N \le \log n$, we have
$$P(O_n'^{\,c}) \le 2 \exp\{-2c \log n + \log n + \log N\} \le \frac{2}{n^{2c-2}}.$$
A simple calculation shows that, when $O'_n$ is satisfied, we also have
$$\Bigl\{ \max_i |\tilde d_i - d_i| \le \sqrt{c\,n \log n} \Bigr\}.$$
Then, by the same arguments we used in the proof of Theorem 5.1, assumption (i′) yields that
$$0 < \tilde d_i < n-1, \qquad i = 1,\ldots,n, \tag{19}$$
and, for each pair $(S,T) \in \mathcal{P}$,
$$g(S,T,\tilde d,n) \ge g(S,T,d,n) - |S \cup T| \sqrt{c\,n \log n}. \tag{20}$$
It is easy to see that, for the event $O'_n$, assumption (i′) also yields
$$\min_k \min_i \min\{d_i^{(k)},\, n-1-d_i^{(k)}\} \ge \sqrt{c\,n \log n} + C. \tag{21}$$
We now show that, when (19) and the previous equation are satisfied, the MLE exists if
$$\min_{(S,T) \in \mathcal{P}_n} g(S,T,d,n) > C > 0. \tag{22}$$
Indeed, suppose that (19) is true and that $\tilde d$ belongs to the boundary of $P_n$. Then, by the integrality of the polytope $P_n$, there exist nonempty and disjoint subsets $T$ and $S$ of $\{1,\ldots,n\}$ satisfying the conditions of Lemma 4.2 for each of the degree sequences $d^{(1)},\ldots,d^{(k)}$. If $\min_k \min_i d_i^{(k)} > \sqrt{c\,n \log n} + C$, then, necessarily, $|S| > \sqrt{c\,n \log n} + C$, because $|S|$ is the maximal degree of every node $i \in T$. Similarly, since each $i \in S$ has degree at least $|S| - 1 + |(S \cup T)^c|$, if $\max_k \max_i d_i^{(k)} < n - 1 - \sqrt{c\,n \log n} - C$, the inequality
$$|S| - 1 + |(S \cup T)^c| < n - 1 - \sqrt{c\,n \log n} - C$$
must hold, implying that $|T| = n - |S| - |(S \cup T)^c| > \sqrt{c\,n \log n} + C$. Thus, we have shown that if (19) and (21) hold, and $\tilde d$ belongs to the boundary of $P_n$, the cardinalities of the sets $S$ and $T$ defining the facet of $P_n$ to which $\tilde d$ belongs cannot be smaller than $\sqrt{c\,n \log n} + C$. By Theorem 4.1, when (19) and (21) hold, (22) implies that $\tilde d \in \mathrm{int}(P_n)$, so the MLE exists.
However, equation (20) and assumption (ii′) imply (22), so the proof is complete. □

Acknowledgments. A previous version of this manuscript was completed while the second author was in residence at Institut Mittag-Leffler, for whose hospitality she is grateful.

SUPPLEMENTARY MATERIAL

Supplement to "Maximum lilkelihood estimation in the β-model" (DOI: 10.1214/12-AOS1078SUPP; .pdf). In the supplementary material we extend our analysis to other models for network data: the Rasch model, the β-model with no sampling constraints on the number of observed edges per dyad, the Bradley–Terry model and the $p_1$ model of Holland and Leinhardt (1981). We also provide details on how to determine whether a given degree sequence belongs to the interior of the polytope of degree sequences $P_n$ and on how to compute the facial set corresponding to a degree sequence on the boundary of $P_n$.

REFERENCES

Albert, R. and Barabási, A.-L. (2002). Statistical mechanics of complex networks. Rev. Modern Phys. 74 47–97. MR1895096
Barndorff-Nielsen, O. (1978). Information and Exponential Families in Statistical Theory. Wiley, Chichester. MR0489333
Barvinok, A. and Hartigan, J. A. (2010). The number of graphs and a random graph with a given degree sequence. Available at http://arxiv.org/pdf/1003.0356v2.
Blitzstein, J. and Diaconis, P. (2010). A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Internet Math. 6 489–522. MR2809836
Brown, L. (1986). Fundamentals of Statistical Exponential Families. Institute of Mathematical Statistics Lecture Notes—Monograph Series 9. IMS, Hayward, CA.
Chatterjee, S., Diaconis, P. and Sly, A. (2011). Random graphs with a given degree sequence. Ann. Appl. Probab. 21 1400–1435. MR2857452
Cohen, R. and Havlin, S. (2010).
Complex Networks: Structure, Robustness and Function. Cambridge Univ. Press, Cambridge.
Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. Wiley, New York. MR1122806
Diaconis, P. and Janson, S. (2008). Graph limits and exchangeable random graphs. Rend. Mat. Appl. (7) 28 33–61. MR2463439
Erdős, P. and Rényi, A. (1959). On random graphs. I. Publ. Math. Debrecen 6 290–297. MR0120167
Fienberg, S. E. and Rinaldo, A. (2012). Maximum likelihood estimation in log-linear models. Ann. Statist. 40 996–1023. MR2985941
Fienberg, S. E., Meyer, M. M. and Wasserman, S. S. (1985). Statistical analysis of multiple sociometric relations. J. Amer. Statist. Assoc. 80 51–67.
Fienberg, S. E. and Wasserman, S. S. (1981a). Categorical data analysis of single sociometric relations. Sociological Methodology 1981 156–192.
Fienberg, S. E. and Wasserman, S. S. (1981b). An exponential family of probability distributions for directed graphs: Comment. J. Amer. Statist. Assoc. 76 54–57.
Foster, J. G., Foster, D. V., Grassberger, P. and Paczuski, M. (2007). Link and subgraph likelihoods in random undirected networks with fixed and partially fixed degree sequences. Phys. Rev. E (3) 76 046112, 12. MR2365608
Fukuda, K. (2004). From the zonotope construction to the Minkowski addition of convex polytopes. J. Symbolic Comput. 38 1261–1272. MR2094220
Gawrilow, E. and Joswig, M. (2000). polymake: A framework for analyzing convex polytopes. In Polytopes—Combinatorics and Computation (Oberwolfach, 1997) (G. Kalai and G. M. Ziegler, eds.). DMV Seminar 29 43–73. Birkhäuser, Basel. MR1785292
Geiger, D., Meek, C. and Sturmfels, B. (2006). On the toric algebra of graphical models. Ann. Statist. 34 1463–1492. MR2278364
Geyer, C. J. (2009). Likelihood inference in exponential families and directions of recession. Electron. J. Stat. 3 259–289. MR2495839
Goldenberg, A., Zheng, A. X., Fienberg, S. E. and Airoldi, E. M. (2010). A survey of statistical network models. Foundations and Trends in Machine Learning 2 129–233.
Goodreau, S. M. (2007). Advances in exponential random graph (p*) models applied to a large social network. Social Networks 29 231–248.
Haberman, S. (1981). Discussion of "An exponential family of probability distributions for directed graphs," by P. W. Holland and S. Leinhardt. J. Amer. Statist. Assoc. 76 60–61.
Handcock, M. S. and Morris, M. (2007). A simple model for complex networks with arbitrary degree distribution and clustering. In Statistical Network Analysis: Models, Issues and New Directions (E. Airoldi, D. Blei, S. E. Fienberg, A. Goldenberg, E. Xing and A. Zheng, eds.). Lecture Notes in Computer Science 4503 103–114. Springer, Berlin.
Hara, H. and Takemura, A. (2010). Connecting tables with zero-one entries by a subset of a Markov basis. In Algebraic Methods in Statistics and Probability II (M. Viana and H. Wynn, eds.). Contemporary Mathematics 516 199–213. Amer. Math. Soc., Providence, RI. MR2730750
Holland, P. W. and Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. J. Amer. Statist. Assoc. 76 33–50.
Kolaczyk, E. D. (2009). Statistical Analysis of Network Data: Methods and Models. Springer, New York. MR2724362
Lauritzen, S. L. (2003). Rasch models with exchangeable rows and columns. In Bayesian Statistics, 7 (Tenerife, 2002) (J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West, eds.) 215–232. Oxford Univ. Press, New York. MR2003175
Lauritzen, S. L. (2008). Exchangeable Rasch matrices. Rend. Mat. Appl. (7) 28 83–95. MR2463441
Lovász, L. and Szegedy, B. (2006). Limits of dense graph sequences. J. Combin. Theory Ser. B 96 933–957. MR2274085
Mahadev, N. V. R. and Peled, U. N. (1995). Threshold Graphs and Related Topics. Annals of Discrete Mathematics 56. North-Holland, Amsterdam. MR1417258
Meyer, M. M. (1982). Transforming contingency tables. Ann. Statist. 10 1172–1181. MR0673652
Morton, J. (2013). Relations among conditional probabilities. J. Symbolic Comput. 50 478–492. MR2996892
Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Rev. 45 167–256 (electronic). MR2010377
Newman, M. E. J. (2010). Networks: An Introduction. Oxford Univ. Press, Oxford. MR2676073
Newman, M., Barabási, A.-L. and Watts, D. J., eds. (2006). The Structure and Dynamics of Networks. Princeton Univ. Press, Princeton, NJ. MR2352222
Newman, M. E. J., Strogatz, S. H. and Watts, D. J. (2001). Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E (3) 64 026118, 17.
Ogawa, M., Hara, H. and Takemura, A. (2013). Graver basis for an undirected graph and its application to testing the beta model of random graphs. Ann. Inst. Statist. Math. 65 191–212.
Park, J. and Newman, M. E. J. (2004). Statistical mechanics of networks. Phys. Rev. E (3) 70 066117, 13. MR2133807
Perry, P. O. and Wolfe, P. J. (2012). Null models for network data. Available at http://arxiv.org/abs/1201.5871.
Petrović, S., Rinaldo, A. and Fienberg, S. E. (2010). Algebraic statistics for a directed random graph model with reciprocation. In Algebraic Methods in Statistics and Probability II. Contemporary Mathematics 516 261–283. Amer. Math. Soc., Providence, RI. MR2730754
Rinaldo, A., Fienberg, S. E. and Zhou, Y. (2009). On the geometry of discrete exponential families with application to exponential random graph models. Electron. J. Stat. 3 446–484. MR2507456
Rinaldo, A., Petrović, S. and Fienberg, S. E. (2013). Supplement to "Maximum lilkelihood estimation in the β-model." DOI: 10.1214/12-AOS1078SUPP.
Robins, D., Pattison, P., Kalish, Y. and Lusher, D. (2007). An introduction to exponential random graph (p*) models for social networks. Social Networks 29 173–191.
Schrijver, A. (1998). Theory of Linear and Integer Programming. Wiley, New York.
Stanley, R. P. (1991). A zonotope associated with graphical degree sequences. In Applied Geometry and Discrete Mathematics. DIMACS Series in Discrete Mathematics and Theoretical Computer Science 4 555–570. Amer. Math. Soc., Providence, RI. MR1116376
Viger, F. and Latapy, M. (2005). Efficient and simple generation of random simple connected graphs with prescribed degree sequence. In Computing and Combinatorics. Lecture Notes in Computer Science 3595 440–449. Springer, Berlin. MR2190867
Weibel, C. (2010). Implementation and parallelization of a reverse-search algorithm for Minkowski sums. In Proceedings of the 12th Workshop on Algorithm Engineering and Experiments (ALENEX 2010) 34–42. SIAM, Philadelphia. Available at https://sites.google.com/site/christopheweibel/research/minksum.
Willinger, W., Alderson, D. and Doyle, J. C. (2009). Mathematics and the Internet: A source of enormous confusion and great potential. Notices Amer. Math. Soc. 56 586–599. MR2509062
Yan, T. and Xu, J. (2012). A central limit theorem in the β-model for undirected random graphs with a diverging number of vertices. Available at http://arxiv.org/abs/1202.3307.
Yan, T., Xu, J. and Yang, Y. (2012). High dimensional Wilks phenomena in random graph models. Available at http://arxiv.org/abs/1201.0058.

A. Rinaldo
Department of Statistics
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, Pennsylvania 15213
USA
E-mail: arinaldo@cmu.edu

S. Petrović
Department of Statistics
Pennsylvania State University
326 Thomas Building
University Park, Pennsylvania 16802
USA
E-mail: petrovic@stat.psu.edu

S. E. Fienberg
Department of Statistics
Machine Learning Department
Cylab
Heinz School
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, Pennsylvania 15213
USA
E-mail: fienberg@stat.cmu.edu
