On uniform sampling simple directed graph realizations of degree sequences
Authors: M. Drew LaMar
M. Drew LaMar*
November 15, 2018

Abstract
Choosing a uniformly sampled simple directed graph realization of a degree sequence has many applications, in particular in social networks where self-loops are commonly not allowed. It has been shown in the past that one can perform a Markov chain arc-switching algorithm to sample a simple directed graph uniformly by performing two types of switches: a 2-switch and a directed 3-cycle reorientation. This paper discusses under what circumstances a directed 3-cycle reorientation is required. In particular, the class of degree sequences where this is required is a subclass of the directed 3-cycle anchored degree sequences. An important implication of this result is a reduced Markov chain algorithm that uses only 2-switches.

1 Introduction
Markov chain Monte Carlo algorithms have been used successfully to uniformly sample realizations of both undirected and directed degree sequences [2, 5]. The algorithms use a sequence of moves from a move-set to go from one realization to another. This results in a random walk on a meta-graph, where each vertex corresponds to a realization and the edges connecting these vertices correspond to moves from the move-set. Suppose the meta-graph is connected and its edges carry appropriate probability weights (see [2]). Then the walk converges to the uniform distribution, so in the limit we are guaranteed a uniformly sampled realization with the fixed degree sequence. To sample simple directed realizations (i.e. no self-loops or multi-arcs), there are two types of moves in our move-set [5]. The first is a 2-switch. The second is the reorientation of a directed 3-cycle $\vec{C}_3$, where $\vec{C}_3$ has vertex set $\{v_1, v_2, v_3\}$ and arc set $\{(v_1, v_2), (v_2, v_3), (v_3, v_1)\}$.
A 2-switch takes two arcs $(v_1, w_1)$ and $(v_2, w_2)$ for which $(v_1, w_2)$ and $(v_2, w_1)$ are not arcs. It replaces them with the arcs $(v_1, w_2)$ and $(v_2, w_1)$:
$$\{(v_1, w_1), (v_2, w_2)\} \mapsto \{(v_1, w_2), (v_2, w_1)\}.$$
A $\vec{C}_3$ reorientation reverses every arc of the 3-cycle:
$$\{(v_1, v_2), (v_2, v_3), (v_3, v_1)\} \mapsto \{(v_2, v_1), (v_3, v_2), (v_1, v_3)\}.$$
$\vec{C}_3$ reorientations can lead to a much larger mixing time in certain circumstances. In this paper we identify the cases where $\vec{C}_3$ reorientations are necessary, give a degree-sequence characterization of these cases, and show that we can reduce our move-set to only 2-switches.

Recently, Berger and Müller-Hannemann posted a paper with similar results. In [1], they rediscover the result of Rao et al. [5] proving connectivity of the meta-graph using 2-switches and $\vec{C}_3$ reorientations. They also implement a Monte Carlo algorithm similar to that of Rao et al. [5], which uniformly samples simple realizations of a directed degree sequence, and include mixing time calculations. They show, as we do, that the special cases where $\vec{C}_3$ reorientations are required are precisely the subset of $\vec{C}_3$-anchored digraphs which we call $\vec{C}_3^*$-anchored.

Our paper differs in two ways. First, our proof is substantially shorter, being built upon the structural characterization of $\vec{C}_3$-anchored digraphs found in [4]. Second, we use the degree-sequence characterization of the $\vec{C}_3$-anchored digraphs to identify the $\vec{C}_3$-anchors. This replaces a more computationally intensive algorithm that requires knowledge of all induced 3-cycles of a given realization. Using the degree-sequence characterization is much faster (linear in the number of vertices) and allows us to use the more efficient 2-switch random walk. Both papers, however, show that the meta-graph consists of $2^k$ isomorphic subgraphs, where $k$ is the number of anchored 3-cycles.

*Department of Applied Science, The College of William and Mary, 311 McGlothlin-Street Hall, Williamsburg VA 23187 (mdlama@wm.edu).

2 Notation
All directed graphs in this article will be simple, i.e. with no self-loops or multi-arcs. We consider integer-pair sequences $d = \{(d_i^+, d_i^-)\}_{i=1}^N$. We say $d$ is digraphic if there exists a digraph (i.e. directed graph) with degree sequence $d$, and we denote the set of digraph realizations of $d$ by $R(d)$. All integer-pair sequences are assumed to be digraphic (otherwise $R(d) = \emptyset$). We write $d^+$ and $d^-$ for the out-degree and in-degree sequences of $d$, respectively. We denote directed graphs by $\vec{G}$, with $V(\vec{G})$ the vertex set and $A(\vec{G})$ the arc set. We drop the reference to $\vec{G}$ when the digraph is understood, for example through the notation $\vec{G} = (V, A)$. An arc between vertices $a$ and $b$ is denoted by $(a, b)$, with the orientation (from $a$ to $b$) given by the ordering. Given a digraph $\vec{G} = (V, A)$ and vertex sets $X, Y \subset V$, we define the subgraph $\vec{G}[X, Y] = (X \cup Y, A[X, Y])$, where $A[X, Y] = \{(x, y) \in A : x \in X \text{ and } y \in Y\}$. When $X = Y$, this is the usual induced subgraph, and we denote it by $\vec{G}[X]$. We use the vertex labeling notation $v_i$ in place of $L^{-1}(i)$. Here $L : V \to \{1, \ldots, |V|\}$ is a bijective labeling function taking vertices to coordinates of the degree sequence.

3 Result
Given a degree sequence $d$, we define the meta-graph $\Omega_d = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is in one-to-one correspondence with $R(d)$. We denote by $V_{\vec{G}} \in \mathcal{V}$ the vertex corresponding to $\vec{G} \in R(d)$. There are two types of edges, $\mathcal{E} = \mathcal{E}_2 \cup \mathcal{E}_3$. We have $(V_{\vec{G}}, V_{\vec{G}'}) \in \mathcal{E}_2$ if there is a 2-switch between $\vec{G}$ and $\vec{G}'$. Similarly, $(V_{\vec{G}}, V_{\vec{G}'}) \in \mathcal{E}_3$ if there is a $\vec{C}_3$ reorientation connecting them. We have the following result:

Theorem 3.1 (Rao et al. [5]) The meta-graph $\Omega_d$ is connected.

We can define a Markov chain random walk on $\Omega_d$ by an appropriate choice of probability weights for each edge in $\mathcal{E}$.
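For very small degree sequences, the meta-graph can be built explicitly. Below is a minimal brute-force sketch in Python that does this. It rests on three assumptions:

- A digraph is stored as a frozenset of arcs `(tail, head)` on vertices `0, ..., N-1`.
- `d` lists `(out, in)` pairs.
- The function names, such as `meta_graph_components`, are hypothetical and do not come from the paper.

The sketch enumerates $R(d)$ and generates the $\mathcal{E}_2$ neighbours (2-switches) of each realization. Optionally, it also generates the $\mathcal{E}_3$ neighbours ($\vec{C}_3$ reorientations). It then counts connected components. The enumeration is exponential in $N$, so it is only meant for checking tiny examples.

```python
from itertools import combinations, permutations
from typing import FrozenSet, Iterator, List, Tuple

Arc = Tuple[int, int]
Digraph = FrozenSet[Arc]


def realizations(d: List[Tuple[int, int]]) -> List[Digraph]:
    """Brute-force R(d): all simple digraphs on 0..N-1 whose (out, in) degrees equal d."""
    n = len(d)
    slots = [(a, b) for a in range(n) for b in range(n) if a != b]  # no self-loops
    m = sum(out for out, _ in d)  # total number of arcs
    found = []
    for arcs in combinations(slots, m):  # a set of slots cannot contain multi-arcs
        out_deg, in_deg = [0] * n, [0] * n
        for a, b in arcs:
            out_deg[a] += 1
            in_deg[b] += 1
        if all((out_deg[i], in_deg[i]) == tuple(d[i]) for i in range(n)):
            found.append(frozenset(arcs))
    return found


def two_switches(g: Digraph, n: int) -> Iterator[Digraph]:
    """Every digraph reachable from g by one 2-switch (degrees are preserved)."""
    for v1, w1, v2, w2 in permutations(range(n), 4):
        if (v1, w1) in g and (v2, w2) in g and (v1, w2) not in g and (v2, w1) not in g:
            yield (g - {(v1, w1), (v2, w2)}) | {(v1, w2), (v2, w1)}


def reorientations(g: Digraph, n: int) -> Iterator[Digraph]:
    """Every digraph reachable from g by reversing one induced directed 3-cycle."""
    for a, b, c in permutations(range(n), 3):
        cycle, reverse = {(a, b), (b, c), (c, a)}, {(b, a), (c, b), (a, c)}
        if cycle <= g and not (reverse & g):
            yield (g - cycle) | reverse


def meta_graph_components(d: List[Tuple[int, int]], with_reorientations: bool) -> Tuple[int, int]:
    """Return (|R(d)|, number of connected components) using E_2, plus E_3 if requested."""
    n = len(d)
    nodes = realizations(d)
    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]  # depth-first search over the meta-graph
        while stack:
            g = stack.pop()
            neighbours = list(two_switches(g, n))
            if with_reorientations:
                neighbours += list(reorientations(g, n))
            for h in neighbours:
                if h not in seen:
                    seen.add(h)
                    stack.append(h)
    return len(nodes), components
```

Take $d = ((1,1),(1,1),(1,1))$. Its two realizations are the two orientations of a directed 3-cycle. With both move types, `meta_graph_components([(1, 1)] * 3, True)` reports one component, consistent with Theorem 3.1. With 2-switches alone, it reports two components.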
There are many choices for the weights. For simplicity, I will give as an example the probability weights induced by a particularly simple random walk algorithm (see [6, 2]). Given a realization $\vec{G}^{(n)} \in R(d)$, attempt a 2-switch with probability $p$ and a $\vec{C}_3$ reorientation with probability $1 - p$. For a 2-switch, choose four vertices without replacement and, if possible, perform a 2-switch to arrive at $\vec{G}^{(n+1)}$. Otherwise, do nothing, i.e. $\vec{G}^{(n+1)} = \vec{G}^{(n)}$. Similarly, for a $\vec{C}_3$ reorientation, choose three vertices without replacement and, if possible, perform a $\vec{C}_3$ reorientation to arrive at $\vec{G}^{(n+1)}$. Otherwise, do nothing. The resulting transition probabilities for this Markov chain are given by
$$P_{ij} = \begin{cases} p \big/ \binom{N}{4} & \text{if } (V_{\vec{G}_i}, V_{\vec{G}_j}) \in \mathcal{E}_2, \\[4pt] (1 - p) \big/ \binom{N}{3} & \text{if } (V_{\vec{G}_i}, V_{\vec{G}_j}) \in \mathcal{E}_3. \end{cases}$$
By doing nothing on failed move attempts, we impose self-loops at each realization such that $\sum_{j=1}^{|R(d)|} P_{ij} = 1$. By Theorem 3.1, this Markov chain is irreducible, and it is easily seen to be symmetric and aperiodic. Thus, there is a unique limiting distribution, which by symmetry must be the uniform distribution.

It is mentioned in [6, 5] that in most situations one need only use 2-switches, and thus we can choose $p$ close to 1. The difficulty is that for some degree sequences this leads to very long mixing times. These are the rare cases where no path with edges in $\mathcal{E}_2$ connects two realizations. How rare these cases are is also unknown, so there is no way to know how close to 1 one should choose $p$. What is the structure of these degree sequences, and can we identify them? It turns out we can: they are a subset of what are known as $\vec{C}_3$-anchored degree sequences, as defined below.
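The random walk just described can be sketched in Python as follows. This is a minimal sketch under three assumptions:

- The digraph is stored as a frozenset of `(tail, head)` arcs on vertices `0, ..., N-1`.
- Vertices are drawn as ordered tuples of distinct vertices with `random.Random.sample`.
- The helper names are hypothetical.

Because of the ordered-tuple sampling, the per-move probabilities are not literally the $P_{ij}$ above. Each move and its inverse are still proposed with equal probability, however, so the chain remains symmetric.

```python
import random
from typing import FrozenSet, Tuple

Arc = Tuple[int, int]
Digraph = FrozenSet[Arc]


def try_two_switch(g: Digraph, v1: int, w1: int, v2: int, w2: int) -> Digraph:
    """Replace (v1,w1),(v2,w2) by (v1,w2),(v2,w1) if legal; otherwise return g (a failed attempt)."""
    if (v1, w1) in g and (v2, w2) in g and (v1, w2) not in g and (v2, w1) not in g:
        return (g - {(v1, w1), (v2, w2)}) | {(v1, w2), (v2, w1)}
    return g


def try_reorientation(g: Digraph, a: int, b: int, c: int) -> Digraph:
    """Reverse the directed 3-cycle a->b->c->a if present and induced; otherwise return g."""
    cycle = {(a, b), (b, c), (c, a)}
    reverse = {(b, a), (c, b), (a, c)}
    if cycle <= g and not (reverse & g):
        return (g - cycle) | reverse
    return g


def walk_step(g: Digraph, n: int, p: float, rng: random.Random) -> Digraph:
    """With probability p attempt a 2-switch, otherwise attempt a 3-cycle reorientation."""
    if rng.random() < p:
        if n < 4:
            return g  # too few vertices for a 2-switch
        return try_two_switch(g, *rng.sample(range(n), 4))
    if n < 3:
        return g
    return try_reorientation(g, *rng.sample(range(n), 3))


def random_walk(g: Digraph, n: int, p: float, steps: int, seed: int = 0) -> Digraph:
    """Run the chain for a fixed number of steps from the realization g."""
    rng = random.Random(seed)
    for _ in range(steps):
        g = walk_step(g, n, p, rng)
    return g
```

Consider a directed 3-cycle on vertices 0, 1, 2 together with an isolated vertex 3. No 2-switch is ever possible there, so `random_walk(c3, 4, 1.0, steps)` always returns its starting orientation. This is exactly the kind of degree sequence for which choosing $p$ close to 1 fails.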
Definition 3.2 We call a degree sequence $d$ $\vec{C}_3$-anchored if it is forcibly $\vec{C}_3$-digraphic and there exists a nonempty set of coordinates $J$, called a $\vec{C}_3$-anchor set, with the following property. For every coordinate $i \in J$ and every $\vec{G} \in R(d)$, there is an induced subgraph $\vec{C}' \subseteq \vec{G}$ with $\vec{C}' \cong \vec{C}_3$ and $v_i \in V(\vec{C}')$. All realizations $\vec{G} \in R(d)$ are also called $\vec{C}_3$-anchored digraphs.

The structural characterization of $\vec{C}_3$-anchored digraphs was given in [4] by a digraph decomposition using M-partitions. An M-partition of a digraph $\vec{G}$ is a partition of the vertex set $V(\vec{G})$ into $k$ disjoint classes $\{X_1, \ldots, X_k\}$. The arc constraints within and between classes are given by a $k \times k$ matrix $M$ with elements in $\{0, 1, *\}$ (see [3]). $M_{ii}$ equals 0 or 1 when $X_i$ is an independent set or a clique, respectively, and is set to $*$ when $\vec{G}[X_i]$ is an arbitrary subgraph. Similarly, for $i \neq j$, $M_{ij}$ equal to 0, 1, or $*$ corresponds to $\vec{G}[X_i, X_j]$ having no arcs from $X_i$ to $X_j$, all arcs from $X_i$ to $X_j$, or no constraints on arcs from $X_i$ to $X_j$, respectively. For digraphs, $M$ need not be symmetric.

The subset of $\vec{C}_3$-anchored digraphs that is the focus of this paper is called $\vec{C}_3^*$-anchored. These digraphs are the realizations of $\vec{C}_3$-anchored degree sequences such that $|J| = 3K$, where $K$ is a positive integer, with $\vec{G}[\{v_{j_{3n+1}}, v_{j_{3n+2}}, v_{j_{3n+3}}\}] \cong \vec{C}_3$ for all realizations $\vec{G} \in R(d)$ and all $0 \le n \le K - 1$. Note that $K$ denotes the number of anchored 3-cycles, i.e. vertex triples that induce a directed 3-cycle in every realization. $\vec{C}_3^*$-anchored digraphs have a structural characterization given by the following theorem (see Fig. 1 for a pictorial representation):

Theorem 3.3 (LaMar [4]) The digraph $\vec{G} = (V, A)$ is a $\vec{C}_3^*$-anchored digraph if and only if there is a $C \subset V$ such that $\vec{G}[C] \cong \vec{C}_3$ and an M-partition of $\vec{G}[V - C]$ with vertex classes $\{\mathcal{C}_0, \mathcal{C}_-, \mathcal{C}_+, \mathcal{C}_\pm\}$. Each class defines how its elements relate to $C$ as follows, where $(x, C) = \{(x, c) : c \in C\}$, $(C, x) = \{(c, x) : c \in C\}$, and $A^C$ denotes the set of non-arcs:
$$\mathcal{C}_0 \equiv \{x \in V - C : (x, C) \cup (C, x) \subset A^C\}$$
$$\mathcal{C}_- \equiv \{x \in V - C : (x, C) \subset A \text{ and } (C, x) \subset A^C\}$$
$$\mathcal{C}_+ \equiv \{x \in V - C : (C, x) \subset A \text{ and } (x, C) \subset A^C\}$$
$$\mathcal{C}_\pm \equiv \{x \in V - C : (x, C) \cup (C, x) \subset A\}$$
The matrix $M$ is given by (rows and columns ordered $\mathcal{C}_\pm, \mathcal{C}_-, \mathcal{C}_+, \mathcal{C}_0$)
$$M = \begin{pmatrix} 1 & * & 1 & * \\ 1 & * & 1 & * \\ * & 0 & * & 0 \\ * & 0 & * & 0 \end{pmatrix}.$$

Figure 1: Left: The 4 vertex classes $\{\mathcal{C}_0, \mathcal{C}_-, \mathcal{C}_+, \mathcal{C}_\pm\}$ of $\vec{C}_3^*$-anchored digraphs, defined by how a vertex $x$ in each class connects to a directed 3-cycle $\vec{C}_3$ with vertex set $\{u, v, w\}$. Right: Diagram showing the relations within and between the 4 possible vertex classes. Solid and dash-dotted arrows denote forced and allowable arcs, respectively, while the absence of an arrow denotes no arcs. $\mathcal{C}_\pm$ is a clique and $\mathcal{C}_0$ an independent set, while $\vec{G}[\mathcal{C}_-]$ and $\vec{G}[\mathcal{C}_+]$ are arbitrary subgraphs.

Define the meta-graph $\Omega'_d = (\mathcal{V}, \mathcal{E}_2)$, i.e. the meta-graph that uses only 2-switches. The following is the main theorem of this paper.

Theorem 3.4 $\Omega'_d$ is disconnected if and only if $d$ is $\vec{C}_3^*$-anchored.

Proof. Let $d$ be $\vec{C}_3^*$-anchored and $\vec{G} \in R(d)$. Every realization has an induced directed 3-cycle through the same three vertices, and reorienting it in $\vec{G}$ yields another realization $\vec{G}' \in R(d)$. A 2-switch cannot reverse this 3-cycle in one step. Hence any sequence of 2-switches from $\vec{G}$ to $\vec{G}'$ would pass through a realization in which these three vertices do not induce a directed 3-cycle, contradicting the anchoring. Therefore $\Omega'_d$ is disconnected.

Now suppose $d$ is a degree sequence with $\Omega'_d$ disconnected.
By Theorem 3.1, $\Omega_d$ is connected, so some edge in $\mathcal{E}_3$ must join two different connected components of $\Omega'_d$. That is, there is a realization $\vec{G} \in R(d)$ with an induced directed 3-cycle $C$ such that $V_{\vec{G}}$ is in one connected component of $\Omega'_d$ and $V_{\vec{G}'}$ is in another. Here $\vec{G}'$ is obtained from $\vec{G}$ by reorienting $C$. Let $C = \{u, v, w\}$ with $\{(u, v), (v, w), (w, u)\} \subset A$.

We want to show that any vertex $x \in V - C$ must be in one of the vertex classes $\mathcal{C}_0$, $\mathcal{C}_-$, $\mathcal{C}_+$ or $\mathcal{C}_\pm$. We will show that either $(C, x) \subset A$ or $(C, x) \subset A^C$. By symmetry, we will then also have $(x, C) \subset A$ or $(x, C) \subset A^C$. Suppose exactly one of the three arcs from $C$ to $x$ is present. Without loss of generality, choose $(v, x) \in A$, with $\{(u, x), (w, x)\} \subset A^C$; the case with two existing arcs follows by considering the graph complement. The left panel in Fig. 2 shows that we can perform a series of 2-switches to reorient the 3-cycle. This contradicts $\vec{G}$ and $\vec{G}'$ being in two separate connected components of $\Omega'_d$. Thus, we must have $(C, x) \subset A$ or $(C, x) \subset A^C$, showing $x \in \mathcal{C}_0$, $\mathcal{C}_-$, $\mathcal{C}_+$ or $\mathcal{C}_\pm$.

Figure 2: Both left and right panels show a series of 2-switches used to reorient a directed 3-cycle $\vec{C}_3$ (see Theorem 3.4). Entries of the adjacency matrix not listed are not used in the series of moves.
Left panel (with $(v, x) \in A$ and $(u, x), (w, x) \notin A$):
$\{(v, x), (w, u)\} \mapsto \{(v, u), (w, x)\}$, then $\{(u, v), (w, x)\} \mapsto \{(u, x), (w, v)\}$, then $\{(u, x), (v, w)\} \mapsto \{(u, w), (v, x)\}$.
Right panel (with $(x, y) \in A$, no arcs from $x$ to $C$, and no arcs from $C$ to $y$):
$\{(x, y), (v, w)\} \mapsto \{(x, w), (v, y)\}$, then $\{(v, y), (w, u)\} \mapsto \{(v, u), (w, y)\}$, then $\{(u, v), (w, y)\} \mapsto \{(u, y), (w, v)\}$, then $\{(x, w), (u, y)\} \mapsto \{(x, y), (u, w)\}$.
Now we must show that the connections between the vertex classes are given by the matrix $M$ in Theorem 3.3. Let $x \in \mathcal{C}_0 \cup \mathcal{C}_+$ and $y \in \mathcal{C}_0 \cup \mathcal{C}_-$, and suppose that $(x, y) \in A$. In the right panel of Fig. 2, we see again that there is a series of 2-switches which reorients the 3-cycle, a contradiction. Hence $(x, y) \notin A$. By considering the graph complement, we can prove $(x, y) \in A$ for $x \in \mathcal{C}_\pm \cup \mathcal{C}_-$ and $y \in \mathcal{C}_\pm \cup \mathcal{C}_+$. This shows $d$ is $\vec{C}_3^*$-anchored, thereby completing the proof. ∎

For every anchored 3-cycle $C$, the two orientations of $C$ give rise to two isomorphic copies of the connected components of $\Omega'_d$. In general, $\Omega_d$ thus has the following form.

Corollary 3.5 $\Omega_d \simeq \Omega_d[V(G_2)] \times \underbrace{K_2 \times \cdots \times K_2}_{k \text{ times}}$, where $G_2$ is one connected component of $\Omega'_d$ and $k$ denotes the number of anchored 3-cycles.

The real power of this result is what it allows once we know where the anchored 3-cycles are. We can simply choose an orientation for each anchored 3-cycle uniformly at random, and then perform a random walk on the graph $\Omega'_d$. This procedure is efficient provided the anchored 3-cycles can be identified without too much work. It was shown in [4] that $\vec{C}_3^*$-anchored digraphs have not only a structural characterization, as given in Theorem 3.3, but also a degree-sequence characterization. In other words, we can identify the anchored 3-cycles using a simple procedure on the degree sequence itself. To this end, we start with some definitions.

Given an integer sequence $a$, define the corrected conjugate sequence $a''$ by $a''_k = |I_k| + |J_k|$, where
$$I_k = \{i \mid i < k \text{ and } a_i \ge k - 1\}, \qquad J_k = \{i \mid i > k \text{ and } a_i \ge k\}.$$

Definition 3.6 A degree sequence $d = \{(d_i^+, d_i^-)\}_{i=1}^N$ is non-increasing relative to the positive lexicographical ordering if and only if $d_i^+ \ge d_{i+1}^+$, with $d_i^- \ge d_{i+1}^-$ when $d_i^+ = d_{i+1}^+$.
In this case, we will call $d$ positively ordered and denote the ordering by $d_i \ge d_{i+1}$. Likewise, $d$ is non-increasing relative to the negative lexicographical ordering when preference is given to the second coordinate. In this case we call $d$ negatively ordered and denote the ordering by $d_i \succeq d_{i+1}$.

For a given degree sequence $d = \{(d_i^+, d_i^-)\}_{i=1}^N$, define the sequences $\bar{d} = \{(\bar{d}_i^+, \bar{d}_i^-)\}_{i=1}^N$ and $\underline{d} = \{(\underline{d}_i^+, \underline{d}_i^-)\}_{i=1}^N$ to be the positive and negative orderings of $d$, respectively. For a degree sequence $d$, define the slack sequences $\bar{s}$ and $\underline{s}$ by
$$\bar{s}_l = \sum_{i=1}^{l} [\bar{d}^-]''_i - \sum_{i=1}^{l} \bar{d}_i^+, \qquad \bar{s}_0 \equiv 0,$$
$$\underline{s}_k = \sum_{i=1}^{k} [\underline{d}^+]''_i - \sum_{i=1}^{k} \underline{d}_i^-, \qquad \underline{s}_0 \equiv 0.$$

Theorem 3.7 (LaMar [4]) The degree sequence $d = \{(d_i^+, d_i^-)\}_{i=1}^N$ is $\vec{C}_3^*$-anchored if and only if there are coordinates $\{j_1, j_2, j_3\}$ and an integer pair $(k, l) \ge (1, 1)$ such that
$$d_{j_1} = d_{j_2} = d_{j_3} = (k, l) \tag{1}$$
with
$$(d_{j_1}, d_{j_2}, d_{j_3}) = (\bar{d}_l, \bar{d}_{l+1}, \bar{d}_{l+2}) = (\underline{d}_k, \underline{d}_{k+1}, \underline{d}_{k+2}) \tag{2}$$
and the slack sequences satisfying
$$(0, 1, 1, 0) = (\bar{s}_{l-1}, \bar{s}_l, \bar{s}_{l+1}, \bar{s}_{l+2}) = (\underline{s}_{k-1}, \underline{s}_k, \underline{s}_{k+1}, \underline{s}_{k+2}). \tag{3}$$
In this case, $\{v_{j_1}, v_{j_2}, v_{j_3}\}$ induces an anchored 3-cycle.

Algorithm 3.8 To obtain a uniformly sampled simple realization of a degree sequence $d$:
1. Check whether $d$ is $\vec{C}_3^*$-anchored and identify the anchored 3-cycles using Theorem 3.7.
2. Randomly assign an orientation to each anchored 3-cycle with equal probability. This effectively chooses a connected component of $\Omega'_d$.
3. Perform a random walk on this component of $\Omega'_d$ using only 2-switches.

4 Conclusion
We have shown that the degree sequences that require both types of moves, i.e. 2-switches and directed 3-cycle reorientations, are the $\vec{C}_3^*$-anchored degree sequences, which have been characterized in [4]. This characterization allows for a fast algorithm that identifies the $\vec{C}_3$-anchor sets, leading to a Monte Carlo algorithm involving only 2-switches. This is the second instance where $\vec{C}_3$-anchored degree sequences have been found to be the special cases in algorithms involving directed graphs (for the first case, see [4]). It would be interesting to see where else this structure is important.

Acknowledgments
The author thanks Gregory Smith and Sarah Day for helpful discussions. This work was funded by the postdoctoral and undergraduate biological sciences education program grant awarded to the College of William and Mary by the Howard Hughes Medical Institute.

References
[1] Annabell Berger and Matthias Müller-Hannemann. Uniform sampling of undirected and directed graphs with a fixed degree sequence. arXiv:0912.0685v1, Dec 2009.
[2] George W. Cobb and Yung-Pin Chen. An application of Markov chain Monte Carlo to community ecology. The American Mathematical Monthly, 110(4):265–288, 2003.
[3] T. Feder, P. Hell, S. Klein, and R. Motwani. List partitions. SIAM J. Discrete Math., 16(3):449–478, Jan 2003.
[4] M. Drew LaMar. Algorithms for realizing degree sequences of directed graphs. arXiv:0906.0343v1, June 2009.
[5] A. Rao, Rabindranath Jana, and Suraj Bandyopadhyay. A Markov chain Monte Carlo method for generating random (0,1)-matrices with given marginals. Sankhyā Ser. A, 58(2):225–242, 1996.
[6] John Roberts. Simple methods for simulating sociomatrices with given marginal totals. Social Networks, 22(3):273–283, 2000.
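As a supplement to Theorem 3.7 and the first step of Algorithm 3.8, here is a minimal Python sketch of the degree-sequence test. It rests on these assumptions:

- Degree pairs are given as `(out, in)` tuples.
- Ties within each ordering are broken by Python's stable sort.
- The function names are hypothetical.

The sketch checks conditions (1) to (3) literally for every candidate pair $(k, l)$ and returns, for each pair that passes, the 0-based coordinates carrying it. The corrected conjugate is computed naively in $O(N^2)$ time, so this sketch does not attempt the linear running time mentioned earlier.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (out-degree d^+, in-degree d^-)


def corrected_conjugate(a: List[int]) -> List[int]:
    """a''_k = |{i < k : a_i >= k-1}| + |{i > k : a_i >= k}| for 1-based k (naive O(N^2))."""
    n = len(a)
    conj = []
    for k in range(1, n + 1):
        left = sum(1 for i in range(1, k) if a[i - 1] >= k - 1)
        right = sum(1 for i in range(k + 1, n + 1) if a[i - 1] >= k)
        conj.append(left + right)
    return conj


def slack(conj_of: List[int], minus: List[int]) -> List[int]:
    """Return [s_0, ..., s_N] with s_0 = 0 and s_j = sum_{i<=j} (conj_of''_i - minus_i)."""
    s = [0]
    for c, x in zip(corrected_conjugate(conj_of), minus):
        s.append(s[-1] + c - x)
    return s


def anchored_triples(d: List[Pair]) -> Dict[Pair, List[int]]:
    """Map each pair (k, l) satisfying conditions (1)-(3) to the coordinates carrying it."""
    n = len(d)
    pos = sorted(d, key=lambda q: (q[0], q[1]), reverse=True)  # positive ordering (d-bar)
    neg = sorted(d, key=lambda q: (q[1], q[0]), reverse=True)  # negative ordering (d-underline)
    s_bar = slack([q[1] for q in pos], [q[0] for q in pos])
    s_under = slack([q[0] for q in neg], [q[1] for q in neg])
    target = [0, 1, 1, 0]
    result = {}
    for k, l in set(d):
        if k < 1 or l < 1 or l + 2 > n or k + 2 > n:
            continue
        # Condition (2): (k, l) fills positions l..l+2 of the positive ordering
        # and positions k..k+2 of the negative ordering (1-based).
        if any(pos[i - 1] != (k, l) for i in range(l, l + 3)):
            continue
        if any(neg[i - 1] != (k, l) for i in range(k, k + 3)):
            continue
        # Condition (3): both slack windows equal (0, 1, 1, 0).
        if s_bar[l - 1:l + 3] != target or s_under[k - 1:k + 3] != target:
            continue
        result[(k, l)] = [j for j, q in enumerate(d) if q == (k, l)]
    return result
```

Two examples show the test at work. A directed 3-cycle plus a vertex joined in both directions to all three of its vertices has degree sequence $((3,3),(2,2),(2,2),(2,2))$. For it, the sketch reports the pair $(2,2)$ at coordinates 1, 2, 3. The sequence $((1,1),(1,1),(1,1),(1,1))$, which also has directed 4-cycles among its realizations, yields no anchored pair.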