Analysis of centrality in sublinear preferential attachment trees via the CMJ branching process

Varun Jog, Po-Ling Loh
vjog@ece.wisc.edu, loh@ece.wisc.edu
Department of ECE, Grainger Institute for Engineering
University of Wisconsin - Madison
Madison, WI 53715
October 2016

Abstract

We investigate centrality and root-inference properties in a class of growing random graphs known as sublinear preferential attachment trees. We show that a continuous time branching process called the Crump-Mode-Jagers (CMJ) branching process is well-suited to analyze such random trees, and prove that almost surely, a unique terminal tree centroid emerges, having the property that it becomes more central than any other fixed vertex in the limit of the random growth process. Our result generalizes and extends previous work establishing persistent centrality in uniform and linear preferential attachment trees. We also show that centrality may be utilized to generate a finite-sized $1 - \epsilon$ confidence set for the root node, for any $\epsilon > 0$, in a certain subclass of sublinear preferential attachment trees.

1 Introduction

Recent years have seen an explosion of datasets possessing some form of underlying network structure [14, 5, 19, 31]. Various mathematical models have consequently been derived to imitate the behavior of real-world networks; desirable characteristics include degree distributions, connectivity, and clustering, to name a few. One popular probabilistic model is the Barabási-Albert model, also known as the (linear) preferential attachment model [4]. Nodes are added to the network one at a time, and each new node connects to a fixed number of existing nodes with probability proportional to the degrees of the nodes.
In addition to modeling a “rich get richer” phenomenon, the Barabási-Albert model gives rise to a scale-free graph, in which the degree distribution in the graph decays as an inverse polynomial power of the degree, and the maximum degree scales as the square root of the size of the network. Such a property is readily observed in many network data sets [1]. However, networks also exist in which the disparity between high- and low-degree nodes is not as severe. In the sublinear preferential attachment model, nodes are added sequentially with probability of attachment proportional to a fractional power of the degree. This leads to a stretched exponential degree distribution and a maximum degree that scales as a power of the logarithm of the number of nodes [26, 3]. Networks exhibiting such behavior include certain citation networks, Wikipedia edit networks, rating networks, and the Digg network [27]. The case when the probability of attachment is uniform over existing vertices is known as uniform attachment and is used to model networks in which the preference given to older nodes is attributed only to birth order and not degree.

The iterative nature of the preferential attachment model generates interesting questions concerning phenomena that arise (and potentially vanish) as the network expands. Dereich and Mörters [11] established the emergence of a persistent hub (a vertex that remains the highest-degree node in the network after a finite amount of time) in a certain preferential attachment model where edges are added independently. Such a result was also shown to hold for the Barabási-Albert preferential attachment model in Galashin [17].
Motivated by the fact that persistent hubs do not exist in uniform attachment models, however, our previous work [22] studied the problem of persistent centroids and established that the $K$ most central nodes according to a notion of “balancedness centrality” always persist in preferential and uniform attachment trees.

Another related problem concerns identifying the oldest node(s) in a network. Shah and Zaman [33] first studied this problem in the context of a random growing tree formed by a diffusion spreading over a regular tree, and showed that the centroid of the diffusion tree agrees with the root node of the diffusion, with strictly positive probability. Bubeck et al. [8] devised confidence set estimators for the first node in preferential and uniform attachment trees, in which the goal is to identify a set of nodes containing the oldest node, with probability at least $1 - \epsilon$. They showed that when nodes are selected according to an appropriate measure of “balancedness centrality,” the required size of the confidence set is a function of $\epsilon$ that does not grow with the overall size of the network. These results were later extended to diffusions spreading over regular trees by Khim and Loh [25]. Graph centrality ideas, in particular balancedness centrality, have also been leveraged in Tan et al. [37] to identify the most influential vertices in a social network. Luo et al. [28] studied the problem of identifying single or multiple sources of rumors in a graph and proposed certain efficiently computable estimators related to the MAP estimator employed in Shah and Zaman [33]. Recently, rumor identification has also been analyzed in certain probabilistic models, such as repeated observations of rumor spreading in Dong et al. [13], and incomplete information about rumor spreading in Karamchandani et al. [24].
In addition to having obvious practical implications for pinpointing the origin of a network based on observing a large graph, identifying and removing the oldest nodes may have desirable deleterious effects from the point of view of network robustness [15].

Previous analysis of determining a finite confidence set [8, 25], as well as establishing the persistence of a unique tree centroid [22], crucially depended on the following property satisfied by linear preferential attachment, uniform attachment, and diffusions over regular trees: the “attraction function” relating the degree of a vertex to its probability of connection at each time step is linear. Bubeck et al. [8] posed an open question concerning the existence of finite-sized confidence sets in the case of sublinear or superlinear preferential attachment; we likewise conjectured in previous work that a unique centroid should persist for a more general class of nonlinear attraction functions [22]. However, the techniques in these papers do not extend readily to nonlinear settings.

An approach to dealing with more complicated tree models in the context of diffusions was presented in Shah and Zaman [34], using a continuous time branching process known as the Bellman-Harris branching process. In this paper, we show that preferential attachment trees with nonlinear attraction functions may also be analyzed via continuous time branching processes. Our results rely on properties of the Crump-Mode-Jagers (CMJ) branching process [9, 10, 20]. Continuous time branching processes were previously leveraged by Bhamidi [6] and Rudas et al. [32] to establish properties regarding the degree distribution, maximum degree, height, and local structure of a large class of preferential attachment trees.
Our main contributions are twofold: First, we establish the property of terminal centrality for sublinear preferential attachment trees, thereby addressing our conjecture in [22]. We prove the existence of a unique vertex that becomes more central than any other vertex, in the limit of the growth process. In fact, the existence of a persistent centroid implies terminal centrality, but the reverse implication might not hold, since persistent centrality requires a tree centroid to emerge and remain the centroid starting from a single finite time point. Second, we affirmatively answer the open question of Bubeck et al. [8] by devising finite-sized confidence sets for the root node in sublinear preferential attachment trees. Due to the inapplicability of Pólya urn theory in the present setting, the proof techniques employed in our paper differ significantly from the analysis used in previous work. Furthermore, the literature concerning CMJ branching processes is vast and unconsolidated, and another important technical contribution of our paper is to gather relevant results and show that they may be applied to study sublinear preferential attachment trees.

The remainder of the paper is organized as follows: In Section 2, we review CMJ branching processes and show how to embed a preferential attachment tree in a CMJ process. We also verify that the CMJ processes corresponding to certain sublinear preferential attachment trees enjoy useful convergence properties. In Section 3, we establish the existence of a unique terminal centroid in sublinear preferential attachment trees. In Section 4, we prove that the confidence set construction via the same centrality measure leads to finite-sized confidence sets for the root node.
Although we believe sublinear preferential attachment trees should also possess a persistent centroid, some challenges arise in bridging the gap between terminal centrality and persistent centrality. We discuss these challenges and related open problems in Section 5. Additional proof details are contained in the supplementary appendices.

Notation: We write $V(T)$ to denote the set of vertices of a tree $T$, and write $\text{Max-Deg}(T)$ to denote the maximum degree of the vertices in $T$. For $u \in V(T)$, we write $(T, u)$ to denote the corresponding rooted tree, which is a tree with directed edges emanating from $u$. We write $(T, u)_{v\downarrow}$ to denote the subtree directed away from $u$ and starting from $v$. Finally, we write $\text{Out-Deg}(v)$ to denote the number of children of vertex $v$ in the rooted tree.

2 Preliminaries

In this section, we review properties of the CMJ branching process, laying the groundwork for our analysis of sublinear preferential attachment trees. The CMJ branching process is a general age-dependent continuous time branching process model introduced by Crump, Mode, and Jagers [9, 10, 20]. It begins with a single individual, known as the ancestor, at time $t = 0$. An individual $x$ may give birth multiple times throughout its lifetime, and the times at which it produces offspring are given by a point process $\xi$ on $\mathbb{R}_+$. The defining property of branching processes is that individuals behave in an i.i.d. manner; i.e., every individual starts its own independent point process of births from the moment it is born until the time it dies. The resulting branching process is said to be driven by $\xi$.
Many common branching processes are special cases of a CMJ process with an appropriate point process and lifetime random variable: If individuals have random lifetimes and give birth to a random number of children at the moment of their death, the resulting branching process is called the Bellman-Harris process. If the lifetimes of individuals are also constant (usually taken to be 1), the resulting process is known as the Galton-Watson process [2, 18].

Definition 1 (Random preferential attachment tree with attraction function $f$). A sequence of random trees $\{T_n\}$ is generated as follows: At time $n = 1$, the tree $T_1$ consists of a single vertex $v_1$. At time $n + 1$, a new vertex $v_{n+1}$ is added to $T_n$ via a directed edge from a vertex $v_i$ to $v_{n+1}$, where $v_i$ is chosen with probability proportional to $f(\text{Out-Deg}(v_i))$ and $\text{Out-Deg}(v_i)$ is computed with respect to the tree $T_n$.

Thus, the linear preferential attachment tree corresponds to the attraction function $f(i) = i + 1$ (see footnote 1 below), and the uniform attachment tree corresponds to the constant function $f \equiv 1$. We now define sublinear preferential attachment trees, which have an attraction function that lies strictly between those of a linear preferential attachment tree and a uniform attachment tree.

Footnote 1: Note that for all nodes except the root node, $\text{Deg}(v_i) = \text{Out-Deg}(v_i) + 1$. Thus, this model differs slightly from the one considered in our previous work [22] and in Bubeck et al. [8], since the attractiveness of $v_1$ is proportional to $\text{Deg}(v_1) + 1$ rather than $\text{Deg}(v_1)$.

Definition 2 (Sublinear preferential attachment trees). Sublinear preferential attachment trees are preferential attachment trees with an attraction function $f$ satisfying the following conditions:

1. $f$ is a nondecreasing function.
2. $f(i) \ge 1$ for all $i \ge 0$, and $f$ is not identically equal to 1.
3. There exists $0 < \alpha < 1$ such that $f(i) \le (i + 1)^\alpha$, for all $i \ge 0$.

Note that the last condition, together with condition 2, implies $f(0) = 1$.
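The growth rule of Definition 1 can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions, not code from the paper: the names `grow_tree` and `sublinear` are hypothetical, vertices are indexed from 0 in birth order (so index 0 plays the role of $v_1$), and the attraction function $f(i) = (i+1)^\alpha$ is used as one admissible choice under Definition 2.

```python
import random


def sublinear(alpha):
    """Attraction function f(i) = (i + 1)**alpha (hypothetical helper)."""
    return lambda out_degree: (out_degree + 1) ** alpha


def grow_tree(n, f, seed=None):
    """Grow T_n following the rule of Definition 1.

    Vertices are indexed 0, ..., n-1 in birth order, so index 0 is v_1.
    Returns (parent, out_deg), where parent[0] = -1 marks the root.
    """
    rng = random.Random(seed)
    parent = [-1]
    out_deg = [0]
    for _ in range(n - 1):
        # Attachment probability is proportional to f(Out-Deg(v_i)),
        # with out-degrees computed in the current tree.
        weights = [f(d) for d in out_deg]
        i = rng.choices(range(len(parent)), weights=weights, k=1)[0]
        parent.append(i)
        out_deg[i] += 1
        out_deg.append(0)
    return parent, out_deg


if __name__ == "__main__":
    parent, out_deg = grow_tree(1000, sublinear(0.5), seed=0)
    print("maximum out-degree:", max(out_deg))
```

Recomputing the weights at every step keeps the sketch close to the definition at the cost of quadratic running time; passing `lambda d: 1` or `lambda d: d + 1` as `f` gives the uniform and linear cases.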
When $f(i) = (i + 1)^\alpha$, we call the corresponding tree the $\alpha$-sublinear preferential attachment tree. To define the branching process corresponding to a preferential attachment tree, we define the point process $\xi$ associated with the attraction function $f$:

Definition 3 (Point process associated to $f$). Given an attraction function $f$, the associated point process $\xi$ on $\mathbb{R}_+$ is a pure-birth Markov process with $f$ as its rate function:
$$P\big(\xi(t + dt) - \xi(t) = 1 \mid \xi(t) = i\big) = f(i)\,dt + o(dt),$$
with the initial condition $\xi(0) = 0$.

Note that we do not need to normalize the rate of this Markov process: Consider a CMJ process driven by the point process $\xi$ as above, in which individuals never die. Suppose that at some time $t_0$, the branching process consists of $n$ individuals $\{v_1, \dots, v_n\}$, where the number of children of node $v_i$ is denoted by $d_i$. In the discrete time tree evolution, the next vertex $v_{n+1}$ attaches to vertex $v_i$ with probability $\frac{f(d_i)}{\sum_{j=1}^n f(d_j)}$. In the continuous time process, the new vertex “attaches to $v_i$” if and only if node $v_i$ has a child before any of the other nodes. This child is then $v_{n+1}$. Using properties of the exponential distribution, we may check that this happens with probability $\frac{f(d_i)}{\sum_{j=1}^n f(d_j)}$, which is exactly the same as that in the discrete time tree evolution. Thus, if we look at the CMJ branching process at the stopping times when successive vertices are born, the resulting trees evolve in the same way as in the discrete time model described in Definition 1.

Definition 4 (Malthusian parameter). For a point process $\xi$ on $\mathbb{R}_+$, let $\mu(t) = \mathbb{E}[\xi(0, t]]$ denote the mean intensity measure. The point process $\xi$ is a Malthusian process if there exists a parameter $\theta > 0$ such that
$$\theta \int_0^\infty e^{-\theta t} \mu(t)\,dt = 1.$$
The constant $\theta$ is called the Malthusian parameter of the point process $\xi$.

Example 1.
For the linear preferential attachment tree with $f(i) = i + 1$, the associated point process $\xi$ is the standard Yule process, defined as follows: (a) $\xi(0) = 0$, and (b) $P\big(\xi(t + dt) - \xi(t) = 1 \mid \xi(t) = i\big) = (i + 1)\,dt + o(dt)$. The mean intensity measure for the Yule process is $\mu(t) = e^t - 1$, and the Malthusian parameter is equal to 2.

Example 2. For the uniform attachment tree with $f \equiv 1$, the associated point process $\xi$ is the Poisson point process with rate 1. The mean intensity measure is $\mu(t) = t$, and the Malthusian parameter is equal to 1.

The Malthusian parameter of a point process plays a critical role in the theory of branching processes. It accurately characterizes the growth rate of the population generated by the CMJ branching process driven by the point process, as follows: If the population at time $t$ is given by $Z_t$, the random variable $e^{-\theta t} Z_t$ converges to a nondegenerate random variable $W$. Various assumptions on the point process lead to different types of convergence results, such as convergence in distribution, in probability, almost surely, in $L^1$, or in $L^2$ [9, 10, 12, 30]. As derived in Lemma 9 in Appendix A, the Malthusian parameter for a sublinear preferential attachment process always exists and lies between the values corresponding to linear preferential attachment and uniform attachment trees described in Examples 1 and 2. Our results will rely heavily on the following theorem:

Theorem 1. Let $\xi$ be the point process corresponding to a sublinear attraction function $f$. The CMJ branching process $Z_t$ driven by $\xi$ describing the growing random tree satisfies
$$e^{-\theta t} Z_t \xrightarrow{\;L^2,\ \text{a.s.}\;} W,$$
where $W$ is an absolutely or singular continuous random variable supported on all of $\mathbb{R}_+$, satisfying $W > 0$, almost surely.
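As a numerical sanity check on Definition 4 and Examples 1 and 2, the Malthusian parameter can be approximated for a given attraction function. The sketch below relies on an identity that is not stated in the text and should be read as an assumption of the example: for the pure-birth process of Definition 3, the $k$-th birth time is a sum of independent exponential waiting times with rates $f(0), \dots, f(k-1)$, so that $\theta \int_0^\infty e^{-\theta t}\mu(t)\,dt = \sum_{k \ge 1} \prod_{i=0}^{k-1} \frac{f(i)}{\theta + f(i)}$. The function names, the truncation level, and the bisection bracket are all hypothetical choices.

```python
def laplace_sum(theta, f, terms=20000):
    """Truncated series sum_{k>=1} prod_{i<k} f(i) / (theta + f(i))."""
    total, prod = 0.0, 1.0
    for i in range(terms):
        prod *= f(i) / (theta + f(i))  # product over rates f(0), ..., f(i)
        total += prod
        if prod < 1e-16:  # remaining terms are numerically negligible
            break
    return total


def malthusian(f, lo=1e-6, hi=10.0, terms=20000, iters=60):
    """Bisection for the theta at which the truncated series equals 1.

    The series is decreasing in theta, so it exceeds 1 to the left of the
    root. The bracket [lo, hi] is an assumption suited to the examples here.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if laplace_sum(mid, f, terms) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print("uniform   :", malthusian(lambda i: 1.0))
    print("linear    :", malthusian(lambda i: i + 1.0))
    print("sublinear :", malthusian(lambda i: (i + 1.0) ** 0.5))
```

For the linear case the series has a heavy tail, so the truncated estimate falls slightly below 2; the uniform case is a geometric series and is recovered essentially exactly. The value for $f(i) = (i+1)^{1/2}$ should fall strictly between the two, in line with the statement attributed to Lemma 9.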
The proof of Theorem 1, which is contained in Appendix A, is established by showing that the technical conditions required for certain theorems about CMJ processes [30, 21, 7] are satisfied by the point process $\xi$.

3 Terminal centrality

We now turn to our main result, which establishes the existence of a unique terminal centroid in sublinear preferential attachment trees. We begin by introducing some notation and basic terminology. Consider the function $\psi_T : V(T) \to \mathbb{N}$ defined by
$$\psi_T(u) = \max_{v \in V(T) \setminus \{u\}} \big|(T, u)_{v\downarrow}\big|.$$
Recall that $(T, u)_{v\downarrow}$ denotes the subtree of $T$ directed away from $u$, starting at $v$, as depicted in Figure 1. Thus, $\psi_T(u)$ is the size of the largest subtree of the rooted tree $(T, u)$, and measures the level of “balancedness” of the tree with respect to vertex $u$. We make the following definition:

Definition 5. Given a tree $T$, a vertex $u \in V(T)$ is called a centroid if $\psi_T(u) \le \psi_T(v)$, for all $v \in V(T)$.

Figure 1: A tree $T$ rooted at vertex $u$. The subtree $(T, u)_{v\downarrow}$ is highlighted.

Note that although we have defined the centroid with respect to the criterion $\psi_T$, numerous equivalent characterizations of tree centroids exist [23, 19, 38, 35, 36, 29]. (The characterization appearing in Definition 5 coincides with the notion of “rumor center” defined by Shah and Zaman [34].) Furthermore, a tree may have more than one centroid (although by Lemma 11 in Appendix B, a tree may have at most two centroids, which must then be neighbors). For any two nodes $u$ and $v$, if $\psi_T(u) \le \psi_T(v)$, we say that $u$ is at least as central as $v$. Finally, we define the notion of terminal centrality:

Definition 6.
A vertex $v^* \in \cup_{n=1}^\infty V(T_n)$ is a terminal centroid for the sequence of growing trees $\{T_n\}_{n \ge 1}$ if for every vertex $u \ne v^*$, there exists a time $M$ (possibly dependent on $u$), such that for all times $n \ge M$, we have
$$\psi_{T_n}(v^*) < \psi_{T_n}(u).$$
Thus, the terminal centroid eventually becomes more central than any other fixed vertex. (Note, however, that terminal centrality does not immediately imply the property of persistent centrality; for instance, $v^*$ might be a terminal centroid without ever being the centroid at any finite time.) We have the following theorem:

Theorem 2. Sublinear preferential attachment trees have a unique terminal centroid with probability 1.

The statement and proof of Theorem 2 may be compared to the results obtained in our previous work [22], which establish persistent centrality for the special cases $\alpha = 0$ and $\alpha = 1$. For a subtree $T$, define the attractiveness of $T$ as the sum of the attraction functions evaluated at each vertex of $T$. In the case of uniform attachment, the attractiveness of $T$ is simply $|T|$, whereas for linear preferential attachment, it is the sum of the degrees of the vertices, which is $2|T| - 1$. The linearity of attractiveness in $|T|$ was critical to obtaining sharp bounds on the diagonal crossing probability of certain random walks. When $\alpha \in (0, 1)$, however, the attractiveness of $T$ is no longer a function of $|T|$ alone, rendering the methods of our previous work inapplicable. In the present paper, we leverage a continuous time embedding and convergence results for CMJ processes to prove terminal centrality for a large class of sublinear preferential attachment trees, with the tradeoff being a slightly weaker theoretical guarantee.

Proof of Theorem 2 (sketch). The key steps of the proof are as follows:

(i) Identify a necessary condition that a vertex must satisfy in order to be a terminal centroid.
(ii) Show that the set of vertices satisfying the condition in (i), called the set of candidate terminal centroids and denoted by $\mathcal{C}_{\mathrm{CAN}}$, is nonempty and finite with probability 1.
(iii) Show that among the set of candidate terminal centroids, a unique vertex emerges that eventually becomes more central than any other candidate.
(iv) Show that the vertex in (iii) is the unique terminal centroid.

We first describe the necessary condition in step (i). (For an illustration, see Figure 2.) Let $v^*(n)$ be a centroid of the tree $T_n$. If $T_n$ has two centroids, we choose $v^*(n)$ to be the younger vertex from among the two. If vertex $v_{n+1}$ is a terminal centroid, it must necessarily become more central than $v^*(n)$ after a finite amount of time. Consequently, let
$$\mathcal{C}_{\mathrm{CAN}} := \big\{ v_{n+1} : \exists M \text{ s.t. } \psi_{T_m}(v_{n+1}) < \psi_{T_m}(v^*(n)) \ \ \forall m \ge M \big\},$$
and define $E_n$ to be the event $\{v_{n+1} \in \mathcal{C}_{\mathrm{CAN}}\}$. We follow the convention of considering $v_1$ to be a candidate terminal centroid; in particular, $\mathcal{C}_{\mathrm{CAN}} \ne \emptyset$. In fact, for $n > 1$, either $v^*(n)$ or $v_{n+1}$ eventually becomes more central than the other, which follows from the following lemma:

Lemma 1. For any two vertices $u$ and $v$, there exists a time $M$ such that either $\psi_{T_m}(u) < \psi_{T_m}(v)$ or $\psi_{T_m}(u) > \psi_{T_m}(v)$ holds for all $m > M$, almost surely.

Proof. Without loss of generality, assume $u$ is born before $v$. Let $T_v$ and $T_u$ denote the trees $(T_m, u)_{v\downarrow}$ and $(T_m, v)_{u\downarrow}$, where $m$ is the time of birth of $v$. Note that $T_v$ consists of the single vertex $v$. We now restart the process in continuous time; i.e., we start independent CMJ processes initiated from the starting states $T_u$ and $T_v$. Using Theorem 1, we have the a.s. convergence result
$$\frac{\big|(T_m, v)_{u\downarrow}\big|}{\big|(T_m, u)_{v\downarrow}\big|} \xrightarrow{\;\text{a.s.}\;} \frac{W_u}{W_v}, \tag{1}$$
for absolutely or singular continuous independent random variables $W_u$ and $W_v$, whose distributions are determined by the structure of the starting states $T_u$ and $T_v$, respectively. Since $W_u - W_v$ cannot have point masses, we have
$$P(W_u = W_v) = P(W_u - W_v = 0) = 0.$$
Thus, either $W_u > W_v$ or $W_u < W_v$, almost surely. The almost sure convergence in equation (1) implies that there exists $M > 0$ such that either $|(T_m, v)_{u\downarrow}| > |(T_m, u)_{v\downarrow}|$ or $|(T_m, v)_{u\downarrow}| < |(T_m, u)_{v\downarrow}|$, for all $m > M$. Applying Lemma 13 in Appendix B concludes the proof.

Figure 2: Illustration of the notation from Lemma 2. The centroid of tree $T_n$ is $v^*(n)$, and $v_{n+1}$ is the newest vertex joining $T_n$ to form $T_{n+1}$.

The following lemma furnishes the result in step (ii):

Lemma 2. $|\mathcal{C}_{\mathrm{CAN}}| < \infty$, with probability 1.

Proof. We first show that any node joining the tree sufficiently late has a very small chance of belonging to $\mathcal{C}_{\mathrm{CAN}}$. By Lemma 13 in Appendix B, the event $E_n$ occurs if and only if there exists $M > 0$ such that for all $m \ge M$,
$$\big|(T_m, v^*(n))_{v_{n+1}\downarrow}\big| > \big|(T_m, v_{n+1})_{v^*(n)\downarrow}\big|. \tag{2}$$
To simplify notation, we define $A_m := (T_m, v^*(n))_{v_{n+1}\downarrow}$ and $B_m := (T_m, v_{n+1})_{v^*(n)\downarrow}$, for $m \ge n + 1$. Lemma 12 in Appendix B implies that at time $m = n + 1$, the number of vertices in $B_m$ is at least $n/2$. Thus, $B_m$ has a large lead over $A_m$, which has only one vertex. At time $n + 1$, we pause the process in discrete time and restart it in continuous time, with state at $t = 0$ being the state at the (discrete) time $n + 1$. Observe that if a time $M$ exists such that inequality (2) holds, a time $\Gamma > 0$ must also exist such that the continuous time trees satisfy $|A_t| > |B_t|$, for all $t > \Gamma$. Note that the population $|A_t|$ is simply a sublinear preferential attachment process started from a single vertex, which we denote by $Y(t)$.
The population $|B_t|$ stochastically dominates the sum of $n/2$ independent sublinear preferential attachment processes starting from a single vertex, which we subsequently denote by $X_1(t), \dots, X_{n/2}(t)$. Thus, the probability that $E_n$ occurs is upper-bounded by the probability that $Y(t)$ eventually becomes larger than $\sum_{i=1}^{n/2} X_i(t)$. By Theorem 1, the rescaled processes $e^{-\theta t} Y(t)$ and $e^{-\theta t} X_i(t)$, for $1 \le i \le n/2$, all converge a.s. to i.i.d. random variables, which we denote by $W_Y$ and $\{W_i\}_{1 \le i \le n/2}$, respectively. Thus, the probability that $Y(t)$ eventually becomes larger than $\sum_{i=1}^{n/2} X_i(t)$ is equal to the probability that $W_Y$ is greater than $\sum_{i=1}^{n/2} W_i$. Using Lemma 14 in Appendix D, we conclude that this probability is upper-bounded by $C/n^2$, for some constant $C$. Finally, since $\sum_n 1/n^2$ is a convergent series, the Borel-Cantelli lemma implies that with probability 1, only finitely many events $E_n$ occur, completing the proof.

For step (iii), we simply note that Lemma 1 implies a fixed ordering via centrality for any two vertices. Thus, if we have a finite set such as $\mathcal{C}_{\mathrm{CAN}}$, a repeated application of Lemma 1 to members of this set yields a fixed ordering from the most central to the least central vertices in $\mathcal{C}_{\mathrm{CAN}}$. Let $v^*$ be the most central vertex from the set $\mathcal{C}_{\mathrm{CAN}}$ that emerges from this ordering. Step (iv) is provided by the following lemma:

Lemma 3. The vertex $v^*$ is the unique terminal centroid.

Proof. Let $u_0 \ne v^*$ be any vertex. If $u_0 \in \mathcal{C}_{\mathrm{CAN}}$, the choice of $v^*$ implies that $v^*$ eventually becomes more central than $u_0$. Thus, we assume $u_0 \notin \mathcal{C}_{\mathrm{CAN}}$, meaning the centroid at the time vertex $u_0$ was born, which we denote by $u_1$, eventually becomes more central than $u_0$ in the limit. If $u_1 \in \mathcal{C}_{\mathrm{CAN}}$, then $v^*$ eventually becomes more central than $u_1$, which in turn eventually becomes more central than $u_0$, as wanted.
If instead $u_1 \notin \mathcal{C}_{\mathrm{CAN}}$, we may consider $u_2$, which is the centroid when $u_1$ was born. Continuing in this manner, we define a sequence $u_0, u_1, u_2, \dots$ of progressively older vertices, which is necessarily finite, with the last vertex in the sequence being $v_1$. Thus, if we define $r = \min\{i \ge 0 : u_i \in \mathcal{C}_{\mathrm{CAN}}\}$, then $u_r$ is well-defined. We then have that $v^*$ is more central than $u_r$, which is more central than $u_{r-1}$, which is more central than $u_{r-2}$, and so on, continuing up to $u_0$. This completes the proof.

This also completes the proof of Theorem 2.

In fact, Theorem 2 may be extended to establish the existence of a fixed set of size $K > 0$ consisting of the most terminally central vertices. This is summarized in the following theorem:

Theorem 3. For any $K \ge 1$, a unique set of distinct vertices $\{v_1^*, v_2^*, \dots, v_K^*\}$ exists such that for any other vertex $u \in \cup_{n=1}^\infty V(T_n)$, there exists a time $M$ (possibly dependent on $u$) such that
$$\psi_{T_n}(v_1^*) < \psi_{T_n}(v_2^*) < \cdots < \psi_{T_n}(v_K^*) < \psi_{T_n}(u),$$
for all $n \ge M$.

Proof. The argument closely parallels that of the proof of Theorem 2 in our previous work [22], with appropriate modifications to prove terminal centrality instead of persistent centrality. We refer the reader to our earlier paper, noting that the argument only requires properties of absolute or singular continuity of the appropriately normalized subtree sizes, which are provided by Theorem 1.

4 Finite confidence set for the root

For the results in this section, we limit our consideration to $\alpha$-sublinear preferential attachment trees. Recall that these are trees in which the attraction function is given by $f(i) = (1 + i)^\alpha$, for $\alpha \in (0, 1)$. The problem of finding a confidence set for the root node in the case of linear preferential and uniform attachment trees was studied by Bubeck et al. [8].
One proposed method for constructing a confidence set that contains the root node with probability $1 - \epsilon$ is as follows:

1. Given a sequence of random trees $\{T_n\}$, order the vertices according to the balancedness function $\psi_{T_n}$.
2. Select the $K$ vertices with the smallest values of $\psi_{T_n}$, for a proper value of $K = K(\epsilon)$.

The above method was shown to produce finite-sized confidence sets in Bubeck et al. [8], and the analysis was later extended to diffusions over regular trees [25]. In fact, the continuous time analysis of sublinear preferential attachment trees also furnishes a method for bounding the required size of a confidence set for the root node. Following the notation of Bubeck et al. [8], we use $H_\psi^K(T_n)$ to denote the set of $K$ vertices chosen according to the method described above, and drop the argument $T_n$ when the context is unambiguous. Our main result shows that the same estimator produces finite-sized confidence sets for sublinear preferential attachment trees:

Theorem 4. For $\epsilon > 0$, there exists a constant $K$ (depending on $\epsilon$) such that
$$\liminf_{n \to \infty} P\big(v_1 \in H_\psi^K(T_n)\big) \ge 1 - \epsilon.$$

Figure 3: An illustration of the trees $T^n_{i,K}$ defined in the proof of Theorem 4. The figure shows a tree with $K = 5$, with the two trees $T^n_{3,5}$ and $T^n_{4,5}$ highlighted.

Proof of Theorem 4 (sketch). We follow the approach of Bubeck et al. [8]. For $1 \le i \le K$, let $T^n_{i,K}$ denote the tree containing vertex $v_i$ in the forest obtained from $T_n$ by removing all edges between nodes $\{v_1, \dots, v_K\}$. (See Figure 3 for an illustration.) Observe that
$$P\big(v_1 \notin H_\psi^K\big) \le P\big(\exists i > K : \psi(v_i) \le \psi(v_1)\big) \le P\big(\psi(v_1) \ge (1 - \delta) n\big) + P\big(\exists i > K : \psi(v_i) \le (1 - \delta) n\big), \tag{3}$$
for $\delta > 0$ to be chosen later. To handle the first term in inequality (3), we have the following lemma:

Lemma 4 (Proof in Appendix C.1).
There exists $\delta_0 > 0$ such that
$$\limsup_{n \to \infty} P\big(\psi(v_1) \ge (1 - \delta_0) n\big) < \frac{\epsilon}{2}.$$

The proof of the above lemma is simple and follows by an argument similar to that in Bubeck et al. [8]. The analysis of the second term in inequality (3) is more technical:

Lemma 5 (Proof in Appendix C.2). There exist constants $N$ and $C$ depending only on $\epsilon$ such that if $K > N$ and
$$\frac{C K (\log K)^{\frac{2}{1 - \alpha}}}{(K - 1)^2} < \frac{\epsilon}{4},$$
then
$$\limsup_{n \to \infty} P\big(\exists i > K : \psi(v_i) \le (1 - \delta_0) n\big) < \frac{\epsilon}{2}.$$

A brief proof sketch of Lemma 5 is as follows. First, we claim that for any $i > K$,
$$\psi(v_i) \ge \min_{1 \le k \le K} \sum_{j = 1, j \ne k}^{K} |T^n_{j,K}|.$$
This is because $v_i$ must lie in one of the trees $T^n_{k,K}$, for some $1 \le k \le K$. Thus, the largest subtree hanging off $v_i$ is at least as large as the subtree of $(T, v_i)$ containing vertex $v_k$. From Figure 4, we see that this is at least $\sum_{j = 1, j \ne k}^{K} |T^n_{j,K}|$, which is in turn at least $\min_{1 \le k \le K} \sum_{j = 1, j \ne k}^{K} |T^n_{j,K}|$. Hence, the desired expression may be upper-bounded by
$$P\Big(\exists\, 1 \le k \le K : \sum_{j = 1, j \ne k}^{K} |T^n_{j,K}| \le (1 - \delta_0) n\Big) \overset{(a)}{=} P\Big(\exists\, 1 \le k \le K : \sum_{j = 1, j \ne k}^{K} |T^n_{j,K}| \le \frac{1 - \delta_0}{\delta_0} |T^n_{k,K}|\Big),$$
where $(a)$ follows because $\sum_{j=1}^{K} |T^n_{j,K}|$ is simply the total number of vertices, which is $n$. This final term may be bounded from above by observing that (i) the growth rate of $|T^n_{k,K}|$ is larger for a large degree of $v_k$ in $T_K$; and (ii) the maximum degree of $T_K$ is roughly $(\log K)^{1/(1 - \alpha)}$, which is not sufficient to increase the growth of $|T^n_{k,K}|$ so as to compete with the sum of $K - 1$ trees given by $\sum_{j = 1, j \ne k}^{K} |T^n_{j,K}|$, when $K$ is sufficiently large.

Combining the bounds of Lemmas 4 and 5 and substituting back into inequality (3) completes the proof.

5 Discussion

In this paper, we have established the existence of a unique terminal centroid in sublinear preferential attachment trees.
However, our results have stopped short of proving that a persistent centroid exists, which was conjectured in our previous work [22]. To establish the stronger statement, it would suffice to show that the terminal centroid identified in our paper is in fact persistent. (Although somewhat counterintuitive, the definition of terminal centrality leaves open the possibility that the terminal centroid never actually becomes the tree centroid at any finite time point.) A possible approach leverages ideas from our previous work [22]. We now describe the main bottleneck in extending the argument employed there to the present setting.

For the purpose of this discussion, suppose the sublinear preferential attachment tree has an attraction function $f(i) = (i+1)^\alpha$, for some $0 < \alpha < 1$. The analog of our previous approach [22] would involve two steps: (i) showing that the total number of vertices that ever become centroids is finite, almost surely; and (ii) concluding the existence of a unique persistent centroid by an application of Lemma 1. In the first step, we need to show that vertices which are born late have a very small probability of ever becoming the tree centroid at any future point in time, and then apply the Borel-Cantelli lemma to establish finiteness. As described in the proof of Theorem 2, a vertex $v_{n+1}$ can become a centroid at some point in the future if and only if the tree $A_t$ is able to catch up with the tree $B_t$ at some future time $t$. However, unlike in the case of uniform or linear preferential attachment trees, the probability that a new vertex joins $A_t$ or $B_t$ is no longer proportional to a simple linear function of the size of the subtree.
Based on these ideas, however, one can show that a sufficient condition for persistent centrality in the tree growth process is as follows:

Irrelevance of structure condition: There exists a parameter $\eta \in (0, 1)$ such that for any two trees $\Gamma_1$ and $\Gamma_2$ with $|\Gamma_1| = |\Gamma_2|$, the probability that the CMJ process started from $\Gamma_1$ has a larger population than the CMJ process started from $\Gamma_2$, in the limit, lies in the interval $(\eta, 1 - \eta)$.

The above condition essentially ensures that the structure of the tree does not have a significant impact on how quickly it grows. Note that for linear preferential attachment and uniform attachment trees, we can take any $\eta < 1/2$, since the probability that the population of $\Gamma_1$ is larger than the population of $\Gamma_2$ in the limit is exactly $\frac{1}{2}$. Irrelevance of structure is thus crucially leveraged in the proof strategy of our previous work [22]. Unfortunately, the irrelevance of structure condition does not appear to hold for sublinear preferential attachment trees, due to the following small example:

Example 3. Let $\Gamma_1$ be the "line tree"; i.e., a sequence of $r$ vertices $\{v_1, v_2, \dots, v_r\}$ such that the edge set is $\{(v_i, v_{i+1}) : 1 \le i \le r - 1\}$. Let $\Gamma_2$ be the "star tree" with the edge set being $\{(v_1, v_i) : 2 \le i \le r\}$. If the population of the CMJ process started from $\Gamma_i$ is asymptotically $W_i e^{\theta t}$ for $i \in \{1, 2\}$, it is possible to show that $\frac{W_1}{r}$ and $\frac{W_2}{r}$ converge in probability to some constants $c_1$ and $c_2$, such that $c_1 > c_2$, as $r \to \infty$. Thus, the probability that the population of $\Gamma_1$ is larger than the population of $\Gamma_2$ in the limit tends to 1 as $r \to \infty$. This violates the irrelevance of structure condition, since no matter which $\eta > 0$ is chosen, the probability of $\Gamma_1$ being larger than $\Gamma_2$ in the limit surpasses $1 - \eta$ for all large enough $r$.
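A rough numerical illustration of why the two shapes in Example 3 grow at different speeds is sketched below. It assumes the attraction function $f(i) = (i+1)^\alpha$ is applied to the number of children of each vertex (with the trees rooted at $v_1$), and it compares only the total instantaneous birth rate of the two initial trees. This is a heuristic, not the limit statement about $W_1/r$ and $W_2/r$; the helper names are hypothetical.

```python
def total_attachment_rate(children_counts, alpha):
    """Sum of f(i) = (i + 1)**alpha over vertices, with i the number of children.

    Assumption: each vertex reproduces at rate f(#children), as in the
    CMJ embedding with attraction function f(i) = (i + 1)**alpha.
    """
    return sum((c + 1) ** alpha for c in children_counts)


def line_tree_children(r):
    # Rooted at v_1: vertices v_1, ..., v_{r-1} have one child each; v_r is a leaf.
    return [1] * (r - 1) + [0]


def star_tree_children(r):
    # Rooted at v_1: the center has r - 1 children; the other r - 1 vertices are leaves.
    return [r - 1] + [0] * (r - 1)


if __name__ == "__main__":
    alpha = 0.5
    for r in (3, 10, 100):
        line_rate = total_attachment_rate(line_tree_children(r), alpha)
        star_rate = total_attachment_rate(star_tree_children(r), alpha)
        # For 0 < alpha < 1 and r >= 3, the line tree starts with the larger rate.
        print(r, round(line_rate, 3), round(star_rate, 3))
```

Under these assumptions the line tree has total rate $(r-1) 2^\alpha + 1$ and the star tree has total rate $r^\alpha + (r-1)$, so the gap grows linearly in $r$, which is at least consistent with $c_1 > c_2$.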
Of course, line trees and star trees do not typically show up in sublinear preferential attachment trees, so it may be possible to redefine the irrelevance of structure property to rule out such low-probability configurations. However, a suitable modification that paves the way to proving persistent centrality has yet to be determined.

Finally, suppose we define $\mathcal{C}_\infty$ to be the set of all vertices which are tree centroids for infinite amounts of time. It is easy to see that Lemma 1 rules out the possibility of $|\mathcal{C}_\infty| \ge 2$, since any two vertices in $\mathcal{C}_\infty$ will have a fixed centrality ordering in the limit. Note that $|\mathcal{C}_\infty| = 0$ implies that an infinite number of vertices ever become centroids, albeit for finite amounts of time; whereas $|\mathcal{C}_\infty| = 1$ also does not preclude such a possibility. A weaker conjecture than persistent centrality, but stronger than terminal centrality, would therefore be to show that $|\mathcal{C}_\infty| = 1$. This question currently remains open.

Regarding root inference, it is an open question whether the method of constructing finite confidence sets based on centrality continues to hold beyond the subclass of $\alpha$-sublinear preferential attachment trees. Also note that the results on root inference in this paper are generally weaker than those obtained for linear preferential and uniform attachment trees in Bubeck et al. [8] and Jog and Loh [22], since we have not provided bounds on the size of a confidence set for the root node, or the size of the hub around $v_1$ that will ensure its persistent centrality. The main hurdle in establishing such bounds is, again, the lack of concrete information about the limiting random variable $W$ in a CMJ process. Although obtaining the exact distribution of $W$ seems too optimistic, it may be possible to obtain bounds on moments or tail probabilities, which could be used to obtain bounds on hub sizes or confidence sets.
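The centrality-based confidence set discussed above can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it grows a tree with attachment probability proportional to $(c + 1)^\alpha$, where $c$ is the current number of children (an assumed reading of the attraction function), computes the balancedness $\psi_T(v)$ as the size of the largest subtree hanging off $v$, and returns the $K$ vertices with the smallest $\psi$. The function names, the tie-breaking rule (by birth order), and the seed are all assumptions made for illustration. Index 0 plays the role of $v_1$.

```python
import random


def grow_tree(n, alpha, seed=0):
    """Grow a tree on n vertices; returns parent[], with parent[0] = -1 for the root.

    Assumption: a new vertex attaches to v with probability proportional to
    (children(v) + 1)**alpha.
    """
    rng = random.Random(seed)
    parent = [-1]
    children = [0]
    for v in range(1, n):
        weights = [(c + 1) ** alpha for c in children]
        p = rng.choices(range(v), weights=weights, k=1)[0]
        parent.append(p)
        children[p] += 1
        children.append(0)
    return parent


def balancedness(parent):
    """psi[v] = size of the largest subtree hanging off v (requires parent[v] < v)."""
    n = len(parent)
    size = [1] * n
    for v in range(n - 1, 0, -1):  # children have larger indices than parents
        size[parent[v]] += size[v]
    # Start with the component on the parent's side, then compare with child subtrees.
    psi = [n - size[v] for v in range(n)]
    for v in range(1, n):
        psi[parent[v]] = max(psi[parent[v]], size[v])
    return psi


def confidence_set(parent, K):
    """Indices of the K vertices with smallest psi (ties broken by birth order)."""
    psi = balancedness(parent)
    return sorted(range(len(parent)), key=lambda v: (psi[v], v))[:K]


if __name__ == "__main__":
    tree = grow_tree(n=2000, alpha=0.5, seed=1)
    print(confidence_set(tree, K=10))
```

Repeating the last two lines over many seeds and recording how often index 0 appears in the returned set gives an empirical estimate of the coverage probability for a given $K$.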
Our results in this paper also strengthen the belief that the age of a node and its centrality are strongly related in growing random trees, implying that it is extremely difficult for a vertex to hide its age. Fanti et al. [16] explored the problem of how to create a diffusion process over a regular tree in order to obfuscate the oldest node, and it would be very interesting to see if classes of attraction functions exist that cause the tree to grow in such a way that the best confidence set for the root node does not remain finite as the tree grows.

Acknowledgment

The authors would like to thank the AE and three anonymous reviewers for their helpful and positive feedback while preparing the revision.

References

[1] R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Rev. Mod. Phys., 74:47–97, Jan 2002.
[2] K. B. Athreya and P. Ney. Branching Processes. Dover Books on Mathematics. Dover Publications, 2004.
[3] A.-L. Barabási. Network Science. Cambridge University Press, 2016.
[4] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
[5] A. Barrat, M. Barthélemy, and A. Vespignani. Dynamical Processes on Complex Networks. Cambridge University Press, New York, NY, USA, 2008.
[6] S. Bhamidi. Universal techniques to analyze preferential attachment trees: Global and local analysis. In preparation, August 2007.
[7] J. D. Biggins and D. R. Grey. Continuity of limit random variables in the branching random walk. Journal of Applied Probability, pages 740–749, 1979.
[8] S. Bubeck, L. Devroye, and G. Lugosi. Finding Adam in random growing trees. arXiv preprint arXiv:1411.3317, 2014.
[9] K. S. Crump and C. J. Mode. A general age-dependent branching process. I. Journal of Mathematical Analysis and Applications, 24(3):494–508, 1968.
[10] K. S. Crump and C. J. Mode. A general age-dependent branching process.
II. Journal of Mathematical Analysis and Applications, 25(1):8–17, 1969.
[11] S. Dereich and P. Mörters. Random networks with sublinear preferential attachment: Degree evolutions. Electron. J. Probab., 14(43):1222–1267, 2009.
[12] R. A. Doney. A limit theorem for a class of supercritical branching processes. Journal of Applied Probability, pages 707–724, 1972.
[13] W. Dong, W. Zhang, and C.-W. Tan. Rooting out the rumor culprit from suspects. In Information Theory Proceedings (ISIT), 2013 IEEE International Symposium on, pages 2671–2675. IEEE, 2013.
[14] S. N. Dorogovtsev and J. F. F. Mendes. Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press, Inc., New York, NY, USA, 2003.
[15] M. Eckhoff and P. Mörters. Vulnerability of robust preferential attachment networks. Electron. J. Probab., 19(57):1–47, 2014.
[16] G. Fanti, P. Kairouz, S. Oh, K. Ramchandran, and P. Viswanath. Hiding the Rumor Source. ArXiv e-prints, September 2015.
[17] P. Galashin. Existence of a persistent hub in the convex preferential attachment model. arXiv preprint arXiv:1310.7513, 2013.
[18] T. E. Harris. The Theory of Branching Processes. Grundlehren der mathematischen Wissenschaften. Springer Berlin Heidelberg, 2012.
[19] M. O. Jackson. Social and Economic Networks, volume 3. Princeton University Press, 2008.
[20] P. Jagers. Branching processes with biological applications. Wiley, 1975.
[21] P. Jagers and O. Nerman. The growth and composition of branching populations. Advances in Applied Probability, pages 221–259, 1984.
[22] V. Jog and P. Loh. Persistence of centrality in random growing trees. arXiv preprint arXiv:1511.01975, 2015.
[23] C. Jordan. Sur les assemblages de lignes. J. Reine Angew. Math, 70(185):81, 1869.
[24] N. Karamchandani and M. Franceschetti. Rumor source detection under probabilistic sampling.
In Information Theory Proceedings (ISIT), 2013 IEEE International Symposium on, pages 2184–2188. IEEE, 2013.
[25] J. Khim and P. Loh. Confidence sets for the source of a diffusion in regular trees. arXiv preprint arXiv:1510.05461, 2015.
[26] P. L. Krapivsky, S. Redner, and F. Leyvraz. Connectivity of growing random networks. Physical Review Letters, 85:4629–4632, November 2000.
[27] J. Kunegis, M. Blattner, and C. Moser. Preferential attachment in online networks: Measurement and explanations. In Proceedings of the 5th Annual ACM Web Science Conference, WebSci '13, pages 205–214, New York, NY, USA, 2013. ACM.
[28] W. Luo, W.-P. Tay, and M. Leng. Identifying infection sources and regions in large networks. IEEE Trans. Signal Processing, 61(11):2850–2865, 2013.
[29] S. L. Mitchell. Another characterization of the centroid of a tree. Discrete Mathematics, 24(3):277–280, 1978.
[30] O. Nerman. On the convergence of supercritical general (CMJ) branching processes. Probability Theory and Related Fields, 57(3):365–395, 1981.
[31] M. Newman. Networks: An Introduction. Oxford University Press, Inc., New York, NY, USA, 2010.
[32] A. Rudas, B. Tóth, and B. Valkó. Random trees and general branching processes. Random Structures & Algorithms, 31(2):186–202, 2007.
[33] D. Shah and T. Zaman. Rumors in a network: Who's the culprit? IEEE Transactions on Information Theory, 57(8):5163–5181, 2011.
[34] D. Shah and T. Zaman. Finding rumor sources on random trees. arXiv preprint arXiv:1110.6230, 2015.
[35] P. J. Slater. Maximin facility location. Journal of National Bureau of Standards B, 79:107–115, 1975.
[36] P. J. Slater. Accretion centers: A generalization of branch weight centroids. Discrete Applied Mathematics, 3(3):187–192, 1981.
[37] C. W. Tan, P.-D. Yu, C.-K. Lai, W. Zhang, and H.-L. Fu. Optimal detection of influential spreaders in online social networks.
In 2016 Annual Conference on Information Science and Systems (CISS), pages 145–150. IEEE, 2016.
[38] B. Zelinka. Medians and peripherians of trees. Archivum Mathematicum, 4(2):87–95, 1968.

A Results on CMJ processes

In this Appendix, we review properties of CMJ processes and verify that the CMJ process corresponding to a sublinear preferential attachment tree enjoys certain convergence properties.

A.1 Preliminary results

We begin by stating several results that will be crucial for our purposes. For a more detailed discussion of such results, see the survey paper by Jagers and Nerman [21].

Lemma 6 (Corollary 4.2 and Theorem 4.3 from Jagers and Nerman [21]). Let $\xi$ be a point process on $\mathbb{R}_+$ with Malthusian parameter $\theta > 0$. Consider a CMJ process driven by $\xi$ in which individuals live forever. Let the population of the CMJ process at time $t \ge 0$ be denoted by $Z_t$. Define
$$\hat{\xi}(\theta) = \int_0^\infty e^{-\theta t} \, d\xi(t).$$
If the condition
$$\mathrm{Var}(\hat{\xi}(\theta)) < \infty \tag{$\star$}$$
is satisfied, then we have the convergence result
$$e^{-\theta t} Z_t \xrightarrow{L^2} W,$$
where $W$ is a random variable satisfying $W > 0$, almost surely.

Lemma 7 (Theorem 5.4 from Nerman [30]). Let $\xi$, $\theta$, and $Z_t$ be as in Lemma 6. If the mean intensity measure $\mu$ satisfies
$$\int_0^\infty e^{-\tilde{\theta} t} \, d\mu(t) < \infty, \quad \text{for some } \tilde{\theta} < \theta, \tag{$\star\star$}$$
then
$$e^{-\theta t} Z_t \xrightarrow{\text{a.s.}} W,$$
where $W$ is as in Lemma 6.²

² In Nerman [30], the condition $(\star\star)$ appears in a more general form denoted Condition 5.1. As explained in the remark following Condition 5.1, the condition $(\star\star)$ is stronger and implies Condition 5.1.

Although not much is known about the exact distribution of $W$ in the case of a general CMJ process, the following useful properties have been established:

Lemma 8 (Theorem 1 from Biggins and Grey [7]). Let $W$ be the limit random variable appearing in Lemmas 6 and 7. Then the following properties hold:

(i) The distribution of $W$ has no atoms.
(ii) The distribution of $W$ is either singular continuous or absolutely continuous.
(iii) The support of $W$ is all of $\mathbb{R}_+$; i.e., the set of positive points of increase of the distribution function of $W$ is all of $\mathbb{R}_+$.

Remark. Note that in all the above results, we have assumed that the branching process begins with a single individual. Suppose, however, that the process starts from some initial state consisting of a finite collection of nodes $\{v_1, \dots, v_k\}$ satisfying parent-child relationships according to a directed tree $T$ rooted at $v_1$. In this case, we can condition the CMJ process beginning with a single node on the event of observing the tree $T$ at some point, and conclude that the Malthusian normalized population converges to a random variable $\tilde{W}$ almost surely and in $L^2$. Although we do not provide a proof here, the limit random variable $\tilde{W}$ also satisfies all the properties in Lemma 8.

A.2 Sublinear preferential attachment

We now specialize our discussion to sublinear preferential attachment processes.

Lemma 9. The Malthusian parameter $\theta$ for a sublinear preferential process always exists and satisfies $1 < \theta < 2$.

Proof. A stronger version of this lemma may be found in Lemma 44 of Bhamidi [6]. Let $\xi$ be the point process associated with a sublinear preferential attachment function $f$ with mean intensity $\mu(t)$. Let $\mu_{UA}(t)$ and $\mu_{PA}(t)$ be the mean intensities of the Poisson process with rate 1 (uniform attachment) and the standard Yule process (linear preferential attachment), respectively. Clearly, the mean intensity functions satisfy
$$\mu_{UA}(t) < \mu(t) < \mu_{PA}(t).$$
Let $X_\theta$ be an exponential random variable with rate $\theta$, independent of $\xi$. Note that the integral
$$\theta \int_0^\infty e^{-\theta t} \mu(t) \, dt = E[\xi(X_\theta)]$$
is monotonically decreasing in $\theta$. At $\theta = 1$, using the fact that $\mu_{UA}(t) < \mu(t)$, we have $1 < E[\xi(X_1)]$.
Similarly, at $\theta = 2$, we may use the fact that $\mu(t) < \mu_{PA}(t)$ to obtain $E[\xi(X_2)] < 1$. By monotonicity, the value of $E[\xi(X_\theta)]$ must therefore equal 1 at some $1 < \theta < 2$.

Lemma 10. The point process $\xi$ corresponding to a sublinear attraction function $f$ satisfies conditions $(\star)$ and $(\star\star)$.

Proof. We first show that condition $(\star)$ is satisfied by following an approach used in Bhamidi [6]. For $0 < \alpha < 1$, let $1 \le f(i) \le (i+1)^\alpha$ be a sublinear attraction function and let $\xi$ be the associated point process with Malthusian parameter $\theta$, existing by Lemma 9. Let $X_\theta$ be an exponential random variable with rate $\theta$, independent of $\xi$. Defining the random function
$$\hat{\xi}(\theta) := \int_0^\infty e^{-\theta t} \, d\xi(t),$$
we have by Fubini's theorem that
$$\hat{\xi}(\theta) = \theta \int_0^\infty e^{-\theta t} \xi(0, t] \, dt = E\left[\xi(0, X_\theta] \mid \xi\right].$$
Then
$$\mathrm{Var}(\hat{\xi}(\theta)) \le E\left[\hat{\xi}(\theta)^2\right] = E\left[\left(E\left[\xi(0, X_\theta] \mid \xi\right]\right)^2\right] \overset{(a)}{\le} E\left[E\left[\xi(0, X_\theta]^2 \mid \xi\right]\right] = E\left[\xi(0, X_\theta]^2\right],$$
where inequality $(a)$ follows from Jensen's inequality. Thus, it is enough to derive the bound $E\left[\xi(0, X_\theta]^2\right] < \infty$. Let $\xi_\alpha$ be the point process corresponding to the attraction function $f_\alpha(i) = (1+i)^\alpha$. Note that since
$$E\left[\xi(0, X_\theta]^2\right] \le E\left[\xi_\alpha(0, X_\theta]^2\right],$$
it is enough to show that $E\left[\xi_\alpha(0, X_\theta]^2\right] < \infty$. Note that it is possible to find the exact distribution of the random variable $\xi_\alpha(0, X_\theta]$, as follows: The time of the $k$th arrival in the point process $\xi_\alpha$ may be written as $\sum_{i=1}^k Y_i$, where $Y_i \sim \mathrm{Exp}(f_\alpha(i-1))$ and the $Y_i$'s are independent. Hence,
$$P\left(\xi_\alpha(0, X_\theta] \ge k\right) = P\left(X_\theta \ge \sum_{i=1}^k Y_i\right) = E\left[e^{-\theta \sum_{i=1}^k Y_i}\right] = \prod_{i=0}^{k-1} \frac{f_\alpha(i)}{\theta + f_\alpha(i)} = \prod_{i=0}^{k-1} \frac{(1+i)^\alpha}{\theta + (1+i)^\alpha}.$$
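As a numerical aside, the tail formula above also gives a way to approximate the Malthusian parameter of $\xi_\alpha$. The sketch below assumes the standard identity $E[N] = \sum_{k \ge 1} P(N \ge k)$ for a nonnegative integer-valued $N$, and uses the characterization from the proof of Lemma 9 that $\theta$ solves $E[\xi_\alpha(0, X_\theta]] = 1$. The function names, the truncation tolerance, and the bisection bracket are illustrative choices, not part of the original argument.

```python
def expected_offspring(theta, alpha, tol=1e-15, max_terms=1_000_000):
    """Approximate E[xi_alpha(0, X_theta]] = sum_{k>=1} prod_{i<k} (1+i)^a / (theta + (1+i)^a).

    The series is truncated once a term drops below `tol`; for alpha < 1 the
    terms decay like a stretched exponential, so the neglected tail is tiny.
    """
    total, term = 0.0, 1.0
    for i in range(max_terms):
        rate = (1 + i) ** alpha
        term *= rate / (theta + rate)  # term is now P(xi_alpha(0, X_theta] >= i + 1)
        total += term
        if term < tol:
            break
    return total


def malthusian_parameter(alpha, lo=0.5, hi=3.0, iters=60):
    """Bisection for the theta with expected_offspring(theta, alpha) = 1.

    Relies on expected_offspring being decreasing in theta; intended for 0 <= alpha < 1.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_offspring(mid, alpha) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for alpha in (0.0, 0.25, 0.5, 0.75):
        print(alpha, malthusian_parameter(alpha))
```

For $\alpha = 0$ the series is geometric and sums to $1/\theta$, so the root is $\theta = 1$; for $0 < \alpha < 1$ the computed root should fall strictly between 1 and 2, in line with Lemma 9.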
The probability mass function of $\xi_\alpha(0, X_\theta]$ is thus given by
$$P\left(\xi_\alpha(0, X_\theta] = k\right) = \prod_{i=0}^{k-1} \frac{(1+i)^\alpha}{\theta + (1+i)^\alpha} - \prod_{i=0}^{k} \frac{(1+i)^\alpha}{\theta + (1+i)^\alpha} = \frac{\theta}{\theta + (1+k)^\alpha} \prod_{i=0}^{k-1} \frac{(1+i)^\alpha}{\theta + (1+i)^\alpha} \sim \frac{1}{k^\alpha} \exp\left(-\frac{\theta k^{1-\alpha}}{1-\alpha}\right).$$
It is now easy to check that $E\left[\xi_\alpha(0, X_\theta]^2\right] < \infty$, and thus, $E\left[\xi(0, X_\theta]^2\right] < \infty$.

Finally, we show that condition $(\star\star)$ holds. Let $\mu(t)$ be the intensity measure associated with the sublinear preferential attachment process. Let $\tilde{\theta}$ be such that $1 < \tilde{\theta} < \theta$. Note that such a parameter $\tilde{\theta}$ exists by Lemma 9. As in Lemma 9, let $\mu_{PA}$ be the mean intensity measure associated with the linear preferential attachment process. Then
$$\int_0^\infty e^{-\tilde{\theta} t} \, d\mu(t) < \int_0^\infty e^{-\tilde{\theta} t} \, d\mu_{PA}(t) \overset{(a)}{=} \int_0^\infty e^{(1 - \tilde{\theta}) t} \, dt < \infty,$$
where equality $(a)$ holds because $\mu_{PA}(t) = e^t - 1$.

A.3 Proof of Theorem 1

Having verified the conditions $(\star)$ and $(\star\star)$ via Lemma 10, we obtain the desired $L^2$ and almost sure convergence by applying Lemmas 6 and 7, respectively. The absolute or singular continuity of the limit random variable follows from Lemma 8.

B Useful results on trees

In this Appendix, we collect three key lemmas concerning trees and tree centroids that we use in our proofs.

Lemma 11 (Lemma 2.1 from Jog and Loh [22]). For a tree $T$ on $n$ vertices, the following statements hold:

(i) If $v^*$ is a centroid, then $\psi_T(v^*) \le \frac{n}{2}$.
(ii) $T$ can have at most two centroids.
(iii) If $u^*$ and $v^*$ are two centroids, then $u^*$ and $v^*$ are adjacent vertices. Furthermore, $\psi_T(u^*) = |(T, u^*)_{v^* \downarrow}|$, and $\psi_T(v^*) = |(T, v^*)_{u^* \downarrow}|$.

Lemma 12 (Lemma 2.3 from Jog and Loh [22]). Let $\{T_n\}_{n \ge 1}$ be a sequence of growing trees, with $V(T_n) = \{v_1, \dots, v_n\}$. At time $n + 1$, we have the inequality
$$|(T_{n+1}, v_{n+1})_{v^*(n) \downarrow}| \ge \frac{n}{2}.$$

Lemma 13. Consider a tree $T$ and pick any two vertices $u, v \in V(T)$.
Then we have the following result:
$$\psi_T(u) \le \psi_T(v) \iff |(T, v)_{u \downarrow}| \ge |(T, u)_{v \downarrow}|.$$

Proof. Let $u'$ and $v'$ be the neighboring vertices to $u$ and $v$, respectively, in the path from $u$ to $v$. To simplify notation, denote $|(T, v)_{u \downarrow}| = a$ and $|(T, u)_{v \downarrow}| = b$. First suppose $a \ge b$. Let $c = |T| - a - b$ be the number of vertices not in either of the two subtrees. (See Figure 4.) We have the following inequality:
$$\psi_T(v) \ge |(T, v)_{v' \downarrow}| = a + c.$$
We also have the inequality
$$\psi_T(u) \le \max\left\{|(T, v)_{u \downarrow}| - 1, \; |(T, u)_{u' \downarrow}|\right\} = \max(a - 1, b + c) \overset{(a)}{\le} a + c,$$
where $(a)$ follows from our assumption $a \ge b$. Combining the two inequalities, we then have
$$\psi_T(u) \le a + c \le \psi_T(v),$$
which is one direction of the implication. If instead $a < b$, the same steps with the roles of $u$ and $v$ exchanged establish the string of inequalities
$$\psi_T(v) \le \max(b - 1, a + c) < b + c \le \psi_T(u),$$
providing the other direction of the implication.

Figure 4: Subtrees with sizes $a$, $b$, and $c$ from Lemma 13.

C Supporting proofs for Theorem 4

In this Appendix, we provide proofs of the lemmas used to derive Theorem 4.

C.1 Proof of Lemma 4

First note that we clearly have $\psi(v_1) \le \max\left\{|T^n_{1,2}|, |T^n_{2,2}|\right\}$ and $n = |T^n_{1,2}| + |T^n_{2,2}|$. Thus,
$$P\left(\psi(v_1) \ge (1-\delta)n\right) \le P\left(\frac{\max\left\{|T^n_{1,2}|, |T^n_{2,2}|\right\}}{|T^n_{1,2}| + |T^n_{2,2}|} \ge 1 - \delta\right) \le P\left(\frac{|T^n_{1,2}|}{|T^n_{1,2}| + |T^n_{2,2}|} \ge 1 - \delta\right) + P\left(\frac{|T^n_{2,2}|}{|T^n_{1,2}| + |T^n_{2,2}|} \ge 1 - \delta\right). \tag{4}$$
Consider the continuous time versions of the growing tree processes, and let $\theta$ be the Malthusian parameter of the point process associated with $T^t_{1,2}$. Then
$$\frac{|T^t_{1,2}|}{|T^t_{1,2}| + |T^t_{2,2}|} = \frac{e^{-\theta t} |T^t_{1,2}|}{e^{-\theta t} |T^t_{1,2}| + e^{-\theta t} |T^t_{2,2}|} \xrightarrow{\text{a.s.}} \frac{W_1}{W_1 + W_2} := W,$$
where by Lemma 8, the random variable $W$ is absolutely or singular continuous and is supported on the entire interval $[0, 1]$. In particular, we may choose $\delta_0' > 0$ such that
$$P(W \ge 1 - \delta_0') < \frac{\epsilon}{4}.$$
This implies that
$$\limsup_{t \to \infty} P\left(\frac{|T^t_{1,2}|}{|T^t_{1,2}| + |T^t_{2,2}|} \ge 1 - \delta_0'\right) < \frac{\epsilon}{4}.$$
Using a similar argument for the second term, we conclude that there exists a $\delta_0''$ such that
$$\limsup_{t \to \infty} P\left(\frac{|T^t_{2,2}|}{|T^t_{1,2}| + |T^t_{2,2}|} \ge 1 - \delta_0''\right) < \frac{\epsilon}{4}.$$
Taking $\delta_0 = \min(\delta_0', \delta_0'')$ and substituting back into inequality (4), we obtain the desired bound.

C.2 Proof of Lemma 5

As noted in the proof sketch, for any $i > K$, we have
$$\psi(v_i) \ge \min_{1 \le k \le K} \sum_{j=1, j \ne k}^{K} |T^n_{j,K}|.$$
Hence,
$$P\left(\exists i > K : \psi(v_i) \le (1-\delta_0)n\right) \le P\left(\exists 1 \le k \le K : \sum_{j=1, j \ne k}^{K} |T^n_{j,K}| \le (1-\delta_0)n\right). \tag{5}$$
We can break up the right-hand expression as follows: From Theorem 22 in Bhamidi [6], the maximum degree of a sublinear preferential attachment model with attraction function $f(i) = (i+1)^\alpha$ scales as $(\log n)^{\frac{1}{1-\alpha}}$. Concretely, there exists a constant $M$ such that
$$\limsup_{n \to \infty} P\left(\text{Max-Deg}(T_n) > (\log n)^{\frac{1}{1-\alpha}} M\right) < \frac{\epsilon}{4}.$$
Therefore, we may choose $N$ large enough such that
$$P\left(\text{Max-Deg}(T_n) > (\log n)^{\frac{1}{1-\alpha}} M\right) < \frac{\epsilon}{4}, \quad \text{for all } n \ge N. \tag{6}$$
Note that $M$ depends only on $\epsilon$ and the distribution of the normalized maximum degree that exists in the limit of the $\alpha$-sublinear attachment tree growth process. Thus, fixing $\epsilon$ fixes $M$, as well. Having chosen $M$, note that $N$ depends on how fast the normalized distribution of the maximum degree converges to the fixed distribution, and on $\epsilon$. Since the former is solely a property of the sublinear attachment process, we observe that $N$ also depends only on $\epsilon$. We now pick a value $K > N$, and define the event
$$E_K := \left\{\text{Max-Deg}(T_K) \le (\log K)^{\frac{1}{1-\alpha}} M\right\}.$$
The right-hand side of inequality (5) may be bounded by
$$\begin{aligned}
& P\left(\exists 1 \le k \le K : \sum_{j=1, j \ne k}^{K} |T^n_{j,K}| \le (1-\delta_0)n \;\Big|\; E_K\right) + P(E_K^c) \\
& \quad \overset{(a)}{\le} P\left(\exists 1 \le k \le K : \sum_{j=1, j \ne k}^{K} |T^n_{j,K}| \le (1-\delta_0)n \;\Big|\; E_K\right) + \frac{\epsilon}{4} \\
& \quad \overset{(b)}{\le} \sum_{k=1}^{K} P\left(\sum_{j=1, j \ne k}^{K} |T^n_{j,K}| \le (1-\delta_0)n \;\Big|\; E_K\right) + \frac{\epsilon}{4}.
\end{aligned}$$
Here, $E_K^c$ denotes the complement of $E_K$; step $(a)$ follows from equation (6) and the choice of $K > N$, and step $(b)$ is a simple application of the union bound. Now fix $k = 1$, and consider the probability
$$P\left(\sum_{j=2}^{K} |T^n_{j,K}| \le (1-\delta_0)n \;\Big|\; E_K\right) \overset{(a)}{=} P\left(\sum_{j=2}^{K} |T^n_{j,K}| \le \left(\frac{1-\delta_0}{\delta_0}\right) |T^n_{1,K}| \;\Big|\; E_K\right),$$
where step $(a)$ follows since $\sum_{j=1}^{K} |T^n_{j,K}|$ is simply the total number of vertices, which is $n$. Since the degree of $v_1$ is at most $(\log K)^{\frac{1}{1-\alpha}} M$ conditioned on $E_K$, we may bound the above probability via stochastic domination, as follows: At time $n = K$, replace $v_1$ by $\lceil (\log K)^{\frac{1}{1-\alpha}} M \rceil$ isolated vertices, and replace $v_j$ by a single isolated vertex, for each $2 \le j \le K$. The crucial step is to observe that by Lemma 15, this replacement expedites the growth of $|T^t_{1,K}|$ and retards the growth of $\sum_{j=2}^{K} |T^t_{j,K}|$. Applying Lemma 14 to the i.i.d. limit random variables $W_i$ and $\tilde{W}_i$ corresponding to the renormalized populations of the continuous time CMJ processes, we then have
$$\limsup_{t \to \infty} P\left(e^{-\theta t} \sum_{j=2}^{K} |T^t_{j,K}| \le \left(\frac{1-\delta_0}{\delta_0}\right) e^{-\theta t} |T^t_{1,K}| \;\Big|\; E_K\right) \le P\left(\sum_{i=1}^{K-1} W_i \le \left(\frac{1-\delta_0}{\delta_0}\right) \sum_{i=1}^{\lceil (\log K)^{\frac{1}{1-\alpha}} M \rceil} \tilde{W}_i\right) \le P\left(\sum_{i=1}^{K-1} W_i \le \tilde{U}\right),$$
where $\tilde{U}$ is the random variable $\left(\frac{1-\delta_0}{\delta_0}\right) \sum_{i=1}^{\lceil (\log K)^{\frac{1}{1-\alpha}} M \rceil} \tilde{W}_i$.
In anticipation of using Lemma 14, we bound $E[\tilde{U}^2]$ as follows:
$$E\left[\tilde{U}^2\right] = \left(\frac{1-\delta_0}{\delta_0}\right)^2 E\left[\left(\sum_{i=1}^{\lceil (\log K)^{\frac{1}{1-\alpha}} M \rceil} \tilde{W}_i\right)^2\right] \overset{(a)}{\le} \left(\frac{1-\delta_0}{\delta_0}\right)^2 \left\lceil (\log K)^{\frac{1}{1-\alpha}} M \right\rceil^2 E\left[\tilde{W}_1^2\right],$$
where step $(a)$ is true because for all $1 \le i, j \le \lceil (\log K)^{\frac{1}{1-\alpha}} M \rceil$ with $i \ne j$, we have
$$E[\tilde{W}_i \tilde{W}_j] = E[\tilde{W}_i]^2 \le E[\tilde{W}_i^2].$$
Now we apply Lemma 14 to conclude that
$$P\left(\sum_{i=1}^{K-1} W_i \le \tilde{U}\right) \le \frac{C \times \left(\frac{1-\delta_0}{\delta_0}\right)^2 \left\lceil (\log K)^{\frac{1}{1-\alpha}} M \right\rceil^2 E\left[\tilde{W}_1^2\right]}{(K-1)^2} = \frac{C_1 (\log K)^{\frac{2}{1-\alpha}}}{(K-1)^2},$$
where the constant $C_1 = \left(\frac{1-\delta_0}{\delta_0}\right)^2 \times C \times M^2 \times E[\tilde{W}_1^2]$ depends only on $\epsilon$, since by Lemma 14, the constant $C$ depends only on the distribution of $\tilde{W}_i$, which in turn depends only on the sublinear preferential attachment growth process and is therefore fixed. Arguing similarly, $E[\tilde{W}_1^2]$ is again a fixed constant. Also, as noted earlier, $\delta_0$ and $M$ depend only on $\epsilon$. Since such an inequality holds for all values $1 \le k \le K$, substituting back into inequality (5) and applying a union bound yields
$$\limsup_{n \to \infty} P\left(\exists i > K : \psi(v_i) \le (1-\delta_0)n\right) \le \frac{K C_1 (\log K)^{\frac{2}{1-\alpha}}}{(K-1)^2} + \frac{\epsilon}{4}.$$
We now choose $K > N$ sufficiently large so that
$$\frac{C_1 K (\log K)^{\frac{2}{1-\alpha}}}{(K-1)^2} < \frac{\epsilon}{4},$$
establishing the desired inequality.

D Additional technical lemmas

In this Appendix, we state and prove a useful Hoeffding bound for sums of independent, nonnegative random variables.

Lemma 14. Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables distributed according to $Z$, such that $Z \ge 0$ almost surely and $E[Z^2] < \infty$. Let $Y$ be a random variable independent of the $X_i$'s satisfying $Y > 0$ almost surely and $E[Y^2] < \infty$. Then
$$P\left(\sum_{i=1}^{n} X_i \le Y\right) \le \frac{C \, E[Y^2]}{n^2},$$
for some constant $C$ depending only on the distribution of $Z$.

Proof. Define $W_i = \min\{X_i, M\}$, where the constant $M$ is chosen such that $E[W_i] \ge \frac{E[X_i]}{2}$. Since $W_i \le X_i$, we have
$$\begin{aligned}
P\left(\frac{1}{n} \sum_{i=1}^{n} X_i - E[X_i] \le t\right) & \le P\left(\frac{1}{n} \sum_{i=1}^{n} W_i - E[X_i] \le t\right) \\
& \le P\left(\frac{1}{n} \sum_{i=1}^{n} W_i - 2E[W_i] \le t\right) \\
& = P\left(\frac{1}{n} \sum_{i=1}^{n} W_i - E[W_i] \le t + E[W_i]\right) \\
& \le C_1 \exp(-n t^2 C_2),
\end{aligned} \tag{7}$$
for suitable constants $C_1$ and $C_2$, where the last inequality follows from Hoeffding's inequality. Let
$$E_1 := \left\{\frac{1}{n} \sum_{i=1}^{n} X_i - E[X_i] \le -\frac{E[X_i]}{2}\right\}.$$
Then
$$P\left(\sum_{i=1}^{n} X_i \le Y\right) \le P(E_1) + P\left(Y \ge \frac{n E[X_i]}{2}\right).$$
Note that by Markov's inequality, we have the bound
$$P\left(Y \ge \frac{n E[X_i]}{2}\right) = P\left(Y^2 \ge \frac{n^2 E[X_i]^2}{4}\right) \le \frac{4 E[Y^2]}{n^2 E[X_i]^2} = \frac{C_3 E[Y^2]}{n^2},$$
for a suitable constant $C_3$. Since $P(E_1)$ decays exponentially in $n$ by inequality (7), we may find another constant $C_4$ such that
$$P\left(\sum_{i=1}^{n} X_i \le Y\right) \le \frac{C_4 E[Y^2]}{n^2},$$
as claimed.

Lemma 15. Consider a CMJ process initiated from the following state: The root node gives birth according to a shifted point process $\xi_d$ such that
$$P\left(\xi_d(t + dt) - \xi_d(t) = 1 \mid \xi_d(t) = i\right) = f(i + d) \, dt + o(dt),$$
where $f$ is the attraction function for an $\alpha$-sublinear preferential attachment process. Apart from the root node, every other node gives birth according to the point process $\xi$ driven by the function $f(i) = (i+1)^\alpha$. Let the population of this CMJ process be denoted by $H(t)$. Let $X_i(t)$ for $1 \le i \le d + 1$ be i.i.d. CMJ processes initiated from a single point. We claim that $\sum_{i=1}^{d+1} X_i(t)$ stochastically dominates $H(t)$.

Proof. Consider the root node $v$ and its CMJ process $H(t)$. We compare its growth with the sum of $d + 1$ i.i.d. CMJ processes starting from the isolated vertices $\{u_1, \dots, u_{d+1}\}$. Let $C_v(t)$ denote the number of children of $v$ at time $t$, and let $C_i(t)$ denote the number of children of $u_i$ at time $t$. Let $C_u(t) = \sum_{i=1}^{d+1} C_i(t)$. Note that $C_v(t)$ is simply a Markov process, given by
$$P\left(C_v(t + dt) - C_v(t) = 1 \mid C_v(t) = k\right) = (d + k + 1)^\alpha \, dt + o(dt).$$
Unlike $C_v$, the process $C_u$ is not Markov. However, for any $(r_1, \dots, r_{d+1})$ such that $\sum_{i=1}^{d+1} r_i = k$, we may write
$$P\left(C_u(t + dt) - C_u(t) = 1 \;\Big|\; C_i(t) = r_i, \text{ for } 1 \le i \le d + 1\right) = \sum_{i=1}^{d+1} (r_i + 1)^\alpha \, dt + o(dt).$$
Since $\alpha < 1$, we see that no matter what the $r_i$'s are, we must have
$$(d + k + 1)^\alpha \le \sum_{i=1}^{d+1} (r_i + 1)^\alpha.$$
Thus, the process $C_u(t)$ stochastically dominates $C_v(t)$. Since the children in each process behave identically; i.e., they reproduce according to $\xi$, and so do their descendants, we can couple the processes in a straightforward way to conclude that the sum of $d + 1$ independent CMJ processes stochastically dominates $H(t)$.
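The rate comparison at the heart of the proof of Lemma 15 is the subadditivity of $x \mapsto x^\alpha$ for $\alpha < 1$. The brute-force check below is only a sanity check over small cases, not a proof; the function name and the ranges of $d$, $k$, and $\alpha$ are illustrative choices.

```python
from itertools import product


def min_rate_gap(d, k, alpha):
    """Smallest value of sum_i (r_i + 1)**alpha - (d + k + 1)**alpha.

    The minimum is over all (r_1, ..., r_{d+1}) of nonnegative integers
    summing to k; a nonnegative result confirms the inequality for this (d, k).
    """
    best = float("inf")
    for r in product(range(k + 1), repeat=d + 1):
        if sum(r) != k:
            continue
        gap = sum((ri + 1) ** alpha for ri in r) - (d + k + 1) ** alpha
        best = min(best, gap)
    return best


if __name__ == "__main__":
    for alpha in (0.25, 0.5, 0.75):
        for d in range(0, 4):
            for k in range(0, 6):
                print(alpha, d, k, min_rate_gap(d, k, alpha))
```

For $d = 0$ there is a single process and the gap is exactly zero; for $d \ge 1$ the gap is strictly positive, which is what lets the sum of $d + 1$ independent processes outpace the single shifted process.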
