Analysis of centrality in sublinear preferential attachment trees via the CMJ branching process

We investigate centrality and root-inference properties in a class of growing random graphs known as sublinear preferential attachment trees. We show that a continuous time branching process called the Crump-Mode-Jagers (CMJ) branching process is w…

Authors: Varun Jog, Po-Ling Loh

Varun Jog (vjog@ece.wisc.edu), Po-Ling Loh (loh@ece.wisc.edu)
Department of ECE, Grainger Institute for Engineering, University of Wisconsin - Madison, Madison, WI 53715
October 2016

Abstract

We investigate centrality and root-inference properties in a class of growing random graphs known as sublinear preferential attachment trees. We show that a continuous time branching process called the Crump-Mode-Jagers (CMJ) branching process is well-suited to analyze such random trees, and prove that almost surely, a unique terminal tree centroid emerges, having the property that it becomes more central than any other fixed vertex in the limit of the random growth process. Our result generalizes and extends previous work establishing persistent centrality in uniform and linear preferential attachment trees. We also show that centrality may be utilized to generate a finite-sized $1 - \epsilon$ confidence set for the root node, for any $\epsilon > 0$, in a certain subclass of sublinear preferential attachment trees.

1 Introduction

Recent years have seen an explosion of datasets possessing some form of underlying network structure [14, 5, 19, 31]. Various mathematical models have consequently been derived to imitate the behavior of real-world networks; desirable characteristics include degree distributions, connectivity, and clustering, to name a few. One popular probabilistic model is the Barabási-Albert model, also known as the (linear) preferential attachment model [4]. Nodes are added to the network one at a time, and each new node connects to a fixed number of existing nodes with probability proportional to the degrees of the nodes.
In addition to modeling a “rich get richer” phenomenon, the Barabási-Albert model gives rise to a scale-free graph, in which the degree distribution decays as an inverse polynomial power of the degree, and the maximum degree scales as the square root of the size of the network. Such a property is readily observed in many network data sets [1]. However, networks also exist in which the disparity between high- and low-degree nodes is not as severe. In the sublinear preferential attachment model, nodes are added sequentially with probability of attachment proportional to a fractional power of the degree. This leads to a stretched exponential degree distribution and a maximum degree that scales as a power of the logarithm of the number of nodes [26, 3]. Networks exhibiting such behavior include certain citation networks, Wikipedia edit networks, rating networks, and the Digg network [27]. The case when the probability of attachment is uniform over existing vertices is known as uniform attachment and is used to model networks in which the preference given to older nodes is attributed only to birth order and not degree.

The iterative nature of the preferential attachment model generates interesting questions concerning phenomena that arise (and potentially vanish) as the network expands. Dereich and Mörters [11] established the emergence of a persistent hub (a vertex that remains the highest-degree node in the network after a finite amount of time) in a certain preferential attachment model where edges are added independently. Such a result was also shown to hold for the Barabási-Albert preferential attachment model in Galashin [17].
Motivated by the fact that persistent hubs do not exist in uniform attachment models, however, our previous work [22] studied the problem of persistent centroids and established that the $K$ most central nodes according to a notion of “balancedness centrality” always persist in preferential and uniform attachment trees.

Another related problem concerns identifying the oldest node(s) in a network. Shah and Zaman [33] first studied this problem in the context of a random growing tree formed by a diffusion spreading over a regular tree, and showed that the centroid of the diffusion tree agrees with the root node of the diffusion, with strictly positive probability. Bubeck et al. [8] devised confidence set estimators for the first node in preferential and uniform attachment trees, in which the goal is to identify a set of nodes containing the oldest node, with probability at least $1 - \epsilon$. They showed that when nodes are selected according to an appropriate measure of “balancedness centrality,” the required size of the confidence set is a function of $\epsilon$ that does not grow with the overall size of the network. These results were later extended to diffusions spreading over regular trees by Khim and Loh [25]. Graph centrality ideas, in particular balancedness centrality, have also been leveraged in Tan et al. [37] to identify the most influential vertices in a social network. Luo et al. [28] studied the problem of identifying single or multiple sources of rumors in a graph and proposed certain efficiently computable estimators related to the MAP estimator employed in Shah and Zaman [33]. Recently, rumor identification has also been analyzed in certain probabilistic models, such as repeated observations of rumor spreading in Dong et al. [13], and incomplete information about rumor spreading in Karamchandani et al. [24].
In addition to having obvious practical implications for pinpointing the origin of a network based on observing a large graph, identifying and removing the oldest nodes may have desirable deleterious effects from the point of view of network robustness [15].

Previous analysis of determining a finite confidence set [8, 25], as well as establishing the persistence of a unique tree centroid [22], crucially depended on the following property satisfied by linear preferential attachment, uniform attachment, and diffusions over regular trees: the “attraction function” relating the degree of a vertex to its probability of connection at each time step is linear. Bubeck et al. [8] posed an open question concerning the existence of finite-sized confidence sets in the case of sublinear or superlinear preferential attachment; we likewise conjectured in previous work that a unique centroid should persist for a more general class of nonlinear attraction functions [22]. However, the techniques in these papers do not extend readily to nonlinear settings.

An approach to dealing with more complicated tree models in the context of diffusions was presented in Shah and Zaman [34], using a continuous time branching process known as the Bellman-Harris branching process. In this paper, we show that preferential attachment trees with nonlinear attraction functions may also be analyzed via continuous time branching processes. Our results rely on properties of the Crump-Mode-Jagers (CMJ) branching process [9, 10, 20]. Continuous time branching processes were previously leveraged by Bhamidi [6] and Rudas et al. [32] to establish properties regarding the degree distribution, maximum degree, height, and local structure of a large class of preferential attachment trees.
Our main contributions are twofold. First, we establish the property of terminal centrality for sublinear preferential attachment trees, thereby addressing our conjecture in [22]. We prove the existence of a unique vertex that becomes more central than any other vertex, in the limit of the growth process. In fact, the existence of a persistent centroid implies terminal centrality, but the converse might not hold, since persistent centrality requires a tree centroid to emerge and remain the centroid starting from a single finite time point. Second, we affirmatively answer the open question of Bubeck et al. [8] by devising finite-sized confidence sets for the root node in sublinear preferential attachment trees.

Due to the inapplicability of Pólya urn theory in the present setting, the proof techniques employed in our paper differ significantly from the analysis used in previous work. Furthermore, the literature concerning CMJ branching processes is vast and unconsolidated, and another important technical contribution of our paper is to gather relevant results and show that they may be applied to study sublinear preferential attachment trees.

The remainder of the paper is organized as follows: In Section 2, we review CMJ branching processes and show how to embed a preferential attachment tree in a CMJ process. We also verify that the CMJ processes corresponding to certain sublinear preferential attachment trees enjoy useful convergence properties. In Section 3, we establish the existence of a unique terminal centroid in sublinear preferential attachment trees. In Section 4, we prove that the confidence set construction via the same centrality measure leads to finite-sized confidence sets for the root node.
Although we believe sublinear preferential attachment trees should also possess a persistent centroid, some challenges arise in bridging the gap between terminal centrality and persistent centrality. We discuss these challenges and related open problems in Section 5. Additional proof details are contained in the supplementary appendices.

Notation: We write $V(T)$ to denote the set of vertices of a tree $T$, and write $\text{Max-Deg}(T)$ to denote the maximum degree of the vertices in $T$. For $u \in V(T)$, we write $(T, u)$ to denote the corresponding rooted tree, which is a tree with directed edges emanating from $u$. We write $(T, u)_{v\downarrow}$ to denote the subtree directed away from $u$ and starting from $v$. Finally, we write $\text{Out-Deg}(v)$ to denote the number of children of vertex $v$ in the rooted tree.

2 Preliminaries

In this section, we review properties of the CMJ branching process, laying the groundwork for our analysis of sublinear preferential attachment trees. The CMJ branching process is a general age-dependent continuous time branching process model introduced by Crump, Mode, and Jagers [9, 10, 20]. It begins with a single individual, known as the ancestor, at time $t = 0$. An individual $x$ may give birth multiple times throughout its lifetime, and the times at which it produces offspring are given by a point process $\xi$ on $\mathbb{R}_+$. The defining property of branching processes is that individuals behave in an i.i.d. manner; i.e., every individual starts its own independent point process of births from the moment it is born until the time it dies. The resulting branching process is said to be driven by $\xi$.
Many common branching processes are special cases of a CMJ process with an appropriate point process and lifetime random variable: If individuals have random lifetimes and give birth to a random number of children at the moment of their death, the resulting branching process is called the Bellman-Harris process. If the lifetimes of individuals are also constant (usually taken to be 1), the resulting process is known as the Galton-Watson process [2, 18].

Definition 1 (Random preferential attachment tree with attraction function $f$). A sequence of random trees $\{T_n\}$ is generated as follows: At time $n = 1$, the tree $T_1$ consists of a single vertex $v_1$. At time $n + 1$, a new vertex $v_{n+1}$ is added to $T_n$ via a directed edge from a vertex $v_i$ to $v_{n+1}$, where $v_i$ is chosen with probability proportional to $f(\text{Out-Deg}(v_i))$ and $\text{Out-Deg}(v_i)$ is computed with respect to the tree $T_n$.

Thus, the linear preferential attachment tree corresponds to the attraction function $f(i) = i + 1$ (see Footnote 1), and the uniform attachment tree corresponds to the constant function $f \equiv 1$. We now define sublinear preferential attachment trees, which have an attraction function that lies strictly between those of a linear preferential attachment tree and a uniform attachment tree.

Footnote 1: Note that for all nodes except the root node, $\text{Deg}(v_i) = \text{Out-Deg}(v_i) + 1$. Thus, this model differs slightly from the one considered in our previous work [22] and in Bubeck et al. [8], since the attractiveness of $v_1$ is proportional to $\text{Deg}(v_1) + 1$ rather than $\text{Deg}(v_1)$.

Definition 2 (Sublinear preferential attachment trees). Sublinear preferential attachment trees are preferential attachment trees with an attraction function $f$ satisfying the following conditions:

1. $f$ is a nondecreasing function.
2. $f(i) \ge 1$ for all $i \ge 0$, and $f$ is not identically equal to 1.
3. There exists $0 < \alpha < 1$ such that $f(i) \le (i + 1)^\alpha$, for all $i \ge 0$.

Note that the last condition (together with condition 2) implies $f(0) = 1$. When $f(i) = (i + 1)^\alpha$, we call the corresponding tree the $\alpha$-sublinear preferential attachment tree. To define the branching process corresponding to a preferential attachment tree, we define the point process $\xi$ associated with the attraction function $f$:

Definition 3 (Point process associated to $f$). Given an attraction function $f$, the associated point process $\xi$ on $\mathbb{R}_+$ is a pure-birth Markov process with $f$ as its rate function:

\[ P(\xi(t + dt) - \xi(t) = 1 \mid \xi(t) = i) = f(i)\,dt + o(dt), \]

with the initial condition $\xi(0) = 0$.

Note that we do not need to normalize the rate of this Markov process: Consider a CMJ process driven by the point process $\xi$ as above, in which individuals never die. Suppose that at some time $t_0$, the branching process consists of $n$ individuals $\{v_1, \dots, v_n\}$, where the number of children of node $v_i$ is denoted by $d_i$. In the discrete time tree evolution, the next vertex $v_{n+1}$ attaches to vertex $v_i$ with probability

\[ \frac{f(d_i)}{\sum_{j=1}^{n} f(d_j)}. \]

In the continuous time process, the new vertex “attaches to $v_i$” if and only if node $v_i$ has a child before any of the other nodes. This child is then $v_{n+1}$. Using properties of the exponential distribution, we may check that this happens with probability $f(d_i) / \sum_{j=1}^{n} f(d_j)$, which is exactly the same as that in the discrete time tree evolution. Thus, if we look at the CMJ branching process at the stopping times when successive vertices are born, the resulting trees evolve in the same way as in the discrete time model described in Definition 1.

Definition 4 (Malthusian parameter). For a point process $\xi$ on $\mathbb{R}_+$, let $\mu(t) = E[\xi(0, t]]$ denote the mean intensity measure. The point process $\xi$ is a Malthusian process if there exists a parameter $\theta > 0$ such that

\[ \theta \int_0^\infty e^{-\theta t} \mu(t)\,dt = 1. \]

The constant $\theta$ is called the Malthusian parameter of the point process $\xi$.

Example 1. For the linear preferential attachment tree with $f(i) = i + 1$, the associated point process $\xi$ is the standard Yule process, defined as follows: (a) $\xi(0) = 0$, and (b) $P(\xi(t + dt) - \xi(t) = 1 \mid \xi(t) = i) = (i + 1)\,dt + o(dt)$. The mean intensity measure for the Yule process is $\mu(t) = e^t - 1$, and the Malthusian parameter is equal to 2.

Example 2. For the uniform attachment tree with $f \equiv 1$, the associated point process $\xi$ is the Poisson point process with rate 1. The mean intensity measure is $\mu(t) = t$, and the Malthusian parameter is equal to 1.

The Malthusian parameter of a point process plays a critical role in the theory of branching processes. It accurately characterizes the growth rate of the population generated by the CMJ branching process driven by the point process, as follows: If the population at time $t$ is given by $Z_t$, the random variable $e^{-\theta t} Z_t$ converges to a nondegenerate random variable $W$. Various assumptions on the point process lead to different types of convergence results, such as convergence in distribution, in probability, almost surely, in $L^1$, or in $L^2$ [9, 10, 12, 30]. As derived in Lemma 9 in Appendix A, the Malthusian parameter for a sublinear preferential attachment process always exists and lies between the values corresponding to linear preferential attachment and uniform attachment trees described in Examples 1 and 2. Our results will rely heavily on the following theorem:

Theorem 1. Let $\xi$ be the point process corresponding to a sublinear attraction function $f$. The CMJ branching process $Z_t$ driven by $\xi$ describing the growing random tree satisfies

\[ e^{-\theta t} Z_t \xrightarrow{L^2,\ \text{a.s.}} W, \]

where $W$ is an absolutely or singular continuous random variable supported on all of $\mathbb{R}_+$, satisfying $W > 0$, almost surely.
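The Malthusian parameter of Definition 4 can also be evaluated numerically. For the pure-birth process of Definition 3, the $k$-th birth time is a sum of independent exponential waiting times with rates $f(0), \dots, f(k-1)$, so integrating by parts in Definition 4 gives the equivalent condition $\sum_{k \ge 1} \prod_{i=0}^{k-1} f(i) / (\theta + f(i)) = 1$. The Python sketch below solves this equation by bisection. It is an illustration rather than part of the original analysis: the function names, the truncation length, the tolerance, and the search bracket $[1, 2]$ (motivated by Lemma 9) are assumptions made for the example.

```python
def laplace_sum(f, theta, max_terms=100_000, tol=1e-15):
    """Truncated value of sum_{k>=1} prod_{i<k} f(i) / (theta + f(i))."""
    total, prod = 0.0, 1.0
    for i in range(max_terms):
        prod *= f(i) / (theta + f(i))  # Laplace transform of one more waiting time
        total += prod
        if prod < tol:                 # remaining terms are negligible
            break
    return total


def malthusian_parameter(f, lo=1.0, hi=2.0, iters=60):
    """Bisection for theta solving laplace_sum(f, theta) = 1.

    Assumption: the root lies in [lo, hi] = [1, 2], i.e. between the
    uniform (theta = 1) and linear (theta = 2) cases.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if laplace_sum(f, mid) > 1.0:  # the sum is decreasing in theta
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def uniform(i):
    """Attraction function of the uniform attachment tree (Example 2)."""
    return 1.0


def sublinear(alpha):
    """Attraction function f(i) = (i + 1)^alpha of the alpha-sublinear tree."""
    return lambda i: (i + 1.0) ** alpha


theta_uniform = malthusian_parameter(uniform)
theta_half = malthusian_parameter(sublinear(0.5))
print(theta_uniform, theta_half)
```

For $f \equiv 1$ the sum is geometric and equals $1/\theta$, which recovers the value in Example 2. The linear case $f(i) = i + 1$ is deliberately not run here, because its terms decay only polynomially and the truncated sum converges slowly.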
The proof of Theorem 1, which is contained in Appendix A, is established by showing that the technical conditions required for certain theorems about CMJ processes [30, 21, 7] are satisfied by the point process $\xi$.

3 Terminal centrality

We now turn to our main result, which establishes the existence of a unique terminal centroid in sublinear preferential attachment trees. We begin by introducing some notation and basic terminology. Consider the function $\psi_T : V(T) \to \mathbb{N}$ defined by

\[ \psi_T(u) = \max_{v \in V(T) \setminus \{u\}} |(T, u)_{v\downarrow}|. \]

Recall that $(T, u)_{v\downarrow}$ denotes the subtree of $T$ directed away from $u$, starting at $v$, as depicted in Figure 1. Thus, $\psi_T(u)$ is the size of the largest subtree of the rooted tree $(T, u)$, and measures the level of “balancedness” of the tree with respect to vertex $u$. We make the following definition:

Definition 5. Given a tree $T$, a vertex $u \in V(T)$ is called a centroid if $\psi_T(u) \le \psi_T(v)$, for all $v \in V(T)$.

Note that although we have defined the centroid with respect to the criterion $\psi_T$, numerous equivalent characterizations of tree centroids exist [23, 19, 38, 35, 36, 29]. (The characterization appearing in Definition 5 coincides with the notion of “rumor center” defined by Shah and Zaman [34].) Furthermore, a tree may have more than one centroid (although by Lemma 11 in Appendix B, a tree may have at most two centroids, which must then be neighbors). For any two nodes $u$ and $v$, if $\psi_T(u) \le \psi_T(v)$, we say that $u$ is at least as central as $v$.

Figure 1: A tree $T$ rooted at vertex $u$. The subtree $(T, u)_{v\downarrow}$ is highlighted.

Finally, we define the notion of terminal centrality:

Definition 6.
A vertex $v^* \in \bigcup_{n=1}^{\infty} V(T_n)$ is a terminal centroid for the sequence of growing trees $\{T_n\}_{n \ge 1}$ if for every vertex $u \ne v^*$, there exists a time $M$ (possibly dependent on $u$), such that for all times $n \ge M$, we have

\[ \psi_{T_n}(v^*) < \psi_{T_n}(u). \]

Thus, the terminal centroid eventually becomes more central than any other fixed vertex. (Note, however, that terminal centrality does not immediately imply the property of persistent centrality; for instance, $v^*$ might be a terminal centroid without ever being the centroid at any finite time.) We have the following theorem:

Theorem 2. Sublinear preferential attachment trees have a unique terminal centroid with probability 1.

The statement and proof of Theorem 2 may be compared to the results obtained in our previous work [22], which establish persistent centrality for the special cases $\alpha = 0$ and $\alpha = 1$. For a subtree $T$, define the attractiveness of $T$ as the sum of the attraction functions evaluated at each vertex of $T$. In the case of uniform attachment, the attractiveness of $T$ is simply $|T|$, whereas for linear preferential attachment, it is the sum of the degrees of the vertices, which is $2|T| - 1$. The linearity of attractiveness in $|T|$ was critical to obtaining sharp bounds on the diagonal crossing probability of certain random walks. When $\alpha \in (0, 1)$, however, the attractiveness of $T$ is no longer a function of $|T|$ alone, rendering the methods of our previous work defunct. In the present paper, we leverage a continuous time embedding and convergence results for CMJ processes to prove terminal centrality for a large class of sublinear preferential attachment trees, with the tradeoff being a slightly weaker theoretical guarantee.

Proof of Theorem 2 (sketch). The key steps of the proof are as follows:

(i) Identify a necessary condition that a vertex must satisfy in order to be a terminal centroid.
(ii) Show that the set of vertices satisfying the condition in (i), called the set of candidate terminal centroids and denoted by $\mathcal{C}_{CAN}$, is nonempty and finite with probability 1.
(iii) Show that among the set of candidate terminal centroids, a unique vertex emerges that eventually becomes more central than any other candidate.
(iv) Show that the vertex in (iii) is the unique terminal centroid.

We first describe the necessary condition in step (i). (For an illustration, see Figure 2.) Let $v^*(n)$ be a centroid of the tree $T_n$. If $T_n$ has two centroids, we choose $v^*(n)$ to be the younger vertex from among the two. If vertex $v_{n+1}$ is a terminal centroid, it must necessarily become more central than $v^*(n)$ after a finite amount of time. Consequently, let

\[ \mathcal{C}_{CAN} := \{ v_{n+1} : \exists M \text{ s.t. } \psi_{T_m}(v_{n+1}) < \psi_{T_m}(v^*(n)) \ \forall m \ge M \}, \]

and define $E_n$ to be the event $\{v_{n+1} \in \mathcal{C}_{CAN}\}$. We follow the convention of considering $v_1$ to be a candidate terminal centroid; in particular, $\mathcal{C}_{CAN} \ne \emptyset$. In fact, for $n > 1$, either $v^*(n)$ or $v_{n+1}$ eventually becomes more central than the other, which follows from the following lemma:

Lemma 1. For any two vertices $u$ and $v$, there exists a time $M$ such that either $\psi_{T_m}(u) < \psi_{T_m}(v)$ or $\psi_{T_m}(u) > \psi_{T_m}(v)$ holds for all $m > M$, almost surely.

Proof. Without loss of generality, assume $u$ is born before $v$. Let $T_v$ and $T_u$ denote the trees $(T_m, u)_{v\downarrow}$ and $(T_m, v)_{u\downarrow}$, where $m$ is the time of birth of $v$. Note that $T_v$ consists of the single vertex $v$. We now restart the process in continuous time; i.e., we start independent CMJ processes initiated from the starting states $T_u$ and $T_v$. Using Theorem 1, we have the a.s. convergence result

\[ \frac{|(T_m, v)_{u\downarrow}|}{|(T_m, u)_{v\downarrow}|} \xrightarrow{\text{a.s.}} \frac{W_u}{W_v}, \tag{1} \]

for absolutely or singular continuous independent random variables $W_u$ and $W_v$, whose distributions are determined by the structure of the starting states $T_u$ and $T_v$, respectively. Since $W_u - W_v$ cannot have point masses, we have

\[ P(W_u = W_v) = P(W_u - W_v = 0) = 0. \]

Thus, either $W_u > W_v$ or $W_u < W_v$, almost surely. The almost sure convergence in equation (1) implies that there exists $M > 0$ such that either $|(T_m, v)_{u\downarrow}| > |(T_m, u)_{v\downarrow}|$ or $|(T_m, v)_{u\downarrow}| < |(T_m, u)_{v\downarrow}|$, for all $m > M$. Applying Lemma 13 in Appendix B concludes the proof.

The following lemma furnishes the result in step (ii):

Lemma 2. $|\mathcal{C}_{CAN}| < \infty$, with probability 1.

Proof. We first show that any node joining the tree sufficiently late has a very small chance of belonging to $\mathcal{C}_{CAN}$. By Lemma 13 in Appendix B, the event $E_n$ occurs if and only if there exists $M > 0$ such that for all $m \ge M$,

\[ |(T_m, v^*(n))_{v_{n+1}\downarrow}| > |(T_m, v_{n+1})_{v^*(n)\downarrow}|. \tag{2} \]

To simplify notation, we define $A_m := (T_m, v^*(n))_{v_{n+1}\downarrow}$ and $B_m := (T_m, v_{n+1})_{v^*(n)\downarrow}$, for $m \ge n + 1$. Lemma 12 in Appendix B implies that at time $m = n + 1$, the number of vertices in $B_m$ is at least $n/2$. Thus, $B_m$ has a large lead over $A_m$, which has only one vertex.

Figure 2: Illustration of the notation from Lemma 2. The centroid of tree $T_n$ is $v^*(n)$, and $v_{n+1}$ is the newest vertex joining $T_n$ to form $T_{n+1}$.

At time $n + 1$, we pause the process in discrete time and restart it in continuous time, with the state at $t = 0$ being the state at the (discrete) time $n + 1$. Observe that if a time $M$ exists such that inequality (2) holds, a time $\Gamma > 0$ must also exist such that the continuous time trees satisfy $|A_t| > |B_t|$, for all $t > \Gamma$. Note that the population $|A_t|$ is simply a sublinear preferential attachment process started from a single vertex, which we denote by $Y(t)$.
The population $|B_t|$ stochastically dominates the sum of $n/2$ independent sublinear preferential attachment processes starting from a single vertex, which we subsequently denote by $X_1(t), \dots, X_{n/2}(t)$. Thus, the probability that $E_n$ occurs is upper-bounded by the probability that $Y(t)$ eventually becomes larger than $\sum_{i=1}^{n/2} X_i(t)$. By Theorem 1, the rescaled processes $e^{-\theta t} Y(t)$ and $e^{-\theta t} X_i(t)$, for $1 \le i \le n/2$, all converge a.s. to i.i.d. random variables, which we denote by $W_Y$ and $\{W_i\}_{1 \le i \le n/2}$, respectively. Thus, the probability that $Y(t)$ eventually becomes larger than $\sum_{i=1}^{n/2} X_i(t)$ is equal to the probability that $W_Y$ is greater than $\sum_{i=1}^{n/2} W_i$. Using Lemma 14 in Appendix D, we conclude that this probability is upper-bounded by $C / n^2$, for some constant $C$. Finally, since $\sum_n 1/n^2$ is a convergent series, the Borel-Cantelli lemma implies that with probability 1, only finitely many events $E_n$ occur, completing the proof.

For step (iii), we simply note that Lemma 1 implies a fixed ordering via centrality for any two vertices. Thus, if we have a finite set such as $\mathcal{C}_{CAN}$, a repeated application of Lemma 1 to members of this set yields a fixed ordering from the most central to the least central vertices in $\mathcal{C}_{CAN}$. Let $v^*$ be the most central vertex from the set $\mathcal{C}_{CAN}$ that emerges from this ordering. Step (iv) is provided by the following lemma:

Lemma 3. The vertex $v^*$ is the unique terminal centroid.

Proof. Let $u_0 \ne v^*$ be any vertex. If $u_0 \in \mathcal{C}_{CAN}$, the choice of $v^*$ implies that $v^*$ eventually becomes more central than $u_0$. Thus, we assume $u_0 \notin \mathcal{C}_{CAN}$, meaning the centroid at the time vertex $u_0$ was born, which we denote by $u_1$, eventually becomes more central than $u_0$ in the limit. If $u_1 \in \mathcal{C}_{CAN}$, then $v^*$ eventually becomes more central than $u_1$, which in turn eventually becomes more central than $u_0$, as wanted.
If instead $u_1 \notin \mathcal{C}_{CAN}$, we may consider $u_2$, which is the centroid when $u_1$ was born. Continuing in this manner, we define a sequence $u_0, u_1, u_2, \dots$ of progressively older vertices, which is necessarily finite, with the last vertex in the sequence being $v_1$. Thus, if we define

\[ r = \min \{ i \ge 0 : u_i \in \mathcal{C}_{CAN} \}, \]

then $u_r$ is well-defined. We then have that $v^*$ is eventually more central than $u_r$, which is eventually more central than $u_{r-1}$, which is eventually more central than $u_{r-2}$, and so on, continuing up to $u_0$. This completes the proof.

This also completes the proof of Theorem 2.

In fact, Theorem 2 may be extended to establish the existence of a fixed set of size $K > 0$ consisting of the most terminally central vertices. This is summarized in the following theorem:

Theorem 3. For any $K \ge 1$, a unique set of distinct vertices $\{v_1^*, v_2^*, \dots, v_K^*\}$ exists such that for any other vertex $u \in \bigcup_{n=1}^{\infty} V(T_n)$, there exists a time $M$ (possibly dependent on $u$) such that

\[ \psi_{T_n}(v_1^*) < \psi_{T_n}(v_2^*) < \cdots < \psi_{T_n}(v_K^*) < \psi_{T_n}(u), \]

for all $n \ge M$.

Proof. The argument closely parallels that of the proof of Theorem 2 in our previous work [22], with appropriate modifications to prove terminal centrality instead of persistent centrality. We refer the reader to our earlier paper, noting that the argument only requires properties of absolute or singular continuity of the appropriately normalized subtree sizes, which are provided by Theorem 1.

4 Finite confidence set for the root

For the results in this section, we limit our consideration to $\alpha$-sublinear preferential attachment trees. Recall that these are trees in which the attraction function is given by $f(i) = (1 + i)^\alpha$, for $\alpha \in (0, 1)$. The problem of finding a confidence set for the root node in the case of linear preferential and uniform attachment trees was studied by Bubeck et al. [8].
One proposed method for constructing a confidence set that contains the root node with probability $1 - \epsilon$ is as follows:

1. Given a sequence of random trees $\{T_n\}$, order the vertices according to the balancedness function $\psi_{T_n}$.
2. Select the $K$ vertices with the smallest values of $\psi_{T_n}$, for a proper value of $K = K(\epsilon)$.

The above method was shown to produce finite-sized confidence sets in Bubeck et al. [8], and the analysis was later extended to diffusions over regular trees [25]. In fact, the continuous time analysis of sublinear preferential attachment trees also furnishes a method for bounding the required size of a confidence set for the root node. Following the notation of Bubeck et al. [8], we use $H_\psi^K(T_n)$ to denote the set of $K$ vertices chosen according to the method described above, and drop the argument $T_n$ when the context is unambiguous. Our main result shows that the same estimator produces finite-sized confidence sets for sublinear preferential attachment trees:

Theorem 4. For $\epsilon > 0$, there exists a constant $K$ (depending on $\epsilon$) such that

\[ \liminf_{n \to \infty} P\left( v_1 \in H_\psi^K(T_n) \right) \ge 1 - \epsilon. \]

Figure 3: An illustration of the trees $T^n_{i,K}$ defined in the proof of Theorem 4. The figure shows a tree with $K = 5$, with the two trees $T^n_{3,5}$ and $T^n_{4,5}$ highlighted.

Proof of Theorem 4 (sketch). We follow the approach of Bubeck et al. [8]. For $1 \le i \le K$, let $T^n_{i,K}$ denote the tree containing vertex $v_i$ in the forest obtained from $T_n$ by removing all edges between nodes $\{v_1, \dots, v_K\}$. (See Figure 3 for an illustration.) Observe that

\[ P(v_1 \notin H_\psi^K) \le P(\exists i > K : \psi(v_i) \le \psi(v_1)) \le P(\psi(v_1) \ge (1 - \delta) n) + P(\exists i > K : \psi(v_i) \le (1 - \delta) n), \tag{3} \]

for $\delta > 0$ to be chosen later.
To handle the first term in inequality (3), we have the following lemma:

Lemma 4 (Proof in Appendix C.1). There exists $\delta_0 > 0$ such that

\[ \limsup_{n \to \infty} P(\psi(v_1) \ge (1 - \delta_0) n) < \frac{\epsilon}{2}. \]

The proof of the above lemma is simple and follows by an argument similar to that in Bubeck et al. [8]. The analysis of the second term in inequality (3) is more technical:

Lemma 5 (Proof in Appendix C.2). There exist constants $N$ and $C$ depending only on $\epsilon$ such that if $K > N$ and

\[ \frac{C K (\log K)^{\frac{2}{1 - \alpha}}}{(K - 1)^2} < \frac{\epsilon}{4}, \]

then

\[ \limsup_{n \to \infty} P(\exists i > K : \psi(v_i) \le (1 - \delta_0) n) < \frac{\epsilon}{2}. \]

A brief proof sketch of Lemma 5 is as follows. First, we claim that for any $i > K$,

\[ \psi(v_i) \ge \min_{1 \le k \le K} \sum_{j = 1, j \ne k}^{K} |T^n_{j,K}|. \]

This is because $v_i$ must lie in one of the trees $T^n_{k,K}$, for some $1 \le k \le K$. Thus, the largest subtree hanging off $v_i$ is at least as large as the subtree of $(T_n, v_i)$ containing vertex $v_k$. From Figure 4, we see that this is at least $\sum_{j = 1, j \ne k}^{K} |T^n_{j,K}|$, which is in turn at least $\min_{1 \le k \le K} \sum_{j = 1, j \ne k}^{K} |T^n_{j,K}|$. Hence, the desired probability may be upper-bounded by

\[ P\left( \exists\, 1 \le k \le K : \sum_{j = 1, j \ne k}^{K} |T^n_{j,K}| \le (1 - \delta_0) n \right) \overset{(a)}{=} P\left( \exists\, 1 \le k \le K : \sum_{j = 1, j \ne k}^{K} |T^n_{j,K}| \le \frac{1 - \delta_0}{\delta_0} |T^n_{k,K}| \right), \]

where (a) follows because $\sum_{j=1}^{K} |T^n_{j,K}|$ is simply the total number of vertices, which is $n$. This final term may be bounded from above by observing that (i) the growth rate of $|T^n_{k,K}|$ is larger for a large degree of $v_k$ in $T_K$; and (ii) the maximum degree of $T_K$ is roughly $(\log K)^{1/(1 - \alpha)}$, which is not sufficient to increase the growth of $|T^n_{k,K}|$ so as to compete with the sum of $K - 1$ trees given by $\sum_{j = 1, j \ne k}^{K} |T^n_{j,K}|$, when $K$ is sufficiently large.

Combining the bounds of Lemmas 4 and 5 and substituting back into inequality (3) completes the proof.
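The two-step estimator $H_\psi^K$ used in this section can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the parent-array representation, the function names, the quadratic-time sampling loop, and the choice to break ties in $\psi$ by birth order are all assumptions made for the example.

```python
import random


def grow_tree(n, f, rng):
    """Grow a tree as in Definition 1; returns parent[], with parent[0] = -1.

    Vertex index v corresponds to v_{v+1}, so index 0 is the root v_1.
    """
    parent, out_deg = [-1], [0]
    for _ in range(n - 1):
        weights = [f(d) for d in out_deg]   # attraction of each existing vertex
        i = rng.choices(range(len(parent)), weights=weights)[0]
        parent.append(i)
        out_deg[i] += 1
        out_deg.append(0)
    return parent


def psi(parent):
    """psi_T(u): size of the largest subtree of (T, u), for every vertex u."""
    n = len(parent)
    size = [1] * n
    # Parents always have smaller indices than their children, so a reverse
    # sweep accumulates the number of descendants of each vertex.
    for v in range(n - 1, 0, -1):
        size[parent[v]] += size[v]
    largest_child = [0] * n
    for v in range(1, n):
        p = parent[v]
        largest_child[p] = max(largest_child[p], size[v])
    # The subtree hanging off u on the side of its parent has n - size[u] vertices.
    return [max(largest_child[u], n - size[u]) for u in range(n)]


def confidence_set(parent, K):
    """The K vertices with smallest psi (ties broken by birth order: an assumption)."""
    values = psi(parent)
    order = sorted(range(len(parent)), key=lambda u: (values[u], u))
    return order[:K]


rng = random.Random(0)
tree = grow_tree(500, lambda d: (d + 1) ** 0.5, rng)  # alpha = 0.5
print(0 in confidence_set(tree, K=10))                 # is the root v_1 captured?
```

Repeating the last three lines over many seeds gives an empirical estimate of the coverage probability $P(v_1 \in H_\psi^K(T_n))$ for a given $K$; the sketch makes no claim about which $K$ achieves a particular $\epsilon$.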
5 Discussion

In this paper, we have established the existence of a unique terminal centroid in sublinear preferential attachment trees. However, our results have stopped short of proving that a persistent centroid exists, which was conjectured in our previous work [22]. To establish the stronger statement, it would suffice to show that the terminal centroid identified in our paper is in fact persistent. (Although somewhat counterintuitive, the definition of terminal centrality leaves open the possibility that the terminal centroid never actually becomes the tree centroid at any finite time point.)

A possible approach leverages ideas from our previous work [22]. We now describe the main bottleneck in extending the argument employed there to the present setting. For the purpose of this discussion, suppose the sublinear preferential attachment tree has an attraction function $f(i) = (i+1)^\alpha$, for some $0 < \alpha < 1$. The analog of our previous approach [22] would involve two steps: (i) showing that the total number of vertices that ever become centroids is finite, almost surely; and (ii) concluding the existence of a unique persistent centroid by an application of Lemma 1. In the first step, we need to show that vertices which are born late have a very small probability of ever becoming the tree centroid at any future point in time, and then apply the Borel-Cantelli lemma to establish finiteness. As described in the proof of Theorem 2, a vertex $v_{n+1}$ can become a centroid at some point in the future if and only if the tree $A_t$ is able to catch up with the tree $B_t$ at some future time $t$. However, unlike in the case of uniform or linear preferential attachment trees, the probability that a new vertex joins $A_t$ or $B_t$ is no longer proportional to a simple linear function of the size of the subtree.
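To make the growth model in this discussion concrete, the following is a minimal simulation sketch of the discrete-time tree process with attraction function $f(i) = (i+1)^\alpha$. It is an illustration under stated assumptions rather than the authors' code: the function name `grow_sublinear_pa_tree` is hypothetical, and the sketch assumes that $f$ is applied to the current number of children of each vertex and that a new vertex attaches to an existing vertex with probability proportional to that value.

```python
import random


def grow_sublinear_pa_tree(n, alpha, seed=None):
    """Grow a tree on n vertices; return parent[v] for each vertex v.

    Assumption: vertex v attracts the next vertex with weight
    (children[v] + 1) ** alpha, where children[v] is its number of children.
    """
    rng = random.Random(seed)
    parent = [None]   # vertex 0 plays the role of the root v_1
    children = [0]    # children[v] = current number of children of v
    for new in range(1, n):
        weights = [(c + 1) ** alpha for c in children]
        chosen = rng.choices(range(new), weights=weights, k=1)[0]
        parent.append(chosen)
        children[chosen] += 1
        children.append(0)
    return parent
```

Because the attachment weights depend on the number of children of individual vertices, the probability that the new vertex lands in a given subtree depends on the shape of that subtree and not only on its size, which is the difficulty described above. The sketch recomputes all weights at every step, so it costs $O(n^2)$ time and is meant only for small experiments.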
Based on these ideas, however, one can show that a sufficient condition for persistent centrality in the tree growth process is as follows:

Irrelevance of structure condition: There exists a parameter $\eta \in (0, 1)$ such that for any two trees $\Gamma_1$ and $\Gamma_2$ with $|\Gamma_1| = |\Gamma_2|$, the probability that the CMJ process started from $\Gamma_1$ has a larger population than the CMJ process started from $\Gamma_2$, in the limit, lies in the interval $(\eta, 1 - \eta)$.

The above condition essentially ensures that the structure of the tree does not have a significant impact on how quickly it grows. Note that for linear preferential attachment and uniform attachment trees, we can take any $\eta < 1/2$, since the probability that the population of $\Gamma_1$ is larger than the population of $\Gamma_2$ in the limit is exactly $\frac{1}{2}$. Irrelevance of structure is thus crucially leveraged in the proof strategy of our previous work [22]. Unfortunately, the irrelevance of structure condition does not appear to hold for sublinear preferential attachment trees, due to the following small example:

Example 3. Let $\Gamma_1$ be the "line tree"; i.e., a sequence of $r$ vertices $\{v_1, v_2, \dots, v_r\}$ such that the edge set is $\{(v_i, v_{i+1}) : 1 \le i \le r - 1\}$. Let $\Gamma_2$ be the "star tree" with the edge set being $\{(v_1, v_i) : 2 \le i \le r\}$. If the population of the CMJ process started from $\Gamma_i$ is asymptotically $W_i e^{\theta t}$ for $i \in \{1, 2\}$, it is possible to show that $\frac{W_1}{r}$ and $\frac{W_2}{r}$ converge in probability to some constants $c_1$ and $c_2$, such that $c_1 > c_2$, as $r \to \infty$. Thus, the probability that the population of $\Gamma_1$ is larger than the population of $\Gamma_2$ in the limit tends to 1 as $r \to \infty$. This violates the irrelevance of structure condition, since no matter which $\eta > 0$ is chosen, the probability of $\Gamma_1$ being larger than $\Gamma_2$ in the limit surpasses $1 - \eta$ for all large enough $r$.
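As a rough heuristic for Example 3, one can compare the total attraction (the sum of $f$ over all vertices) of the line tree and the star tree at the moment the process starts. The sketch below is not the argument behind the constants $c_1$ and $c_2$; it only illustrates, under the assumptions that both trees are rooted at $v_1$, that $f(i) = (i+1)^\alpha$ is applied to the number of children, and with hypothetical helper names, why the line tree starts out with a larger total birth rate when $\alpha < 1$.

```python
def total_attraction(child_counts, alpha):
    """Sum of f(c) = (c + 1) ** alpha over the child counts of all vertices."""
    return sum((c + 1) ** alpha for c in child_counts)


def line_tree_child_counts(r):
    """Line tree rooted at v_1: r - 1 vertices with one child, and one leaf."""
    return [1] * (r - 1) + [0]


def star_tree_child_counts(r):
    """Star tree rooted at v_1: the center has r - 1 children, the rest are leaves."""
    return [r - 1] + [0] * (r - 1)


def initial_rate_gap(r, alpha):
    """Total attraction of the line tree minus that of the star tree."""
    line = total_attraction(line_tree_child_counts(r), alpha)
    star = total_attraction(star_tree_child_counts(r), alpha)
    return line - star
```

For $\alpha = 1$ the two totals coincide (both equal $2r - 1$), which is consistent with structure being irrelevant in the linear case, whereas for $0 < \alpha < 1$ and $r \ge 3$ the gap is strictly positive by concavity of $x \mapsto x^\alpha$.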
Of course, line trees and star trees do not typically show up in sublinear preferential attachment trees, so it may be possible to redefine the irrelevance of structure property to rule out such low-probability configurations. However, a suitable modification that paves the way to proving persistent centrality has yet to be determined.

Finally, suppose we define $\mathcal{C}_\infty$ to be the set of all vertices which are tree centroids for infinite amounts of time. It is easy to see that Lemma 1 rules out the possibility of $|\mathcal{C}_\infty| \ge 2$, since any two vertices in $\mathcal{C}_\infty$ will have a fixed centrality ordering in the limit. Note that $|\mathcal{C}_\infty| = 0$ implies that an infinite number of vertices ever become centroids, albeit for finite amounts of time; whereas $|\mathcal{C}_\infty| = 1$ also does not preclude such a possibility. A weaker conjecture than persistent centrality, but stronger than terminal centrality, would therefore be to show that $|\mathcal{C}_\infty| = 1$. This question currently remains open.

Regarding root inference, it is an open question whether the method of constructing finite confidence sets based on centrality continues to hold beyond the subclass of $\alpha$-sublinear preferential attachment trees. Also note that the results on root inference in this paper are generally weaker than those obtained for linear preferential and uniform attachment trees in Bubeck et al. [8] and Jog and Loh [22], since we have not provided bounds on the size of a confidence set for the root node, or the size of the hub around $v_1$ that will ensure its persistent centrality. The main hurdle in establishing such bounds is, again, the lack of concrete information about the limiting random variable $W$ in a CMJ process. Although obtaining the exact distribution of $W$ seems too optimistic, it may be possible to obtain bounds on moments or tail probabilities, which could be used to obtain bounds on hub sizes or confidence sets.
Our results in this paper also strengthen the belief that the age of a node and its centrality are strongly related in growing random trees, implying that it is extremely difficult for a vertex to hide its age. Fanti et al. [16] explored the problem of how to create a diffusion process over a regular tree in order to obfuscate the oldest node, and it would be very interesting to see if classes of attraction functions exist that cause the tree to grow in such a way that the best confidence set for the root node does not remain finite as the tree grows.

Acknowledgment

The authors would like to thank the AE and three anonymous reviewers for their helpful and positive feedback while preparing the revision.

References

[1] R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Rev. Mod. Phys., 74:47–97, Jan 2002.
[2] K. B. Athreya and P. Ney. Branching Processes. Dover Books on Mathematics. Dover Publications, 2004.
[3] A.-L. Barabási. Network Science. Cambridge University Press, 2016.
[4] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
[5] A. Barrat, M. Barthélemy, and A. Vespignani. Dynamical Processes on Complex Networks. Cambridge University Press, New York, NY, USA, 2008.
[6] S. Bhamidi. Universal techniques to analyze preferential attachment trees: Global and local analysis. In preparation, August 2007.
[7] J. D. Biggins and D. R. Grey. Continuity of limit random variables in the branching random walk. Journal of Applied Probability, pages 740–749, 1979.
[8] S. Bubeck, L. Devroye, and G. Lugosi. Finding Adam in random growing trees. arXiv preprint arXiv:1411.3317, 2014.
[9] K. S. Crump and C. J. Mode. A general age-dependent branching process. I. Journal of Mathematical Analysis and Applications, 24(3):494–508, 1968.
[10] K. S. Crump and C. J. Mode. A general age-dependent branching process. II. Journal of Mathematical Analysis and Applications, 25(1):8–17, 1969.
[11] S. Dereich and P. Mörters. Random networks with sublinear preferential attachment: Degree evolutions. Electron. J. Probab., 14(43):1222–1267, 2009.
[12] R. A. Doney. A limit theorem for a class of supercritical branching processes. Journal of Applied Probability, pages 707–724, 1972.
[13] W. Dong, W. Zhang, and C.-W. Tan. Rooting out the rumor culprit from suspects. In Information Theory Proceedings (ISIT), 2013 IEEE International Symposium on, pages 2671–2675. IEEE, 2013.
[14] S. N. Dorogovtsev and J. F. F. Mendes. Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford University Press, Inc., New York, NY, USA, 2003.
[15] M. Eckhoff and P. Mörters. Vulnerability of robust preferential attachment networks. Electron. J. Probab., 19(57):1–47, 2014.
[16] G. Fanti, P. Kairouz, S. Oh, K. Ramchandran, and P. Viswanath. Hiding the Rumor Source. ArXiv e-prints, September 2015.
[17] P. Galashin. Existence of a persistent hub in the convex preferential attachment model. arXiv preprint arXiv:1310.7513, 2013.
[18] T. E. Harris. The Theory of Branching Processes. Grundlehren der mathematischen Wissenschaften. Springer Berlin Heidelberg, 2012.
[19] M. O. Jackson. Social and Economic Networks, volume 3. Princeton University Press, 2008.
[20] P. Jagers. Branching processes with biological applications. Wiley, 1975.
[21] P. Jagers and O. Nerman. The growth and composition of branching populations. Advances in Applied Probability, pages 221–259, 1984.
[22] V. Jog and P. Loh. Persistence of centrality in random growing trees. arXiv preprint arXiv:1511.01975, 2015.
[23] C. Jordan. Sur les assemblages de lignes. J. Reine Angew. Math, 70(185):81, 1869.
[24] N. Karamchandani and M. Franceschetti. Rumor source detection under probabilistic sampling. In Information Theory Proceedings (ISIT), 2013 IEEE International Symposium on, pages 2184–2188. IEEE, 2013.
[25] J. Khim and P. Loh. Confidence sets for the source of a diffusion in regular trees. arXiv preprint arXiv:1510.05461, 2015.
[26] P. L. Krapivsky, S. Redner, and F. Leyvraz. Connectivity of growing random networks. Physical Review Letters, 85:4629–4632, November 2000.
[27] J. Kunegis, M. Blattner, and C. Moser. Preferential attachment in online networks: Measurement and explanations. In Proceedings of the 5th Annual ACM Web Science Conference, WebSci '13, pages 205–214, New York, NY, USA, 2013. ACM.
[28] W. Luo, W.-P. Tay, and M. Leng. Identifying infection sources and regions in large networks. IEEE Trans. Signal Processing, 61(11):2850–2865, 2013.
[29] S. L. Mitchell. Another characterization of the centroid of a tree. Discrete Mathematics, 24(3):277–280, 1978.
[30] O. Nerman. On the convergence of supercritical general (CMJ) branching processes. Probability Theory and Related Fields, 57(3):365–395, 1981.
[31] M. Newman. Networks: An Introduction. Oxford University Press, Inc., New York, NY, USA, 2010.
[32] A. Rudas, B. Tóth, and B. Valkó. Random trees and general branching processes. Random Structures & Algorithms, 31(2):186–202, 2007.
[33] D. Shah and T. Zaman. Rumors in a network: Who's the culprit? IEEE Transactions on Information Theory, 57(8):5163–5181, 2011.
[34] D. Shah and T. Zaman. Finding rumor sources on random trees. arXiv preprint arXiv:1110.6230, 2015.
[35] P. J. Slater. Maximin facility location. Journal of National Bureau of Standards B, 79:107–115, 1975.
[36] P. J. Slater. Accretion centers: A generalization of branch weight centroids. Discrete Applied Mathematics, 3(3):187–192, 1981.
[37] C. W. Tan, P.-D. Yu, C.-K. Lai, W. Zhang, and H.-L. Fu. Optimal detection of influential spreaders in online social networks. In 2016 Annual Conference on Information Science and Systems (CISS), pages 145–150. IEEE, 2016.
[38] B. Zelinka. Medians and peripherians of trees. Archivum Mathematicum, 4(2):87–95, 1968.

A Results on CMJ processes

In this Appendix, we review properties of CMJ processes and verify that the CMJ process corresponding to a sublinear preferential attachment tree enjoys certain convergence properties.

A.1 Preliminary results

We begin by stating several results that will be crucial for our purposes. For a more detailed discussion of such results, see the survey paper by Jagers and Nerman [21].

Lemma 6 (Corollary 4.2 and Theorem 4.3 from Jagers and Nerman [21]). Let $\xi$ be a point process on $\mathbb{R}_+$ with Malthusian parameter $\theta > 0$. Consider a CMJ process driven by $\xi$ in which individuals live forever. Let the population of the CMJ process at time $t \ge 0$ be denoted by $Z_t$. Define
$$\hat{\xi}(\theta) = \int_0^\infty e^{-\theta t} \, d\xi(t).$$
If the condition
$$\mathrm{Var}(\hat{\xi}(\theta)) < \infty \qquad (\star)$$
is satisfied, then we have the convergence result
$$e^{-\theta t} Z_t \xrightarrow{L^2} W,$$
where $W$ is a random variable satisfying $W > 0$, almost surely.

Lemma 7 (Theorem 5.4 from Nerman [30]). Let $\xi$, $\theta$, and $Z_t$ be as in Lemma 6. If the mean intensity measure $\mu$ satisfies
$$\int_0^\infty e^{-\tilde{\theta} t} \, d\mu(t) < \infty, \quad \text{for some } \tilde{\theta} < \theta, \qquad (\star\star)$$
then
$$e^{-\theta t} Z_t \xrightarrow{a.s.} W,$$
where $W$ is as in Lemma 6. [Footnote 2: In Nerman [30], the condition $(\star\star)$ appears in a more general form denoted Condition 5.1. As explained in the remark following Condition 5.1, the condition $(\star\star)$ is stronger and implies Condition 5.1.]

Although not much is known about the exact distribution of $W$ in the case of a general CMJ process, the following useful properties have been established:

Lemma 8 (Theorem 1 from Biggins and Grey [7]). Let $W$ be the limit random variable appearing in Lemmas 6 and 7. Then the following properties hold:

(i) The distribution of $W$ has no atoms.
(ii) The distribution of $W$ is either singular continuous or absolutely continuous.
(iii) The support of $W$ is all of $\mathbb{R}_+$; i.e., the set of positive points of increase of the distribution function of $W$ is all of $\mathbb{R}_+$.

Remark. Note that in all the above results, we have assumed that the branching process begins with a single individual. Suppose, however, that the process starts from some initial state consisting of a finite collection of nodes $\{v_1, \dots, v_k\}$ satisfying parent-child relationships according to a directed tree $T$ rooted at $v_1$. In this case, we can condition the CMJ process beginning with a single node on the event of observing the tree $T$ at some point, and conclude that the Malthusian normalized population converges to a random variable $\tilde{W}$ almost surely and in $L^2$. Although we do not provide a proof here, the limit random variable $\tilde{W}$ also satisfies all the properties in Lemma 8.

A.2 Sublinear preferential attachment

We now specialize our discussion to sublinear preferential attachment processes.

Lemma 9. The Malthusian parameter $\theta$ for a sublinear preferential attachment process always exists and satisfies $1 < \theta < 2$.

Proof. A stronger version of this lemma may be found in Lemma 44 of Bhamidi [6]. Let $\xi$ be the point process associated with a sublinear preferential attachment function $f$ with mean intensity $\mu(t)$. Let $\mu_{UA}(t)$ and $\mu_{PA}(t)$ be the mean intensities of the Poisson process with rate 1 and the standard Yule process, respectively. Clearly, the mean intensity functions satisfy
$$\mu_{UA}(t) < \mu(t) < \mu_{PA}(t).$$
Let $X_\theta$ be an exponential random variable with rate $\theta$, independent of $\xi$. Note that the integral
$$\theta \int_0^\infty e^{-\theta t} \mu(t) \, dt = E[\xi(X_\theta)]$$
is monotonically decreasing in $\theta$.
At $\theta = 1$, using the fact that $\mu_{UA}(t) < \mu(t)$, we have $1 < E[\xi(X_1)]$. Similarly, at $\theta = 2$, we may use the fact that $\mu(t) < \mu_{PA}(t)$ to obtain $E[\xi(X_2)] < 1$. By monotonicity, the value of $E[\xi(X_\theta)]$ must therefore equal 1 at some $1 < \theta < 2$.

Lemma 10. The point process $\xi$ corresponding to a sublinear attraction function $f$ satisfies conditions $(\star)$ and $(\star\star)$.

Proof. We first show that condition $(\star)$ is satisfied by following an approach used in Bhamidi [6]. For $0 < \alpha < 1$, let $1 \le f(i) \le (i+1)^\alpha$ be a sublinear attraction function and let $\xi$ be the associated point process with Malthusian parameter $\theta$, existing by Lemma 9. Let $X_\theta$ be an exponential random variable with rate $\theta$, independent of $\xi$. Defining the random function
$$\hat{\xi}(\theta) := \int_0^\infty e^{-\theta t} \, d\xi(t),$$
we have by Fubini's theorem that
$$\hat{\xi}(\theta) = \theta \int_0^\infty e^{-\theta t} \xi(0, t] \, dt = E\left[\xi(0, X_\theta] \mid \xi\right].$$
Then
$$\mathrm{Var}(\hat{\xi}(\theta)) \le E\left[\hat{\xi}(\theta)^2\right] = E\left[\left(E\left[\xi(0, X_\theta] \mid \xi\right]\right)^2\right] \overset{(a)}{\le} E\left[E\left[\xi(0, X_\theta]^2 \mid \xi\right]\right] = E\left[\xi(0, X_\theta]^2\right],$$
where inequality (a) follows from Jensen's inequality. Thus, it is enough to derive the bound $E\left[\xi(0, X_\theta]^2\right] < \infty$.

Let $\xi_\alpha$ be the point process corresponding to the attraction function $f_\alpha(i) = (1+i)^\alpha$. Note that since
$$E\left[\xi(0, X_\theta]^2\right] \le E\left[\xi_\alpha(0, X_\theta]^2\right],$$
it is enough to show that $E\left[\xi_\alpha(0, X_\theta]^2\right] < \infty$. Note that it is possible to find the exact distribution of the random variable $\xi_\alpha(0, X_\theta]$, as follows: The time of the $k$th arrival in the point process $\xi_\alpha$ may be written as $\sum_{i=1}^k Y_i$, where $Y_i \sim \mathrm{Exp}(f_\alpha(i-1))$ and the $Y_i$'s are independent. Hence,
$$P\left(\xi_\alpha(0, X_\theta] \ge k\right) = P\left(X_\theta \ge \sum_{i=1}^k Y_i\right) = E\left[e^{-\theta \sum_{i=1}^k Y_i}\right] = \prod_{i=0}^{k-1} \frac{f_\alpha(i)}{\theta + f_\alpha(i)} = \prod_{i=0}^{k-1} \frac{(1+i)^\alpha}{\theta + (1+i)^\alpha}.$$
The probability mass function of $\xi_\alpha(0, X_\theta]$ is thus given by
$$P\left(\xi_\alpha(0, X_\theta] = k\right) = \prod_{i=0}^{k-1} \frac{(1+i)^\alpha}{\theta + (1+i)^\alpha} - \prod_{i=0}^{k} \frac{(1+i)^\alpha}{\theta + (1+i)^\alpha} = \frac{\theta}{\theta + (1+k)^\alpha} \prod_{i=0}^{k-1} \frac{(1+i)^\alpha}{\theta + (1+i)^\alpha} \sim \frac{1}{k^\alpha} \exp\left(-\frac{\theta k^{1-\alpha}}{1-\alpha}\right).$$
It is now easy to check that $E\left[\xi_\alpha(0, X_\theta]^2\right] < \infty$, and thus, $E\left[\xi(0, X_\theta]^2\right] < \infty$.

Finally, we show that condition $(\star\star)$ holds. Let $\mu(t)$ be the intensity measure associated with the sublinear preferential attachment process. Let $\tilde{\theta}$ be such that $1 < \tilde{\theta} < \theta$. Note that such a parameter $\tilde{\theta}$ exists by Lemma 9. As in Lemma 9, let $\mu_{PA}$ be the mean intensity measure associated with the linear preferential attachment process. Then
$$\int_0^\infty e^{-\tilde{\theta} t} \, d\mu(t) < \int_0^\infty e^{-\tilde{\theta} t} \, d\mu_{PA}(t) \overset{(a)}{=} \int_0^\infty e^{(1 - \tilde{\theta}) t} \, dt < \infty,$$
where equality (a) holds because $\mu_{PA}(t) = e^t - 1$.

A.3 Proof of Theorem 1

Having verified the conditions $(\star)$ and $(\star\star)$ via Lemma 10, we obtain the desired $L^2$ and almost sure convergence by applying Lemmas 6 and 7, respectively. The absolute or singular continuity of the limit random variable follows from Lemma 8.

B Useful results on trees

In this Appendix, we collect three key lemmas concerning trees and tree centroids that we use in our proofs.

Lemma 11 (Lemma 2.1 from Jog and Loh [22]). For a tree $T$ on $n$ vertices, the following statements hold:

(i) If $v^*$ is a centroid, then $\psi_T(v^*) \le \frac{n}{2}$.
(ii) $T$ can have at most two centroids.
(iii) If $u^*$ and $v^*$ are two centroids, then $u^*$ and $v^*$ are adjacent vertices. Furthermore, $\psi_T(u^*) = |(T, u^*)_{v^*\downarrow}|$, and $\psi_T(v^*) = |(T, v^*)_{u^*\downarrow}|$.

Lemma 12 (Lemma 2.3 from Jog and Loh [22]). Let $\{T_n\}_{n \ge 1}$ be a sequence of growing trees, with $V(T_n) = \{v_1, \dots, v_n\}$. At time $n + 1$, we have the inequality
$$|(T_{n+1}, v_{n+1})_{v^*(n)\downarrow}| \ge \frac{n}{2}.$$

Lemma 13. Consider a tree $T$ and pick any two vertices $u, v \in V(T)$.
Then we have the following result:
$$\psi_T(u) \le \psi_T(v) \iff |(T, v)_{u\downarrow}| \ge |(T, u)_{v\downarrow}|.$$

Proof. Let $u'$ and $v'$ be the neighboring vertices to $u$ and $v$, respectively, in the path from $u$ to $v$. To simplify notation, denote $|(T, v)_{u\downarrow}| = a$ and $|(T, u)_{v\downarrow}| = b$. First suppose $a \ge b$. Let $c = |T| - a - b$ be the number of vertices not in either of the two subtrees. (See Figure 4.) We have the following inequality:
$$\psi_T(v) \ge |(T, v)_{v'\downarrow}| = a + c.$$
We also have the inequality
$$\psi_T(u) \le \max\left\{|(T, v)_{u\downarrow}| - 1, \; |(T, u)_{u'\downarrow}|\right\} = \max(a - 1, b + c) \overset{(a)}{\le} a + c,$$
where (a) follows from our assumption $a \ge b$. Combining the two inequalities, we then have
$$\psi_T(u) \le a + c \le \psi_T(v),$$
which is one direction of the implication. If instead $a < b$, the same steps establish the string of inequalities
$$\psi_T(v) \le \max(b - 1, a + c) < b + c \le \psi_T(u),$$
providing the other direction of the implication.

Figure 4: Subtrees with sizes $a$, $b$, and $c$ from Lemma 13.

C Supporting proofs for Theorem 4

In this Appendix, we provide proofs of the lemmas used to derive Theorem 4.

C.1 Proof of Lemma 4

First note that we clearly have
$$\psi(v_1) \le \max\left\{|T^n_{1,2}|, |T^n_{2,2}|\right\} \quad \text{and} \quad n = |T^n_{1,2}| + |T^n_{2,2}|.$$
Thus,
$$P\left(\psi(v_1) \ge (1-\delta)n\right) \le P\left(\frac{\max\left\{|T^n_{1,2}|, |T^n_{2,2}|\right\}}{|T^n_{1,2}| + |T^n_{2,2}|} \ge 1 - \delta\right) \le P\left(\frac{|T^n_{1,2}|}{|T^n_{1,2}| + |T^n_{2,2}|} \ge 1 - \delta\right) + P\left(\frac{|T^n_{2,2}|}{|T^n_{1,2}| + |T^n_{2,2}|} \ge 1 - \delta\right). \tag{4}$$
Consider the continuous time versions of the growing tree processes, and let $\theta$ be the Malthusian parameter of the point process associated with $T^t_{1,2}$. Then
$$\frac{|T^t_{1,2}|}{|T^t_{1,2}| + |T^t_{2,2}|} = \frac{e^{-\theta t} |T^t_{1,2}|}{e^{-\theta t} |T^t_{1,2}| + e^{-\theta t} |T^t_{2,2}|} \xrightarrow{a.s.} \frac{W_1}{W_1 + W_2} := W,$$
where by Lemma 8, the random variable $W$ is absolutely or singular continuous and is supported on the entire interval $[0, 1]$. In particular, we may choose $\delta_0' > 0$ such that $P(W \ge 1 - \delta_0') < \frac{\epsilon}{4}$. This implies that
$$\limsup_{t \to \infty} P\left(\frac{|T^t_{1,2}|}{|T^t_{1,2}| + |T^t_{2,2}|} \ge 1 - \delta_0'\right) < \frac{\epsilon}{4}.$$
Using a similar argument for the second term, we conclude that there exists a $\delta_0''$ such that
$$\limsup_{t \to \infty} P\left(\frac{|T^t_{2,2}|}{|T^t_{1,2}| + |T^t_{2,2}|} \ge 1 - \delta_0''\right) < \frac{\epsilon}{4}.$$
Taking $\delta_0 = \min(\delta_0', \delta_0'')$ and substituting back into inequality (4), we obtain the desired bound.

C.2 Proof of Lemma 5

As noted in the proof sketch, for any $i > K$, we have
$$\psi(v_i) \ge \min_{1 \le k \le K} \sum_{j=1, j \ne k}^{K} |T^n_{j,K}|.$$
Hence,
$$P\left(\exists i > K : \psi(v_i) \le (1-\delta_0)n\right) \le P\left(\exists 1 \le k \le K : \sum_{j=1, j \ne k}^{K} |T^n_{j,K}| \le (1-\delta_0)n\right). \tag{5}$$
We can break up the right-hand expression as follows: From Theorem 22 in Bhamidi [6], the maximum degree of a sublinear preferential attachment model with attraction function $f(i) = (i+1)^\alpha$ scales as $(\log n)^{\frac{1}{1-\alpha}}$. Concretely, there exists a constant $M$ such that
$$\limsup_{n \to \infty} P\left(\text{Max-Deg}(T_n) > (\log n)^{\frac{1}{1-\alpha}} M\right) < \frac{\epsilon}{4}.$$
Therefore, we may choose $N$ large enough such that
$$P\left(\text{Max-Deg}(T_n) > (\log n)^{\frac{1}{1-\alpha}} M\right) < \frac{\epsilon}{4}, \quad \text{for all } n \ge N. \tag{6}$$
Note that $M$ depends only on $\epsilon$ and the distribution of the normalized maximum degree that exists in the limit of the $\alpha$-sublinear attachment tree growth process. Thus, fixing $\epsilon$ fixes $M$, as well. Having chosen $M$, note that $N$ depends on how fast the normalized distribution of the maximum degree converges to the fixed distribution, and on $\epsilon$. Since the former is solely a property of the sublinear attachment process, we observe that $N$ also depends only on $\epsilon$. We now pick a value $K > N$, and define the event
$$E_K := \left\{\text{Max-Deg}(T_K) \le (\log K)^{\frac{1}{1-\alpha}} M\right\}.$$
The right-hand side of inequality (5) may be bounded by
$$P\left(\exists 1 \le k \le K : \sum_{j=1, j \ne k}^{K} |T^n_{j,K}| \le (1-\delta_0)n \,\Big|\, E_K\right) + P(E_K^c) \overset{(a)}{\le} P\left(\exists 1 \le k \le K : \sum_{j=1, j \ne k}^{K} |T^n_{j,K}| \le (1-\delta_0)n \,\Big|\, E_K\right) + \frac{\epsilon}{4} \overset{(b)}{\le} \sum_{k=1}^{K} P\left(\sum_{j=1, j \ne k}^{K} |T^n_{j,K}| \le (1-\delta_0)n \,\Big|\, E_K\right) + \frac{\epsilon}{4}.$$
Here, (a) follows from equation (6) and the choice of $K > N$. Step (b) is a simple application of the union bound. Now fix $k = 1$, and consider the probability
$$P\left(\sum_{j=2}^{K} |T^n_{j,K}| \le (1-\delta_0)n \,\Big|\, E_K\right) \overset{(a)}{=} P\left(\sum_{j=2}^{K} |T^n_{j,K}| \le \left(\frac{1-\delta_0}{\delta_0}\right) |T^n_{1,K}| \,\Big|\, E_K\right),$$
where step (a) follows since $\sum_{j=1}^{K} |T^n_{j,K}|$ is simply the total number of vertices, which is $n$. Since the degree of $v_1$ is at most $(\log K)^{\frac{1}{1-\alpha}} M$ conditioned on $E_K$, we may bound the above probability via stochastic domination, as follows: At time $n = K$, replace $v_1$ by $\lceil (\log K)^{\frac{1}{1-\alpha}} M \rceil$ isolated vertices, and replace $v_j$ by a single isolated vertex, for each $2 \le j \le K$. The crucial step is to observe that by Lemma 15, this replacement expedites the growth of $|T^t_{1,K}|$ and retards the growth of $\sum_{j=2}^{K} |T^t_{j,K}|$. Applying Lemma 14 to the i.i.d. limit random variables $W_i$ and $\tilde{W}_i$ corresponding to the renormalized populations of the continuous time CMJ processes, we then have
$$\limsup_{t \to \infty} P\left(e^{-\theta t} \sum_{j=2}^{K} |T^t_{j,K}| \le \left(\frac{1-\delta_0}{\delta_0}\right) e^{-\theta t} |T^t_{1,K}| \,\Big|\, E_K\right) \le P\left(\sum_{i=1}^{K-1} W_i \le \left(\frac{1-\delta_0}{\delta_0}\right) \sum_{i=1}^{\lceil (\log K)^{\frac{1}{1-\alpha}} M \rceil} \tilde{W}_i\right) \le P\left(\sum_{i=1}^{K-1} W_i \le \tilde{U}\right),$$
where $\tilde{U}$ is the random variable $\left(\frac{1-\delta_0}{\delta_0}\right) \sum_{i=1}^{\lceil (\log K)^{\frac{1}{1-\alpha}} M \rceil} \tilde{W}_i$.
In anticipation of using Lemma 14, we bound $E[\tilde{U}^2]$ as follows:
$$E\left[\tilde{U}^2\right] = \left(\frac{1-\delta_0}{\delta_0}\right)^2 E\left[\left(\sum_{i=1}^{\lceil (\log K)^{\frac{1}{1-\alpha}} M \rceil} \tilde{W}_i\right)^2\right] \overset{(a)}{\le} \left(\frac{1-\delta_0}{\delta_0}\right)^2 \left\lceil (\log K)^{\frac{1}{1-\alpha}} M \right\rceil^2 E\left[\tilde{W}_1^2\right],$$
where step (a) is true because for all $1 \le i \ne j \le \lceil (\log K)^{\frac{1}{1-\alpha}} M \rceil$, we have
$$E[\tilde{W}_i \tilde{W}_j] = E[\tilde{W}_i]^2 \le E[\tilde{W}_i^2],$$
and the case $i = j$ is immediate. Now we apply Lemma 14 to conclude that
$$P\left(\sum_{i=1}^{K-1} W_i \le \tilde{U}\right) \le C \times \frac{\left(\frac{1-\delta_0}{\delta_0}\right)^2 \left\lceil (\log K)^{\frac{1}{1-\alpha}} M \right\rceil^2 E\left[\tilde{W}_1^2\right]}{(K-1)^2} = \frac{C_1 (\log K)^{\frac{2}{1-\alpha}}}{(K-1)^2},$$
where the constant $C_1 = \left(\frac{1-\delta_0}{\delta_0}\right)^2 \times C \times M^2 \times E\left[\tilde{W}_1^2\right]$ depends only on $\epsilon$, since by Lemma 14, the constant $C$ depends only on the distribution of $\tilde{W}_i$, which in turn depends only on the sublinear preferential attachment growth process and is therefore fixed. Arguing similarly, $E\left[\tilde{W}_1^2\right]$ is again a fixed constant. Also, as noted earlier, $\delta_0$ and $M$ depend only on $\epsilon$. Since such an inequality holds for all values $1 \le k \le K$, substituting back into inequality (5) and applying a union bound yields
$$\limsup_{n \to \infty} P\left(\exists i > K : \psi(v_i) \le (1-\delta_0)n\right) \le \frac{K C_1 (\log K)^{\frac{2}{1-\alpha}}}{(K-1)^2} + \frac{\epsilon}{4}.$$
We now choose $K > N$ sufficiently large so that
$$\frac{C_1 K (\log K)^{\frac{2}{1-\alpha}}}{(K-1)^2} < \frac{\epsilon}{4},$$
establishing the desired inequality.

D Additional technical lemmas

In this Appendix, we state and prove a useful Hoeffding bound for sums of independent, nonnegative random variables.

Lemma 14. Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables distributed according to $Z$, such that $Z \ge 0$ almost surely and $E[Z^2] < \infty$. Let $Y$ be a random variable independent of the $X_i$'s satisfying $Y > 0$ almost surely and $E[Y^2] < \infty$. Then
$$P\left(\sum_{i=1}^{n} X_i \le Y\right) \le \frac{C \, E[Y^2]}{n^2},$$
for some constant $C$ depending only on the distribution of $Z$.

Proof. Define $W_i = \min\{X_i, M\}$, where the constant $M$ is chosen such that $E[W_i] \ge \frac{E[X_i]}{2}$. Since $W_i \le X_i$, we have
$$P\left(\frac{1}{n} \sum_{i=1}^{n} X_i - E[X_i] \le t\right) \le P\left(\frac{1}{n} \sum_{i=1}^{n} W_i - E[X_i] \le t\right) \le P\left(\frac{1}{n} \sum_{i=1}^{n} W_i - 2E[W_i] \le t\right) = P\left(\frac{1}{n} \sum_{i=1}^{n} W_i - E[W_i] \le t + E[W_i]\right) \le C_1 \exp(-n t^2 C_2), \tag{7}$$
for suitable constants $C_1$ and $C_2$, where the last inequality follows from Hoeffding's inequality. Let
$$E_1 := \left\{\frac{1}{n} \sum_{i=1}^{n} X_i - E[X_i] \le -\frac{E[X_i]}{2}\right\}.$$
Then
$$P\left(\sum_{i=1}^{n} X_i \le Y\right) \le P(E_1) + P\left(Y \ge \frac{n E[X_i]}{2}\right).$$
Note that by Markov's inequality, we have the bound
$$P\left(Y \ge \frac{n E[X_i]}{2}\right) = P\left(Y^2 \ge \frac{n^2 E[X_i]^2}{4}\right) \le \frac{4 E[Y^2]}{n^2 E[X_i]^2} = \frac{C_3 E[Y^2]}{n^2},$$
for a suitable constant $C_3$. Since $P(E_1)$ decays exponentially in $n$ by inequality (7), we may find another constant $C_4$ such that
$$P\left(\sum_{i=1}^{n} X_i \le Y\right) \le \frac{C_4 E[Y^2]}{n^2},$$
as claimed.

Lemma 15. Consider a CMJ process initiated from the following state: The root node gives birth according to a shifted point process $\xi_d$ such that
$$P\left(\xi_d(t + dt) - \xi_d(t) = 1 \mid \xi_d(t) = i\right) = f(i + d) \, dt + o(dt),$$
where $f$ is the attraction function for an $\alpha$-sublinear preferential attachment process. Apart from the root node, every other node gives birth according to the point process $\xi$ driven by the function $f(i) = (i+1)^\alpha$. Let the population of this CMJ process be denoted by $H(t)$. Let $X_i(t)$ for $1 \le i \le d + 1$ be i.i.d. CMJ processes initiated from a single point. We claim that $\sum_{i=1}^{d+1} X_i(t)$ stochastically dominates $H(t)$.

Proof. Consider the root node $v$ and its CMJ process $H(t)$. We compare its growth with the sum of $d + 1$ i.i.d. CMJ processes starting from the isolated vertices $\{u_1, \dots, u_{d+1}\}$. Let $C_v(t)$ denote the number of children of $v$ at time $t$, and let $C_i(t)$ denote the number of children of $u_i$ at time $t$. Let $C_u(t) = \sum_{i=1}^{d+1} C_i(t)$. Note that $C_v(t)$ is simply a Markov process, given by
$$P\left(C_v(t + dt) - C_v(t) = 1 \mid C_v(t) = k\right) = (d + k + 1)^\alpha \, dt + o(dt).$$
Unlike $C_v$, the process $C_u$ is not Markov. However, for any $(r_1, \dots, r_{d+1})$ such that $\sum_{i=1}^{d+1} r_i = k$, we may write
$$P\left(C_u(t + dt) - C_u(t) = 1 \mid C_i(t) = r_i, \text{ for } 1 \le i \le d + 1\right) = \sum_{i=1}^{d+1} (r_i + 1)^\alpha \, dt + o(dt).$$
Since $\alpha < 1$, we see that no matter what the $r_i$'s are, we must have
$$(d + k + 1)^\alpha \le \sum_{i=1}^{d+1} (r_i + 1)^\alpha.$$
Thus, the process $C_u(t)$ stochastically dominates $C_v(t)$. Since the children in each process behave identically; i.e., they reproduce according to $\xi$, and so do their descendants, we can couple the processes in a straightforward way to conclude that the sum of $d + 1$ independent CMJ processes stochastically dominates $H(t)$.
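The rate comparison at the heart of the proof of Lemma 15 can be checked numerically for small parameters. The sketch below is only a sanity check, not part of the proof: the function names are hypothetical, and it enumerates every way of splitting $k$ children among the $d + 1$ isolated vertices and compares the birth rate $(d + k + 1)^\alpha$ of the shifted root with the combined rate $\sum_i (r_i + 1)^\alpha$.

```python
from itertools import product


def combined_rate(child_counts, alpha):
    """Total birth rate of the isolated vertices, given their child counts r_i."""
    return sum((r + 1) ** alpha for r in child_counts)


def shifted_root_rate(d, k, alpha):
    """Birth rate of the shifted root once it has k children."""
    return (d + k + 1) ** alpha


def check_domination(d, k, alpha, tol=1e-12):
    """Return True if the rate inequality holds for every split of k into d + 1 parts."""
    root = shifted_root_rate(d, k, alpha)
    for counts in product(range(k + 1), repeat=d + 1):
        if sum(counts) != k:
            continue  # only consider splits whose child counts add up to k
        if root > combined_rate(counts, alpha) + tol:
            return False
    return True
```

Since $\sum_i (r_i + 1) = d + k + 1$, the inequality is an instance of subadditivity of $x \mapsto x^\alpha$ for $\alpha \le 1$; for a superlinear exponent such as $\alpha = 2$ the check fails, which matches the role of sublinearity in the lemma.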
