On central limit theorems for Ewens--Pitman model

Yizao Wang

Abstract. We establish a quenched functional central limit theorem for the total number of components of random partitions induced by the Chinese restaurant process with parameters $(\alpha,\theta)$, $\alpha\in(0,1)$, $\theta>-\alpha$. With $P_j$ denoting the asymptotic frequency of the $j$-th table, it is well known that the component count has the same law as the occupancy count of an infinite urn scheme with sampling frequencies $(P_j)_{j\in\mathbb N}$. Our analysis follows this approach and is based on earlier results of Karlin [26] and Durieu and Wang [13]. In words, our result reveals that the fluctuations of the component count consist of two parts, one due to the sampling effect given the asymptotic frequencies $(P_j)_{j\in\mathbb N}$, the other due to the fluctuations of the random asymptotic frequencies, and in the limit the fluctuations of the two parts are conditionally independent given the $\alpha$-diversity. Our result strengthens a recent central limit theorem obtained by Bercu and Favaro [6] via a different method.

1. Introduction and main result

Consider the random partitions induced by a Chinese restaurant process with $(\alpha,\theta)$-seating, with $\alpha\in[0,1)$, $\theta>-\alpha$. This family of exchangeable random partitions, referred to as $(\alpha,\theta)$-partitions in the sequel, is arguably one of the most fundamental models in combinatorial stochastic processes [29]. In recent literature the $(\alpha,\theta)$-partitions have also been called the Ewens–Pitman model (see [15] and some references therein). When $\alpha=0$, the law of the exchangeable partitions follows the well-known Ewens sampling formula [11] with parameter $\theta>0$, and when $\theta=1$ the induced random permutations are the uniform ones. Earlier developments on the combinatorial and probabilistic aspects of the random partitions can be found in [1, 29].
The induced random permutations have been studied more recently [3, 5, 18, 20, 32] in the literature of random matrix theory. On the application side, these exchangeable random partitions have found great success in Bayesian nonparametrics. The asymptotic frequencies $(P_j)_{j\in\mathbb N}$ of the $(\alpha,\theta)$-partitions have the Poisson–Dirichlet distribution [16] and show up in the weights of the Pitman–Yor process; the Dirichlet process, which is the special case $\alpha=0$, was first investigated by Ferguson [17]. In words, the random partitions correspond to random clusters in various hierarchical models built upon the Pitman–Yor process. We refer to [8, 10] and references therein for results on Bayesian nonparametrics related to the $(\alpha,\theta)$-partitions. It is worth pointing out that the majority of the developments in the literature have been on $\alpha=0$. For some recent developments on $\alpha\in(0,1)$, see [10, 15] and references therein.

In this paper, we focus on the case $\alpha\in(0,1)$, $\theta>-\alpha$. Let $K_n$ denote the total number of components of the partition of $\{1,\dots,n\}$. Then, it is well known that

$$\lim_{n\to\infty}\frac{K_n}{n^\alpha} = S_\alpha \quad\text{almost surely,} \tag{1.1}$$

where $S_\alpha$ is known as the $\alpha$-diversity. Let $(P_j)_{j\in\mathbb N}$ denote the asymptotic frequencies of the tables. Set $\mathcal P := \sigma((P_j)_{j\in\mathbb N})$. It is known that $S_\alpha$ is $\mathcal P$-measurable. In particular, with $(P_j^\downarrow)_{j\in\mathbb N}$ denoting the asymptotic frequencies in decreasing order,

$$S_\alpha \equiv S_{\alpha,\theta} := \lim_{j\to\infty} j\,(P_j^\downarrow)^\alpha\,\Gamma(1-\alpha), \quad\text{almost surely.} \tag{1.2}$$

The normalization depends only on $\alpha$, and hence the parameter $\theta$ is usually omitted from the notation (while the law of $S_\alpha$ obviously depends on $\theta$ too). A standard reference for us is Pitman [29, Chapter 3], and detailed definitions are provided in Section 2.

There are two central limit theorems in the literature regarding $K_n$.
First, we have

$$\frac{K_n - \mathbb E(K_n\mid\mathcal P)}{n^{\alpha/2}} \Rightarrow (2^\alpha-1)^{1/2}\,S_\alpha^{1/2}\,Z \quad\text{as } n\to\infty, \tag{1.3}$$

where $\mathcal P = \sigma((P_j)_{j\in\mathbb N})$ and $Z$ is a standard Gaussian random variable independent of $S_\alpha$. Throughout, we let $\Rightarrow$ denote convergence in distribution. This result can already be read from a result of Karlin in 1967 [26] and Kingman's representation theorem [27], but seems to have been explicitly stated only recently in [20, Theorem 3.1] (with a few extensions, taking $a_j=1$ therein). In fact, the convergence in (1.3) was established in the quenched sense, and a quenched functional central limit theorem is known [13]; all these will be recalled in (1.8) below. It is worth pointing out that the centering by $\mathbb E(K_n\mid\mathcal P)$ is natural in view of Karlin's result (which concerns equivalently an infinite urn scheme to be explained below), although it is not the right centering in view of (1.1).

Second, Bercu and Favaro [6] recently established the following central limit theorem:

$$n^{\alpha/2}\Big(\frac{K_n}{n^\alpha} - S_\alpha\Big) \equiv \frac{K_n - n^\alpha S_\alpha}{n^{\alpha/2}} \Rightarrow S_\alpha^{1/2}\,Z \quad\text{as } n\to\infty, \tag{1.4}$$

where on the right-hand side, $S_\alpha$ is independent of the standard Gaussian random variable $Z$. The proof is completely different from the one for (1.3), and in particular it relies on a martingale central limit theorem for the 'tail sum' of infinite series due to Heyde [24]. Note that (1.4) uses the natural centering in view of (1.1).

In view of the two previously mentioned results, it is natural to write

$$\frac{K_n - n^\alpha S_\alpha}{n^{\alpha/2}} = \frac{K_n - \mathbb E(K_n\mid\mathcal P)}{n^{\alpha/2}} + \frac{\mathbb E(K_n\mid\mathcal P) - n^\alpha S_\alpha}{n^{\alpha/2}},$$

and then to expect a central limit theorem also for the second term on the right-hand side above. The main contribution of this paper is a functional central limit theorem for the joint convergence of the two terms on the right-hand side above.
Our functional central limit theorem concerns the following processes

$$W_n(t) := \frac{K_{\lfloor nt\rfloor} - \mathbb E(K_{\lfloor nt\rfloor}\mid\mathcal P)}{n^{\alpha/2}}, \qquad Y_n(t) := \frac{\mathbb E(K_{\lfloor nt\rfloor}\mid\mathcal P) - \lfloor nt\rfloor^\alpha S_\alpha}{n^{\alpha/2}}, \qquad t\in[0,1],\ n\in\mathbb N, \tag{1.5}$$

and the limit processes involve two centered Gaussian processes denoted by $Z^{(1)}_\alpha, Z^{(2)}_\alpha$, of which the covariance functions are as follows:

$$\mathrm{Cov}\big(Z^{(1)}_\alpha(s), Z^{(1)}_\alpha(t)\big) = (s+t)^\alpha - \max(s^\alpha, t^\alpha),$$
$$\mathrm{Cov}\big(Z^{(2)}_\alpha(s), Z^{(2)}_\alpha(t)\big) = s^\alpha + t^\alpha - (s+t)^\alpha, \qquad s,t>0.$$

Note that if $Z^{(1)}_\alpha$ and $Z^{(2)}_\alpha$ are independent, then

$$\big(Z^{(1)}_\alpha(t) + Z^{(2)}_\alpha(t)\big)_{t\in[0,1]} \overset{d}{=} (B_{t^\alpha})_{t\in[0,1]}, \tag{1.6}$$

where the right-hand side is a time-changed Brownian motion (the standard Brownian motion $B$ indexed by $t^\alpha$). The main result of this paper is the following.

Theorem 1.1. With the notations above,

$$\Big((W_n(t))_{t\in[0,1]}, (Y_n(t))_{t\in[0,1]}\Big) \Rightarrow \Big(S_\alpha^{1/2}\big(Z^{(1)}_\alpha(t)\big)_{t\in[0,1]},\ S_\alpha^{1/2}\big(Z^{(2)}_\alpha(t)\big)_{t\in[0,1]}\Big) \tag{1.7}$$

in $D[0,1]^2$ as $n\to\infty$, where on the right-hand side $Z^{(1)}_\alpha$ and $Z^{(2)}_\alpha$ are the two independent Gaussian processes introduced above, independent of $S_\alpha$.

Moreover, we shall establish a quenched version of (1.7) in Theorem 3.1, from which Theorem 1.1 follows as an immediate consequence. The $\sigma$-algebras ($\mathcal P_n$ below) involved in Theorem 3.1 take a little preparation to introduce, and we skip the details in the introduction.

A quenched functional central limit theorem for the first component $W_n$ has already been established in [13, Theorem 2.3] (in combination with Kingman's representation theorem). Namely, it was shown that

$$(W_n(t))_{t\in[0,1]} \equiv \Big(\frac{K_{\lfloor nt\rfloor} - \mathbb E(K_{\lfloor nt\rfloor}\mid\mathcal P)}{n^{\alpha/2}}\Big)_{t\in[0,1]} \xrightarrow{\text{a.s.w.}} \Big(S_\alpha^{1/2} Z^{(1)}_\alpha(t)\Big)_{t\in[0,1]} \tag{1.8}$$

in $D[0,1]$ with respect to $\mathcal P$ as $n\to\infty$, where $S_\alpha$ is $\mathcal P$-measurable and $Z^{(1)}_\alpha$ is independent of $\mathcal P$. Here, the notation $\xrightarrow{\text{a.s.w.}}$ stands for almost sure weak convergence [23]; for the sake of simplicity we also refer to $\xrightarrow{\text{a.s.w.}}$ as quenched convergence. We say a sequence of random elements $(X_n)_{n\in\mathbb N}$ converges almost surely weakly to $X$ in a certain Polish space $M$ with respect to a $\sigma$-algebra $\mathcal G$, if for all continuous and bounded functions $f: M\to\mathbb R$ we have

$$\lim_{n\to\infty}\mathbb E(f(X_n)\mid\mathcal G) = \mathbb E(f(X)\mid\mathcal G), \quad\text{almost surely.}$$

When writing $X_n \xrightarrow{\text{a.s.w.}} X$ with respect to $\mathcal G$, we implicitly assume all $(X_n)_{n\in\mathbb N}$ and $X$ to be defined on a probability space of which $\mathcal G$ is a $\sigma$-algebra; this assumption is needed for the conditional expectations above to be well defined. This is different from the interpretation of weak convergence, where only the laws of the random variables are concerned and hence the underlying probability space is irrelevant. In particular, in all our statements regarding quenched convergence, all the pre-limit statistics $W_n(t), Y_n(t)$ and $S_\alpha$ (in the pre-limit statistic and in the limit process) are defined on a common probability space.

We comment briefly on the proof. It is well known that by Kingman's representation theorem, the study of exchangeable random partitions with asymptotic frequencies $(P_j)_{j\in\mathbb N}$ can be translated into the study of random partitions induced by an infinite urn scheme (paintbox partitions) with sampling frequencies $(P_j)_{j\in\mathbb N}$, which decay polynomially with exponent $-1/\alpha$. In the language of Bayesian nonparametrics, the latter algorithm is referred to as the random partitions generated by i.i.d. sampling from the random frequencies following the Poisson–Dirichlet distribution with parameters $(\alpha,\theta)$ (and the Chinese restaurant process is involved in practice); a closely related object is the Pitman–Yor process, taking the form of a random probability measure $\sum_{j=1}^\infty P_j\delta_{V_j}$ where the $V_j$ are i.i.d. from a base distribution, independent of $(P_j)_{j\in\mathbb N}$, and when $\alpha=0$ this is known as the Dirichlet process.

The infinite urn scheme is another fundamental model in probability theory with early developments dating back to the 1960s [2, 12, 26]; see also [22] and references therein. Some recent developments relevant to the component $W_n(t)$ include [9, 13, 25], which essentially exploit the (conditional) i.i.d. structure. In words, most of the previous analysis exploiting the connection to the infinite urn scheme only needs the assumption that the asymptotic frequencies decay polynomially almost surely; that is, $P_j^\downarrow \sim d_0\, j^{-1/\alpha}$ as $j\to\infty$. The Gaussian fluctuation in the limit of the first component $W_n$ is due to the sampling procedure given the sampling frequencies, and the deviation of $P_j^\downarrow$ from $d_0 j^{-1/\alpha}$ is not captured in the limit fluctuation (in fact, the deviation is not considered when setting up the question). This part is essentially due to Karlin, who first extensively investigated limit theorems for the infinite urn scheme with polynomially decaying sampling frequencies. The model with such sampling frequencies is referred to as the Karlin model in the literature.

Here, our analysis essentially examines the fluctuations of $P_j^\downarrow$ around $d_0 j^{-1/\alpha}$. This fluctuation leads to the Gaussian fluctuation in the second component $Y_n(t)$ of the main result. This result is quite different from all the aforementioned ones on the Karlin model. At the heuristic level it is clear that the two fluctuations should be conditionally independent. Our Theorem 3.1 explains this in more detail by providing a specific choice of the $\sigma$-algebra $\mathcal P_n$ to condition on.
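As a consistency check on the constants (a worked computation added for illustration, not a statement taken from the cited works), assume that $Z^{(1)}_\alpha$ and $Z^{(2)}_\alpha$ are independent with the covariance functions stated after (1.5). Their covariances then add up to the covariance of the time-changed Brownian motion in (1.6), and at $s=t=1$ the two variances $2^\alpha-1$ and $2-2^\alpha$ recover the constants appearing in (1.3) and (1.4):

```latex
\begin{align*}
\operatorname{Cov}\big(Z^{(1)}_\alpha(s), Z^{(1)}_\alpha(t)\big)
  + \operatorname{Cov}\big(Z^{(2)}_\alpha(s), Z^{(2)}_\alpha(t)\big)
  &= s^\alpha + t^\alpha - \max(s^\alpha, t^\alpha)
   = \min(s, t)^\alpha
   = \operatorname{Cov}\big(B_{s^\alpha}, B_{t^\alpha}\big), \\
\operatorname{Var}\big(Z^{(1)}_\alpha(1)\big) + \operatorname{Var}\big(Z^{(2)}_\alpha(1)\big)
  &= (2^\alpha - 1) + (2 - 2^\alpha) = 1 .
\end{align*}
```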
On the other hand, it is remarkable that the approach by Bercu and Favaro [6] does not exploit the urn scheme connection at all, although it is not clear whether it could deal with $\mathbb E(K_n\mid\mathcal P)$ in order to have access to the limit of the decomposition or to the quenched convergence.

Remark 1.2. The decomposition (1.6) appeared already in [13], and it is worth pointing out that a corresponding functional central limit theorem for the decomposition was established there. The model investigated therein was the Karlin model (not necessarily the Chinese restaurant process) randomized by Rademacher random variables. Stochastic-integral representations of the processes $Z^{(1)}_\alpha, Z^{(2)}_\alpha$ can be found in [19].

Remark 1.3. Our methodology can be further extended to study the total number of components in $\Pi_n$ with exactly $j$ elements, denoted by $C_{n,j}$ below. In the language of random permutations, $C_{n,j}$ is referred to as the $j$-cycle count. It is well known that

$$\lim_{n\to\infty}\frac{C_{n,j}}{n^\alpha} = S_\alpha\, p^{(\alpha)}_j \quad\text{almost surely, for } j\in\mathbb N, \qquad\text{with}\quad p^{(\alpha)}_j := \frac{\alpha\,\Gamma(j-\alpha)}{\Gamma(1-\alpha)\,\Gamma(j+1)},$$

and $(p^{(\alpha)}_j)_{j\in\mathbb N}$ is the probability mass function of the $\alpha$-Sibuya distribution. Central limit theorems regarding $C_{n,j}$ have also been studied in the literature. Again, the choice of the centering is delicate, and we could write

$$\frac{C_{n,j} - S_\alpha p^{(\alpha)}_j n^\alpha}{n^{\alpha/2}} = \frac{C_{n,j} - \mathbb E(C_{n,j}\mid\mathcal P)}{n^{\alpha/2}} + \frac{\mathbb E(C_{n,j}\mid\mathcal P) - S_\alpha p^{(\alpha)}_j n^\alpha}{n^{\alpha/2}}.$$

The convergence of the left-hand side above was established in Bercu and Favaro [6], and the quenched joint convergence of the first statistic on the right-hand side in Chebunin and Kovalevskii [9] and Karlin [26] (the convergence is joint for random variables indexed by $j\in\mathbb N$, and the joint convergence was in fact established for a functional version with $n$ replaced by $\lfloor nt\rfloor$). We do not pursue this extension here.
In fact, we expect a more general quenched functional central limit theorem for the following sequence of bivariate processes

$$\Bigg(\frac{1}{n^{\alpha/2}}\sum_{j=1}^\infty a_j\Big(C_{\lfloor nt\rfloor,j} - \mathbb E(C_{\lfloor nt\rfloor,j}\mid\mathcal P)\Big),\ \frac{1}{n^{\alpha/2}}\sum_{j=1}^\infty b_j\Big(\mathbb E(C_{\lfloor nt\rfloor,j}\mid\mathcal P) - S_\alpha p^{(\alpha)}_j \lfloor nt\rfloor^\alpha\Big)\Bigg)_{t\in[0,1]}$$

for suitable constants $(a_j)_{j\in\mathbb N}, (b_j)_{j\in\mathbb N}$. The quenched convergence of the first component has been shown in [21], with motivations from random permutation matrices; therein statistics of interest often take the form of $\sum_{j=1}^\infty a_j C_{n,j}$ [20]. The challenge is to find a not-too-restrictive condition on $(a_j)_{j\in\mathbb N}$ and $(b_j)_{j\in\mathbb N}$ when establishing the tightness. This is left for future research.

The paper is organized as follows. In Section 2 we provide preliminary results on the $(\alpha,\theta)$-partitions. In Section 3 we state and prove the stronger quenched functional central limit theorem, Theorem 3.1.

Acknowledgements. Y.W. would like to thank Vishakha for discussions. Y.W. was partially supported by Simons Foundation (MP-TSM-00002359).

2. Preliminary results on Chinese restaurant process

We first recall the Chinese restaurant process. The standard reference is Pitman [29]. The process has two parameters $(\alpha,\theta)$ with $\alpha\in[0,1)$ and $\theta>-\alpha$ and consists of a family of exchangeable random partitions $(\Pi_n)_{n\in\mathbb N}$, each of $[n]=\{1,\dots,n\}$, constructed consecutively. The procedure goes as follows. Set $\Pi_1=\{\{1\}\}$. Suppose $\Pi_n=\{\Pi_{n,1},\dots,\Pi_{n,k}\}$ (where $(\Pi_{n,j})_{j=1,\dots,k}$ are disjoint non-empty subsets of $[n]$ and $\bigcup_{j=1}^k\Pi_{n,j}=[n]$; in this case $\Pi_n$ is said to have $k$ components).
Then, the partition $\Pi_{n+1}$ is obtained by
(i) adding element $n+1$ to an existing block $j$ (i.e., setting $\Pi_{n+1,j} := \Pi_{n,j}\cup\{n+1\}$) with probability $(|\Pi_{n,j}|-\alpha)/(n+\theta)$;
(ii) creating a new block with the single element $n+1$ (i.e., setting $\Pi_{n+1,k+1} := \{n+1\}$) with probability $(k\alpha+\theta)/(n+\theta)$;
(iii) leaving all other existing blocks unchanged (i.e., setting $\Pi_{n+1,j} = \Pi_{n,j}$ for all $j=1,\dots,k$ that have not been involved).

The statistic of interest is the total number of components of the partition $\Pi_n$, denoted by $K_n$ throughout. To state our main result, the asymptotic frequencies are involved. It is well known that

$$P_j \equiv P^{(\alpha,\theta)}_j := \lim_{n\to\infty}\frac{|\Pi_{n,j}|}{n}$$

exists almost surely, for every $j\in\mathbb N$. The law of $(P_j)_{j\in\mathbb N}$ is known as the Griffiths–Engen–McCloskey (GEM) distribution with parameters $(\alpha,\theta)$. For a recent generalization, see [4]. The law of the decreasingly ordered sequence $(P_j^\downarrow)_{j\in\mathbb N}$ is known as the Poisson–Dirichlet distribution with parameters $(\alpha,\theta)$, denoted by $\mathsf P_{\alpha,\theta}$ below.

Recall the definition of $S_\alpha$ in (1.2). It is well known that $P_j^\downarrow \sim d_0\, j^{-1/\alpha}$ with $d_0 = (S_\alpha/\Gamma(1-\alpha))^{1/\alpha}$. This was first established via an intrinsic relation to the jumps of an $\alpha$-stable subordinator. We need to exploit this fact further to obtain an estimate on the deviation of $P_j^\downarrow$ from $d_0 j^{-1/\alpha}$. The following can be read from [29].

Lemma 2.1. Let $(P_j^\downarrow)_{j\in\mathbb N}$ follow the Poisson–Dirichlet distribution with parameters $(\alpha,\theta)$, and let $S_\alpha$ be as in (1.2). Set

$$\Gamma_j := \frac{S_\alpha}{\Gamma(1-\alpha)}\big(P_j^\downarrow\big)^{-\alpha}, \quad j\in\mathbb N. \tag{2.1}$$

(i) When $\theta=0$, the sequence $(\Gamma_j)_{j\in\mathbb N}$ has the law of the consecutive arrival times of a standard Poisson process.
(ii) More generally, for all $\theta>-\alpha$, the law $\mathsf P_{\alpha,\theta}$ is absolutely continuous with respect to $\mathsf P_{\alpha,0}$. As a consequence, all the almost sure statements regarding $\mathsf P_{\alpha,0}$ continue to hold for $\mathsf P_{\alpha,\theta}$.
In particular, with (1.2) and (2.1), for all $\alpha\in(0,1)$, $\theta>-\alpha$, as $j\to\infty$,

$$P_j^\downarrow = \Big(\frac{S_\alpha}{\Gamma(1-\alpha)}\Big)^{1/\alpha}\Gamma_j^{-1/\alpha} \sim \Big(\frac{S_\alpha}{\Gamma(1-\alpha)}\Big)^{1/\alpha} j^{-1/\alpha}, \quad\text{almost surely.}$$

Proof. We first prove part (i). Assume $\theta=0$. Let $(\widetilde\Gamma_j)_{j\in\mathbb N}$ denote the consecutive arrival times of a standard Poisson process. Recall that one can express the law of the ordered asymptotic frequencies $(P_j^\downarrow)_{j\in\mathbb N}$ in terms of the jumps of an $\alpha$-stable subordinator:

$$\big(P_j^\downarrow\big)_{j\in\mathbb N} \overset{d}{=} \Bigg(\frac{\widetilde\Gamma_j^{-1/\alpha}}{\sum_{k=1}^\infty\widetilde\Gamma_k^{-1/\alpha}}\Bigg)_{j\in\mathbb N}.$$

See [28, 30]. The above then implies that

$$\Bigg(\big(P_j^\downarrow\big)_{j\in\mathbb N},\ \lim_{j\to\infty} j\,(P_j^\downarrow)^\alpha\,\Gamma(1-\alpha)\Bigg) \overset{d}{=} \Bigg(\Bigg(\frac{\widetilde\Gamma_j^{-1/\alpha}}{\sum_{k=1}^\infty\widetilde\Gamma_k^{-1/\alpha}}\Bigg)_{j\in\mathbb N},\ \Gamma(1-\alpha)\Bigg(\sum_{k=1}^\infty\widetilde\Gamma_k^{-1/\alpha}\Bigg)^{-\alpha}\Bigg),$$

since on both sides the second random variable is the same deterministic transform of the first random sequence (using $\widetilde\Gamma_j/j\to1$ almost surely). It follows that

$$\Bigg(\Big(\frac{S_\alpha}{\Gamma(1-\alpha)}\Big)^{-1/\alpha}P_j^\downarrow\Bigg)_{j\in\mathbb N} \overset{d}{=} \Big(\widetilde\Gamma_j^{-1/\alpha}\Big)_{j\in\mathbb N},$$

as claimed.

Part (ii) is well known [29, Chapter 3.3]; we briefly recall the density formula for general $\theta>-\alpha$. Suppose $(P_j^\downarrow)_{j\in\mathbb N}$ has law $\mathsf P_{\alpha,0}$, and let $\mathbb P$ denote the probability measure on the underlying probability space. Then, on the same probability space, consider the measure $\mathbb Q_{\alpha,\theta}$ determined by the following change of measure:

$$\frac{\mathrm d\mathbb Q_{\alpha,\theta}}{\mathrm d\mathbb P}(\omega) = \frac{\Gamma(\theta+1)}{\Gamma(\theta/\alpha+1)}\,S_\alpha^{\theta/\alpha}. \tag{2.2}$$

It is known that $\mathbb Q_{\alpha,\theta}$ is a probability measure and that $(P_j^\downarrow)_{j\in\mathbb N}$ under $\mathbb Q_{\alpha,\theta}$ has the law $\mathsf P_{\alpha,\theta}$. □

3. A quenched functional central limit theorem

We shall prove a stronger quenched functional central limit theorem for $(W_n, Y_n)_{n\in\mathbb N}$ with $W_n=(W_n(t))_{t\in[0,1]}$ and $Y_n=(Y_n(t))_{t\in[0,1]}$, from which Theorem 1.1 follows immediately. For this purpose, we shall rely on the representation of the asymptotic frequencies $(P_j^\downarrow)_{j\in\mathbb N}$ developed in Lemma 2.1.
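The representation in Lemma 2.1 (i) can be illustrated numerically. The following is a minimal sketch for $\theta=0$, under the assumption that the infinite series $\sum_k\widetilde\Gamma_k^{-1/\alpha}$ is truncated at finitely many terms; the function names, the truncation level and the seed are hypothetical choices made only for this illustration. It samples Poisson arrival times, builds the (truncated) frequencies, and checks that inverting (2.1) returns the arrival times and that $j\,(P_j^\downarrow)^\alpha/D = j/\Gamma_j$ is close to $1$ for large $j$, where $D = S_\alpha/\Gamma(1-\alpha)$.

```python
import random


def poisson_arrival_times(num_points, rng):
    """Return the first num_points arrival times of a rate-1 Poisson process."""
    times, current = [], 0.0
    for _ in range(num_points):
        current += rng.expovariate(1.0)  # i.i.d. Exp(1) inter-arrival gaps
        times.append(current)
    return times


def truncated_pd_frequencies(alpha, num_points, rng):
    """Approximate PD(alpha, 0) frequencies via Lemma 2.1 (i), truncated.

    Returns (gammas, freqs, d_hat), where d_hat approximates
    D = S_alpha / Gamma(1 - alpha) = (sum_j Gamma_j^{-1/alpha})^{-alpha}.
    Truncating the series at num_points terms is an assumption of this sketch.
    """
    gammas = poisson_arrival_times(num_points, rng)
    weights = [g ** (-1.0 / alpha) for g in gammas]
    total = sum(weights)
    freqs = [w / total for w in weights]  # decreasing, since gammas increase
    d_hat = total ** (-alpha)
    return gammas, freqs, d_hat


rng = random.Random(2024)  # hypothetical seed, for reproducibility only
alpha = 0.5
gammas, freqs, d_hat = truncated_pd_frequencies(alpha, 20000, rng)

# Inverting (2.1): d_hat * P_j^{-alpha} gives back the arrival time Gamma_j.
j = 10000
recovered_gamma = d_hat * freqs[j - 1] ** (-alpha)

# Polynomial decay: j * P_j^alpha / D equals j / Gamma_j, which tends to 1.
decay_ratio = j * freqs[j - 1] ** alpha / d_hat
print(recovered_gamma, gammas[j - 1], decay_ratio)
```

The truncation only affects the normalizing sum; the identity $P_j^\downarrow = D^{1/\alpha}\Gamma_j^{-1/\alpha}$ holds exactly for the truncated quantities, which is why the inversion check is exact up to floating-point error.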
Set random variables

$$\Gamma_j := \frac{S_\alpha}{\Gamma(1-\alpha)}\big(P_j^\downarrow\big)^{-\alpha}, \quad j\in\mathbb N,$$

and

$$N(t) := \max\{n\in\mathbb N : \Gamma_n\le t\}, \quad t\ge0, \tag{3.1}$$

with $\max\emptyset = 0$ by convention. Set, for a decreasing sequence of numbers $(\epsilon_n)_{n\in\mathbb N}$ with $\epsilon_n\downarrow0$ as $n\to\infty$,

$$\mathcal P_n := \sigma\big(N(t) : t\in[0,\epsilon_n n^\alpha]\big).$$

Clearly, $\mathcal P_n\subset\mathcal P := \sigma((P_j^\downarrow)_{j\in\mathbb N})$. Lemma 2.1 tells us that when $\theta=0$, $(N(t))_{t\ge0}$ in (3.1) is a standard Poisson process with $(\Gamma_j)_{j\in\mathbb N}$ as its consecutive arrival times, while for other $\theta>-\alpha$ this representation holds only up to a change of measure. We shall then prove the following quenched version of Theorem 1.1.

Theorem 3.1. Let $W_n$ and $Y_n$ be as in Theorem 1.1, and assume that

$$\epsilon_n = o\big((\log\log n)^{-1/2}\big). \tag{3.2}$$

We have

$$\mathcal L\Big(\big((W_n(t))_{t\in[0,1]}, (Y_n(t))_{t\in[0,1]}\big)\,\Big|\,\mathcal P_n\Big) \xrightarrow{\text{a.s.w.}} \mathcal L\Big(S_\alpha^{1/2}\big((Z^{(1)}_\alpha(t))_{t\in[0,1]}, (Z^{(2)}_\alpha(t))_{t\in[0,1]}\big)\,\Big|\,\mathcal P\Big)$$

in $D[0,1]^2$ as $n\to\infty$, where $Z^{(1)}_\alpha, Z^{(2)}_\alpha$ are independent of $\mathcal P$.

Here, for almost sure weak convergence our reference is Grübel and Kabluchko [23]. For random elements $(X_n)_{n\in\mathbb N}$ and $X$ in a complete and separable metric space $M$, we write

$$\mathcal L(X_n\mid\mathcal P_n) \xrightarrow{\text{a.s.w.}} \mathcal L(X\mid\mathcal P) \quad\text{as } n\to\infty,$$

if $\lim_{n\to\infty}\mathbb E(f(X_n)\mid\mathcal P_n) = \mathbb E(f(X)\mid\mathcal P)$ almost surely for all continuous and bounded functions $f: M\to\mathbb R$. Again, in order to define the conditional expectations we assume implicitly that $(X_n)_{n\in\mathbb N}, X$ are defined on a common probability space of which $(\mathcal P_n)_{n\in\mathbb N}, \mathcal P$ are $\sigma$-algebras. When $\mathcal P_n\equiv\mathcal P$, we simply write $X_n\xrightarrow{\text{a.s.w.}}X$ with respect to $\mathcal P$ as $n\to\infty$, as in (1.8). In particular, throughout we assume that $S_\alpha$ is defined by (1.2) and is $\mathcal P$-measurable. We do not repeat this in the statements of quenched limit theorems in the sequel.

A quenched convergence of the process $W_n$ has already been established; see (1.8) and [13]. It turns out that to prove Theorem 3.1 it suffices to combine (1.8) with the following quenched convergence for $Y_n$.
Proposition 3.2. For $Y_n$ in (1.5) with $\alpha\in(0,1)$, $\theta>-\alpha$, under (3.2),

$$\mathcal L\Big((Y_n(t))_{t\in[0,1]}\,\Big|\,\mathcal P_n\Big) \xrightarrow{\text{a.s.w.}} \mathcal L\Big(S_\alpha^{1/2}\big(Z^{(2)}_\alpha(t)\big)_{t\in[0,1]}\,\Big|\,\mathcal P\Big), \quad\text{as } n\to\infty,$$

in $D[0,1]$, where $Z^{(2)}_\alpha$ is independent of $\mathcal P$.

Once the above proposition is established, the proof of Theorem 3.1 is relatively simple, and is provided in Section 3.3 at the end. A similar idea has already been applied in [13]. Moreover, for each fixed $\alpha\in(0,1)$, the above result for $\theta>-\alpha$, $\theta\ne0$ follows relatively easily from the case $\theta=0$, as explained below. (It is key to have the quenched convergence of $Y_n$ with $\theta=0$; if only the annealed convergence is established then the argument below does not work.)

Proof of Proposition 3.2 with $\theta>-\alpha$, $\theta\ne0$. Assume that Proposition 3.2 has been proved with $\theta=0$. Recall that the law of $(\alpha,\theta)$-partitions can be derived from the law of $(\alpha,0)$-partitions via a change of measure as discussed around (2.2). Then, to prove the claimed result it is equivalent to show that, always assuming below that $Y_n$ is based on $(\alpha,0)$-partitions,

$$\lim_{n\to\infty}\mathbb E\big(Q_{\alpha,\theta}\,f(Y_n)\mid\mathcal P_n\big) = \mathbb E\big(Q_{\alpha,\theta}\,f(Y)\mid\mathcal P\big), \tag{3.3}$$

where $f$ is a continuous and bounded function, $Y$ denotes the limit process $S_\alpha^{1/2}(Z^{(2)}_\alpha(t))_{t\in[0,1]}$, and

$$Q_{\alpha,\theta} := \frac{\Gamma(\theta+1)}{\Gamma(\theta/\alpha+1)}\,S_\alpha^{\theta/\alpha} = \frac{\Gamma(\theta+1)}{\Gamma(\theta/\alpha+1)}\Bigg(\Gamma(1-\alpha)\Bigg(\sum_{j=1}^\infty\Gamma_j^{-1/\alpha}\Bigg)^{-\alpha}\Bigg)^{\theta/\alpha}.$$

Write $Q=Q_{\alpha,\theta}$ from now on, and set

$$Q_n := \frac{\Gamma(\theta+1)}{\Gamma(\theta/\alpha+1)}\Bigg(\Gamma(1-\alpha)\Bigg(\sum_{j:\Gamma_j\le\epsilon_n n^\alpha}\Gamma_j^{-1/\alpha}\Bigg)^{-\alpha}\Bigg)^{\theta/\alpha}.$$

Notice that $Q_n$ is $\mathcal P_n$-measurable. Thus, to show (3.3) it suffices to remark that

$$\mathbb E(Qf(Y_n)\mid\mathcal P_n) - \mathbb E(Qf(Y)\mid\mathcal P) = \Big[\mathbb E(Qf(Y_n)\mid\mathcal P_n) - \mathbb E(Q_nf(Y_n)\mid\mathcal P_n)\Big] + \Big[Q_n\mathbb E(f(Y_n)\mid\mathcal P_n) - Q_n\mathbb E(f(Y)\mid\mathcal P)\Big] + \Big[Q_n\mathbb E(f(Y)\mid\mathcal P) - Q\,\mathbb E(f(Y)\mid\mathcal P)\Big],$$

and that each of the three differences on the right-hand side converges to zero almost surely. Indeed, the first and the third converge to zero because $Q_n\to Q$ monotonically as $n\to\infty$, almost surely.
The second difference converges to zero thanks to the convergence of Proposition 3.2 with $\theta=0$. □

The majority of the rest of this section is devoted to the proof of Proposition 3.2 with $\theta=0$. From now on, we consider equivalently the total number of non-empty urns in a Karlin model with random frequencies $(P_j)_{j\in\mathbb N}$ (i.e. the nonparametric Bayesian view). That is, given $\mathcal P$, we sample conditionally i.i.d. random variables $(X_j)_{j\in\mathbb N}$ with $\mathbb P(X_j=\ell\mid\mathcal P) = P_\ell^\downarrow$, $\ell\in\mathbb N$, and set

$$K_{n,\ell} := \sum_{j=1}^n\mathbf 1_{\{X_j=\ell\}},\ \ell\in\mathbb N, \quad\text{and}\quad K_n := \sum_{\ell=1}^\infty\mathbf 1_{\{K_{n,\ell}>0\}}.$$

The $K_n$ defined this way has the same law as the total count of components of $(\alpha,\theta)$-partitions. For computational convenience, the idea of Poissonization is to consider the following approximation. Given $\mathcal P$, let $(\Lambda_\ell(t))_{t\ge0}$, $\ell\in\mathbb N$, be conditionally independent Poisson processes with parameters $P_\ell^\downarrow$ respectively. The approximation of $K_n$ is

$$\widetilde K(t) := \sum_{\ell=1}^\infty\mathbf 1_{\{\Lambda_\ell(t)>0\}}, \quad t\ge0.$$

The advantage of working with $\widetilde K(t)$ is that, given $\mathcal P$, it is a sum of conditionally independent random variables. Thanks to this independence, limit theorems concerning $\widetilde K(t)$ are much easier than those concerning $K_n$, and this fact has been exploited substantially in the analysis of $W_n$ [13, 26]. In the analysis of $Y_n$, however, we do not exploit this independence, but instead use a continuous mapping argument on functionals of a Poisson process. The obstacle here is that the Poisson process involved takes the form $(N(Dt))_{t\ge0}$ where the random variable $D$ depends on the Poisson process itself. Most of the effort is devoted to decoupling the dependence between the two as $t\to\infty$ (Lemma 3.4 below). Then we need to control the approximation of $K_n$ by $\widetilde K(n)$.
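The sampling and Poissonization steps above can be sketched as follows. This is a minimal illustration under simplifying assumptions that are not part of the paper: the frequencies are deterministic, $p_j\propto j^{-1/\alpha}$ (a Karlin-type model rather than random Poisson–Dirichlet frequencies), the number of urns is truncated, and all function names are hypothetical. The code draws $K_n$ by i.i.d. sampling, draws the Poissonized count $\widetilde K(t)$ at a single fixed time $t$, and compares the two conditional means $\sum_\ell(1-(1-p_\ell)^n)$ and $\sum_\ell(1-e^{-np_\ell})$.

```python
import math
import random
from bisect import bisect_left
from itertools import accumulate


def karlin_frequencies(alpha, num_urns):
    """Deterministic frequencies p_j proportional to j^{-1/alpha}, truncated."""
    weights = [j ** (-1.0 / alpha) for j in range(1, num_urns + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def occupancy_count(freqs, n, rng):
    """K_n: number of distinct urns hit by n i.i.d. draws from freqs."""
    cdf = list(accumulate(freqs))
    occupied = set()
    for _ in range(n):
        u = rng.random() * cdf[-1]
        # Inverse-CDF sampling; the min() guards against rounding at the top.
        occupied.add(min(bisect_left(cdf, u), len(freqs) - 1))
    return len(occupied)


def poissonized_count(freqs, t, rng):
    """K~(t) at one fixed time t: urn l is non-empty with probability
    1 - exp(-t * p_l), independently over l (Poisson(t * p_l) > 0)."""
    return sum(1 for p in freqs if rng.random() < 1.0 - math.exp(-t * p))


def mean_occupancy(freqs, n):
    """E(K_n | frequencies) = sum_l (1 - (1 - p_l)^n)."""
    return sum(1.0 - (1.0 - p) ** n for p in freqs)


def mean_poissonized(freqs, t):
    """E(K~(t) | frequencies) = sum_l (1 - exp(-t * p_l))."""
    return sum(1.0 - math.exp(-t * p) for p in freqs)


rng = random.Random(7)  # hypothetical seed
freqs = karlin_frequencies(alpha=0.5, num_urns=5000)
n = 2000
k_n = occupancy_count(freqs, n, rng)
k_tilde = poissonized_count(freqs, n, rng)
print(k_n, k_tilde, mean_occupancy(freqs, n), mean_poissonized(freqs, n))
```

Since $(1-p)^n\le e^{-np}$, the Poissonized mean never exceeds the mean of $K_n$, and for polynomially decaying frequencies the gap between the two means is small compared with the $n^{\alpha/2}$ scale of the fluctuations.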
Controlling this approximation is known as de-Poissonization: often a functional central limit theorem for $\widetilde K$ is established first, and the tightness for the functional convergence might be quite challenging to prove.

3.1. Proof for the Poissonized model. Set $\nu(t) := \max\{j : P_j^\downarrow\ge1/t\}$. Then,

$$\mathbb E\big(\widetilde K(n)\mid\mathcal P\big) = \int_0^\infty(1-e^{-n/x})\,\nu(\mathrm dx) = \int_0^\infty\frac{n}{x^2}e^{-n/x}\nu(x)\,\mathrm dx = \int_0^\infty e^{-x}\nu(n/x)\,\mathrm dx.$$

Recall that $\theta=0$ and the representation of $P_j^\downarrow$ in Lemma 2.1. For notational convenience, introduce

$$D := \frac{S_\alpha}{\Gamma(1-\alpha)} = \Bigg(\sum_{j=1}^\infty\Gamma_j^{-1/\alpha}\Bigg)^{-\alpha}.$$

In particular, $P_j^\downarrow = D^{1/\alpha}\Gamma_j^{-1/\alpha}$, and hence

$$\nu(t) = \max\Big\{j\in\mathbb N : P_j^\downarrow\ge\frac1t\Big\} = N(Dt^\alpha).$$

Notice also that

$$\int_0^\infty e^{-x}\Big(\frac tx\Big)^\alpha\mathrm dx = t^\alpha\,\Gamma(1-\alpha).$$

The statistic of interest is now

$$\widetilde Y_n(t) := \frac{\mathbb E(\widetilde K(nt)\mid\mathcal P) - \Gamma(1-\alpha)D(nt)^\alpha}{n^{\alpha/2}} = \int_0^\infty e^{-x}\,\frac{N(D(nt/x)^\alpha) - D(nt/x)^\alpha}{n^{\alpha/2}}\,\mathrm dx.$$

Note that the integrand above is integrable at both $0$ and $\infty$ almost surely. The difficulty of the analysis is that the Poisson process $N$ induced by $(\Gamma_n)_{n\in\mathbb N}$ and $D$ are dependent.

We start with a heuristic calculation that identifies the limit process. By the functional central limit theorem for a Poisson process, we know that

$$\Big(\frac{N(nt)-nt}{\sqrt n}\Big)_{t\in[0,\infty)} \Rightarrow (B_t)_{t\in[0,\infty)}$$

in $D[0,\infty)$, where $B$ is a standard Brownian motion. Thus, pretending that $D$ is independent of $N$ (which is not true here, so this is a slight abuse of notation), one expects

$$(\widetilde Y_n(t))_{t\in[0,1]} \Rightarrow \Big(\int_0^\infty e^{-x}B_{D(t/x)^\alpha}\,\mathrm dx\Big)_{t\in[0,1]} \overset{d}{=} D^{1/2}\Big(\int_0^\infty e^{-x}B_{(t/x)^\alpha}\,\mathrm dx\Big)_{t\in[0,1]}$$
$$= \Big(\frac{S_\alpha}{\Gamma(1-\alpha)}\Big)^{1/2}\Big(\int_0^\infty\big(1-e^{-t/x^{1/\alpha}}\big)\,\mathrm dB_x\Big)_{t\in[0,1]} \overset{d}{=} S_\alpha^{1/2}\big(Z^{(2)}_\alpha(t)\big)_{t\in[0,1]},$$

where $Z^{(2)}_\alpha$ is a centered Gaussian process with

$$\mathrm{Cov}\big(Z^{(2)}_\alpha(s), Z^{(2)}_\alpha(t)\big) = s^\alpha+t^\alpha-(s+t)^\alpha, \quad s,t>0,$$

as introduced earlier.
The first equality in distribution follows from the self-similarity of Brownian motion and the independence between $D$ and $B$. The second equality follows from stochastic integration by parts. The last step follows from Itô's isometry: we have

$$\int_0^\infty\big(1-e^{-s/x^{1/\alpha}}\big)\big(1-e^{-t/x^{1/\alpha}}\big)\,\mathrm dx = \int_0^\infty(1-e^{-sy})(1-e^{-ty})\,\alpha y^{-1-\alpha}\,\mathrm dy$$
$$= \int_0^\infty\big(1-e^{-sy}-e^{-ty}+e^{-(s+t)y}\big)\,\alpha y^{-\alpha-1}\,\mathrm dy = \Gamma(1-\alpha)\big(s^\alpha+t^\alpha-(s+t)^\alpha\big).$$

To make the above argument rigorous, the key step is the following.

Proposition 3.3. With $(\epsilon_n)_{n\in\mathbb N}$ satisfying (3.2), $\alpha\in(0,1)$, and $\theta=0$,

$$\mathcal L\Big(\big(\widetilde Y_n(t)\big)_{t\in[0,1]}\,\Big|\,\mathcal P_n\Big) \xrightarrow{\text{a.s.w.}} \mathcal L\Big(\big(S_\alpha^{1/2}Z^{(2)}_\alpha(t)\big)_{t\in[0,1]}\,\Big|\,\mathcal P\Big)$$

as $n\to\infty$ in $D[0,1]$, where $Z^{(2)}_\alpha$ is independent of $\mathcal P$.

The first step is to prove the following.

Lemma 3.4. With $(\epsilon_n)_{n\in\mathbb N}$ satisfying (3.2), $\alpha\in(0,1)$, and $\theta=0$,

$$\mathcal L\Bigg(\Big(\frac{N(D(nt)^\alpha) - D(nt)^\alpha}{n^{\alpha/2}}\Big)_{t\in[0,\infty)}\,\Bigg|\,\mathcal P_n\Bigg) \xrightarrow{\text{a.s.w.}} \mathcal L\Big(D^{1/2}(B_{t^\alpha})_{t\in[0,\infty)}\,\Big|\,\mathcal P\Big)$$

in $D[0,\infty)$, where on the right-hand side $D=\Gamma(1-\alpha)^{-1}S_\alpha$ and the Brownian motion $B$ is independent of $\mathcal P$.

Proof. It suffices to prove the convergence in the function space $D[0,K]$ for any fixed $K$ [7], and for the sake of simplicity we consider $K=1$. Introduce

$$D_n := \Bigg(\sum_{j:\Gamma_j\le\epsilon_n n^\alpha}\Gamma_j^{-1/\alpha}\Bigg)^{-\alpha}. \tag{3.4}$$

So $D_n\downarrow D$ almost surely. Note also that $D_n$ is $\mathcal P_n$-measurable. To simplify the notation, write

$$N_n(t) := \frac{N(n^\alpha t) - n^\alpha t}{n^{\alpha/2}}.$$

So

$$\widetilde Y_n(t) = \int_0^\infty e^{-x}N_n(D(t/x)^\alpha)\,\mathrm dx.$$

We start by writing

$$N_n(Dt^\alpha) = N_n(D_nt^\alpha+\epsilon_n) - N_n(\epsilon_n) + \Big[N_n(Dt^\alpha) - N_n(D_nt^\alpha) + N_n(D_nt^\alpha) - N_n(D_nt^\alpha+\epsilon_n) + N_n(\epsilon_n)\Big] =: N_n(D_nt^\alpha+\epsilon_n) - N_n(\epsilon_n) + R_n(t^\alpha). \tag{3.5}$$

The key observation for the above decomposition is that for $n$ large enough, $D_n<\epsilon_n n^\alpha$, and hence all the arrival times after $\epsilon_n n^\alpha$ are independent of $\mathcal P_n$ and hence of $D_n$. Set

$$\big(\widehat N_n(t)\big)_{t\ge0} := \big(N_n(\epsilon_n+t) - N_n(\epsilon_n)\big)_{t\ge0}.$$

This is a centered and rescaled standard Poisson process, independent of $\mathcal P_n$. So the decomposition (3.5) becomes

$$N_n(Dt^\alpha) = \widehat N_n(D_nt^\alpha) + R_n(t^\alpha), \quad t\ge0.$$

Because of the independence between $\widehat N_n$ and $D_n$, it follows immediately that

$$\mathcal L\Big(\big(\widehat N_n(D_nt^\alpha)\big)_{t\in[0,1]}\,\Big|\,\mathcal P_n\Big) \xrightarrow{\text{a.s.w.}} \mathcal L\big(D^{1/2}(B_{t^\alpha})_{t\in[0,1]}\,\big|\,\mathcal P\big), \quad\text{as } n\to\infty.$$

Therefore, to conclude the proof it remains to prove the following:

$$\limsup_{n\to\infty}\mathbb P\Bigg(\sup_{t\in[0,1]}|R_n(t^\alpha)|>\eta\,\Bigg|\,\mathcal P_n\Bigg) = 0, \quad\text{for all } \eta>0. \tag{3.6}$$

Set

$$\Delta_n := \sup_{u\in[0,\epsilon_n]}|N_n(u)|, \qquad \widehat N^*_n(t) := \sup_{u\in[0,t]}|\widehat N_n(u)|, \qquad \omega_{\widehat N_n}(t,\delta) := \sup_{u,v\in[0,t],\,|u-v|\le\delta}|\widehat N_n(u)-\widehat N_n(v)|.$$

Note that $\Delta_n$ is $\mathcal P_n$-measurable, and both $\widehat N^*_n(t)$ and $\omega_{\widehat N_n}(t,\delta)$ are independent of $\mathcal P_n$. Recall the law of the iterated logarithm for a standard Poisson process: $\limsup_{n\to\infty}|N(n)-n|/\sqrt{2n\log\log n} = 1$. It then follows that $\lim_{n\to\infty}\Delta_n = 0$ almost surely; this is the step where we need the assumption $\epsilon_n = o((\log\log n)^{-1/2})$. We also have $\widehat N^*_n(t)\Rightarrow\sup_{s\in[0,t]}|B_s|$ by the continuous mapping theorem. Note that $\mathbb P(\omega_{\widehat N_n}(t,\delta)>\eta\mid\mathcal P_n) = \mathbb P(\omega_{N_n}(t,\delta)>\eta)$ (with $\omega_{N_n}$ defined similarly as the modulus of continuity of the process $N_n$) and

$$\lim_{\delta\downarrow0}\limsup_{n\to\infty}\mathbb P\big(\omega_{N_n}(t,\delta)>\eta\big) = 0, \quad\text{for all } \eta>0. \tag{3.7}$$

We did not find an exact reference for the above, but it follows essentially from Donsker's theorem. In the proof of Donsker's theorem, it is shown that the above holds with the supremum replaced by the maximum over $r,s\in\{i/n : i=0,\dots,\lfloor tn\rfloor+1\}$ with $|r-s|\le\delta$.
Then, to show (3.7) it suffices to notice that

$$\sup_{r,s\in[0,t],\,|r-s|\le\delta}|N(nr)-N(ns)| \le \max_{\substack{r,s=i/n,\ i=0,\dots,\lfloor tn\rfloor+1\\ |r-s|\le\delta}}|N(nr)-N(ns)| + 2.$$

To show (3.6), the key estimate is that for every $\delta>0$ the following holds for $n$ large enough:

$$\sup_{t\in[0,1]}|R_n(t^\alpha)| \le 5\Big(\Delta_n + \omega_{\widehat N_n}(\Gamma_1,\delta) + \widehat N^*_n(\delta)\Big). \tag{3.8}$$

Then,

$$\mathbb P\Bigg(\sup_{t\in[0,1]}|R_n(t^\alpha)|>\eta\,\Bigg|\,\mathcal P_n\Bigg) \le \mathbb P\Big(\Delta_n>\frac{\eta}{15}\,\Big|\,\mathcal P_n\Big) + \mathbb P\Big(\omega_{\widehat N_n}(\Gamma_1,\delta)>\frac{\eta}{15}\,\Big|\,\mathcal P_n\Big) + \mathbb P\Big(\widehat N^*_n(\delta)>\frac{\eta}{15}\,\Big|\,\mathcal P_n\Big).$$

By the discussions above, the first conditional probability is nothing but $\mathbf 1_{\{\Delta_n>\eta/15\}}$, which goes to zero almost surely, and the second and third, each denoted by say $\widehat p_n(\delta,\eta)$, both satisfy $\lim_{\delta\downarrow0}\limsup_{n\to\infty}\widehat p_n(\delta,\eta) = 0$ almost surely for all $\eta>0$. Hence (3.6) follows.

The rest of the proof is devoted to the proof of (3.8). First, for all $\delta>0$, for $n$ large enough so that $D_n<\Gamma_1$, $D_n-D<\delta$, $\epsilon_n<\Gamma_1$ (which we assume in the sequel),

$$\sup_{t\in[0,1]}|N_n(Dt^\alpha)-N_n(D_nt^\alpha)| \le \sup_{u,v\in[0,\Gamma_1],\,|u-v|\le D_n-D}|N_n(u)-N_n(v)|$$
$$\le \sup_{u,v\in[0,\epsilon_n+\delta]}|N_n(u)-N_n(v)| \vee \sup_{u,v\in[\epsilon_n,\Gamma_1],\,|u-v|\le\delta}|N_n(u)-N_n(v)|$$
$$\le 2\sup_{u\in[0,\epsilon_n+\delta]}|N_n(u)| \vee \sup_{u,v\in[\epsilon_n,\Gamma_1],\,|u-v|\le\delta}|N_n(u)-N_n(v)|.$$

Notice that

$$\sup_{u,v\in[\epsilon_n,\Gamma_1],\,|u-v|\le\delta}|N_n(u)-N_n(v)| = \sup_{u,v\in[0,\Gamma_1-\epsilon_n],\,|u-v|\le\delta}|\widehat N_n(u)-\widehat N_n(v)| \le \omega_{\widehat N_n}(\Gamma_1,\delta),$$

and

$$\sup_{u\in[0,\epsilon_n+\delta]}|N_n(u)| \le \sup_{u\in[0,\epsilon_n]}|N_n(u)| \vee \sup_{u\in[0,\delta]}\Big(|\widehat N_n(u)|+|N_n(\epsilon_n)|\Big) \le \Delta_n + \widehat N^*_n(\delta). \tag{3.9}$$

Therefore,

$$\sup_{t\in[0,1]}|N_n(Dt^\alpha)-N_n(D_nt^\alpha)| \le 2\Delta_n + 2\widehat N^*_n(\delta) + \omega_{\widehat N_n}(\Gamma_1,\delta). \tag{3.10}$$
Next,
\begin{align*}
\sup_{t\in[0,1]} |N_n(D_n t^{\alpha}) - N_n(D_n t^{\alpha} + \epsilon_n)|
&\le \sup_{\substack{u,v\in[0,\Gamma_1+\epsilon_n]\\ |u-v| = \epsilon_n}} |N_n(u) - N_n(v)| \\
&\le \sup_{\substack{u,v\in[0,2\epsilon_n]\\ |u-v| = \epsilon_n}} |N_n(u) - N_n(v)| \vee \sup_{\substack{u,v\in[\epsilon_n,\Gamma_1+\epsilon_n]\\ |u-v| = \epsilon_n}} |N_n(u) - N_n(v)| \\
&\le 2\sup_{u\in[0,2\epsilon_n]} |N_n(u)| + \omega_{\widehat N_n}(\Gamma_1,\epsilon_n) \\
&\le 2\Delta_n + 2\widehat N_n^*(\epsilon_n) + \omega_{\widehat N_n}(\Gamma_1,\epsilon_n). \tag{3.11}
\end{align*}
We have used (3.9) in the last inequality again. Combining (3.9), (3.10), (3.11), and the definition of $R_n(t^{\alpha})$ in (3.5), we have thus proved (3.8). □

Proof of Proposition 3.3. We first prove the convergence of the finite-dimensional distributions of $(\widetilde Y_n(t))_{t\in[0,1]}$; that is, for all $k\in\mathbb{N}$, $t_1,\dots,t_k\in[0,1]$,
\[
\big(\widetilde Y_n(t_j)\big)_{j=1,\dots,k} \Rightarrow \Big(D\int_0^{\infty} e^{-x} B_{(t_j/x)^{\alpha}}\,dx\Big)_{j=1,\dots,k}, \quad \text{as } n\to\infty. \tag{3.12}
\]
We continue to use the notations introduced in Lemma 3.4. We decompose $\widetilde Y_n(t)$, for each $\epsilon\in(0,1)$ (the choice is independent from $(\epsilon_n)_{n\in\mathbb{N}}$), into
\[
\widetilde Y_n(t) = \widetilde Y^{(1)}_{n,\epsilon}(t) + \widetilde Y^{(2)}_{n,\epsilon}(t), \tag{3.13}
\]
with
\[
\widetilde Y^{(1)}_{n,\epsilon}(t) := \int_0^{\epsilon} e^{-x} N_n(D(t/x)^{\alpha})\,dx, \quad\text{and}\quad \widetilde Y^{(2)}_{n,\epsilon}(t) := \int_{\epsilon}^{\infty} e^{-x} N_n(D(t/x)^{\alpha})\,dx.
\]
By Lemma 3.4, it follows that
\[
\mathcal{L}\Big(\big(\widetilde Y^{(2)}_{n,\epsilon}(t_j)\big)_{j=1,\dots,k} \,\Big|\, \mathcal{P}_n\Big) \xrightarrow{\mathrm{a.s.w.}} \mathcal{L}\Big(D\Big(\int_{\epsilon}^{\infty} e^{-x} B_{(t_j/x)^{\alpha}}\,dx\Big)_{j=1,\dots,k} \,\Big|\, \mathcal{P}\Big),
\]
by the continuous mapping theorem. It is clear that the right-hand side converges to the claimed limit as $\epsilon\downarrow 0$. Then, the claimed convergence follows by a triangular array argument (e.g. [7, Theorem 3.2]) if we can prove
\[
\lim_{\epsilon\downarrow 0}\limsup_{n\to\infty} \mathbb{P}\Big(|\widetilde Y^{(1)}_{n,\epsilon}(t)| \ge \eta \,\Big|\, \mathcal{P}_n\Big) = 0, \quad\text{almost surely, for all } \eta > 0.
\]
We continue to rely on the decomposition involving $\epsilon_n$ and the estimates on $\Delta_n$, $\widehat N_n$ in the proof of Lemma 3.4.
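Indeed, by (3.9) (as in the proof therein, in the sequel inequalities hold for $n$ large enough),
\[
\sup_{s\in[0,t]} |N_n(Ds^{\alpha})| \le \sup_{u\in[0,\Gamma_1 t]} |N_n(u)| \le \Delta_n + \widehat N_n^*(\Gamma_1 t).
\]
Thus,
\[
\Big|\int_0^{\epsilon} e^{-x} N_n(D(t/x)^{\alpha})\,dx\Big| \le \epsilon\Delta_n + \int_0^{\epsilon} e^{-x}\,\widehat N_n^*(\Gamma_1 (t/x)^{\alpha})\,dx.
\]
Therefore,
\[
\mathbb{P}\Big(|\widetilde Y^{(1)}_{n,\epsilon}(t)| > \eta \,\Big|\, \mathcal{P}_n\Big) \le \mathbb{P}\Big(\epsilon\Delta_n + \int_0^{\epsilon} e^{-x}\,\widehat N_n^*(\Gamma_1 (t/x)^{\alpha})\,dx > \eta \,\Big|\, \mathcal{P}_n\Big).
\]
Since $\Delta_n\to 0$ almost surely, it suffices to control, writing $\mathbb{E}_{\mathcal{P}_n}(\cdot) = \mathbb{E}(\cdot\mid\mathcal{P}_n)$,
\begin{align*}
\mathbb{P}\Big(\int_0^{\epsilon} e^{-x}\,\widehat N_n^*(\Gamma_1 (t/x)^{\alpha})\,dx > \eta \,\Big|\, \mathcal{P}_n\Big)
&\le \frac{1}{\eta^2}\,\mathbb{E}\Big(\Big(\int_0^{\epsilon} e^{-x}\,\widehat N_n^*(\Gamma_1 (t/x)^{\alpha})\,dx\Big)^2 \,\Big|\, \mathcal{P}_n\Big) \\
&\le \frac{\epsilon}{\eta^2}\int_0^{\epsilon} \mathbb{E}\Big(\widehat N_n^*(\Gamma_1 (t/x)^{\alpha})^2 \,\Big|\, \mathcal{P}_n\Big)\,dx.
\end{align*}
Recall the definition of $\widehat N_n^*$. By Doob's martingale inequality,
\[
\mathbb{E}\,\widehat N_n^*(t)^2 \le 4\,\mathbb{E}\,\widehat N_n(t)^2 = 4\,\mathbb{E}\Big(\frac{N(tn^{\alpha}) - tn^{\alpha}}{n^{\alpha/2}}\Big)^2 = 4t,
\]
whence $\mathbb{E}(\widehat N_n^*(\Gamma_1 (t/x)^{\alpha})^2 \mid \mathcal{P}_n) \le 4\Gamma_1 (t/x)^{\alpha}$ almost surely. That is, for all $\eta > 0$,
\[
\mathbb{P}\Big(\int_0^{\epsilon} e^{-x}\,\widehat N_n^*(\Gamma_1 (t/x)^{\alpha})\,dx > \eta \,\Big|\, \mathcal{P}_n\Big) \le \epsilon^{2-\alpha}\,\frac{4\Gamma_1 t^{\alpha}}{\eta^2(1-\alpha)}. \tag{3.14}
\]
We have thus proved (3.12).

It now remains to show the tightness of $(\widetilde Y_n(t))_{t\in[0,1]}$. Recall the decomposition of $\widetilde Y_n(t)$ in (3.13). The tightness will then follow from the following assertions:
\begin{align*}
\lim_{\delta\downarrow 0}\limsup_{n\to\infty} \mathbb{P}\Big(\sup_{s,t\in[0,1],\,|s-t|\le\delta} |\widetilde Y^{(1)}_{n,\epsilon}(t) - \widetilde Y^{(1)}_{n,\epsilon}(s)| > \eta \,\Big|\, \mathcal{P}_n\Big) &\le C\,\Gamma_1\,\epsilon^{2-\alpha}, \tag{3.15} \\
\lim_{\delta\downarrow 0}\limsup_{n\to\infty} \mathbb{P}\Big(\sup_{s,t\in[0,1],\,|s-t|\le\delta} |\widetilde Y^{(2)}_{n,\epsilon}(t) - \widetilde Y^{(2)}_{n,\epsilon}(s)| > \eta \,\Big|\, \mathcal{P}_n\Big) &= 0, \tag{3.16}
\end{align*}
for all $\epsilon > 0$, $\eta > 0$. This time,
\begin{align*}
\sup_{\substack{s,t\in[0,1]\\ |s-t|\le\delta}} |\widetilde Y^{(1)}_{n,\epsilon}(t) - \widetilde Y^{(1)}_{n,\epsilon}(s)|
&\le \sup_{\substack{s,t\in[0,1]\\ |s-t|\le\delta}} \int_0^{\epsilon} e^{-x}\,|N_n(D(t/x)^{\alpha}) - N_n(D(s/x)^{\alpha})|\,dx \\
&\le 2\int_0^{\epsilon} e^{-x} \sup_{t\in[0,1]} |N_n(D(t/x)^{\alpha})|\,dx \\
&\le 2\epsilon\Delta_n + 2\int_0^{\epsilon} \widehat N_n^*(\Gamma_1 x^{-\alpha})\,dx,
\end{align*}
which yields (3.15) (by an argument similar to the one around (3.14)).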
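The constant in (3.14), which is reused for (3.15), comes from Chebyshev's inequality, the Cauchy–Schwarz inequality on $[0,\epsilon]$, and Doob's bound. The following worked computation is a sketch of that bookkeeping; it uses only quantities defined above, plus the crude bound $e^{-2x}\le 1$, which is our simplification and is not stated explicitly in the text.

```latex
\begin{align*}
\mathbb{E}\Big(\Big(\int_0^{\epsilon} e^{-x}\,\widehat N_n^*(\Gamma_1 (t/x)^{\alpha})\,dx\Big)^2 \,\Big|\, \mathcal{P}_n\Big)
&\le \epsilon \int_0^{\epsilon} e^{-2x}\,\mathbb{E}\Big(\widehat N_n^*(\Gamma_1 (t/x)^{\alpha})^2 \,\Big|\, \mathcal{P}_n\Big)\,dx
 && \text{(Cauchy--Schwarz)} \\
&\le \epsilon \int_0^{\epsilon} 4\Gamma_1 (t/x)^{\alpha}\,dx
 && \text{(Doob's bound and } e^{-2x}\le 1\text{)} \\
&= 4\Gamma_1 t^{\alpha}\,\epsilon \cdot \frac{\epsilon^{1-\alpha}}{1-\alpha}
 = \frac{4\Gamma_1 t^{\alpha}}{1-\alpha}\,\epsilon^{2-\alpha}.
\end{align*}
```

Dividing by $\eta^2$ gives the right-hand side of (3.14). The integral $\int_0^{\epsilon} x^{-\alpha}\,dx$ is finite precisely because $\alpha < 1$, and the resulting power $\epsilon^{2-\alpha}$ vanishes as $\epsilon\downarrow 0$.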
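Next,
\begin{align*}
\sup_{x\ge\epsilon}\sup_{\substack{s,t\in[0,1]\\ |s-t|\le\delta}} |N_n(D(t/x)^{\alpha}) - N_n(D(s/x)^{\alpha})|
&\le \sup_{x\ge\epsilon}\sup_{\substack{u,v\in[0,\Gamma_1 x^{-\alpha}]\\ |u-v|\le\Gamma_1(\delta/x)^{\alpha}}} |N_n(u) - N_n(v)| \\
&\le \sup_{\substack{u,v\in[0,\epsilon_n+\Gamma_1(\delta/\epsilon)^{\alpha}]\\ |u-v|\le\Gamma_1(\delta/\epsilon)^{\alpha}}} |N_n(u) - N_n(v)| \vee \sup_{\substack{u,v\in[\epsilon_n,\Gamma_1\epsilon^{-\alpha}]\\ |u-v|\le\Gamma_1(\delta/\epsilon)^{\alpha}}} |N_n(u) - N_n(v)| \\
&\le 2\Delta_n + 2\widehat N_n^*(\Gamma_1(\delta/\epsilon)^{\alpha}) + \omega_{\widehat N_n}\big(\Gamma_1\epsilon^{-\alpha},\Gamma_1(\delta/\epsilon)^{\alpha}\big).
\end{align*}
That is,
\[
\sup_{\substack{s,t\in[0,1]\\ |s-t|\le\delta}} \int_{\epsilon}^{\infty} e^{-x}\,|N_n(D(t/x)^{\alpha}) - N_n(D(s/x)^{\alpha})|\,dx \le 2\Delta_n + 2\widehat N_n^*(\Gamma_1(\delta/\epsilon)^{\alpha}) + \omega_{\widehat N_n}\big(\Gamma_1\epsilon^{-\alpha},\Gamma_1(\delta/\epsilon)^{\alpha}\big).
\]
This time,
\[
\mathbb{P}\Big(2\widehat N_n^*(\Gamma_1(\delta/\epsilon)^{\alpha}) > \eta \,\Big|\, \mathcal{P}_n\Big) \le \frac{8\,\mathbb{E}\big(\widehat N_n(\Gamma_1(\delta/\epsilon)^{\alpha})^2 \mid \mathcal{P}_n\big)}{\eta^2} = \frac{8}{\eta^2}\,\Gamma_1(\delta/\epsilon)^{\alpha}.
\]
Therefore, taking the limsup as $n\to\infty$ first and then $\delta\downarrow 0$, the above probability vanishes for all $\epsilon,\eta > 0$ fixed. Also,
\[
\lim_{\delta\downarrow 0}\limsup_{n\to\infty} \mathbb{P}\Big(\omega_{\widehat N_n}\big(\Gamma_1\epsilon^{-\alpha},\Gamma_1(\delta/\epsilon)^{\alpha}\big) > \eta \,\Big|\, \mathcal{P}_n\Big) = 0, \quad\text{almost surely},
\]
by standard estimates of the modulus of continuity of the Poisson process. Now (3.16) follows. □

3.2. De-Poissonization. In this section, we transfer this result from the Poissonized model $(\widetilde Y_n(t))_{t\in[0,1]}$ to the original model $(Y_n(t))_{t\in[0,1]}$. This procedure is often referred to as de-Poissonization.

Proof of Proposition 3.2 with $\theta = 0$. Recall that we have proved the following convergence in Proposition 3.3:
\[
\mathcal{L}\Big(\big(\widetilde Y_n(t)\big)_{t\in[0,1]} \,\Big|\, \mathcal{P}_n\Big) \xrightarrow{\mathrm{a.s.w.}} \mathcal{L}\Big(S_{\alpha}^{1/2}\big(Z_{\alpha}(t)\big)_{t\in[0,1]} \,\Big|\, \mathcal{P}\Big)
\]
in $D[0,1]$. Let us denote $(Y(t))_{t\in[0,1]} := S_{\alpha}^{1/2}(Z_{\alpha}(t))_{t\in[0,1]}$. Our goal is to show that
\[
\mathcal{L}\Big((Y_n(t))_{t\in[0,1]} \,\Big|\, \mathcal{P}_n\Big) \xrightarrow{\mathrm{a.s.w.}} \mathcal{L}\Big((Y(t))_{t\in[0,1]} \,\Big|\, \mathcal{P}\Big).
\]
Let us recall
\[
\widetilde Y_n(t) = \frac{\mathbb{E}(\widetilde K(nt)\mid\mathcal{P}) - \Gamma(1-\alpha)D(nt)^{\alpha}}{n^{\alpha/2}}, \qquad
Y_n(t) = \frac{\mathbb{E}(K_{\lfloor nt\rfloor}\mid\mathcal{P}) - \Gamma(1-\alpha)D\lfloor nt\rfloor^{\alpha}}{n^{\alpha/2}}.
\]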
(3.17)

We start by defining the random time change
\[
\lambda_n(t) := \frac{\Gamma_{\lfloor nt\rfloor}}{n}, \quad\text{with } \Gamma_0 := 0,
\]
and set $\lambda_n^*(t) := \min(\lambda_n(t),1)$. It is well-known that the two models can be coupled such that $\widetilde K(n\lambda_n(t)) = K_{\lfloor nt\rfloor}$ almost surely, and in particular
\[
\widetilde Y_n(n\lambda_n(t)) = \frac{\mathbb{E}(K_{\lfloor nt\rfloor}\mid\mathcal{P}) - \Gamma(1-\alpha)D(n\lambda_n(t))^{\alpha}}{n^{\alpha/2}}. \tag{3.18}
\]
We have already established the quenched convergence of $\widetilde Y_n(t)$. At the same time, since $\lim_{n\to\infty}\sup_{t\in[0,1]} |\lambda_n^*(t) - t| = 0$ almost surely, by the dominated convergence theorem for conditional expectations [14, Theorem 5.5.9] (if $\lim_{n\to\infty} A_n = 0$ and $\sup_{n\in\mathbb{N}} |A_n| \le C < \infty$ almost surely, then $\lim_{n\to\infty}\mathbb{E}(A_n\mid\mathcal{P}_n) = 0$ almost surely),
\[
\lim_{n\to\infty} \mathbb{P}\Big(\sup_{t\in[0,1]} |\lambda_n^*(t) - t| > \epsilon \,\Big|\, \mathcal{P}_n\Big) = 0 \quad\text{almost surely}.
\]
That is, $\lambda_n^* \xrightarrow{\mathrm{a.s.w.}} I$, where $I$ denotes the identity function. Essentially by the same proof as for Slutsky's lemma, we then have
\[
\mathcal{L}\Big(\big(\widetilde Y_n(t)\big)_{t\in[0,1]},\,(\lambda_n^*(t))_{t\in[0,1]} \,\Big|\, \mathcal{P}_n\Big) \xrightarrow{\mathrm{a.s.w.}} \mathcal{L}\Big((Y(t))_{t\in[0,1]},\,I \,\Big|\, \mathcal{P}\Big).
\]
By the change-of-time lemma [7],
\[
\mathcal{L}\Big(\big(\widetilde Y_n(n\lambda_n^*(t))\big)_{t\in[0,1]} \,\Big|\, \mathcal{P}_n\Big) \xrightarrow{\mathrm{a.s.w.}} \mathcal{L}\Big((Y(t))_{t\in[0,1]} \,\Big|\, \mathcal{P}\Big).
\]
(The statement of the lemma concerns weak convergence only, although what is proved is that the map corresponding to $(\widetilde Y_n,\lambda_n^*) \mapsto (\widetilde Y_n(n\lambda_n(t)))_{t\in[0,1]}$ is continuous when restricted to the support of the measure induced by $(Y,I)$.) One can go further as in [13, Proof of Lemma 4.3] to conclude that
\[
\mathcal{L}\Big(\big(\widetilde Y_n(n\lambda_n(t))\big)_{t\in[0,1]} \,\Big|\, \mathcal{P}_n\Big) \xrightarrow{\mathrm{a.s.w.}} \mathcal{L}\Big((Y(t))_{t\in[0,1]} \,\Big|\, \mathcal{P}\Big).
\]
The only step that needs to be modified is that now we need to show, for all $\eta\in(0,1)$,
\[
\mathbb{P}\big(\lambda_n^* \ne \lambda_n \text{ on } [0,1-\eta] \,\big|\, \mathcal{P}_n\big) \le \mathbb{P}\big(\Gamma_{\lfloor n(1-\eta)\rfloor} > n \,\big|\, \mathcal{P}_n\big) \to 0 \quad\text{almost surely}.
\]
This essentially follows from the large deviations of the Poisson process and the fact that the impact of the conditioning is negligible; we omit the details.

Now, in order to conclude the proof, we need to show that the difference between $\widetilde Y_n(n\lambda_n(t))$ and $Y_n(t)$ is negligible. The difference is only due to the centering terms in (3.17) and (3.18). That is, it remains to show that for all $\epsilon > 0$,
\[
\lim_{n\to\infty} \mathbb{P}\big(\widehat d_n > \epsilon \,\big|\, \mathcal{P}_n\big) = 0 \text{ in probability}, \quad\text{with}\quad \widehat d_n := \sup_{t\in[0,1]} \frac{|(n\lambda_n(t))^{\alpha} - (nt)^{\alpha}|}{n^{\alpha/2}}. \tag{3.19}
\]
We shall in fact show that $\lim_{n\to\infty}\widehat d_n = 0$ almost surely, which implies the above by the dominated convergence theorem for conditional expectations (actually, in the almost sure sense). To analyze $\widehat d_n$, we break the interval $[0,1]$ into $[0,n^{-\beta}]$ and $[n^{-\beta},1]$ for some $\beta$ to be determined later. First, by the strong law of large numbers, for any $\beta > 1/2$,
\[
n^{\alpha/2}\sup_{t\in[0,n^{-\beta}]} \Big|\Big(\frac{\Gamma_{\lfloor nt\rfloor}}{n}\Big)^{\alpha} - t^{\alpha}\Big| \le \frac{(\Gamma_{\lfloor n^{1-\beta}\rfloor})^{\alpha}}{n^{\alpha/2}} + n^{\alpha/2-\alpha\beta} \to 0 \tag{3.20}
\]
as $n\to\infty$ almost surely. We focus now on showing
\[
\sup_{t\in[n^{-\beta},1]} \Big|\Big(\frac{\Gamma_{\lfloor nt\rfloor}}{n}\Big)^{\alpha} - t^{\alpha}\Big| = o(n^{-\alpha/2}) \quad\text{almost surely}.
\]
Let $g(t) = t^{\alpha}$. By Taylor expansion,
\[
\Big(\frac{\Gamma_{\lfloor nt\rfloor}}{n}\Big)^{\alpha} - t^{\alpha} = \Big(\frac{\Gamma_{\lfloor nt\rfloor}}{n} - t\Big)\,g'(\omega_{n,t}),
\]
where $\omega_{n,t}$ lies between $t$ and $\Gamma_{\lfloor nt\rfloor}/n$. Recall the law of the iterated logarithm: $\limsup_{n\to\infty} |\Gamma_n - n|/\sqrt{2n\log\log n} = 1$ almost surely. By the law of large numbers, there exist strictly positive constants $c_1, c_2$ such that $c_1 < \inf_{t\in[n^{-\beta},1]}\omega_{n,t}/t \le \sup_{t\in[n^{-\beta},1]}\omega_{n,t}/t < c_2$ for $n$ large enough. So, since $g'(\omega_{n,t}) = \alpha\,\omega_{n,t}^{\alpha-1}$ is then comparable to $t^{\alpha-1}$, we obtain
\[
\sup_{t\in[n^{-\beta},1]} \Big|\Big(\frac{\Gamma_{\lfloor nt\rfloor}}{n}\Big)^{\alpha} - t^{\alpha}\Big| \le \frac{C}{n}\sup_{t\in[n^{-\beta},1]} \big|\Gamma_{\lfloor nt\rfloor} - \lfloor nt\rfloor + \lfloor nt\rfloor - nt\big|\,t^{\alpha-1} \le C n^{-1/2}(\log\log n)^{1/2}\sup_{t\in[n^{-\beta},1]} t^{\alpha-1/2}.
\]
If $\alpha \ge 1/2$ then the above is of order $o(n^{-\alpha/2})$ almost surely as desired. So assume $\alpha\in(0,1/2)$.
Then, the right-hand side above is of order $O(n^{-1/2}(\log\log n)^{1/2}\,n^{\beta(1/2-\alpha)})$, and for it to be of order $o(n^{-\alpha/2})$ it suffices to take $\beta < (1-\alpha)/(1-2\alpha)$, since $-1/2 + \beta(1/2-\alpha) < -\alpha/2$ is equivalent to $\beta(1-2\alpha) < 1-\alpha$. That is,
\[
\sup_{t\in[n^{-\beta},1]} \Big|\Big(\frac{\Gamma_{\lfloor nt\rfloor}}{n}\Big)^{\alpha} - t^{\alpha}\Big| = o(n^{-\alpha/2}) \quad\text{almost surely,} \tag{3.21}
\]
for all $\alpha\in[1/2,1)$, $\beta > 0$, or $\alpha\in(0,1/2)$, $\beta\in\big(0,\frac{1-\alpha}{1-2\alpha}\big)$.

Thus, combining (3.20) and (3.21) with $\beta > 1/2$ if $\alpha\in[1/2,1)$, or $\beta\in(1/2,(1-\alpha)/(1-2\alpha))$ if $\alpha\in(0,1/2)$ (this interval is nonempty because $(1-\alpha)/(1-2\alpha) > 1 > 1/2$), we have proved $\lim_{n\to\infty}\widehat d_n = 0$ almost surely and hence (3.19). □

3.3. Joint convergence. With the quenched convergence of $W_n$ and the convergence of $Y_n$ established, we are ready to prove the main result.

Proof of Theorem 3.1. Recall that we write $W_n = (W_n(t))_{t\in[0,1]}$ and $Y_n = (Y_n(t))_{t\in[0,1]}$. Write similarly $Z = Z^{(1)}_{\alpha} = (Z^{(1)}_{\alpha}(t))_{t\in[0,1]}$, $Z' = Z^{(2)}_{\alpha} = (Z^{(2)}_{\alpha}(t))_{t\in[0,1]}$, and furthermore $S \equiv S_{\alpha}^{1/2}$. Write also $\mathbb{E}_{\mathcal{P}}(\cdot) = \mathbb{E}(\cdot\mid\mathcal{P})$. Throughout, $S_{\alpha}$, defined as in (1.2), is on the same probability space as $W_n$, $Y_n$, and is $\mathcal{P}$-measurable. In the sequel we assume further that $Z$, $Z'$ are on the same probability space, that the two are mutually independent, and that they are also independent from $\mathcal{P}$. The quenched convergence $W_n \xrightarrow{\mathrm{a.s.w.}} S_{\alpha}^{1/2} Z^{(1)}_{\alpha}$ with respect to $\mathcal{P}$ as $n\to\infty$ (recall (1.8)) is equivalent to
\[
\lim_{n\to\infty} \mathbb{E}_{\mathcal{P}} f(W_n) = \mathbb{E}_{\mathcal{P}} f(SZ), \quad\text{almost surely},
\]
for all continuous and bounded functions $f: D[0,1]\to\mathbb{R}$. Similarly, Proposition 3.2 is the same as
\[
\lim_{n\to\infty} \mathbb{E}_{\mathcal{P}_n} g(Y_n) = \mathbb{E}_{\mathcal{P}} g(SZ'),
\]
for all continuous and bounded functions $g$. Now, to prove Theorem 3.1 it suffices to show that
\begin{align*}
\mathbb{E}_{\mathcal{P}_n}\big(f(W_n)g(Y_n)\big) - \mathbb{E}_{\mathcal{P}}\big(f(SZ)g(SZ')\big)
&= \mathbb{E}_{\mathcal{P}_n}\big(f(W_n)g(Y_n)\big) - \mathbb{E}_{\mathcal{P}_n}\big(f(SZ)g(Y_n)\big) \\
&\quad + \mathbb{E}_{\mathcal{P}_n}\big(f(SZ)g(Y_n)\big) - \mathbb{E}_{\mathcal{P}}\big(f(SZ)g(SZ')\big) \tag{3.22}
\end{align*}
tends to zero as $n\to\infty$ for all $f$, $g$ as above [31, Corollary 1.4.5].
The absolute value of the first difference on the right-hand side above is the same as
\[
\big|\mathbb{E}_{\mathcal{P}_n}\big(g(Y_n)(f(W_n) - f(SZ))\big)\big| = \big|\mathbb{E}_{\mathcal{P}_n}\big(g(Y_n)\,\mathbb{E}_{\mathcal{P}}(f(W_n) - f(SZ))\big)\big| \le \|g\|_{\infty}\,\mathbb{E}_{\mathcal{P}_n}\big|\mathbb{E}_{\mathcal{P}}(f(W_n) - f(SZ))\big| \to 0,
\]
where we applied the dominated convergence theorem for conditional expectations. Recall $S \equiv S_{\alpha}^{1/2} = (\Gamma(1-\alpha)D)^{1/2}$. Introduce $S_n = (\Gamma(1-\alpha)D_n)^{1/2}$ (recall $D_n\in\mathcal{P}_n$ introduced in (3.4)). For the second term on the right-hand side of (3.22), we decompose further as
\begin{align*}
\mathbb{E}_{\mathcal{P}_n}\big(f(SZ)g(Y_n)\big) - \mathbb{E}_{\mathcal{P}}\big(f(SZ)g(SZ')\big)
&= \mathbb{E}_{\mathcal{P}_n}\big(f(SZ)g(Y_n)\big) - \mathbb{E}_{\mathcal{P}_n}\big(f(S_nZ)g(Y_n)\big) \\
&\quad + \mathbb{E}_{\mathcal{P}_n} f(S_nZ)\,\mathbb{E}_{\mathcal{P}_n} g(Y_n) - \mathbb{E}_{\mathcal{P}_n} f(S_nZ)\,\mathbb{E}_{\mathcal{P}} g(SZ') \\
&\quad + \mathbb{E}_{\mathcal{P}} f(S_nZ)\,\mathbb{E}_{\mathcal{P}} g(SZ') - \mathbb{E}_{\mathcal{P}} f(SZ)\,\mathbb{E}_{\mathcal{P}} g(SZ').
\end{align*}
Respectively, we have for the first difference,
\[
\big|\mathbb{E}_{\mathcal{P}_n}\big(f(SZ)g(Y_n)\big) - \mathbb{E}_{\mathcal{P}_n}\big(f(S_nZ)g(Y_n)\big)\big| \le \|g\|_{\infty}\,\mathbb{E}_{\mathcal{P}_n}|f(SZ) - f(S_nZ)| \to 0
\]
almost surely, the second difference goes to zero thanks to Proposition 3.2, and the third difference goes to zero by the fact that $S_n\to S$ and again the dominated convergence theorem. This completes the proof. □

References

[1] Arratia, R., Barbour, A. D., and Tavaré, S. (2003). Logarithmic combinatorial structures: a probabilistic approach. EMS Monographs in Mathematics. European Mathematical Society (EMS), Zürich.
[2] Bahadur, R. R. (1960). On the number of distinct values in a large sample from an infinite discrete distribution. Proc. Nat. Inst. Sci. India Part A, 26(supplement II):67–75.
[3] Bahier, V. and Najnudel, J. (2022). On smooth mesoscopic linear statistics of the eigenvalues of random permutation matrices. J. Theoret. Probab., 35(3):1640–1661.
[4] Basrak, B. (2025). On generalized arcsine laws and residual allocation models. arXiv preprint
[5] Ben Arous, G. and Dang, K. (2015).
On fluctuations of eigenvalues of random permutation matrices. Ann. Inst. Henri Poincaré Probab. Stat., 51(2):620–647.
[6] Bercu, B. and Favaro, S. (2024). A martingale approach to Gaussian fluctuations and laws of iterated logarithm for Ewens-Pitman model. Stochastic Process. Appl., 178:Paper No. 104493, 19.
[7] Billingsley, P. (1999). Convergence of probability measures. Wiley Series in Probability and Statistics: Probability and Statistics. John Wiley & Sons Inc., New York, second edition. A Wiley-Interscience Publication.
[8] Broderick, T., Jordan, M. I., and Pitman, J. (2012). Beta processes, stick-breaking and power laws. Bayesian Anal., 7(2):439–475.
[9] Chebunin, M. and Kovalevskii, A. (2016). Functional central limit theorems for certain statistics in an infinite urn scheme. Statist. Probab. Lett., 119:344–348.
[10] Contardi, C., Dolera, E., and Favaro, S. (2025). Laws of large numbers and central limit theorem for Ewens-Pitman model. Electron. J. Probab., 30:Paper No. 193, 51.
[11] Crane, H. (2016). The ubiquitous Ewens sampling formula. Statist. Sci., 31(1):1–19.
[12] Darling, D. A. (1967). Some limit theorems associated with multinomial trials. In Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66), Vol. II: Contributions to Probability Theory, Part 1, pages 345–350. Univ. California Press, Berkeley, CA.
[13] Durieu, O. and Wang, Y. (2016). From infinite urn schemes to decompositions of self-similar Gaussian processes. Electron. J. Probab., 21:Paper No. 43, 23.
[14] Durrett, R. (2010). Probability: theory and examples. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, fourth edition.
[15] Favaro, S., Feng, S., and Paguyo, J. (2025). Asymptotic behavior of clusters in hierarchical species sampling models. arXiv preprint
[16] Feng, S. (2010).
The Poisson-Dirichlet distribution and related topics. Probability and its Applications (New York). Springer, Heidelberg. Models and asymptotic behaviors.
[17] Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist., 1:209–230.
[18] François, Q. (2025). Characteristic polynomial of generalized Ewens random permutations. Electron. Commun. Probab., 30:Paper No. 97, 12.
[19] Fu, Z. and Wang, Y. (2020). Stable processes with stationary increments parameterized by metric spaces. J. Theoret. Probab., 33(3):1737–1754.
[20] Garza, J. and Wang, Y. (2024). Limit theorems for random permutations induced by Chinese restaurant processes. arXiv preprint.
[21] Garza, J. and Wang, Y. (2025). A functional central limit theorem for weighted occupancy processes of the Karlin model. Stochastic Process. Appl., 188:Paper No. 104665.
[22] Gnedin, A., Hansen, B., and Pitman, J. (2007). Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws. Probab. Surv., 4:146–171.
[23] Grübel, R. and Kabluchko, Z. (2016). A functional central limit theorem for branching random walks, almost sure weak convergence and applications to random trees. Ann. Appl. Probab., 26(6):3659–3698.
[24] Heyde, C. C. (1977). On central limit and iterated logarithm supplements to the martingale convergence theorem. J. Appl. Probability, 14(4):758–775.
[25] Iksanov, A., Kabluchko, Z., and Kotelnikova, V. (2022). A functional limit theorem for nested Karlin's occupancy scheme generated by discrete Weibull-like distributions. J. Math. Anal. Appl., 507(2):Paper No. 125798, 24.
[26] Karlin, S. (1967). Central limit theorems for certain infinite urn schemes. J. Math. Mech., 17:373–401.
[27] Kingman, J. F. C. (1978). The representation of partition structures. J. London Math. Soc. (2), 18(2):374–380.
[28] Perman, M., Pitman, J., and Yor, M. (1992).
Size-biased sampling of Poisson point processes and excursions. Probab. Theory Related Fields, 92(1):21–39.
[29] Pitman, J. (2006). Combinatorial stochastic processes, volume 1875 of Lecture Notes in Mathematics. Springer-Verlag, Berlin. Lectures from the 32nd Summer School on Probability Theory held in Saint-Flour, July 7–24, 2002, With a foreword by Jean Picard.
[30] Pitman, J. and Yor, M. (1997). The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Ann. Probab., 25(2):855–900.
[31] van der Vaart, A. W. and Wellner, J. A. (1996). Weak convergence and empirical processes: with applications to statistics. Springer Series in Statistics. Springer-Verlag, New York.
[32] Wieand, K. (2000). Eigenvalue distributions of random permutation matrices. Ann. Probab., 28(4):1563–1587.

Department of Mathematical Sciences, University of Cincinnati, 2815 Commons Way, Cincinnati, OH, 45221-0025, USA.
Email address: yizao.wang@uc.edu
