Sharp Threshold for the Convergence of Nonstationary Averaging


Authors: Saba Lepsveridze, Elchanan Mossel

Abstract

We study non-stationary averaging processes, where each term of a sequence is a weighted average of previous terms, namely $a_{n+1} = \sum_{j=1}^n p_n(j) a_j$. Our results extend classical theory in two distinct regimes. First, we prove a sharp threshold for convergence in the regime where the weights are bounded between two envelopes, $(\log n)^{-\alpha} \le n\,p_n(\cdot) \le (\log n)^{\beta}$. We show that the sequence necessarily converges when $\alpha + \beta/2 \le 1$, while for $\alpha + \beta/2 > 1$ convergence can fail. Second, we study a complementary fixed-shape regime, where $p_n$ is obtained by discretizing a fixed limiting density on $(0, 1)$. We show that under mild regularity assumptions, the sequence converges.

1 Introduction

Weighted running averages show up frequently in probability, dynamics, and optimization. A basic primitive behind these procedures is that the next iterate is a weighted average of the past. While classical theory understands these processes well when the weights are constant or uniform, much less is known when the averaging weights can be non-stationary and potentially spiky.

Let $\{p_n\}_{n=1}^{\infty}$ be a sequence of probability measures with $p_n \in \mathcal{P}([n])$. Consider a sequence of vectors $\{a_n\}_{n=1}^{\infty} \subset \mathbb{R}^d$ defined by
$$a_{n+1} = \mathbb{E}_{j \sim p_n} a_j = \sum_{j=1}^n p_n(j)\, a_j \quad \text{for } n \ge k,$$
and an arbitrary initialization $a_1, \dots, a_k \in \mathbb{R}^d$. We ask:

Question 1. Under what conditions on $\{p_n\}_{n=1}^{\infty}$ does the sequence $\{a_n\}_{n=1}^{\infty}$ necessarily converge?

A natural global constraint is to bound each weight between two envelopes,
$$\frac{f(n)}{n} \le p_n(j) \le \frac{c(n)}{n} \quad \text{for all } j \in [n],$$
where $f(n)$ and $c(n)$ describe a floor and a ceiling, respectively. This allows decay of $f(n)$ and blow-up of $c(n)$, which is far from uniform averaging.
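To make the setup concrete, here is a small numerical sketch (our own illustration, not from the paper) that iterates $a_{n+1} = \sum_{j=1}^n p_n(j) a_j$ with weights drawn between envelopes of the form above; note that after normalization the weights satisfy the envelopes only approximately.

```python
import numpy as np

def run_averaging(N=3000, k=10, alpha=0.5, beta=0.5, seed=0):
    """Iterate a_{n+1} = sum_j p_n(j) a_j with weights drawn between the
    envelopes (log n)^-alpha <= n*p_n(j) <= (log n)^beta.  Normalization
    perturbs the envelopes slightly; this is an illustration, not a proof."""
    rng = np.random.default_rng(seed)
    a = np.empty(N)
    a[:k] = rng.uniform(0.0, 1.0, size=k)   # arbitrary initialization in [0, 1]
    for n in range(k, N):
        lo = np.log(n) ** (-alpha) / n       # floor f(n)/n
        hi = np.log(n) ** beta / n           # ceiling c(n)/n
        w = rng.uniform(lo, hi, size=n)
        w /= w.sum()                         # make p_n a probability measure
        a[n] = w @ a[:n]                     # a_{n+1} = E_{j ~ p_n} a_j
    return a

a = run_averaging()
# alpha = beta = 1/2 gives alpha + beta/2 = 3/4 <= 1, the convergent regime
# of Theorem 1; the tail of the trajectory is nearly constant.
print(a[-100:].max() - a[-100:].min())
```

Since every iterate is a convex combination of earlier terms, the trajectory stays in $[0,1]$; the shrinking spread of the tail is the behaviour Theorem 1 guarantees in this parameter range.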
We answer this question by obtaining optimal rates for $f$ and $c$ under which convergence holds.

Theorem 1. Suppose $f(n) = A(\log n)^{-\alpha}$ and $c(n) = B(\log n)^{\beta}$ with $\alpha, \beta > 0$.
- If $\alpha + \beta/2 \le 1$, then for any choice of $\{p_n\}_{n=1}^{\infty}$ and any initialization, the sequence $\{a_n\}_{n=1}^{\infty}$ necessarily converges.
- If $\alpha + \beta/2 > 1$, then there exists a choice of $\{p_n\}_{n=1}^{\infty}$ and an initialization such that $\{a_n\}_{n=1}^{\infty}$ fails to converge.

In addition to the worst-case envelope-bounded weights considered above, section 6 establishes theorem 5 for a complementary fixed-shape regime. In this setting, the weights are obtained by discretizing a limiting probability density on $(0, 1)$. The additional structure permits a renewal-theoretic analysis and yields convergence under suitable regularity and approximation assumptions.

Theorem 2 (Informal). Let $\{p_n\}_{n=1}^{\infty}$ be a sequence of discrete probability distributions that approximate a density $p : (0, 1) \to [0, \infty)$ with finite logarithmic moment. Let $\{a_n\}_{n=1}^{\infty} \subset \mathbb{R}^d$ satisfy
$$a_{n+1} = \mathbb{E}_{j \sim p_n}\left[a_j\right] \quad \text{for all } n \ge k.$$
Then the sequence $\{a_n\}_{n=1}^{\infty}$ converges.

The precise assumptions and statement appear in theorem 5.

1.1 Related Work

Note that the recursion $a_{n+1} = \sum_{j=1}^n p_n(j) a_j$ can be viewed as $a_{n+1} = \mathbb{E}[a_{J_n}]$, where $J_n \sim p_n$. Such recursions often appear in probabilistic models with memory, as detailed below. These works motivate our setting, but we assume neither stationarity nor model-specific structure. In our paper, we focus on two general regimes: theorem 1 gives a sharp worst-case convergence threshold over all nonstationary kernels obeying pointwise envelope bounds, while theorem 5 treats a structured, scaling-invariant fixed-shape regime.

Split trees and tagged lineages. A concrete way our recurrence shows up in random trees is through the standard tree = root + subtrees decomposition.
In a split tree with $n$ items, the root partitions the items into subtree sizes $(J_{n,1}, \dots, J_{n,b})$ according to a random split, and many parameters are obtained by iterating this decomposition down the tree as in [11]. If one follows a tagged object (for example a uniformly chosen item, or the lineage of a random leaf), then at each split one keeps only the unique child subtree containing the tag. Hence, for many scalar quantities $a_n$ that are functions of the tagged subtree, one obtains
$$a_{n+1} = \mathbb{E}[a_{J_n}] = \sum_{j \le n} \mathbb{P}(J_n = j)\, a_j.$$
This recursion is often the key input before lifting to global statements about the whole tree, such as LLNs/CLTs for depths, profiles, and related additive functionals. In fact, our fixed-shape theorem 5 can be applied as a black box to Aldous' $\beta$-splitting model [2] to obtain convergence behaviour for many recursions in these tree models for $\beta > -1$, when the splitting distribution has finite moments. Long-memory walks such as the elephant random walk also generate full-memory averaging recurrences, but we mention them only as further motivation rather than as a structural input [14, 4].

Self-averaging sequences. Oftentimes "probability of an event at time $n$" problems can be written in a self-averaging form (see the group Russian roulette example [5] and many other references therein): one identifies some bounded statistic or an event probability $a_n$ and a random lookback index $J_n \in [n]$ such that
$$a_n = \mathbb{E}[a_{J_n}].$$
Setting $p_n(j) = \mathbb{P}(J_n = j)$, we reduce to studying our recursion. Cator and Don [5] study precisely this equation for bounded sequences, but under a concentration hypothesis $J_n \approx \alpha n$ with $\mathrm{Var}(J_n) = O(n)$. Our results complement this line of research by studying worst-case adversarial envelopes and structured scaling regimes.
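As a toy illustration of the self-averaging reduction (our own example, not from the paper), take $J_n$ uniform on $[n]$, so $a_{n+1} = \bar a_n$. A short calculation shows the sequence is then exactly constant from $a_{k+1}$ on: $a_{k+2} = (k\bar a_k + a_{k+1})/(k+1) = a_{k+1}$.

```python
import numpy as np

def self_averaging_uniform(N=2000, k=5, seed=1):
    """Self-averaging recursion with J_n uniform on [n]:
    a_{n+1} = E[a_{J_n}] = (a_1 + ... + a_n) / n."""
    rng = np.random.default_rng(seed)
    a = [float(x) for x in rng.uniform(0.0, 1.0, size=k)]  # arbitrary start
    s = sum(a)
    for n in range(k, N):
        a.append(s / n)       # a_{n+1} = running average of a_1 .. a_n
        s += a[-1]
    return a

a = self_averaging_uniform()
# a[5] is the first averaged term a_{k+1}; all later terms equal it
# (up to floating-point rounding).
print(abs(a[-1] - a[5]))
```

Non-uniform $J_n$ breaks this instant stabilization, which is exactly why the envelope conditions of Theorem 1 are needed in general.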
In fact, our divergence construction (theorem 4) shows that oscillations can persist even when most of the mass is spread over a polylogarithmic number of indices.

Renewal theory. Renewal theory analyzes convolution recursions such as $u_0 = 1$ and
$$u_n = \sum_{k=1}^n f_k u_{n-k},$$
where $(f_k)_{k \ge 1}$ is a probability mass function and $(u_n)$ is the associated renewal sequence [8, 6, 15]. The classical Erdős–Feller–Pollard theorem states that when the mean $\mu \triangleq \sum_{k \ge 1} k f_k$ is finite and $f$ is aperiodic, one has the sharp limit
$$u_n \to \frac{1}{\mu},$$
which is the discrete-time analogue of the "renewal density $\to 1/\mu$" principle [8]. More generally, key renewal theorems describe limits for perturbed renewal equations and identify the limiting constant via the mean $\mu$ and an explicit overshoot law [6, 15].

Our fixed-shape regime in section 6 can be put in this framework after a logarithmic change of variables. Roughly speaking, when the weights come from discretizing a density on $(0, 1)$, the recursion has the form $F(x) \approx \mathbb{E}[F(Tx)]$ with $T \in (0, 1)$. Writing $x = e^s$ and $G(s) = F(e^s)$ turns this into an additive renewal equation
$$G(s) = \mathbb{E}[G(s - Y)] + \eta(s), \qquad Y \triangleq \log(1/T),$$
where $\eta$ captures the discretization error. Thus a finite log moment $\mathbb{E}[\log(1/T)] < \infty$ plays exactly the role of a finite mean increment in classical renewal theory, and yields convergence together with an explicit residual description. This is stated in lemma 7, and might be of independent use beyond this application.

Consensus/social learning and stochastic approximation. Classical distributed consensus and social learning models iterate stochastic matrices on a fixed agent set and analyze convergence via connectivity/mixing assumptions [7, 13]. In the Bayesian setting, Bala and Goyal [3] consider a model where each agent, upon joining, observes all previous agents (and their private signal) before taking their action.
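As a quick numerical check of the Erdős–Feller–Pollard limit $u_n \to 1/\mu$ (the pmf below is our own toy choice):

```python
def renewal_sequence(f, N=500):
    """Compute u_0 = 1, u_n = sum_{k=1}^n f_k u_{n-k}
    for a pmf f = (f_1, ..., f_m) on {1, ..., m}."""
    u = [0.0] * (N + 1)
    u[0] = 1.0
    for n in range(1, N + 1):
        u[n] = sum(f[k - 1] * u[n - k] for k in range(1, min(n, len(f)) + 1))
    return u

f = [0.5, 0.3, 0.2]   # aperiodic pmf on {1, 2, 3}; mean mu = 1.7
u = renewal_sequence(f)
print(u[-1])           # close to 1/mu = 1/1.7 ≈ 0.5882
```

For aperiodic pmfs with finite support the convergence is geometric, so a few hundred steps already match $1/\mu$ to many digits.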
The analogous model in the DeGroot framework consists of a growing network where agents arrive sequentially, and agent $n + 1$ forms an opinion by averaging earlier opinions with an attention profile $p_n$,
$$x_{n+1} = \sum_{j \le n} p_n(j)\, x_j.$$
It is natural to ask when asymptotic consensus is reached in such a model. The Erdős–Feller–Pollard theorem provides a positive answer when $p(n, j) = f_{n-j}$ and $f$ has a finite mean and is aperiodic. Our results show that consensus is achieved even if agents use different averaging weights, as long as these are equitable enough among the preceding agents. Since our results are tight, they also provide examples where, if the weights are not equitable, consensus is not reached. Prior work in learning on networks has highlighted the role of various notions of equitability in reaching consensus and in learning [9, 1, 12].

1.2 Acknowledgements

E.M. is partially supported by ARO MURI N00014241274, by Vannevar Bush Faculty Fellowship ONR-N00014-20-1-2826, and by a Simons Investigator Award. S.L.'s research is supported in part by NSF–Simons collaboration grant DMS-2031883.

2 Reduction

To prove the threshold in theorem 1, we reduce the problem to a one-dimensional extremal process. This reduction starts by applying the argument coordinatewise and using affine invariance of the recursion. In particular, we assume without loss of generality that $\{a_n\}_{n=1}^{\infty} \subset \mathbb{R}$. By affine invariance of the sequence, we can shift and scale the initialization so that all terms are contained in the interval $[0, 1]$.

For convenience, we also reparametrize the problem by setting
$$\varepsilon_n \triangleq f(n) \quad \text{and} \quad \delta_n \triangleq \frac{1 - f(n)}{c(n) - f(n)}. \tag{1}$$
The parameter $\delta_n$ is chosen so that assigning maximal weight $c(n)/n$ to the largest $\delta_n n$ terms and $f(n)/n$ to the remaining terms yields a valid probability measure. We now introduce important notation.
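The algebra behind the reparameterization (1) can be sanity-checked numerically; the constants below are illustrative choices, not values from the paper.

```python
import math

def envelope_params(n, A=1.0, B=2.0, alpha=0.5, beta=0.5):
    """Reparameterization (1): eps_n = f(n), delta_n = (1 - f(n)) / (c(n) - f(n)),
    for the envelopes f(n) = A (log n)^-alpha and c(n) = B (log n)^beta."""
    f = A * math.log(n) ** (-alpha)   # floor envelope f(n)
    c = B * math.log(n) ** beta       # ceiling envelope c(n)
    eps = f
    delta = (1 - f) / (c - f)
    return f, c, eps, delta

n = 10_000
f, c, eps, delta = envelope_params(n)
# The extremal measure puts weight c(n)/n on delta_n * n indices and f(n)/n
# on the rest; by the choice of delta_n the total mass is exactly 1:
total = (c / n) * (delta * n) + (f / n) * (n - delta * n)
print(total)
```

Indeed $c\delta_n + f(1 - \delta_n) = f + \delta_n(c - f) = f + (1 - f) = 1$, which is what the code verifies up to rounding.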
For a sequence $\{a_n\}_{n=1}^{\infty}$ and an index $m \in \mathbb{N}$, denote by $a^{(m)}_1 \ge \dots \ge a^{(m)}_m$ and $a^{[m]}_1 \le \dots \le a^{[m]}_m$ the descending and ascending orderings of the partial sequence $\{a_1, \dots, a_m\}$, respectively. Similarly, we define the percentile averages
$$\bar a_m \triangleq \frac{a_1 + \dots + a_m}{m}, \qquad \bar a^{(m)}_j \triangleq \frac{a^{(m)}_1 + \dots + a^{(m)}_j}{j}, \qquad \bar a^{[m]}_j \triangleq \frac{a^{[m]}_1 + \dots + a^{[m]}_j}{j}.$$
(To improve readability, we will ignore integrality issues throughout the paper, since they can be handled with routine adjustments and do not affect the asymptotic arguments.)

In words, $\bar a_m$ is the average of all terms $a_1, \dots, a_m$, while $\bar a^{(m)}_j$ and $\bar a^{[m]}_j$ are the averages of the largest and smallest $j$ terms among the first $m$ elements, respectively. We now describe an equivalent formulation of the problem that we will work with for the rest of the paper.

Lemma 1. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers and $\{p_n\}_{n=1}^{\infty}$ be a sequence of probability measures with $p_n$ supported on $[n]$ such that
$$a_{n+1} = \mathbb{E}_{j \sim p_n} a_j \text{ for all } n \ge k \quad \text{and} \quad \frac{f(n)}{n} \le p_n(j) \le \frac{c(n)}{n} \text{ for all } j \in [n]. \tag{2}$$
Then, for all $n \ge k$ the sequence $\{a_n\}_{n=1}^{\infty}$ satisfies
$$a_{n+1} \in \varepsilon_n \bar a_n + (1 - \varepsilon_n)\left[\bar a^{[n]}_{\delta_n n},\ \bar a^{(n)}_{\delta_n n}\right] \quad \text{for all } n \ge k. \tag{3}$$
Furthermore, if $\{a_n\}_{n=1}^{\infty}$ satisfies (3), then there exist $\{p_n\}_{n=1}^{\infty}$ such that (2) holds.

Proof. Note that $\varepsilon_n$ and $\delta_n$ are chosen precisely so that
$$\frac{c(n)}{n} \times \delta_n n + \frac{f(n)}{n} \times (n - \delta_n n) = 1 \quad \text{and} \quad f(n) = \varepsilon_n.$$
Suppose first that $\{a_n\}_{n=1}^{\infty}$ and $\{p_n\}_{n=1}^{\infty}$ satisfy (2). Then,
$$a_{n+1} = \sum_{j=1}^n p_n(j) a_j \le \sum_{j=1}^{\delta_n n} \frac{c(n)}{n} a^{(n)}_j + \sum_{j = \delta_n n + 1}^{n} \frac{f(n)}{n} a^{(n)}_j = (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} + \varepsilon_n \bar a_n.$$
In words, since the objective is linear in $\{p_n\}$, the maximum is attained by assigning maximal weight to the largest entries. Analogously, we have
$$a_{n+1} \ge (1 - \varepsilon_n)\bar a^{[n]}_{\delta_n n} + \varepsilon_n \bar a_n.$$
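A small sketch of Lemma 1 (our own construction, with parameters chosen so that $\delta_n n$ is an integer): the percentile averages give the interval (3), and an envelope-respecting measure built from the floor $f(n)/n$ plus the ceiling $c(n)/n$ on $\delta_n n$ indices lands inside it.

```python
import numpy as np

def lemma1_interval(a, eps, delta):
    """Interval (3): eps * mean(a) + (1 - eps) * [asc avg, desc avg] of the
    smallest / largest delta * n terms (integrality ignored, as in the paper)."""
    n = len(a)
    j = round(delta * n)
    s = np.sort(a)
    base = eps * a.mean()
    return base + (1 - eps) * s[:j].mean(), base + (1 - eps) * s[-j:].mean()

rng = np.random.default_rng(2)
n, f, c = 1000, 0.5, 2.5               # chosen so that delta * n = 250 exactly
eps, delta = f, (1 - f) / (c - f)
a = rng.uniform(0.0, 1.0, size=n)

# Envelope-respecting measure: floor f/n everywhere, ceiling c/n on a random
# set S of size delta * n; by the choice of delta the weights sum to 1.
S = rng.choice(n, size=round(delta * n), replace=False)
w = np.full(n, f / n)
w[S] = c / n
lo, hi = lemma1_interval(a, eps, delta)
print(abs(w.sum() - 1.0) < 1e-12, lo <= w @ a <= hi)
```

Choosing $S$ as the indices of the largest (resp. smallest) $\delta_n n$ terms attains the right (resp. left) endpoint, matching the extremal argument in the proof.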
On the other hand, suppose $\{a_n\}_{n=1}^{\infty}$ satisfies (3). Then, for all $n \ge k$ there exists $\lambda_n \in [0, 1]$ such that
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\left(\lambda_n \bar a^{(n)}_{\delta_n n} + (1 - \lambda_n)\bar a^{[n]}_{\delta_n n}\right).$$
Hence, we can write $a_{n+1} = \mathbb{E}_{j \sim p_n} a_j$, where
$$p_n(j) = \frac{f(n)}{n} + \frac{c(n) - f(n)}{n} \times \begin{cases} \lambda_n & \text{if } a_j \text{ is among the largest } \delta_n n \text{ terms}, \\ 1 - \lambda_n & \text{if } a_j \text{ is among the smallest } \delta_n n \text{ terms}, \\ 0 & \text{otherwise}. \end{cases}$$
This concludes the two-way reduction.

With the reparameterization (1), we abuse notation and write $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$ for some constants $A, B > 0$.

3 Majorization

In this section, we develop a comparison tool that helps us lift the analysis from simple sequences to more complex ones. In particular, we use majorization to prove the main technical propositions in the next section.

Definition 1 (Majorization). We say that $\{a_n\}_{n=1}^k$ majorizes $\{b_n\}_{n=1}^k$ if
$$a^{(k)}_1 \ge b^{(k)}_1, \quad a^{(k)}_1 + a^{(k)}_2 \ge b^{(k)}_1 + b^{(k)}_2, \quad \dots, \quad a^{(k)}_1 + \dots + a^{(k)}_k \ge b^{(k)}_1 + \dots + b^{(k)}_k.$$
We denote this relation by $\{a_n\}_{n=1}^k \succeq \{b_n\}_{n=1}^k$. We note that this definition differs from the typical notion of majorization, as we do not require equality of total sums.

The following lemma states that if the initialization of one sequence majorizes that of another, then the former dominates the latter at all future times.

Lemma 2 (Majorization). Suppose $\{a_n\}_{n=1}^k$ majorizes $\{b_n\}_{n=1}^k$ and
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} \quad \text{and} \quad b_{n+1} \le \varepsilon_n \bar b_n + (1 - \varepsilon_n)\bar b^{(n)}_{\delta_n n} \quad \text{for all } n \ge k.$$
Then
1. $\{a_n\}_{n=1}^m \succeq \{b_n\}_{n=1}^m$ for all $m \ge k$, and
2. $a_m \ge b_m$ for all $m \ge k + 1$.

Proof. We proceed by induction. The base case $m = k$ holds by assumption. Assume now that $\{a_n\}_{n=1}^m \succeq \{b_n\}_{n=1}^m$. Majorization implies that $\bar a^{(m)}_j \ge \bar b^{(m)}_j$ holds for any $j \in \{1, \dots, m\}$. In particular,
$$a_{m+1} = \varepsilon_m \bar a_m + (1 - \varepsilon_m)\bar a^{(m)}_{\delta_m m} \ge \varepsilon_m \bar b_m + (1 - \varepsilon_m)\bar b^{(m)}_{\delta_m m} \ge b_{m+1}.$$
It remains to show that $\{a_n\}_{n=1}^{m+1} \succeq \{b_n\}_{n=1}^{m+1}$. Define multisets
$$A^{(m)}_j \triangleq \left\{a^{(m)}_1, \dots, a^{(m)}_j\right\} \quad \text{and} \quad B^{(m)}_j \triangleq \left\{b^{(m)}_1, \dots, b^{(m)}_j\right\}.$$
For a multiset $S$, we define $\Sigma S \in \mathbb{R}$ to be the sum of the terms within. We consider two cases as follows.
- If $b_{m+1} \notin B^{(m)}_j$, then $\Sigma B^{(m+1)}_j = \Sigma B^{(m)}_j \le \Sigma A^{(m)}_j \le \Sigma A^{(m+1)}_j$.
- If $b_{m+1} \in B^{(m)}_j$, then $\Sigma B^{(m+1)}_j = b_{m+1} + \Sigma B^{(m)}_{j-1} \le a_{m+1} + \Sigma A^{(m)}_{j-1} \le \Sigma A^{(m+1)}_j$.
This concludes the proof.

Another useful observation is that even if one sequence initially majorizes another, this dominance need not manifest itself in future iterates. More precisely, suppose the initialization of the second sequence is obtained from that of the first by replacing the top $m$ terms with their average. Then the two sequences evolve identically for as long as newly generated terms do not exceed the $m$-th largest term in $a$.

Lemma 3 (Reverse Majorization). Let $\delta_n$ be a decaying parameter such that the sequence $\{\delta_n n\}_{n=k}^{\infty}$ is non-decreasing. Suppose $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ satisfy
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} \quad \text{and} \quad b_{n+1} = \varepsilon_n \bar b_n + (1 - \varepsilon_n)\bar b^{(n)}_{\delta_n n}$$
for all $n \ge k$. Suppose that there is $m \le \delta_k k$ such that
$$b^{(k)}_j = \bar a^{(k)}_m \text{ for all } j \le m \quad \text{and} \quad b^{(k)}_j = a^{(k)}_j \text{ for all } m < j \le k.$$
Then $a_n = b_n$ for all $n > k$, as long as $b_n \le a^{(k)}_m$ for all $n > k$.

Proof. The main idea is to show that the sequences $a$ and $b$ evolve identically: the only discrepancy between them is confined to the top $m$ terms, and these terms remain at the top throughout. It suffices to show $\bar a_n = \bar b_n$ and $\bar a^{(n)}_{\delta_n n} = \bar b^{(n)}_{\delta_n n}$ for all $n \ge k$.
We proceed to prove by induction that for all $n \ge k$,
$$b^{(n)}_j = \bar a^{(n)}_m \text{ for all } j \le m \quad \text{and} \quad b^{(n)}_j = a^{(n)}_j \text{ for all } m < j \le n.$$
Note that since $m \le \delta_k k \le \delta_n n$, this implies both $\bar a^{(n)}_{\delta_n n} = \bar b^{(n)}_{\delta_n n}$ and $\bar a_n = \bar b_n$. The base case $n = k$ holds by assumption. On the other hand, the inductive hypothesis immediately implies $a_{n+1} = b_{n+1}$. Moreover, by assumption, $b_{n+1} \le a^{(k)}_m = a^{(n)}_m$. This means that the new term does not enter the top $m$ terms in $a$ or $b$, which concludes the proof.

4 Main Technical Tools

In this section, we use majorization to establish two main technical results, proposition 1 and proposition 2, which will be used in the next section to prove convergence and non-convergence, respectively. Roughly speaking, proposition 1 asserts that if the initialization of the sequence is mostly contained in an interval $[B, U]$ and its average is close to $B$, then the future iterates remain uniformly bounded away from $U$.

Proposition 1. Suppose $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$ with $\alpha, \beta \ge 0$. There exist constants $c$ and $C$ such that the following holds. Suppose $k \ge C$ and let $\{a_n\}_{n=1}^{\infty}$ be a sequence with initialization $a_1, \dots, a_k \in [0, 1]$ satisfying
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} \quad \text{for all } n \ge k.$$
Suppose there exists an interval $[B, U]$ such that
1. $\bar a_k = \gamma U + (1 - \gamma)B$ with $\gamma \le \frac{1}{2}$;
2. among the initial $k$ terms $\{a_n\}_{n=1}^k$, fewer than $ck(U - B)\min\{\gamma, \varepsilon_k \delta_k\}$ terms lie outside the interval $[B, U]$.
Then for all $n > k$,
$$a_n \le B + (U - B) \begin{cases} C\gamma/\varepsilon_k \delta_k & \text{for } \gamma \le 1/2, \\ 1 - c\,\varepsilon_k \delta_k/\gamma & \text{for } \gamma \ge \varepsilon_k \delta_k/2C. \end{cases}$$
Proposition 1 will be used to show that intervals capturing the tails of the sequence contract sufficiently to ensure convergence.
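The extremal recursion that Propositions 1 and 2 analyze can be simulated directly; the sketch below (our own parameters, with $A = B = 1$; the constants $c, C$ of Proposition 1 are not computed) starts from the Lemma 4-style initialization of $\gamma k$ ones and $(1-\gamma)k$ zeros and shows the iterates plateauing well below the upper endpoint.

```python
import numpy as np

def extremal_process(k=200, N=4000, gamma=0.05, alpha=0.5, beta=0.5):
    """Extremal recursion a_{n+1} = eps_n * mean(a_1..a_n)
    + (1 - eps_n) * (average of the largest delta_n * n terms),
    with eps_n = (log n)^-alpha and delta_n = (log n)^-beta."""
    a = [1.0] * int(gamma * k) + [0.0] * (k - int(gamma * k))
    for n in range(k, N):
        eps = np.log(n) ** (-alpha)
        delta = np.log(n) ** (-beta)
        j = max(1, int(delta * n))               # size of the top block
        arr = np.asarray(a)
        a.append(eps * arr.mean() + (1 - eps) * np.sort(arr)[-j:].mean())
    return np.asarray(a)

a = extremal_process()
print(a[-1])   # stays bounded away from 1, as Proposition 1 predicts
```

Each new term is a convex combination of past averages, so the trajectory stays in $[0, 1]$; with a small initial fraction $\gamma$ of ones, it settles far below $1$.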
In contrast, proposition 2 shows that if the initialization of the sequence contains a sufficiently large proportion of terms exceeding a threshold $U$, then these values can force subsequent iterates to increase.

Proposition 2. Suppose $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$ with $\alpha, \beta > 0$. There exists a constant $C$ such that the following holds. Suppose $k \ge C$ and let $\{a_n\}_{n=1}^{\infty}$ be a sequence with initialization $a_1, \dots, a_k \in [0, 1]$ satisfying
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} \quad \text{for all } n \ge k.$$
Suppose that among the initial $k$ terms $\{a_n\}_{n=1}^k$, at least a $\delta_k^{1/2}$ fraction exceed $U$. Then, among the first $k/2\delta_k^{1/2}$ terms, at least half satisfy
$$a_n \ge \left(1 - C\varepsilon_k \delta_k^{1/2}\right) U.$$
Proposition 2 will be used to show that the sequence can be driven sufficiently "back and forth" to prevent convergence.

Remark 1. Proposition 1 is a partial converse of proposition 2. In particular, note that if $B = 0$ and $\gamma = \delta_k^{1/2}$, then the former promises $a_n \le U(1 - c\varepsilon_k \delta_k^{1/2})$, while the latter gives $a_n \ge U(1 - C\varepsilon_k \delta_k^{1/2})$. This is where the threshold $\alpha + \beta/2 = 1$ arises.

4.1 Tools for Upper Bounds

In this section, we prove a helper lemma, which we will later lift by majorization to prove proposition 1. The latter is the main tool we will use for upper bounds.

Lemma 4. Suppose $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$ with $\alpha, \beta \ge 0$. There exists a constant $C$ such that the following holds. Suppose $k \ge C$ and let $\{a_n\}_{n=1}^{\infty}$ be a sequence with initialization $a_1, \dots, a_k \in [0, 1]$ satisfying
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} \quad \text{for all } n \ge k.$$
Suppose $\bar a_k \le \gamma$. Then, for all $n > k$,
$$a_n \le \frac{C\gamma}{\varepsilon_k \delta_k}.$$
Proof. By lemma 2, we may assume without loss of generality that the initialization $\{a_n\}_{n=1}^k$ is comprised of $\gamma k$ ones and $(1 - \gamma)k$ zeroes, as this sequence majorizes all others in $[0, 1]$ with the same mean. Since the recursion is monotone with respect to majorization, any upper bound proved for this dominating sequence automatically applies to the original sequence.

Set $C$ to be a large constant to be specified in the argument below. Define a sequence $\{u_n\}_{n=1}^{\infty}$ by setting $u_1 = \dots = u_k = 0$ and letting
$$u_{n+1} = \varepsilon_n \bar u_n + (1 - \varepsilon_n)u_n + \frac{\gamma k}{\delta_n n} \quad \text{for all } n \ge k.$$
Claim 1. The sequence $\{u_n\}_{n=1}^{\infty}$ is non-decreasing and
$$\lim_{n \to \infty} u_n \le \frac{C\gamma}{\varepsilon_k \delta_k}.$$
Assuming claim 1, to finish the proof of lemma 4 it suffices to prove $a_n \le u_n$ for all $n > k$. This follows from an inductive argument. In what follows, we verify the base case and the inductive step simultaneously. Suppose that $a_m \le u_m$ for all $k < m \le n$. Then,
$$\bar a_n = \frac{k\bar a_k + a_{k+1} + \dots + a_n}{n} \le \frac{k\bar a_k + u_{k+1} + \dots + u_n}{n} = \frac{n\bar u_n + \gamma k}{n}.$$
Observe that this bound also holds for the base case $n = k$. Continuing,
$$\bar a^{(n)}_{\delta_n n} = \frac{\gamma k + a^{(n)}_{\gamma k + 1} + \dots + a^{(n)}_{\delta_n n}}{\delta_n n} \le \frac{\gamma k + (\delta_n n - \gamma k)u_n}{\delta_n n}.$$
This bound also holds for the base case $n = k$. Combining the two bounds, we get
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} \le \varepsilon_n \frac{n\bar u_n + \gamma k}{n} + (1 - \varepsilon_n)\frac{(\delta_n n - \gamma k)u_n + \gamma k}{\delta_n n} \le \varepsilon_n \bar u_n + (1 - \varepsilon_n)u_n + (1 - \varepsilon_n + \varepsilon_n \delta_n)\frac{\gamma k}{\delta_n n} \le u_{n+1},$$
which proves $a_{n+1} \le u_{n+1}$, and hence lemma 4. It now remains to prove claim 1.

Proof of Claim 1. Define $\Delta_n \triangleq n(u_n - \bar u_n)$. Then the recurrence can be rewritten as
$$\Delta_{n+1} = (n+1)(u_{n+1} - \bar u_{n+1}) = n u_{n+1} - n\bar u_n = \varepsilon_n n \bar u_n + (1 - \varepsilon_n)n u_n + \frac{\gamma k}{\delta_n} - n\bar u_n = (1 - \varepsilon_n)\Delta_n + \frac{\gamma k}{\delta_n}.$$
We first show that this recursive formula implies
$$\Delta_n \le \frac{\gamma k}{\varepsilon_n \delta_n}. \tag{4}$$
We prove this bound by induction. The base case $j \le k$ follows from $\Delta_j = 0$. Now,
$$\Delta_{n+1} = (1 - \varepsilon_n)\Delta_n + \frac{\gamma k}{\delta_n} \le (1 - \varepsilon_n)\frac{\gamma k}{\varepsilon_n \delta_n} + \frac{\gamma k}{\delta_n} = \frac{\gamma k}{\varepsilon_n \delta_n} \le \frac{\gamma k}{\varepsilon_{n+1}\delta_{n+1}}.$$
Here we used $\varepsilon_{n+1}\delta_{n+1} \le \varepsilon_n \delta_n$, which holds as long as $C$ is a sufficiently large constant.
We now show that $\{\Delta_n\}_{n=k+1}^{\infty}$ is non-decreasing. This follows immediately from the identity
$$\Delta_{n+1} = \Delta_n - \varepsilon_n \Delta_n + \frac{\gamma k}{\delta_n} \ge \Delta_n. \tag{5}$$
Now we translate (4) and (5) into bounds on the sequence $u_n$ using the identity
$$\frac{\Delta_{n+1} - \Delta_n}{n} = \frac{(n+1)u_{n+1} - (n+1)\bar u_{n+1} - n u_n + n\bar u_n}{n} = u_{n+1} - u_n.$$
Moreover, (5) immediately implies that $\{u_n\}_{n=1}^{\infty}$ is non-decreasing. Now,
$$u_{n+1} = u_n + \frac{\Delta_{n+1} - \Delta_n}{n} \implies u_{n+1} = \frac{\Delta_{n+1}}{n} + \sum_{t=k+1}^{n}\frac{\Delta_t}{t(t-1)}.$$
Applying the bound (4), we get
$$u_{n+1} \le \frac{\gamma k}{\varepsilon_{n+1}\delta_{n+1} n} + \sum_{t=k+1}^{\infty}\frac{\gamma k}{\varepsilon_t \delta_t t(t-1)}.$$
If $C$ is a sufficiently large constant, we have $\varepsilon_{n+1}\delta_{n+1} n \ge \varepsilon_k \delta_k k$ for all $n \ge k$, so the first term is bounded by $\gamma/\varepsilon_k \delta_k$. We bound the second term by splitting the sum into two parts:
$$\sum_{t=k+1}^{\infty}\frac{1}{\varepsilon_t \delta_t t(t-1)} \le \sum_{t=k+1}^{k^2}\frac{1}{\varepsilon_t \delta_t t(t-1)} + \sum_{t=k^2}^{\infty}\frac{1}{\varepsilon_t \delta_t t(t-1)}.$$
Since $\varepsilon_t$ and $\delta_t$ decay polylogarithmically, there exists a small constant $c$ such that $\varepsilon_t \delta_t \ge c\varepsilon_k \delta_k$ for $t \le k^2$ and $\varepsilon_t \delta_t t(t-1) \ge ct^{3/2}$ for $t \ge k^2$. Hence,
$$\sum_{t=k+1}^{k^2}\frac{1}{\varepsilon_t \delta_t t(t-1)} + \sum_{t=k^2}^{\infty}\frac{1}{\varepsilon_t \delta_t t(t-1)} \le \frac{1}{c\varepsilon_k \delta_k}\sum_{t=k}^{\infty}\frac{1}{t^2} + \frac{1}{c}\sum_{t=k^2}^{\infty}\frac{1}{t^{3/2}} \le (C - 1)\frac{1}{\varepsilon_k \delta_k k}.$$
In the last inequality, we chose $C$ to be a large enough constant. Substituting this into the bound for $u_{n+1}$ above, we conclude the proof.

Having proved claim 1, we conclude the proof of lemma 4.

Lemma 5. Suppose $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$ with $\alpha, \beta \ge 0$. There exist constants $c$ and $C$ such that the following holds. Suppose $k \ge C$ and let $\{a_n\}_{n=1}^{\infty}$ be a sequence with initialization $a_1, \dots, a_k \in [0, 1]$ satisfying
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} \quad \text{for all } n \ge k.$$
Suppose $\bar a_k \le \gamma \le \frac{3}{4}$. Then, for all $n > k$ we have
$$a_n \le \begin{cases} C\gamma/\varepsilon_k \delta_k & \text{for } \gamma \le 3/4, \\ 1 - c\,\varepsilon_k \delta_k/\gamma & \text{for } \gamma \ge \varepsilon_k \delta_k/16C. \end{cases}$$
Proof. Let $K$ be the constant from lemma 4.
Fix $c$ and $C$ to be some small and large constants, respectively. Since $k \ge C$ and $\varepsilon_n$ and $\delta_n$ decay logarithmically, we can take $C$ large enough so that
1. both $\{\varepsilon_n\}_{n=k}^{\infty}$ and $\{\delta_n\}_{n=k}^{\infty}$ are non-increasing and at most $\frac{1}{2}$;
2. the sequence $\{\varepsilon_n \delta_n n\}_{n=k}^{\infty}$ is increasing;
3. if $m$ is chosen so that $\varepsilon_m \delta_m m \triangleq Kk$, then $\varepsilon_m \ge \varepsilon_k/K$ and $\delta_m \ge \delta_k/K$.

We now begin the proof. Lemma 4 shows that for all $n > k$ we have $a_n \le K\gamma/(\varepsilon_k \delta_k)$. Taking $C \ge K$, we obtain the first part of the desired bound. In what follows, we assume $\gamma \ge \varepsilon_k \delta_k/16C$.

First, we use induction to show that for all $k < n \le m$,
$$a_n \le \lambda \triangleq 1 - \frac{\varepsilon_m \delta_m (1 - \gamma)}{\varepsilon_m \delta_m + (1 - \varepsilon_m)\gamma}.$$
We verify the base case and the inductive step simultaneously. Assume $a_j \le \lambda$ for all $k < j \le n$. Since $n \le m$, we get
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} \le \varepsilon_m \bar a_n + (1 - \varepsilon_m)\bar a^{(n)}_{\delta_m n} \le \varepsilon_m \frac{\gamma k + \lambda(n - k)}{n} + (1 - \varepsilon_m)\frac{\gamma k + \lambda(\delta_m n - \gamma k)}{\delta_m n} \le \lambda.$$
Observe that this bound also holds for the base case $n = k$, so the induction is complete. Next, we recall that by assumption $\varepsilon_k \delta_k/16C < \gamma \le 3/4$, so
$$\frac{\varepsilon_m \delta_m (1 - \gamma)}{\varepsilon_m \delta_m + (1 - \varepsilon_m)\gamma} \ge \frac{1}{4}\cdot\frac{\varepsilon_m \delta_m}{\varepsilon_m \delta_m + \gamma} \ge \frac{1}{4(16C + 1)}\cdot\frac{\varepsilon_m \delta_m}{\gamma} \ge \frac{\varepsilon_m \delta_m}{100C\gamma}.$$
Since our goal is to obtain an upper bound on the terms of the sequence, it suffices to establish the bound for a simpler sequence that dominates the original one. More precisely, by applying lemma 2 we may assume without loss of generality that among $\{a_n\}_{n=1}^m$ there are $\gamma k$ ones, and the remaining terms are equal to $\lambda_+$, where
$$\lambda_+ \triangleq 1 - \frac{\varepsilon_m \delta_m}{100C\gamma} \ge 1 - \frac{\varepsilon_m \delta_m(1 - \gamma)}{\varepsilon_m \delta_m + (1 - \varepsilon_m)\gamma} = \lambda.$$
By affine invariance of the recurrence, we may shift and rescale the sequence so that $\lambda_+$ is sent to 0 and 1 is sent to 1. Applying lemma 4 in this scale and then undoing the affine transformation yields the desired bound for the original sequence.
More precisely, we conclude that for all $n > k$,
$$a_n \le \lambda_+ + (1 - \lambda_+)\frac{K\gamma k}{\varepsilon_m \delta_m m} \le \lambda_+ + (1 - \lambda_+) \times \frac{3}{4} = 1 - \frac{\varepsilon_m \delta_m}{400C\gamma} \le 1 - \frac{c\,\varepsilon_k \delta_k}{\gamma}.$$
In the last step, we chose $c = 1/(400CK^2)$.

We are now ready to prove proposition 1.

Proof of proposition 1. We will apply the majorization lemma (lemma 2) and the reverse majorization lemma (lemma 3) to reduce to lemma 5. Since our goal is to obtain an upper bound on the terms of the sequence, it suffices to establish the bound for a simpler sequence that dominates the original one. To this end, we apply lemma 2 to assume without loss of generality that all terms originally lying in the interval $[B, U]$ are moved to one of the endpoints $\{B, U\}$ in a way that preserves both the average and the number of terms lying outside the interval. We may apply lemma 2 once again to replace all terms not in $\{B, U\}$ by 1. At this point, every term in the initialization $\{a_n\}_{n=1}^k$ takes one of the values $\{B, U, 1\}$.

Note that this step alters the average. We will quantify this change and show that it affects the situation only marginally. Denote by $k_B$ the number of terms equal to $B$, and similarly define $k_U$ and $k_1$. Let $U_+$ be the average of the terms equal to $U$ or 1, and let $\gamma_+$ be the fraction of terms taking values in $\{U, 1\}$. Thus,
$$U_+ \triangleq \frac{k_U U + k_1}{k_U + k_1} \quad \text{and} \quad \gamma_+ \triangleq \frac{k_U + k_1}{k}.$$
We first show that $U_+ \approx U$ and $\gamma_+ \approx \gamma$.

Claim 2. With the notation above, we have
$$(1 - c)\gamma \le \gamma_+ \le (1 + c)\gamma \quad \text{and} \quad U \le U_+ \le U + 2c(U - B)\min\left\{1, \frac{\varepsilon_k \delta_k}{\gamma}\right\}.$$
Proof. Since the number of terms outside the interval $[B, U]$ has not changed, we have $k_1 \le c\gamma k(U - B)$. Moreover, by replacing at most $k_1$ terms by 1, the total sum can increase by at most $k_1$. Therefore,
$$\bar a_k \le \gamma U + (1 - \gamma)B + \frac{k_1}{k} \le (1 + c)\gamma U + (1 - (1 + c)\gamma)B.$$
By Markov's inequality, this implies the upper bound $\gamma_+ \le (1 + c)\gamma$.
For the lower bound on $\gamma_+$, note that replacing some terms by 1 can only increase the average. Hence,
$$Bk + \gamma k(U - B) \le k\bar a_k = U k_U + B k_B + k_1 \le k_U(U - B) + Bk + k_1 \le k_U(U - B) + Bk + c\gamma k(U - B).$$
In particular, this implies $\gamma_+ \ge k_U/k \ge (1 - c)\gamma$. Finally, observe that the lower bound on $U_+$ is trivial, while the upper bound follows from
$$U_+ \le \frac{U k_U + k_1}{k_U} \le U + \frac{ck(U - B)\min\{\gamma, \varepsilon_k \delta_k\}}{(1 - c)\gamma k} \le U + 2c(U - B)\min\left\{1, \frac{\varepsilon_k \delta_k}{\gamma}\right\}.$$
Here we chose $c < 1/2$.

Let us now collect all the terms taking values in $\{U, 1\}$ into their average $U_+$. By lemma 3, this operation does not affect the future evolution of the sequence, provided that the newly generated terms remain bounded by $U$. In particular, as long as we establish the bound $a_n \le U$ for all $n > k$, which is weaker than the conclusion of the proposition, we may apply reverse majorization without loss of generality.

In this transformed sequence, we have $a_1, \dots, a_k \in \{B, U_+\}$, with the fraction of terms equal to $U_+$ being exactly $\gamma_+$. By claim 2, we have $\gamma_+ \le (1 + c)\gamma \le 3/4$. In particular, after shifting and rescaling the sequence to take values in $\{0, 1\}$, we may apply lemma 5 and undo the transformation to conclude that there exist constants $c'$ and $C'$ such that for all $n > k$,
$$a_n \le B + (U_+ - B)\begin{cases} C'\gamma_+/\varepsilon_k \delta_k & \text{for } \gamma_+ \le 3/4, \\ 1 - c'\varepsilon_k \delta_k/\gamma_+ & \text{for } \gamma_+ \ge \varepsilon_k \delta_k/16C'. \end{cases}$$
It now suffices to bound the right-hand side, for which we invoke claim 2 again. First, whenever $\gamma \le 1/2$, we have $\gamma_+ \le 3/4$. Hence, for $C = 4C'$ we obtain
$$a_n \le B + \frac{C'\gamma_+}{\varepsilon_k \delta_k}(U_+ - B) \le B + \frac{4C'\gamma}{\varepsilon_k \delta_k}(U - B) = B + \frac{C\gamma}{\varepsilon_k \delta_k}(U - B).$$
On the other hand, whenever $\gamma \ge \varepsilon_k \delta_k/2C$, we have $\gamma_+ \ge \varepsilon_k \delta_k/16C'$, so we obtain
$$a_n \le B + (U_+ - B)\left(1 - \frac{c'\varepsilon_k \delta_k}{\gamma}\right) \le B + (U - B)\left(1 + \frac{2c\varepsilon_k \delta_k}{\gamma}\right)\left(1 - \frac{c'\varepsilon_k \delta_k}{\gamma}\right) \le B + (U - B)\left(1 - \frac{c\varepsilon_k \delta_k}{\gamma}\right).$$
Here we applied $(U_+ - U) \le 2c\varepsilon_k \delta_k (U - B)/\gamma$ and chose $c$ to be a sufficiently small constant relative to $c'$.

4.2 Tools for Lower Bounds

In this section, we first establish a helper lemma, which we then lift via majorization to prove proposition 2. The latter is the main tool for lower bounds.

Lemma 6. Fix $0 < \varepsilon, \delta, \gamma \le \frac{1}{2}$ such that $\gamma \ge 2\delta$. Suppose $\{a_n\}_{n=1}^{\infty}$ is a sequence with initialization $a_1 = \dots = a_{\gamma k} = 1$ and $a_{\gamma k + 1} = \dots = a_k = 0$, satisfying
$$a_{n+1} \ge \varepsilon \bar a_n + (1 - \varepsilon)\bar a^{(n)}_{\delta n} \quad \text{for all } n \ge k.$$
Then at least half of the terms up to index $\gamma k/2\delta$ satisfy
$$a_n \ge 1 - \varepsilon\left(\frac{5\delta}{\gamma}\right)^{1 - \varepsilon}.$$
Proof. Let $n_0 \triangleq \gamma k/(4\delta)$. Observe that if $n \le 2n_0$, then $\delta n \le 2\delta n_0 \le \gamma k$. In particular, $\bar a^{(n)}_{\delta n} = 1$. Thus, as long as $n \le 2n_0$, we have $a_{n+1} \ge \varepsilon \bar a_n + (1 - \varepsilon)$, so
$$\bar a_{n+1} \ge \frac{(n + \varepsilon)\bar a_n + (1 - \varepsilon)}{n + 1}.$$
We can solve this recurrence by setting $\bar a_n = 1 - d_n$ and observing that
$$d_{n+1} \le d_n\left(1 - \frac{1 - \varepsilon}{n + 1}\right), \quad \text{so} \quad d_n \le (1 - \gamma)\prod_{j=k+1}^{n}\left(1 - \frac{1 - \varepsilon}{j}\right) \le \exp\left((1 - \varepsilon)\log\left(\frac{k + 1}{n + 1}\right)\right).$$
Since $\bar a^{(n)}_{\delta n} \ge \bar a_n$, the sequence $\bar a_n$ is non-decreasing. Hence, for all $n \ge n_0$,
$$\bar a_n \ge 1 - \left(\frac{k + 1}{n_0 + 1}\right)^{1 - \varepsilon} \ge 1 - \left(\frac{5\delta}{\gamma}\right)^{1 - \varepsilon}.$$
This means that whenever $n_0 \le n < 2n_0$, we have
$$a_{n+1} \ge \varepsilon \bar a_n + 1 - \varepsilon \ge 1 - \varepsilon\left(\frac{5\delta}{\gamma}\right)^{1 - \varepsilon}.$$
This concludes the proof.

We are now ready to prove proposition 2.

Proof of proposition 2. Fix a constant $C$ to be specified below. For notational convenience, set $\varepsilon \triangleq \varepsilon_k$, $\delta \triangleq \delta_k$, and $\gamma \triangleq \delta_k^{1/2}$. By the majorization lemma (lemma 2), we may assume without loss of generality that among the initial terms $\{a_n\}_{n=1}^k$, a $\gamma$ fraction are equal to $U$, while the remaining terms are equal to 0. If $C$ is chosen sufficiently large, then $0 < \varepsilon, \delta, \gamma \le 1/2$ and $\gamma \ge 2\delta$. Moreover, we may assume that $\varepsilon_n$ and $\delta_n$ are non-increasing for $n \ge k$. In this case, we may rescale the sequence so that all terms take values in $\{0, 1\}$, and apply lemma 6 to conclude that at least half of the terms up to $k/2\delta^{1/2}$ satisfy
$$a_n \ge \left[1 - \varepsilon\left(\frac{5\delta}{\gamma}\right)^{1 - \varepsilon}\right]U.$$
It now suffices to show that $(5\delta/\gamma)^{1 - \varepsilon} \le C\delta/\gamma$, provided that $C$ is chosen sufficiently large. This follows from the observation that
$$\log\left(\left(\frac{\gamma}{\delta}\right)^{\varepsilon}\right) = \frac{\varepsilon}{2}\log\frac{1}{\delta} = A\,\frac{\beta\log\log k - \log B}{2(\log k)^{\alpha}} = O(1).$$

5 Proof of the Main Theorem

We first apply proposition 1 to prove convergence in theorem 3.

Theorem 3. Let $A, B > 0$ and $\alpha, \beta \ge 0$ be constants satisfying $\alpha + \beta/2 \le 1$. Suppose $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$, and let $\{a_n\}_{n=1}^{\infty}$ be a sequence with initialization $(a_1, \dots, a_k)$ that satisfies
$$a_{n+1} \in \varepsilon_n \bar a_n + (1 - \varepsilon_n)\left[\bar a^{[n]}_{\delta_n n},\ \bar a^{(n)}_{\delta_n n}\right] \quad \text{for all } n \ge k.$$
Then the sequence necessarily converges.

Proof. Since the recurrence is affine invariant, we may shift and rescale the initialization to assume without loss of generality that $a_1, \dots, a_k \in [0, 1]$. Furthermore, we may assume that $k$ is arbitrarily large.

We will construct a nested sequence of shrinking intervals $[B_T, U_T]$ that capture the tails of the sequence. The intervals will be defined inductively in stages. More precisely, at each stage $T \in \mathbb{N}_0$, we will maintain an interval $[B_T, U_T]$, an index $n_T$, and the guarantee that $a_n \in [B_T, U_T]$ for all $n \ge n_T$. At stage $T = 0$, we initialize with $B_0 = 0$, $U_0 = 1$, and $n_0 = k$. Since every term of the sequence is a convex combination of $a_1, \dots, a_k$, all subsequent terms remain bounded in $[0, 1]$. In particular, $a_n \in [0, 1]$ for all $n \ge k$, so the base case holds.

Assume now that we are at stage $T$. We define the starting index of the next stage implicitly by setting
$$n_{T+1} \triangleq n_T \times \frac{K^2}{\varepsilon^2_{n_{T+1}}\delta^3_{n_{T+1}}(U_T - B_T)^2}.$$
Here $K$ is a sufficiently large constant to be specified below.
This definition is well posed, as the sequence $m\,\varepsilon_m^2\delta_m^3$ is increasing for $m \ge k$ when $k$ is sufficiently large. For notational convenience, we set $\varepsilon_* \triangleq \varepsilon_{n_{T+1}}$ and $\delta_* \triangleq \delta_{n_{T+1}}$.

Below, we construct the next interval $[B_{T+1}, U_{T+1}] \subset [B_T, U_T]$ such that $a_n \in [B_{T+1}, U_{T+1}]$ for all $n \ge n_{T+1}$, and
$$(U_{T+1} - B_{T+1}) \le (U_T - B_T) \times (1 - c\,\varepsilon_*\delta_*^{1/2})$$
for a sufficiently small constant $c$, independent of $T$.

Observe that this suffices to complete the proof, for the following reason. Given the contraction bound above, proposition 3 implies that $(U_T - B_T) \to 0$ as $T \to \infty$. Since the intervals are nested, there exists a limit $L$ such that $B_T \to L$ and $U_T \to L$. Moreover, since $a_n \in [B_T, U_T]$ for all $n \ge n_T$, we conclude that
$$\liminf_{n\to\infty} a_n = L = \limsup_{n\to\infty} a_n.$$
This implies that the sequence converges.

It now suffices to construct the interval with the desired properties. For the purposes of the analysis, we define an intermediate index by
$$n_T^+ \triangleq n_T \times \frac{K}{\varepsilon_*\,\delta_*^2\,(U_T - B_T)}.$$
To aid the reader's understanding, let us outline the reasoning behind the argument.

- Recall that, by the inductive hypothesis, $a_n \in [B_T, U_T]$ for all $n \ge n_T$. This means that if we wait long enough, an overwhelming majority of the sequence will lie in $[B_T, U_T]$. This allows us to effectively zoom in on this interval and apply our technical lemma. More precisely, by time $n_T^+$ the second condition of proposition 1 is met.
- If, at any time $m$ between $n_T^+$ and $n_{T+1}$, the average $\bar a_m$ comes too close to $B_T$, then by proposition 1, subsequent terms will be bounded away from $U_T$. This lets us shrink the interval. By symmetry, the same argument applies with the roles of $U_T$ and $B_T$ reversed.
- Alternatively, if at no time between $n_T^+$ and $n_{T+1}$ does the average come close to $B_T$ or $U_T$, then the sequence remained away from both endpoints for too long.
In this case, by proposition 1 again, all subsequent terms remain uniformly bounded away from at least one endpoint, which again allows us to shrink the interval.

We now carry out the above analysis as follows. First, suppose that there exists an index $m$ with $n_T^+ \le m \le n_{T+1}$ such that
$$\bar a_m \le \delta_m^{1/2}\,U_T + (1 - \delta_m^{1/2})\,B_T.$$
Since all terms $\{a_j : n_T \le j \le m\}$ are contained in $[B_T, U_T]$, the fraction of terms lying outside the interval is at most
$$\frac{n_T}{m} \le \frac{n_T}{n_T^+} = \frac{\varepsilon_*\delta_*^2(U_T - B_T)}{K} \le \frac{\varepsilon_m\delta_m(U_T - B_T)}{K}.$$
Therefore, if $K$ is chosen sufficiently large, the conditions of proposition 1 are met. It follows that for a sufficiently small constant $c > 0$ and all $n \ge m$,
$$a_n \le U_T - c\,\varepsilon_m\delta_m^{1/2}(U_T - B_T) \le U_T - c\,\varepsilon_*\delta_*^{1/2}(U_T - B_T).$$
In this case, we may set $B_{T+1} \triangleq B_T$ and $U_{T+1} \triangleq U_T - c\,\varepsilon_*\delta_*^{1/2}(U_T - B_T)$ to finish the proof. Analogously, if there exists $m \in [n_T^+, n_{T+1}]$ such that $\bar a_m \ge \delta_m^{1/2}B_T + (1 - \delta_m^{1/2})U_T$, then we may instead set $U_{T+1} \triangleq U_T$ and $B_{T+1} \triangleq B_T + c\,\varepsilon_*\delta_*^{1/2}(U_T - B_T)$ to finish the proof.

In the remainder of the proof, we assume that for all $m \in [n_T^+, n_{T+1}]$,
$$\delta_m^{1/2}U_T + (1 - \delta_m^{1/2})B_T < \bar a_m < \delta_m^{1/2}B_T + (1 - \delta_m^{1/2})U_T.$$
Define the intermediate candidate endpoints by
$$U_T^- \triangleq U_T - \tfrac12\varepsilon_*\delta_*^{1/2}(U_T - B_T) \quad\text{and}\quad B_T^+ \triangleq B_T + \tfrac12\varepsilon_*\delta_*^{1/2}(U_T - B_T).$$
First, we will show that $a_{m+1} \in [B_T^+, U_T^-]$ for all $m \in [n_T^+, n_{T+1})$. To this end, observe that
$$a_{m+1} \le \varepsilon_m\,\bar a_m + (1-\varepsilon_m)\,\bar a^{(m)}_{\delta_m m} \le \varepsilon_*\,\bar a_m + (1-\varepsilon_*)\,\bar a^{(m)}_{\delta_* m} \le \varepsilon_*\big[\delta_*^{1/2}B_T + (1-\delta_*^{1/2})U_T\big] + (1-\varepsilon_*)\,\frac{1 \times n_T + U_T \times (\delta_* m - n_T)}{\delta_* m}$$
$$= \varepsilon_*\delta_*^{1/2}B_T + (1 - \varepsilon_*\delta_*^{1/2})U_T + (1 - U_T)(1-\varepsilon_*)\frac{n_T}{\delta_* m}.$$
The second inequality above follows from the monotonicity $\delta_* \le \delta_m$.
In the third inequality, we use the crude bounds $a_n \le 1$ for $n \le n_T$ and $a_n \le U_T$ for $n > n_T$. To conclude, we bound the final term as
$$(1 - U_T)(1-\varepsilon_*)\frac{n_T}{\delta_* m} \le \frac{n_T}{\delta_*\,n_T^+} = \frac{\varepsilon_*\delta_*(U_T - B_T)}{K} \le \tfrac12\varepsilon_*\delta_*^{1/2}(U_T - B_T).$$
Substituting this bound above yields $a_{m+1} \le U_T^-$, and analogously $a_{m+1} \ge B_T^+$.

Define $\gamma$ by $\bar a_{n_{T+1}} = \gamma U_T^- + (1-\gamma)B_T^+$. Without loss of generality, let us assume that $\gamma \le 1/2$. Recall that $\bar a_{n_{T+1}} \ge \delta_*^{1/2}U_T + (1 - \delta_*^{1/2})B_T$. Assuming $\varepsilon_*$ is sufficiently small, this implies $\gamma \ge 2\delta_*^{1/2}$. Furthermore, since $a_{m+1} \in [B_T^+, U_T^-]$ for all $m \in [n_T^+, n_{T+1})$, the fraction of terms up to time $n_{T+1}$ that fall outside the interval $[B_T^+, U_T^-]$ is at most
$$\frac{n_T^+}{n_{T+1}} = \frac{\varepsilon_*\delta_*(U_T - B_T)}{K} \le \frac{2\varepsilon_*\delta_*(U_T^- - B_T^+)}{K}.$$
Hence, by proposition 1 we get $a_n \le U_T^-$ for all $n \ge n_{T+1}$. In particular, we may set $B_{T+1} = B_T$ and $U_{T+1} = U_T^-$ to conclude the proof.

We now apply proposition 2 to prove the non-convergence in theorem 4.

Theorem 4. Let $A, B > 0$ and $\alpha, \beta > 0$ be constants satisfying $\alpha + \beta/2 > 1$. Suppose $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$. There exists a sequence $\{a_n\}_{n=1}^\infty$ and an initialization $(a_1, \dots, a_k)$ satisfying
$$a_{n+1} \in \varepsilon_n\,\bar a_n + (1-\varepsilon_n)\big[\bar a^{[n]}_{\delta_n n},\ \bar a^{(n)}_{\delta_n n}\big] \quad \text{for all } n \ge k$$
that does not converge.

Proof. We will construct the sequence together with a nested chain of intervals $[B_T, U_T]$ such that $\bigcap_{T=0}^\infty [B_T, U_T] = [B_\infty, U_\infty]$ for some $B_\infty < 1/2 < U_\infty$. Furthermore, there will be infinitely many terms of the sequence $\{a_n\}_{n=1}^\infty$ lying above $U_\infty$ and below $B_\infty$. The sequence and the intervals will be constructed inductively in stages.
More precisely, at each stage $T \in \mathbb{N}_0$, we will maintain an interval $[B_T, U_T]$, an index $n_T$, and the following invariant:

• if $T$ is even, then
  – at least a $\delta_{n_T}^{1/2}$ fraction of the terms among $\{a_n\}_{n=1}^{n_T}$ are above $U_T$;
  – at least a $\tfrac12$ fraction of the terms among $\{a_n\}_{n=1}^{n_T}$ are below $B_T$;
• if $T$ is odd, then
  – at least a $\tfrac12$ fraction of the terms among $\{a_n\}_{n=1}^{n_T}$ are above $U_T$;
  – at least a $\delta_{n_T}^{1/2}$ fraction of the terms among $\{a_n\}_{n=1}^{n_T}$ are below $B_T$.

At stage $T = 0$, we initialize with $B_0 = 0$, $U_0 = 1$, $n_0 = k$, and define the sequence
$$a_1 = \dots = a_{\lfloor k/2\rfloor} = 0 \quad\text{and}\quad a_{\lceil k/2\rceil} = \dots = a_k = 1.$$
We assume that $k$ is sufficiently large.

Suppose now that we are at stage $T$, having defined all terms of the sequence up to index $n_T$. Without loss of generality, assume $T$ is even. For brevity, set $\varepsilon_* \triangleq \varepsilon_{n_T}$ and $\delta_* \triangleq \delta_{n_T}$. Define
$$n_{T+1} \triangleq \frac{n_T}{2\delta_*^{1/2}} \quad\text{and}\quad a_{n+1} \triangleq \varepsilon_*\,\bar a_n + (1-\varepsilon_*)\,\bar a^{(n)}_{\delta_* n} \quad\text{for } n_T \le n < n_{T+1}.$$
Since $\varepsilon_n \le \varepsilon_*$ and $\delta_n \le \delta_*$, this defines a valid extension of the sequence.

To define the new endpoints, observe that the fraction of terms among $\{a_n\}_{n=1}^{n_{T+1}}$ that are less than $B_T$ is at least $\frac{n_T/2}{n_{T+1}} = \delta_*^{1/2}$. We may therefore set $B_{T+1} \triangleq B_T$. Furthermore, by proposition 2, at least half of the terms among $\{a_n\}_{n=1}^{n_{T+1}}$ exceed $(1 - C\varepsilon_{n_T}\delta_{n_T}^{1/2})\,U_T \triangleq U_{T+1}$ for a sufficiently large constant $C$. This concludes the construction.

Observe that, by construction, there are infinitely many terms of the sequence $\{a_n\}_{n=1}^\infty$ lying above $U_\infty$ and below $B_\infty$. It therefore suffices to show $U_\infty > B_\infty$. This follows directly from proposition 4, since $\alpha + \beta/2 > 1$ and
$$B_\infty \triangleq \lim_{T\to\infty} B_T < \tfrac12 < \lim_{T\to\infty} U_T \triangleq U_\infty.$$
This concludes the proof.

6 Fixed Shape Analysis

The main results of this paper focus on worst-case (possibly adversarial) non-stationary weights that are constrained between two envelopes.
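To make the envelope-bounded recursion $a_{n+1} \in \varepsilon_n\bar a_n + (1-\varepsilon_n)[\bar a^{[n]}_{\delta_n n},\ \bar a^{(n)}_{\delta_n n}]$ of theorems 3 and 4 concrete, the following minimal sketch (not from the paper; the schedule $\varepsilon_n = \delta_n = 1/\log n$ and all other parameter choices are illustrative) simulates the extremal update that always takes the top-average endpoint:

```python
import math

def simulate(k=100, steps=1900):
    """Simulate a_{n+1} = eps_n * mean(a_1..a_n) + (1 - eps_n) * top_avg,
    where top_avg is the average of the ceil(delta_n * n) largest terms
    so far.  Envelope schedule eps_n = delta_n = 1/log(n) is illustrative."""
    a = [0.0] * (k // 2) + [1.0] * (k - k // 2)  # half zeros, half ones
    total = sum(a)                                # running sum of all terms
    for n in range(k, k + steps):
        eps = delta = 1.0 / math.log(n)
        m = max(1, math.ceil(delta * n))          # size of the "top" window
        top_avg = sum(sorted(a, reverse=True)[:m]) / m
        a.append(eps * (total / n) + (1 - eps) * top_avg)
        total += a[-1]
    return a

seq = simulate()
```

Each new term is a convex combination of earlier terms, so the simulated sequence stays inside $[0,1]$; because the top-average always dominates the running mean, the running mean is non-decreasing and the sequence drifts toward the upper end of its initial range.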
In contrast, we may encounter weights that are obtained by discretizing a limiting "shape" on $(0,1)$. This setting is largely orthogonal to our envelope framework: discretized shapes may assign polynomially large weight to the very recent past, which is far beyond the polylogarithmic ceilings in theorem 1. Yet the additional structure allows a different proof strategy, based on renewal theory.

6.1 A Renewal Lemma for Multiplicative Recursions

In this subsection, we develop a general and useful lemma that will be used to obtain the main result of the section.

Definition 2. A real-valued random variable $Y$ is lattice if there exist constants $a \in \mathbb{R}$ and $d > 0$ such that $\mathbb{P}(Y \in a + d\mathbb{Z}) = 1$. Otherwise $Y$ is non-lattice.

Definition 3 (DRI). A real-valued function $\eta : \mathbb{R}_+ \to \mathbb{R}$ is said to be directly Riemann integrable if $\lim_{h\downarrow0} L(h) = \lim_{h\downarrow0} U(h)$, where $L$ and $U$ are the lower and upper mesh sums, defined by
$$L(h) = h\sum_{k=0}^\infty \inf\{\eta(t) : t \in (kh, (k+1)h]\}, \qquad U(h) = h\sum_{k=0}^\infty \sup\{\eta(t) : t \in (kh, (k+1)h]\}.$$

Lemma 7 (Renewal lemma with integrable error). Let $T$ be a random variable taking values in $(0,1)$, and let $F : \mathbb{R}_+ \to \mathbb{R}$ be a bounded piecewise continuous function satisfying
$$F(x) = \mathbb{E}\big[F(Tx)\big] + \epsilon(x) \quad \text{for all } x \ge 1. \tag{6}$$
Assume that $Y \triangleq \log(1/T)$ is non-lattice with finite mean $\mu \triangleq \mathbb{E}[Y] < \infty$, and that $|\epsilon(x)|/x$ is directly Riemann integrable over $(1,\infty)$. Then $F(x)$ converges, and
$$\lim_{x\to\infty} F(x) = \mathbb{E}\big[F(\tilde T)\big] + \frac1\mu\int_1^\infty \frac{\epsilon(x)}{x}\,dx, \tag{7}$$
where $\tilde T$ is a random variable on $(0,1)$ with density $\tilde p(t) = \mathbb{P}(T < t)/(\mu t)$.

Proof. Define $G, \eta : \mathbb{R} \to \mathbb{R}$ by the logarithmic change of variables $G(s) \triangleq F(e^s)$ and $\eta(s) \triangleq \epsilon(e^s)$ for all $s \in \mathbb{R}$. Then (by the substitution $x = e^s$) equation (6) becomes
$$G(s) = \mathbb{E}\big[G(s - Y)\big] + \eta(s) \quad \text{for all } s \ge 0,$$
where $Y = \log(1/T)$ is non-lattice with mean $\mu$.
Moreover, $G$ is bounded and piecewise continuous, and $\eta$ is DRI. Let $\{Y_j\}_{j\ge1}$ be i.i.d. copies of $Y$, and set $S_0 = 0$ and $S_m \triangleq \sum_{i=1}^m Y_i$. A straightforward induction on $m$ yields
$$G(s) = \mathbb{E}\big[G(s - S_m)\big] + \mathbb{E}\Big[\sum_{n=0}^{m-1}\eta(s - S_n)\Big] \quad \text{for all } m \ge 1.$$
For each $s \ge 0$, define the first passage time $\tau_s \triangleq \inf\{m \ge 1 : S_m \ge s\}$ and the overshoot $R_s \triangleq S_{\tau_s} - s \in [0,\infty)$. We will now show that
$$G(s) = \mathbb{E}\big[G(-R_s)\big] + \mathbb{E}\Big[\sum_{n=0}^{\tau_s-1}\eta(s - S_n)\Big]. \tag{8}$$
To this end, let $(\mathcal{F}_m)_{m=0}^\infty$ be the natural filtration of the process $(Y_m)_{m=0}^\infty$. Fix $s \ge 0$ and define the stopped process
$$M_m \triangleq G(s - S_{m\wedge\tau_s}) + \sum_{n=0}^{m\wedge\tau_s-1}\eta(s - S_n), \quad\text{so that } M_0 = G(s).$$
Observe that since $\eta(s) = G(s) - \mathbb{E}[G(s - Y)]$ and $G$ is bounded, $\|\eta\|_\infty \le 2\|G\|_\infty$. This immediately implies the integrability of $M_m$. Furthermore, it is easy to check that $\mathbb{E}[M_{m+1} \mid \mathcal{F}_m] = M_m$, so $(M_m)_{m=0}^\infty$ is a martingale. Since $\tau_s \wedge m$ is a bounded stopping time, the optional stopping theorem applies, so
$$G(s) = \mathbb{E}\big[G(s - S_{\tau_s\wedge m})\big] + \mathbb{E}\Big[\sum_{n=0}^{m\wedge\tau_s-1}\eta(s - S_n)\Big].$$
Note that since $Y > 0$ we have $\mathbb{E}[\tau_s] < \infty$. Since $G$ and $\eta$ are bounded,
$$|G(s - S_{\tau_s\wedge m})| \le \|G\|_\infty \quad\text{and}\quad \Big|\sum_{n=0}^{m\wedge\tau_s-1}\eta(s - S_n)\Big| \le \|\eta\|_\infty\,\tau_s.$$
Hence, by the dominated convergence theorem and the fact that $\tau_s \wedge m \to \tau_s$ almost surely, we deduce (8).

We now proceed to analyze equation (8).

Overshoot term. Since $Y$ is non-lattice with finite mean $\mu$, the excess life convergence theorem (section 10.3 of [10]) implies that $R_s$ converges in distribution to a random variable $R$ with density $r \mapsto \mathbb{P}(Y > r)/\mu$ on $[0,\infty)$. Hence,
$$\lim_{s\to\infty}\mathbb{E}\big[G(-R_s)\big] = \mathbb{E}\big[G(-R)\big].$$
If we define $\tilde T \triangleq e^{-R}$, then $G(-R) = F(\tilde T)$. A change of variables shows that $\tilde T$ has density $\tilde p(t) = \mathbb{P}(T < t)/(\mu t)$ on $(0,1)$, so $\mathbb{E}[G(-R)] = \mathbb{E}[F(\tilde T)]$.

Error term. Let $\sigma$ be the renewal measure
$$\sigma([a,b]) \triangleq \sum_{n\ge0}\mathbb{P}(S_n \in [a,b]) \quad \text{for all } a \le b.$$
Then, since $\eta$ is integrable,
$$\mathbb{E}\Big[\sum_{n=0}^{\tau_s-1}\eta(s - S_n)\Big] = \mathbb{E}\Big[\sum_{n\ge0}\eta(s - S_n)\,\mathbb{1}\{S_n < s\}\Big] = \int_{[0,s)}\eta(s - u)\,\sigma(du).$$
Applying the key renewal theorem (Theorem 35 in [15] or section 10.2 of [10]) yields
$$\lim_{s\to\infty}\int_{[0,s)}\eta(s - u)\,\sigma(du) = \frac1\mu\int_0^\infty \eta(u)\,du = \frac1\mu\int_1^\infty \frac{\epsilon(x)}{x}\,dx,$$
where the last equality uses the substitution $x = e^u$. Finally, taking the limit of (8) gives (7).

Remark 2. The non-lattice condition and the finiteness of $\mathbb{E}[\log(1/T)]$ are both necessary for convergence in lemma 7, even if $\epsilon \equiv 0$.

6.2 Convergence for Fixed Shapes

In this subsection we formalize what it means for the sequence $\{p_n\}_{n\ge1}$ to have a "fixed shape", and we prove a sufficient condition for convergence in this regime.

Definition 4. Let $p : (0,1) \to [0,\infty)$ be a continuous probability density and $\{p_n\}_{n=1}^\infty$ be a sequence of probability mass functions with $p_n$ supported on $[n]$. We say that $\{p_n\}_{n=1}^\infty$ strongly discretizes $p$ if there exist constants $\kappa > 0$ and $C > 0$ such that $\|p_x - p\|_1 \le Cx^{-\kappa}$ for all $x \ge C$, where $p_x$ is a probability density on $(0,1)$ defined by
$$p_x(t) \triangleq x\,p_{\lfloor x\rfloor}(\lceil tx\rceil) \quad \text{for all } t \in (0,1).$$

The next theorem shows that the fixed shape structure, together with the quantitative discretization assumption above, is sufficient to guarantee convergence of the lookback averages.

Theorem 5 (Convergence for Fixed Shapes). Let $p : (0,1) \to [0,\infty)$ be a continuous probability density with finite log-moment $\int_0^1 p(t)\log(1/t)\,dt$. Suppose that $\{p_n\}_{n=1}^\infty$ strongly discretizes $p$, and that $\{a_n\}_{n=1}^\infty \subseteq \mathbb{R}^d$ satisfies
$$a_{n+1} = \mathbb{E}_{j\sim p_n}\big[a_j\big] \quad \text{for all } n \ge k.$$
Then the sequence $\{a_n\}_{n=1}^\infty$ converges.

Proof. It suffices to prove convergence in the case $d = 1$. The idea is to embed the discrete recursion into a continuous equation and apply lemma 7. Define a function $F : \mathbb{R}_+ \to \mathbb{R}$ by $F(x) \triangleq a_{\lceil x\rceil}$.
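As an aside, the fixed-shape recursion $a_{n+1} = \sum_{j=1}^n p_n(j)a_j$ of theorem 5 is easy to explore numerically. The sketch below (not part of the proof; the shape $p(t) \propto e^t$, the initialization, and all parameters are illustrative choices, and we do not verify the strong-discretization constants here) builds $p_n(j) \propto p(j/n)$ and iterates the recursion:

```python
import math

def fixed_shape_sequence(k=50, n_max=2000):
    """Iterate a_{n+1} = sum_j p_n(j) a_j, where p_n(j) is proportional
    to p(j/n) for the illustrative continuous shape p(t) ∝ e^t on (0,1)."""
    a = [0.0] * (k // 2) + [1.0] * (k - k // 2)  # arbitrary initialization
    for n in range(k, n_max):
        w = [math.exp(j / n) for j in range(1, n + 1)]  # unnormalized weights
        z = sum(w)
        a.append(sum(wj * aj for wj, aj in zip(w, a)) / z)
    return a

seq = fixed_shape_sequence()
```

Every new term is a convex combination of the earlier ones, so the iterates stay in the convex hull of the initialization, and successive terms stabilize as $n$ grows, consistent with the convergence asserted by theorem 5.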
It suffices to show that $F(x)$ converges as $x \to \infty$. Define $\{p_x\}_{x>k}$ as in definition 4, and note that for all $n \ge k$ and $x \in (n, n+1]$,
$$\int_0^1 p_x(t)F(tx)\,dt = \int_0^1 x\sum_{j=1}^n \mathbb{1}\{tx \in (j-1, j]\}\,p_n(j)F(tx)\,dt = \int_0^x \sum_{j=1}^n \mathbb{1}\{u \in (j-1, j]\}\,p_n(j)F(u)\,du = \sum_{j=1}^n p_n(j)\,a_j = a_{n+1} = F(x).$$
Consequently,
$$F(x) = \int_0^1 p(t)F(tx)\,dt + \int_0^1 \big[p_x(t) - p(t)\big]F(tx)\,dt = \mathbb{E}\big[F(Tx)\big] + \epsilon(x),$$
where $T \sim p$ and $\epsilon(x) \triangleq \int_0^1 [p_x(t) - p(t)]F(tx)\,dt$.

To show that $F$ converges, our goal becomes to verify the conditions of lemma 7. Note that since $\{a_n\}_{n=1}^\infty$ is bounded, $F$ is a bounded piecewise continuous function. Moreover, since $T$ has a continuous density and a finite log-moment, $\log(1/T)$ is non-lattice and has finite mean. Hence, it suffices to prove that $\epsilon(x)/x$ is directly Riemann integrable over $(1,\infty)$.

By a standard criterion for direct Riemann integrability (Remark 34 in [15]), it suffices to exhibit an eventually decreasing integrable function $\Delta(x)$ such that $|\epsilon(x)|/x \le \Delta(x)$ for all sufficiently large $x$. This follows from the inequalities
$$\frac{|\epsilon(x)|}{x} \le \frac1x\int_0^1 |p_x(t) - p(t)|\,|F(tx)|\,dt \le \frac{\|F\|_\infty\|p_x - p\|_1}{x} \le \frac{C\|F\|_\infty}{x^{1+\kappa}}.$$
In the last inequality, we used the fact that $\{p_n\}_{n=1}^\infty$ strongly discretizes $p$. This concludes the proof.

Remark 3 (Comparison to Envelope-Bounded Weights). If $p(t) \asymp t^{-\gamma}$ as $t \downarrow 0$, then the discretization assigns $p_n(1) \asymp n^{\gamma-1}$, i.e., a polynomial ceiling $c(n) \asymp n^{\gamma}$. This illustrates that theorem 5 is genuinely outside the polylogarithmic envelope regime of theorem 1 and relies crucially on the fixed shape structure.

A Appendix

Proposition 3. Let $A, B > 0$ and $\alpha, \beta \ge 0$ be constants satisfying $\alpha + \beta/2 \le 1$. Suppose $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$.
Let $\{(\Delta_T, n_T)\}_{T=0}^\infty$ be a sequence with initialization $\Delta_0 = 1$ and $n_0 = k$, defined by
$$\Delta_{T+1} \triangleq \Delta_T \times (1 - c\,\varepsilon_{n_{T+1}}\delta_{n_{T+1}}^{1/2}) \quad\text{where}\quad n_{T+1} \triangleq n_T \times \frac{C}{\varepsilon_{n_{T+1}}^2\,\delta_{n_{T+1}}^3\,\Delta_T^2}.$$
Then $\Delta_T \to 0$ as $T \to \infty$.

Proof. To prove the convergence, it suffices to show that
$$\sum_{T=1}^\infty \varepsilon_{n_T}\delta_{n_T}^{1/2} = \infty.$$
Assume to the contrary that $\Delta_T \to \Delta > 0$. By the definition of $n_{T+1}$, we get
$$\frac{n_{T+1}}{(\log n_{T+1})^{2\alpha+3\beta}} \le \frac{C}{A^2B^3\Delta^2} \times n_T \implies n_{T+1} \le n_T \times (\log n_T)^{O(1)} \implies \log n_T \le O(T\log T).$$
Consequently,
$$\varepsilon_{n_T}\delta_{n_T}^{1/2} = \Omega\big([T\log T]^{-(\alpha+\beta/2)}\big),$$
which forms a divergent series in $T$ when $\alpha + \beta/2 \le 1$.

Proposition 4. Let $A, B > 0$ and $\alpha, \beta > 0$ be constants satisfying $\alpha + \beta/2 > 1$. Suppose $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$. Let $\{(U_T, n_T)\}_{T=0}^\infty$ be a sequence with initialization $U_0 > 0$ and $n_0 \ge k$, defined by
$$n_{T+1} = \frac{n_T}{2\delta_{n_T}^{1/2}} \quad\text{and}\quad U_{T+1} = U_T(1 - C\varepsilon_{n_T}\delta_{n_T}^{1/2}) \quad \text{for all } T \ge 0.$$
If $k$ is sufficiently large, then $U_\infty \triangleq \lim_{T\to\infty} U_T > \frac{U_0}{2}$.

Proof. Observe that as long as $U_\infty > 0$, we may choose $k$ sufficiently large to guarantee $U_\infty > U_0/2$. It therefore suffices to prove $U_\infty > 0$, which is equivalent to
$$\sum_{T=0}^\infty \varepsilon_{n_T}\delta_{n_T}^{1/2} < \infty.$$
It follows from the definition that $n_{T+1} \ge 2n_T$, as long as $k$ is sufficiently large. This implies $\log n_T = \Omega(T)$. In particular,
$$\varepsilon_{n_T}\delta_{n_T}^{1/2} = O\big(T^{-(\alpha+\beta/2)}\big),$$
which forms a convergent series when $\alpha + \beta/2 > 1$.

References

[1] Daron Acemoglu, Munther A. Dahleh, Ilan Lobel, and Asuman Ozdaglar. Bayesian learning in social networks. The Review of Economic Studies, 78(4):1201–1236, 2011.

[2] David Aldous. Probability distributions on cladograms. In Random discrete structures (Minneapolis, MN, 1993), volume 76 of IMA Vol. Math. Appl., pages 1–18. Springer, New York, 1996.

[3] Venkatesh Bala and Sanjeev Goyal. Learning from neighbours. The Review of Economic Studies, 65(3):595–621, 1998.
[4] Erich Baur and Jean Bertoin. Elephant random walks and their connection to Pólya-type urns. Phys. Rev. E, 94:052134, Nov 2016.

[5] Eric Cator and Henk Don. Self-averaging sequences which fail to converge. Electron. Commun. Probab., 22:Paper No. 16, 12, 2017.

[6] K. L. Chung and J. Wolfowitz. On a limit theorem in renewal theory. Ann. of Math. (2), 55:1–6, 1952.

[7] Morris H. DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345):118–121, 1974.

[8] P. Erdős, W. Feller, and H. Pollard. A property of power series with positive coefficients. Bull. Amer. Math. Soc., 55:201–204, 1949.

[9] Benjamin Golub and Matthew O. Jackson. Naive learning in social networks and the wisdom of crowds. American Economic Journal: Microeconomics, 2(1):112–149, 2010.

[10] Geoffrey R. Grimmett and David R. Stirzaker. Probability and Random Processes. Oxford University Press, New York, third edition, 2001.

[11] Svante Janson. Random recursive trees and preferential attachment trees are random split trees. Combin. Probab. Comput., 28(1):81–99, 2019.

[12] Elchanan Mossel, Allan Sly, and Omer Tamuz. Strategic learning and the topology of social networks. Econometrica, 83(5):1755–1794, 2015.

[13] Alex Olshevsky and John N. Tsitsiklis. Convergence speed in distributed consensus and averaging [reprint of MR2480125]. SIAM Rev., 53(4):747–772, 2011.

[14] Gunter M. Schütz and Steffen Trimper. Elephants can always remember: Exact long-range memory effects in a non-Markovian random walk. Phys. Rev. E, 70:045101, Oct 2004.

[15] Richard Serfozo. Basics of Applied Stochastic Processes. Probability and its Applications (New York). Springer-Verlag, Berlin, 2009.
