Sharp Threshold for the Convergence of Nonstationary Averaging


Authors: Saba Lepsveridze, Elchanan Mossel

Abstract

We study non-stationary averaging processes, where each term of a sequence is a weighted average of previous terms, namely $a_{n+1} = \sum_{j=1}^n p_n(j) a_j$. Our results extend classical theory in two distinct regimes. First, we prove a sharp threshold for convergence in the regime where the weights are bounded between two envelopes, $(\log n)^{-\alpha} \le n\,p_n(\cdot) \le (\log n)^{\beta}$. We show that the sequence necessarily converges when $\alpha + \beta/2 \le 1$, while for $\alpha + \beta/2 > 1$ convergence can fail. Second, we study a complementary fixed-shape regime, where $p_n$ is obtained by discretizing a fixed limiting density on $(0, 1)$. We show that under mild regularity assumptions, the sequence converges.

1 Introduction

Weighted running averages show up frequently in probability, dynamics, and optimization. A basic primitive behind these procedures is that the next iterate is a weighted average of the past. While classical theory understands these processes well when the weights are constant or uniform, much less is known when the averaging weights can be non-stationary and potentially spiky.

Let $\{p_n\}_{n=1}^{\infty}$ be a sequence of probability measures with $p_n \in \mathcal{P}([n])$. Consider a sequence of vectors $\{a_n\}_{n=1}^{\infty} \subset \mathbb{R}^d$ defined by
$$a_{n+1} = \mathbb{E}_{j \sim p_n} a_j = \sum_{j=1}^n p_n(j)\, a_j \quad \text{for } n \ge k,$$
and an arbitrary initialization $a_1, \dots, a_k \in \mathbb{R}^d$. We ask:

Question 1. Under what conditions on $\{p_n\}_{n=1}^{\infty}$ does the sequence $\{a_n\}_{n=1}^{\infty}$ necessarily converge?

A natural global constraint is to bound each weight between two envelopes,
$$\frac{f(n)}{n} \le p_n(j) \le \frac{c(n)}{n} \quad \text{for all } j \in [n],$$
where $f(n)$ and $c(n)$ describe a floor and a ceiling, respectively. This allows decay of $f(n)$ and blow-up of $c(n)$, which is far from uniform averaging.
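To make the setup concrete, here is a small numerical sketch (our own illustration, not from the paper) that iterates $a_{n+1} = \sum_{j=1}^n p_n(j) a_j$ with weights drawn between envelopes of the form above; note that after normalization the weights satisfy the envelopes only approximately.

```python
import numpy as np

def run_averaging(N=3000, k=10, alpha=0.5, beta=0.5, seed=0):
    """Iterate a_{n+1} = sum_j p_n(j) a_j with weights drawn between the
    envelopes (log n)^-alpha <= n*p_n(j) <= (log n)^beta.  Normalization
    perturbs the envelopes slightly; this is an illustration, not a proof."""
    rng = np.random.default_rng(seed)
    a = np.empty(N)
    a[:k] = rng.uniform(0.0, 1.0, size=k)   # arbitrary initialization in [0, 1]
    for n in range(k, N):
        lo = np.log(n) ** (-alpha) / n       # floor f(n)/n
        hi = np.log(n) ** beta / n           # ceiling c(n)/n
        w = rng.uniform(lo, hi, size=n)
        w /= w.sum()                         # make p_n a probability measure
        a[n] = w @ a[:n]                     # a_{n+1} = E_{j ~ p_n} a_j
    return a

a = run_averaging()
# alpha = beta = 1/2 gives alpha + beta/2 = 3/4 <= 1, the convergent regime
# of Theorem 1; the tail of the trajectory is nearly constant.
print(a[-100:].max() - a[-100:].min())
```

Since every iterate is a convex combination of earlier terms, the trajectory stays in $[0,1]$; the shrinking spread of the tail is the behaviour Theorem 1 guarantees in this parameter range.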
We answer this question by obtaining optimal rates for $f$ and $c$ under which convergence holds.

Theorem 1. Suppose $f(n) = A(\log n)^{-\alpha}$ and $c(n) = B(\log n)^{\beta}$ with $\alpha, \beta > 0$.
- If $\alpha + \beta/2 \le 1$, then for any choice of $\{p_n\}_{n=1}^{\infty}$ and any initialization, the sequence $\{a_n\}_{n=1}^{\infty}$ necessarily converges.
- If $\alpha + \beta/2 > 1$, then there exists a choice of $\{p_n\}_{n=1}^{\infty}$ and an initialization such that $\{a_n\}_{n=1}^{\infty}$ fails to converge.

In addition to the worst-case envelope-bounded weights considered above, section 6 establishes theorem 5 for a complementary fixed-shape regime. In this setting, the weights are obtained by discretizing a limiting probability density on $(0, 1)$. The additional structure permits a renewal-theoretic analysis and yields convergence under suitable regularity and approximation assumptions.

Theorem 2 (Informal). Let $\{p_n\}_{n=1}^{\infty}$ be a sequence of discrete probability distributions that approximate a density $p : (0, 1) \to [0, \infty)$ with finite logarithmic moment. Let $\{a_n\}_{n=1}^{\infty} \subset \mathbb{R}^d$ satisfy
$$a_{n+1} = \mathbb{E}_{j \sim p_n}\left[a_j\right] \quad \text{for all } n \ge k.$$
Then the sequence $\{a_n\}_{n=1}^{\infty}$ converges.

The precise assumptions and statement appear in theorem 5.

1.1 Related Work

Note that the recursion $a_{n+1} = \sum_{j=1}^n p_n(j) a_j$ can be viewed as $a_{n+1} = \mathbb{E}[a_{J_n}]$, where $J_n \sim p_n$. Such recursions often appear in probabilistic models with memory, as detailed below. These works motivate our setting, but we assume neither stationarity nor model-specific structure. In our paper, we focus on two general regimes: theorem 1 gives a sharp worst-case convergence threshold over all nonstationary kernels obeying pointwise envelope bounds, while theorem 5 treats a structured, scaling-invariant fixed-shape regime.

Split trees and tagged lineages. A concrete way our recurrence shows up in random trees is through the standard tree = root + subtrees decomposition.
In a split tree with $n$ items, the root partitions the items into subtree sizes $(J_{n,1}, \dots, J_{n,b})$ according to a random split, and many parameters are obtained by iterating this decomposition down the tree as in [11]. If one follows a tagged object (for example a uniformly chosen item, or the lineage of a random leaf), then at each split one keeps only the unique child subtree containing the tag. Hence, for many scalar quantities $a_n$ that are functions of the tagged subtree, one obtains
$$a_{n+1} = \mathbb{E}[a_{J_n}] = \sum_{j \le n} \mathbb{P}(J_n = j)\, a_j.$$
This recursion is often the key input before lifting to global statements about the whole tree, such as LLNs/CLTs for depths, profiles, and related additive functionals. In fact, our fixed-shape theorem 5 can be applied as a black box to Aldous' $\beta$-splitting model [2] to obtain convergence behaviour for many recursions in these tree models for $\beta > -1$, when the splitting distribution has finite moments. Long-memory walks such as the elephant random walk also generate full-memory averaging recurrences, but we mention them only as further motivation rather than as a structural input [14, 4].

Self-averaging sequences. Oftentimes "probability of an event at time $n$" problems can be written in a self-averaging form (see the group Russian roulette example [5] and many other references therein): one identifies some bounded statistic or an event probability $a_n$ and a random lookback index $J_n \in [n]$ such that
$$a_n = \mathbb{E}[a_{J_n}].$$
Setting $p_n(j) = \mathbb{P}(J_n = j)$, we reduce to studying our recursion. Cator and Don [5] study precisely this equation for bounded sequences, but under a concentration hypothesis $J_n \approx \alpha n$ with $\mathrm{Var}(J_n) = O(n)$. Our results complement this line of research by studying worst-case adversarial envelopes and structured scaling regimes.
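As a toy illustration of the self-averaging reduction (our own example, not from the paper), take $J_n$ uniform on $[n]$, so $a_{n+1} = \bar a_n$. A short calculation shows the sequence is then exactly constant from $a_{k+1}$ on: $a_{k+2} = (k\bar a_k + a_{k+1})/(k+1) = a_{k+1}$.

```python
import numpy as np

def self_averaging_uniform(N=2000, k=5, seed=1):
    """Self-averaging recursion with J_n uniform on [n]:
    a_{n+1} = E[a_{J_n}] = (a_1 + ... + a_n) / n."""
    rng = np.random.default_rng(seed)
    a = [float(x) for x in rng.uniform(0.0, 1.0, size=k)]  # arbitrary start
    s = sum(a)
    for n in range(k, N):
        a.append(s / n)       # a_{n+1} = running average of a_1 .. a_n
        s += a[-1]
    return a

a = self_averaging_uniform()
# a[5] is the first averaged term a_{k+1}; all later terms equal it
# (up to floating-point rounding).
print(abs(a[-1] - a[5]))
```

Non-uniform $J_n$ breaks this instant stabilization, which is exactly why the envelope conditions of Theorem 1 are needed in general.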
In fact, our divergence construction (theorem 4) shows that oscillations can persist even when most of the mass is spread over a polylogarithmic number of indices.

Renewal theory. Renewal theory analyzes convolution recursions such as $u_0 = 1$ and
$$u_n = \sum_{k=1}^n f_k u_{n-k},$$
where $(f_k)_{k \ge 1}$ is a probability mass function and $(u_n)$ is the associated renewal sequence [8, 6, 15]. The classical Erdős–Feller–Pollard theorem states that when the mean $\mu \triangleq \sum_{k \ge 1} k f_k$ is finite and $f$ is aperiodic, one has the sharp limit
$$u_n \to \frac{1}{\mu},$$
which is the discrete-time analogue of the "renewal density $\to 1/\mu$" principle [8]. More generally, key renewal theorems describe limits for perturbed renewal equations and identify the limiting constant via the mean $\mu$ and an explicit overshoot law [6, 15].

Our fixed-shape regime in section 6 can be put in this framework after a logarithmic change of variables. Roughly speaking, when the weights come from discretizing a density on $(0, 1)$, the recursion has the form $F(x) \approx \mathbb{E}[F(Tx)]$ with $T \in (0, 1)$. Writing $x = e^s$ and $G(s) = F(e^s)$ turns this into an additive renewal equation
$$G(s) = \mathbb{E}[G(s - Y)] + \eta(s), \qquad Y \triangleq \log(1/T),$$
where $\eta$ captures the discretization error. Thus a finite log moment $\mathbb{E}[\log(1/T)] < \infty$ plays exactly the role of a finite mean increment in classical renewal theory, and yields convergence together with an explicit residual description. This is stated in lemma 7, and might be of independent use beyond this application.

Consensus/social learning and stochastic approximation. Classical distributed consensus and social learning models iterate stochastic matrices on a fixed agent set and analyze convergence via connectivity/mixing assumptions [7, 13]. In the Bayesian setting, Bala and Goyal [3] consider a model where each agent, upon joining, observes all previous agents (and their private signal) before taking their action.
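As a quick numerical check of the Erdős–Feller–Pollard limit $u_n \to 1/\mu$ (the pmf below is our own toy choice):

```python
def renewal_sequence(f, N=500):
    """Compute u_0 = 1, u_n = sum_{k=1}^n f_k u_{n-k}
    for a pmf f = (f_1, ..., f_m) on {1, ..., m}."""
    u = [0.0] * (N + 1)
    u[0] = 1.0
    for n in range(1, N + 1):
        u[n] = sum(f[k - 1] * u[n - k] for k in range(1, min(n, len(f)) + 1))
    return u

f = [0.5, 0.3, 0.2]   # aperiodic pmf on {1, 2, 3}; mean mu = 1.7
u = renewal_sequence(f)
print(u[-1])           # close to 1/mu = 1/1.7 ≈ 0.5882
```

For aperiodic pmfs with finite support the convergence is geometric, so a few hundred steps already match $1/\mu$ to many digits.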
The analogous model in the DeGroot framework consists of a growing network where agents arrive sequentially, and agent $n + 1$ forms an opinion by averaging earlier opinions with an attention profile $p_n$,
$$x_{n+1} = \sum_{j \le n} p_n(j)\, x_j.$$
It is natural to ask when asymptotic consensus is reached in such a model. The Erdős–Feller–Pollard theorem provides a positive answer when $p(n, j) = f_{n-j}$ and $f$ has a finite mean and is aperiodic. Our results show that consensus is achieved even if agents use different averaging weights, as long as these are equitable enough among the preceding agents. Since our results are tight, they also provide examples where, if the weights are not equitable, consensus is not reached. Prior work in learning on networks has highlighted the role of various notions of equitability in reaching consensus and in learning [9, 1, 12].

1.2 Acknowledgements

E.M. is partially supported by ARO MURI N00014241274, by Vannevar Bush Faculty Fellowship ONR-N00014-20-1-2826, and by a Simons Investigator Award. S.L.'s research is supported in part by NSF–Simons collaboration grant DMS-2031883.

2 Reduction

To prove the threshold in theorem 1, we reduce the problem to a one-dimensional extremal process. This reduction starts by applying the argument coordinatewise and using affine invariance of the recursion. In particular, we assume without loss of generality that $\{a_n\}_{n=1}^{\infty} \subset \mathbb{R}$. By affine invariance of the sequence, we can shift and scale the initialization so that all terms are contained in the interval $[0, 1]$.

For convenience, we also reparametrize the problem by setting
$$\varepsilon_n \triangleq f(n) \quad \text{and} \quad \delta_n \triangleq \frac{1 - f(n)}{c(n) - f(n)}. \tag{1}$$
The parameter $\delta_n$ is chosen so that assigning maximal weight $c(n)/n$ to the largest $\delta_n n$ terms and $f(n)/n$ to the remaining terms yields a valid probability measure. We now introduce important notation.
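The algebra behind the reparameterization (1) can be sanity-checked numerically; the constants below are illustrative choices, not values from the paper.

```python
import math

def envelope_params(n, A=1.0, B=2.0, alpha=0.5, beta=0.5):
    """Reparameterization (1): eps_n = f(n), delta_n = (1 - f(n)) / (c(n) - f(n)),
    for the envelopes f(n) = A (log n)^-alpha and c(n) = B (log n)^beta."""
    f = A * math.log(n) ** (-alpha)   # floor envelope f(n)
    c = B * math.log(n) ** beta       # ceiling envelope c(n)
    eps = f
    delta = (1 - f) / (c - f)
    return f, c, eps, delta

n = 10_000
f, c, eps, delta = envelope_params(n)
# The extremal measure puts weight c(n)/n on delta_n * n indices and f(n)/n
# on the rest; by the choice of delta_n the total mass is exactly 1:
total = (c / n) * (delta * n) + (f / n) * (n - delta * n)
print(total)
```

Indeed $c\delta_n + f(1 - \delta_n) = f + \delta_n(c - f) = f + (1 - f) = 1$, which is what the code verifies up to rounding.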
For a sequence $\{a_n\}_{n=1}^{\infty}$ and an index $m \in \mathbb{N}$, denote by $a^{(m)}_1 \ge \dots \ge a^{(m)}_m$ and $a^{[m]}_1 \le \dots \le a^{[m]}_m$ the descending and ascending orderings of the partial sequence $\{a_1, \dots, a_m\}$, respectively. Similarly, we define the percentile averages
$$\bar a_m \triangleq \frac{a_1 + \dots + a_m}{m}, \qquad \bar a^{(m)}_j \triangleq \frac{a^{(m)}_1 + \dots + a^{(m)}_j}{j}, \qquad \bar a^{[m]}_j \triangleq \frac{a^{[m]}_1 + \dots + a^{[m]}_j}{j}.$$
(To improve readability, we will ignore integrality issues throughout the paper, since they can be handled with routine adjustments and do not affect the asymptotic arguments.)

In words, $\bar a_m$ is the average of all terms $a_1, \dots, a_m$, while $\bar a^{(m)}_j$ and $\bar a^{[m]}_j$ are the averages of the largest and smallest $j$ terms among the first $m$ elements, respectively. We now describe an equivalent formulation of the problem that we will work with for the rest of the paper.

Lemma 1. Let $\{a_n\}_{n=1}^{\infty}$ be a sequence of real numbers and $\{p_n\}_{n=1}^{\infty}$ be a sequence of probability measures with $p_n$ supported on $[n]$ such that
$$a_{n+1} = \mathbb{E}_{j \sim p_n} a_j \text{ for all } n \ge k \quad \text{and} \quad \frac{f(n)}{n} \le p_n(j) \le \frac{c(n)}{n} \text{ for all } j \in [n]. \tag{2}$$
Then, for all $n \ge k$ the sequence $\{a_n\}_{n=1}^{\infty}$ satisfies
$$a_{n+1} \in \varepsilon_n \bar a_n + (1 - \varepsilon_n)\left[\bar a^{[n]}_{\delta_n n},\ \bar a^{(n)}_{\delta_n n}\right] \quad \text{for all } n \ge k. \tag{3}$$
Furthermore, if $\{a_n\}_{n=1}^{\infty}$ satisfies (3), then there exist $\{p_n\}_{n=1}^{\infty}$ such that (2) holds.

Proof. Note that $\varepsilon_n$ and $\delta_n$ are chosen precisely so that
$$\frac{c(n)}{n} \times \delta_n n + \frac{f(n)}{n} \times (n - \delta_n n) = 1 \quad \text{and} \quad f(n) = \varepsilon_n.$$
Suppose first that $\{a_n\}_{n=1}^{\infty}$ and $\{p_n\}_{n=1}^{\infty}$ satisfy (2). Then,
$$a_{n+1} = \sum_{j=1}^n p_n(j) a_j \le \sum_{j=1}^{\delta_n n} \frac{c(n)}{n} a^{(n)}_j + \sum_{j = \delta_n n + 1}^{n} \frac{f(n)}{n} a^{(n)}_j = (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} + \varepsilon_n \bar a_n.$$
In words, since the objective is linear in $\{p_n\}$, the maximum is attained by assigning maximal weight to the largest entries. Analogously, we have
$$a_{n+1} \ge (1 - \varepsilon_n)\bar a^{[n]}_{\delta_n n} + \varepsilon_n \bar a_n.$$
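A small sketch of Lemma 1 (our own construction, with parameters chosen so that $\delta_n n$ is an integer): the percentile averages give the interval (3), and an envelope-respecting measure built from the floor $f(n)/n$ plus the ceiling $c(n)/n$ on $\delta_n n$ indices lands inside it.

```python
import numpy as np

def lemma1_interval(a, eps, delta):
    """Interval (3): eps * mean(a) + (1 - eps) * [asc avg, desc avg] of the
    smallest / largest delta * n terms (integrality ignored, as in the paper)."""
    n = len(a)
    j = round(delta * n)
    s = np.sort(a)
    base = eps * a.mean()
    return base + (1 - eps) * s[:j].mean(), base + (1 - eps) * s[-j:].mean()

rng = np.random.default_rng(2)
n, f, c = 1000, 0.5, 2.5               # chosen so that delta * n = 250 exactly
eps, delta = f, (1 - f) / (c - f)
a = rng.uniform(0.0, 1.0, size=n)

# Envelope-respecting measure: floor f/n everywhere, ceiling c/n on a random
# set S of size delta * n; by the choice of delta the weights sum to 1.
S = rng.choice(n, size=round(delta * n), replace=False)
w = np.full(n, f / n)
w[S] = c / n
lo, hi = lemma1_interval(a, eps, delta)
print(abs(w.sum() - 1.0) < 1e-12, lo <= w @ a <= hi)
```

Choosing $S$ as the indices of the largest (resp. smallest) $\delta_n n$ terms attains the right (resp. left) endpoint, matching the extremal argument in the proof.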
On the other hand, suppose $\{a_n\}_{n=1}^{\infty}$ satisfies (3). Then, for all $n \ge k$ there exists $\lambda_n \in [0, 1]$ such that
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\left(\lambda_n \bar a^{(n)}_{\delta_n n} + (1 - \lambda_n)\bar a^{[n]}_{\delta_n n}\right).$$
Hence, we can write $a_{n+1} = \mathbb{E}_{j \sim p_n} a_j$, where
$$p_n(j) = \frac{f(n)}{n} + \frac{c(n) - f(n)}{n} \times \begin{cases} \lambda_n & \text{if } a_j \text{ is among the largest } \delta_n n \text{ terms}, \\ 1 - \lambda_n & \text{if } a_j \text{ is among the smallest } \delta_n n \text{ terms}, \\ 0 & \text{otherwise}. \end{cases}$$
This concludes the two-way reduction.

With the reparameterization (1), we abuse notation and write $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$ for some constants $A, B > 0$.

3 Majorization

In this section, we develop a comparison tool that helps us lift the analysis from simple sequences to more complex ones. In particular, we use majorization to prove the main technical propositions in the next section.

Definition 1 (Majorization). We say that $\{a_n\}_{n=1}^k$ majorizes $\{b_n\}_{n=1}^k$ if
$$a^{(k)}_1 \ge b^{(k)}_1, \quad a^{(k)}_1 + a^{(k)}_2 \ge b^{(k)}_1 + b^{(k)}_2, \quad \dots, \quad a^{(k)}_1 + \dots + a^{(k)}_k \ge b^{(k)}_1 + \dots + b^{(k)}_k.$$
We denote this relation by $\{a_n\}_{n=1}^k \succeq \{b_n\}_{n=1}^k$. We note that this definition differs from the typical notion of majorization, as we do not require equality of total sums.

The following lemma states that if the initialization of one sequence majorizes that of another, then the former dominates the latter at all future times.

Lemma 2 (Majorization). Suppose $\{a_n\}_{n=1}^k$ majorizes $\{b_n\}_{n=1}^k$ and
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} \quad \text{and} \quad b_{n+1} \le \varepsilon_n \bar b_n + (1 - \varepsilon_n)\bar b^{(n)}_{\delta_n n} \quad \text{for all } n \ge k.$$
Then
1. $\{a_n\}_{n=1}^m \succeq \{b_n\}_{n=1}^m$ for all $m \ge k$, and
2. $a_m \ge b_m$ for all $m \ge k + 1$.

Proof. We proceed by induction. The base case $m = k$ holds by assumption. Assume now that $\{a_n\}_{n=1}^m \succeq \{b_n\}_{n=1}^m$. Majorization implies that $\bar a^{(m)}_j \ge \bar b^{(m)}_j$ holds for any $j \in \{1, \dots, m\}$. In particular,
$$a_{m+1} = \varepsilon_m \bar a_m + (1 - \varepsilon_m)\bar a^{(m)}_{\delta_m m} \ge \varepsilon_m \bar b_m + (1 - \varepsilon_m)\bar b^{(m)}_{\delta_m m} \ge b_{m+1}.$$
It remains to show that $\{a_n\}_{n=1}^{m+1} \succeq \{b_n\}_{n=1}^{m+1}$. Define multisets
$$A^{(m)}_j \triangleq \left\{a^{(m)}_1, \dots, a^{(m)}_j\right\} \quad \text{and} \quad B^{(m)}_j \triangleq \left\{b^{(m)}_1, \dots, b^{(m)}_j\right\}.$$
For a multiset $S$, we define $\Sigma S \in \mathbb{R}$ to be the sum of the terms within. We consider two cases as follows.
- If $b_{m+1} \notin B^{(m)}_j$, then $\Sigma B^{(m+1)}_j = \Sigma B^{(m)}_j \le \Sigma A^{(m)}_j \le \Sigma A^{(m+1)}_j$.
- If $b_{m+1} \in B^{(m)}_j$, then $\Sigma B^{(m+1)}_j = b_{m+1} + \Sigma B^{(m)}_{j-1} \le a_{m+1} + \Sigma A^{(m)}_{j-1} \le \Sigma A^{(m+1)}_j$.
This concludes the proof.

Another useful observation is that even if one sequence initially majorizes another, this dominance need not manifest itself in future iterates. More precisely, suppose the initialization of the second sequence is obtained from that of the first by replacing the top $m$ terms with their average. Then the two sequences evolve identically for as long as newly generated terms do not exceed the $m$-th largest term in $a$.

Lemma 3 (Reverse Majorization). Let $\delta_n$ be a decaying parameter such that the sequence $\{\delta_n n\}_{n=k}^{\infty}$ is non-decreasing. Suppose $\{a_n\}_{n=1}^{\infty}$ and $\{b_n\}_{n=1}^{\infty}$ satisfy
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} \quad \text{and} \quad b_{n+1} = \varepsilon_n \bar b_n + (1 - \varepsilon_n)\bar b^{(n)}_{\delta_n n}$$
for all $n \ge k$. Suppose that there is $m \le \delta_k k$ such that
$$b^{(k)}_j = \bar a^{(k)}_m \text{ for all } j \le m \quad \text{and} \quad b^{(k)}_j = a^{(k)}_j \text{ for all } m < j \le k.$$
Then $a_n = b_n$ for all $n > k$, as long as $b_n \le a^{(k)}_m$ for all $n > k$.

Proof. The main idea is to show that the sequences $a$ and $b$ evolve identically: the only discrepancy between them is confined to the top $m$ terms, and these terms remain at the top throughout. It suffices to show $\bar a_n = \bar b_n$ and $\bar a^{(n)}_{\delta_n n} = \bar b^{(n)}_{\delta_n n}$ for all $n \ge k$.
We proceed to prove by induction that for all $n \ge k$,
$$b^{(n)}_j = \bar a^{(n)}_m \text{ for all } j \le m \quad \text{and} \quad b^{(n)}_j = a^{(n)}_j \text{ for all } m < j \le n.$$
Note that since $m \le \delta_k k \le \delta_n n$, this implies both $\bar a^{(n)}_{\delta_n n} = \bar b^{(n)}_{\delta_n n}$ and $\bar a_n = \bar b_n$. The base case $n = k$ holds by assumption. On the other hand, the inductive hypothesis immediately implies $a_{n+1} = b_{n+1}$. Moreover, by assumption, $b_{n+1} \le a^{(k)}_m = a^{(n)}_m$. This means that the new term does not enter the top $m$ terms in $a$ or $b$, which concludes the proof.

4 Main Technical Tools

In this section, we use majorization to establish two main technical results, proposition 1 and proposition 2, which will be used in the next section to prove convergence and non-convergence, respectively. Roughly speaking, proposition 1 asserts that if the initialization of the sequence is mostly contained in an interval $[B, U]$ and its average is close to $B$, then the future iterates remain uniformly bounded away from $U$.

Proposition 1. Suppose $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$ with $\alpha, \beta \ge 0$. There exist constants $c$ and $C$ such that the following holds. Suppose $k \ge C$ and let $\{a_n\}_{n=1}^{\infty}$ be a sequence with initialization $a_1, \dots, a_k \in [0, 1]$ satisfying
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} \quad \text{for all } n \ge k.$$
Suppose there exists an interval $[B, U]$ such that
1. $\bar a_k = \gamma U + (1 - \gamma)B$ with $\gamma \le \frac{1}{2}$;
2. among the initial $k$ terms $\{a_n\}_{n=1}^k$, fewer than $ck(U - B)\min\{\gamma, \varepsilon_k \delta_k\}$ terms lie outside the interval $[B, U]$.
Then for all $n > k$,
$$a_n \le B + (U - B) \begin{cases} C\gamma/\varepsilon_k \delta_k & \text{for } \gamma \le 1/2, \\ 1 - c\,\varepsilon_k \delta_k/\gamma & \text{for } \gamma \ge \varepsilon_k \delta_k/2C. \end{cases}$$
Proposition 1 will be used to show that intervals capturing the tails of the sequence contract sufficiently to ensure convergence.
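The extremal recursion that Propositions 1 and 2 analyze can be simulated directly; the sketch below (our own parameters, with $A = B = 1$; the constants $c, C$ of Proposition 1 are not computed) starts from the Lemma 4-style initialization of $\gamma k$ ones and $(1-\gamma)k$ zeros and shows the iterates plateauing well below the upper endpoint.

```python
import numpy as np

def extremal_process(k=200, N=4000, gamma=0.05, alpha=0.5, beta=0.5):
    """Extremal recursion a_{n+1} = eps_n * mean(a_1..a_n)
    + (1 - eps_n) * (average of the largest delta_n * n terms),
    with eps_n = (log n)^-alpha and delta_n = (log n)^-beta."""
    a = [1.0] * int(gamma * k) + [0.0] * (k - int(gamma * k))
    for n in range(k, N):
        eps = np.log(n) ** (-alpha)
        delta = np.log(n) ** (-beta)
        j = max(1, int(delta * n))               # size of the top block
        arr = np.asarray(a)
        a.append(eps * arr.mean() + (1 - eps) * np.sort(arr)[-j:].mean())
    return np.asarray(a)

a = extremal_process()
print(a[-1])   # stays bounded away from 1, as Proposition 1 predicts
```

Each new term is a convex combination of past averages, so the trajectory stays in $[0, 1]$; with a small initial fraction $\gamma$ of ones, it settles far below $1$.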
In contrast, proposition 2 shows that if the initialization of the sequence contains a sufficiently large proportion of terms exceeding a threshold $U$, then these values can force subsequent iterates to increase.

Proposition 2. Suppose $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$ with $\alpha, \beta > 0$. There exists a constant $C$ such that the following holds. Suppose $k \ge C$ and let $\{a_n\}_{n=1}^{\infty}$ be a sequence with initialization $a_1, \dots, a_k \in [0, 1]$ satisfying
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} \quad \text{for all } n \ge k.$$
Suppose that among the initial $k$ terms $\{a_n\}_{n=1}^k$, at least a $\delta_k^{1/2}$ fraction exceed $U$. Then, among the first $k/2\delta_k^{1/2}$ terms, at least half satisfy
$$a_n \ge \left(1 - C\varepsilon_k \delta_k^{1/2}\right) U.$$
Proposition 2 will be used to show that the sequence can be driven sufficiently "back and forth" to prevent convergence.

Remark 1. Proposition 1 is a partial converse of proposition 2. In particular, note that if $B = 0$ and $\gamma = \delta_k^{1/2}$, then the former promises $a_n \le U(1 - c\varepsilon_k \delta_k^{1/2})$, while the latter gives $a_n \ge U(1 - C\varepsilon_k \delta_k^{1/2})$. This is where the threshold $\alpha + \beta/2 = 1$ arises.

4.1 Tools for Upper Bounds

In this section, we prove a helper lemma, which we will later lift by majorization to prove proposition 1. The latter is the main tool we will use for upper bounds.

Lemma 4. Suppose $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$ with $\alpha, \beta \ge 0$. There exists a constant $C$ such that the following holds. Suppose $k \ge C$ and let $\{a_n\}_{n=1}^{\infty}$ be a sequence with initialization $a_1, \dots, a_k \in [0, 1]$ satisfying
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} \quad \text{for all } n \ge k.$$
Suppose $\bar a_k \le \gamma$. Then, for all $n > k$,
$$a_n \le \frac{C\gamma}{\varepsilon_k \delta_k}.$$
Proof. By lemma 2, we may assume without loss of generality that the initialization $\{a_n\}_{n=1}^k$ is comprised of $\gamma k$ ones and $(1 - \gamma)k$ zeroes, as this sequence majorizes all others in $[0, 1]$ with the same mean. Since the recursion is monotone with respect to majorization, any upper bound proved for this dominating sequence automatically applies to the original sequence.

Set $C$ to be a large constant to be specified in the argument below. Define a sequence $\{u_n\}_{n=1}^{\infty}$ by setting $u_1 = \dots = u_k = 0$ and letting
$$u_{n+1} = \varepsilon_n \bar u_n + (1 - \varepsilon_n)u_n + \frac{\gamma k}{\delta_n n} \quad \text{for all } n \ge k.$$
Claim 1. The sequence $\{u_n\}_{n=1}^{\infty}$ is non-decreasing and
$$\lim_{n \to \infty} u_n \le \frac{C\gamma}{\varepsilon_k \delta_k}.$$
Assuming claim 1, to finish the proof of lemma 4 it suffices to prove $a_n \le u_n$ for all $n > k$. This follows from an inductive argument. In what follows, we verify the base case and the inductive step simultaneously. Suppose that $a_m \le u_m$ for all $k < m \le n$. Then,
$$\bar a_n = \frac{k\bar a_k + a_{k+1} + \dots + a_n}{n} \le \frac{k\bar a_k + u_{k+1} + \dots + u_n}{n} = \frac{n\bar u_n + \gamma k}{n}.$$
Observe that this bound also holds for the base case $n = k$. Continuing,
$$\bar a^{(n)}_{\delta_n n} = \frac{\gamma k + a^{(n)}_{\gamma k + 1} + \dots + a^{(n)}_{\delta_n n}}{\delta_n n} \le \frac{\gamma k + (\delta_n n - \gamma k)u_n}{\delta_n n}.$$
This bound also holds for the base case $n = k$. Combining the two bounds, we get
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} \le \varepsilon_n \frac{n\bar u_n + \gamma k}{n} + (1 - \varepsilon_n)\frac{(\delta_n n - \gamma k)u_n + \gamma k}{\delta_n n} \le \varepsilon_n \bar u_n + (1 - \varepsilon_n)u_n + (1 - \varepsilon_n + \varepsilon_n \delta_n)\frac{\gamma k}{\delta_n n} \le u_{n+1},$$
which proves $a_{n+1} \le u_{n+1}$, and hence lemma 4. It now remains to prove claim 1.

Proof of Claim 1. Define $\Delta_n \triangleq n(u_n - \bar u_n)$. Then the recurrence can be rewritten as
$$\Delta_{n+1} = (n+1)(u_{n+1} - \bar u_{n+1}) = n u_{n+1} - n\bar u_n = \varepsilon_n n \bar u_n + (1 - \varepsilon_n)n u_n + \frac{\gamma k}{\delta_n} - n\bar u_n = (1 - \varepsilon_n)\Delta_n + \frac{\gamma k}{\delta_n}.$$
We first show that this recursive formula implies
$$\Delta_n \le \frac{\gamma k}{\varepsilon_n \delta_n}. \tag{4}$$
We prove this bound by induction. The base case $j \le k$ follows from $\Delta_j = 0$. Now,
$$\Delta_{n+1} = (1 - \varepsilon_n)\Delta_n + \frac{\gamma k}{\delta_n} \le (1 - \varepsilon_n)\frac{\gamma k}{\varepsilon_n \delta_n} + \frac{\gamma k}{\delta_n} = \frac{\gamma k}{\varepsilon_n \delta_n} \le \frac{\gamma k}{\varepsilon_{n+1}\delta_{n+1}}.$$
Here we used $\varepsilon_{n+1}\delta_{n+1} \le \varepsilon_n \delta_n$, which holds as long as $C$ is a sufficiently large constant.
We now show that $\{\Delta_n\}_{n=k+1}^{\infty}$ is non-decreasing. This follows immediately from the identity
$$\Delta_{n+1} = \Delta_n - \varepsilon_n \Delta_n + \frac{\gamma k}{\delta_n} \ge \Delta_n. \tag{5}$$
Now we translate (4) and (5) into bounds on the sequence $u_n$ using the identity
$$\frac{\Delta_{n+1} - \Delta_n}{n} = \frac{(n+1)u_{n+1} - (n+1)\bar u_{n+1} - n u_n + n\bar u_n}{n} = u_{n+1} - u_n.$$
Moreover, (5) immediately implies that $\{u_n\}_{n=1}^{\infty}$ is non-decreasing. Now,
$$u_{n+1} = u_n + \frac{\Delta_{n+1} - \Delta_n}{n} \implies u_{n+1} = \frac{\Delta_{n+1}}{n} + \sum_{t=k+1}^{n}\frac{\Delta_t}{t(t-1)}.$$
Applying the bound (4), we get
$$u_{n+1} \le \frac{\gamma k}{\varepsilon_{n+1}\delta_{n+1} n} + \sum_{t=k+1}^{\infty}\frac{\gamma k}{\varepsilon_t \delta_t t(t-1)}.$$
If $C$ is a sufficiently large constant, we have $\varepsilon_{n+1}\delta_{n+1} n \ge \varepsilon_k \delta_k k$ for all $n \ge k$, so the first term is bounded by $\gamma/\varepsilon_k \delta_k$. We bound the second term by splitting the sum into two parts:
$$\sum_{t=k+1}^{\infty}\frac{1}{\varepsilon_t \delta_t t(t-1)} \le \sum_{t=k+1}^{k^2}\frac{1}{\varepsilon_t \delta_t t(t-1)} + \sum_{t=k^2}^{\infty}\frac{1}{\varepsilon_t \delta_t t(t-1)}.$$
Since $\varepsilon_t$ and $\delta_t$ decay polylogarithmically, there exists a small constant $c$ such that $\varepsilon_t \delta_t \ge c\varepsilon_k \delta_k$ for $t \le k^2$ and $\varepsilon_t \delta_t t(t-1) \ge ct^{3/2}$ for $t \ge k^2$. Hence,
$$\sum_{t=k+1}^{k^2}\frac{1}{\varepsilon_t \delta_t t(t-1)} + \sum_{t=k^2}^{\infty}\frac{1}{\varepsilon_t \delta_t t(t-1)} \le \frac{1}{c\varepsilon_k \delta_k}\sum_{t=k}^{\infty}\frac{1}{t^2} + \frac{1}{c}\sum_{t=k^2}^{\infty}\frac{1}{t^{3/2}} \le (C - 1)\frac{1}{\varepsilon_k \delta_k k}.$$
In the last inequality, we chose $C$ to be a large enough constant. Substituting this into the bound for $u_{n+1}$ above, we conclude the proof.

Having proved claim 1, we conclude the proof of lemma 4.

Lemma 5. Suppose $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$ with $\alpha, \beta \ge 0$. There exist constants $c$ and $C$ such that the following holds. Suppose $k \ge C$ and let $\{a_n\}_{n=1}^{\infty}$ be a sequence with initialization $a_1, \dots, a_k \in [0, 1]$ satisfying
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} \quad \text{for all } n \ge k.$$
Suppose $\bar a_k \le \gamma \le \frac{3}{4}$. Then, for all $n > k$ we have
$$a_n \le \begin{cases} C\gamma/\varepsilon_k \delta_k & \text{for } \gamma \le 3/4, \\ 1 - c\,\varepsilon_k \delta_k/\gamma & \text{for } \gamma \ge \varepsilon_k \delta_k/16C. \end{cases}$$
Proof. Let $K$ be the constant from lemma 4.
Fix $c$ and $C$ to be some small and large constants, respectively. Since $k \ge C$ and $\varepsilon_n$ and $\delta_n$ decay logarithmically, we can take $C$ large enough so that
1. both $\{\varepsilon_n\}_{n=k}^{\infty}$ and $\{\delta_n\}_{n=k}^{\infty}$ are non-increasing and at most $\frac{1}{2}$;
2. the sequence $\{\varepsilon_n \delta_n n\}_{n=k}^{\infty}$ is increasing;
3. if $m$ is chosen so that $\varepsilon_m \delta_m m \triangleq Kk$, then $\varepsilon_m \ge \varepsilon_k/K$ and $\delta_m \ge \delta_k/K$.

We now begin the proof. Lemma 4 shows that for all $n > k$ we have $a_n \le K\gamma/(\varepsilon_k \delta_k)$. Taking $C \ge K$, we obtain the first part of the desired bound. In what follows, we assume $\gamma \ge \varepsilon_k \delta_k/16C$.

First, we use induction to show that for all $k < n \le m$,
$$a_n \le \lambda \triangleq 1 - \frac{\varepsilon_m \delta_m (1 - \gamma)}{\varepsilon_m \delta_m + (1 - \varepsilon_m)\gamma}.$$
We verify the base case and the inductive step simultaneously. Assume $a_j \le \lambda$ for all $k < j \le n$. Since $n \le m$, we get
$$a_{n+1} = \varepsilon_n \bar a_n + (1 - \varepsilon_n)\bar a^{(n)}_{\delta_n n} \le \varepsilon_m \bar a_n + (1 - \varepsilon_m)\bar a^{(n)}_{\delta_m n} \le \varepsilon_m \frac{\gamma k + \lambda(n - k)}{n} + (1 - \varepsilon_m)\frac{\gamma k + \lambda(\delta_m n - \gamma k)}{\delta_m n} \le \lambda.$$
Observe that this bound also holds for the base case $n = k$, so the induction is complete. Next, we recall that by assumption $\varepsilon_k \delta_k/16C < \gamma \le 3/4$, so
$$\frac{\varepsilon_m \delta_m (1 - \gamma)}{\varepsilon_m \delta_m + (1 - \varepsilon_m)\gamma} \ge \frac{1}{4}\cdot\frac{\varepsilon_m \delta_m}{\varepsilon_m \delta_m + \gamma} \ge \frac{1}{4(16C + 1)}\cdot\frac{\varepsilon_m \delta_m}{\gamma} \ge \frac{\varepsilon_m \delta_m}{100C\gamma}.$$
Since our goal is to obtain an upper bound on the terms of the sequence, it suffices to establish the bound for a simpler sequence that dominates the original one. More precisely, by applying lemma 2 we may assume without loss of generality that among $\{a_n\}_{n=1}^m$ there are $\gamma k$ ones, and the remaining terms are equal to $\lambda_+$, where
$$\lambda_+ \triangleq 1 - \frac{\varepsilon_m \delta_m}{100C\gamma} \ge 1 - \frac{\varepsilon_m \delta_m(1 - \gamma)}{\varepsilon_m \delta_m + (1 - \varepsilon_m)\gamma} = \lambda.$$
By affine invariance of the recurrence, we may shift and rescale the sequence so that $\lambda_+$ is sent to 0 and 1 is sent to 1. Applying lemma 4 in this scale and then undoing the affine transformation yields the desired bound for the original sequence.
More precisely, we conclude that for all $n > k$,
$$a_n \le \lambda_+ + (1 - \lambda_+)\frac{K\gamma k}{\varepsilon_m \delta_m m} \le \lambda_+ + (1 - \lambda_+) \times \frac{3}{4} = 1 - \frac{\varepsilon_m \delta_m}{400C\gamma} \le 1 - \frac{c\,\varepsilon_k \delta_k}{\gamma}.$$
In the last step, we chose $c = 1/(400CK^2)$.

We are now ready to prove proposition 1.

Proof of proposition 1. We will apply the majorization lemma (lemma 2) and the reverse majorization lemma (lemma 3) to reduce to lemma 5. Since our goal is to obtain an upper bound on the terms of the sequence, it suffices to establish the bound for a simpler sequence that dominates the original one. To this end, we apply lemma 2 to assume without loss of generality that all terms originally lying in the interval $[B, U]$ are moved to one of the endpoints $\{B, U\}$ in a way that preserves both the average and the number of terms lying outside the interval. We may apply lemma 2 once again to replace all terms not in $\{B, U\}$ by 1. At this point, every term in the initialization $\{a_n\}_{n=1}^k$ takes one of the values $\{B, U, 1\}$.

Note that this step alters the average. We will quantify this change and show that it affects the situation only marginally. Denote by $k_B$ the number of terms equal to $B$, and similarly define $k_U$ and $k_1$. Let $U_+$ be the average of the terms equal to $U$ or 1, and let $\gamma_+$ be the fraction of terms taking values in $\{U, 1\}$. Thus,
$$U_+ \triangleq \frac{k_U U + k_1}{k_U + k_1} \quad \text{and} \quad \gamma_+ \triangleq \frac{k_U + k_1}{k}.$$
We first show that $U_+ \approx U$ and $\gamma_+ \approx \gamma$.

Claim 2. With the notation above, we have
$$(1 - c)\gamma \le \gamma_+ \le (1 + c)\gamma \quad \text{and} \quad U \le U_+ \le U + 2c(U - B)\min\left\{1, \frac{\varepsilon_k \delta_k}{\gamma}\right\}.$$
Proof. Since the number of terms outside the interval $[B, U]$ has not changed, we have $k_1 \le c\gamma k(U - B)$. Moreover, by replacing at most $k_1$ terms by 1, the total sum can increase by at most $k_1$. Therefore,
$$\bar a_k \le \gamma U + (1 - \gamma)B + \frac{k_1}{k} \le (1 + c)\gamma U + (1 - (1 + c)\gamma)B.$$
By Markov's inequality, this implies the upper bound $\gamma_+ \le (1 + c)\gamma$.
For the lower bound on $\gamma_+$, note that replacing some terms by 1 can only increase the average. Hence,
$$Bk + \gamma k(U - B) \le k\bar a_k = U k_U + B k_B + k_1 \le k_U(U - B) + Bk + k_1 \le k_U(U - B) + Bk + c\gamma k(U - B).$$
In particular, this implies $\gamma_+ \ge k_U/k \ge (1 - c)\gamma$. Finally, observe that the lower bound on $U_+$ is trivial, while the upper bound follows from
$$U_+ \le \frac{U k_U + k_1}{k_U} \le U + \frac{ck(U - B)\min\{\gamma, \varepsilon_k \delta_k\}}{(1 - c)\gamma k} \le U + 2c(U - B)\min\left\{1, \frac{\varepsilon_k \delta_k}{\gamma}\right\}.$$
Here we chose $c < 1/2$.

Let us now collect all the terms taking values in $\{U, 1\}$ into their average $U_+$. By lemma 3, this operation does not affect the future evolution of the sequence, provided that the newly generated terms remain bounded by $U$. In particular, as long as we establish the bound $a_n \le U$ for all $n > k$, which is weaker than the conclusion of the proposition, we may apply reverse majorization without loss of generality.

In this transformed sequence, we have $a_1, \dots, a_k \in \{B, U_+\}$, with the fraction of terms equal to $U_+$ being exactly $\gamma_+$. By claim 2, we have $\gamma_+ \le (1 + c)\gamma \le 3/4$. In particular, after shifting and rescaling the sequence to take values in $\{0, 1\}$, we may apply lemma 5 and undo the transformation to conclude that there exist constants $c'$ and $C'$ such that for all $n > k$,
$$a_n \le B + (U_+ - B)\begin{cases} C'\gamma_+/\varepsilon_k \delta_k & \text{for } \gamma_+ \le 3/4, \\ 1 - c'\varepsilon_k \delta_k/\gamma_+ & \text{for } \gamma_+ \ge \varepsilon_k \delta_k/16C'. \end{cases}$$
It now suffices to bound the right-hand side, for which we invoke claim 2 again. First, whenever $\gamma \le 1/2$, we have $\gamma_+ \le 3/4$. Hence, for $C = 4C'$ we obtain
$$a_n \le B + \frac{C'\gamma_+}{\varepsilon_k \delta_k}(U_+ - B) \le B + \frac{4C'\gamma}{\varepsilon_k \delta_k}(U - B) = B + \frac{C\gamma}{\varepsilon_k \delta_k}(U - B).$$
On the other hand, whenever $\gamma \ge \varepsilon_k \delta_k/2C$, we have $\gamma_+ \ge \varepsilon_k \delta_k/16C'$, so we obtain
$$a_n \le B + (U_+ - B)\left(1 - \frac{c'\varepsilon_k \delta_k}{\gamma}\right) \le B + (U - B)\left(1 + \frac{2c\varepsilon_k \delta_k}{\gamma}\right)\left(1 - \frac{c'\varepsilon_k \delta_k}{\gamma}\right) \le B + (U - B)\left(1 - \frac{c\varepsilon_k \delta_k}{\gamma}\right).$$
Here we applied $(U_+ - U) \le 2c\varepsilon_k \delta_k (U - B)/\gamma$ and chose $c$ to be a sufficiently small constant relative to $c'$.

4.2 Tools for Lower Bounds

In this section, we first establish a helper lemma, which we then lift via majorization to prove proposition 2. The latter is the main tool for lower bounds.

Lemma 6. Fix $0 < \varepsilon, \delta, \gamma \le \frac{1}{2}$ such that $\gamma \ge 2\delta$. Suppose $\{a_n\}_{n=1}^{\infty}$ is a sequence with initialization $a_1 = \dots = a_{\gamma k} = 1$ and $a_{\gamma k + 1} = \dots = a_k = 0$, satisfying
$$a_{n+1} \ge \varepsilon \bar a_n + (1 - \varepsilon)\bar a^{(n)}_{\delta n} \quad \text{for all } n \ge k.$$
Then at least half of the terms up to index $\gamma k/2\delta$ satisfy
$$a_n \ge 1 - \varepsilon\left(\frac{5\delta}{\gamma}\right)^{1 - \varepsilon}.$$
Proof. Let $n_0 \triangleq \gamma k/(4\delta)$. Observe that if $n \le 2n_0$, then $\delta n \le 2\delta n_0 \le \gamma k$. In particular, $\bar a^{(n)}_{\delta n} = 1$. Thus, as long as $n \le 2n_0$, we have $a_{n+1} \ge \varepsilon \bar a_n + (1 - \varepsilon)$, so
$$\bar a_{n+1} \ge \frac{(n + \varepsilon)\bar a_n + (1 - \varepsilon)}{n + 1}.$$
We can solve this recurrence by setting $\bar a_n = 1 - d_n$ and observing that
$$d_{n+1} \le d_n\left(1 - \frac{1 - \varepsilon}{n + 1}\right), \quad \text{so} \quad d_n \le (1 - \gamma)\prod_{j=k+1}^{n}\left(1 - \frac{1 - \varepsilon}{j}\right) \le \exp\left((1 - \varepsilon)\log\left(\frac{k + 1}{n + 1}\right)\right).$$
Since $\bar a^{(n)}_{\delta n} \ge \bar a_n$, the sequence $\bar a_n$ is non-decreasing. Hence, for all $n \ge n_0$,
$$\bar a_n \ge 1 - \left(\frac{k + 1}{n_0 + 1}\right)^{1 - \varepsilon} \ge 1 - \left(\frac{5\delta}{\gamma}\right)^{1 - \varepsilon}.$$
This means that whenever $n_0 \le n < 2n_0$, we have
$$a_{n+1} \ge \varepsilon \bar a_n + 1 - \varepsilon \ge 1 - \varepsilon\left(\frac{5\delta}{\gamma}\right)^{1 - \varepsilon}.$$
This concludes the proof.

We are now ready to prove proposition 2.

Proof of proposition 2. Fix a constant $C$ to be specified below. For notational convenience, set $\varepsilon \triangleq \varepsilon_k$, $\delta \triangleq \delta_k$, and $\gamma \triangleq \delta_k^{1/2}$. By the majorization lemma (lemma 2), we may assume without loss of generality that among the initial terms $\{a_n\}_{n=1}^k$, a $\gamma$ fraction are equal to $U$, while the remaining terms are equal to 0. If $C$ is chosen sufficiently large, then $0 < \varepsilon, \delta, \gamma \le 1/2$ and $\gamma \ge 2\delta$. Moreover, we may assume that $\varepsilon_n$ and $\delta_n$ are non-increasing for $n \ge k$. In this case, we may rescale the sequence so that all terms take values in $\{0, 1\}$, and apply lemma 6 to conclude that at least half of the terms up to $k/2\delta^{1/2}$ satisfy
$$a_n \ge \left[1 - \varepsilon\left(\frac{5\delta}{\gamma}\right)^{1 - \varepsilon}\right]U.$$
It now suffices to show that $(5\delta/\gamma)^{1 - \varepsilon} \le C\delta/\gamma$, provided that $C$ is chosen sufficiently large. This follows from the observation that
$$\log\left(\left(\frac{\gamma}{\delta}\right)^{\varepsilon}\right) = \frac{\varepsilon}{2}\log\frac{1}{\delta} = A\,\frac{\beta\log\log k - \log B}{2(\log k)^{\alpha}} = O(1).$$

5 Proof of the Main Theorem

We first apply proposition 1 to prove convergence in theorem 3.

Theorem 3. Let $A, B > 0$ and $\alpha, \beta \ge 0$ be constants satisfying $\alpha + \beta/2 \le 1$. Suppose $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$, and let $\{a_n\}_{n=1}^{\infty}$ be a sequence with initialization $(a_1, \dots, a_k)$ that satisfies
$$a_{n+1} \in \varepsilon_n \bar a_n + (1 - \varepsilon_n)\left[\bar a^{[n]}_{\delta_n n},\ \bar a^{(n)}_{\delta_n n}\right] \quad \text{for all } n \ge k.$$
Then the sequence necessarily converges.

Proof. Since the recurrence is affine invariant, we may shift and rescale the initialization to assume without loss of generality that $a_1, \dots, a_k \in [0, 1]$. Furthermore, we may assume that $k$ is arbitrarily large.

We will construct a nested sequence of shrinking intervals $[B_T, U_T]$ that capture the tails of the sequence. The intervals will be defined inductively in stages. More precisely, at each stage $T \in \mathbb{N}_0$, we will maintain an interval $[B_T, U_T]$, an index $n_T$, and the guarantee that $a_n \in [B_T, U_T]$ for all $n \ge n_T$. At stage $T = 0$, we initialize with $B_0 = 0$, $U_0 = 1$, and $n_0 = k$. Since every term of the sequence is a convex combination of $a_1, \dots, a_k$, all subsequent terms remain bounded in $[0, 1]$. In particular, $a_n \in [0, 1]$ for all $n \ge k$, so the base case holds.

Assume now that we are at stage $T$. We define the starting index of the next stage implicitly by setting
$$n_{T+1} \triangleq n_T \times \frac{K^2}{\varepsilon^2_{n_{T+1}}\delta^3_{n_{T+1}}(U_T - B_T)^2}.$$
Here $K$ is a sufficiently large constant to be specified below.
This definition is well posed, as the sequence $m\,\varepsilon_m^2\delta_m^3$ is increasing for $m \ge k$ when $k$ is sufficiently large. For notational convenience, we set $\varepsilon_* \triangleq \varepsilon_{n_{T+1}}$ and $\delta_* \triangleq \delta_{n_{T+1}}$.

Below, we construct the next interval $[B_{T+1}, U_{T+1}] \subset [B_T, U_T]$ such that $a_n \in [B_{T+1}, U_{T+1}]$ for all $n \ge n_{T+1}$, and
$$(U_{T+1} - B_{T+1}) \le (U_T - B_T) \times (1 - c\,\varepsilon_*\delta_*^{1/2})$$
for a sufficiently small constant $c$, independent of $T$.

Observe that this suffices to complete the proof, for the following reason. Given the contraction bound above, proposition 3 implies that $(U_T - B_T) \to 0$ as $T \to \infty$. Since the intervals are nested, there exists a limit $L$ such that $B_T \to L$ and $U_T \to L$. Moreover, since $a_n \in [B_T, U_T]$ for all $n \ge n_T$, we conclude that
$$\liminf_{n\to\infty} a_n = L = \limsup_{n\to\infty} a_n.$$
This implies that the sequence converges.

It now suffices to construct the interval with the desired properties. For the purposes of the analysis, we define an intermediate index by
$$n_T^+ \triangleq n_T \times \frac{K}{\varepsilon_*\,\delta_*^2\,(U_T - B_T)}.$$
To aid the reader's understanding, let us outline the reasoning behind the argument.

- Recall that, by the inductive hypothesis, $a_n \in [B_T, U_T]$ for all $n \ge n_T$. This means that if we wait long enough, an overwhelming majority of the sequence will lie in $[B_T, U_T]$. This allows us to effectively zoom in on this interval and apply our technical lemma. More precisely, by time $n_T^+$ the second condition of proposition 1 is met.
- If, at any time $m$ between $n_T^+$ and $n_{T+1}$, the average $\bar a_m$ comes too close to $B_T$, then by proposition 1, subsequent terms will be bounded away from $U_T$. This lets us shrink the interval. By symmetry, the same argument applies with the roles of $U_T$ and $B_T$ reversed.
- Alternatively, if at no time between $n_T^+$ and $n_{T+1}$ does the average come close to $B_T$ or $U_T$, then the sequence remained away from both endpoints for too long.
In this case, by proposition 1 again, all subsequent terms remain uniformly bounded away from at least one endpoint, which again allows us to shrink the interval.

We now carry out the above analysis as follows. First, suppose that there exists an index $m$ with $n_T^+ \le m \le n_{T+1}$ such that
$$\bar a_m \le \delta_m^{1/2}\,U_T + (1 - \delta_m^{1/2})\,B_T.$$
Since all terms $\{a_j : n_T \le j \le m\}$ are contained in $[B_T, U_T]$, the fraction of terms lying outside the interval is at most
$$\frac{n_T}{m} \le \frac{n_T}{n_T^+} = \frac{\varepsilon_*\delta_*^2(U_T - B_T)}{K} \le \frac{\varepsilon_m\delta_m(U_T - B_T)}{K}.$$
Therefore, if $K$ is chosen sufficiently large, the conditions of proposition 1 are met. It follows that for a sufficiently small constant $c > 0$ and all $n \ge m$,
$$a_n \le U_T - c\,\varepsilon_m\delta_m^{1/2}(U_T - B_T) \le U_T - c\,\varepsilon_*\delta_*^{1/2}(U_T - B_T).$$
In this case, we may set $B_{T+1} \triangleq B_T$ and $U_{T+1} \triangleq U_T - c\,\varepsilon_*\delta_*^{1/2}(U_T - B_T)$ to finish the proof. Analogously, if there exists $m \in [n_T^+, n_{T+1}]$ such that $\bar a_m \ge \delta_m^{1/2}B_T + (1 - \delta_m^{1/2})U_T$, then we may instead set $U_{T+1} \triangleq U_T$ and $B_{T+1} \triangleq B_T + c\,\varepsilon_*\delta_*^{1/2}(U_T - B_T)$ to finish the proof.

In the remainder of the proof, we assume that for all $m \in [n_T^+, n_{T+1}]$,
$$\delta_m^{1/2}U_T + (1 - \delta_m^{1/2})B_T < \bar a_m < \delta_m^{1/2}B_T + (1 - \delta_m^{1/2})U_T.$$
Define the intermediate candidate endpoints by
$$U_T^- \triangleq U_T - \tfrac12\varepsilon_*\delta_*^{1/2}(U_T - B_T) \quad\text{and}\quad B_T^+ \triangleq B_T + \tfrac12\varepsilon_*\delta_*^{1/2}(U_T - B_T).$$
First, we will show that $a_{m+1} \in [B_T^+, U_T^-]$ for all $m \in [n_T^+, n_{T+1})$. To this end, observe that
$$a_{m+1} \le \varepsilon_m\,\bar a_m + (1-\varepsilon_m)\,\bar a^{(m)}_{\delta_m m} \le \varepsilon_*\,\bar a_m + (1-\varepsilon_*)\,\bar a^{(m)}_{\delta_* m} \le \varepsilon_*\big[\delta_*^{1/2}B_T + (1-\delta_*^{1/2})U_T\big] + (1-\varepsilon_*)\,\frac{1 \times n_T + U_T \times (\delta_* m - n_T)}{\delta_* m}$$
$$= \varepsilon_*\delta_*^{1/2}B_T + (1 - \varepsilon_*\delta_*^{1/2})U_T + (1 - U_T)(1-\varepsilon_*)\frac{n_T}{\delta_* m}.$$
The second inequality above follows from the monotonicity $\delta_* \le \delta_m$.
In the third inequality, we use the crude bounds $a_n \le 1$ for $n \le n_T$ and $a_n \le U_T$ for $n > n_T$. To conclude, we bound the final term as
$$(1 - U_T)(1-\varepsilon_*)\frac{n_T}{\delta_* m} \le \frac{n_T}{\delta_*\,n_T^+} = \frac{\varepsilon_*\delta_*(U_T - B_T)}{K} \le \tfrac12\varepsilon_*\delta_*^{1/2}(U_T - B_T).$$
Substituting this bound above yields $a_{m+1} \le U_T^-$, and analogously $a_{m+1} \ge B_T^+$.

Define $\gamma$ by $\bar a_{n_{T+1}} = \gamma U_T^- + (1-\gamma)B_T^+$. Without loss of generality, let us assume that $\gamma \le 1/2$. Recall that $\bar a_{n_{T+1}} \ge \delta_*^{1/2}U_T + (1 - \delta_*^{1/2})B_T$. Assuming $\varepsilon_*$ is sufficiently small, this implies $\gamma \ge 2\delta_*^{1/2}$. Furthermore, since $a_{m+1} \in [B_T^+, U_T^-]$ for all $m \in [n_T^+, n_{T+1})$, the fraction of terms up to time $n_{T+1}$ that fall outside the interval $[B_T^+, U_T^-]$ is at most
$$\frac{n_T^+}{n_{T+1}} = \frac{\varepsilon_*\delta_*(U_T - B_T)}{K} \le \frac{2\varepsilon_*\delta_*(U_T^- - B_T^+)}{K}.$$
Hence, by proposition 1 we get $a_n \le U_T^-$ for all $n \ge n_{T+1}$. In particular, we may set $B_{T+1} = B_T$ and $U_{T+1} = U_T^-$ to conclude the proof.

We now apply proposition 2 to prove the non-convergence in theorem 4.

Theorem 4. Let $A, B > 0$ and $\alpha, \beta > 0$ be constants satisfying $\alpha + \beta/2 > 1$. Suppose $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$. There exists a sequence $\{a_n\}_{n=1}^\infty$ and an initialization $(a_1, \dots, a_k)$ satisfying
$$a_{n+1} \in \varepsilon_n\,\bar a_n + (1-\varepsilon_n)\big[\bar a^{[n]}_{\delta_n n},\ \bar a^{(n)}_{\delta_n n}\big] \quad \text{for all } n \ge k$$
that does not converge.

Proof. We will construct the sequence together with a nested chain of intervals $[B_T, U_T]$ such that $\bigcap_{T=0}^\infty [B_T, U_T] = [B_\infty, U_\infty]$ for some $B_\infty < 1/2 < U_\infty$. Furthermore, there will be infinitely many terms of the sequence $\{a_n\}_{n=1}^\infty$ lying above $U_\infty$ and below $B_\infty$. The sequence and the intervals will be constructed inductively in stages.
More precisely, at each stage $T \in \mathbb{N}_0$, we will maintain an interval $[B_T, U_T]$, an index $n_T$, and the following invariant:

• if $T$ is even, then
  – at least a $\delta_{n_T}^{1/2}$ fraction of the terms among $\{a_n\}_{n=1}^{n_T}$ are above $U_T$;
  – at least a $\tfrac12$ fraction of the terms among $\{a_n\}_{n=1}^{n_T}$ are below $B_T$;
• if $T$ is odd, then
  – at least a $\tfrac12$ fraction of the terms among $\{a_n\}_{n=1}^{n_T}$ are above $U_T$;
  – at least a $\delta_{n_T}^{1/2}$ fraction of the terms among $\{a_n\}_{n=1}^{n_T}$ are below $B_T$.

At stage $T = 0$, we initialize with $B_0 = 0$, $U_0 = 1$, $n_0 = k$, and define the sequence
$$a_1 = \dots = a_{\lfloor k/2\rfloor} = 0 \quad\text{and}\quad a_{\lceil k/2\rceil} = \dots = a_k = 1.$$
We assume that $k$ is sufficiently large.

Suppose now that we are at stage $T$, having defined all terms of the sequence up to index $n_T$. Without loss of generality, assume $T$ is even. For brevity, set $\varepsilon_* \triangleq \varepsilon_{n_T}$ and $\delta_* \triangleq \delta_{n_T}$. Define
$$n_{T+1} \triangleq \frac{n_T}{2\delta_*^{1/2}} \quad\text{and}\quad a_{n+1} \triangleq \varepsilon_*\,\bar a_n + (1-\varepsilon_*)\,\bar a^{(n)}_{\delta_* n} \quad\text{for } n_T \le n < n_{T+1}.$$
Since $\varepsilon_n \le \varepsilon_*$ and $\delta_n \le \delta_*$, this defines a valid extension of the sequence.

To define the new endpoints, observe that the fraction of terms among $\{a_n\}_{n=1}^{n_{T+1}}$ that are less than $B_T$ is at least $\frac{n_T/2}{n_{T+1}} = \delta_*^{1/2}$. We may therefore set $B_{T+1} \triangleq B_T$. Furthermore, by proposition 2, at least half of the terms among $\{a_n\}_{n=1}^{n_{T+1}}$ exceed $(1 - C\varepsilon_{n_T}\delta_{n_T}^{1/2})\,U_T \triangleq U_{T+1}$ for a sufficiently large constant $C$. This concludes the construction.

Observe that, by construction, there are infinitely many terms of the sequence $\{a_n\}_{n=1}^\infty$ lying above $U_\infty$ and below $B_\infty$. It therefore suffices to show $U_\infty > B_\infty$. This follows directly from proposition 4, since $\alpha + \beta/2 > 1$ and
$$B_\infty \triangleq \lim_{T\to\infty} B_T < \tfrac12 < \lim_{T\to\infty} U_T \triangleq U_\infty.$$
This concludes the proof.

6 Fixed Shape Analysis

The main results of this paper focus on worst-case (possibly adversarial) non-stationary weights that are constrained between two envelopes.
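To make the envelope-bounded recursion $a_{n+1} \in \varepsilon_n\bar a_n + (1-\varepsilon_n)[\bar a^{[n]}_{\delta_n n},\ \bar a^{(n)}_{\delta_n n}]$ of theorems 3 and 4 concrete, the following minimal sketch (not from the paper; the schedule $\varepsilon_n = \delta_n = 1/\log n$ and all other parameter choices are illustrative) simulates the extremal update that always takes the top-average endpoint:

```python
import math

def simulate(k=100, steps=1900):
    """Simulate a_{n+1} = eps_n * mean(a_1..a_n) + (1 - eps_n) * top_avg,
    where top_avg is the average of the ceil(delta_n * n) largest terms
    so far.  Envelope schedule eps_n = delta_n = 1/log(n) is illustrative."""
    a = [0.0] * (k // 2) + [1.0] * (k - k // 2)  # half zeros, half ones
    total = sum(a)                                # running sum of all terms
    for n in range(k, k + steps):
        eps = delta = 1.0 / math.log(n)
        m = max(1, math.ceil(delta * n))          # size of the "top" window
        top_avg = sum(sorted(a, reverse=True)[:m]) / m
        a.append(eps * (total / n) + (1 - eps) * top_avg)
        total += a[-1]
    return a

seq = simulate()
```

Each new term is a convex combination of earlier terms, so the simulated sequence stays inside $[0,1]$; because the top-average always dominates the running mean, the running mean is non-decreasing and the sequence drifts toward the upper end of its initial range.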
In contrast, we may encounter weights that are obtained by discretizing a limiting "shape" on $(0,1)$. This setting is largely orthogonal to our envelope framework: discretized shapes may assign polynomially large weight to the very recent past, which is far beyond the polylogarithmic ceilings in theorem 1. Yet the additional structure allows a different proof strategy, based on renewal theory.

6.1 A Renewal Lemma for Multiplicative Recursions

In this subsection, we develop a general and useful lemma that will be used to obtain the main result of the section.

Definition 2. A real-valued random variable $Y$ is lattice if there exist constants $a \in \mathbb{R}$ and $d > 0$ such that $\mathbb{P}(Y \in a + d\mathbb{Z}) = 1$. Otherwise $Y$ is non-lattice.

Definition 3 (DRI). A real-valued function $\eta : \mathbb{R}_+ \to \mathbb{R}$ is said to be directly Riemann integrable if $\lim_{h\downarrow0} L(h) = \lim_{h\downarrow0} U(h)$, where $L$ and $U$ are the lower and upper mesh sums, defined by
$$L(h) = h\sum_{k=0}^\infty \inf\{\eta(t) : t \in (kh, (k+1)h]\}, \qquad U(h) = h\sum_{k=0}^\infty \sup\{\eta(t) : t \in (kh, (k+1)h]\}.$$

Lemma 7 (Renewal lemma with integrable error). Let $T$ be a random variable taking values in $(0,1)$, and let $F : \mathbb{R}_+ \to \mathbb{R}$ be a bounded piecewise continuous function satisfying
$$F(x) = \mathbb{E}\big[F(Tx)\big] + \epsilon(x) \quad \text{for all } x \ge 1. \tag{6}$$
Assume that $Y \triangleq \log(1/T)$ is non-lattice with finite mean $\mu \triangleq \mathbb{E}[Y] < \infty$, and that $|\epsilon(x)|/x$ is directly Riemann integrable over $(1,\infty)$. Then $F(x)$ converges, and
$$\lim_{x\to\infty} F(x) = \mathbb{E}\big[F(\tilde T)\big] + \frac1\mu\int_1^\infty \frac{\epsilon(x)}{x}\,dx, \tag{7}$$
where $\tilde T$ is a random variable on $(0,1)$ with density $\tilde p(t) = \mathbb{P}(T < t)/(\mu t)$.

Proof. Define $G, \eta : \mathbb{R} \to \mathbb{R}$ by the logarithmic change of variables $G(s) \triangleq F(e^s)$ and $\eta(s) \triangleq \epsilon(e^s)$ for all $s \in \mathbb{R}$. Then (by the substitution $x = e^s$) equation (6) becomes
$$G(s) = \mathbb{E}\big[G(s - Y)\big] + \eta(s) \quad \text{for all } s \ge 0,$$
where $Y = \log(1/T)$ is non-lattice with mean $\mu$.
Moreover, $G$ is bounded and piecewise continuous, and $\eta$ is DRI. Let $\{Y_j\}_{j\ge1}$ be i.i.d. copies of $Y$, and set $S_0 = 0$ and $S_m \triangleq \sum_{i=1}^m Y_i$. A straightforward induction on $m$ yields
$$G(s) = \mathbb{E}\big[G(s - S_m)\big] + \mathbb{E}\Big[\sum_{n=0}^{m-1}\eta(s - S_n)\Big] \quad \text{for all } m \ge 1.$$
For each $s \ge 0$, define the first passage time $\tau_s \triangleq \inf\{m \ge 1 : S_m \ge s\}$ and the overshoot $R_s \triangleq S_{\tau_s} - s \in [0,\infty)$. We will now show that
$$G(s) = \mathbb{E}\big[G(-R_s)\big] + \mathbb{E}\Big[\sum_{n=0}^{\tau_s-1}\eta(s - S_n)\Big]. \tag{8}$$
To this end, let $(\mathcal{F}_m)_{m=0}^\infty$ be the natural filtration of the process $(Y_m)_{m=0}^\infty$. Fix $s \ge 0$ and define the stopped process
$$M_m \triangleq G(s - S_{m\wedge\tau_s}) + \sum_{n=0}^{m\wedge\tau_s-1}\eta(s - S_n), \quad\text{so that } M_0 = G(s).$$
Observe that since $\eta(s) = G(s) - \mathbb{E}[G(s - Y)]$ and $G$ is bounded, $\|\eta\|_\infty \le 2\|G\|_\infty$. This immediately implies the integrability of $M_m$. Furthermore, it is easy to check that $\mathbb{E}[M_{m+1} \mid \mathcal{F}_m] = M_m$, so $(M_m)_{m=0}^\infty$ is a martingale. Since $\tau_s \wedge m$ is a bounded stopping time, the optional stopping theorem applies, so
$$G(s) = \mathbb{E}\big[G(s - S_{\tau_s\wedge m})\big] + \mathbb{E}\Big[\sum_{n=0}^{m\wedge\tau_s-1}\eta(s - S_n)\Big].$$
Note that since $Y > 0$ we have $\mathbb{E}[\tau_s] < \infty$. Since $G$ and $\eta$ are bounded,
$$|G(s - S_{\tau_s\wedge m})| \le \|G\|_\infty \quad\text{and}\quad \Big|\sum_{n=0}^{m\wedge\tau_s-1}\eta(s - S_n)\Big| \le \|\eta\|_\infty\,\tau_s.$$
Hence, by the dominated convergence theorem and the fact that $\tau_s \wedge m \to \tau_s$ almost surely, we deduce (8).

We now proceed to analyze equation (8).

Overshoot term. Since $Y$ is non-lattice with finite mean $\mu$, the excess life convergence theorem (section 10.3 of [10]) implies that $R_s$ converges in distribution to a random variable $R$ with density $r \mapsto \mathbb{P}(Y > r)/\mu$ on $[0,\infty)$. Hence,
$$\lim_{s\to\infty}\mathbb{E}\big[G(-R_s)\big] = \mathbb{E}\big[G(-R)\big].$$
If we define $\tilde T \triangleq e^{-R}$, then $G(-R) = F(\tilde T)$. A change of variables shows that $\tilde T$ has density $\tilde p(t) = \mathbb{P}(T < t)/(\mu t)$ on $(0,1)$, so $\mathbb{E}[G(-R)] = \mathbb{E}[F(\tilde T)]$.

Error term. Let $\sigma$ be the renewal measure
$$\sigma([a,b]) \triangleq \sum_{n\ge0}\mathbb{P}(S_n \in [a,b]) \quad \text{for all } a \le b.$$
Then, since $\eta$ is integrable,
$$\mathbb{E}\Big[\sum_{n=0}^{\tau_s-1}\eta(s - S_n)\Big] = \mathbb{E}\Big[\sum_{n\ge0}\eta(s - S_n)\,\mathbb{1}\{S_n < s\}\Big] = \int_{[0,s)}\eta(s - u)\,\sigma(du).$$
Applying the key renewal theorem (Theorem 35 in [15] or section 10.2 of [10]) yields
$$\lim_{s\to\infty}\int_{[0,s)}\eta(s - u)\,\sigma(du) = \frac1\mu\int_0^\infty \eta(u)\,du = \frac1\mu\int_1^\infty \frac{\epsilon(x)}{x}\,dx,$$
where the last equality uses the substitution $x = e^u$. Finally, taking the limit of (8) gives (7).

Remark 2. The non-lattice condition and the finiteness of $\mathbb{E}[\log(1/T)]$ are both necessary for convergence in lemma 7, even if $\epsilon \equiv 0$.

6.2 Convergence for Fixed Shapes

In this subsection we formalize what it means for the sequence $\{p_n\}_{n\ge1}$ to have a "fixed shape", and we prove a sufficient condition for convergence in this regime.

Definition 4. Let $p : (0,1) \to [0,\infty)$ be a continuous probability density and $\{p_n\}_{n=1}^\infty$ be a sequence of probability mass functions with $p_n$ supported on $[n]$. We say that $\{p_n\}_{n=1}^\infty$ strongly discretizes $p$ if there exist constants $\kappa > 0$ and $C > 0$ such that $\|p_x - p\|_1 \le Cx^{-\kappa}$ for all $x \ge C$, where $p_x$ is a probability density on $(0,1)$ defined by
$$p_x(t) \triangleq x\,p_{\lfloor x\rfloor}(\lceil tx\rceil) \quad \text{for all } t \in (0,1).$$

The next theorem shows that the fixed shape structure, together with the quantitative discretization assumption above, is sufficient to guarantee convergence of the lookback averages.

Theorem 5 (Convergence for Fixed Shapes). Let $p : (0,1) \to [0,\infty)$ be a continuous probability density with finite log-moment $\int_0^1 p(t)\log(1/t)\,dt$. Suppose that $\{p_n\}_{n=1}^\infty$ strongly discretizes $p$, and that $\{a_n\}_{n=1}^\infty \subseteq \mathbb{R}^d$ satisfies
$$a_{n+1} = \mathbb{E}_{j\sim p_n}\big[a_j\big] \quad \text{for all } n \ge k.$$
Then the sequence $\{a_n\}_{n=1}^\infty$ converges.

Proof. It suffices to prove convergence in the case $d = 1$. The idea is to embed the discrete recursion into a continuous equation and apply lemma 7. Define a function $F : \mathbb{R}_+ \to \mathbb{R}$ by $F(x) \triangleq a_{\lceil x\rceil}$.
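As an aside, the fixed-shape recursion $a_{n+1} = \sum_{j=1}^n p_n(j)a_j$ of theorem 5 is easy to explore numerically. The sketch below (not part of the proof; the shape $p(t) \propto e^t$, the initialization, and all parameters are illustrative choices, and we do not verify the strong-discretization constants here) builds $p_n(j) \propto p(j/n)$ and iterates the recursion:

```python
import math

def fixed_shape_sequence(k=50, n_max=2000):
    """Iterate a_{n+1} = sum_j p_n(j) a_j, where p_n(j) is proportional
    to p(j/n) for the illustrative continuous shape p(t) ∝ e^t on (0,1)."""
    a = [0.0] * (k // 2) + [1.0] * (k - k // 2)  # arbitrary initialization
    for n in range(k, n_max):
        w = [math.exp(j / n) for j in range(1, n + 1)]  # unnormalized weights
        z = sum(w)
        a.append(sum(wj * aj for wj, aj in zip(w, a)) / z)
    return a

seq = fixed_shape_sequence()
```

Every new term is a convex combination of the earlier ones, so the iterates stay in the convex hull of the initialization, and successive terms stabilize as $n$ grows, consistent with the convergence asserted by theorem 5.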
It suffices to show that $F(x)$ converges as $x \to \infty$. Define $\{p_x\}_{x>k}$ as in definition 4, and note that for all $n \ge k$ and $x \in (n, n+1]$,
$$\int_0^1 p_x(t)F(tx)\,dt = \int_0^1 x\sum_{j=1}^n \mathbb{1}\{tx \in (j-1, j]\}\,p_n(j)F(tx)\,dt = \int_0^x \sum_{j=1}^n \mathbb{1}\{u \in (j-1, j]\}\,p_n(j)F(u)\,du = \sum_{j=1}^n p_n(j)\,a_j = a_{n+1} = F(x).$$
Consequently,
$$F(x) = \int_0^1 p(t)F(tx)\,dt + \int_0^1 \big[p_x(t) - p(t)\big]F(tx)\,dt = \mathbb{E}\big[F(Tx)\big] + \epsilon(x),$$
where $T \sim p$ and $\epsilon(x) \triangleq \int_0^1 [p_x(t) - p(t)]F(tx)\,dt$.

To show that $F$ converges, our goal becomes to verify the conditions of lemma 7. Note that since $\{a_n\}_{n=1}^\infty$ is bounded, $F$ is a bounded piecewise continuous function. Moreover, since $T$ has a continuous density and a finite log-moment, $\log(1/T)$ is non-lattice and has finite mean. Hence, it suffices to prove that $\epsilon(x)/x$ is directly Riemann integrable over $(1,\infty)$.

By a standard criterion for direct Riemann integrability (Remark 34 in [15]), it suffices to exhibit an eventually decreasing integrable function $\Delta(x)$ such that $|\epsilon(x)|/x \le \Delta(x)$ for all sufficiently large $x$. This follows from the inequalities
$$\frac{|\epsilon(x)|}{x} \le \frac1x\int_0^1 |p_x(t) - p(t)|\,|F(tx)|\,dt \le \frac{\|F\|_\infty\|p_x - p\|_1}{x} \le \frac{C\|F\|_\infty}{x^{1+\kappa}}.$$
In the last inequality, we used the fact that $\{p_n\}_{n=1}^\infty$ strongly discretizes $p$. This concludes the proof.

Remark 3 (Comparison to Envelope-Bounded Weights). If $p(t) \asymp t^{-\gamma}$ as $t \downarrow 0$, then the discretization assigns $p_n(1) \asymp n^{\gamma-1}$, i.e., a polynomial ceiling $c(n) \asymp n^{\gamma}$. This illustrates that theorem 5 is genuinely outside the polylogarithmic envelope regime of theorem 1 and relies crucially on the fixed shape structure.

A Appendix

Proposition 3. Let $A, B > 0$ and $\alpha, \beta \ge 0$ be constants satisfying $\alpha + \beta/2 \le 1$. Suppose $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$.
Let $\{(\Delta_T, n_T)\}_{T=0}^\infty$ be a sequence with initialization $\Delta_0 = 1$ and $n_0 = k$, defined by
$$\Delta_{T+1} \triangleq \Delta_T \times (1 - c\,\varepsilon_{n_{T+1}}\delta_{n_{T+1}}^{1/2}) \quad\text{where}\quad n_{T+1} \triangleq n_T \times \frac{C}{\varepsilon_{n_{T+1}}^2\,\delta_{n_{T+1}}^3\,\Delta_T^2}.$$
Then $\Delta_T \to 0$ as $T \to \infty$.

Proof. To prove the convergence, it suffices to show that
$$\sum_{T=1}^\infty \varepsilon_{n_T}\delta_{n_T}^{1/2} = \infty.$$
Assume to the contrary that $\Delta_T \to \Delta > 0$. By the definition of $n_{T+1}$, we get
$$\frac{n_{T+1}}{(\log n_{T+1})^{2\alpha+3\beta}} \le \frac{C}{A^2B^3\Delta^2} \times n_T \implies n_{T+1} \le n_T \times (\log n_T)^{O(1)} \implies \log n_T \le O(T\log T).$$
Consequently,
$$\varepsilon_{n_T}\delta_{n_T}^{1/2} = \Omega\big([T\log T]^{-(\alpha+\beta/2)}\big),$$
which forms a divergent series in $T$ when $\alpha + \beta/2 \le 1$.

Proposition 4. Let $A, B > 0$ and $\alpha, \beta > 0$ be constants satisfying $\alpha + \beta/2 > 1$. Suppose $\varepsilon_n = A(\log n)^{-\alpha}$ and $\delta_n = B(\log n)^{-\beta}$. Let $\{(U_T, n_T)\}_{T=0}^\infty$ be a sequence with initialization $U_0 > 0$ and $n_0 \ge k$, defined by
$$n_{T+1} = \frac{n_T}{2\delta_{n_T}^{1/2}} \quad\text{and}\quad U_{T+1} = U_T(1 - C\varepsilon_{n_T}\delta_{n_T}^{1/2}) \quad \text{for all } T \ge 0.$$
If $k$ is sufficiently large, then $U_\infty \triangleq \lim_{T\to\infty} U_T > \frac{U_0}{2}$.

Proof. Observe that as long as $U_\infty > 0$, we may choose $k$ sufficiently large to guarantee $U_\infty > U_0/2$. It therefore suffices to prove $U_\infty > 0$, which is equivalent to
$$\sum_{T=0}^\infty \varepsilon_{n_T}\delta_{n_T}^{1/2} < \infty.$$
It follows from the definition that $n_{T+1} \ge 2n_T$, as long as $k$ is sufficiently large. This implies $\log n_T = \Omega(T)$. In particular,
$$\varepsilon_{n_T}\delta_{n_T}^{1/2} = O\big(T^{-(\alpha+\beta/2)}\big),$$
which forms a convergent series when $\alpha + \beta/2 > 1$.

References

[1] Daron Acemoglu, Munther A. Dahleh, Ilan Lobel, and Asuman Ozdaglar. Bayesian learning in social networks. The Review of Economic Studies, 78(4):1201–1236, 2011.

[2] David Aldous. Probability distributions on cladograms. In Random discrete structures (Minneapolis, MN, 1993), volume 76 of IMA Vol. Math. Appl., pages 1–18. Springer, New York, 1996.

[3] Venkatesh Bala and Sanjeev Goyal. Learning from neighbours. The Review of Economic Studies, 65(3):595–621, 1998.
[4] Erich Baur and Jean Bertoin. Elephant random walks and their connection to Pólya-type urns. Phys. Rev. E, 94:052134, Nov 2016.

[5] Eric Cator and Henk Don. Self-averaging sequences which fail to converge. Electron. Commun. Probab., 22:Paper No. 16, 12, 2017.

[6] K. L. Chung and J. Wolfowitz. On a limit theorem in renewal theory. Ann. of Math. (2), 55:1–6, 1952.

[7] Morris H. DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345):118–121, 1974.

[8] P. Erdős, W. Feller, and H. Pollard. A property of power series with positive coefficients. Bull. Amer. Math. Soc., 55:201–204, 1949.

[9] Benjamin Golub and Matthew O. Jackson. Naive learning in social networks and the wisdom of crowds. American Economic Journal: Microeconomics, 2(1):112–149, 2010.

[10] Geoffrey R. Grimmett and David R. Stirzaker. Probability and Random Processes. Oxford University Press, New York, third edition, 2001.

[11] Svante Janson. Random recursive trees and preferential attachment trees are random split trees. Combin. Probab. Comput., 28(1):81–99, 2019.

[12] Elchanan Mossel, Allan Sly, and Omer Tamuz. Strategic learning and the topology of social networks. Econometrica, 83(5):1755–1794, 2015.

[13] Alex Olshevsky and John N. Tsitsiklis. Convergence speed in distributed consensus and averaging [reprint of MR2480125]. SIAM Rev., 53(4):747–772, 2011.

[14] Gunter M. Schütz and Steffen Trimper. Elephants can always remember: Exact long-range memory effects in a non-Markovian random walk. Phys. Rev. E, 70:045101, Oct 2004.

[15] Richard Serfozo. Basics of Applied Stochastic Processes. Probability and its Applications (New York). Springer-Verlag, Berlin, 2009.
