A Variant of Azuma's Inequality for Martingales with Subgaussian Tails

Reading time: 6 minutes

📝 Original Info

  • Title: A Variant of Azuma's Inequality for Martingales with Subgaussian Tails
  • ArXiv ID: 1110.2392
  • Date: 2011-10-14
  • Authors: Ohad Shamir

📝 Abstract

We provide a variant of Azuma's concentration inequality for martingales, in which the standard boundedness requirement is replaced by the milder requirement of a subgaussian tail.

💡 Deep Analysis

Figure 1
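As a quick numerical illustration of the paper's main result, the following sketch (our own, not from the paper) applies Theorem 2 to i.i.d. standard normal increments, an unbounded but subgaussian martingale difference sequence. The parameter choices are assumptions: the standard Gaussian tail bound $\Pr(Z_t > a) \le e^{-a^2/2}$ gives the theorem's condition with $b = 2 > 1$ and $c = 1/2$.

```python
import math
import random

# Illustration of Theorem 2 (our own sketch, not part of the paper):
# Z_t i.i.d. standard normal, so Pr(Z_t > a) <= exp(-a^2/2) <= b*exp(-c*a^2)
# with the assumed parameters b = 2, c = 1/2. Theorem 2 then bounds
# Pr((1/T) * sum Z_t > eps) by exp(-c*T*eps^2 / (28*b)).
random.seed(0)
b, c = 2.0, 0.5
T, eps, trials = 200, 1.0, 5000

exceed = 0
for _ in range(trials):
    mean = sum(random.gauss(0.0, 1.0) for _ in range(T)) / T
    if mean > eps:
        exceed += 1

empirical = exceed / trials
theorem_bound = math.exp(-c * T * eps ** 2 / (28.0 * b))
print(f"empirical Pr = {empirical:.4f}, Theorem 2 bound = {theorem_bound:.4f}")
assert empirical <= theorem_bound
```

As expected for a loose concentration bound, the empirical exceedance frequency is far below the Theorem 2 guarantee of about 0.17 for these parameters.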

📄 Full Content

arXiv:1110.2392v2 [cs.LG] 13 Oct 2011

A Variant of Azuma's Inequality for Martingales with Subgaussian Tails

Ohad Shamir
Microsoft Research New England
ohadsh@microsoft.com

A sequence of random variables $Z_1, Z_2, \ldots$ is called a martingale difference sequence with respect to another sequence of random variables $X_1, X_2, \ldots$ if for any $t$, $Z_t$ is a function of $X_1, \ldots, X_t$, and $\mathbb{E}[Z_{t+1} \mid X_1, \ldots, X_t] = 0$ with probability 1. Azuma's inequality is a useful concentration bound for martingales. Here is one possible formulation of it:

Theorem 1 (Azuma's Inequality). Let $Z_1, Z_2, \ldots$ be a martingale difference sequence with respect to $X_1, X_2, \ldots$, and suppose there is a constant $b$ such that for any $t$, $\Pr(|Z_t| \le b) = 1$. Then for any positive integer $T$ and any $\delta > 0$, it holds with probability at least $1 - \delta$ that

$$\frac{1}{T}\sum_{t=1}^{T} Z_t \;\le\; b\sqrt{\frac{2\log(1/\delta)}{T}}.$$

Sometimes, for the martingale we have at hand, $Z_t$ is not bounded, but rather bounded with high probability. In particular, suppose we can show that the probability of $Z_t$ being larger than $a$ (and smaller than $-a$), conditioned on any $X_1, \ldots, X_{t-1}$, is on the order of $\exp(-\Omega(a^2))$. Random variables with this behavior are said to have subgaussian tails, since their tails decay at least as fast as those of a Gaussian random variable. Intuitively, a variant of Azuma's inequality should still hold for such "almost-bounded" martingales, and is probably known. However, we were not able to find a convenient reference for it, and the goal of this technical report is to formally provide such a result:

Theorem 2 (Azuma's Inequality for Martingales with Subgaussian Tails). Let $Z_1, Z_2, \ldots, Z_T$ be a martingale difference sequence with respect to a sequence $X_1, X_2, \ldots, X_T$, and suppose there are constants $b > 1$, $c > 0$ such that for any $t$ and any $a > 0$, it holds that

$$\max\bigl\{\Pr(Z_t > a \mid X_1, \ldots, X_{t-1}),\; \Pr(Z_t < -a \mid X_1, \ldots, X_{t-1})\bigr\} \;\le\; b\exp(-ca^2).$$

Then for any $\delta > 0$, it holds with probability at least $1 - \delta$ that

$$\frac{1}{T}\sum_{t=1}^{T} Z_t \;\le\; \sqrt{\frac{28\,b\log(1/\delta)}{cT}}.$$

(It is quite likely that the numerical constant in the bound can be improved.)

Proof of Thm. 2

We begin by proving the following lemma, which bounds the moment generating function of subgaussian random variables.

Lemma 1. Let $X$ be a random variable with $\mathbb{E}[X] = 0$, and suppose there exist a constant $b \ge 1$ and a constant $c > 0$ such that for all $t > 0$, it holds that

$$\max\{\Pr(X \ge t),\; \Pr(X \le -t)\} \;\le\; b\exp(-ct^2).$$

Then for any $s > 0$, $\mathbb{E}[e^{sX}] \le e^{7bs^2/c}$.

Proof. We begin by noting that

$$\mathbb{E}[X^2] = \int_0^\infty \Pr(X^2 \ge t)\,dt \le \int_0^\infty \Pr\bigl(X \ge \sqrt{t}\bigr)\,dt + \int_0^\infty \Pr\bigl(X \le -\sqrt{t}\bigr)\,dt \le 2b\int_0^\infty \exp(-ct)\,dt = \frac{2b}{c}.$$

Using this, the fact that $\mathbb{E}[X] = 0$, and the fact that $e^a \le 1 + a + a^2$ for all $a \le 1$, we have that

$$\mathbb{E}[e^{sX}] = \mathbb{E}\bigl[e^{sX} \,\big|\, sX \le 1\bigr]\Pr(sX \le 1) + \sum_{j=1}^{\infty} \mathbb{E}\bigl[e^{sX} \,\big|\, j < sX \le j+1\bigr]\Pr(j < sX \le j+1)$$
$$\le \mathbb{E}\bigl[1 + sX + s^2X^2 \,\big|\, sX \le 1\bigr]\Pr(sX \le 1) + \sum_{j=1}^{\infty} e^{j+1}\Pr\Bigl(X > \frac{j}{s}\Bigr)$$
$$\le \Bigl(1 + \frac{2bs^2}{c}\Bigr) + b\sum_{j=1}^{\infty} e^{2j - cj^2/s^2}. \qquad (1)$$

We now need to bound the series $\sum_{j=1}^{\infty} e^{j(2 - cj/s^2)}$. If $s \le \sqrt{c}/2$, we have $2 - \frac{cj}{s^2} \le -\frac{c}{2s^2} \le -2$ for all $j \ge 1$. Therefore, the series can be upper bounded by the convergent geometric series

$$\sum_{j=1}^{\infty} \bigl(e^{-c/(2s^2)}\bigr)^j = \frac{e^{-c/(2s^2)}}{1 - e^{-c/(2s^2)}} < 2e^{-c/(2s^2)} \le \frac{4s^2}{c},$$

where we used the upper bound $e^{-c/(2s^2)} \le e^{-2} < 1/2$ in the second transition, and the last transition is by the inequality $e^{-x} \le \frac{1}{x}$ for all $x > 0$. Overall, we get that if $s \le \sqrt{c}/2$, then

$$\mathbb{E}[e^{sX}] \le 1 + \frac{2bs^2}{c} + \frac{4bs^2}{c} \le e^{6bs^2/c}. \qquad (2)$$

We will now deal with the case $s > \sqrt{c}/2$. For all $j > 3s^2/c$, we have $2 - jc/s^2 < -1$, so the tail of the series satisfies

$$\sum_{j > 3s^2/c} e^{j(2 - jc/s^2)} \le \sum_{j=0}^{\infty} e^{-j} < 2 < \frac{8s^2}{c}.$$

Moreover, the function $j \mapsto j(2 - jc/s^2)$ is maximized at $j = s^2/c$, and therefore $e^{j(2 - jc/s^2)} \le e^{s^2/c}$ for all $j$. Therefore, the initial part of the series is at most

$$\sum_{j=1}^{\lfloor 3s^2/c \rfloor} e^{j(2 - jc/s^2)} \le \frac{3s^2}{c}\,e^{s^2/c} \le e^{3s^2/(ec)}\,e^{s^2/c} = e^{(1+3/e)s^2/c},$$

where the second-to-last transition is from the fact that $a \le e^{a/e}$ for all $a$, applied with $a = 3s^2/c$.
Overall, we get that if $s > \sqrt{c}/2$, then

$$\mathbb{E}[e^{sX}] \le 1 + \frac{10bs^2}{c} + b\,e^{(1+3/e)s^2/c} \le e^{7bs^2/c}, \qquad (3)$$

where the last transition follows from the easily verified fact that $1 + 10ba + b\,e^{(1+3/e)a} \le e^{7ba}$ for any $a \ge 1/4$ and $b \ge 1$, and indeed $bs^2/c \ge 1/4$ by the assumption on $s$ and the assumption that $b \ge 1$. Combining Eq. (2) and Eq. (3) to handle the different cases of $s$, the result follows. ∎

Having proven the lemma, we turn to the proof of Thm. 2 itself.

Proof of Thm. 2. We proceed by the standard Chernoff method. Using Markov's inequality and Lemma 1, we have for any $s > 0$ that

$$\Pr\left(\frac{1}{T}\sum_{t=1}^{T} Z_t > \epsilon\right) = \Pr\left(e^{s\sum_{t=1}^{T} Z_t} > e^{sT\epsilon}\right) \le e^{-sT\epsilon}\,\mathbb{E}\Bigl[e^{s\sum_t Z_t}\Bigr]$$
$$= e^{-sT\epsilon}\,\mathbb{E}\left[\mathbb{E}\left[\prod_{t=1}^{T} e^{sZ_t} \,\Big|\, X_1, \ldots, X_{T-1}\right]\right]$$
$$= e^{-sT\epsilon}\,\mathbb{E}\left[\mathbb{E}\left[e^{sZ_T} \,\Big|\, X_1, \ldots, X_{T-1}\right]\prod_{t=1}^{T-1} e^{sZ_t}\right]$$
$$\le e^{-sT\epsilon}\,e^{7bs^2/c}\,\mathbb{E}\left[\prod_{t=1}^{T-1} e^{sZ_t}\right]$$
$$\le \cdots \le e^{-sT\epsilon + 7Tbs^2/c},$$

where the third transition uses the fact that $\prod_{t=1}^{T-1} e^{sZ_t}$ is a function of $X_1, \ldots, X_{T-1}$, the fourth applies Lemma 1 to $Z_T$ conditioned on $X_1, \ldots, X_{T-1}$, and the last line follows by repeating the argument for $Z_{T-1}, \ldots, Z_1$. Choosing $s = c\epsilon/(14b)$, the expression above equals $e^{-cT\epsilon^2/(28b)}$, and we get that

$$\Pr\left(\frac{1}{T}\sum_{t=1}^{T} Z_t > \epsilon\right) \le e^{-cT\epsilon^2/(28b)}.$$

Setting the right-hand side to $\delta$ and solving for $\epsilon$, the theorem follows. ∎

Acknowledgements

We thank Sébastien Bubeck for pointing …
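The moment generating function bound of Lemma 1 can also be sanity-checked numerically. The following minimal sketch (our own illustration, not from the paper) assumes $X \sim \mathcal{N}(0,1)$, which by the standard Gaussian tail bound $\Pr(X \ge t) \le e^{-t^2/2}$ satisfies the lemma's condition with $b = 1$, $c = 1/2$; the exact Gaussian MGF $e^{s^2/2}$ is compared against the lemma's guarantee $e^{7bs^2/c} = e^{14s^2}$.

```python
import math

# Numerical sketch of Lemma 1 (our illustration, not part of the paper).
# For X ~ N(0, 1), the standard tail bound Pr(X >= t) <= exp(-t^2 / 2)
# shows the lemma's condition holds with the assumed b = 1, c = 1/2.
# The Gaussian MGF is exactly E[exp(sX)] = exp(s^2 / 2), while Lemma 1
# guarantees E[exp(sX)] <= exp(7 * b * s^2 / c) = exp(14 * s^2).
b, c = 1.0, 0.5
for s in [0.1, 0.5, 1.0, 2.0, 4.0]:
    mgf = math.exp(s * s / 2.0)              # exact MGF of N(0, 1)
    bound = math.exp(7.0 * b * s * s / c)    # Lemma 1 bound
    assert mgf <= bound
    print(f"s = {s}: MGF = {mgf:.4g}, Lemma 1 bound = {bound:.4g}")
```

The large gap between the exact MGF and the bound is consistent with the paper's remark that the numerical constant can likely be improved.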


Reference

This content is AI-processed based on open access ArXiv data.
