On a connection between Stein characterizations and Fisher information

We generalize the so-called density approach to Stein characterizations of probability distributions. We prove an elementary factorization property of the resulting Stein operator in terms of a generalized (standardized) score function. We use this result to connect Stein characterizations with information distances such as the generalized (standardized) Fisher information.

Authors: Christophe Ley, Yvik Swan

arXiv: math.PR/0000000

Christophe Ley^1 and Yvik Swan^2
Département de Mathématique, Université Libre de Bruxelles, Campus Plaine – CP210, B-1050 Brussels
E-mail: chrisley@ulb.ac.be, yvswan@ulb.ac.be

Abstract: We generalize the so-called density approach to Stein characterizations of probability distributions. We prove an elementary factorization property of the resulting Stein operator in terms of a generalized (standardized) score function. We use this result to connect Stein characterizations with information distances such as the generalized (standardized) Fisher information.

AMS 2000 subject classifications: Primary 60F05; secondary 94A17.
Keywords and phrases: density approach, generalized (standardized) Fisher information, generalized (standardized) score functions, information functionals, probability metrics.

1. Foreword

In recent years a number of authors have noted how Charles Stein's characterization of the Gaussian (see [11]) and the so-called "magic factors" crop up in matters related to information theory (see [5], [6], [7], [3] or [1] and the references therein). The purpose of this note is to make this connection explicit.

2. Results

We consider densities $p : \mathbb{R} \to \mathbb{R}^+$ whose support is an interval $S := S_p$ with closure $\bar{S} = [a, b]$, for some $-\infty \le a < b \le \infty$. Among these we denote by $\mathcal{G}$ the collection of densities which are (strongly) differentiable at every point in the interior of their support.

Definition 2.1. Fix $p \in \mathcal{G}$ with support $S$ and define $\mathcal{F}(p)$ the collection of test functions $f : \mathbb{R} \to \mathbb{R}$ such that the mapping $x \mapsto f(x)p(x)$ is bounded on $\mathbb{R}$ and strongly differentiable on the interior of $S$.

Take a real bounded function $h$ with support $S$, and suppose that $h$ is (strongly) differentiable on the interior of $S$.
Then $h$ can be written as $\tilde{h}\,\mathbb{I}_S$ with $\tilde{h}$ a differentiable continuation of $h$ on $\mathbb{R}$. In the sequel we will write $\partial_y h(y)|_{y=x}$ for the differential in the sense of distributions of $h$ evaluated at $x$, so that
$$\partial_y h(y)|_{y=x} = (\tilde{h})'(x)\,\mathbb{I}_S(x) + \tilde{h}(x)\left(\delta_{\{x=a\}} - \delta_{\{x=b\}}\right),$$
where $\delta$ represents a Dirac delta.

Definition 2.2. Let $\mathbb{R}^\star$ be the collection of all functions $f : \mathbb{R} \to \mathbb{R}$. We define the (location-based) Stein operator as the operator $\mathcal{T} : \mathbb{R}^\star \times \mathcal{G} \to \mathbb{R}^\star : (f,p) \mapsto \mathcal{T}(f,p)$ such that
$$\mathcal{T}(f,p) : \mathbb{R} \to \mathbb{R} : x \mapsto \frac{\partial_y (f(y)p(y))|_{y=x}}{p(x)} \tag{2.1}$$
for all $f$ for which the differential (in the sense of distributions) exists.

Remark 2.1. The terminology "location-based" Stein operator is inherited from our parametric approach to Stein characterizations (see [8]), where a much more general characterization result is proposed.

To avoid ambiguities related to division by 0, throughout this paper we adopt the convention that, whenever an expression involves division by an indicator function $\mathbb{I}_A$ for some measurable set $A$, we are multiplying the expression by the said indicator function. This convention ensures that for $p \in \mathcal{G}$ and $f \in \mathcal{F}(p)$ and for any continuous random variable $X$, the quantity $\mathcal{T}(f,p)(X)$ is well-defined. We further draw the reader's attention to the fact that, in particular, ratios $p(x)/p(x)$ do not necessarily simplify to 1.

^1 Supported by a Mandat de Chargé de recherche from the Fonds National de la Recherche Scientifique, Communauté française de Belgique.
^2 Supported by a Mandat de Chargé de recherche from the Fonds National de la Recherche Scientifique, Communauté française de Belgique.

Example 2.1.
It is perhaps informative to see how Definitions 2.1 and 2.2 spell out for different explicit choices of target densities.

1. If $p = \phi$, the standard Gaussian, then $\mathcal{F}(\phi)$ contains the set of all differentiable bounded functions and
$$\mathcal{T}(f,\phi)(x) = f'(x) - x f(x),$$
which is Stein's well-known operator for characterizing the Gaussian.

2. If $p(x) = e^{-x}\,\mathbb{I}_{[0,\infty)}(x)$, the exponential $\mathcal{E}xp$, then (abusing notations) $\mathcal{F}(\mathcal{E}xp)$ contains the set of all differentiable bounded functions and
$$\mathcal{T}(f,\mathcal{E}xp)(x) = \left(f'(x) - f(x) + f(x)\,\delta_{\{x=0\}}\right)\mathbb{I}_{[0,\infty)}(x).$$

3. If $p(x) = \mathbb{I}_{[0,1]}(x)$, the standard uniform $\mathcal{U}(0,1)$, then $\mathcal{F}(\mathcal{U}(0,1))$ contains the set of all differentiable bounded functions and
$$\mathcal{T}(f,\mathcal{U}(0,1))(x) = \left(f'(x) + f(x)\left(\delta_{\{x=0\}} - \delta_{\{x=1\}}\right)\right)\mathbb{I}_{[0,1]}(x).$$

4. If $p(x) = \frac{1}{2\pi}\sqrt{4-x^2}\,\mathbb{I}_{(-2,2)}(x)$, Wigner's semicircle law $\mathcal{SC}$, then $\mathcal{F}(\mathcal{SC})$ contains the set of all functions of the form $f(x) = f_0(x)(4-x^2)$ for some bounded differentiable $f_0$ and, for these $f$, the operator becomes
$$\mathcal{T}(f,\mathcal{SC})(x) = \left((4-x^2)f_0'(x) - 3x f_0(x)\right)\mathbb{I}_{(-2,2)}(x).$$

5. If $p(x) = \frac{1}{\pi\sqrt{x(1-x)}}\,\mathbb{I}_{(0,1)}(x)$, the arcsine distribution $\mathcal{AS}$, then $\mathcal{F}(\mathcal{AS})$ contains the collection of all functions of the form $f(x) = f_0(x)\sqrt{x(1-x)}$ for some bounded differentiable $f_0$ and, for these $f$, the operator becomes
$$\mathcal{T}(f,\mathcal{AS})(x) = \sqrt{x(1-x)}\,f_0'(x)\,\mathbb{I}_{(0,1)}(x).$$

6. If $p(x)$ is a member of Pearson's family of distributions and thus satisfies $(s(x)p(x))' = \tau(x)p(x)$ for $\tau$ a polynomial of exact degree one and $s$ a polynomial of degree at most two, then, abusing notations one last time, we easily see that $\mathcal{F}(P(s,\tau))$ contains the set of all functions of the form $f(x) = f_0(x)s(x)$ for $f_0$ bounded, differentiable such that $f(a^+) = f(b^-) = 0$ and, for these $f$, the operator becomes
$$\mathcal{T}(f,P(s,\tau))(x) = \left(s(x)f_0'(x) + \tau(x)f_0(x)\right)\mathbb{I}_S(x).$$

The first three operators are well-known and can be found, for instance, in [12]. The fourth example can be found in [4]. The last example comes from [9].

We are now ready to state and prove our first main result.

Theorem 2.1 (Density approach). Let $p \in \mathcal{G}$ with support $S$, and take $Z \sim p$. Let $\mathcal{F}(p)$ be as in Definition 2.1 and $\mathcal{T}$ as in Definition 2.2. Let $X$ be a real-valued continuous random variable.
(1) If $X \stackrel{\mathcal{L}}{=} Z$ then $E[\mathcal{T}(f,p)(X)] = 0$ for all $f \in \mathcal{F}(p)$.
(2) If $E[\mathcal{T}(f,p)(X)] = 0$ for all $f \in \mathcal{F}(p)$, then $X \mid X \in S \stackrel{\mathcal{L}}{=} Z$.

Proof. To see (1), note that the hypotheses on $f$ and $p$ guarantee that we have
$$E[\mathcal{T}(f,p)(Z)] = [f(y)p(y)]_a^b + f(a^+)p(a^+) - f(b^-)p(b^-) = 0.$$
To see (2), consider for $z \in \mathbb{R}$ the functions $f_z^p$ defined through
$$f_z^p : \mathbb{R} \to \mathbb{R} : x \mapsto \frac{1}{p(x)}\int_a^x l_z(u)\,p(u)\,du$$
with $l_z(u) := \left(\mathbb{I}_{(-\infty,z]}(u) - P_p(X \le z)\right)\mathbb{I}_S(u)$ and $P_p(X \le z) := \int_{-\infty}^z p(u)\,du$. Clearly $f_z^p \in \mathcal{F}(p)$ for all $z$. Moreover we have $\partial_y(f_z^p(y)p(y))|_{y=x} = l_z(x)\,p(x)$ since $\int_a^c l_z(u)\,p(u)\,du = 0$ for $c = a$ and $c = b$. Therefore $f_z^p$ satisfies, for all $z$, the so-called Stein equation
$$\mathcal{T}(f_z^p, p)(x) = l_z(x). \tag{2.2}$$
Hence we can use $E[\mathcal{T}(f_z^p,p)(X)] = 0$ to deduce that $P(X \in (-\infty,z] \cap S) = P(Z \le z)\,P(X \in S)$ for all $z$, whence the claim.
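Both the closed forms of Example 2.1 and part (1) of Theorem 2.1 lend themselves to quick numerical sanity checks. The Python sketch below is our illustration, not part of the paper: it first compares the generic operator $\partial_y(f(y)p(y))|_{y=x}/p(x)$, evaluated by central finite differences at an interior point (where the Dirac terms vanish), with Stein's closed-form Gaussian operator, and then checks by Monte Carlo that $E[\mathcal{T}(f,\phi)(Z)]$ vanishes on a standard normal sample but not on a shifted one; $f(x) = \sin(x)$ is an arbitrary bounded differentiable test function.

```python
import math, random

# Our illustration (not from the paper): sanity checks for Example 2.1
# and Theorem 2.1(1), with the standard Gaussian target and f(x) = sin(x).

phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)  # N(0,1) density

def T_generic(f, p, x, h=1e-6):
    """Generic operator T(f,p)(x) = d/dx (f(x) p(x)) / p(x), central differences."""
    return (f(x + h) * p(x + h) - f(x - h) * p(x - h)) / (2 * h * p(x))

def T_gauss(x):
    """Closed form for the Gaussian target: f'(x) - x f(x), with f = sin."""
    return math.cos(x) - x * math.sin(x)

# (i) the generic and closed-form operators agree at an interior point
x0 = 0.7
gap = abs(T_generic(math.sin, phi, x0) - T_gauss(x0))

# (ii) E[T(f,phi)(Z)] = 0 for Z ~ N(0,1), but not for a shifted sample
random.seed(0)
n = 200_000
mean_gauss = sum(T_gauss(random.gauss(0, 1)) for _ in range(n)) / n
mean_shift = sum(T_gauss(random.gauss(1, 1)) for _ in range(n)) / n

print(gap, mean_gauss, mean_shift)
```

For the shifted sample the limiting value can even be computed exactly: Stein's identity for $N(1,1)$ gives $E[f'(X) - Xf(X)] = -E[\sin X] = -e^{-1/2}\sin 1 \approx -0.51$, so the empirical mean stays bounded away from zero.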
Theorem 2.1 encompasses Proposition 4 in [12] and Theorem 1 in [9] and is easily shown to contain many of the other better known Stein characterizations (such as the characterization of the semicircular in [4]). We draw the reader's attention to the fact that our way of writing the Stein operator (2.1) also shows that all Stein equations of the form (2.2) (that is, most such equations from the literature) can be solved by simple integration. Also, the form of our operators leads directly to our second main result.

Theorem 2.2 (Factorization Theorem of Stein Operators). Let $p$ and $q$ be probability density functions in $\mathcal{G}$ sharing support $S$. For all $f \in \mathcal{F}(p) \cap \mathcal{F}(q)$, we have
$$\mathcal{T}(f,p)(x) = \mathcal{T}(f,q)(x) + f(x)\,r(p,q)(x),$$
with
$$r(p,q)(x) := \frac{p'(x)}{p(x)} - \frac{q'(x)}{q(x)} + \left(\delta_{\{x=a\}} - \delta_{\{x=b\}}\right)\mathbb{I}_S(x).$$

Proof. The restriction on the support of $q$ guarantees that we have $f(y)p(y) = f(y)q(y)\,p(y)/q(y)$ for any real-valued function $f$. We can therefore write
$$\mathcal{T}(f,p)(x) = \frac{\partial_y\left(f(y)q(y)\,p(y)/q(y)\right)|_{y=x}}{p(x)} = \frac{\partial_y(f(y)q(y))|_{y=x}}{q(x)} + f(x)\,\frac{q(x)}{p(x)}\,\partial_y(p(y)/q(y))|_{y=x} = \mathcal{T}(f,q)(x) + f(x)\,\frac{q(x)}{p(x)}\,\partial_y(p(y)/q(y))|_{y=x}.$$
The claim follows.

Note that, whenever $S = \mathbb{R}$ or $S$ is an open interval, $r(p,q)$ simplifies to $p'/p - q'/q$.

Now, let $l$ be a real-valued function. In the sequel we will write $E_p[l(X)] := \int_{\mathbb{R}} l(x)p(x)\,dx$. Our next and final main result is immediate and hence its proof is left to the reader.

Theorem 2.3 (Stein's method and information distances). Let $p$ and $q$ be probability density functions in $\mathcal{G}$ sharing support $S$. Let $l$ be a real-valued function such that $E_p[l(X)]$ and $E_q[l(X)]$ exist.
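The factorization of Theorem 2.2 can be sanity-checked on a pair of full-support densities. In the sketch below (our addition, not from the paper) we take $p = N(0,1)$ and $q = N(1,1)$, for which the score difference $r(p,q)(x) = -x - (-(x-1)) = -1$ is constant, and verify the identity pointwise; $\sin$ is again an arbitrary bounded differentiable test function.

```python
import math

# Our check of Theorem 2.2: for p = N(0,1) and q = N(1,1), both supported
# on all of R, r(p,q) = -1, so T(f,p)(x) = T(f,q)(x) - f(x).

def T(f, df, score, x):
    """For a smooth density on R, T(f,p)(x) = (f p)'(x)/p(x) = f'(x) + f(x) p'(x)/p(x)."""
    return df(x) + f(x) * score(x)

f, df = math.sin, math.cos
score_p = lambda x: -x          # score p'/p of N(0,1)
score_q = lambda x: -(x - 1)    # score q'/q of N(1,1)

for x in (-2.0, 0.3, 1.7):
    lhs = T(f, df, score_p, x)
    rhs = T(f, df, score_q, x) + f(x) * (-1.0)  # r(p,q) = -1 here
    assert abs(lhs - rhs) < 1e-12
print("factorization verified at sample points")
```

The choice of two unit-variance Gaussians is deliberate: a constant score difference makes the factorization checkable by hand as well, since $f' - xf = (f' - (x-1)f) - f$.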
Define $f_l^p$ to be the solution of the Stein equation
$$\mathcal{T}(f,p)(x) = \left(l(x) - E_p[l(X)]\right)\mathbb{I}_S(x) \tag{2.3}$$
and suppose that $f_l^p \in \mathcal{F}(q)$. Then
$$E_q[l(X)] - E_p[l(X)] = E_q\!\left[f_l^p(X)\,r(p,q)(X)\right]. \tag{2.4}$$

Whenever $p$ is well-behaved, the solutions to (2.3) are of the well-known form
$$f_l^p : \mathbb{R} \to \mathbb{R} : x \mapsto \frac{1}{p(x)}\int_a^x \left(l(u) - E_p[l(X)]\right)p(u)\,du. \tag{2.5}$$
In cases such as the $\mathcal{SC}$ or the $\mathcal{AS}$, the form of this solution (expressed in terms of $f_0$ instead of $f$) is slightly different but easily provided; see Example 2.1 or equations (18) and (19) in Proposition 1 of [9].

In all explicit instances covered in Example 2.1, the condition that $f_l^p \in \mathcal{F}(q)$ is trivially verified (see page 4 in [2] for the Gaussian). Under moment conditions on $p$, Schoutens shows that members of the Pearson family satisfy this assumption as well (see [9], Lemma 1).

3. Application

Applying Hölder's inequality to (2.4) shows that, under the same conditions,
$$\left|E_q[l(X)] - E_p[l(X)]\right| \le \kappa_l^p\,\sqrt{E_q\!\left[(r(p,q)(X))^2\right]}, \tag{3.1}$$
with $\kappa_l^p = \sqrt{E_q[(f_l^p(X))^2]}$. Equation (3.1) provides some form of universal bound on differences of expectations in terms of what can be likened to a generalized (standardized) Fisher information distance
$$J(p,q) = E_q\!\left[(r(p,q)(X))^2\right]$$
(the terminology and notations are borrowed from [1]). Note how, for instance, taking $p = \phi$ the standard Gaussian density yields the Fisher information distance studied, e.g., in [6]. Theorem 2.3 also provides a bound on any probability metric which can be written as
$$d_{\mathcal{H}}(p,q) = \sup_{h \in \mathcal{H}}\left|E_q[h(X)] - E_p[h(X)]\right| \tag{3.2}$$
for some class of functions $\mathcal{H}$. The Kolmogorov, Wasserstein and total variation distances, to cite but these, can all be written in this form.
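For a concrete feel for identity (2.4), consider the following worked example of ours (not taken from the paper): $p = N(0,1)$, $q = N(\mu,1)$ and $l(x) = x$. The score difference is the constant $-\mu$, so $J(p,q) = \mu^2$; the solution (2.5) is $f_l^p(x) = \phi(x)^{-1}\int_{-\infty}^x u\,\phi(u)\,du \equiv -1$; and (2.4) reads $\mu - 0 = E_q[(-1)\cdot(-\mu)] = \mu$. The sketch recomputes the solution by simple quadrature:

```python
import math

# Our worked instance of Theorem 2.3: p = N(0,1), q = N(mu,1), l(x) = x.
# Then r(p,q) = -mu, J(p,q) = mu^2, and the solution (2.5) is f_l^p = -1.

mu = 0.8
phi = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def f_sol(x, lo=-12.0, n=40_000):
    """Solution (2.5) for l(u) = u, computed by the trapezoidal rule
    (the lower limit -12 stands in for -infinity; the truncated mass
    is negligible)."""
    h = (x - lo) / n
    g = lambda u: u * phi(u)
    s = (g(lo) + g(x)) / 2 + sum(g(lo + k * h) for k in range(1, n))
    return s * h / phi(x)

# f_l^p should be constantly -1, and (2.4) then gives E_q[l] - E_p[l] = mu.
vals = [f_sol(x) for x in (-1.0, 0.0, 2.0)]
lhs = mu               # E_q[X] - E_p[X]
rhs = -1.0 * (-mu)     # E_q[f_l^p(X) r(p,q)(X)]
print(vals, lhs, rhs)
```

Note that both sides of (2.4) equal $\mu$ exactly here; the quadrature only confirms that the canonical solution (2.5) is indeed the constant $-1$.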
Specifying the target as well as the class $\mathcal{H}$ yields the following immediate corollaries.

Corollary 3.1. Let $p$ and $q$ be probability densities with support $S \subseteq \mathbb{R}$ satisfying the hypotheses in Theorem 2.3. Then there exist constants $\kappa_1 := \kappa_1(p,q)$ and $\kappa_2 := \kappa_2(p,q)$ such that
$$\int |p(u) - q(u)|\,du \le \kappa_1\sqrt{J(p,q)} \quad\text{and}\quad \sup_{x \in \mathbb{R}} |p(x) - q(x)| \le \kappa_2\sqrt{J(p,q)}.$$

Proof. Take $l(u) = \mathbb{I}_{\{p(u) \le q(u)\}} - \mathbb{I}_{\{p(u) \ge q(u)\}}$. Using (2.4) with this choice of $l$ and applying Hölder's inequality, one readily sees that there exists a constant $\kappa_1 > 0$ such that
$$\int |p(x) - q(x)|\,dx \le \kappa_1\sqrt{J(p,q)},$$
where $\kappa_1 = \sqrt{E_q[(f_l^p(X))^2]}$.

Regarding the second inequality, first note that, whenever $x \in S^c$, $|p(x) - q(x)| = 0$; hence we can concentrate on the supremum over the support $S$. Now choose $l(u) = \delta_{\{x=u\}}$, the Dirac delta function in $x \in S$. For this choice of $l$ we obtain after some computations
$$|q(x) - p(x)| \le p(x)\sqrt{E_q\!\left[\left(\mathbb{I}_{[x,b)}(X) - P(X)\right)^2/(p(X))^2\right]}\,\sqrt{J(p,q)},$$
where $P$ is the cumulative distribution function of the density $p$ (for which evidently $P(a) = 0$). Taking the supremum yields the second constant $\kappa_2$.

We conclude this paper with a computation of bounds on the constants $\kappa_1$ and $\kappa_2$ for various examples. While these are somewhat related to the so-called "magic factors" appearing in the literature on Stein's method, the technique we employ to bound them is different and, we believe, of independent interest. To the best of our knowledge, such bounds were first obtained in [10] for the Gaussian target only. Shimizu's results were later improved and extended in [5] and [6]. We recover in Corollary 3.2 below the best known values for $\kappa_1$, and our bound for $\kappa_2$ yields a significant improvement. We stress the fact that the results available in the literature only concern a Gaussian target, whereas our approach allows us to obtain such relationships for virtually any target distribution. Further explorations of the consequences of Theorem 2.3 also show that it is possible to relate Stein characterizations with other (pseudo-)metrics than those of the form (3.2), such as, e.g., the Kullback-Leibler divergence or relative entropy (see [5]).

Corollary 3.2.
1. If $p$ is the exponential distribution with rate 1, then $\kappa_1 \le 1$.
2. If $p = \phi$ is the standard normal distribution, then $\kappa_1 \le \sqrt{2}$.
3. If $p$ is proportional to $e^{-x^4/12}$, then $\kappa_1 \le \sqrt{2\sqrt{2}}$.
In all three cases we have $\kappa_2 \le 1$.

Proof of the constants $\kappa_1$. Take $l(u) = \mathbb{I}_{\{p(u) \le q(u)\}} - \mathbb{I}_{\{p(u) \ge q(u)\}}$. Using (2.5) and the fact that $\int_a^b (l(u) - E_p[l(X)])\,p(u)\,du = 0$, we obtain that
$$f_l^p(x) = -\frac{1}{p(x)}\int_x^b \left(l(u) - E_p[l(X)]\right)p(u)\,du = -\frac{2}{p(x)}\int_x^b \left(\mathbb{I}_{\{p(u) \le q(u)\}} - P_p(p(X) \le q(X))\right)p(u)\,du =: \frac{2}{p(x)}\int_x^b h(u)\,p(u)\,du,$$
where $P_p(X \in A) = \int_A p(u)\,du$ for some set $A$. Let $p(x) = e^{-x}\,\mathbb{I}_{[0,\infty)}(x)$, the density of an exponential-1 random variable (in other words, $a = 0$ and $b = \infty$). Recall that, in this case, the support of $f_l^p$ is a subset of $\mathbb{R}^+$. Then we can write
$$\kappa_1^2 := E_q[(f_l^p(X))^2] = 4\int_0^\infty q(x)\,e^{2x}\left(\int_x^\infty h(u)\,e^{-u}\,du\right)^2 dx \le 4\int_0^\infty q(x)\,e^{2x}\left(\int_x^\infty h^2(u)\,e^{-2u}\,du\right)dx \le \frac{4}{2}\int_0^\infty q(x)\,e^{2x}\left(\int_{2x}^\infty h^2\!\left(\frac{u}{2}\right)e^{-u}\,du\right)dx,$$
where the first inequality follows from Jensen and the second inequality from a simple change of variables.
Applying Hölder's inequality and again changing variables in the above yields
$$\kappa_1^2 \le \frac{4}{2}\sqrt{\int_0^\infty q(x)\,dx}\,\sqrt{\int_0^\infty q(x)\,e^{4x}\left(\int_{2x}^\infty h^2\!\left(\frac{u}{2}\right)e^{-u}\,du\right)^2 dx} \le \frac{4}{2^{1+\frac{1}{2}}}\left(\int_0^\infty q(x)\,e^{4x}\left(\int_{4x}^\infty h^4\!\left(\frac{u}{4}\right)e^{-u}\,du\right)dx\right)^{1/2},$$
where $\int_0^\infty q(x)\,dx = 1$ by our assumption that $p$ and $q$ share the same support. Iterating this procedure $m \in \mathbb{N}$ times, we obtain
$$\kappa_1^2 \le \frac{4}{2^{M(m)}}\left(\int_0^\infty q(x)\,e^{2^{m+1}x}\left(\int_{2^{m+1}x}^\infty h^{2^{m+1}}\!\left(\frac{u}{2^{m+1}}\right)e^{-u}\,du\right)dx\right)^{1/2^m},$$
where $M(m) = 1 + \frac{1}{2} + \ldots + \frac{1}{2^m}$. Now note that, for each $m \ge 0$, we have $0 \le h^{2^{m+1}}(u/2^{m+1}) \le 1$. Hence
$$\int_{2^{m+1}x}^\infty h^{2^{m+1}}\!\left(\frac{u}{2^{m+1}}\right)e^{-u}\,du \le e^{-2^{m+1}x}.$$
Since $M(m) \to 2$, we obtain $\kappa_1^2 \le 1$ and the result follows.

If the support of $p$ (and hence also of $q$) is the real line, we use similarly as above the identity $\int_{-\infty}^\infty (l(u) - E_p[l(X)])\,p(u)\,du = 0$ to write, equivalently,
$$f_l^p(x) = \frac{2}{p(x)}\int_x^\infty h(u)\,p(u)\,du = -\frac{2}{p(x)}\int_{-\infty}^x h(u)\,p(u)\,du.$$
This yields
$$E_q[(f_l^p(X))^2] = 4\int_{-\infty}^\infty q(x)\left(\frac{1}{p(x)}\int_x^\infty h(u)\,p(u)\,du\right)^2 dx = 4\int_{-\infty}^0 q(x)\left(\frac{1}{p(x)}\int_{-\infty}^x h(u)\,p(u)\,du\right)^2 dx + 4\int_0^\infty q(x)\left(\frac{1}{p(x)}\int_x^\infty h(u)\,p(u)\,du\right)^2 dx.$$
Setting $p(x) = (2\pi)^{-1/2}e^{-x^2/2}$ we get by Jensen's inequality
$$E_q[(f_l^p(X))^2] \le 4\int_{-\infty}^0 q(x)\left(e^{x^2}\int_{-\infty}^x h^2(u)\,e^{-u^2}\,du\right)dx + 4\int_0^\infty q(x)\left(e^{x^2}\int_x^\infty h^2(u)\,e^{-u^2}\,du\right)dx =: I^- + I^+.$$
Both integrals above can be tackled in the same way as for the exponential distribution. Consider, for instance, $I^-$, for which we can write (thanks to a simple change of variables)
$$I^- = 4\int_{-\infty}^0 q(x)\left(e^{x^2}\int_{-\infty}^x h^2(u)\,e^{-u^2}\,du\right)dx = \frac{4}{\sqrt{2}}\int_{-\infty}^0 q(x)\,e^{x^2}\left(\int_{-\infty}^{\sqrt{2}x} h^2(u/\sqrt{2})\,e^{-u^2/2}\,du\right)dx.$$
Now apply Hölder's inequality to get
$$I^- \le \frac{4\sqrt{\bar{p}}}{\sqrt{2}}\sqrt{\int_{-\infty}^0 q(x)\,e^{2x^2}\left(\int_{-\infty}^{\sqrt{2}x} h^2(u/\sqrt{2})\,e^{-u^2/2}\,du\right)^2 dx} \le \frac{4\sqrt{\bar{p}}}{\sqrt{2}}\sqrt{\int_{-\infty}^0 q(x)\,e^{2x^2}\left(\int_{-\infty}^{\sqrt{2}x} h^4(u/\sqrt{2})\,e^{-u^2}\,du\right)dx},$$
where $\bar{p} := P_q(X < 0)$ (written $\bar{p}$ to avoid a clash with the density $p$). Changing variables once more yields $I^- \le I_1^-$ with
$$I_1^- = \frac{4\,\bar{p}^{1/2}}{2^{\frac{1}{2}+\frac{1}{4}}}\left(\int_{-\infty}^0 q(x)\,e^{2x^2}\left(\int_{-\infty}^{(\sqrt{2})^2 x} h^4\!\left(\frac{u}{(\sqrt{2})^2}\right)e^{-u^2/2}\,du\right)dx\right)^{\frac{1}{2}}.$$
Iterating this procedure $m \in \mathbb{N}$ times we deduce $I^- \le I_1^- \le \ldots \le I_m^-$ with $I_m^-$ given by
$$\frac{4\,\bar{p}^{\,N(m)}}{2^{N(m+1)}}\left(\int_{-\infty}^0 q(x)\,e^{2^m x^2}\left(\int_{-\infty}^{(\sqrt{2})^{m+1}x} h^{2^{m+1}}\!\left(\frac{u}{(\sqrt{2})^{m+1}}\right)e^{-u^2/2}\,du\right)dx\right)^{\frac{1}{2^m}},$$
where we set $N(m) = \frac{1}{2} + \frac{1}{4} + \ldots + \frac{1}{2^m}$ $(= M(m) - 1)$. For every $m$ we have $0 \le h^{2^{m+1}}(u/(\sqrt{2})^{m+1}) \le 1$ and
$$\int_{-\infty}^0 q(x)\,e^{2^m x^2}\left(\int_{-\infty}^{(\sqrt{2})^{m+1}x} e^{-u^2/2}\,du\right)dx \le P_q(X < 0)\sqrt{\frac{\pi}{2}}.$$
Therefore
$$I_m^- \le \frac{4}{2^{N(m+1)}}\left(P_q(X < 0)\right)^{N(m)}\left(P_q(X < 0)\sqrt{\frac{\pi}{2}}\right)^{1/2^m}.$$
Since $N(m) \to 1$ as $m \to \infty$, we conclude $I^- \le 2\,P_q(X < 0)$. One can similarly show that $I^+ \le 2\,P_q(X > 0)$, and the result follows. The computations for densities proportional to $e^{-x^4/12}$ are similar and are left to the reader.

Proof of the constants $\kappa_2$. Let $p(x) = e^{-x}\,\mathbb{I}_{[0,\infty)}(x)$, which readily implies $P(x) = (1 - e^{-x})\,\mathbb{I}_{[0,\infty)}(x)$. This leads to
$$E_q\!\left[\left(\mathbb{I}_{[x,\infty)}(X) - P(X)\right)^2/(p(X))^2\right] = \int_0^\infty q(y)\,e^{2y}\left(\mathbb{I}_{[x,\infty)}(y) - 1 + e^{-y}\right)^2 dy = \int_0^x q(y)\,e^{2y}(1 - e^{-y})^2\,dy + \int_x^\infty q(y)\,e^{2y}e^{-2y}\,dy = \int_0^x q(y)\,e^{2y}(1 - 2e^{-y})\,dy + \int_0^x q(y)\,dy + \int_x^\infty q(y)\,dy \le 1 + e^{2x}(1 - 2e^{-x})\,P_q(X \le x),$$
since $e^{2y}(1 - 2e^{-y})$ is a monotone increasing function on $\mathbb{R}^+$. This immediately yields
$$\kappa_2 \le \sup_{x \ge 0}\left(e^{-x}\sqrt{1 + e^{2x}(1 - 2e^{-x})\,P_q(X \le x)}\right),$$
a quantity which can be bounded by 1. Now let $p(x) = (2\pi)^{-1/2}e^{-x^2/2}$ and $P(x) = \Phi(x)$, the cumulative distribution function of the standard normal distribution.
Similarly as for the exponential, we have
$$E_q\!\left[\left(\mathbb{I}_{[x,\infty)}(X) - P(X)\right)^2/(p(X))^2\right] = 2\pi\int_{-\infty}^\infty q(y)\,e^{y^2}\left(\mathbb{I}_{[x,\infty)}(y) - \Phi(y)\right)^2 dy = 2\pi\int_{-\infty}^x q(y)\,e^{y^2}(\Phi(y))^2\,dy + 2\pi\int_x^\infty q(y)\,e^{y^2}(1 - \Phi(y))^2\,dy \le 2\pi e^{x^2}(\Phi(x))^2\int_{-\infty}^x q(y)\,dy + 2\pi e^{x^2}(1 - \Phi(x))^2\int_x^\infty q(y)\,dy = 2\pi e^{x^2}(\Phi(x))^2 + 2\pi e^{x^2}(1 - 2\Phi(x))\,P_q(X \ge x).$$
This again directly leads to
$$\kappa_2 \le \sup_{x \in \mathbb{R}}\left((2\pi)^{-1/2}e^{-x^2/2}\sqrt{2\pi e^{x^2}\left((\Phi(x))^2 + (1 - 2\Phi(x))\,P_q(X \ge x)\right)}\right) = \sup_{x \in \mathbb{R}}\sqrt{(\Phi(x))^2 + (1 - 2\Phi(x))\,P_q(X \ge x)},$$
a quantity which can be shown to equal 1. The computations for densities proportional to $e^{-x^4/12}$ are similar and are left to the reader.

References

[1] Barbour, A. D., Johnson, O., Kontoyiannis, I. and Madiman, M. (2010). Compound Poisson approximation via information functionals. Electron. J. Probab. 15, 1344-1369.
[2] Chen, L. H. Y. and Shao, Q.-M. (2005). Stein's method for normal approximation. In An Introduction to Stein's Method, Lect. Notes Ser. Inst. Math. Sci. Natl. Univ. Singap. 4, 1-59.
[3] Cover, T. and Thomas, J. (2006). Elements of Information Theory. Wiley & Sons, New York, second edition.
[4] Götze, F. and Tikhomirov, A. N. (2006). Limit theorems for spectra of random matrices with martingale structure. Teor. Veroyatn. Primen. 51, 171-192; translation in Theory Probab. Appl. 51, 42-64 (2007).
[5] Johnson, O. (2004). Information Theory and the Central Limit Theorem. Imperial College Press, London, UK.
[6] Johnson, O. and Barron, A. R. (2004). Fisher information inequalities and the central limit theorem. Probab. Theory Related Fields 129, 391-409.
[7] Kontoyiannis, I., Harremoës, P. and Johnson, O. (2005). Entropy and the law of small numbers. IEEE Trans. Inform. Theory 51, 466-472.
[8] Ley, C. and Swan, Y. (2011). A unified approach to Stein characterizations. Preprint, available at http://arxiv.org/abs/1105.4925.
[9] Schoutens, W. (2001). Orthogonal polynomials in Stein's method. J. Math. Anal. Appl. 253, 515-531.
[10] Shimizu, R. (1975). On Fisher's amount of information for location family. In: G. P. Patil et al., editors, Statistical Distributions in Scientific Work 3, 305-312.
[11] Stein, C. (1972). A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proc. Sixth Berkeley Symp. Math. Statist. Probab., Vol. II: Probability Theory, 583-602.
[12] Stein, C., with Diaconis, P., Holmes, S. and Reinert, G. (2004). Use of exchangeable pairs in the analysis of simulations. In Stein's Method: Expository Lectures and Applications, IMS Lecture Notes Monogr. Ser. 46, 69-77.
