On the entropy and log-concavity of compound Poisson measures
Motivated, in part, by the desire to develop an information-theoretic foundation for compound Poisson approximation limit theorems (analogous to the corresponding developments for the central limit theorem and for simple Poisson approximation), this work examines sufficient conditions under which the compound Poisson distribution has maximal entropy within a natural class of probability measures on the nonnegative integers. We show that the natural analog of the Poisson maximum entropy property remains valid if the measures under consideration are log-concave, but that it fails in general. A parallel maximum entropy result is established for the family of compound binomial measures. The proofs are largely based on ideas related to the semigroup approach introduced in recent work by Johnson [12] for the Poisson family. Sufficient conditions are given for compound distributions to be log-concave, and specific examples are presented illustrating all the above results.
Authors: Oliver Johnson, Ioannis Kontoyiannis, Mokshay Madiman
November 7, 2018

∗ Department of Mathematics, University of Bristol, University Walk, Bristol, BS8 1TW, UK. Email: O.Johnson@bristol.ac.uk
† Department of Informatics, Athens University of Economics & Business, Patission 76, Athens 10434, Greece. Email: yiannis@aueb.gr
‡ Department of Statistics, Yale University, 24 Hillhouse Avenue, New Haven, CT 06511, USA. Email: mokshay.madiman@yale.edu

1 Introduction

A particularly appealing way to state the classical central limit theorem is to say that, if $X_1, X_2, \ldots$
are independent and identically distributed, continuous random variables with zero mean and unit variance, then the entropy of their normalized partial sums $S_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n X_i$ increases with $n$ to the entropy of the standard normal distribution, which is maximal among all random variables with zero mean and unit variance. More precisely, if $f_n$ denotes the density of $S_n$ and $\phi$ the standard normal density, then, as $n \to \infty$,
$$h(f_n) \uparrow h(\phi) = \sup\{ h(f) : \text{densities } f \text{ with mean } 0 \text{ and variance } 1 \}, \qquad (1)$$
where $h(f) = -\int f \log f$ denotes the differential entropy and $\log$ denotes the natural logarithm. Precise conditions under which (1) holds are given in [1][25][20]; see also [19][4][11] and the references therein, where numerous related results are stated, along with their history.

Part of the appeal of this formalization of the central limit theorem comes from its analogy to the second law of thermodynamics: the "state" (meaning the distribution) of the random variables $S_n$ evolves monotonically, until the maximum entropy state, the standard normal distribution, is reached. Moreover, the introduction of information-theoretic ideas and techniques in connection with the entropy has motivated numerous related results (and their proofs), generalizing and strengthening the central limit theorem in different directions; see the references mentioned above for details.

The classical Poisson convergence limit theorems, of which the binomial-to-Poisson is the prototypical example, have also been examined in a similar light. An analogous program has recently been carried out in this case [23][14][9][18][12]. The starting point is the identification of the Poisson distribution as that which has maximal entropy within a natural class of probability measures.
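As a quick numerical illustration of this identification (made precise in (2) and (3) below), the entropies $H(\mathrm{Bin}(n, \lambda/n))$ increase with $n$ toward $H(\mathrm{Po}(\lambda))$. The following is a minimal sketch under our own helper names; the Poisson mass function is truncated at 40 terms as a numerical convenience:

```python
import math
import numpy as np

def binom_pmf(n, p):
    """Bin(n, p) mass function."""
    return np.array([math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])

def poisson_pmf(lam, size):
    """Po(lam) mass function, truncated to `size` points."""
    return np.array([math.exp(-lam) * lam**k / math.factorial(k) for k in range(size)])

def entropy(pmf):
    """Discrete entropy H(P) = -sum_x P(x) log P(x), in nats."""
    pmf = pmf[pmf > 0]
    return -np.sum(pmf * np.log(pmf))

lam = 1.0
H = [entropy(binom_pmf(n, lam / n)) for n in range(1, 30)]
H_po = entropy(poisson_pmf(lam, 40))
# the binomial entropies are nondecreasing in n and lie below the Poisson entropy
assert all(h2 >= h1 - 1e-12 for h1, h2 in zip(H, H[1:]))
assert all(h < H_po for h in H)
```

The monotonicity here follows from the nestedness of the classes in (2): $\mathrm{Bin}(n, \lambda/n)$ corresponds to a parameter vector that also belongs to the next class.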
Perhaps the simplest way to state and prove this is along the following lines; first we make some simple definitions:

Definition 1.1 For any parameter vector $p = (p_1, p_2, \ldots, p_n)$ with each $p_i \in [0,1]$, the sum of independent Bernoulli random variables $B_i \sim \mathrm{Bern}(p_i)$,
$$S_n = \sum_{i=1}^n B_i,$$
is called a Bernoulli sum, and its probability mass function is denoted by $b_p(x) := \Pr\{S_n = x\}$, for $x = 0, 1, \ldots$. Further, for each $\lambda > 0$, we define the following sets of parameter vectors:
$$\mathcal{P}_n(\lambda) = \{ p \in [0,1]^n : p_1 + p_2 + \cdots + p_n = \lambda \} \quad \text{and} \quad \mathcal{P}_\infty(\lambda) = \bigcup_{n \ge 1} \mathcal{P}_n(\lambda).$$

Shepp and Olkin [23] showed that, for fixed $n \ge 1$, the Bernoulli sum $b_p$ which has maximal entropy among all Bernoulli sums with mean $\lambda$ is $\mathrm{Bin}(n, \lambda/n)$, the binomial with parameters $n$ and $\lambda/n$:
$$H(\mathrm{Bin}(n, \lambda/n)) = \max\{ H(b_p) : p \in \mathcal{P}_n(\lambda) \}, \qquad (2)$$
where $H(P) = -\sum_x P(x) \log P(x)$ denotes the discrete entropy function. Noting that the binomial $\mathrm{Bin}(n, \lambda/n)$ converges to the Poisson distribution $\mathrm{Po}(\lambda)$ as $n \to \infty$, and that the classes of Bernoulli sums in (2) are nested, $\{ b_p : p \in \mathcal{P}_n(\lambda) \} \subset \{ b_p : p \in \mathcal{P}_{n+1}(\lambda) \}$, Harremoës [9] noticed that a simple limiting argument gives the following maximum entropy property for the Poisson distribution:
$$H(\mathrm{Po}(\lambda)) = \sup\{ H(b_p) : p \in \mathcal{P}_\infty(\lambda) \}. \qquad (3)$$

Partly motivated by the desire to provide an information-theoretic foundation for compound Poisson limit theorems and the more general problem of compound Poisson approximation, as a first step we consider the problem of generalizing the maximum entropy properties (2) and (3) to the case of compound Poisson distributions on $\mathbb{Z}_+$.¹ We begin with some definitions:

Definition 1.2 Let $P$ be an arbitrary distribution on $\mathbb{Z}_+ = \{0, 1, \ldots\}$, and $Q$ a distribution on $\mathbb{N} = \{1, 2, \ldots\}$.
The $Q$-compound distribution $C_Q P$ is the distribution of the random sum,
$$\sum_{j=1}^{Y} X_j, \qquad (4)$$
where $Y$ has distribution $P$ and the random variables $\{X_j\}$ are independent and identically distributed (i.i.d.) with common distribution $Q$ and independent of $Y$. The distribution $Q$ is called a compounding distribution, and the map $P \mapsto C_Q P$ is the $Q$-compounding operation. The $Q$-compound distribution $C_Q P$ can be explicitly written as the mixture,
$$C_Q P(x) = \sum_{y=0}^{\infty} P(y) Q^{*y}(x), \quad x \ge 0, \qquad (5)$$
where $Q^{*j}(x)$ is the $j$th convolution power of $Q$ and $Q^{*0}$ is the point mass at $x = 0$.

Above and throughout the paper, the empty sum $\sum_{j=1}^{0}(\cdots)$ is taken to be zero; all random variables considered are supported on $\mathbb{Z}_+ = \{0, 1, \ldots\}$; and all compounding distributions $Q$ are supported on $\mathbb{N} = \{1, 2, \ldots\}$.

Example 1.3 Let $Q$ be an arbitrary distribution on $\mathbb{N}$.

1. For any $0 \le p \le 1$, the compound Bernoulli distribution $\mathrm{CBern}(p, Q)$ is the distribution of the product $BX$, where $B \sim \mathrm{Bern}(p)$ and $X \sim Q$ are independent. It has probability mass function $C_Q P$, where $P$ is the $\mathrm{Bern}(p)$ mass function, so that $C_Q P(0) = 1 - p$ and $C_Q P(x) = p Q(x)$ for $x \ge 1$.

¹Recall that the compound Poisson distributions are the only infinitely divisible distributions on $\mathbb{Z}_+$, and also that they are (discrete) stable laws [24]. By way of motivation we also recall Gnedenko and Korolev's remark that "there should be mathematical ... probabilistic models of the universal principle of non-decrease of uncertainty," and their proposal that we should "find conditions under which certain limit laws appearing in limit theorems of probability theory possess extremal entropy properties. Immediate candidates to be subjected to such analysis are, of course, stable laws ..."; see [8, pp. 211-215].
2. A compound Bernoulli sum is a sum of independent compound Bernoulli random variables, all with respect to the same compounding distribution $Q$: let $X_1, X_2, \ldots, X_n$ be i.i.d. with common distribution $Q$ and $B_1, B_2, \ldots, B_n$ be independent $\mathrm{Bern}(p_i)$. We call
$$\sum_{i=1}^n B_i X_i \stackrel{D}{=} \sum_{j=1}^{\sum_{i=1}^n B_i} X_j$$
a compound Bernoulli sum; in view of (4), its distribution is $C_Q b_p$, where $p = (p_1, p_2, \ldots, p_n)$.

3. In the special case of a compound Bernoulli sum with all its parameters $p_i = p$ for a fixed $p \in [0,1]$, we say that it has a compound binomial distribution, denoted by $\mathrm{CBin}(n, p, Q)$.

4. Let $\Pi_\lambda(x) = e^{-\lambda} \lambda^x / x!$, $x \ge 0$, denote the $\mathrm{Po}(\lambda)$ mass function. Then, for any $\lambda > 0$, the compound Poisson distribution $\mathrm{CPo}(\lambda, Q)$ is the distribution with mass function $C_Q \Pi_\lambda$:
$$C_Q \Pi_\lambda(x) = \sum_{j=0}^{\infty} \Pi_\lambda(j) Q^{*j}(x) = \sum_{j=0}^{\infty} \frac{e^{-\lambda} \lambda^j}{j!} Q^{*j}(x), \quad x \ge 0. \qquad (6)$$

In view of the Shepp-Olkin maximum entropy property (2) for the binomial distribution, a first natural conjecture might be that the compound binomial has maximum entropy among all compound Bernoulli sums $C_Q b_p$ with a fixed mean; that is,
$$H(\mathrm{CBin}(n, \lambda/n, Q)) = \max\{ H(C_Q b_p) : p \in \mathcal{P}_n(\lambda) \}. \qquad (7)$$
But, perhaps somewhat surprisingly, as Chi [6] has noted, (7) fails in general. For example, taking $Q$ to be the uniform distribution on $\{1, 2\}$, $p = (0.00125, 0.00875)$ and $\lambda = p_1 + p_2 = 0.01$, direct computation shows that,
$$H(\mathrm{CBin}(2, \lambda/2, Q)) < 0.090798 < 0.090804 < H(C_Q b_p). \qquad (8)$$
As the Shepp-Olkin result (2) was only seen as an intermediate step in proving the maximum entropy property of the Poisson distribution (3), we may still hope that the corresponding result remains true for compound Poisson measures, namely that,
$$H(\mathrm{CPo}(\lambda, Q)) = \sup\{ H(C_Q b_p) : p \in \mathcal{P}_\infty(\lambda) \}. \qquad (9)$$
Again, (9) fails in general.
For example, taking the same $Q$, $\lambda$ and $p$ as above yields,
$$H(\mathrm{CPo}(\lambda, Q)) < 0.090765 < 0.090804 < H(C_Q b_p).$$
The main purpose of the present work is to show that, despite these negative results, it is possible to provide natural, broad sufficient conditions under which the compound binomial and compound Poisson distributions can be shown to have maximal entropy in an appropriate class of measures. Our first result, Theorem 1.4 below, states that (7) does hold, under certain conditions on $Q$ and $\mathrm{CBin}(n, \lambda/n, Q)$:

Theorem 1.4 If the distribution $Q$ on $\mathbb{N}$ and the compound binomial distribution $\mathrm{CBin}(n, \lambda/n, Q)$ are both log-concave, then,
$$H(\mathrm{CBin}(n, \lambda/n, Q)) = \max\{ H(C_Q b_p) : p \in \mathcal{P}_n(\lambda) \},$$
as long as the tail of $Q$ satisfies either one of the following properties: (a) $Q$ has finite support; or (b) $Q$ has tails heavy enough so that, for some $\rho, \beta > 0$ and $N_0 \ge 1$, we have,
$$Q(x) \ge \frac{\rho}{x^\beta}, \quad \text{for all } x \ge N_0.$$

The proof of the theorem is given in Section 3. As can be seen there, conditions (a) and (b) are introduced purely for technical reasons, and can probably be significantly relaxed. The notion of log-concavity, on the other hand, is central in the development of the ideas in this work. [In a different setting, log-concavity also appears as a natural condition for a different maximum entropy problem considered by Cover and Zhang [7].] Recall that the distribution $P$ of a random variable $X$ on $\mathbb{Z}_+$ is log-concave if its support is a (possibly infinite) interval of successive integers in $\mathbb{Z}_+$, and,
$$P(x)^2 \ge P(x+1) P(x-1), \quad \text{for all } x \ge 1. \qquad (10)$$
We also recall that most of the commonly used distributions appearing in applications (e.g., the Poisson, binomial, geometric, negative binomial, hypergeometric, logarithmic series, or Pólya-Eggenberger distributions) are log-concave.
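The counterexamples (8) and its compound Poisson analog above can be reproduced by direct computation from the mixture representation (5). Below is a minimal sketch; the helper names are ours, all supports and the Poisson sum in (6) are truncated as numerical conveniences, and the entropies are computed in nats:

```python
import math
import numpy as np

def entropy(pmf):
    """Discrete entropy, natural logarithm."""
    pmf = pmf[pmf > 0]
    return -np.sum(pmf * np.log(pmf))

def compound(P, q, size=100):
    """Mixture (5): C_Q P(x) = sum_y P(y) Q^{*y}(x), truncated to `size` points."""
    out = np.zeros(size)
    power = np.zeros(size); power[0] = 1.0   # Q^{*0} = point mass at 0
    for y in range(len(P)):
        out += P[y] * power
        power = np.convolve(power, q)[:size]
    return out

def bern_sum(p):
    """Mass function b_p of a Bernoulli sum (Definition 1.1)."""
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1 - pi, pi])
    return pmf

q = np.array([0.0, 0.5, 0.5])                # Q uniform on {1, 2}
lam = 0.01
p = [0.00125, 0.00875]
H_cbin = entropy(compound(bern_sum([lam / 2, lam / 2]), q))   # CBin(2, lam/2, Q)
H_mix = entropy(compound(bern_sum(p), q))                     # C_Q b_p
po = np.array([math.exp(-lam) * lam**j / math.factorial(j) for j in range(30)])
H_cpo = entropy(compound(po, q))                              # CPo(lam, Q)
assert H_cbin < H_mix and H_cpo < H_mix      # (7) and (9) both fail here
```

The differences are tiny (of order $10^{-5}$), so double-precision arithmetic is needed to resolve them.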
Another key property is that of ultra log-concavity; cf. [22]. The distribution $P$ of a random variable $X$ is ultra log-concave if $P(x)/\Pi_\lambda(x)$ is log-concave, that is, if,
$$x P(x)^2 \ge (x+1) P(x+1) P(x-1), \quad \text{for all } x \ge 1. \qquad (11)$$
Note that the Poisson distribution as well as all Bernoulli sums are ultra log-concave. Johnson [12] recently proved the following maximum entropy property for the Poisson distribution, generalizing (3):
$$H(\mathrm{Po}(\lambda)) = \max\{ H(P) : \text{ultra log-concave } P \text{ with mean } \lambda \}. \qquad (12)$$
Our next result (proved in Section 2) states that, as long as $Q$ and the compound Poisson measure $\mathrm{CPo}(\lambda, Q)$ are log-concave, the same maximum entropy statement as in (12) remains valid in the compound Poisson case:

Theorem 1.5 If the distribution $Q$ on $\mathbb{N}$ and the compound Poisson distribution $\mathrm{CPo}(\lambda, Q)$ are both log-concave, then,
$$H(\mathrm{CPo}(\lambda, Q)) = \max\{ H(C_Q P) : \text{ultra log-concave } P \text{ with mean } \lambda \}.$$

In Section 4 we give conditions under which the compound Poisson and compound Bernoulli distributions are log-concave. In particular, the results there imply the following explicit maximum entropy statements.

Example 1.6

1. Let $Q$ be an arbitrary log-concave distribution on $\mathbb{N}$. Then Lemma 4.1 combined with Theorem 1.4 implies that the maximum entropy property of the compound binomial distribution in equation (7) holds for all $\lambda$ large enough. That is, the compound binomial $\mathrm{CBin}(n, \lambda/n, Q)$ has maximal entropy among all compound Bernoulli sums $C_Q b_p$ with $p_1 + p_2 + \cdots + p_n = \lambda$, as long as
$$\lambda \ge \frac{n Q(2)}{Q(1)^2 + Q(2)}.$$

2. Suppose $Q$ is supported on $\{1, 2\}$, with probabilities $Q(1) = q$, $Q(2) = 1 - q$, and consider the class of all Bernoulli sums $b_p$ with mean $p_1 + p_2 + \cdots + p_n = \lambda$.
Theorem 4.2 combined with Theorem 1.5 implies that the compound Poisson maximum entropy property (9) holds in this case, as long as $\lambda$ is large enough. In other words, the distribution $\mathrm{CPo}(\lambda, Q)$ has maximal entropy among all compound Bernoulli sums $C_Q b_p$ with
$$p_1 + p_2 + \cdots + p_n = \lambda \ge \frac{2(1-q)}{q^2}.$$

3. Suppose $Q$ is geometric with parameter $\alpha \in (0,1)$, i.e., $Q(x) = \alpha (1-\alpha)^{x-1}$ for all $x \ge 1$, and again consider the class of all Bernoulli sums $b_p$ with mean $\lambda$. Then Theorem 4.4 combined with Theorem 1.5 implies that (9) holds for all large $\lambda$: the compound Poisson distribution $\mathrm{CPo}(\lambda, Q)$ has maximal entropy among all compound Bernoulli sums $C_Q b_p$ with
$$p_1 + p_2 + \cdots + p_n = \lambda \ge \frac{2(1-\alpha)}{\alpha}.$$

Clearly, it remains an open question to give necessary and sufficient conditions on $\lambda$ and $Q$ for the compound Poisson and compound binomial distributions to have maximal entropy within an appropriately defined class, or even for the compound Poisson distribution to be log-concave. Section 4 ends with a conjecture, together with some supporting evidence, stating that $\mathrm{CPo}(\lambda, Q)$ is log-concave when $Q$ is log-concave and $\lambda Q(1)^2 \ge 2 Q(2)$.

2 Maximum Entropy Property of the Compound Poisson Distribution

Here we show that, if $Q$ and the compound Poisson distribution $\mathrm{CPo}(\lambda, Q) = C_Q \Pi_\lambda$ are both log-concave, then $\mathrm{CPo}(\lambda, Q)$ has maximum entropy among all distributions of the form $C_Q P$, when $P$ has mean $\lambda$ and is ultra log-concave. Our approach is an extension of the 'semigroup' arguments of [12]. We begin by recording some basic properties of log-concave and ultra log-concave distributions:

(i) If $P$ is ultra log-concave, then from the definitions it is immediate that $P$ is log-concave.

(ii) If $Q$ is log-concave, then it has finite moments of all orders; see [16, Theorem 7].
(iii) If $X$ is a random variable with ultra log-concave distribution $P$, then (by (i) and (ii)) it has finite moments of all orders. Moreover, considering the covariance between the decreasing function $P(x+1)(x+1)/P(x)$ and the increasing function $x(x-1)\cdots(x-n)$ shows that the falling factorial moments of $P$ satisfy,
$$E[(X)_n] := E[X(X-1)\cdots(X-n+1)] \le (E(X))^n;$$
see [12] and [10] for details.

(iv) The Poisson distribution and all Bernoulli sums are ultra log-concave.

Recall the following definition from [12]:

Definition 2.1 Given $\alpha \in [0,1]$ and a random variable $X \sim P$ on $\mathbb{Z}_+$ with mean $\lambda \ge 0$, let $U_\alpha P$ denote the distribution of the random variable,
$$\sum_{i=1}^{X} B_i + Z_{\lambda(1-\alpha)},$$
where the $B_i$ are i.i.d. $\mathrm{Bern}(\alpha)$, $Z_{\lambda(1-\alpha)}$ has distribution $\mathrm{Po}(\lambda(1-\alpha))$, and all random variables are independent of each other and of $X$.

Note that, if $X \sim P$ has mean $\lambda$, then $U_\alpha P$ has the same mean. Also, recall the following useful relation that was established in Proposition 3.6 of [12]: for all $y \ge 0$,
$$\frac{\partial}{\partial \alpha} U_\alpha P(y) = \frac{1}{\alpha} \Big( \lambda \big( U_\alpha P(y) - U_\alpha P(y-1) \big) - \big( (y+1) U_\alpha P(y+1) - y U_\alpha P(y) \big) \Big). \qquad (13)$$
Next we define another transformation of probability distributions $P$ on $\mathbb{Z}_+$:

Definition 2.2 Given $\alpha \in [0,1]$, a distribution $P$ on $\mathbb{Z}_+$ and a compounding distribution $Q$ on $\mathbb{N}$, let $U_\alpha^Q P$ denote the distribution $C_Q U_\alpha P$:
$$U_\alpha^Q P(x) := C_Q U_\alpha P(x) = \sum_{y=0}^{\infty} U_\alpha P(y) Q^{*y}(x), \quad x \ge 0.$$

An important observation that will be at the heart of the proof of Theorem 1.5 below is that, for $\alpha = 0$, $U_0^Q P$ is simply the compound Poisson measure $\mathrm{CPo}(\lambda, Q)$, while for $\alpha = 1$, $U_1^Q P = C_Q P$.
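Definition 2.1 thus interpolates between $P$ (at $\alpha = 1$) and $\mathrm{Po}(\lambda)$ (at $\alpha = 0$) while preserving the mean. A minimal numerical sketch of the pmf of $U_\alpha P$, combining binomial thinning with an independent Poisson; the helper names are ours and all supports are truncated:

```python
import math
import numpy as np

def binomial_thin(P, alpha):
    """PMF of sum_{i=1}^X B_i, with B_i ~ Bern(alpha) i.i.d. and X ~ P."""
    out = np.zeros(len(P))
    for x, px in enumerate(P):
        for k in range(x + 1):
            out[k] += px * math.comb(x, k) * alpha**k * (1 - alpha)**(x - k)
    return out

def poisson_pmf(mu, size):
    return np.array([math.exp(-mu) * mu**k / math.factorial(k) for k in range(size)])

def U(alpha, P):
    """U_alpha P (Definition 2.1): thin P by alpha, add independent Po(lam(1-alpha))."""
    lam = sum(x * px for x, px in enumerate(P))       # mean of P
    thinned = binomial_thin(np.asarray(P, float), alpha)
    noise = poisson_pmf(lam * (1 - alpha), len(P) + 20)
    return np.convolve(thinned, noise)[: len(P) + 20]

P = np.array([0.2, 0.5, 0.3])    # an ultra log-concave P with mean lam = 1.1
lam = 1.1
assert np.allclose(U(1.0, P)[:3], P)                  # alpha = 1 recovers P
assert np.allclose(U(0.0, P), poisson_pmf(lam, 23))   # alpha = 0 gives Po(lam)
for a in (0.0, 0.3, 0.7, 1.0):
    pmf = U(a, P)
    assert abs(sum(x * px for x, px in enumerate(pmf)) - lam) < 1e-6  # mean preserved
```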
The following lemma, proved in the appendix, gives a rough bound on the third moment of $U_\alpha^Q P$:

Lemma 2.3 Suppose $P$ is an ultra log-concave distribution with mean $\lambda > 0$ on $\mathbb{Z}_+$, and let $Q$ be a log-concave compounding distribution on $\mathbb{N}$. For each $\alpha \in [0,1]$, let $W_\alpha$, $V_\alpha$ be random variables with distributions $U_\alpha^Q P = C_Q U_\alpha P$ and $C_Q (U_\alpha P)^\#$, respectively, where, for any distribution $R$ with mean $\nu$, we write $R^\#(y) = R(y+1)(y+1)/\nu$ for its size-biased version. Then the third moments $E(W_\alpha^3)$ and $E(V_\alpha^3)$ are both bounded above by,
$$\lambda q_3 + 3 \lambda^2 q_1 q_2 + \lambda^3 q_1^3,$$
where $q_1, q_2, q_3$ denote the first, second and third moments of $Q$, respectively.

In [12], the characterization of the Poisson as a maximum entropy distribution was proved through the decrease of its score function. In an analogous way, following [3], we define the score function of a $Q$-compound random variable as follows.

Definition 2.4 Given a distribution $P$ on $\mathbb{Z}_+$ with mean $\lambda$, the corresponding $Q$-compound distribution $C_Q P$ has score function defined by:
$$r_{1, C_Q P}(x) = \frac{\sum_{y=0}^{\infty} (y+1) P(y+1) Q^{*y}(x)}{\lambda \sum_{y=0}^{\infty} P(y) Q^{*y}(x)} - 1 = \frac{\sum_{y=0}^{\infty} (y+1) P(y+1) Q^{*y}(x)}{\lambda\, C_Q P(x)} - 1. \qquad (14)$$

Notice that the mean of $r_{1, C_Q P}$ with respect to $C_Q P$ is zero, and that if $P \sim \mathrm{Po}(\lambda)$ then $r_{1, C_Q P}(x) \equiv 0$. Further, when $Q$ is the point mass at 1, this score function reduces to the "scaled score function" introduced in [18]. But, unlike the scaled score function and the alternative score function $r_{2, C_Q P}$ given in [3], this score function is not only a function of the compound distribution $C_Q P$, but also explicitly depends on $P$. A projection identity and other properties of $r_{1, C_Q P}$ are proved in [3]. Next we show that, if $Q$ is log-concave and $P$ is ultra log-concave, then the score function $r_{1, C_Q P}(x)$ is decreasing in $x$.
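Before the formal proof, this monotonicity (and the zero-mean property noted after (14)) can be spot-checked numerically. A minimal sketch; the particular $P$, $Q$ and all helper names are our own choices, with supports truncated:

```python
import numpy as np

def score_r1(P, q, size=40):
    """Score r_{1,C_QP} of Definition 2.4, evaluated on the support of C_Q P."""
    lam = sum(y * py for y, py in enumerate(P))       # mean of P
    powers = [np.zeros(size)]; powers[0][0] = 1.0     # Q^{*0}, Q^{*1}, ...
    for _ in range(len(P)):
        powers.append(np.convolve(powers[-1], q)[:size])
    cqp = sum(P[y] * powers[y] for y in range(len(P)))            # C_Q P, via (5)
    num = sum((y + 1) * P[y + 1] * powers[y] for y in range(len(P) - 1))
    supp = cqp > 0
    r = np.zeros(size)
    r[supp] = num[supp] / (lam * cqp[supp]) - 1.0
    return cqp, r, supp

P = np.array([0.2, 0.5, 0.3])      # ultra log-concave, mean 1.1
q = np.array([0.0, 0.6, 0.4])      # log-concave Q on {1, 2}
cqp, r, supp = score_r1(P, q)
assert abs(np.sum(cqp * r)) < 1e-12           # zero mean under C_Q P
assert np.all(np.diff(r[supp]) <= 1e-12)      # decreasing on the support
```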
Lemma 2.5 If $P$ is ultra log-concave and the compounding distribution $Q$ is log-concave, then the score function $r_{1, C_Q P}(x)$ of $C_Q P$ is decreasing in $x$.

Proof. First we recall Theorem 2.1 of Keilson and Sumita [17], which implies that, if $Q$ is log-concave, then for any $m \ge n$ and for any $x$:
$$Q^{*m}(x+1) Q^{*n}(x) - Q^{*m}(x) Q^{*n}(x+1) \ge 0. \qquad (15)$$
[This can be proved by considering $Q^{*m}$ as the convolution of $Q^{*n}$ and $Q^{*(m-n)}$, and writing
$$Q^{*m}(x+1) Q^{*n}(x) - Q^{*m}(x) Q^{*n}(x+1) = \sum_l Q^{*(m-n)}(l) \big[ Q^{*n}(x+1-l) Q^{*n}(x) - Q^{*n}(x-l) Q^{*n}(x+1) \big].$$
Since $Q$ is log-concave, so is $Q^{*n}$, cf. [15], so the ratio $Q^{*n}(x+1)/Q^{*n}(x)$ is decreasing in $x$, and (15) follows.]

By definition, $r_{1, C_Q P}(x) \ge r_{1, C_Q P}(x+1)$ if and only if,
$$0 \le \Big( \sum_y (y+1) P(y+1) Q^{*y}(x) \Big) \Big( \sum_z P(z) Q^{*z}(x+1) \Big) - \Big( \sum_y (y+1) P(y+1) Q^{*y}(x+1) \Big) \Big( \sum_z P(z) Q^{*z}(x) \Big)$$
$$= \sum_{y,z} (y+1) P(y+1) P(z) \big[ Q^{*y}(x) Q^{*z}(x+1) - Q^{*y}(x+1) Q^{*z}(x) \big]. \qquad (16)$$
Noting that for $y = z$ the term in square brackets in the double sum becomes zero, and swapping the values of $y$ and $z$ in the range $y > z$, the double sum in (16) becomes,
$$\sum_{y < z} \big[ (y+1) P(y+1) P(z) - (z+1) P(z+1) P(y) \big] \big[ Q^{*y}(x) Q^{*z}(x+1) - Q^{*y}(x+1) Q^{*z}(x) \big].$$
By the ultra log-concavity of $P$, the first square bracket is positive for $y \le z$, and by equation (15) the second square bracket is also positive for $y \le z$.

We remark that, under the same assumptions, and using a very similar argument, an analogous result holds for the score function $r_{2, C_Q P}$ recently introduced in [3]. Combining Lemmas 2.5 and 2.3 with equation (13), we deduce the following result, which is the main technical step in the proof of Theorem 1.5 below.
Proposition 2.6 Let $P$ be an ultra log-concave distribution on $\mathbb{Z}_+$ with mean $\lambda > 0$, and assume that $Q$ and $\mathrm{CPo}(\lambda, Q)$ are both log-concave. Let $W_\alpha$ be a random variable with distribution $U_\alpha^Q P$, and define, for all $\alpha \in [0,1]$, the function,
$$E(\alpha) := E[-\log C_Q \Pi_\lambda(W_\alpha)].$$
Then $E(\alpha)$ is continuous for all $\alpha \in [0,1]$, it is differentiable for $\alpha \in (0,1)$, and, moreover, $E'(\alpha) \le 0$ for $\alpha \in (0,1)$. In particular, $E(0) \ge E(1)$.

Proof. Recall that,
$$U_\alpha^Q P(x) = C_Q U_\alpha P(x) = \sum_{y=0}^{\infty} U_\alpha P(y) Q^{*y}(x) = \sum_{y=0}^{x} U_\alpha P(y) Q^{*y}(x),$$
where the last sum is restricted to the range $0 \le y \le x$ because $Q$ is supported on $\mathbb{N}$. Therefore, since $U_\alpha P(x)$ is continuous in $\alpha$ [12], so is $U_\alpha^Q P(x)$, and to show that $E(\alpha)$ is continuous it suffices to show that the series,
$$E(\alpha) := E[-\log C_Q \Pi_\lambda(W_\alpha)] = -\sum_{x=0}^{\infty} U_\alpha^Q P(x) \log C_Q \Pi_\lambda(x), \qquad (17)$$
converges uniformly. To that end, first observe that log-concavity of $C_Q \Pi_\lambda$ implies that $Q(1)$ is nonzero. [Otherwise, if $i > 1$ is the smallest integer $i$ such that $Q(i) \ne 0$, then $C_Q \Pi_\lambda(i+1) = 0$, but $C_Q \Pi_\lambda(i)$ and $C_Q \Pi_\lambda(2i)$ are both strictly positive, contradicting the log-concavity of $C_Q \Pi_\lambda$.] Since $Q(1)$ is nonzero, we can bound the compound Poisson probabilities as,
$$1 \ge C_Q \Pi_\lambda(x) = \sum_y [e^{-\lambda} \lambda^y / y!]\, Q^{*y}(x) \ge e^{-\lambda} [\lambda^x / x!]\, Q(1)^x, \quad \text{for all } x \ge 1,$$
so that the summands in (17) can be bounded,
$$0 \le -\log C_Q \Pi_\lambda(x) \le \lambda + \log x! - x \log(\lambda Q(1)) \le C x^2, \quad x \ge 1, \qquad (18)$$
for a constant $C > 0$ that depends only on $\lambda$ and $Q(1)$. Therefore, for any $N \ge 1$, the tail of the series (17) can be bounded,
$$0 \le -\sum_{x=N}^{\infty} U_\alpha^Q P(x) \log C_Q \Pi_\lambda(x) \le C\, E[W_\alpha^2 \mathbb{I}\{W_\alpha \ge N\}] \le \frac{C}{N} E[W_\alpha^3],$$
and, in view of Lemma 2.3, it converges uniformly. Therefore, $E(\alpha)$ is continuous in $\alpha$, and, in particular, convergent for all $\alpha \in [0,1]$.
To prove that it is differentiable at each $\alpha \in (0,1)$ we need to establish that: (i) the summands in (17) are continuously differentiable in $\alpha$ for each $x$; and (ii) the series of derivatives converges uniformly. Since, as noted above, $U_\alpha^Q P(x)$ is defined by a finite sum, we can differentiate with respect to $\alpha$ under the sum, to obtain,
$$\frac{\partial}{\partial \alpha} U_\alpha^Q P(x) = \frac{\partial}{\partial \alpha} C_Q U_\alpha P(x) = \sum_{y=0}^{x} \frac{\partial}{\partial \alpha} U_\alpha P(y)\, Q^{*y}(x). \qquad (19)$$
And since $U_\alpha P$ is continuously differentiable in $\alpha \in (0,1)$ for each $x$ (cf. [12, Proposition 3.6] or equation (13) above), so are the summands in (17), establishing (i); in fact, they are infinitely differentiable, which can be seen by repeated applications of (13). To show that the series of derivatives converges uniformly, let $\alpha$ be restricted to an arbitrary open interval $(\epsilon, 1)$ for some $\epsilon > 0$. The relation (13) combined with (19) yields, for any $x$,
$$\frac{\partial}{\partial \alpha} U_\alpha^Q P(x) = \frac{1}{\alpha} \sum_{y=0}^{x} \Big[ \lambda \big( U_\alpha P(y) - U_\alpha P(y-1) \big) - \big( (y+1) U_\alpha P(y+1) - y U_\alpha P(y) \big) \Big] Q^{*y}(x)$$
$$= -\frac{1}{\alpha} \sum_{y=0}^{x} \big( (y+1) U_\alpha P(y+1) - \lambda U_\alpha P(y) \big) \big( Q^{*y}(x) - Q^{*(y+1)}(x) \big)$$
$$= -\frac{1}{\alpha} \sum_{y=0}^{x} \big( (y+1) U_\alpha P(y+1) - \lambda U_\alpha P(y) \big) Q^{*y}(x) + \sum_{v=0}^{x} Q(v) \frac{1}{\alpha} \sum_{y=0}^{x} \big( (y+1) U_\alpha P(y+1) - \lambda U_\alpha P(y) \big) Q^{*y}(x-v)$$
$$= -\frac{\lambda}{\alpha} U_\alpha^Q P(x) \left( \frac{\sum_{y=0}^{x} (y+1) U_\alpha P(y+1) Q^{*y}(x)}{\lambda\, U_\alpha^Q P(x)} - 1 \right) + \frac{\lambda}{\alpha} \sum_{v=0}^{x} Q(v)\, U_\alpha^Q P(x-v) \left( \frac{\sum_{y=0}^{x} (y+1) U_\alpha P(y+1) Q^{*y}(x-v)}{\lambda\, U_\alpha^Q P(x-v)} - 1 \right)$$
$$= -\frac{\lambda}{\alpha} \left( U_\alpha^Q P(x)\, r_{1, U_\alpha^Q P}(x) - \sum_{v=0}^{x} Q(v)\, U_\alpha^Q P(x-v)\, r_{1, U_\alpha^Q P}(x-v) \right). \qquad (20)$$
Also, for any $x$, by definition,
$$\big| U_\alpha^Q P(x)\, r_{1, U_\alpha^Q P}(x) \big| \le C_Q (U_\alpha P)^\#(x) + U_\alpha^Q P(x),$$
where, for any distribution $P$, we write $P^\#(y) = P(y+1)(y+1)/\lambda$ for its size-biased version.
Hence for any $N \ge 1$, equations (20) and (18) yield the bound,
$$\sum_{x=N}^{\infty} \Big| \frac{\partial}{\partial \alpha} U_\alpha^Q P(x) \log C_Q \Pi_\lambda(x) \Big| \le \sum_{x=N}^{\infty} \frac{C \lambda x^2}{\alpha} \Big\{ C_Q (U_\alpha P)^\#(x) + U_\alpha^Q P(x) + \sum_{v=0}^{x} Q(v) \big[ C_Q (U_\alpha P)^\#(x-v) + U_\alpha^Q P(x-v) \big] \Big\}$$
$$= \frac{2C}{\alpha} E\big[ (V_\alpha^2 + W_\alpha^2 + X^2 + X V_\alpha + X W_\alpha)\, \mathbb{I}\{V_\alpha \ge N, W_\alpha \ge N, X \ge N\} \big] \le \frac{C'}{\alpha} \big\{ E[V_\alpha^2 \mathbb{I}\{V_\alpha \ge N\}] + E[W_\alpha^2 \mathbb{I}\{W_\alpha \ge N\}] + E[X^2 \mathbb{I}\{X \ge N\}] \big\} \le \frac{C'}{N \alpha} \big\{ E[V_\alpha^3] + E[W_\alpha^3] + E[X^3] \big\},$$
where $C, C' > 0$ are appropriate finite constants, and the random variables $V_\alpha \sim C_Q (U_\alpha P)^\#$, $W_\alpha \sim U_\alpha^Q P$ and $X \sim Q$ are independent. Lemma 2.3 implies that this bound converges to zero uniformly in $\alpha \in (\epsilon, 1)$, as $N \to \infty$. Since $\epsilon > 0$ was arbitrary, this establishes that $E(\alpha)$ is differentiable for all $\alpha \in (0,1)$ and, in fact, that we can differentiate the series (17) term-by-term, to obtain,
$$E'(\alpha) = -\sum_{x=0}^{\infty} \frac{\partial}{\partial \alpha} U_\alpha^Q P(x) \log C_Q \Pi_\lambda(x) \qquad (21)$$
$$= \frac{\lambda}{\alpha} \sum_{x=0}^{\infty} \left( U_\alpha^Q P(x)\, r_{1, U_\alpha^Q P}(x) - \sum_{v=0}^{x} Q(v)\, U_\alpha^Q P(x-v)\, r_{1, U_\alpha^Q P}(x-v) \right) \log C_Q \Pi_\lambda(x)$$
$$= \frac{\lambda}{\alpha} \sum_{x=0}^{\infty} U_\alpha^Q P(x)\, r_{1, U_\alpha^Q P}(x) \left( \log C_Q \Pi_\lambda(x) - \sum_{v=0}^{\infty} Q(v) \log C_Q \Pi_\lambda(x+v) \right),$$
where the second equality follows from (20) above, and the rearrangement leading to the third equality follows by interchanging the order of the (second) double summation and replacing $x$ by $x+v$.

Now we note that, exactly as in [12], the last series above is the covariance between the (zero-mean) function $r_{1, U_\alpha^Q P}(x)$ and the function $\big( \log C_Q \Pi_\lambda(x) - \sum_v Q(v) \log C_Q \Pi_\lambda(x+v) \big)$, under the measure $U_\alpha^Q P$. Since $P$ is ultra log-concave, so is $U_\alpha P$ [12], hence the score function $r_{1, U_\alpha^Q P}(x)$ is decreasing in $x$, by Lemma 2.5.
Also, the log-concavity of $C_Q \Pi_\lambda$ implies that the second function is increasing, and Chebyshev's rearrangement lemma implies that the covariance is less than or equal to zero, proving that $E'(\alpha) \le 0$, as claimed. Finally, the fact that $E(0) \ge E(1)$ is an immediate consequence of the continuity of $E(\alpha)$ on $[0,1]$ and the fact that $E'(\alpha) \le 0$ for all $\alpha \in (0,1)$.

Notice that, for the above proof to work, it is not necessary that $C_Q \Pi_\lambda$ be log-concave; the weaker property that $\big( \log C_Q \Pi_\lambda(x) - \sum_v Q(v) \log C_Q \Pi_\lambda(x+v) \big)$ be increasing is enough.

Proof of Theorem 1.5. As in Proposition 2.6, let $W_\alpha \sim U_\alpha^Q P = C_Q U_\alpha P$, and let $D(P \| Q)$ denote the relative entropy between $P$ and $Q$,
$$D(P \| Q) := \sum_{x \ge 0} P(x) \log \frac{P(x)}{Q(x)}.$$
Then, noting that $W_0 \sim C_Q \Pi_\lambda$ and $W_1 \sim C_Q P$, we have,
$$H(C_Q P) \le H(C_Q P) + D(C_Q P \| C_Q \Pi_\lambda) = -E[\log C_Q \Pi_\lambda(W_1)] \le -E[\log C_Q \Pi_\lambda(W_0)] = H(C_Q \Pi_\lambda),$$
where the first inequality is simply the nonnegativity of relative entropy, and the second inequality is exactly the statement that $E(1) \le E(0)$, proved in Proposition 2.6.

3 Maximum Entropy Property of the Compound Binomial Distribution

Here we prove the maximum entropy result for compound binomial random variables, Theorem 1.4. The proof, to some extent, parallels some of the arguments in [9][21][23], which rely on differentiating the compound-sum probabilities $b_p(x)$ for a given parameter vector $p = (p_1, p_2, \ldots, p_n)$ (recall Definition 1.1 in the Introduction), with respect to an individual $p_i$. Using the representation,
$$C_Q b_p(y) = \sum_{x=0}^{n} b_p(x) Q^{*x}(y), \quad y \ge 0, \qquad (22)$$
differentiating $C_Q b_p(x)$ reduces to differentiating $b_p(x)$, and leads to an expression equivalent to that derived earlier in (20) for the derivative of $C_Q U_\alpha P$ with respect to $\alpha$.
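The derivative formula obtained this way (equation (23) in Lemma 3.1 below) can be sanity-checked against a centred finite difference. A minimal sketch with hypothetical parameter values of our own choosing, supports truncated:

```python
import numpy as np

def bern_sum(p):
    """Mass function b_p of a Bernoulli sum (Definition 1.1)."""
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1 - pi, pi])
    return pmf

def compound(P, q, size=30):
    """Mixture (5), truncated to `size` points."""
    out = np.zeros(size)
    power = np.zeros(size); power[0] = 1.0
    for y in range(len(P)):
        out += P[y] * power
        power = np.convolve(power, q)[:size]
    return out

q = np.array([0.0, 0.7, 0.3])     # Q on {1, 2}
rest = [0.2, 0.1]                 # the fixed entries p_3, ..., p_n
k = 0.5                           # p_1 + p_2, held constant as t varies
p_t = lambda t: [k / 2 + t, k / 2 - t] + rest

t, h, size = 0.1, 1e-6, 30
# left side of (23): centred finite difference of C_Q b_{p_t} in t
lhs = (compound(bern_sum(p_t(t + h)), q, size) -
       compound(bern_sum(p_t(t - h)), q, size)) / (2 * h)
# right side of (23): (-2t) sum_y b_{p~}(y) [Q^{*(y+2)} - 2Q^{*(y+1)} + Q^{*y}]
bt = bern_sum(rest)
powers = [np.zeros(size)]; powers[0][0] = 1.0
for _ in range(len(bt) + 1):
    powers.append(np.convolve(powers[-1], q)[:size])
rhs = (-2 * t) * sum(bt[y] * (powers[y + 2] - 2 * powers[y + 1] + powers[y])
                     for y in range(len(bt)))
assert np.allclose(lhs, rhs, atol=1e-6)
```

Since the total mass of $C_Q b_{p_t}$ is constant in $t$, the derivative also sums to zero, which is a useful extra check.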
Lemma 3.1 Given a parameter vector $p = (p_1, p_2, \ldots, p_n)$, with $n \ge 2$ and each $0 \le p_i \le 1$, let,
$$p_t = \Big( \frac{p_1 + p_2}{2} + t,\ \frac{p_1 + p_2}{2} - t,\ p_3, \ldots, p_n \Big), \quad \text{for } t \in \big[ -(p_1 + p_2)/2,\ (p_1 + p_2)/2 \big].$$
Then,
$$\frac{\partial}{\partial t} C_Q b_{p_t}(x) = (-2t) \sum_{y=0}^{n} b_{\tilde p}(y) \big[ Q^{*(y+2)}(x) - 2 Q^{*(y+1)}(x) + Q^{*y}(x) \big], \qquad (23)$$
where $\tilde p = (p_3, \ldots, p_n)$.

Proof. Note that the sum of the entries of $p_t$ is constant as $t$ varies, and that $p_t = p$ for $t = (p_1 - p_2)/2$, while $p_t = ((p_1 + p_2)/2, (p_1 + p_2)/2, p_3, \ldots, p_n)$ for $t = 0$. Writing $k = p_1 + p_2$, $b_{p_t}$ can be expressed,
$$b_{p_t}(y) = \Big( \frac{k^2}{4} - t^2 \Big) b_{\tilde p}(y-2) + \Big( k \big( 1 - \tfrac{k}{2} \big) + 2 t^2 \Big) b_{\tilde p}(y-1) + \Big( \big( 1 - \tfrac{k}{2} \big)^2 - t^2 \Big) b_{\tilde p}(y),$$
and its derivative with respect to $t$ is,
$$\frac{\partial}{\partial t} b_{p_t}(y) = -2t \big[ b_{\tilde p}(y-2) - 2 b_{\tilde p}(y-1) + b_{\tilde p}(y) \big].$$
The expression (22) for $C_Q b_{p_t}$ shows that it is a finite linear combination of compound-sum probabilities $b_{p_t}(x)$, so we can differentiate inside the sum to obtain,
$$\frac{\partial}{\partial t} C_Q b_{p_t}(x) = \sum_{y=0}^{n} \frac{\partial}{\partial t} b_{p_t}(y)\, Q^{*y}(x) = -2t \sum_{y=0}^{n} \big[ b_{\tilde p}(y-2) - 2 b_{\tilde p}(y-1) + b_{\tilde p}(y) \big] Q^{*y}(x) = -2t \sum_{y=0}^{n-2} b_{\tilde p}(y) \big[ Q^{*(y+2)}(x) - 2 Q^{*(y+1)}(x) + Q^{*y}(x) \big],$$
since $b_{\tilde p}(y) = 0$ for $y \le -1$ and $y \ge n-1$.

Next we state and prove the equivalent of Proposition 2.6 above:

Proposition 3.2 Suppose that the distribution $Q$ on $\mathbb{N}$ and the compound binomial distribution $\mathrm{CBin}(n, \lambda/n, Q)$ are both log-concave; let $p = (p_1, p_2, \ldots, p_n)$ be a given parameter vector with $n \ge 2$, $p_1 + p_2 + \cdots + p_n = \lambda > 0$, and $p_1 \ge p_2$; let $W_t$ be a random variable with distribution $C_Q b_{p_t}$; and define, for all $t \in [0, (p_1 - p_2)/2]$, the function,
$$E(t) := E[-\log C_Q b_{\bar p}(W_t)],$$
where $\bar p$ denotes the parameter vector with all entries equal to $\lambda/n$.
If $Q$ satisfies either of the conditions: (a) $Q$ has finite support; or (b) $Q$ has tails heavy enough so that, for some $\rho, \beta > 0$ and $N_0 \ge 1$, we have $Q(x) \ge \rho / x^\beta$ for all $x \ge N_0$, then $E(t)$ is continuous for all $t \in [0, (p_1 - p_2)/2]$, it is differentiable for $t \in (0, (p_1 - p_2)/2)$, and, moreover, $E'(t) \le 0$ for $t \in (0, (p_1 - p_2)/2)$. In particular, $E(0) \ge E((p_1 - p_2)/2)$.

Proof. The compound distribution $C_Q b_{p_t}$ is defined by the finite sum,
$$C_Q b_{p_t}(x) = \sum_{y=0}^{n} b_{p_t}(y) Q^{*y}(x),$$
and is, therefore, continuous in $t$. First, assume that $Q$ has finite support. Then so does $C_Q b_p$ for any parameter vector $p$, and the continuity and differentiability of $E(t)$ are trivial. In particular, the series defining $E(t)$ is a finite sum, so we can differentiate term-by-term, to obtain,
$$E'(t) = -\sum_{x=0}^{\infty} \frac{\partial}{\partial t} C_Q b_{p_t}(x) \log C_Q b_{\bar p}(x) = 2t \sum_{x=0}^{\infty} \sum_{y=0}^{n-2} b_{\tilde p}(y) \big[ Q^{*(y+2)}(x) - 2 Q^{*(y+1)}(x) + Q^{*y}(x) \big] \log C_Q b_{\bar p}(x) \qquad (24)$$
$$= 2t \sum_{y=0}^{n-2} \sum_{z=0}^{\infty} b_{\tilde p}(y) Q^{*y}(z) \sum_{v,w} Q(v) Q(w) \big[ \log C_Q b_{\bar p}(z+v+w) - \log C_Q b_{\bar p}(z+v) - \log C_Q b_{\bar p}(z+w) + \log C_Q b_{\bar p}(z) \big], \qquad (25)$$
where (24) follows by Lemma 3.1. By assumption, the distribution $C_Q b_{\bar p} = \mathrm{CBin}(n, \lambda/n, Q)$ is log-concave, which implies that, for all $z, v, w$ such that $z+v+w$ is in the support of $\mathrm{CBin}(n, \lambda/n, Q)$,
$$C_Q b_{\bar p}(z)\, C_Q b_{\bar p}(z+v+w) \le C_Q b_{\bar p}(z+v)\, C_Q b_{\bar p}(z+w).$$
Hence the term in square brackets in equation (25) is negative, and the result follows.

Now, suppose condition (b) holds on the tails of $Q$.
First we note that the moments of $W_t$ are all uniformly bounded in $t$: Indeed, for any $\gamma > 0$,

$$\mathbf{E}[W_t^\gamma] = \sum_{x=0}^{\infty} C_Q b_{p_t}(x)\, x^\gamma = \sum_{x=0}^{\infty} \sum_{y=0}^{n} b_{p_t}(y)\, Q^{*y}(x)\, x^\gamma \leq \sum_{y=0}^{n} \sum_{x=0}^{\infty} Q^{*y}(x)\, x^\gamma \leq C_n q_\gamma, \qquad (26)$$

where $C_n$ is a constant depending only on $n$, and $q_\gamma$ is the $\gamma$th moment of $Q$, which is of course finite; recall property (ii) in the beginning of Section 2. For the continuity of $E(t)$, it suffices to show that the series,

$$E(t) := \mathbf{E}\big[-\log C_Q b_{\bar{p}}(W_t)\big] = -\sum_{x=0}^{\infty} C_Q b_{p_t}(x)\, \log C_Q b_{\bar{p}}(x), \qquad (27)$$

converges uniformly. The tail assumption on $Q$ implies that, for all $x \geq N_0$,

$$1 \geq C_Q b_{\bar{p}}(x) = \sum_{y=0}^{n} b_{\bar{p}}(y)\, Q^{*y}(x) \geq \lambda\, (1 - \lambda/n)^{n-1}\, Q(x) \geq \lambda\, (1 - \lambda/n)^{n-1}\, \rho^{x^\beta},$$

so that,

$$0 \leq -\log C_Q b_{\bar{p}}(x) \leq C x^\beta, \qquad (28)$$

for an appropriate constant $C > 0$. Then, for $N \geq N_0$, the tail of the series (27) can be bounded,

$$0 \leq -\sum_{x=N}^{\infty} C_Q b_{p_t}(x)\, \log C_Q b_{\bar{p}}(x) \leq C\, \mathbf{E}\big[W_t^\beta\, \mathbb{I}\{W_t \geq N\}\big] \leq \frac{C}{N}\, \mathbf{E}\big[W_t^{\beta+1}\big] \leq \frac{C}{N}\, C_n\, q_{\beta+1},$$

where the last inequality follows from (26). This obviously converges to zero, uniformly in $t$, therefore $E(t)$ is continuous. For the differentiability of $E(t)$, note that the summands in (27) are continuously differentiable (by Lemma 3.1), and that the series of derivatives converges uniformly in $t$; to see that, for $N \geq N_0$ we apply Lemma 3.1 together with the bound (28) to get,

$$\sum_{x=N}^{\infty} \Big| \frac{\partial}{\partial t} C_Q b_{p_t}(x)\, \log C_Q b_{\bar{p}}(x) \Big| \leq 2t \sum_{x=N}^{\infty} \sum_{y=0}^{n} b_{\tilde{p}}(y) \Big[ Q^{*(y+2)}(x) + 2\, Q^{*(y+1)}(x) + Q^{*y}(x) \Big]\, C x^\beta \leq 2\, C\, t \sum_{y=0}^{n} \sum_{x=N}^{\infty} \Big[ Q^{*(y+2)}(x) + 2\, Q^{*(y+1)}(x) + Q^{*y}(x) \Big]\, x^\beta,$$

which is again easily seen to converge to zero uniformly in $t$ as $N \to \infty$, since $Q$ has finite moments of all orders.
This establishes the differentiability of $E(t)$ and justifies the term-by-term differentiation of the series (27); the rest of the proof that $E'(t) \leq 0$ is the same as in case (a).

Note that, as with Proposition 2.6, the above proof only requires that the compound binomial distribution $\mathrm{CBin}(n, \lambda/n, Q) = C_Q b_{\bar{p}}$ satisfies a property weaker than log-concavity, namely that the function,

$$\log C_Q b_{\bar{p}}(x) - \sum_{v} Q(v)\, \log C_Q b_{\bar{p}}(x+v),$$

be increasing in $x$.

Proof of Theorem 1.4 Assume, without loss of generality, that $n \geq 2$. If $p_1 > p_2$, then Proposition 3.2 says that, $E((p_1-p_2)/2) \leq E(0)$, that is,

$$-\sum_{x=0}^{\infty} C_Q b_p(x)\, \log C_Q b_{\bar{p}}(x) \leq -\sum_{x=0}^{\infty} C_Q b_{p^*}(x)\, \log C_Q b_{\bar{p}}(x),$$

where $p^* = ((p_1+p_2)/2, (p_1+p_2)/2, p_3, \ldots, p_n)$ and $\bar{p} = (\lambda/n, \ldots, \lambda/n)$. Since the expression $-\sum_{x=0}^{\infty} C_Q b_{p_t}(x)\, \log C_Q b_{\bar{p}}(x)$ is invariant under permutations of the elements of the parameter vector, we deduce that it is maximized by $p_t = \bar{p}$. Therefore, using, as before, the nonnegativity of the relative entropy,

$$H(C_Q b_p) \leq H(C_Q b_p) + D(C_Q b_p \,\|\, C_Q b_{\bar{p}}) = -\sum_{x=0}^{\infty} C_Q b_p(x)\, \log C_Q b_{\bar{p}}(x) \leq -\sum_{x=0}^{\infty} C_Q b_{\bar{p}}(x)\, \log C_Q b_{\bar{p}}(x) = H(C_Q b_{\bar{p}}) = H(\mathrm{CBin}(n, \lambda/n, Q)),$$

as claimed.

4 Conditions for Log-Concavity

Theorems 1.5 and 1.4 state that log-concavity is a sufficient condition for compound binomial and compound Poisson distributions to have maximal entropy within a natural class. Here we give examples of when log-concavity holds; if the results in this section can be strengthened (in particular, if Conjecture 4.5 can be proved), then the class of maximum entropy distributions will be accordingly widened.
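Although the paper itself is purely analytical, the maximum entropy property of Theorem 1.4 is easy to probe numerically. The following Python sketch (the helper names and the specific parameter choices are ours, not from the paper) builds the pmf of a compound Bernoulli sum $C_Q b_p$ by convolution, for a log-concave $Q$ supported on $\{1,2\}$, and compares its entropy against the compound binomial with the same parameter sum:

```python
import math

def convolve(a, b):
    # plain convolution of two pmfs given as lists
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def bernoulli_sum_pmf(ps):
    # pmf of b_p: the sum of independent Bernoulli(p_i) variables
    pmf = [1.0]
    for p in ps:
        pmf = convolve(pmf, [1.0 - p, p])
    return pmf

def compound_pmf(count_pmf, q, xmax):
    # pmf of X_1 + ... + X_Y, with Y ~ count_pmf and the X_i i.i.d. ~ q (q[0] = 0)
    out = [0.0] * (xmax + 1)
    power = [1.0] + [0.0] * xmax          # Q^{*0} = point mass at 0
    for w in count_pmf:
        for x in range(xmax + 1):
            out[x] += w * power[x]
        power = convolve(power, q)[: xmax + 1]
    return out

def entropy(pmf):
    return -sum(p * math.log(p) for p in pmf if p > 0)

q = [0.0, 0.9, 0.1]   # Q(1) = 0.9, Q(2) = 0.1: log-concave, support {1, 2}
H_unequal = entropy(compound_pmf(bernoulli_sum_pmf([0.3, 0.2, 0.1]), q, 10))
H_equal   = entropy(compound_pmf(bernoulli_sum_pmf([0.2, 0.2, 0.2]), q, 10))
assert H_unequal <= H_equal + 1e-12   # equal parameters maximise the entropy
```

Here both hypotheses of Theorem 1.4 hold: $Q$ is log-concave, and $\mathrm{CBin}(3, 0.2, Q)$ is log-concave by Lemma 4.1(ii), so the entropy comparison comes out as the theorem predicts.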
Below we show that a compound Bernoulli sum is log-concave if the parameters are sufficiently large, and that compound Bernoulli sums and compound Poisson distributions are log-concave if $Q$ is either supported only on the set $\{1, 2\}$ or is geometric.

Lemma 4.1 Suppose $Q$ is a log-concave distribution on $\mathbb{N}$.

(i) The compound Bernoulli distribution $\mathrm{CBern}(p, Q)$ is log-concave if and only if $p \geq \frac{1}{1 + Q(1)^2/Q(2)}$.

(ii) The compound Bernoulli sum distribution $C_Q b_p$ is log-concave as long as all the elements $p_i$ of the parameter vector $p = (p_1, p_2, \ldots, p_n)$ satisfy $p_i \geq \frac{1}{1 + Q(1)^2/Q(2)}$.

Proof Let $Y$ have distribution $\mathrm{CBern}(p, Q)$. Since $Q$ is log-concave itself, the log-concavity of $\mathrm{CBern}(p, Q)$ is equivalent to the inequality, $\Pr(Y = 1)^2 \geq \Pr(Y = 2)\, \Pr(Y = 0)$, which states that, $(p\, Q(1))^2 \geq (1-p)\, p\, Q(2)$, and this is exactly the assumption of (i). The assertion in (ii) follows from (i), since the sum of independent log-concave random variables is log-concave; see, e.g., [15].

Next we examine conditions under which a compound Poisson measure is log-concave. Our argument is based, in part, on some of the ideas in Johnson and Goldschmidt [13], and also in Wang and Yeh [26], where transformations that preserve log-concavity are studied. Note that, unlike for the Poisson distribution, it is not the case that every compound Poisson distribution $\mathrm{CPo}(\lambda, Q)$ is log-concave. Indeed, for any distribution $P$, considering the difference, $C_Q P(1)^2 - C_Q P(0)\, C_Q P(2)$, shows that a necessary condition for $C_Q P$ to be log-concave is that,

$$\frac{P(1)^2 - P(0)\, P(2)}{P(0)\, P(1)} \geq \frac{Q(2)}{Q(1)^2}. \qquad (29)$$

Taking $P$ to be the $\mathrm{Po}(\lambda)$ distribution, a necessary condition for $\mathrm{CPo}(\lambda, Q)$ to be log-concave is that,

$$\lambda \geq \frac{2\, Q(2)}{Q(1)^2}, \qquad (30)$$

while for $P = b_p$, a necessary condition for the compound Bernoulli sum $C_Q b_p$ to be log-concave is,

$$\sum_i \frac{p_i}{1 - p_i} + \bigg( \sum_i \frac{p_i^2}{(1 - p_i)^2} \bigg) \bigg( \sum_i \frac{p_i}{1 - p_i} \bigg)^{-1} \geq \frac{2\, Q(2)}{Q(1)^2},$$

which, by Jensen's inequality, will hold as long as $\sum_i p_i \geq 2\, Q(2)/Q(1)^2$.

Theorem 4.2 Let $Q$ be a distribution supported on the set $\{1, 2\}$.

(i) The compound Poisson distribution $\mathrm{CPo}(\lambda, Q)$ is log-concave for all $\lambda \geq \frac{2\, Q(2)}{Q(1)^2}$.

(ii) The distribution $C_Q P$ is log-concave for any ultra log-concave distribution $P$ with support on $\{0, 1, \ldots, N\}$ (where $N$ may be infinite), which satisfies, $(x+1)\, P(x+1)/P(x) \geq 2\, Q(2)/Q(1)^2$ for all $x = 0, 1, \ldots, N-1$.

Note that the second condition in (ii) is equivalent to requiring that $N\, P(N)/P(N-1) \geq 2\, Q(2)/Q(1)^2$ if $N$ is finite, or that $\lim_{x \to \infty} (x+1)\, P(x+1)/P(x) \geq 2\, Q(2)/Q(1)^2$ if $N$ is infinite.

Proof Writing $R(y) = y!\, P(y)$, we know that $C_Q P(x) = \sum_{y=0}^{x} R(y)\, \big( Q^{*y}(x)/y! \big)$. Hence, the log-concavity of $C_Q P$ is equivalent to showing that,

$$\sum_r \frac{Q^{*r}(2x)}{r!} \sum_{y+z=r} R(y)\, R(z) \binom{r}{y} \bigg[ \frac{Q^{*y}(x)\, Q^{*z}(x)}{Q^{*r}(2x)} - \frac{Q^{*y}(x+1)\, Q^{*z}(x-1)}{Q^{*r}(2x)} \bigg] \geq 0, \qquad (31)$$

for all $x \geq 2$, since the case of $x = 1$ was dealt with previously by equation (29). In particular, for (i), taking $P = \mathrm{Po}(\lambda)$, it suffices to show that for all $r$ and $x$, the function,

$$g_{r,x}(k) := \sum_{y+z=r} \binom{r}{y} \frac{Q^{*y}(k)\, Q^{*z}(2x - k)}{Q^{*r}(2x)}$$

is unimodal as a function of $k$ (since $g_{r,x}(k)$ is symmetric about $x$). In the general case (ii), writing $Q(2) = p = 1 - Q(1)$, we have,

$$Q^{*y}(x) = \binom{y}{x - y}\, p^{x-y}\, (1 - p)^{2y - x},$$

so that,

$$\binom{r}{y} \frac{Q^{*y}(k)\, Q^{*z}(2x - k)}{Q^{*r}(2x)} = \binom{2x - r}{k - y} \binom{2r - 2x}{2y - k}, \qquad (32)$$

for any $p$.
Now, following [13, Lemma 2.4] and [26, Lemma 2.1], we use summation by parts to show that the inner sum in (31) is positive for each $r$ (except for $r = x$ when $x$ is odd), by case-splitting according to the parity of $r$.

(a) For $r = 2t$, we rewrite the inner sum of equation (31) as,

$$\sum_{s=0}^{t} \big( R(t+s)\, R(t-s) - R(t+s+1)\, R(t-s-1) \big) \times \Bigg( \sum_{y=t-s}^{t+s} \bigg[ \binom{2x-r}{x-y} \binom{2r-2x}{2y-x} - \binom{2x-r}{x+1-y} \binom{2r-2x}{2y-x-1} \bigg] \Bigg),$$

where the first term in the above product is positive by the ultra log-concavity of $P$ (and hence log-concavity of $R$), and the second term is positive by Lemma 4.3 below.

(b) Similarly, for $x \neq r = 2t+1$, we rewrite the inner sum of equation (31) as,

$$\sum_{s=0}^{t} \big( R(t+s+1)\, R(t-s) - R(t+s+2)\, R(t-s-1) \big) \times \Bigg( \sum_{y=t-s}^{t+1+s} \bigg[ \binom{2x-r}{x-y} \binom{2r-2x}{2y-x} - \binom{2x-r}{x+1-y} \binom{2r-2x}{2y-x-1} \bigg] \Bigg),$$

where the first term in the product is positive by the ultra log-concavity of $P$ (and hence log-concavity of $R$) and the second term is positive by Lemma 4.3 below.

(c) Finally, in the case of $x = r = 2t+1$, substituting $k = x$ and $k = x+1$ in (32), combining the resulting expression with (31), and noting that $\binom{2r-2x}{u}$ is 1 if and only if $u = 0$ (and is zero, otherwise), we see that the inner sum becomes,

$$- R(t+1)\, R(t)\, \binom{2t+1}{t},$$

and the summands in (31) reduce to,

$$- p^x\, \frac{R(t)\, R(t+1)}{(t+1)!\, t!}.$$

However, the next term in the outer sum of equation (31), $r = x+1$, gives

$$\frac{p^{x-1} (1-p)^2}{2\, (2t)!} \bigg[ R(t+1)^2 \bigg( 2 \binom{2t}{t} - \binom{2t}{t+1} \bigg) - R(t)\, R(t+2)\, \binom{2t}{t} \bigg] \geq \frac{p^{x-1} (1-p)^2}{2\, (2t)!}\, R(t+1)^2 \bigg[ \binom{2t}{t} - \binom{2t}{t+1} \bigg] = \frac{p^{x-1} (1-p)^2}{2\, (t+1)!\, t!}\, R(t+1)^2.$$

Hence, the sum of the first two terms is positive (and hence the whole sum is positive) if $R(t+1)\, (1-p)^2/(2p) \geq R(t)$. If $P$ is $\mathrm{Poisson}(\lambda)$, this simply reduces to equation (30); otherwise we use the fact that $R(x+1)/R(x)$ is decreasing.
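Theorem 4.2(i), together with the necessary condition (30), can be checked computationally. The following Python sketch (function names and truncation parameters are our own choices, not from the paper) builds a truncated $\mathrm{CPo}(\lambda, Q)$ pmf for $Q(1) = Q(2) = 1/2$, so that the threshold $2\, Q(2)/Q(1)^2$ equals 4, and verifies that log-concavity holds for $\lambda = 5$ but fails for $\lambda = 0.1$:

```python
import math

def convolve(a, b):
    # plain convolution of two pmfs given as lists
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def compound_poisson_pmf(lam, q, xmax, ymax=200):
    # truncated pmf of CPo(lam, Q): sum over y of e^{-lam} lam^y / y! * Q^{*y}(x)
    out = [0.0] * (xmax + 1)
    power = [1.0] + [0.0] * xmax      # Q^{*0} = point mass at 0
    weight = math.exp(-lam)           # Poisson(lam) probability of y = 0
    for y in range(ymax + 1):
        for x in range(xmax + 1):
            out[x] += weight * power[x]
        power = convolve(power, q)[: xmax + 1]
        weight *= lam / (y + 1)
    return out

def is_log_concave(f, floor=1e-12):
    # check f(x)^2 >= f(x-1) f(x+1) on the effective support, with slack for rounding
    support = [x for x, v in enumerate(f) if v > floor]
    return all(f[x] ** 2 >= f[x - 1] * f[x + 1] * (1.0 - 1e-9)
               for x in range(support[0] + 1, support[-1]))

q = [0.0, 0.5, 0.5]   # Q(1) = Q(2) = 1/2, so the threshold 2 Q(2)/Q(1)^2 is 4
assert is_log_concave(compound_poisson_pmf(5.0, q, 40))       # lambda above threshold
assert not is_log_concave(compound_poisson_pmf(0.1, q, 40))   # lambda below threshold
```

In the failing case the defect already appears at $x = 1$, which is exactly the comparison behind equations (29) and (30).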
Lemma 4.3 (a) If $r = 2t$, then for any $0 \leq s \leq t$,

$$\sum_{y=t-s}^{t+s} \bigg[ \binom{2x-r}{x-y} \binom{2r-2x}{2y-x} - \binom{2x-r}{x+1-y} \binom{2r-2x}{2y-x-1} \bigg] \geq 0.$$

(b) If $x \neq r = 2t+1$, then for any $0 \leq s \leq t$,

$$\sum_{y=t-s}^{t+1+s} \bigg[ \binom{2x-r}{x-y} \binom{2r-2x}{2y-x} - \binom{2x-r}{x+1-y} \binom{2r-2x}{2y-x-1} \bigg] \geq 0.$$

Proof The proof is in two stages; first we show that the sum is positive for $s = t$, then we show that there exists some $S$ such that, as $s$ increases, the increments are positive for $s \leq S$ and negative for $s > S$. The result then follows, as in [13] or [26].

For both (a) and (b), note that for $s = t$, equation (32) implies that the sum is the difference between the coefficients of $T^x$ and $T^{x+1}$ in $f_{r,x}(T) = (1 + T^2)^{2x-r} (1 + T)^{2r-2x}$. Since $f_{r,x}(T)$ has degree $2x$ and has coefficients which are symmetric about $T^x$, it is enough to show that the coefficients form a unimodal sequence. Now, $(1 + T^2)^{2x-r} (1 + T)$ has coefficients which do form a unimodal sequence. Statement S1 of Keilson and Gerber [16] states that any binomial distribution is strongly unimodal, which means that it preserves unimodality on convolution. This means that $(1 + T^2)^{2x-r} (1 + T)^{2r-2x}$ is unimodal if $r - x \geq 1$, and we need only check the case $r = x$, when $f_{r,x}(T) = (1 + T^2)^r$. Note that if $r = 2t$ is even, the difference between the coefficients of $T^x$ and $T^{x+1}$ is $\binom{2t}{t}$, which is positive.

In part (a), the increments are equal to $\binom{2x-2t}{x-t+s} \binom{4t-2x}{2t-2s-x}$ multiplied by the expression,

$$2 - \frac{(x-t-s)(2t-2s-x)}{(x+1-t+s)(2t+2s-x+1)} - \frac{(x-t+s)(2t+2s-x)}{(x+1-t-s)(2t-2s-x+1)},$$

which is positive for $s$ small and negative for $s$ large, since placing the term in brackets over a common denominator, the numerator is of the form $(a - b s^2)$.
Similarly, in part (b), the increments equal $\binom{2x-2t-1}{x-t+s} \binom{4t+2-2x}{2t-2s-x}$ times the expression,

$$2 - \frac{(x-t-s-1)(2t-2s-x)}{(x+1-t+s)(2t+2s-x+3)} - \frac{(x-t+s)(2t+2+2s-x)}{(x-t-s)(2t+1-2s-x)},$$

which is again positive for $s$ small and negative for $s$ large.

Theorem 4.4 Let $Q$ be a geometric distribution on $\mathbb{N}$. Then $C_Q P$ is log-concave for any distribution $P$ which is log-concave and satisfies the condition (29).

Proof If $Q$ is geometric with mean $1/\alpha$, then,

$$Q^{*y}(x) = \alpha^y (1 - \alpha)^{x-y} \binom{x-1}{y-1},$$

which implies that,

$$C_Q P(x) = \sum_{y=0}^{x} P(y)\, \alpha^y (1 - \alpha)^{x-y} \binom{x-1}{y-1}.$$

Condition (29) ensures that $C_Q P(1)^2 - C_Q P(0)\, C_Q P(2) \geq 0$, so, taking $z = y - 1$, we need only prove that the sequence,

$$C(x) := \frac{C_Q P(x+1)}{(1-\alpha)^{x+1}} = \sum_{z=0}^{x} P(z+1) \Big( \frac{\alpha}{1-\alpha} \Big)^{z+1} \binom{x}{z}$$

is log-concave. However, this follows immediately from [15, Theorem 7.3], which proves that if $\{a_i\}$ is a log-concave sequence, then so is $\{b_i\}$, defined by $b_i = \sum_{j=0}^{i} \binom{i}{j} a_j$.

Finally, based on the discussion in the beginning of this section, the above results, and some calculations of the quantities, $C_Q \Pi_\lambda(x)^2 - C_Q \Pi_\lambda(x-1)\, C_Q \Pi_\lambda(x+1)$ for small $x$, we make the following conjecture:

Conjecture 4.5 The compound Poisson measure $\mathrm{CPo}(\lambda, Q)$ is log-concave, as long as $Q$ is log-concave and $\lambda\, Q(1)^2 \geq 2\, Q(2)$.

The condition $\lambda\, Q(1)^2 \geq 2\, Q(2)$ is, of course, necessary; recall the argument leading to equation (30) above. In closing, we list some known results that are related to this conjecture and may be useful in proving (or disproving) it:

1. Theorem 2.3 of Steutel and van Harn [24] shows that, if $\{i\, Q(i)\}$ is a decreasing sequence, then $\mathrm{CPo}(\lambda, Q)$ is a unimodal distribution (recall that log-concavity implies unimodality).
Interestingly, the same condition provides a dichotomy of results in compound Poisson approximation bounds as developed in [2]: If $\{i\, Q(i)\}$ is decreasing, the bounds are of the same form and order as in the simple Poisson case, while if it is not, the bounds are much larger.

2. Theorem 3.2 of Cai and Willmot [5] shows that if $\{Q(i)\}$ is decreasing, then the distribution function of the compound Poisson distribution $\mathrm{CPo}(\lambda, Q)$ is log-concave.

3. A conjecture similar to Conjecture 4.5 is that, for log-concave $Q$, if $\mathrm{CPo}(\lambda, Q)$ is log-concave, then so is $\mathrm{CPo}(\mu, Q)$, for all $\mu \geq \lambda$. Theorem 4.9 of Keilson and Sumita [17] proves the related result that, if $Q$ is log-concave, then, for any $n$, the ratio,

$$\frac{C_Q \Pi_\lambda(n)}{C_Q \Pi_\lambda(n+1)}$$

is decreasing in $\lambda$.

Acknowledgement

We wish to thank Z. Chi for sharing his (unpublished) compound binomial counter-example mentioned in equation (8) in the introduction.

Appendix

Proof of Lemma 2.3 Recall that, as stated in properties (ii) and (iii) in the beginning of Section 2, $Q$ has finite moments of all orders, and that the $n$th falling factorial moment of any ultra log-concave random variable $Y$ with distribution $R$ on $\mathbb{Z}_+$ is bounded above by $(\mathbf{E}(Y))^n$. Now for an arbitrary ultra log-concave distribution $R$, define random variables $Y \sim R$ and $Z \sim C_Q R$. If $r_1, r_2, r_3$ denote the first three moments of $Y \sim R$, then,

$$\mathbf{E}(Z^3) = q_3 r_1 + 3\, q_1 q_2\, \mathbf{E}[(Y)_2] + q_1^3\, \mathbf{E}[(Y)_3] \leq q_3 r_1 + 3\, q_1 q_2\, r_1^2 + q_1^3\, r_1^3. \qquad (33)$$

Since the map $U_\alpha$ preserves ultra log-concavity [12], if $P$ is ultra log-concave then so is $R = U_\alpha P$, so that (33) gives the required bound for the third moment of $W_\alpha$, upon noting that the mean of the distribution $U_\alpha P$ is equal to $\lambda$.
Similarly, size-biasing preserves ultra log-concavity; that is, if $R$ is ultra log-concave, then so is $R^\#$, since

$$\frac{(x+1)\, R^\#(x+1)}{R^\#(x)} = \frac{(x+2)(x+1)\, R(x+2)}{(x+1)\, R(x+1)} = \frac{(x+2)\, R(x+2)}{R(x+1)}$$

is also decreasing. Hence, $R' = (U_\alpha P)^\#$ is ultra log-concave, and (33) applies in this case as well. In particular, noting that the mean of $Y' \sim R' = (U_\alpha P)^\# = R^\#$ can be bounded in terms of the mean of $Y \sim R$ as,

$$\mathbf{E}(Y') = \sum_x x\, \frac{(x+1)\, U_\alpha P(x+1)}{\lambda} = \frac{\mathbf{E}[(Y)_2]}{\mathbf{E}(Y)} \leq \frac{\lambda^2}{\lambda} = \lambda,$$

the bound (33) yields the required bound for the third moment of $V_\alpha$.

References

[1] S. Artstein, K. M. Ball, F. Barthe, and A. Naor. Solution of Shannon's problem on the monotonicity of entropy. J. Amer. Math. Soc., 17(4):975–982 (electronic), 2004.

[2] A. Barbour, L. Chen, and W.-L. Loh. Compound Poisson approximation for nonnegative random variables via Stein's method. Ann. Probab., 20(4):1843–1866, 1992.

[3] A. Barbour, O. T. Johnson, I. Kontoyiannis, and M. Madiman. Manuscript in preparation, 2008.

[4] A. R. Barron. Entropy and the Central Limit Theorem. Ann. Probab., 14(1):336–342, 1986.

[5] J. Cai and G. E. Willmot. Monotonicity and aging properties of random sums. Statist. Probab. Lett., 73(4):381–392, 2005.

[6] Z. Chi. Personal communication, 2006.

[7] T. M. Cover and Z. Zhang. On the maximum entropy of the sum of two dependent random variables. IEEE Trans. Information Theory, 40(4):1244–1246, 1994.

[8] B. V. Gnedenko and V. Y. Korolev. Random Summation: Limit Theorems and Applications. CRC Press, Boca Raton, Florida, 1996.

[9] P. Harremoës. Binomial and Poisson distributions as maximum entropy distributions. IEEE Trans. Information Theory, 47(5):2039–2041, 2001.

[10] P. Harremoës, O. T. Johnson, and I. Kontoyiannis. Thinning and the Law of Small Numbers.
In Proceedings of ISIT 2007, 24th–29th June 2007, Nice, pages 1491–1495, 2007.

[11] O. T. Johnson. Information Theory and the Central Limit Theorem. Imperial College Press, London, 2004.

[12] O. T. Johnson. Log-concavity and the maximum entropy property of the Poisson distribution. Stoch. Proc. Appl., 117(6):791–802, 2007.

[13] O. T. Johnson and C. A. Goldschmidt. Preservation of log-concavity on summation. ESAIM Probability and Statistics, 10:206–215, 2006.

[14] I. Johnstone and B. MacGibbon. Une mesure d'information caractérisant la loi de Poisson. In Séminaire de Probabilités, XXI, pages 563–573. Springer, Berlin, 1987.

[15] S. Karlin. Total Positivity. Vol. I. Stanford University Press, Stanford, Calif., 1968.

[16] J. Keilson and H. Gerber. Some results for discrete unimodality. Journal of the American Statistical Association, 66(334):386–389, 1971.

[17] J. Keilson and U. Sumita. Uniform stochastic ordering and related inequalities. Canad. J. Statist., 10(3):181–198, 1982.

[18] I. Kontoyiannis, P. Harremoës, and O. T. Johnson. Entropy and the law of small numbers. IEEE Trans. Inform. Theory, 51(2):466–472, 2005.

[19] Y. Linnik. An information-theoretic proof of the Central Limit Theorem with the Lindeberg Condition. Theory Probab. Appl., 4:288–299, 1959.

[20] M. Madiman and A. Barron. Generalized entropy power inequalities and monotonicity properties of information. IEEE Trans. Inform. Theory, 53(7):2317–2329, 2007.

[21] P. Mateev. The entropy of the multinomial distribution. Teor. Verojatnost. i Primenen., 23(1):196–198, 1978.

[22] R. Pemantle. Towards a theory of negative dependence. J. Math. Phys., 41(3):1371–1390, 2000.

[23] L. A. Shepp and I. Olkin. Entropy of the sum of independent Bernoulli random variables and of the multinomial distribution.
In Contributions to Probability, pages 201–206. Academic Press, New York, 1981.

[24] F. W. Steutel and K. van Harn. Discrete analogues of self-decomposability and stability. Ann. Probab., 7(5):893–899, 1979.

[25] A. Tulino and S. Verdú. Monotonic decrease of the non-Gaussianness of the sum of independent random variables: a simple proof. IEEE Trans. Inform. Theory, 52(9):4295–4297, 2006.

[26] Y. Wang and Y.-N. Yeh. Log-concavity and LC-positivity. J. Combin. Theory Ser. A, 114(2):195–210, 2007.