Dual $\phi$-divergences estimation in normal models

Mohamed Cherfi

Abstract. A class of robust estimators obtained from the dual representation of $\phi$-divergences is studied empirically for the normal location model. Members of this class of estimators are compared, and they are found to be efficient at the true model and to offer an attractive alternative to maximum likelihood in terms of robustness.

Key words and phrases: Minimum divergence estimators; Efficiency; Robustness; M-estimators; Influence function.

1. Introduction

Divergences between probability distributions are widely used in theoretical and applied statistical inference; they are also of key importance in data processing problems, see Basseville (2010). $\phi$-divergence modeling has proved to be a flexible and powerful statistical modeling framework in a variety of applied and theoretical contexts, see Liese and Vajda (2006, 1987), Pardo (2006), Broniatowski and Keziou (2009) and the recent monograph by Basu et al. (2011).

The recently introduced minimum divergence estimation method, based on a dual representation of the divergence between probability measures, is an appealing estimation method: it avoids the use of nonparametric density estimation and the complications related to bandwidth selection. The estimators are defined in a unified way for both continuous and discrete models. They do not require any prior smoothing and include the classical maximum likelihood estimators as a benchmark. These estimators, called "dual $\phi$-divergence estimators" (D$\phi$DE's), were shown by Keziou (2003) and Broniatowski and Keziou (2009) to be, under suitable conditions, consistent, asymptotically normal and asymptotically fully efficient at the true model.
Applications of the dual representation of $\phi$-divergences have been considered by many authors. We cite, among others, Keziou and Leoni-Aubin (2008) for semi-parametric two-sample density ratio models; Toma and Leoni-Aubin (2010) for robust tests based on saddlepoint approximations; Bouzebda and Cherfi (2011) for bootstrapped $\phi$-divergence estimates; and Cherfi (2011) for the extension of dual $\phi$-divergence estimators to right-censored data. For estimation and tests in copula models we refer to Bouzebda and Keziou (2010) and the references therein.

The $\phi$-divergence estimators are motivated by the fact that a suitable choice of the divergence may lead to an estimate more robust than the MLE, see Jiménez and Shao (2001). Toma and Broniatowski (2010) studied the robustness of the D$\phi$DE's through the influence function approach. However, the fact that the D$\phi$DE's have unbounded influence functions in the normal location model raises some concern that these estimators are non-robust. In this article we show that the $\epsilon$-influence functions are more suitable for capturing the robustness properties of the D$\phi$DE's. We also give some insight into the choice of the escort parameter. Simulation results indicate that the D$\phi$DE's attain the dual goal of robustness and efficiency: they are competitive with the Huber M-estimator without loss of efficiency at the true model.

The rest of the paper is organized as follows. In Section 2 we provide background material concerning the dual $\phi$-divergence estimators. Section 3 is devoted to the choice of the escort parameter. Section 4 deals with robustness. In Section 5 we illustrate the performance of the method on a real data example. A simulation study described in Section 6 investigates the asymptotic properties of the estimators. Section 7 contains some concluding remarks.

2. Background and estimators definition

The class of dual divergence estimators was recently introduced by Keziou (2003) and Broniatowski and Keziou (2009). In the following, we briefly recall their context and definition. Recall that the $\phi$-divergence between a bounded signed measure $Q$ and a probability $P$ on $\mathcal{D}$, when $Q$ is absolutely continuous with respect to $P$, is defined by

$$D_\phi(Q, P) := \int_{\mathcal{D}} \phi\left(\frac{dQ}{dP}(x)\right) dP(x),$$

where $\phi$ is a convex function from $]-\infty, \infty[$ to $[0, \infty]$ with $\phi(1) = 0$.

Well-known examples of divergences are the Kullback-Leibler divergence, associated to the function $\phi(x) = x \log x - x + 1$; the modified Kullback-Leibler divergence, for $\phi(x) = -\log x + x - 1$; the $\chi^2$-divergence, with $\phi(x) = \frac{1}{2}(x-1)^2$; and the Hellinger distance, given by $\phi(x) = 2(\sqrt{x} - 1)^2$. All these divergences belong to the class of the so-called "power divergences" introduced in Cressie and Read (1984) (see also Liese and Vajda (1987), Chapter 2). They are defined through the class of convex functions

$$x \in\, ]0, +\infty[\; \mapsto \phi_\gamma(x) := \frac{x^\gamma - \gamma x + \gamma - 1}{\gamma(\gamma - 1)} \tag{1}$$

if $\gamma \in \mathbb{R} \setminus \{0, 1\}$, $\phi_0(x) := -\log x + x - 1$ and $\phi_1(x) := x \log x - x + 1$. (For all $\gamma \in \mathbb{R}$, we define $\phi_\gamma(0) := \lim_{x \downarrow 0} \phi_\gamma(x)$.) So, the $KL$-divergence is associated to $\phi_1$, the $KL_m$ to $\phi_0$, the $\chi^2$ to $\phi_2$ and the Hellinger distance to $\phi_{1/2}$. We refer to Liese and Vajda (1987) for an overview of the origin of the concept of divergences in statistics. The reader interested in other classes of $\phi$-divergences can refer to the recent paper of Kůs et al. (2008), which proposes a simple method for constructing new families of $\phi$-divergences.

Let $X_1, \ldots, X_n$ be an i.i.d. sample with probability measure $P_{\theta_0}$. Consider the problem of estimating the population parameter of interest $\theta_0$, when the underlying identifiable model is given by $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ with $\Theta$ a subset of $\mathbb{R}^d$.
We denote by $\lambda$ a dominating measure for $\mathcal{P}$, and the resulting density of any $P_\theta$ is denoted $p_\theta$. Let $\phi$ be a strictly convex function of class $C^2$ satisfying

$$\int \left| \phi'\left(\frac{p_\theta(x)}{p_\alpha(x)}\right) \right| dP_\theta(x) < \infty. \tag{2}$$

By Lemma 3.2 in Broniatowski and Keziou (2006), if the function $\phi$ satisfies the following condition: there exists $0 < \eta < 1$ such that for all $c$ in $[1 - \eta, 1 + \eta]$, we can find numbers $c_1, c_2, c_3$ such that

$$\phi(cx) \le c_1 \phi(x) + c_2 |x| + c_3, \quad \text{for all real } x, \tag{3}$$

then assumption (2) is satisfied whenever $D_\phi(P_\theta, P_\alpha)$ is finite. From now on, we suppose that there exists a neighborhood $\mathcal{U}$ of $\theta_0$ for which $D_\phi(P_\theta, P_\alpha) < \infty$ for all $\theta$ and $\alpha$ in $\mathcal{U}$. Note that all the real convex functions $\phi_\gamma$ pertaining to the class of power divergences defined in (1) satisfy condition (3).

Under the above conditions, the $\phi$-divergence

$$D_\phi(P_\theta, P_{\theta_0}) = \int \phi\left(\frac{p_\theta}{p_{\theta_0}}\right) dP_{\theta_0}$$

can be represented in the following form:

$$D_\phi(P_\theta, P_{\theta_0}) = \sup_{\alpha \in \mathcal{U}} \int h(\theta, \alpha)\, dP_{\theta_0}, \tag{4}$$

where $h(\theta, \alpha) : x \mapsto h(\theta, \alpha, x)$ and

$$h(\theta, \alpha, x) := \int \phi'\left(\frac{p_\theta}{p_\alpha}\right) dP_\theta - \left[ \frac{p_\theta(x)}{p_\alpha(x)}\, \phi'\left(\frac{p_\theta(x)}{p_\alpha(x)}\right) - \phi\left(\frac{p_\theta(x)}{p_\alpha(x)}\right) \right]. \tag{5}$$

Since the supremum in (4) is unique and is attained at $\alpha = \theta_0$, independently of the value of $\theta$, we define the class of estimators of $\theta_0$ by

$$\hat\alpha_\phi(\theta) := \arg\sup_{\alpha \in \mathcal{U}} \int h(\theta, \alpha)\, dP_n, \quad \theta \in \Theta, \tag{6}$$

where $h(\theta, \alpha)$ is the function defined in (5) and $P_n$ is the empirical measure of the sample. This class is called "dual $\phi$-divergence estimators" (D$\phi$DE's). The corresponding estimating equation for the unknown parameter is then given by

$$\int \frac{\partial}{\partial \alpha} h(\theta, \alpha)\, dP_n = 0. \tag{7}$$

Note that the maximum likelihood estimate belongs to the class of estimates (6). Indeed, it is obtained when $\phi(x) = -\log x + x - 1$, that is, as the dual modified $KL$-divergence estimate. Observe that

$$\int h(\theta, \alpha)\, dP_n = -\int \log\left(\frac{p_\theta}{p_\alpha}\right) dP_n.$$
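Hence, keeping in mind definition (6), we get

$$\hat\alpha_{KL_m}(\theta) = \arg\sup_{\alpha \in \Theta} \left\{ -\int \log\left(\frac{p_\theta}{p_\alpha}\right) dP_n \right\} = \arg\sup_{\alpha \in \Theta} \int \log(p_\alpha)\, dP_n = \text{MLE},$$

independently of $\theta$.

Formula (4) defines a family of M-estimators indexed by some instrumental value of the parameter $\theta$ and by the function $\phi$ defining the divergence. In the sequel we call $\theta$ the "escort parameter". The choice of $\theta$ appears as a major feature of the estimation procedure, see Section 3 below.

Recall that an M-estimator of $\psi$-type is the solution of the vector equation

$$\int \psi(x; \alpha)\, dP_n = 0, \tag{8}$$

where the elements of $\psi(x; \alpha)$ represent the partial derivatives of $h(\theta, \alpha, x)$ with respect to the components of $\alpha$. For more details about M-estimators we refer to Huber (1981), Hampel et al. (1986), Maronna et al. (2006) and the references therein.

We now specialize the dual representation of $\phi$-divergences (4) to the present setting. Consider the normal model with density

$$p_{\theta,\sigma}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2}\left(\frac{x - \theta}{\sigma}\right)^2 \right\}, \tag{9}$$

and the power divergences family (1). Observe that, for $\gamma \in \mathbb{R} \setminus \{0, 1\}$,

$$\frac{1}{\gamma - 1} \int \left(\frac{p_{\theta,\sigma}(x)}{p_{\alpha,\tilde\sigma}(x)}\right)^{\gamma - 1} p_{\theta,\sigma}(x)\, dx = \frac{1}{\gamma - 1}\, \frac{\tilde\sigma^{\gamma}\, \sigma^{-(\gamma - 1)}}{\sqrt{\gamma\tilde\sigma^2 - (\gamma - 1)\sigma^2}} \exp\left\{ \frac{\gamma(\gamma - 1)(\theta - \alpha)^2}{2\left(\gamma\tilde\sigma^2 - (\gamma - 1)\sigma^2\right)} \right\}.$$

Hence,

$$D_\gamma(P_{\theta,\sigma}, P_n) = \sup_{\alpha, \tilde\sigma} \left\{ \frac{1}{\gamma - 1}\, \frac{\tilde\sigma^{\gamma}\, \sigma^{-(\gamma - 1)}}{\sqrt{\gamma\tilde\sigma^2 - (\gamma - 1)\sigma^2}} \exp\left\{ \frac{\gamma(\gamma - 1)(\theta - \alpha)^2}{2\left(\gamma\tilde\sigma^2 - (\gamma - 1)\sigma^2\right)} \right\} - \frac{1}{\gamma n} \sum_{i=1}^{n} \left(\frac{\tilde\sigma}{\sigma}\right)^{\gamma} \exp\left\{ -\frac{\gamma}{2}\left[ \left(\frac{X_i - \theta}{\sigma}\right)^2 - \left(\frac{X_i - \alpha}{\tilde\sigma}\right)^2 \right] \right\} - \frac{1}{\gamma(\gamma - 1)} \right\}.$$

For $\gamma = 0$,

$$D_{KL_m}(P_{\theta,\sigma}, P_n) = \sup_{\alpha, \tilde\sigma} \left\{ \frac{1}{2n} \sum_{i=1}^{n} \left[ \left(\frac{X_i - \theta}{\sigma}\right)^2 - \left(\frac{X_i - \alpha}{\tilde\sigma}\right)^2 \right] - \log\left(\frac{\tilde\sigma}{\sigma}\right) \right\}.$$

For $\gamma = 1$, the integral term in (5) is $\int \log(p_{\theta,\sigma}/p_{\alpha,\tilde\sigma})\, dP_{\theta,\sigma}$, the Kullback-Leibler divergence between the two normal laws, so that

$$D_{KL}(P_{\theta,\sigma}, P_n) = \sup_{\alpha, \tilde\sigma} \left\{ \frac{1}{2}\left[ \left(\frac{\sigma}{\tilde\sigma}\right)^2 + \left(\frac{\theta - \alpha}{\tilde\sigma}\right)^2 - 1 \right] + \log\left(\frac{\tilde\sigma}{\sigma}\right) - \frac{1}{n} \sum_{i=1}^{n} \left(\frac{\tilde\sigma}{\sigma}\right) \exp\left\{ -\frac{1}{2}\left[ \left(\frac{X_i - \theta}{\sigma}\right)^2 - \left(\frac{X_i - \alpha}{\tilde\sigma}\right)^2 \right] \right\} + 1 \right\}.$$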
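As a sanity check on the closed-form Gaussian integral used for $\gamma \in \mathbb{R} \setminus \{0, 1\}$, the following minimal sketch compares it with a plain trapezoidal quadrature. It is only an illustration under stated assumptions: the function names, the integration window and the step count are hypothetical choices made here, not part of the original analysis (which was carried out in R), and the comparison is meaningful only when $\gamma\tilde\sigma^2 - (\gamma - 1)\sigma^2 > 0$.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log-density of N(mean, sd^2) evaluated at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def closed_form(gamma, theta, sigma, alpha, sigma_t):
    """Closed-form value of (1/(gamma-1)) * int (p_theta/p_alpha)^(gamma-1) p_theta dx."""
    s = gamma * sigma_t ** 2 - (gamma - 1.0) * sigma ** 2
    if s <= 0.0:
        # The Gaussian integral only converges when this quantity is positive.
        raise ValueError("need gamma*sigma_t^2 - (gamma-1)*sigma^2 > 0")
    scale = sigma_t ** gamma * sigma ** (-(gamma - 1.0)) / math.sqrt(s)
    expo = gamma * (gamma - 1.0) * (theta - alpha) ** 2 / (2.0 * s)
    return scale * math.exp(expo) / (gamma - 1.0)


def by_quadrature(gamma, theta, sigma, alpha, sigma_t, half_width=30.0, steps=20000):
    """Same quantity by the composite trapezoidal rule on [theta - hw, theta + hw].

    The window and step count are illustrative assumptions, adequate for
    moderate parameter values only.
    """
    lo = theta - half_width
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        # Integrand p_theta^gamma * p_alpha^(-(gamma-1)), computed on the log scale.
        log_f = (gamma * log_normal_pdf(x, theta, sigma)
                 - (gamma - 1.0) * log_normal_pdf(x, alpha, sigma_t))
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * math.exp(log_f)
    return h * total / (gamma - 1.0)


if __name__ == "__main__":
    args = (2.0, 0.3, 1.2, -0.4, 1.0)  # gamma, theta, sigma, alpha, sigma_tilde
    print(closed_form(*args), by_quadrature(*args))
```

Working on the log scale avoids dividing by a density that underflows in the tails; the two values should agree up to quadrature error for parameter values where the integral converges.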
Consider now the normal family $P_\theta \equiv N(\theta, 1)$, with location parameter $\theta$ and scale $\sigma = 1$. It follows that, for $\gamma \in \mathbb{R} \setminus \{0, 1\}$,

$$D_\gamma(P_\theta, P_n) := \sup_\alpha \int h(\theta, \alpha)\, dP_n = \sup_\alpha \left\{ \frac{1}{\gamma - 1} \exp\left\{ \frac{\gamma(\gamma - 1)(\theta - \alpha)^2}{2} \right\} - \frac{1}{\gamma n} \sum_{i=1}^{n} \exp\left\{ -\frac{\gamma}{2}(\theta - \alpha)(\theta + \alpha - 2X_i) \right\} - \frac{1}{\gamma(\gamma - 1)} \right\}.$$

For $\gamma = 0$,

$$D_{KL_m}(P_\theta, P_n) := \sup_\alpha \int h(\theta, \alpha)\, dP_n = \sup_\alpha \left\{ \frac{1}{2n} \sum_{i=1}^{n} (\theta - \alpha)(\theta + \alpha - 2X_i) \right\}.$$

For $\gamma = 1$, the integral term in (5) equals the Kullback-Leibler divergence $\frac{1}{2}(\theta - \alpha)^2$ between $P_\theta$ and $P_\alpha$, so that

$$D_{KL}(P_\theta, P_n) := \sup_\alpha \int h(\theta, \alpha)\, dP_n = \sup_\alpha \left\{ \frac{1}{2}(\theta - \alpha)^2 - \frac{1}{n} \sum_{i=1}^{n} \exp\left\{ -\frac{1}{2}(\theta - \alpha)(\theta + \alpha - 2X_i) \right\} + 1 \right\}.$$

We remark that the above criteria have closed-form expressions, so the optimizations are computationally feasible and can be performed by any standard nonlinear optimization code.

3. On the choice of the escort parameter

The very particular choice of the escort parameter $\theta = \theta_0$ yields the same limit properties as the MLE. In this case, the D$\phi$DE $\hat\alpha_\phi(\theta_0)$ has a variance which coincides with that of the MLE, see for instance Keziou (2003, Theorem 2.2, (1)(b)). This result is of some relevance, since it leaves open the choice of the divergence while keeping good asymptotic properties. For data generated from the distribution $N(0, 1)$, Figure 1 shows that the global maximum of the empirical criterion $P_n h(\hat\theta_n, \alpha)$ is attained at zero, independently of the value of the escort parameter $\hat\theta_n$ (the sample mean $\bar X$ in Figure 1 (a) and the median in Figure 1 (b)), for all the considered divergences.

Unlike the case of data without contamination, the choice of the escort parameter is crucial for the estimation method in the presence of outliers. We plot in Figure 2 the empirical criterion $P_n h(\hat\theta_n, \alpha)$, where the data are generated from the distribution $(1 - \epsilon) N(\theta_0, 1) + \epsilon\,\delta_{10}$, with $\epsilon = 0.1$ and $\theta_0 = 0$.
Figure 2 (a) illustrates the empirical criterion under contamination when we take the empirical mean, $\hat\theta_n = \bar X$, as the value of the escort parameter $\theta$; it shows how the global maximum of the empirical criterion $P_n h(\hat\theta_n, \alpha)$ shifts from zero towards the contamination point. In Figure 2 (b), with the median as the escort parameter value, the position of the global maximum remains close to $\alpha = 0$ for the Hellinger ($\gamma = 0.5$), $\chi^2$ ($\gamma = 2$) and $KL$ ($\gamma = 1$) divergences, while the criterion associated to the $KL_m$-divergence ($\gamma = 0$, whose maximizer is the MLE) is still affected by the presence of outliers.

[Figure 1. Criterion for the normal location model. Axes: $\alpha$ versus $P_n h(\hat\theta_n, \alpha)$; curves for $\gamma = 0$ (MLE), $0.5$, $1$, $2$. Panel (a): $\hat\theta_n = -0.05654117$; panel (b): $\hat\theta_n = -0.1898001$.]

[Figure 2. Criterion for the normal location model under contamination. Axes: $\alpha$ versus $P_n h(\hat\theta_n, \alpha)$; curves for $\gamma = 0$ (MLE), $0.5$, $1$, $2$. Panel (a): $\hat\theta_n = 0.9251538$; panel (b): $\hat\theta_n = 0.163952$.]

In practice, the consequence is that the escort parameter should be chosen as a robust estimator of $\theta_0$, say $\hat\theta_n$.
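The effect of the escort parameter can be reproduced with a short numerical sketch of the empirical criterion for the $N(\theta, 1)$ location model. This is not the original R code: the function names, the fixed number of outliers (instead of a random mixture), the grid search over the data range followed by a golden-section refinement, and the seed are all assumptions made for illustration. The $\gamma = 1$ branch uses the sign that follows from definition (5), where the integral term is the Kullback-Leibler divergence $\frac{1}{2}(\theta - \alpha)^2$.

```python
import math
import random
import statistics


def criterion(alpha, theta, xs, gamma):
    """Empirical dual criterion P_n h(theta, alpha) for the N(., 1) location model."""
    d = theta - alpha
    # log of p_theta(x) / p_alpha(x) for unit-variance normal densities
    log_ratios = [-0.5 * d * (theta + alpha - 2.0 * x) for x in xs]
    if gamma == 0:  # modified Kullback-Leibler: h = -log(p_theta / p_alpha)
        return -sum(log_ratios) / len(xs)
    mean_pow = sum(math.exp(gamma * lr) for lr in log_ratios) / len(xs)
    if gamma == 1:  # Kullback-Leibler: integral term is (theta - alpha)^2 / 2
        return 0.5 * d * d - mean_pow + 1.0
    return (math.exp(gamma * (gamma - 1.0) * d * d / 2.0) / (gamma - 1.0)
            - mean_pow / gamma - 1.0 / (gamma * (gamma - 1.0)))


def dual_estimate(xs, theta, gamma, grid_size=2001, tol=1e-8):
    """Maximise the criterion over alpha (assumed to lie within the data range)."""
    lo, hi = min(xs), max(xs)
    step = (hi - lo) / (grid_size - 1)

    def f(a):
        return criterion(a, theta, xs, gamma)

    # Coarse global search, since the criterion may have several local maxima.
    best = max((lo + i * step for i in range(grid_size)), key=f)
    a, b = max(lo, best - step), min(hi, best + step)
    # Golden-section refinement around the best grid point.
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > tol:
        c, e = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if f(c) > f(e):
            b = e
        else:
            a = c
    return 0.5 * (a + b)


def contaminated_sample(n=100, eps=0.1, theta0=0.0, outlier=10.0, seed=1):
    """n*(1-eps) draws from N(theta0, 1) plus n*eps points placed at `outlier`."""
    rng = random.Random(seed)
    n_out = round(eps * n)
    return [rng.gauss(theta0, 1.0) for _ in range(n - n_out)] + [outlier] * n_out


if __name__ == "__main__":
    xs = contaminated_sample()
    for name, escort in (("mean", statistics.fmean(xs)), ("median", statistics.median(xs))):
        for gamma in (0.0, 0.5, 1.0, 2.0):
            print(name, gamma, round(dual_estimate(xs, escort, gamma), 4))
```

For $\gamma = 0$ the maximizer is the sample mean whatever the escort value; with the median as escort, the estimates for $\gamma = 0.5, 1, 2$ are expected to stay much closer to $\theta_0 = 0$ than the sample mean. The exponentials can overflow for very distant outliers or a much wider search range, so a production implementation would need additional safeguards.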
Observe that for the power divergences family (1), the estimating equation (7) reduces to

$$-\int \left(\frac{p_\theta(x)}{p_\alpha(x)}\right)^{\gamma - 1} \frac{\dot p_\alpha(x)}{p_\alpha(x)}\, p_\theta(x)\, dx + \frac{1}{n} \sum_{i=1}^{n} \left(\frac{p_\theta(X_i)}{p_\alpha(X_i)}\right)^{\gamma} \frac{\dot p_\alpha(X_i)}{p_\alpha(X_i)} = 0, \tag{10}$$

and the estimate $\hat\alpha_\phi(\theta)$ is the solution in $\alpha$ of (10). The present estimate can be improved by plugging in a preliminary robust estimate of $\theta_0$, say $\hat\theta_n$, as an adaptive choice of the escort parameter $\theta$.

[Figure 3. Behaviour of the ratio $p_{\hat\theta_n}(x)/p_{\theta_0}(x)$ under contamination, for a randomly generated normal sample $N(\theta_0, 1)$ of size 100 with 10% of contamination by 10. Axes: $x$ versus $p_{\hat\theta_n}(x)/p_{\theta_0}(x)$; curves for the mean and the median as escort parameter.]

Let $x$ be some outlier. The role of the outlier $x$ in (10) appears in the term

$$\left(\frac{p_{\hat\theta_n}(x)}{p_\alpha(x)}\right)^{\gamma} \frac{\dot p_\alpha(x)}{p_\alpha(x)}. \tag{11}$$

The estimate $\hat\alpha_\phi(\theta)$ is robust if this term is stable, that is, if it is small when $\alpha$ is near $\theta_0$. If the escort parameter $\hat\theta_n$ is not a robust estimator, the ratio $p_{\hat\theta_n}(x)/p_{\theta_0}(x)$ can be very large, see Figure 3. This is due to the fact that the outlier $x$ is more likely under $P_{\hat\theta_n}$; that is, $\hat\theta_n$ leads to an overevaluation of $p_{\hat\theta_n}(x)$ with respect to the expected value under $\theta_0$, namely $p_{\theta_0}(x)$. One way to guard against such situations would be to compensate through the choice of $\gamma$; this requires further investigation.

4. Robustness

One method of assessing the robustness of an estimator is to consider its influence function (IF), with the usual interpretation being that a robust estimator will have a bounded influence function. However, this requirement makes the corresponding estimator deficient at the model relative to the maximum likelihood estimator, which has an unbounded influence function for most common models.
As we will see in the following, our class of dual $\phi$-divergence estimators has strong robustness features in spite of having the same influence function as the maximum likelihood estimator. On the other hand, being equal to the influence function of the MLE, the influence function of the D$\phi$DE's is potentially unbounded. Thus the robustness of the D$\phi$DE's cannot be described through the traditional bounded influence approach. Hampel (1974) claims that the use of the $\epsilon$-IF (before the limit) is preferable to the use of the influence function to assess estimator robustness. The limiting form is often used because it is usually easier to evaluate, and it does not depend on $\epsilon$. Beran (1977) claims that for evaluating the robustness of a functional with respect to a gross-error model, one should consider the $\epsilon$-influence function instead of the influence function, unless the former converges to the latter uniformly. For more details, see also Section 4.7 of Basu et al. (2011).

In this section we present the $\epsilon$-influence function technique and show that there is no intrinsic conflict between the robustness of our estimators and optimal model efficiency.

The statistical functional associated with the estimate $\hat\alpha_\phi(\theta)$ is given by

$$T_\theta(P_{\theta_0}) = \arg\sup_{\alpha \in \Theta} \int h(\theta, \alpha)\, dP_{\theta_0}, \tag{12}$$

which is Fisher consistent, namely $T_\theta(P_\alpha) = \alpha$ for all $\alpha \in \Theta$, keeping in mind (6). The influence function of the functional $T_\theta$ is defined as

$$\mathrm{IF}(x; T, P_{\theta_0}) = \lim_{\epsilon \to 0} \frac{T_\theta\left((1 - \epsilon) P_{\theta_0} + \epsilon\,\delta_x\right) - T_\theta(P_{\theta_0})}{\epsilon},$$

in which $\delta_x$ is the Dirac measure at the point $x$. The $\epsilon$-influence function is the quotient

$$\epsilon\text{-}\mathrm{IF}(x) = \frac{T_\theta\left((1 - \epsilon) P_{\theta_0} + \epsilon\,\delta_x\right) - \theta_0}{\epsilon}.$$

Using existing theory on M-estimators, see Hampel et al. (1986), p. 230, and also Proposition 1 in Toma and Broniatowski (2010), the influence function of the functional $T_\theta$ corresponding to an estimator $\hat\alpha_\phi(\theta)$ is given by

$$\mathrm{IF}(x; T_\theta, P_{\theta_0}) := -S^{-1} \left[ \int \phi''\left(\frac{p_\theta}{p_{\theta_0}}\right) \frac{p_\theta}{p_{\theta_0}^2}\, \dot p_{\theta_0}\, dP_\theta - \phi''\left(\frac{p_\theta(x)}{p_{\theta_0}(x)}\right) \frac{p_\theta^2(x)}{p_{\theta_0}^3(x)}\, \dot p_{\theta_0}(x) \right]. \tag{13}$$

When $\theta = \theta_0$, it reduces to the influence function of the MLE given by

$$\mathrm{IF}(x; T_{\theta_0}, P_{\theta_0}) = I_{\theta_0}^{-1}\, \frac{\dot p_{\theta_0}(x)}{p_{\theta_0}(x)},$$

where $I_{\theta_0}$ is the information matrix defined by

$$I_{\theta_0} = \int \frac{\dot p_{\theta_0}\, \dot p_{\theta_0}^{\top}}{p_{\theta_0}}\, d\lambda.$$

Using (13), the influence function for the normal location model is, for $\gamma \in \mathbb{R} \setminus \{0, 1\}$,

$$\mathrm{IF}(x; T_\theta, P_{\theta_0}) := \frac{(x - \theta_0) \exp\left\{ -\frac{\gamma}{2}(\theta - \theta_0)(\theta + \theta_0 - 2x) \right\} + \gamma(\theta_0 - \theta) \exp\left\{ \frac{\gamma(\gamma - 1)(\theta - \theta_0)^2}{2} \right\}}{\left(1 + \gamma^2(\theta - \theta_0)^2\right) \exp\left\{ \frac{\gamma(\gamma - 1)(\theta - \theta_0)^2}{2} \right\}}.$$

For $\gamma = 0$ ($KL_m$),

$$\mathrm{IF}(x; T_\theta, P_{\theta_0}) := x - \theta_0.$$

For $\gamma = 1$ ($KL$),

$$\mathrm{IF}(x; T_\theta, P_{\theta_0}) := \frac{(x - \theta_0) \exp\left\{ -\frac{1}{2}(\theta - \theta_0)(\theta + \theta_0 - 2x) \right\} + (\theta_0 - \theta)}{1 + (\theta - \theta_0)^2}.$$

Note that when $\theta = \theta_0$, the influence functions of $T_\theta$ coincide for all $\gamma$ and are equal to the influence function of the MLE,

$$\mathrm{IF}(x; \mathrm{MLE}, P_{\theta_0}) := x - \theta_0.$$

Figure 4 presents the influence functions of the dual $\phi$-divergence estimators in the normal location model, with $\theta = 0.1$ and $\theta = 0.5$. We can see that the influence functions are unbounded. Thus, the D$\phi$DE's are examples of estimators for which the limiting form of the influence function does not reliably provide information about the form of the $\epsilon$-IF's.

[Figure 4. Influence functions of the D$\phi$DE's for the normal location model. Axes: $x$ versus $\mathrm{IF}(x)$; curves for $\gamma = 0$ (MLE), $0.5$, $1$, $2$. Panel (a): $\theta = 0.1$; panel (b): $\theta = 0.5$.]

Empirical $\epsilon$-IF's were calculated for $\epsilon = 0.1, 0.2$. This was done by generating a sample of size 100 from the normal distribution $N(\theta_0, 1)$ with $\theta_0 = 1$. After the data set was sorted, the largest $\epsilon n$ values were iteratively reassigned along a grid of values ranging from 0.01 to 25. At each grid value the D$\phi$DE's were computed. This was repeated 1000 times. The mean of the D$\phi$DE's at each grid value was plotted against the grid. The plots of the empirical average values of the D$\phi$DE's are shown in Figures 5 and 6.

Figure 5 (a) illustrates the empirical $\epsilon$-IF when we take the mean, $\hat\theta_n = \bar X$, as the value of the escort parameter $\theta$; it shows that the D$\phi$DE's $\hat\alpha_\phi(\theta)$ have the same empirical $\epsilon$-IF as the MLE. In Figure 5 (b), the choice of the median as escort parameter value considerably improves the robustness properties of our estimators. The D$\phi$DE's with a robust escort parameter perform very well: outliers tend to have less influence on them. Clearly, we can see from Figure 6 that the MLE is greatly affected by the value of the outlier.

[Figure 5. Empirical $\epsilon$-influence functions, $\epsilon = 0.1$. Axes: $x$ versus $\hat\alpha_n(\theta)$; curves for $\gamma = 0$ (MLE), $0.5$, $1$, $2$. Panels (a) and (b).]

5. A real data example

This example involves Newcomb's light speed data (Stigler (1977), Table 5).
The data were also analysed by Brown and Hwang (1993), who were trying to fit the best approximating normal distribution to the corresponding histogram. The data set shows a nice unimodal structure, and the normal model would have provided an excellent fit to the data except for the two large outliers.

For this dataset, Table 1 gives the values of the D$\phi$DE's. These estimators exhibit strong outlier resistance properties. The histogram and the normal fits using the maximum likelihood estimate and the D$\phi$DE's are presented in Figure 7; all the normal densities fit the main body of the histogram quite well, except the MLE. Note that the values of the escort parameters considered in this example are the median for $\theta$ and the MAD (median absolute deviation) for $\sigma$.

[Figure 6. Empirical $\epsilon$-influence functions, $\epsilon = 0.2$. Axes: $x$ versus $\hat\alpha_n(\theta)$; curves for $\gamma = 0$ (MLE), $0.5$, $1$, $2$.]

Table 1. Estimated parameters for the Newcomb data under the normal model.

| $\gamma$ | 0 | 0.5 | 1 | 2 |
|---|---|---|---|---|
| $\hat\alpha$ | 26.21 | 27.67 | 27.00 | 27.64 |
| $\hat{\tilde\sigma}$ | 10.66 | 5.16 | 4.47 | 4.84 |

[Figure 7. A histogram of the Newcomb data with normal fits. Axes: class intervals versus density; fitted curves for $\gamma = 0$ (MLE), $0.5$, $1$, $2$.]

6. Simulation

In this section, we present the results of a simulation study which was conducted to explore the properties of the D$\phi$DE's.

A sample is generated from $N(\theta_0, 1)$ with $\theta_0 = 0$. The D$\phi$DE's are calculated for samples of sizes 25, 50, 75, 100, 150 and 200, and the whole procedure is repeated 1000 times. In the examination of the robustness of the D$\phi$DE's, the data are generated from the distribution $(1 - \epsilon) N(\theta_0, 1) + \epsilon\,\delta_{10}$, where $\epsilon = 0.05, 0.1, 0.2$. The value of the escort parameter $\theta$ is taken to be the median.

The values of $\gamma$ are chosen to be 0, 0.5, 1, 2, which correspond to the well-known standard divergences: the $KL_m$-divergence, the Hellinger distance, the $KL$-divergence and the $\chi^2$-divergence, respectively. These estimators are also compared with some other methods, including the maximum likelihood estimator (MLE) and the Huber estimator (see Huber (1964)) using the $\psi$-function

$$\psi(t) = \begin{cases} t & \text{if } |t| < \tau, \\ \tau\, \mathrm{sgn}(t) & \text{if } |t| \ge \tau, \end{cases}$$

for some constant $\tau > 0$. A value of 1.4 was used for $\tau$ for location estimation, which is in the range of values shown to perform well in the Princeton Robustness Study (see Andrews et al. (1972)). We carried out this analysis in the R language (R Development Core Team, 2009).

Under the true model, Table 2 shows that, as expected, the MLE produces the most efficient estimators. The D$\phi$DE's appear to be good competitors to the MLE in terms of MSE. Recall that, theoretically, the D$\phi$DE's are asymptotically efficient. It is interesting to note that the asymptotics appear to take effect for sample sizes of 75 or more, but the results for sample sizes below 75 are also good; thus the D$\phi$DE's perform as well as the MLE at the true model.

Table 2. MSE of the estimates under the true model.

| $n$ | 25 | 50 | 75 | 100 | 150 | 200 |
|---|---|---|---|---|---|---|
| $\gamma = 0$ | 0.0386 | 0.0187 | 0.0129 | 0.0096 | 0.0064 | 0.0055 |
| $\gamma = 0.5$ | 0.0386 | 0.0187 | 0.0129 | 0.0096 | 0.0064 | 0.0055 |
| $\gamma = 1$ | 0.0386 | 0.0187 | 0.0129 | 0.0096 | 0.0064 | 0.0055 |
| $\gamma = 2$ | 0.0440 | 0.0190 | 0.0129 | 0.0096 | 0.0064 | 0.0055 |
| M-Estimator | 0.0400 | 0.0199 | 0.0136 | 0.0100 | 0.0067 | 0.0057 |

We now turn to the comparison of these various estimators under contamination. We can see from Table 3 that for small sample sizes the D$\phi$DE's clearly yield the most robust estimates. As $n$ increases, the M-estimator obtains slightly better performance. For high amounts of contamination, we can see from Tables 4 and 5 that the D$\phi$DE with $\gamma = 2$ has the smallest MSE among all D$\phi$DE's and substantially outperforms the MLE, while the performance of the M-estimator decreases dramatically.

Table 3. MSE of the estimates under 10% of contamination.

| $n$ | 25 | 50 | 75 | 100 | 150 | 200 |
|---|---|---|---|---|---|---|
| $\gamma = 0$ | 1.3999 | 1.1696 | 1.1041 | 1.0941 | 1.0505 | 1.0360 |
| $\gamma = 0.5$ | 0.2004 | 0.1610 | 0.1509 | 0.1475 | 0.1447 | 0.1402 |
| $\gamma = 1$ | 0.1280 | 0.0874 | 0.0780 | 0.0726 | 0.0689 | 0.0649 |
| $\gamma = 2$ | 0.1393 | 0.0843 | 0.0731 | 0.0680 | 0.0638 | 0.0598 |
| M-Estimator | 0.2460 | 0.0902 | 0.0724 | 0.0646 | 0.0593 | 0.0540 |

Table 4. MSE of the estimates under 20% of contamination.

| $n$ | 25 | 50 | 75 | 100 | 150 | 200 |
|---|---|---|---|---|---|---|
| $\gamma = 0$ | 4.6936 | 4.3133 | 4.1415 | 4.1344 | 4.1282 | 4.0724 |
| $\gamma = 0.5$ | 0.4935 | 0.4351 | 0.3947 | 0.3802 | 0.3812 | 0.3701 |
| $\gamma = 1$ | 0.3253 | 0.2673 | 0.2325 | 0.2178 | 0.2163 | 0.2067 |
| $\gamma = 2$ | 0.3092 | 0.2433 | 0.2072 | 0.1929 | 0.1906 | 0.1812 |
| M-Estimator | 2.6171 | 1.5906 | 1.1710 | 0.9353 | 0.7837 | 0.6220 |

Table 5. MSE of the estimates under 25% of contamination.

| $n$ | 25 | 50 | 75 | 100 | 150 | 200 |
|---|---|---|---|---|---|---|
| $\gamma = 0$ | 7.1318 | 6.6138 | 6.5256 | 6.3594 | 6.3376 | 6.3436 |
| $\gamma = 0.5$ | 1.1670 | 0.6124 | 0.5920 | 0.5563 | 0.5596 | 0.5533 |
| $\gamma = 1$ | 0.9946 | 0.3963 | 0.3754 | 0.3442 | 0.3454 | 0.3383 |
| $\gamma = 2$ | 0.9921 | 0.3561 | 0.3347 | 0.3047 | 0.3051 | 0.2980 |
| M-Estimator | 5.7497 | 4.3835 | 3.9891 | 3.7579 | 3.5211 | 3.3328 |

7. Conclusions

The aim of this paper was to investigate the new estimation procedure based on the dual representation of $\phi$-divergences for the univariate normal location model. The estimators are easily computed and exhibit appropriate asymptotic behavior. First, we evaluated the impact of the choice of the escort parameter on the estimates and provided a practical choice of the escort parameter.
Second, we have shown, by considering empirical $\epsilon$-influence functions, that there is no intrinsic conflict between the robustness of our estimators and optimal model efficiency. The simulation results presented here provide solid evidence that the D$\phi$DE's, in the normal location setting, attain full efficiency at the true model while keeping good performance as robust estimators; they demonstrated unexpected empirical robustness for high amounts of contamination. Thus they provide a good alternative to maximum likelihood. In this paper we have limited ourselves to the D$\phi$DE's associated to the standard divergences, which are widely used in statistical inference. It would be interesting to investigate theoretically the problem of the choice of the divergence which leads to an "optimal" estimate in terms of efficiency and robustness; we leave this study open for future research.

References

Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H., and Tukey, J. W. (1972). Robust estimates of location: Survey and advances. Princeton University Press, Princeton, N.J.

Basseville, M. (2010). Divergence measures for statistical data processing.

Basu, A., Shioya, H., and Park, C. (2011). Statistical Inference: The Minimum Distance Approach, volume 120 of Monographs on Statistics & Applied Probability. Chapman & Hall/CRC, Boca Raton, FL.

Beran, R. (1977). Minimum Hellinger distance estimates for parametric models. Ann. Statist., 5(3), 445–463.

Bouzebda, S. and Cherfi, M. (2011). General bootstrap for dual $\phi$-divergences estimates. Arxiv preprint arXiv:1106.2246.

Bouzebda, S. and Keziou, A. (2010). Estimation and tests of independence in copula models via divergences. Kybernetika, 46(1), 178–201.

Broniatowski, M. and Keziou, A. (2006). Minimization of $\phi$-divergences on sets of signed measures. Studia Sci. Math. Hungar., 43(4), 403–442.

Broniatowski, M. and Keziou, A. (2009). Parametric estimation and tests through divergences and the duality technique. J. Multivariate Anal., 100(1), 16–36.

Brown, L. D. and Hwang, J. T. G. (1993). How to approximate a histogram by a normal density. Amer. Statist., 47(4), 251–255.

Cherfi, M. (2011). Dual divergences estimation for censored survival data. Arxiv preprint arXiv:1106.2627.

Cressie, N. and Read, T. R. C. (1984). Multinomial goodness-of-fit tests. J. Roy. Statist. Soc. Ser. B, 46(3), 440–464.

Hampel, F. R. (1974). The influence curve and its role in robust estimation. J. Amer. Statist. Assoc., 69, 383–393.

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. (1986). Robust statistics. The approach based on influence functions. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons Inc., New York.

Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Statist., 35, 73–101.

Huber, P. J. (1981). Robust statistics. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons Inc., New York.

Jiménez, R. and Shao, Y. (2001). On robustness and efficiency of minimum divergence estimators. Test, 10(2), 241–248.

Keziou, A. (2003). Dual representation of $\phi$-divergences and applications. C. R. Math. Acad. Sci. Paris, 336(10), 857–862.

Keziou, A. and Leoni-Aubin, S. (2008). On empirical likelihood for semiparametric two-sample density ratio models. J. Statist. Plann. Inference, 138(4), 915–928.

Kůs, V., Morales, D., and Vajda, I. (2008). Extensions of the parametric families of divergences used in statistical inference. Kybernetika (Prague), 44(1), 95–112.

Liese, F. and Vajda, I. (1987). Convex statistical distances, volume 95 of Teubner-Texte zur Mathematik [Teubner Texts in Mathematics]. BSB B. G. Teubner Verlagsgesellschaft, Leipzig. With German, French and Russian summaries.

Liese, F. and Vajda, I. (2006). On divergences and informations in statistics and information theory. IEEE Trans. Inform. Theory, 52(10), 4394–4412.

Maronna, R. A., Martin, R. D., and Yohai, V. J. (2006). Robust statistics. Theory and methods. Wiley Series in Probability and Statistics. John Wiley & Sons Ltd., Chichester.

Pardo, L. (2006). Statistical inference based on divergence measures, volume 185 of Statistics: Textbooks and Monographs. Chapman & Hall/CRC, Boca Raton, FL.

R Development Core Team (2009). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.

Stigler, S. M. (1977). Do robust estimators work with real data? Ann. Statist., 5(6), 1055–1098. With discussion and a reply by the author.

Toma, A. and Broniatowski, M. (2010). Dual divergence estimators and tests: robustness results. Journal of Multivariate Analysis, 102(1), 20–36.

Toma, A. and Leoni-Aubin, S. (2010). Robust tests based on dual divergence estimators and saddlepoint approximations. Journal of Multivariate Analysis, 101(5), 1143–1155.

Laboratoire de Statistique Théorique et Appliquée (LSTA), Equipe d'Accueil 3124, Université Pierre et Marie Curie – Paris 6, Tour 15-25, 2ème étage, 4 place Jussieu, 75252 Paris cedex 05; e-mail address: mohamed.cherfi@gmail.com
