A New Approach of Point Estimation from Truncated or Grouped and Censored Data


Authors: Ahmed Guellil (USTHB), Tewfik Kernane (USTHB)

(1) Ahmed Guellil: Department of Probability and Statistics, Faculty of Mathematics, University of Sciences and Technology USTHB, BP 32 El-Alia, Algeria
(2) Tewfik Kernane: Department of Mathematics, Faculty of Science, King Khalid University, Abha, Kingdom of Saudi Arabia
E-mail: guellilamed@yahoo.fr, tkernane@gmail.com

Abstract
We propose a new approach for estimating the parameters of a probability distribution. It consists of combining two new methods of estimation. The first is based on the definition of a new distance measuring the difference between the variations of two distributions on a finite number of points from their support, and on using this measure for estimation purposes by the method of minimum distance. For the second method, given an empirical discrete distribution, we build an auxiliary discrete theoretical distribution that has the same support as the first and depends on the same parameters as the parent distribution of the data from which the empirical distribution emanated. We then estimate the parameters from the empirical distribution by the usual statistical methods. In practice, we propose to compute both estimates: the second, based on the maximum likelihood principle, whose theoretical properties are well known, and the first, which serves as a check on the effectiveness of the obtained estimate and for which we prove convergence in probability. This also provides a criterion on the quality of the information contained in the observations. We apply the approach to truncated or grouped and censored data to give a flavour of its effectiveness.
We also give some interesting perspectives of the approach, including model selection from truncated data, estimation of the initial trial value in the celebrated EM algorithm in the case of truncation and merged normal populations, a goodness-of-fit test based on the new distance, and assessment of the quality of estimates and data.

Key words and phrases: EM algorithm, Minimum distance, Model selection from truncated data, Point estimation, Truncated data, Grouped and censored data.

1 Introduction

Point estimation is one of the most popular forms of statistical inference (see Lehmann and Casella [10]). In this paper we introduce a new approach to statistical point estimation which we have found useful in special practical situations such as truncated or grouped and censored data. Data are said to be truncated when measuring devices fail to report observations below and/or above certain readings. For example, truncated data frequently arise in the statistical analysis of astronomical observations (see Efron and Petrosian [6]) and in medical data (see Klein and Zhang [9]), and ignoring the truncation can cause considerable bias in the estimation. The literature contains many approaches to estimation from "incomplete data", such as the maximum likelihood based approach of the EM algorithm (Hartley [7], Dempster et al. [5]), or nonparametric methods such as the Kaplan-Meier (Kaplan and Meier [8]) or Lynden-Bell (Lynden-Bell [11]) estimators. The purpose of the present paper is to investigate another approach, which consists of combining two new methods of estimation, and to apply it to fixed type I censored or grouped and censored data.
In the first method, we note that estimation problems generally involve three functions: a theoretical probability law $f(\cdot,\theta)$ of a random variable $X$, depending on a parameter $\theta$ (real or vector valued); an empirical distribution $\hat f$ constructed from a sample of observations drawn from $X$; and an estimate $\tilde f$ (obtained from an estimate $\tilde\theta$ of $\theta$) computed through the empirical law $\hat f$. The empirical distribution $\hat f$ is regarded as representative of $f$, but in practice it is reduced to only a few of its characteristics, such as the mean and the variance. The variational aspect of $\hat f$ is often neglected despite its importance. One can easily find, for instance, two distributions with the same support, mean and variance whose variations differ significantly, or, conversely, two distributions with the same variations but different supports and characteristic parameters. However, two probability distributions with the same support and the same variations on every subset of the support are necessarily identical. We therefore introduce a new distance which measures the difference between the variations of two distributions on a finite number of points, and we use it for estimation purposes by the method of minimum distance. Since the new measure is not equivalent to classical ones, it can give insights that classical distances cannot provide. In the second method, we note that the empirical distribution arising from a sample of observations can in fact be viewed as a conditional distribution, since it is built from knowledge of the data. It is therefore an estimate of the theoretical conditional distribution given the observations before being an estimate of the parent distribution. This theoretical conditional distribution is represented by the auxiliary distribution introduced in this paper.
To determine this distribution in the discrete case, we simply take the conditional distribution given the observed values, and we proceed analogously in the continuous case. Note that in the discrete case this is known as the truncated distribution, i.e. the conditional distribution given a truncation (see for example Shaw [13]), but here it is presented in a general framework. We thus deal with two discrete probability distributions having the same finite support: a theoretical distribution and its empirical representation with respect to the observations. The parameters of the former are those of the parent distribution, and the aim is to estimate them through the former instead of through the parent distribution, as is commonly done. We use classical tools such as the method of moments or the maximum likelihood principle. The setting that seems to us most suitable for illustrating our approach is that of truncated or grouped and censored data. In usual practical problems, truncation can be on the left, on the right or on both sides, and the cut-off can be deterministic or random. In our approach, the truncation may be on any part of the range of the distribution, so that the setting is more general. Moreover, classical approaches for truncated data are generally tailored to specific problems and distributions, or rely on subjective methods. Our approach, instead, is quite general and can be used in any situation where the underlying complete data come from a known family of distributions. As a first presentation we restrict ourselves to fixed type I censored data and to grouped and censored data. In the next section, we propose a variational distance between probability distributions.
In Section 3, we define a truncation of data and the associated empirical and theoretical distributions, and we use two different methods for estimation from a truncation: a first method using the minimum of the new distance introduced in this paper, and a second method based on traditional estimation tools such as the method of maximum likelihood. In Section 4, we present the new approach and illustrate the procedure with three examples: a binomial probability law, a normal distribution and a Gamma density function. We also present a basic feature of the new approach which supports the accuracy of the method, together with some illustrative examples. In Section 5, we give some elements of comparison with the classical approach to estimation. In Section 6, we list some perspectives of the new approach: model selection from truncated data using the new distance, estimation of the first trial value in the celebrated EM algorithm for incomplete data in the case of truncation and merged normal distributions, a goodness-of-fit test based on the new distance, and decision making about the quality of estimates and data. Finally, we make some concluding remarks, pointing to other possible extensions and applications.

2 A New Distance Between Probability Distributions

As usual, given a sample of $n$ independent and identically distributed observations $(x_1,\dots,x_n)$ drawn from an unknown discrete random variable $X$ belonging to a discrete family of probability laws $\mathcal P=\{f(\cdot,\theta),\ \theta\in\mathbb R^r\}$ depending on a parameter $\theta$ (real or vector valued), i.e. $f(x,\theta)=P(X=x)$, one can summarize the sample into $k$ couples $(y_1,\hat f_1),\dots,(y_k,\hat f_k)$, $k\le n$, where the $y_j$ are the distinct values taken by the sample and $\hat f$ is the empirical law $\hat f_j=n_j/n$, with $n_j$ the absolute frequency of the value $y_j$, $j=1,\dots,k$.
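The summary of a discrete sample into couples $(y_j,\hat f_j)$ takes only a few lines. The sketch below is in Python (an assumption, since the paper gives no code); the sample is hypothetical and `summarize` is an illustrative helper, not part of the paper.

```python
from collections import Counter

def summarize(sample):
    """Summarise a discrete sample into couples (y_j, f_hat_j), where the y_j
    are the distinct observed values (sorted) and f_hat_j = n_j / n."""
    n = len(sample)
    counts = Counter(sample)           # absolute frequencies n_j
    return [(y, counts[y] / n) for y in sorted(counts)]

sample = [3, 1, 2, 3, 3, 0, 2, 1, 3, 2]   # hypothetical observations, n = 10
print(summarize(sample))   # [(0, 0.1), (1, 0.2), (2, 0.3), (3, 0.4)]
```

Here $k=4\le n=10$, as in the text.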
Usually, it is hoped that $\hat f_j\approx f(y_j,\theta)$ in a certain probabilistic sense. But if the empirical distribution arises from truncated data, we cannot in general expect $\hat f(x)\approx f(x,\theta)$ for the values $x$ in the support of $\hat f$, since the complete sample size $n$ is usually not reported. However, we can reasonably expect to have approximately

$$\frac{\hat f(x)}{\hat f(y)}\approx\frac{f(x,\theta)}{f(y,\theta)} \qquad (1)$$

for any points $x,y$ in its support, unless the sample has serious irregularities. We introduce the following distance of proportional variations between $f(\cdot,\theta)$ and $\hat f$:

$$d_v\big(\hat f,f(\cdot,\theta)\big)=\sum_{i,j\in\{1,\dots,k\}}\left|\frac{\hat f_i}{\hat f_j}-\frac{f(y_i,\theta)}{f(y_j,\theta)}\right|. \qquad (2)$$

As we will show, this new distance measures the variations between probability distributions. In the continuous case as well, any sample $x_1,\dots,x_n$ is summarized into $k$ couples $(y_1,\hat f_1),\dots,(y_k,\hat f_k)$, $k\le n$. This can be done, for example, by grouping the sample into classes, where the $y_i$ are the mid-classes (or class means) and $\hat f_i=\hat f(y_i)$ with $\hat f$ an empirical density estimator, or it is given directly when the data are presented in grouped and censored form. The proportional variational distance $d_v$ between the density $f(x,\theta)$ of $X$ and its empirical law $\hat f$ is then defined as in (2). One of its main features is the following. With traditional distances we have to use the sample size $n$ through the expression $\hat f_i=n_i/(nh_n)$, where $h_n$ is the width of the class intervals; but in some situations, such as truncated data where measuring devices fail to report even the number of sample points in certain ranges, the real size $n$ is unknown and a truncated sample size $n_t$ is used instead. Using the ratios $\hat f_i/\hat f_j$ removes the effect of the truncated sample size, which could otherwise lead to considerable bias in the estimation.
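A minimal Python sketch of the distance (2), assuming both laws are given as lists of strictly positive values on the same support points, checks two claims made above. First, two laws with the same support, mean and variance can still be far apart in $d_v$ (the pmfs `A` and `B` are hypothetical). Second, the normalizing constant, whether the complete size $n$ or the truncated size $n_t$, cancels in $d_v$ but not in a classical $L^1$-type distance. The counts are those of Table 1 (Section 4.3) on $\{2,3,4,5\}$; the helper names `dv`, `l1`, `mean` and `variance` are illustrative.

```python
from itertools import permutations
from math import comb

def dv(p, q):
    """Distance of proportional variations (2): sum over ordered pairs (i, j),
    i != j, of |p_i/p_j - q_i/q_j| (diagonal terms are zero and skipped).
    p and q hold strictly positive values on the same support points."""
    return sum(abs(p[i] / p[j] - q[i] / q[j])
               for i, j in permutations(range(len(p)), 2))

def l1(p, q):
    """A classical distance, for comparison: sum_i |p_i - q_i|."""
    return sum(abs(a - b) for a, b in zip(p, q))

def mean(support, w):
    return sum(x * wx for x, wx in zip(support, w))

def variance(support, w):
    m = mean(support, w)
    return sum((x - m) ** 2 * wx for x, wx in zip(support, w))

# (a) Hypothetical pmfs with the same support, mean (2) and variance (1.2).
support = [0, 1, 2, 3, 4]
A = [0.10, 0.20, 0.40, 0.20, 0.10]
B = [0.14, 0.04, 0.64, 0.04, 0.14]
print(mean(support, A), variance(support, A))
print(mean(support, B), variance(support, B))
print(dv(A, B))                      # clearly positive: the variations differ

# (b) Counts of Table 1 on {2, 3, 4, 5} against the model B(10, 0.3).
ys = [2, 3, 4, 5]
counts = [108, 134, 97, 47]
model = [comb(10, y) * 0.3**y * 0.7 ** (10 - y) for y in ys]
f_n = [c / 500 for c in counts]      # normalised by the complete size n = 500
f_nt = [c / 386 for c in counts]     # normalised by the truncated size n_t = 386
print(dv(f_n, model), dv(f_nt, model))   # equal up to rounding
print(l1(f_n, model), l1(f_nt, model))   # different
```

Only ratios of empirical values enter `dv`, which is why the unknown $n$ never has to be supplied.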
Note that $d_v$ is symmetric and satisfies the triangle inequality. However, in the identity property $d_v(f,g)(x,y)=0\iff f\equiv g$, the equality between $f$ and $g$ must be understood in the sense that $f$ and $g$ have the same variations at the points $x$ and $y$. It should be stressed that this new measure is not equivalent to classical ones and should therefore give new insights and information about other characteristics and features of probability distributions.

From now on, $f$ denotes a theoretical probability law in both the discrete and the continuous case, and $\hat f$ the corresponding empirical law. Denote by $\Omega=\{x\in\mathbb R:\ f(x,\theta)>0\}$ the set of atoms, or support, of $f$. Let $\mathcal F$ be the $\sigma$-algebra generated by the sets $A=B\cap\omega$, where the $\omega$ are the Borel sets of $\mathbb R$ and $B\subset\Omega$. For all $A\in\mathcal F$ we have $P(A)=\int_A f(x,\theta)\,\mu(dx)$, where $\mu$ is the Lebesgue measure on $\mathbb R$; in the discrete case, $P(A)=\sum_{x\in A}f(x,\theta)$. For all $i\ge1$, set $\Omega_i=\Omega$, $\mathcal F_i=\mathcal F$ and $P_i=P$. Let $\Omega^n=\Omega_1\times\dots\times\Omega_n$, $\mathcal F^{(n)}=\mathcal F_1\otimes\dots\otimes\mathcal F_n$ and $P^{(n)}=P_1\otimes\dots\otimes P_n$. The probability space $(\Omega^n,\mathcal F^{(n)},P^{(n)})$ represents the space of samples of size $n$ from the random variable $X$. For notational convenience we omit the index $n$ and denote the sample space by $(\Omega,\mathcal F,P)$.

2.1 A Notion of Variation between Probability Distributions

We now discuss the measure-theoretic aspect of the new distance introduced above. Let $P$ and $Q$ be two probability measures defined on the same measurable space $(\Omega,\mathcal F)$, $f$ and $g$ their respective probability densities (not necessarily with respect to the same measure), and $E$ an event of this space.
We say that $f$ and $g$ have the same variation on $E$ if the respective restrictions of $f$ and $g$ to $E$ define the same probability measure on $E$, endowed with the trace $\sigma$-algebra of $\mathcal F$ on $E$.

Definition 1. Let $f$ and $g$ be two probability distributions, positive and defined on a set $E$ not reduced to a single element. If at every point $(x,y)$ of $E\times E$ we have

$$\frac{f(x)}{f(y)}=\frac{g(x)}{g(y)}, \qquad (3)$$

then we say that $f$ and $g$ have the same variations on $E$.

Example 2. Let $f$ be the density of a probability measure $P$ and $E$ an event such that $P(E)>0$. The restriction of $f$ to $E$ and the conditional distribution of $f$ given $E$ define the same probability measure on $E$; consequently, they have the same variations on $E$.

Definition 3. Let $f$ and $g$ be two probability distributions and $E$ an event on which they are strictly positive. If $E$ is discrete and not reduced to a single element, and one of $f$ and $g$ is discrete (the other need not be), we call distance in variations between $f$ and $g$ on $E$ the quantity

$$d_v(f,g)_E=\sum_{(x,y)\in E\times E}\left|\frac{f(x)}{f(y)}-\frac{g(x)}{g(y)}\right|.$$

If $E$ is an interval of $\mathbb R$ and $f$ and $g$ are probability densities on $\mathbb R$ with respect to the Lebesgue measure $\mu$, we call distance in variations between $f$ and $g$ on $E$ the quantity

$$d_v(f,g)_E=\iint_{E\times E}\left|\frac{f(x)}{f(y)}-\frac{g(x)}{g(y)}\right|\mu(dx)\,\mu(dy).$$

Let $d$ be a classical distance between two functions $f$ and $g$ which associates to points $x$ and $y$ in the intersection of their domains of definition the quantity $d(f,g)(x,y)=|f(x)-g(x)|+|f(y)-g(y)|$.

Proposition 4. The distance $d_v$ has the following properties:
1. $d(f,g)(x,y)=0\implies d_v(f,g)(x,y)=0$; the converse is not always true.
2. Let $\hat f$ be a kernel density estimator. Then $\lim_{n\to\infty}d_v(\hat f,f)=0$ in probability.
3. Let $f$ and $g$ be two functions defined on $\mathbb R$ and $E\subset\mathbb R$ such that $d_v(f,g)(x,y)=0$ for all $(x,y)\in E\times E$. If $\int_{\mathbb R}f\,d\mu=\int_{\mathbb R}g\,d\mu=1$, where $\mu$ is the Lebesgue measure on $\mathbb R$, then $\mu(\bar E)=0\implies f=g$ $\mu$-almost surely on $\mathbb R$.

Proof. 1. Follows directly from the definitions of $d$ and $d_v$.
2. Follows from the fact that $\lim_{n\to\infty}d(\hat f,f)=0$ in probability (see Parzen [12]); hence $\lim_{n\to\infty}d_v(\hat f,f)=0$ in the same probabilistic sense.
3. Fix $y_0\in E$; then $f(x)/f(y_0)=g(x)/g(y_0)$ for all $x\in E$. Since $\mu(\bar E)=0$, we have $\int_E f\,d\mu=\int_E g\,d\mu=1$, so that

$$1=\int_E f(x)\,dx=\int_E f(y_0)\frac{g(x)}{g(y_0)}\,dx=\frac{f(y_0)}{g(y_0)}\int_E g(x)\,dx=\frac{f(y_0)}{g(y_0)}.$$

We deduce that $f(y_0)=g(y_0)$; since $y_0\in E$ is arbitrary, $f=g$ on $E$, hence $\mu$-almost surely on $\mathbb R$.

3 Truncated Data

The truncated data specification, or more generally the incomplete data specification, implies the existence of two sample spaces $\mathcal X_o$ and $\mathcal X_t$ such that the complete sample space is $\Omega=\mathcal X_o\cup\mathcal X_t$. The observed data $x_o=(x_1,\dots,x_{n_t})$, where $n_t$ is the truncated sample size, are a realization from $\mathcal X_o$, and the unobserved data $z=(x_1^*,\dots,x_{n-n_t}^*)$, where $n$ is the unknown complete sample size, come from $\mathcal X_t$. The complete data $x=x_o\cup z$ are known only through the observed data $x_o$ (see Dempster, Laird and Rubin [5] for further explanations about the incomplete data specification). Consider a sample of observations $x_1,\dots,x_n$ drawn from a theoretical probability law $f(\cdot,\theta)$ depending on a parameter $\theta\in\mathbb R^r$. As usual, the data are summarized, in the discrete or continuous case (as shown in Section 2), into $k$ couples $(y_1,\hat f_1),\dots,(y_k,\hat f_k)$, $k\le n$. Let $\triangle=\{u_1,\dots,u_m\}$, $m\le k$, be a subset of $\{y_1,\dots,y_k\}$, which we will call a truncation.
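Example 2 is what makes $d_v$ natural for truncation: the observed part of a truncated sample behaves like a conditional law. The sketch below assumes a hypothetical Poisson(2) law and the hypothetical event $E=\{1,2,3\}$. It checks numerically that the restriction of $f$ to $E$ and the conditional law given $E$ are at $d_v$-distance zero (Definition 3), while the classical pointwise distance $d$ of Proposition 4, summed over $E$, is not; this illustrates why the converse in Proposition 4(1) fails.

```python
import math
from itertools import permutations

def dv(f, g):
    """d_v(f, g)_E of Definition 3 for a finite E (ordered pairs of E x E;
    the pairs (x, x) contribute zero and are skipped)."""
    return sum(abs(f[i] / f[j] - g[i] / g[j])
               for i, j in permutations(range(len(f)), 2))

lam = 2.0                                     # hypothetical Poisson(2) law
E = [1, 2, 3]                                 # hypothetical event (truncation)
restriction = [math.exp(-lam) * lam**x / math.factorial(x) for x in E]
P_E = sum(restriction)                        # P(E)
conditional = [v / P_E for v in restriction]  # law of X given X in E

# Classical pointwise distance summed over E; here it equals 1 - P(E).
d_classical = sum(abs(a - b) for a, b in zip(restriction, conditional))

print(dv(restriction, conditional))   # 0 up to rounding: same variations on E
print(d_classical)                    # about 0.278 for this choice of E
```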
The observed data are summarized by a truncation $\triangle_o=\{u_1,\dots,u_m\}$ and an empirical estimate $\hat f_o$, and we assume that the unobserved data are likewise summarized by a set $\triangle_t=\{u_1^*,\dots,u_p^*\}$ and $\hat f_t$. The structure of the new distance $d_v$ yields the following decomposition property:

$$d_v\big(\hat f,f(\cdot,\theta)\big)=d_v\big(\hat f_o,f(\cdot,\theta)\big)+d_v\big(\hat f_t,f(\cdot,\theta)\big)+\sum_{\substack{u_i\in\triangle_o\\ u_j^*\in\triangle_t}}\left|\frac{\hat f_o(u_i)}{\hat f_t(u_j^*)}-\frac{f(u_i,\theta)}{f(u_j^*,\theta)}\right|+\sum_{\substack{u_i\in\triangle_o\\ u_j^*\in\triangle_t}}\left|\frac{\hat f_t(u_j^*)}{\hat f_o(u_i)}-\frac{f(u_j^*,\theta)}{f(u_i,\theta)}\right|. \qquad (4)$$

The following proposition is typical of the new distance and is useful when using the minimum of the distance $d_v$.

Proposition 5. Let $\triangle_o$ be a given truncation of the data with corresponding empirical estimate $\hat f_o$. Then $\lim_{n_t\to\infty}d_v(\hat f_o,f)=0$ in probability.

Proof. By Proposition 4, $\lim_{n\to\infty}d_v(\hat f,f)=0$ in probability. Since every term on the right-hand side of the decomposition (4) is nonnegative, $0\le d_v(\hat f_o,f)\le d_v(\hat f,f)$, and we obtain $\lim_{n\to\infty}d_v(\hat f_o,f)=\lim_{n_t\to\infty}d_v(\hat f_o,f)=0$ in probability.

3.1 An Auxiliary Distribution

Define the empirical distribution $\tilde f$ corresponding to a given truncation $\triangle$ by

$$\tilde f(x)=\begin{cases}\tilde f_i & \text{if } x=u_i,\ i=1,\dots,m,\\ 0 & \text{otherwise,}\end{cases}$$

where the $\tilde f_i$ satisfy the proportional allocation equations $\tilde f_i/\tilde f_j=\hat f_i/\hat f_j$ for $i,j=1,\dots,m$, together with $\tilde f_1+\dots+\tilde f_m=1$.

Define the following auxiliary distribution from $f(\cdot,\theta)$, which is akin to the proportional allocation procedure for missing values (see Hartley [7]):

$$h(x,\theta)=\begin{cases}\dfrac{f(x,\theta)}{f(u_1,\theta)+f(u_2,\theta)+\dots+f(u_m,\theta)} & \text{if } x=u_i,\ i=1,\dots,m,\\ 0 & \text{otherwise.}\end{cases} \qquad (5)$$

Remark 6. If the truncation is random, that is, if there exists a random variable $T$ such that we observe, for example, the random variable $X$ only if $X>T$ or $X<T$, then the probability law used in (5) is replaced by the conditional law of $X$ given $\{X>T\}$ or $\{X<T\}$, respectively.

The auxiliary distribution $h$ has been found useful for estimation problems with truncated data. Indeed, it is well known in classical estimation from truncated data (see Hartley [7]) that missing values can be recovered by "proportional allocation" procedures; the auxiliary distribution $h$, which is itself based on proportional allocation, is therefore an intuitive and natural tool for estimation from truncated data. The function $h$ is a theoretical probability distribution depending on the same parameters as $f$, and it has the same support as $\tilde f$.

Definition 7. We call $\tilde f$ and $h(\cdot,\theta)$ the empirical and the theoretical distributions of a given truncation $\triangle=\{u_1,\dots,u_m\}$ from a sample of observations $(x_1,\dots,x_n)$.

4 The Approach of Estimation

We mainly use two methods of estimation. The first is a minimum distance estimation using the metric $d_v$ between the empirical and theoretical distributions $\hat f$ and $f(\cdot,\theta)$. The second is similar to traditional methods, such as the method of substitution or the maximum likelihood principle, applied by regarding $\tilde f$ as an empirical estimate of $h(\cdot,\theta)$. The first is based on the variational difference between distributions and the second on a difference in the Euclidean sense; hence they treat different aspects of the sample of observations. If, for given data, they give different estimates, this should not be taken as a failure of either method; rather, it indicates that the data do not reflect all aspects of the probability distribution from which they emanated in a coherent way. If, on the other hand, they give essentially the same estimates, we can assert that the estimate is credible, since different aspects of the data lead to the same distribution, namely the distribution that best fits the empirical distribution. In practice, we propose to compute the estimates by both methods and to retain the second, since it is based on the maximum likelihood principle, whose theoretical properties are well known. The first is then used as a decision tool on whether the estimate is credible: the estimate is considered credible when the two methods give approximately the same value.

4.1 Convergence in Probability of the Minimum Distance Estimator

Let $X_1,X_2,\dots,X_n$ be a sample with $X_i\sim f(x,\theta)$, $\theta=(\theta_1,\dots,\theta_s)^t\in\Theta\subseteq\mathbb R^s$, where

$$f(x,\theta)=K(x)\exp\Big\{\sum_{k=1}^{s}\theta_kT_k(x)+A(\theta)\Big\},\qquad x\in\mathcal X\subseteq\mathbb R, \qquad (6)$$

and $\mathcal X$ is a Borel set of $\mathbb R$ such that $\mathcal X=\{x: f(x,\theta)>0\}$ for all $\theta\in\Theta$. The family (6) is very rich; it contains, for example, the normal laws and the Poisson laws. We assume that the support $\mathcal X$ does not depend on $\theta$. Denote by $\tilde\theta_n$ the estimator obtained by minimizing the metric $d_v$ between the empirical distribution $\hat f_n$ (based on a sample of size $n$) and the theoretical distribution $f(\cdot,\theta)$, that is, $\tilde\theta_n=\arg\min_\theta d_v\big(f(\cdot,\theta),\hat f_n\big)$. This estimator belongs to the class of M-estimators. Using well-known theorems on the convergence of M-estimators (see for example Amemiya [1]), we prove that $\tilde\theta_n$ converges in probability to the true parameter.

Proposition 8. Let $X_1,X_2,\dots,X_n$ be a sample from the family of distributions (6).
If the set of natural parameters $\Theta$ is convex and the true parameter $\theta$ is an interior point of $\Theta$, then the estimator $\tilde\theta_n$ obtained by minimizing the distance of variations $d_v$ converges in probability to the true parameter $\theta$, i.e. $\tilde\theta_n\xrightarrow{P}\theta$.

Proof. Since we seek a minimum of the criterion function $d_v$, it suffices to show, under the assumptions on the family (6) and the convexity of $\Theta$, that $d_v(\theta,x)$, seen as a function of $\theta$, is convex (see Amemiya [1]). This reduces the problem to the convexity of

$$\delta_{ij}(\theta)=\left|\frac{f(y_i,\theta)}{f(y_j,\theta)}-\frac{\hat f(y_i)}{\hat f(y_j)}\right|.$$

For $\lambda,\mu\in[0,1]$ with $\lambda+\mu=1$, and $\theta^{(1)},\theta^{(2)}\in\Theta$, we have

$$\delta_{ij}\big(\lambda\theta^{(1)}+\mu\theta^{(2)}\big)=\left|C_{ij}\exp\Big\{\sum_{k=1}^{s}\big[\lambda\theta_k^{(1)}+\mu\theta_k^{(2)}\big]\big(T_k(y_i)-T_k(y_j)\big)\Big\}-A_{ij}\right|, \qquad (7)$$

where $C_{ij}=K(y_i)/K(y_j)$, assumed positive, and $A_{ij}=\hat f(y_i)/\hat f(y_j)$. By the convexity of the exponential function,

$$\exp\Big\{\sum_{k=1}^{s}\big[\lambda\theta_k^{(1)}+\mu\theta_k^{(2)}\big]\big(T_k(y_i)-T_k(y_j)\big)\Big\}\le\lambda\exp\Big\{\sum_{k=1}^{s}\theta_k^{(1)}\big(T_k(y_i)-T_k(y_j)\big)\Big\}+\mu\exp\Big\{\sum_{k=1}^{s}\theta_k^{(2)}\big(T_k(y_i)-T_k(y_j)\big)\Big\},$$

hence

$$\begin{aligned}C_{ij}\exp\Big\{\sum_{k=1}^{s}\big[\lambda\theta_k^{(1)}+\mu\theta_k^{(2)}\big]\big(T_k(y_i)-T_k(y_j)\big)\Big\}-A_{ij}&\le\lambda C_{ij}\exp\Big\{\sum_{k=1}^{s}\theta_k^{(1)}\big(T_k(y_i)-T_k(y_j)\big)\Big\}+\mu C_{ij}\exp\Big\{\sum_{k=1}^{s}\theta_k^{(2)}\big(T_k(y_i)-T_k(y_j)\big)\Big\}-(\lambda+\mu)A_{ij}\\&=\lambda\Big[C_{ij}\exp\Big\{\sum_{k=1}^{s}\theta_k^{(1)}\big(T_k(y_i)-T_k(y_j)\big)\Big\}-A_{ij}\Big]+\mu\Big[C_{ij}\exp\Big\{\sum_{k=1}^{s}\theta_k^{(2)}\big(T_k(y_i)-T_k(y_j)\big)\Big\}-A_{ij}\Big].\end{aligned}$$

Introducing the absolute value, we get

$$\begin{aligned}\delta_{ij}\big(\lambda\theta^{(1)}+\mu\theta^{(2)}\big)&=\left|C_{ij}\exp\Big\{\sum_{k=1}^{s}\big[\lambda\theta_k^{(1)}+\mu\theta_k^{(2)}\big]\big(T_k(y_i)-T_k(y_j)\big)\Big\}-(\lambda+\mu)A_{ij}\right|\\&\le\lambda\left|C_{ij}\exp\Big\{\sum_{k=1}^{s}\theta_k^{(1)}\big(T_k(y_i)-T_k(y_j)\big)\Big\}-A_{ij}\right|+\mu\left|C_{ij}\exp\Big\{\sum_{k=1}^{s}\theta_k^{(2)}\big(T_k(y_i)-T_k(y_j)\big)\Big\}-A_{ij}\right|\\&=\lambda\,\delta_{ij}\big(\theta^{(1)}\big)+\mu\,\delta_{ij}\big(\theta^{(2)}\big).\end{aligned}$$
Hence $\delta_{ij}(\theta)$ is a convex function of $\theta$, which implies the convexity of $d_v(\theta,x)$ as a function of $\theta$ and therefore the convergence in probability of the minimum $d_v$-distance estimator.

4.2 A Maximum Likelihood Principle with the Auxiliary Distribution

We begin with a general situation, that of the one-parameter exponential family, to show how the procedure described above is used in the second method. Consider the one-parameter exponential family with density

$$f(x,\theta)=K(x)\exp[\theta T(x)-A(\theta)], \qquad (8)$$

where $\theta$ is the parameter, $T$ a statistic, $K(x)$ a function of $x$ and $A$ a function of the parameter $\theta$. Let us use the maximum likelihood principle. Consider a sample of observations $x_1,\dots,x_n$ from which we derive the support $\triangle=\{y_1,\dots,y_k\}$. We then construct the auxiliary distribution on the support $\triangle$, which takes the form

$$h(x,\theta)=\frac{K(x)\exp[\theta T(x)-A(\theta)]}{\sum_{j=1}^{k}K(y_j)\exp[\theta T(y_j)-A(\theta)]}. \qquad (9)$$

We have to maximize the likelihood function, given in our case by

$$L_h(y,\theta)=\prod_{i=1}^{k}h(y_i,\theta)^{n_i}. \qquad (10)$$

Without loss of generality, we assume that the class intervals have the same width. Then

$$\frac1n\log L_h(y,\theta)=\sum_{i=1}^{k}\frac{n_i}{n}\log h(y_i,\theta)=\sum_{i=1}^{k}\frac{n_i}{n}\log\left(\frac{K(y_i)\exp[\theta T(y_i)-A(\theta)]}{\sum_{j=1}^{k}K(y_j)\exp[\theta T(y_j)-A(\theta)]}\right). \qquad (11)$$

Taking the derivative with respect to $\theta$ and solving the score equation, we obtain an estimator of $\theta$ satisfying

$$\sum_{i=1}^{k}\frac{n_i}{n}\;\frac{\sum_{j=1}^{k}T(y_j)\,f(y_j,\theta)}{\sum_{j=1}^{k}f(y_j,\theta)}=\sum_{i=1}^{k}\frac{n_i}{n}\,T(y_i). \qquad (12)$$

Since $\sum_{i}n_i/n=1$, the left-hand side is simply the mean of $T$ under $h(\cdot,\theta)$. The latter result may be obtained directly by the method of moments, but we have presented the maximum likelihood method since it is widely used in statistical inference.
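Equation (12) has a single unknown, and its left-hand side, the mean of $T$ under $h(\cdot,\theta)$, is nondecreasing in $\theta$ (its derivative is the variance of $T$ under $h$), so a bracketing root-finder is enough. The sketch below uses bisection; this choice is our assumption, since the text only mentions a computer algebra package, and `solve_score_equation` and its arguments are hypothetical names. As a self-check it uses a Poisson law ($K(y)=1/y!$, $T(y)=y$, $\theta=\log\lambda$) on the hypothetical support $\{1,\dots,5\}$, with frequencies set exactly to $h(\cdot,\log 2.5)$, so the root should return $\lambda=2.5$.

```python
import math

def mean_T_under_h(theta, support, log_K, T):
    """Mean of T under the auxiliary law h(., theta) of (9).
    exp(-A(theta)) cancels between numerator and denominator, so only
    log K(y) + theta * T(y) is needed; the max is subtracted for stability."""
    logs = [log_K(y) + theta * T(y) for y in support]
    top = max(logs)
    w = [math.exp(v - top) for v in logs]
    return sum(wi * T(y) for wi, y in zip(w, support)) / sum(w)

def solve_score_equation(support, freqs, log_K, T, lo=-20.0, hi=20.0, tol=1e-12):
    """Solve (12) for the natural parameter theta by bisection.
    freqs are the counts n_i (or relative frequencies) on the support."""
    target = sum(c * T(y) for c, y in zip(freqs, support)) / sum(freqs)

    def g(th):
        return mean_T_under_h(th, support, log_K, T) - target

    if g(lo) > 0 or g(hi) < 0:
        raise ValueError("root of (12) not bracketed; widen [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Self-check on a Poisson law: K(y) = 1/y!, T(y) = y, theta = log(lambda).
lam = 2.5
support = [1, 2, 3, 4, 5]
pmf = [math.exp(-lam) * lam**y / math.factorial(y) for y in support]
freqs = [p / sum(pmf) for p in pmf]          # exactly h(., log 2.5)
theta_hat = solve_score_equation(support, freqs,
                                 log_K=lambda y: -math.lgamma(y + 1),
                                 T=lambda y: y)
print(math.exp(theta_hat))   # approximately 2.5
```

With real data, `freqs` would be the counts $n_i$ observed on the truncation, and the returned $\theta$ is mapped back to the usual parameter.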
To test the performance of the proposed approach, we use synthetic data sets generated by simulation from three probability laws: a binomial law, a normal density and a Gamma distribution. These examples were selected from various simulation studies with different families of probability distributions, in which the two methods proved effective and never deviated significantly from the true parameter. We use synthetic data sets because their true parameters are known, so that the accuracy of the results obtained by the two new methods can be assessed.

4.3 Examples

Binomial distribution. We generated a synthetic data set of size 500 from a binomial law $B(n,p)$ with $n=10$ and $p=0.3$, and denote by $f(y;p)=\binom{n}{y}p^y(1-p)^{n-y}$ its probability mass function. The data are summarized in the following table.

Table 1.

| $y_i$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| $n_i$ | 15 | 71 | 108 | 134 | 97 | 47 | 23 | 5 |

Our aim is to estimate the parameter $p$, with $n=10$ known, from different truncations of the data.

To illustrate the two methods, consider the truncation $\triangle=\{2,3,4,5\}$ with truncated sample size $n_t=386$. The truncation proportion is then $Q=100(n-n_t)/n=22.8\%$, where here $n=500$ is the complete sample size (not the binomial index $n=10$). For the first method, we search for the value of $p$ which minimizes the distance $d_v$, that is,

$$\min_p d_v(\hat f,f)=\min_p\sum_{\substack{y_i,y_j\in\triangle\\ i\neq j}}\left|\frac{f(y_i;p)}{f(y_j;p)}-\frac{n_i}{n_j}\right|.$$

Using a computer algebra package, we obtain $\tilde p_1=0.299$. For the second method, the empirical distribution $\tilde f$ given the truncation $\triangle=\{2,3,4,5\}$ is $\tilde f(2)=108/386$, $\tilde f(3)=134/386$, $\tilde f(4)=97/386$, $\tilde f(5)=47/386$, and $\tilde f(x)=0$ if $x\notin\triangle$.
The auxiliary distribution $h(\cdot,p)$ is given by

$$h(x,p)=\begin{cases}\dfrac{f(x,p)}{f(2,p)+f(3,p)+f(4,p)+f(5,p)} & \text{if } x\in\{2,3,4,5\},\\ 0 & \text{otherwise.}\end{cases} \qquad (13)$$

By the method of substitution, the estimate of $p$ is obtained by solving the equation

$$\sum_{u_i\in\{2,3,4,5\}}u_i\,h(u_i,p)=\sum_{u_i\in\{2,3,4,5\}}u_i\,\tilde f(u_i). \qquad (14)$$

Using a computer algebra package, we obtain $\tilde p_2=0.3$. The following table presents the estimates $\tilde p_1$, from the first method (minimum distance using $d_v$), and $\tilde p_2$, from the auxiliary distribution, of the parameter $p$ for known $n$, according to the truncation $\triangle=\{u_1,\dots,u_m\}$ considered.

Table 2. Estimates $\tilde p_1$ and $\tilde p_2$ by the new approach of the parameter $p$ of the binomial probability law $B(n,p)$ with $p=0.3$ and known $n=10$.

| No. | $\triangle$ | Truncated sample size $n_t$ | Proportion of truncation $Q$ (%) | $\tilde p_1$ | $\tilde p_2$ |
|---|---|---|---|---|---|
| 1 | {0, 1, 2, 3, 4, 5, 6, 7} | 500 | 0 | 0.305 | 0.298 |
| 2 | {0, 1, 2, 3, 4, 5} | 472 | 5.6 | 0.295 | 0.293 |
| 3 | {1, 2, 3, 4, 5} | 457 | 8.6 | 0.288 | 0.292 |
| 4 | {0, 1, 2, 3, 4} | 425 | 15 | 0.295 | 0.293 |
| 5 | {1, 2, 3, 4} | 410 | 18 | 0.287 | 0.292 |
| 6 | {0, 2, 3, 4, 5} | 401 | 19.8 | 0.295 | 0.298 |
| 7 | {2, 3, 4, 5} | 386 | 22.8 | 0.299 | 0.3 |
| 8 | {0, 1, 3, 4, 5} | 364 | 27.2 | 0.295 | 0.289 |
| 9 | {0, 2, 3, 4} | 354 | 29.2 | 0.295 | 0.301 |
| 10 | {1, 3, 4, 5} | 349 | 30.2 | 0.287 | 0.287 |
| 11 | {2, 3, 4} | 339 | 32.2 | 0.305 | 0.305 |
| 12 | {0, 3, 4, 5} | 293 | 41.4 | 0.295 | 0.293 |
| 13 | {2, 4, 5, 6, 7} | 280 | 44 | 0.308 | 0.307 |
| 14 | {0, 1, 2, 5, 6, 7} | 269 | 46.2 | 0.298 | 0.299 |
| 15 | {0, 1, 4, 5, 6, 7} | 258 | 48.4 | 0.3013 | 0.295 |
| 16 | {0, 4, 5, 6, 7} | 187 | 62.6 | 0.3071 | 0.302 |
| 17 | {0, 5, 6, 7} | 90 | 82 | 0.3014 | 0.301 |
| 18 | {0, 5} | 62 | 87.6 | 0.2937 | 0.294 |

As noted above, the two estimates by the new approach, $\tilde p_1$ and $\tilde p_2$, are accurate in all cases and close to each other.
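The two estimates in row 7 of Table 2 can be reproduced approximately with a short Python sketch. Assumptions: the minimum of $d_v$ is located by a two-stage grid search (the text uses a computer algebra package instead), and equation (14) is solved by bisection, which is valid because the mean of $h(\cdot,p)$ increases with $p$ (the binomial law with known $n$ is an exponential family in $\log\{p/(1-p)\}$ with $T(x)=x$). The helper names are illustrative.

```python
from itertools import permutations
from math import comb

N_TRIALS = 10                      # known binomial index n = 10
ys = [2, 3, 4, 5]                  # truncation Delta = {2, 3, 4, 5}
counts = [108, 134, 97, 47]        # Table 1 counts on Delta (n_t = 386)

def pmf(y, p):
    """Binomial B(10, p) probability mass f(y; p)."""
    return comb(N_TRIALS, y) * p**y * (1 - p) ** (N_TRIALS - y)

def dv_objective(p):
    """First method: sum over i != j of |f(y_i; p)/f(y_j; p) - n_i/n_j|."""
    f = [pmf(y, p) for y in ys]
    return sum(abs(f[i] / f[j] - counts[i] / counts[j])
               for i, j in permutations(range(len(ys)), 2))

def grid_argmin(fun, lo, hi, num):
    """Point of an evenly spaced grid of num points where fun is smallest."""
    grid = [lo + (hi - lo) * k / (num - 1) for k in range(num)]
    return min(grid, key=fun)

# Coarse grid (step 0.001), then a fine grid (step 1e-6) around its optimum.
p_coarse = grid_argmin(dv_objective, 0.01, 0.99, 981)
p1 = grid_argmin(dv_objective, p_coarse - 1e-3, p_coarse + 1e-3, 2001)

def truncated_mean(p):
    """Left-hand side of (14): mean of the auxiliary law h(., p) on Delta."""
    f = [pmf(y, p) for y in ys]
    return sum(y * fy for y, fy in zip(ys, f)) / sum(f)

# Right-hand side of (14): mean of the empirical law f~ on Delta.
target = sum(y * c for y, c in zip(ys, counts)) / sum(counts)

# truncated_mean is increasing in p, so bisection finds the unique root of (14).
lo, hi = 1e-6, 1 - 1e-6
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if truncated_mean(mid) < target:
        lo = mid
    else:
        hi = mid
p2 = 0.5 * (lo + hi)

print(round(p1, 3), round(p2, 3))   # compare with row 7 of Table 2
```

With these counts the sketch should give values close to $\tilde p_1=0.299$ and $\tilde p_2=0.3$. Replacing `ys` and `counts` by the values of another row of Table 2 gives the corresponding truncation; the coarse grid is a safeguard only for objectives that are unimodal, as this one is.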
Furthermore, the truncation proportion has no visible effect on the quality of the estimations. The two estimations are also insensitive to small cell probabilities, as for the truncations including the value $y_8 = 7$. Note that the classical maximum likelihood estimate without truncation is $\hat p = 0.297$, while our approach gives $\tilde p_1 = 0.3053$ for the first method and $\tilde p_2 = 0.2978$ for the second.

Normal distribution. Consider a sample of size 400 drawn from a normal population with mean $m = 0$ and standard deviation $\sigma = 1$, and consider the data falling in 11 fixed class intervals, with mid-classes $y_i$ and absolute frequencies $n_i$, as shown in the following table.

Table 3.

| $y_i$ | -2.581 | -2.06 | -1.533 | -1.009 | -0.485 | 0.039 | 0.563 | 1.086 | 1.610 | 2.134 | 2.658 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $n_i$ | 5 | 8 | 23 | 48 | 71 | 89 | 72 | 43 | 25 | 10 | 6 |

The number of bins can be selected by the optimal procedure developed by Birgé and Rozenholc [2]. Table 4 gives the estimates of $m$ and $\sigma$ obtained simultaneously by the minimum distance procedure with $d_v$; we denote them by $\tilde m_1$ and $\tilde\sigma_1$. In each row of the table, the estimates are computed from the frequency table restricted to the classes listed in the first column, $S$. The truncated sample size is denoted by $n_t$, so the truncation proportion in the data is $Q = 100(n - n_t)/n$.

Table 4.

| $S$ | $n_t$ | $Q$ (%) | $\tilde m_1$ | $\tilde\sigma_1$ |
|---|---|---|---|---|
| $\{y_1, \dots, y_{11}\}$ | 400 | 0 | 0.083 | 1.130 |
| $\{y_1, \dots, y_9\}$ | 384 | 4 | 0.003 | 1.092 |
| $\{y_2, \dots, y_9\}$ | 379 | 5.25 | 0.054 | 0.977 |
| $\{y_3, \dots, y_9\}$ | 371 | 7.25 | 0.052 | 0.993 |
| $\{y_4, \dots, y_9\}$ | 348 | 13 | 0.043 | 1.017 |
| $\{y_5, \dots, y_9\}$ | 300 | 25 | 0.052 | 1.012 |
| $\{y_3, y_4, y_5, y_6\}$ | 231 | 42.25 | 0.303 | 1.104 |
| $\{y_6, y_7, y_8, y_9\}$ | 229 | 42.75 | -0.225 | 1.140 |
| $\{y_6, y_7, y_8\}$ | 204 | 49 | -0.065 | 1.052 |
| $\{y_3, y_5, y_7\}$ | 166 | 58.5 | 0.052 | 0.993 |
| $\{y_2, y_3, y_4, y_5\}$ | 150 | 62.5 | -0.137 | 0.904 |
| $\{y_3, y_4, y_5\}$ | 142 | 64.5 | -0.151 | 0.893 |

Remark 9. In practice the bins would be chosen after obtaining the truncated sample, so the results should be more efficient. This does not affect the preceding results, which were obtained by grouping the whole sample and then truncating over the bins, since the aim is only to give some feel for the accuracy of the estimations. We can also avoid grouping the observations by using empirical frequencies obtained from kernel density estimates.

4.3.1 Gamma probability density

Consider a sample of size 800 drawn from a Gamma distribution $G(a, b)$ with density

$$ f(x \mid a, b) = \frac{1}{b^a \Gamma(a)} x^{a-1} \exp\left(-\frac{x}{b}\right), \quad x \ge 0, \qquad (15) $$

and parameters $a = 7$ and $b = 3$. Consider the data falling in 16 fixed class intervals, with mid-classes $u_i$ and absolute frequencies $n_i$, as shown in the following table.

Table 5.

| $u_i$ | 5.89 | 8.72 | 11.56 | 14.39 | 17.23 | 20.06 | 22.89 | 25.73 | 28.56 | 31.39 |
|---|---|---|---|---|---|---|---|---|---|---|
| $n_i$ | 11 | 40 | 60 | 108 | 118 | 104 | 100 | 74 | 63 | 53 |

| $u_i$ | 34.23 | 37.06 | 39.89 | 42.73 | 45.56 | 48.39 |
|---|---|---|---|---|---|---|
| $n_i$ | 27 | 21 | 11 | 5 | 3 | 2 |

The following table shows, for each truncation $\Delta$ considered, the estimates of the parameter $b$ obtained from the minimum of the distance $d_v$ ($\tilde b_1$) and by the second method ($\tilde b_2$), with $a = 7$ known.

Table 6. The estimations $\tilde b_1$ and $\tilde b_2$ by the new approach of the parameter $b$ of the Gamma probability distribution $G(a, b)$ with $b = 3$ and known $a = 7$.

| No. | $\Delta$ | $n_t$ | $Q$ (%) | $\tilde b_1$ | $\tilde b_2$ |
|---|---|---|---|---|---|
| 1 | $\{u_1, \dots, u_{16}\}$ | 800 | 0 | 3.018 | 3.054 |
| 2 | $\{u_2, \dots, u_{15}\}$ | 787 | 1.625 | 2.980 | 3.065 |
| 3 | $\{u_1, \dots, u_{12}\}$ | 779 | 2.625 | 3.012 | 3.068 |
| 4 | $\{u_1, \dots, u_{10}\}$ | 731 | 8.625 | 2.895 | 3.059 |
| 5 | $\{u_2, \dots, u_{10}\}$ | 720 | 10 | 3.063 | 3.075 |
| 6 | $\{u_3, \dots, u_{10}\}$ | 680 | 15 | 3.157 | 3.119 |
| 7 | $\{u_1, \dots, u_9\}$ | 678 | 15.25 | 2.864 | 3.002 |
| 8 | $\{u_2, \dots, u_9\}$ | 667 | 16.625 | 2.978 | 3.018 |
| 9 | $\{u_3, \dots, u_9\}$ | 627 | 21.625 | 3.086 | 3.062 |
| 10 | $\{u_1, \dots, u_8\}$ | 615 | 23.125 | 2.859 | 2.960 |
| 11 | $\{u_2, \dots, u_8\}$ | 604 | 24.5 | 2.908 | 2.977 |
| 12 | $\{u_4, \dots, u_9\}$ | 567 | 29.125 | 3.046 | 3.016 |
| 13 | $\{u_2, \dots, u_7\}$ | 530 | 33.75 | 2.908 | 2.978 |
| 14 | $\{u_2, u_3, u_4, u_5, u_{10}, \dots, u_{14}\}$ | 443 | 44.625 | 3.018 | 3.080 |
| 15 | $\{u_1, \dots, u_6\}$ | 441 | 44.875 | 2.775 | 2.894 |
| 16 | $\{u_1, u_2, u_3, u_4, u_8, u_9, u_{10}, u_{11}, u_{15}\}$ | 439 | 45.125 | 2.969 | 3.048 |
| 17 | $\{u_1, \dots, u_5, u_{11}, \dots, u_{16}\}$ | 406 | 50.75 | 3.018 | 3.031 |
| 18 | $\{u_1, \dots, u_5\}$ | 337 | 57.875 | 2.788 | 2.931 |
| 19 | $\{u_8, \dots, u_{16}\}$ | 259 | 67.625 | 2.990 | 3.212 |
| 20 | $\{u_{10}, \dots, u_{16}\}$ | 122 | 84.75 | 2.894 | 2.822 |

The estimations from the two methods are also accurate for the parameter $b$ of the gamma distribution. Here too, the truncation proportion does not noticeably affect the quality of the estimations. For the complete data, the classical estimate is $\hat b = 3.04$ and the two new estimates are $\tilde b_1 = 3.018$ and $\tilde b_2 = 3.054$.
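The $n_t$ and $Q$ columns of Table 6 follow directly from the class frequencies of Table 5. The short Python sketch below recomputes them for any truncation given by 1-based class indices; the helper name `truncation_summary` is ours and purely illustrative.

```python
# Absolute frequencies n_1, ..., n_16 of Table 5 (Gamma sample, n = 800).
TABLE5_FREQS = [11, 40, 60, 108, 118, 104, 100, 74, 63, 53, 27, 21, 11, 5, 3, 2]


def truncation_summary(indices, freqs=TABLE5_FREQS):
    """Return (n_t, Q) for a truncation given by 1-based class indices.

    n_t is the number of observations kept and Q = 100 (n - n_t) / n is the
    truncation proportion in percent.
    """
    n = sum(freqs)
    n_t = sum(freqs[i - 1] for i in indices)
    return n_t, 100.0 * (n - n_t) / n


# Row 19 of Table 6: classes u_8, ..., u_16.
print(truncation_summary(range(8, 17)))                     # (259, 67.625)
# Row 16 of Table 6: {u_1, u_2, u_3, u_4, u_8, u_9, u_10, u_11, u_15}.
print(truncation_summary([1, 2, 3, 4, 8, 9, 10, 11, 15]))   # (439, 45.125)
```

Passing the frequencies of Table 3 as `freqs` gives the $n_t$ and $Q$ columns of Table 4 in the same way.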
As noticed in the examples above, the two methods lead to approximately the same estimates. If the two estimations are significantly different, this seems to be related to the quality of the selected data. An important feature of the new approach is that, in these examples, the quality of the estimations is not influenced by the truncation proportion. The following section gives further insight into the new approach.

4.4 A Basic Feature of the New Approach

The preceding results have shown the effectiveness of the new approach, which also worked well in simulation experiments. The proposition below gives insight into a major feature of the approach by considering the one-parameter exponential family. We prove that, for any truncation formed by at least two points from a sample of observations, if the ratios of the relative frequencies of the $u_i$ are equal to the theoretical ones, then we obtain the true value of the parameter. We conjecture that, for an arbitrary probability law depending on $r$ parameters, a truncation composed of $r + 1$ points having exact empirical ratios of relative frequencies yields the true values of the $r$ parameters.

Proposition 10. Consider a probability distribution $f$ from the one-parameter exponential family with density

$$ f(x, \theta) = K(x) \exp[\theta T(x) - A(\theta)], \qquad (16) $$

where $\theta \in \mathbb{R}$ is the parameter, $T$ a statistic, $K(x)$ a function of $x$ and $A$ a function of the parameter $\theta$. Assume that we wish to estimate $\theta$. If we consider a truncation having two points $x$ and $y$ with empirical frequencies $f_1$ and $f_2$ satisfying $f_1/f_2 = f(x, \theta)/f(y, \theta)$, then, using the approach considered here, we obtain the true value of $\theta$.

Proof.
1. If we consider the minimum of the distance $d_v$, the result is immediate: with two points, $d_v$ vanishes when the empirical ratio equals the theoretical one, and $f(x, \theta)/f(y, \theta) = [K(x)/K(y)] \exp\{\theta [T(x) - T(y)]\}$ is strictly monotone in $\theta$ when $T(x) \ne T(y)$.
2. Consider now the second method to estimate $\theta$. Let $f_1$ and $f_2$ be the relative frequencies of $x$ and $y$ on the truncation ($f_1 + f_2 = 1$), with $f_1/f_2 = f(x, \theta)/f(y, \theta)$ for the true value $\theta$, and let $\tilde\theta$ denote the estimate given by the new approach. Since $A(\theta)$ cancels in the ratio, we obtain

$$ \bar u = x f_1 + y f_2 = \frac{x K(x) e^{\theta T(x)} + y K(y) e^{\theta T(y)}}{K(x) e^{\theta T(x)} + K(y) e^{\theta T(y)}}. $$

We then solve in $\tilde\theta$ the equation

$$ (x - \bar u) K(x) e^{\tilde\theta T(x)} + (y - \bar u) K(y) e^{\tilde\theta T(y)} = 0. $$

Since $x - \bar u = (x - y) K(y) e^{\theta T(y)}/D$ and $y - \bar u = (y - x) K(x) e^{\theta T(x)}/D$, where $D = K(x) e^{\theta T(x)} + K(y) e^{\theta T(y)}$, straightforward algebra gives

$$ (x - y) \exp[\theta T(y) + \tilde\theta T(x)] + (y - x) \exp[\theta T(x) + \tilde\theta T(y)] = 0, $$

that is, $(\tilde\theta - \theta)[T(x) - T(y)] = 0$, which yields the true value $\tilde\theta = \theta$ when $T(x) \ne T(y)$. The proof is complete.

Remark 11. Note that the frequencies $f_1$ and $f_2$ need not be exact, that is, $f_1$ may differ from $f(x, \theta)$ and $f_2$ from $f(y, \theta)$; we only require that their ratio equals the theoretical one, $f(x, \theta)/f(y, \theta)$.

Examples

Binomial distribution. Consider again the binomial distribution $B(n, p)$ with $n = 10$ and $p = 0.3$, and assume that $n$ is known and we wish to estimate $p$. Assume we have the truncation with only two points $\Delta = \{0, 1\}$. The exact ratio of their probabilities is $f(0, p)/f(1, p) = (1 - p)/(10p) = 7/30$, a rational value that will simplify the example. Choose the absolute frequencies of the two values as $n_1 = 7$ and $n_2 = 30$ for $u_1 = 0$ and $u_2 = 1$ respectively, so that $f_1/f_2 = f(0, p)/f(1, p) = 7/30$.
Using the first approach, that of the minimum of the distance $d_v$, we have to solve

$$ \min_p d_v(\hat f, f) = \min_p \left( \left| \frac{C_{10}^0 (1-p)^{10}}{C_{10}^1 p (1-p)^9} - \frac{7}{30} \right| + \left| \frac{C_{10}^1 p (1-p)^9}{C_{10}^0 (1-p)^{10}} - \frac{30}{7} \right| \right), $$

and we get the true value $\tilde p_1 = 0.3$. Using the second method, we have to solve the following equation in $p$:

$$ \frac{0 \times C_{10}^0 (1-p)^{10} + 1 \times C_{10}^1 p (1-p)^9}{C_{10}^0 (1-p)^{10} + C_{10}^1 p (1-p)^9} = \frac{30}{37}, $$

and we also obtain the exact result $\tilde p_2 = 0.3$.

Gamma distribution. Consider the Gamma probability distribution $G(a, b)$ with $a = 10$ and $b = 5$. Assume that $a$ is known and we wish to estimate $b$. Consider the truncation $\Delta = \{u_3, u_8\}$ with $u_3 = 30.13$ and $u_8 = 60.02$. The ratio of the densities is $f(u_3, b)/f(u_8, b) \approx 0.799$ (an approximate value, since for probability density functions it is difficult to obtain an exact rational value; we will nevertheless show that the estimations are very close to the true value). Consider the absolute frequencies $n_3 = 79.93$ (or 80) and $n_8 = 100$ for $u_3 = 30.13$ and $u_8 = 60.02$ respectively, so that $n_3/n_8 \approx f(u_3, b)/f(u_8, b)$. Using the minimum of the distance $d_v$, we have to solve

$$ \min_b d_v(\hat f, f) = \min_b \left( \left| \frac{79.93}{100} - \left(\frac{30.13}{60.02}\right)^{9} \exp\left(-\frac{30.13 - 60.02}{b}\right) \right| + \left| \frac{100}{79.93} - \left(\frac{60.02}{30.13}\right)^{9} \exp\left(-\frac{60.02 - 30.13}{b}\right) \right| \right), $$

and we get $\tilde b_1 \approx 5$. For the second method, we compute $\bar u = 46.7438$ and solve in $b$ the equation

$$ (30.13 - 46.7438) \times 30.13^{9} \, e^{-30.13/b} + (60.02 - 46.7438) \times 60.02^{9} \, e^{-60.02/b} = 0. $$

The result is $\tilde b_2 \approx 5$.

Now assume that both parameters $a$ and $b$ are unknown; we show how to estimate them jointly using the new approach.
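Before turning to the two-parameter case, the two one-parameter examples above can be checked numerically. The Python sketch below (standard library only) minimizes the two-point form of $d_v$ written out above by golden-section search and solves the second-method equation by bisection. Both searches rely on the theoretical ratio being monotone in the parameter, which holds in these two examples. The helper names and the bracketing intervals are our own assumptions, not part of the paper.

```python
import math


def dv_two_point(theo_ratio, emp_ratio):
    """Two-point d_v as written out in the examples:
    |f(x)/f(y) - f1/f2| + |f(y)/f(x) - f2/f1|."""
    return abs(theo_ratio - emp_ratio) + abs(1.0 / theo_ratio - 1.0 / emp_ratio)


def golden_min(fun, lo, hi, tol=1e-10):
    """Golden-section search for the minimiser of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if fun(c) < fun(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0


def bisect_root(fun, lo, hi, tol=1e-12):
    """Bisection for a root of fun on [lo, hi]; assumes a sign change on the bracket."""
    f_lo = fun(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = fun(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def binom_ratio(p):
    """f(0, p) / f(1, p) for B(10, p)."""
    return math.comb(10, 0) * (1 - p) ** 10 / (math.comb(10, 1) * p * (1 - p) ** 9)


# Binomial on {0, 1} with n_1 = 7 and n_2 = 30.
p1 = golden_min(lambda p: dv_two_point(binom_ratio(p), 7 / 30), 0.01, 0.99)
# Second method: the auxiliary mean f(1)/(f(0) + f(1)) must equal 30/37.
p2 = bisect_root(lambda p: 1.0 / (1.0 + binom_ratio(p)) - 30 / 37, 0.01, 0.99)

# Gamma G(10, b) on {u_3, u_8} with n_3 = 79.93 and n_8 = 100 (a = 10 known).
U3, U8, A = 30.13, 60.02, 10.0
U_BAR = 46.7438  # weighted mean of the two points, as given in the text


def gamma_ratio(b):
    """f(u_3, b) / f(u_8, b) = (u_3/u_8)^(a-1) * exp(-(u_3 - u_8)/b)."""
    return (U3 / U8) ** (A - 1) * math.exp(-(U3 - U8) / b)


b1 = golden_min(lambda b: dv_two_point(gamma_ratio(b), 79.93 / 100), 1.0, 20.0)
b2 = bisect_root(
    lambda b: (U3 - U_BAR) * U3 ** (A - 1) * math.exp(-U3 / b)
    + (U8 - U_BAR) * U8 ** (A - 1) * math.exp(-U8 / b),
    1.0, 20.0)

# p1 and p2 should be close to 0.3; b1 and b2 close to 5.
print(p1, p2, b1, b2)
```

Golden-section search is a reasonable choice here only because each $d_v$ is unimodal in a single parameter. With more points or parameters, a general-purpose optimizer would be the natural replacement.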
Since there are now two unknown parameters, we need three points from the support, so consider $u_1 = 34.7702$, $u_2 = 57.5008$ and $u_3 = 74.5487$ with corresponding absolute frequencies $n_1 = 102$, $n_2 = 100$ and $n_3 = 34$. We have to find $a$ and $b$ that minimize the distance $d_v$, that is, $\min_{a,b} d_v(\hat f, f)$. The result is $\tilde a \approx 10.0454$ and $\tilde b \approx 4.9739$.

5 Elements of Comparison with the Classical Approach

Our aim here is not to give a detailed comparison study, which needs to be investigated thoroughly, but only some elements of appreciation. A major feature that distinguishes this new approach from others is that, when we have exact ratios of frequencies, we obtain the true parameter, and as the difference from the theoretical ratios decreases, the quality of estimation increases, even if we use only part of the sample of observations. This is not the case for classical approaches, in which quality is assessed only through the mean properties of estimators or their asymptotic behaviour. By combining the two proposed methods we in fact have a criterion that applies to the sample at hand. Another characteristic is that, in the examples considered, the proportion of truncation has no noticeable effect on the quality of the estimations.

The first method uses the well-known minimum distance method, but with a new distance that has the important advantage of being symmetric, a property which many traditional distances lack. However, the estimations are obtained implicitly in this case, so it is difficult to find explicit expressions and study their properties in order to compare them with classical ones. With the new distance we hope to obtain fast-converging estimators, since we expect the influence of errors in the frequencies to be slight, as we are using ratios of frequencies. Consider now the second method of the new approach.
We use classical estimation procedures, such as the maximum likelihood principle, applied to the auxiliary distribution. We can then obtain the estimators and study their properties in the usual way, thereby preserving the advantages of classical methods. In the classical approach, given a sample, the estimates of certain parameters such as the mean and variance do not change with the family of parent distributions. That information is not used, which is a disadvantage of the classical approach. In the new approach, however, the estimates of the mean and variance change according to the distribution from which the data emanated. The following two examples show the effectiveness of using the auxiliary distribution.

Example. Consider the following frequency table:

Table 7.

| $x_i$ | 2 | 3 | Total |
|---|---|---|---|
| $n_i$ | $n_1$ | $n_2$ | $n$ |
| $\hat f(x_i) = f_i$ | $f_1 = n_1/n$ | $f_2 = n_2/n$ | 1 |

Any sample of observations that satisfies the preceding frequency table may come from one of the following distributions:

$$ g_1(x) = \begin{cases} x/6 & \text{if } x \in \{1, 2, 3\}, \\ 0 & \text{otherwise,} \end{cases} \qquad g_2(x) = \begin{cases} (x - 1)/6 & \text{if } x \in \{2, 3, 4\}, \\ 0 & \text{otherwise.} \end{cases} $$

Intuitively, deciding which of the two distributions is more appropriate for Table 7 depends on the values $n_1$ and $n_2$ (or $f_1$ and $f_2$). However, classical maximum likelihood concludes that the observations were generated from $g_1$ whatever the values of $n_1$ and $n_2$, since

$$ \left(\tfrac{1}{6}\right)^{n_1} \left(\tfrac{2}{6}\right)^{n_2} < \left(\tfrac{2}{6}\right)^{n_1} \left(\tfrac{3}{6}\right)^{n_2}. $$

We will show that the new approach gives a more relevant decision. First determine the auxiliary distributions $h_1$ and $h_2$, based on the truncation $\Delta = \{2, 3\}$, for $g_1$ and $g_2$ respectively. We obtain

$$ h_1(x) = \begin{cases} 2/5 & \text{if } x = 2, \\ 3/5 & \text{if } x = 3, \\ 0 & \text{otherwise,} \end{cases} \qquad h_2(x) = \begin{cases} 1/3 & \text{if } x = 2, \\ 2/3 & \text{if } x = 3, \\ 0 & \text{otherwise.} \end{cases} $$
Using maximum likelihood for $h_1$ and $h_2$, we decide according to the quantities $(2/5)^{n_1} (3/5)^{n_2}$ and $(1/3)^{n_1} (2/3)^{n_2}$. Consider the inequality

$$ \left(\tfrac{2}{5}\right)^{n_1} \left(\tfrac{3}{5}\right)^{n_2} \le \left(\tfrac{1}{3}\right)^{n_1} \left(\tfrac{2}{3}\right)^{n_2}. $$

Dividing both sides by the right-hand side and taking the $n$-th root, it is equivalent to $(6/5)^{\alpha} (9/10)^{1 - \alpha} \le 1$, where $\alpha = n_1/n = f_1$. Taking logarithms and using $\log(6/5) - \log(9/10) = \log(4/3)$, we obtain

$$ 0 < \alpha \le -\frac{\log(9/10)}{\log(4/3)} = x_0 \approx 0.36624. $$

If $0 < \alpha < x_0$, we decide that the data were generated from $g_2$, and if $x_0 < \alpha < 1$, that they were generated from $g_1$. No decision can be made in the case $\alpha = x_0$.

Example. Consider a binomial distribution with parameters $n = 4$ and unknown $p$, from which we take several samples of size 15, given in Table 8 by their absolute frequencies and chosen so as to have $\bar x = 8/15$.

Table 8.

| Sample | 0 | 1 | 2 | 3 | 4 | $\hat p$ | $\tilde p$ |
|---|---|---|---|---|---|---|---|
| 1 | 7 | 8 | 0 | 0 | 0 | 0.133 | 0.222 |
| 2 | 9 | 5 | 0 | 1 | 0 | 0.133 | 0.184 |
| 3 | 9 | 4 | 2 | 0 | 0 | 0.133 | 0.139 |
| 4 | 10 | 3 | 1 | 1 | 0 | 0.133 | 0.134 |
| 5 | 10 | 4 | 0 | 0 | 1 | 0.133 | 0.216 |
| 6 | 12 | 0 | 2 | 0 | 1 | 0.133 | 0.196 |
| 7 | 13 | 0 | 0 | 0 | 2 | 0.133 | 0.385 |

Clearly the samples do not carry the same information; nevertheless, the classical estimation method gives the same estimate $\hat p = 8/(15 \times 4) \approx 0.133$ for all of them. With the second method of the new approach, we have to solve the following equation for each sample:

$$ 0 \times h(0, p) + 1 \times h(1, p) + 2 \times h(2, p) + 3 \times h(3, p) + 4 \times h(4, p) = \bar x, $$

where $h(x, p)$ is the corresponding auxiliary distribution. The estimates given by the new method differ from one sample to another, as shown in the last column of Table 8, which is natural since each sample provides different information about the parent distribution. Using the minimum of the distance $d_v$ leads to the same conclusion.
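Both examples of this section reduce to a few lines of arithmetic. The Python sketch below computes the threshold $x_0$ and the resulting choice between $g_1$ and $g_2$, then solves the second-method equation for each sample of Table 8 by bisection (the auxiliary mean is increasing in $p$). As an assumption on our part, the truncation for each sample is taken to be the set of values with a non-zero frequency; with that choice the computed estimates agree with the last column of Table 8 to about three decimals. All function names are illustrative.

```python
import math

# Example 1: g1 versus g2 through the auxiliary distributions on {2, 3}.
X0 = -math.log(9 / 10) / math.log(4 / 3)  # threshold on alpha = n1 / n


def choose_g(n1, n2):
    """Compare the likelihoods under h1 = (2/5, 3/5) and h2 = (1/3, 2/3).

    Returns 'g1', 'g2', or None when the likelihoods are equal (alpha == x0).
    """
    log_h1 = n1 * math.log(2 / 5) + n2 * math.log(3 / 5)
    log_h2 = n1 * math.log(1 / 3) + n2 * math.log(2 / 3)
    if math.isclose(log_h1, log_h2):
        return None
    return "g2" if log_h2 > log_h1 else "g1"


# Example 2: binomial B(4, p), second method on each sample of Table 8.
TABLE8 = [
    [7, 8, 0, 0, 0], [9, 5, 0, 1, 0], [9, 4, 2, 0, 0], [10, 3, 1, 1, 0],
    [10, 4, 0, 0, 1], [12, 0, 2, 0, 1], [13, 0, 0, 0, 2],
]


def auxiliary_mean(p, support, m=4):
    """Mean of h(x, p) = f(x, p) / sum_{u in support} f(u, p) for f = B(m, p)."""
    w = [math.comb(m, x) * p ** x * (1 - p) ** (m - x) for x in support]
    return sum(x * wx for x, wx in zip(support, w)) / sum(w)


def second_method_binomial(freqs, m=4, tol=1e-12):
    """Solve auxiliary_mean(p) = sample mean by bisection on (0, 1).

    The truncation is assumed to be the set of values with a non-zero frequency.
    """
    support = [x for x, n in enumerate(freqs) if n > 0]
    xbar = sum(x * n for x, n in enumerate(freqs)) / sum(freqs)
    lo, hi = tol, 1.0 - tol
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if auxiliary_mean(mid, support, m) < xbar:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(X0, 5), choose_g(3, 7), choose_g(5, 5))  # 0.36624 g2 g1
print([round(second_method_binomial(row), 3) for row in TABLE8])
```

For sample 1 the equation can also be solved by hand: on $\{0, 1\}$ it reads $4p/(1 + 3p) = 8/15$, whose solution is $p = 2/9 \approx 0.222$.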
6 Perspectives for the New Approach

6.1 Model Selection From Truncated Data

Because the distance $d_v$ is a metric, this new measure lends itself to various applications. One is model selection among different probability families. We choose two or more candidate parametric families of distributions and, for each family, estimate the parameters to obtain a specific candidate. We then compute the distance between each specific candidate and the empirical distribution using the new metric $d_v$, and finally select the family that yields the minimum distance. With the new approach this can also be done for truncated data, as opposed to classical approaches to model selection (see for example Cox [3], [4] and Taylor and Jakeman [16]), which, to the best of our knowledge, can be used only for complete data.

To investigate this perspective thoroughly, samples of various sizes from known distributions should be simulated and the model selection method applied to each; by scoring each selection as correct or not and repeating the process a large number of times, the probability of correct selection could be estimated for a given sample size. The new distance can also be used when classical goodness-of-fit tests cannot reject either of two candidate families: we then choose the one which yields the minimum of the distance $d_v$. In the following examples we first select between binomial distributions from truncated data, and then between a Weibull and a Gamma distribution from right-truncated data.

Selection between Binomial distributions. We simulated 10000 samples of size 100 from a Binomial distribution $B(8, 0.1)$ and each time retained only the observations belonging to $\{0, 1, 2, 3\}$, with their frequencies.
We then tried to identify the simulated law from the corresponding table of frequencies. We used the distance $d_v$ to select between the original distribution of each simulated sample and the distribution $B(10, 0.15)$, and scored the selection as correct if the distance between the empirical distribution and the original one was smaller than the distance to the alternative, $B(10, 0.15)$. The correct distribution was selected in 98.8% of the cases. Conversely, we simulated 10000 samples of size 100 from a Binomial distribution $B(10, 0.15)$ and selected between it and $B(8, 0.1)$; the correct distribution was selected in 99.43% of the cases.

Selection between Weibull and Gamma distributions. We simulated 10000 samples of size 1000 from the Weibull distribution $W(1.2, 1.5)$ and truncated them on the right by considering only observations above the cut-off 1.25. Each truncated sample was summarized into 11 classes. We selected between $W(1.2, 1.5)$ and the Gamma distribution $G(2, 0.5)$. The distance $d_v$ selected the correct distribution, $W(1.2, 1.5)$, in 98.16% of the cases. Before selecting between distributions, we can also find the best fit within the family of gamma distributions $G(a, b)$ to the truncated data from a given probability density, say $W(1.2, 1.5)$. We then have to find the minimum of a function of two variables, $\min_{a,b} d_v(\hat f, f)$, where $\hat f$ is the empirical distribution and $f \equiv G(a, b)$, using well-known methods such as Levenberg-Marquardt in a computer algebra package. It would also be preferable to choose the number of bins for each truncated sample by an optimal procedure, for example that of Birgé and Rozenholc [2].

6.2 Estimation of the Initial Trial Value in the EM Algorithm

The initial starting value is of great importance for the convergence behaviour of algorithms such as the EM algorithm.
For such algorithms the initial trial value is usually guessed. We will show that our procedure gives an estimate of the starting value instead of having to guess it. The approach is illustrated by the following classical example, which was the basis of the EM algorithm.

Example of Hartley (1958) revisited. Hartley [7] used an algorithmic procedure to estimate the parameter of a Poisson distribution from data, quoted from Snedecor [15], on the contamination of a type of seed by noxious weed seeds, which he truncated by treating the frequencies of the values 0 and 1 as missing, as shown in Table 9 (Table 1 in Hartley [7]).

Table 9.

| Values | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| $n_i$ | missing | missing | 26 | 16 | 18 | 9 | 3 | 5 | 1 |

Hartley [7] guessed the frequencies of the missing values 0 and 1 by taking $n_0 = 4$ and $n_1 = 14$ and, after 4 steps of his algorithmic procedure, which became the basis of the well-known EM algorithm for incomplete data (Dempster, Laird and Rubin [5]), reached the estimate $\hat\lambda = 3.026$ (see Table 1, p. 177, Hartley [7]). Using the second method, we get the estimate $\tilde\lambda_2 = 3.1149$, and by a proportional allocation procedure the frequencies we get are $n_0 = 4.29$ and $n_1 = 13.38$, which are close to the guessed values. Using the distance $d_v$ we obtain the estimate $\tilde\lambda_1 = 3.8447$, and by removing the last value, 9, which has a small frequency (1), we obtain a better result, $\tilde\lambda_1 = 3.4441$. These are also reasonable starting values, since in practice the true parameter is unknown.

Initial trial value for mixtures of normal populations. We now apply the previous method for truncated data to the situation where we have a mixture population of two normal distributions.
In classical methods, we use the merged distribution $f = \alpha f_1 + (1 - \alpha) f_2$ and estimate the parameters $\alpha$, $m_1$ and $m_2$ using, for example, the EM algorithm, which maximizes the complete likelihood of the merged distribution by an algorithmic procedure starting from a guessed initial trial value. However, the occurrence of several local maxima is a well-known problem in the setting of the EM algorithm. Also, Seidel, Mosler and Alker [14] pointed out that the likelihood-ratio test in mixture models depends on the choice of the initial trial value for the EM algorithm. If the initial trial value is close to the true value, the algorithm can be expected to converge in a few steps to the correct local maximum. We will show that the new approach gives an accurate estimate of the initial trial value.

Assume we have a merged sample built from two samples of observations of sizes $n_1$ and $n_2$ from two normal distributions $f_1 = N(m_1, \sigma_1)$ and $f_2 = N(m_2, \sigma_2)$, with $m_1 \ne m_2$. Assuming that $\sigma_1$ and $\sigma_2$ are known, our aim is to estimate the means $m_1$ and $m_2$, and also the mixing proportion $\alpha$. We use a method based on truncations. The main idea is to split the range of the merged sample into three suitably chosen parts: a central part where the observations are highly merged, and left and right truncated parts where the observations come mainly from one of the two distributions. If, for example, $m_1 < m_2$, then to estimate $m_1$ we use the chosen right-truncated part (left truncation $\Delta$). The procedure is summarized as follows:

1. Compute the sample mean $m_g$ of the merged observations.
2. To determine the location of the two means $m_1$ and $m_2$, compute the empirical standard deviation $S_l$ of the observations less than $m_g$, and $S_r$ of those that are greater.
   Assume that $S_l < S_r$; in this case, if $\sigma_1 < \sigma_2$, we deduce that $m_1$ is situated to the left of $m_g$; otherwise, it is assumed to be on its right. The same reasoning applies when $S_l > S_r$. If $\sigma_1 = \sigma_2$, we pass directly to the third step.
3. Assume that $m_1$ is on the left. It is well known that for a normal distribution $N(m, \sigma)$ we have $P(]m - \sigma, m + \sigma[) \simeq 0.68$. We expect that, to the left of $\mathrm{sup}_l = m_g - \sigma_2$, the number of observations generated from $N(m_2, \sigma_2)$ is negligible, and that, to the right of $\mathrm{min}_r = m_g + \sigma_1$, the number of observations generated from $N(m_1, \sigma_1)$ is also negligible. Hence, to estimate $m_1$ we consider only the observations situated to the left of $m_g - \sigma_2$, and to estimate $m_2$ we consider those situated to the right of $m_g + \sigma_1$.

The following example gives some feel for the accuracy of the procedure.

Example. We consider the case $\sigma_1 = \sigma_2$. Consider two samples of observations generated from $N(m_1, \sigma_1)$ and $N(m_2, \sigma_2)$, where $m_1 = 1.3$ and $m_2 = 2.4$, with known $\sigma_1 = \sigma_2 = 1$ and sizes $n_1 = 300$ and $n_2 = 200$. We combine them to obtain a merged sample of size $n = 500$. The distributions were chosen so that the histogram of the merged sample (Fig. 1) does not directly reveal the existence of a mixture of two distributions. When the histogram of the merged population is bimodal the situation is easier, since a suitable left (or right) part contains a negligible number of observations from the second distribution, which gives a more accurate estimate.

Fig. 1. Merged histogram of two normal distributions $N(1.3, 1)$ and $N(2.4, 1)$. It should be stressed that the histogram is unimodal and does not suggest, at first glance, any mixture.
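The three steps above, followed by the second method applied to the retained part, can be sketched in Python as follows. This is a minimal illustration under stated assumptions. `split_merged_sample` handles the case where $m_1$ lies to the right of $m_g$ by mirroring step 3, which the text does not spell out, and with equal sigmas it simply labels the left component $m_1$. `solve_normal_mean` solves, for known $\sigma$, the equation stating that the mean of the auxiliary normal distribution on the class midpoints equals the sample mean of the part (the equation numbered (17) in the worked example that follows). `mixing_proportion` solves $\alpha m_1 + (1 - \alpha) m_2 = m_g$. We read the fifth midpoint of Table 10 as $+0.1593$, consistent with its equal class width of about 0.4295. All names are hypothetical.

```python
import math
import statistics


def split_merged_sample(xs, sigma1, sigma2):
    """Steps 1-3: return (m_g, part used for m1, part used for m2)."""
    m_g = statistics.fmean(xs)                                   # step 1
    s_l = statistics.stdev([x for x in xs if x < m_g])           # step 2
    s_r = statistics.stdev([x for x in xs if x > m_g])
    # Smaller spread on the side of the smaller sigma; with equal sigmas we
    # label the left component as m1 (our convention).
    m1_left = True if sigma1 == sigma2 else ((s_l < s_r) == (sigma1 < sigma2))
    if m1_left:                                                  # step 3
        part1 = [x for x in xs if x < m_g - sigma2]
        part2 = [x for x in xs if x > m_g + sigma1]
    else:  # mirror image of step 3 (our extrapolation)
        part1 = [x for x in xs if x > m_g + sigma2]
        part2 = [x for x in xs if x < m_g - sigma1]
    return m_g, part1, part2


def solve_normal_mean(midpoints, target_mean, sigma, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve for m: the mean of the auxiliary N(m, sigma) on `midpoints` equals target_mean.

    The auxiliary mean is increasing in m, so bisection applies as long as
    target_mean lies strictly between min(midpoints) and max(midpoints).
    """
    def aux_mean(m):
        w = [math.exp(-(u - m) ** 2 / (2 * sigma ** 2)) for u in midpoints]
        return sum(u * wi for u, wi in zip(midpoints, w)) / sum(w)

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if aux_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def mixing_proportion(m1, m2, m_g):
    """Solve alpha * m1 + (1 - alpha) * m2 = m_g for alpha."""
    return (m2 - m_g) / (m2 - m1)


# Class midpoints of the left part (Table 10) and its sample mean 0.1483.
LEFT_U = [-1.5589, -1.1294, -0.6998, -0.2703, 0.1593, 0.5888]
m1_hat = solve_normal_mean(LEFT_U, 0.1483, sigma=1.0)
alpha_hat = mixing_proportion(1.3011, 2.412, 1.8046)
print(m1_hat, alpha_hat)
```

With these inputs the sketch gives about 1.2646 for $\tilde m_1$ (true value 1.3). Plugging the refined estimates $\tilde m_1 = 1.3011$ and $\tilde m_2 = 2.412$ into the mixing equation gives $\alpha \approx 0.547$, against a true proportion of $300/500 = 0.6$. Such values are meant as EM starting points, not final estimates.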
Following the steps of the procedure, we begin by calculating the mean of the merged sample and obtain $m_g = 1.8046$. Since the standard deviations are assumed to be equal, we compute directly $\mathrm{sup}_l = m_g - \sigma_2 = 0.8046$. Grouping the observations to the left of $\mathrm{sup}_l$ (which constitute the chosen right-truncated part) in 7 classes, we obtain the following table:

Table 10.

| $u_i$ | -1.5589 | -1.1294 | -0.6998 | -0.2703 | 0.1593 | 0.5888 |
|---|---|---|---|---|---|---|
| $n_i$ | 1 | 3 | 6 | 17 | 24 | 41 |

Using the distance $d_v$ on the whole truncation we obtain $\tilde m_1^{(d_v)} = 1.244$, and by deleting $u_1$ we get $\tilde m_1^{(d_v)} = 1.2516$. The sample mean of the observations to the left of $\mathrm{sup}_l$ is $\bar u_l = 0.1483$. With the second method we have to solve in $m$ the equation

$$ \frac{\sum_{i=1}^{k} u_i \exp\left[-\frac{(u_i - m)^2}{2\sigma^2}\right]}{\sum_{i=1}^{k} \exp\left[-\frac{(u_i - m)^2}{2\sigma^2}\right]} = \bar u_l, \qquad (17) $$

which gives the estimate $\tilde m_1 = 1.2646$. By deleting the first value $u_1$, which has a small frequency $n_1 = 1$, that is, using the truncation $\Delta = \{u_2, u_3, u_4, u_5, u_6\}$ (for which $\bar u_l = 0.1734$), we obtain a better estimate, $\tilde m_1 = 1.3011$, which is very close to the true value $m_1 = 1.3$.

To estimate $m_2$, we consider the part situated to the right of $\mathrm{min}_r = m_g + \sigma_1 = 2.8046$. Grouping the observations to the right of $\mathrm{min}_r$ (which constitute the chosen right part) in 7 classes, we obtain the following table:

Table 11.

| $u_i$ | 2.979 | 3.316 | 3.653 | 3.990 | 4.326 | 4.663 |
|---|---|---|---|---|---|---|
| $n_i$ | 38 | 25 | 15 | 9 | 7 | 3 |

Using the distance $d_v$ on the whole truncation we get $\tilde m_2^{(d_v)} = 2.397$. The sample mean of the observations in the right part is $\bar u_d = 3.523$. Using formula (17) with $\bar u_d$, we obtain $\tilde m_2 = 2.245$. Deleting the extreme values $u_1$ and $u_6$, we obtain $\tilde m_2 = 2.412$.

The mixing proportion $\alpha$ can easily be estimated from $\alpha \tilde m_1 + (1 - \alpha) \tilde m_2 = m_g$. Since the estimates obtained are close to the true values of $m_1$ and $m_2$, the EM algorithm started from them can be expected to converge quickly to the correct solution.

6.3 Test of Goodness of Fit Based on the New Distance

We can obtain empirical quantiles of $d_v$ by Monte Carlo or bootstrap techniques and use them in a goodness-of-fit test for a specified probability distribution. We simulate $N$ samples of the same size from the specified probability distribution and compute the distances $d_v^{(1)}, \dots, d_v^{(N)}$. We can then estimate the sampling distribution of $d_v$ at that sample size by

$$ F_{d_v}(d) = \frac{\#\{i : d_v^{(i)} < d\}}{N}. \qquad (18) $$

Consequently, for an observed sample of the same size we compute $d_v^{(obs)}$ and reject the hypothesis that it comes from the specified distribution if $F_{d_v}(d_v^{(obs)}) > 1 - \alpha$, for a given level of significance $\alpha$. The values $d_v^{(1)}, \dots, d_v^{(N)}$ may also be obtained by resampling from the empirical distribution function $F_n$ of the sample.

6.4 Quality of Data

Since the new measure $d_v$ is not equivalent to classical ones, it captures aspects of the data that the latter do not. This may open new perspectives, such as judging the accuracy of an estimate in cases where the classical and new estimates are close to each other. In cases where the classical estimate and the new one using $d_v$ are significantly different, we can say that the sample of observations considered does not coherently carry all the necessary information about the parent distribution from which it emanated.

7 Concluding Remarks

In the foregoing study, we have presented a new statistical point estimation method which has proved useful in truncated, grouped and censored data situations. A new distance between probability distributions was introduced.
It measures the difference between the variations of two given probability distributions. We introduced an auxiliary distribution based on a truncation, from a chosen family of probability distributions; this new distribution has the same parameters to estimate as the parent one. We then use statistical methods to estimate the parameters of the random variable under study from the empirical and auxiliary distributions in the region that captures the data, which determines the corresponding parent distribution; this is the estimate given by the new method. Using the new distance, we also estimate the parameters by the minimum distance approach and use the resulting estimate as a control on the accuracy of the estimate obtained by the former method.

We have obtained a result stating that, to estimate the parameter of a probability distribution from the one-parameter exponential family, it suffices to have two points with an exact ratio of frequencies, that is, equal to the theoretical ratio of the values of the probability distribution at these two points, to obtain the true value of the parameter. We have conjectured that, in general, with $r$ parameters it suffices to have $r + 1$ points with exact ratios of their frequencies to obtain the $r$ true parameters exactly. This conjecture needs to be proved rigorously in a general setting, for distributions other than the class considered. A large comparative study between the classical and new methods should also be undertaken.
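For $r = 2$, the three-point Gamma example of Section 4.4 ($u = 34.7702, 57.5008, 74.5487$ with $n = 102, 100, 34$) gives a concrete check of the conjecture. If the frequency ratios are matched exactly, taking logarithms of $f(u_i)/f(u_j) = n_i/n_j$ with $f(x) \propto x^{a-1} e^{-x/b}$ gives $\log(n_i/n_j) = (a - 1)\log(u_i/u_j) - (u_i - u_j)/b$, which is linear in $a - 1$ and $1/b$. Two ratios therefore determine $(a, b)$, and the third ratio then matches automatically. The Python sketch below solves that $2 \times 2$ system. It is our closed-form shortcut, not the numerical minimization of $d_v$ used in the paper, but it lands on essentially the same values, $\tilde a \approx 10.045$ and $\tilde b \approx 4.974$.

```python
import math


def gamma_from_three_points(u, n):
    """Fit (a, b) of G(a, b) so that f(u_i)/f(u_j) = n_i/n_j for (i, j) = (1, 2), (2, 3).

    Uses log(n_i/n_j) = (a - 1) * log(u_i/u_j) - (u_i - u_j) / b, which is
    linear in A = a - 1 and c = 1/b; the 2x2 system is solved by Cramer's rule.
    """
    (a11, a12, r1), (a21, a22, r2) = [
        (math.log(u[i] / u[j]), -(u[i] - u[j]), math.log(n[i] / n[j]))
        for i, j in ((0, 1), (1, 2))
    ]
    det = a11 * a22 - a12 * a21
    big_a = (r1 * a22 - a12 * r2) / det
    c = (a11 * r2 - a21 * r1) / det
    return big_a + 1.0, 1.0 / c


a_hat, b_hat = gamma_from_three_points([34.7702, 57.5008, 74.5487], [102, 100, 34])
print(a_hat, b_hat)
```

This shortcut only works because the Gamma log-density ratio is linear in a transformation of the parameters. For a general $r$-parameter family, the conjecture would still require solving a nonlinear system or minimizing $d_v$ numerically.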
We presented some perspectives for the new approach: model selection from truncated data using the new distance, estimation of the initial trial value in the celebrated EM algorithm for truncated data and for a mixture of two normal populations, a goodness-of-fit test based on the new distance, and decision making about the quality of estimations and data.

References

[1] Amemiya, T. (1985). Advanced Econometrics. Cambridge: Harvard University Press.
[2] Birgé, L. and Rozenholc, Y. (2006). How many bins should be put in a regular histogram. ESAIM: Probability and Statistics, Vol. 10, pp. 24-45.
[3] Cox, D. R. (1961). Tests of separate families of hypotheses. In Proceedings of the Fourth Berkeley Symposium, Vol. 1, pp. 105-123. Berkeley: University of California Press.
[4] Cox, D. R. (1962). Further results on tests of separate families of hypotheses. J. R. Statist. Soc. B, No. 24, pp. 406-424.
[5] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. B, No. 1, pp. 1-38.
[6] Efron, B. and Petrosian, V. (1999). Nonparametric methods for doubly truncated data. J. Am. Stat. Assoc., Vol. 94, No. 447, pp. 824-834.
[7] Hartley, H. O. (1958). Maximum likelihood estimation from incomplete data. Biometrics, June, pp. 174-194.
[8] Kaplan, E. L. and Meier, P. (1958). Nonparametric estimation from incomplete observations. J. Am. Stat. Assoc., Vol. 53, pp. 457-481.
[9] Klein, J. P. and Zhang, M. J. (1996). Statistical challenges in comparing chemotherapy and bone marrow transplantation as a treatment for leukemia. Lifetime Data: Models in Reliability and Survival Analysis, N. P. Jewell (ed.), pp. 175-185.
[10] Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation. Springer, New York.
[11] Lynden-Bell, D. (1971). A method of allowing for known observational selection in small samples applied to 3CR quasars. Mon. Not. R. Astr. Soc., Vol. 155, pp. 95-118.
[12] Parzen, E. (1962). On estimation of a probability density function and mode. Ann. Math. Stat., pp. 1065-1076.
[13] Shaw, D. (1988). On-site samples regression: problems of nonnegative integers, truncation, and endogenous stratification. Journal of Econometrics, 37, pp. 211-223.
[14] Seidel, W., Mosler, K. and Alker, M. (2000). A cautionary note on likelihood ratio tests in mixture models. Ann. Inst. Stat. Math., 52, pp. 481-487.
[15] Snedecor, G. W. (1956). Statistical Methods. 5th ed., The Iowa State College Press, Ames, Iowa.
[16] Taylor, J. A. and Jakeman, A. J. (1985). Identification of a distributional model. Commun. Statist. Simula. Computa., 14(2), pp. 497-508.
