Chi-square goodness of fit tests for weighted histograms. Review and improvements

N.D. Gagunashvili (1, *)

University of Akureyri, Borgir, v/Nordurslód, IS-600 Akureyri, Iceland

(*) Tel.: +354-4608505; fax: +354-4608998. Email address: nikolai@unak.is (N.D. Gagunashvili)
(1) Present address: Max-Planck-Institut für Kernphysik, PO Box 103980, 69029 Heidelberg, Germany

Preprint submitted to Elsevier, February 20, 2015

Abstract

Weighted histograms are used for the estimation of probability density functions. Computer simulation is the main domain of application of this type of histogram. A review of chi-square goodness of fit tests for weighted histograms is presented in this paper. Improvements to these tests are proposed that bring their size closer to its nominal value. Numerical examples are presented for the evaluation of the tests and to demonstrate various applications of the tests.

Keywords: probability density function, histogram, goodness of fit test, multinomial distribution, Poisson histogram

PACS: 02.30.Zz, 07.05.Kf, 07.05.Fb

1. Introduction

A histogram with $m$ bins for a given probability density function (PDF) $p(x)$ is used to estimate the probabilities

$$p_i = \int_{S_i} p(x)\,dx, \quad i = 1, \ldots, m \tag{1}$$

that a random event belongs to bin $i$. Integration in (1) is done over the bin $S_i$.

A histogram can be obtained as a result of a random experiment with PDF $p(x)$. Let us denote the number of random events belonging to the $i$th bin of the histogram as $n_i$. The total number of events $n$ in the histogram is equal to

$$n = \sum_{i=1}^{m} n_i. \tag{2}$$

The quantity

$$\hat{p}_i = n_i/n \tag{3}$$

is an estimator of the probability $p_i$ with expectation value

$$\mathrm{E}[\hat{p}_i] = p_i. \tag{4}$$

The distribution of the number of events for bins of the histogram is the multinomial distribution [1], and the probability of the random vector $(n_1, \ldots, n_m)$ is

$$P(n_1, \ldots, n_m) = \frac{n!}{n_1!\, n_2! \cdots n_m!}\; p_1^{n_1} \cdots p_m^{n_m}, \quad \sum_{i=1}^{m} p_i = 1. \tag{5}$$

A weighted histogram, or a histogram of weighted events, is again used for estimating the probabilities $p_i$ (1), see Ref. [2]. It is obtained as a result of a random experiment with a probability density function $g(x)$ that generally does not coincide with the PDF $p(x)$. The sum of weights of events for bin $i$ is defined as

$$W_i = \sum_{k=1}^{n_i} w_i(k), \tag{6}$$

where $n_i$ is the number of events in bin $i$ and $w_i(k)$ is the weight of the $k$th event in the $i$th bin. The statistic

$$\hat{p}_i = W_i/n \tag{7}$$

is used to estimate $p_i$, where $n = \sum_{i=1}^{m} n_i$ is the total number of events for the histogram with $m$ bins. Weights of events are chosen in such a way that the estimate (7) is unbiased,

$$\mathrm{E}[\hat{p}_i] = p_i. \tag{8}$$

The usual histogram is a weighted histogram with weights of events equal to 1. Two examples of weighted histograms are considered below.

1.1. Example 1

To define a weighted histogram, let us write the probability $p_i$ (1) for a given PDF $p(x)$ in the form

$$p_i = \int_{S_i} p(x)\,dx = \int_{S_i} w(x)\, g(x)\,dx, \tag{9}$$

where

$$w(x) = p(x)/g(x) \tag{10}$$

is the weight function and $g(x)$ is some other probability density function. The function $g(x)$ must be $> 0$ for points $x$ where $p(x) \neq 0$. The weight $w(x) = 0$ if $p(x) = 0$, see Ref. [3].

The weighted histogram is obtained from a random experiment with the probability density function $g(x)$, and the weights of the events are calculated according to (10).

1.2. Example 2

The probability density function $p_{rec}(x)$ of a reconstructed characteristic $x$ of an event obtained from a detector with finite resolution and limited acceptance can be represented as

$$p_{rec}(x) \propto \int_{\Omega'} p_{tr}(x')\, A(x')\, R(x|x')\,dx', \tag{11}$$

where $p_{tr}(x')$ is the true PDF, $A(x')$ is the acceptance of the setup, i.e. the probability of recording an event with a characteristic $x'$, and $R(x|x')$ is the experimental resolution, i.e. the probability of obtaining $x$ instead of $x'$ after the reconstruction of the event. The integration in (11) is carried out over the domain $\Omega'$ of the variable $x'$. The total probability that an event will not be registered is equal to

$$\bar{p} = \int_{\Omega'} p_{tr}(x')\,(1 - A(x'))\,dx'. \tag{12}$$

The sum of probabilities

$$\int_{\Omega}\int_{\Omega'} p_{tr}(x')\, A(x')\, R(x|x')\,dx'\,dx + \int_{\Omega'} p_{tr}(x')\,(1 - A(x'))\,dx' = 1 \tag{13}$$

because

$$\int_{\Omega}\int_{\Omega'} p_{tr}(x')\, A(x')\, R(x|x')\,dx'\,dx = \int_{\Omega'} p_{tr}(x')\, A(x')\,dx', \tag{14}$$

where $\Omega$ is the domain of the variable $x$.

A histogram of the PDF $p_{rec}(x)$ can be obtained as a result of a random experiment (simulation) that has three steps [3]:

1. A random value $x'$ is chosen according to the PDF $p_{tr}(x')$.
2. We go back to step 1 with probability $1 - A(x')$, and to step 3 with probability $A(x')$.
3. A random value $x$ is chosen according to the PDF $R(x|x')$.

The quantity $\hat{p}_i = n_i/n$, where $n_i$ is the number of events belonging to the $i$th bin for a histogram with total number of events $n$ in the random experiment (at step 1), is an estimator of $p_i$,

$$p_i = \int_{S_i}\int_{\Omega'} p_{tr}(x')\, A(x')\, R(x|x')\,dx'\,dx, \quad i = 1, \ldots, m, \tag{15}$$

with the expectation value of the estimator

$$\mathrm{E}[\hat{p}_i] = p_i. \tag{16}$$

The quantity $\hat{\bar{p}} = \bar{n}/n$, where $\bar{n}$ is the number of events that were lost, is an estimator of $\bar{p}$ (12) with the expectation value of the estimator

$$\mathrm{E}[\hat{\bar{p}}] = \bar{p}. \tag{17}$$

Notice that

$$\sum_{i=1}^{m} p_i + \bar{p} = 1 \quad \text{and} \quad \sum_{i=1}^{m} n_i + \bar{n} = n. \tag{18}$$

In experimental particle and nuclear physics, step 3 is the most time-consuming step of the Monte Carlo simulation. This step is related to the simulation of the transport of particles through a medium and the rather complex registration apparatus.
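The three-step experiment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the uniform true PDF, the linear acceptance, and the Gaussian resolution with width 0.1 are hypothetical choices, not taken from the paper, and the first and last bins are treated as open-ended so that every registered event falls in some bin.

```python
import bisect
import random


def simulate_reconstructed_histogram(n, edges, sample_true, acceptance, smear, rng):
    """Run the three-step experiment n times.

    Returns (counts, lost): counts[i] plays the role of n_i and lost is the
    number of events that were not registered.
    """
    inner = edges[1:-1]                      # interior bin boundaries
    counts = [0] * (len(edges) - 1)
    lost = 0
    for _ in range(n):
        x_true = sample_true(rng)            # step 1: draw x' from p_tr
        if rng.random() >= acceptance(x_true):
            lost += 1                        # step 2: lost with probability 1 - A(x')
            continue
        x_rec = smear(x_true, rng)           # step 3: draw x from R(x | x')
        # Out-of-range values land in the first or last (open-ended) bin.
        counts[bisect.bisect_right(inner, x_rec)] += 1
    return counts, lost


# Hypothetical ingredients, chosen only for illustration.
rng = random.Random(1)
edges = [i / 10 for i in range(11)]
counts, lost = simulate_reconstructed_histogram(
    10000, edges,
    sample_true=lambda r: r.random(),        # uniform p_tr on [0, 1]
    acceptance=lambda x: 0.5 + 0.5 * x,      # assumed acceptance A(x')
    smear=lambda x, r: r.gauss(x, 0.1),      # assumed Gaussian resolution
    rng=rng,
)
p_hat = [c / 10000 for c in counts]          # estimates of p_i, Eq. (15)
p_lost_hat = lost / 10000                    # estimate of the lost probability, Eq. (12)
```

With these assumed ingredients the lost probability is $\int_0^1 (0.5 - 0.5x')\,dx' = 0.25$, which the estimate should reproduce within statistical fluctuations, and the counts satisfy relation (18) by construction.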
To use the results of the simulation with some PDF $g_{tr}(x')$ for calculating a weighted histogram of events with a true PDF $p_{tr}(x')$, we write the equation for $p_i$ in the form

$$p_i = \int_{S_i}\int_{\Omega'} w(x')\, g_{tr}(x')\, A(x')\, R(x|x')\,dx'\,dx, \tag{19}$$

where

$$w(x') = p_{tr}(x')/g_{tr}(x') \tag{20}$$

is the weight function. The weighted histogram for the PDF $p_{rec}(x)$ can be obtained using events with reconstructed characteristic $x$ and weights calculated according to (20). In this way we avoid step 3 of the simulation procedure, which is important in cases where one needs to calculate Monte Carlo reconstructed histograms for many different true PDFs.

The probability that an event will not be registered can be represented as

$$\bar{p} = \int_{\Omega'} w(x')\, g_{tr}(x')\,(1 - A(x'))\,dx', \tag{21}$$

and is estimated in the same way, using events with weights calculated according to formula (20).

2. Goodness of fit tests

The problem of goodness of fit is to test the hypothesis

$$H_0: p_1 = p_{10}, \ldots, p_{m-1} = p_{m-1,0} \quad \text{vs.} \quad H_a: p_i \neq p_{i0} \text{ for some } i, \tag{22}$$

where $p_{i0}$ are specified probabilities and $\sum_{i=1}^{m} p_{i0} = 1$. The test is used in data analysis for comparing theoretical frequencies $np_{i0}$ with observed frequencies $n_i$. This classical problem remains of current practical interest. The test statistic for a histogram with unweighted entries,

$$X^2 = \sum_{i=1}^{m} \frac{(n_i - np_{i0})^2}{np_{i0}}, \tag{23}$$

was suggested by Pearson [4]. Pearson showed that the statistic (23) has approximately a $\chi^2_{m-1}$ distribution if the hypothesis $H_0$ is true.

2.1. The contemporary proof of Pearson's result

The expectation values of the observed frequencies $n_i$, if hypothesis $H_0$ is valid, are equal to

$$\mathrm{E}[n_i] = np_{i0}, \quad i = 1, \ldots, m, \tag{24}$$

and their covariance matrix $\Gamma$ has elements

$$\gamma_{ij} = \begin{cases} np_{i0}(1 - p_{i0}) & \text{for } i = j \\ -np_{i0}p_{j0} & \text{for } i \neq j. \end{cases}$$

Notice that the covariance matrix $\Gamma$ is singular [5].
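A minimal Python sketch of Pearson's statistic (23) and of the multinomial covariance matrix given above may help make these formulas concrete. The function names are illustrative, and the singularity is shown only through the fact that every row of $\Gamma$ sums to zero when the $p_{i0}$ sum to one.

```python
def pearson_chi2(counts, p0):
    """Pearson's statistic, Eq. (23), for observed counts n_i and probabilities p_i0."""
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, p0))


def multinomial_covariance(n, p0):
    """Covariance matrix of (n_1, ..., n_m) under H0 as a nested list."""
    m = len(p0)
    return [
        [n * p0[i] * (1 - p0[i]) if i == j else -n * p0[i] * p0[j] for j in range(m)]
        for i in range(m)
    ]


x2 = pearson_chi2([20, 30, 50], [0.3, 0.3, 0.4])
gamma = multinomial_covariance(100, [0.3, 0.3, 0.4])
row_sums = [sum(row) for row in gamma]   # all (numerically) zero, so Gamma is singular
```

For the counts shown, the expected frequencies are 30, 30 and 40, so the statistic is $100/30 + 0 + 100/40 \approx 5.83$.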
Let us now introduce the multivariate statistic

$$(\mathbf{n} - n\mathbf{p}_0)^t\, \Gamma_k^{-1}\, (\mathbf{n} - n\mathbf{p}_0), \tag{25}$$

where $\mathbf{n} = (n_1, \ldots, n_{k-1}, n_{k+1}, \ldots, n_m)^t$, $\mathbf{p}_0 = (p_{10}, \ldots, p_{k-1,0}, p_{k+1,0}, \ldots, p_{m0})^t$ and $\Gamma_k = (\gamma_{ij})_{(m-1)\times(m-1)}$ is the covariance matrix for a histogram without bin $k$. The matrix $\Gamma_k$ has the form

$$\Gamma_k = n\,\mathrm{diag}(p_{10}, \ldots, p_{k-1,0}, p_{k+1,0}, \ldots, p_{m0}) - n\,\mathbf{p}_0\mathbf{p}_0^t. \tag{26}$$

The special form of this matrix permits one to find $\Gamma_k^{-1}$ analytically [7]:

$$\Gamma_k^{-1} = \frac{1}{n}\,\mathrm{diag}\!\left(\frac{1}{p_{10}}, \ldots, \frac{1}{p_{k-1,0}}, \frac{1}{p_{k+1,0}}, \ldots, \frac{1}{p_{m0}}\right) + \frac{1}{np_{k0}}\,\Theta, \tag{27}$$

where $\Theta$ is the $(m-1)\times(m-1)$ matrix with all elements equal to unity. Finally, the result of the calculation of expression (25) gives the $X^2$ test statistic (23). Notice that the result is the same for any choice of the bin number $k$.

Asymptotically the vector $\mathbf{n}$ has a normal distribution $N(n\mathbf{p}_0, \Gamma_k^{1/2})$, see Ref. [5], and therefore the test statistic (23) has a $\chi^2_{m-1}$ distribution if hypothesis $H_0$ is true:

$$X^2 \sim \chi^2_{m-1}. \tag{28}$$

2.2. Generalization of Pearson's chi-square test for weighted histograms

The total sum of weights of events in the $i$th bin, $W_i$, $i = 1, \ldots, m$, as proposed in Ref. [2], can be considered as a sum of random variables

$$W_i = \sum_{k=1}^{n_i} w_i(k), \tag{29}$$

where the number of events $n_i$ is also a random value and the weights $w_i(k)$, $k = 1, \ldots, n_i$, are independent random variables with the same probability distribution function. The distribution of the number of events for bins of the histogram is the multinomial distribution, and the probability of the random vector $(n_1, \ldots, n_m)$ is

$$P(n_1, \ldots, n_m) = \frac{n!}{n_1!\, n_2! \cdots n_m!}\; g_1^{n_1} \cdots g_m^{n_m}, \quad \sum_{i=1}^{m} g_i = 1, \tag{30}$$

where $g_i$ is the probability that a random event belongs to bin $i$.
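The random-sum model (29)-(30) can be sketched as a small simulation: each of the $n$ events is assigned to a bin with probabilities $g_i$, and an independent weight is then drawn from that bin's weight distribution. The function name and the per-bin weight samplers below are hypothetical placeholders, since the paper does not prescribe particular weight distributions at this point.

```python
import random


def sample_weighted_sums(n, g, weight_samplers, rng):
    """Draw one weighted histogram under the model of Eqs. (29)-(30).

    g[i] is the probability of bin i; weight_samplers[i](rng) returns one
    weight w_i(k) for an event that fell into bin i.
    """
    m = len(g)
    counts = [0] * m          # n_i, multinomially distributed
    sums = [0.0] * m          # W_i, the sum of weights in bin i
    for _ in range(n):
        i = rng.choices(range(m), weights=g)[0]
        counts[i] += 1
        sums[i] += weight_samplers[i](rng)
    return counts, sums


rng = random.Random(7)
g = [0.2, 0.3, 0.5]
# Assumed weight distributions: uniform weights with a different range per bin.
samplers = [lambda r, hi=hi: r.uniform(0.0, hi) for hi in (1.0, 2.0, 3.0)]
counts, sums = sample_weighted_sums(1000, g, samplers, rng)
```

Setting every sampler to return 1 reproduces an ordinary histogram, in which case the sums of weights coincide with the counts.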
Let us denote the expectation values of the weights of events from the $i$th bin as

$$\mathrm{E}[w_i] = \mu_i \tag{31}$$

and the variances as

$$\mathrm{Var}[w_i] = \sigma_i^2. \tag{32}$$

The expectation value of the total sum of weights $W_i$, $i = 1, \ldots, m$, see Ref. [6], is

$$\mathrm{E}[W_i] = \mathrm{E}\!\left[\sum_{k=1}^{n_i} w_i(k)\right] = \mathrm{E}[w_i]\,\mathrm{E}[n_i] = n\mu_i g_i. \tag{33}$$

The diagonal elements $\gamma_{ii}$ of the covariance matrix of the vector $(W_1, \ldots, W_m)$, see Ref. [6], are equal to

$$\gamma_{ii} = \sigma_i^2 g_i n + \mu_i^2 g_i(1 - g_i)n = n\alpha_{2i} g_i - n\mu_i^2 g_i^2, \tag{34}$$

where

$$\alpha_{2i} = \mathrm{E}[w_i^2]. \tag{35}$$

The non-diagonal elements $\gamma_{ij}$, $i \neq j$, are equal to

$$\begin{aligned}
\gamma_{ij} &= \sum_{k=0}^{n}\sum_{l=0}^{n} \mathrm{E}\!\left[\sum_{u=1}^{k}\sum_{v=1}^{l} w_i(u)\, w_j(v)\right] h(k,l) - \mathrm{E}[W_i]\,\mathrm{E}[W_j] \\
&= \sum_{k=0}^{n}\sum_{l=0}^{n} \mathrm{E}[w_i w_j]\, h(k,l)\, kl - \mu_i n g_i\, \mu_j n g_j \\
&= \mu_i\mu_j(-g_i g_j n + g_i g_j n^2) - \mu_i n g_i\, \mu_j n g_j = -n\mu_i\mu_j g_i g_j,
\end{aligned} \tag{36}$$

where $h(k,l)$ is the probability that $k$ events belong to bin $i$ and $l$ events to bin $j$.

For weighted histograms the problem of goodness of fit is again to test the hypothesis

$$H_0: p_1 = p_{10}, \ldots, p_{m-1} = p_{m-1,0} \quad \text{vs.} \quad H_a: p_i \neq p_{i0} \text{ for some } i, \tag{37}$$

where $p_{i0}$ are specified probabilities and $\sum_{i=1}^{m} p_{i0} = 1$. If hypothesis $H_0$ is true, then

$$\mathrm{E}[W_i] = n\mu_i g_i = np_{i0}, \quad i = 1, \ldots, m, \tag{38}$$

and

$$g_i = p_{i0}/\mu_i, \quad i = 1, \ldots, m. \tag{39}$$

Substituting $g_i$ into Eqs. (34) and (36) gives the covariance matrix $\Gamma$ with elements

$$\gamma_{ij} = \begin{cases} np_{i0}(r_i^{-1} - p_{i0}) & \text{for } i = j \\ -np_{i0}p_{j0} & \text{for } i \neq j, \end{cases}$$

where

$$r_i = \mu_i/\alpha_{2i} \tag{40}$$

is the ratio of the first moment $\mu_i$ of the distribution of weights of events to the second moment $\alpha_{2i}$ for a particular bin $i$. Notice that for usual histograms the ratio of moments $r_i$ is equal to 1 and the covariance matrix coincides with the covariance matrix of the multinomial distribution. The multivariate statistic is represented as

$$(\mathbf{W} - n\mathbf{p}_0)^t\, \Gamma_k^{-1}\, (\mathbf{W} - n\mathbf{p}_0), \tag{41}$$

where $\mathbf{W} = (W_1, \ldots, W_{k-1}, W_{k+1}, \ldots, W_m)^t$, $\mathbf{p}_0 = (p_{10}, \ldots, p_{k-1,0}, p_{k+1,0}, \ldots, p_{m0})^t$ and $\Gamma_k = (\gamma_{ij})_{(m-1)\times(m-1)}$ is the covariance matrix for a histogram without bin $k$. The matrix $\Gamma_k$ has the form

$$\Gamma_k = n\,\mathrm{diag}\!\left(\frac{p_{10}}{r_1}, \ldots, \frac{p_{k-1,0}}{r_{k-1}}, \frac{p_{k+1,0}}{r_{k+1}}, \ldots, \frac{p_{m0}}{r_m}\right) - n\,\mathbf{p}_0\mathbf{p}_0^t. \tag{42}$$

The special form of this matrix permits one to find the inverse matrix analytically:

$$\Gamma_k^{-1} = \frac{1}{n}\,\mathrm{diag}\!\left(\frac{r_1}{p_{10}}, \ldots, \frac{r_{k-1}}{p_{k-1,0}}, \frac{r_{k+1}}{p_{k+1,0}}, \ldots, \frac{r_m}{p_{m0}}\right) + \frac{1}{n\left(1 - \sum_{i\neq k} r_i p_{i0}\right)}\,\mathbf{r}\mathbf{r}^t, \tag{43}$$

where $\mathbf{r} = (r_1, \ldots, r_{k-1}, r_{k+1}, \ldots, r_m)^t$. After that, the multivariate statistic can be written as

$$X_k^2 = \sum_{i\neq k} r_i\,\frac{(W_i - np_{i0})^2}{np_{i0}} + \frac{\left(\sum_{i\neq k} r_i(W_i - np_{i0})\right)^2}{n\left(1 - \sum_{i\neq k} r_i p_{i0}\right)}, \tag{44}$$

and can also be transformed to the form

$$X_k^2 = \frac{1}{n}\sum_{i\neq k} \frac{r_i W_i^2}{p_{i0}} + \frac{1}{n}\,\frac{\left(n - \sum_{i\neq k} r_i W_i\right)^2}{1 - \sum_{i\neq k} r_i p_{i0}} - n, \tag{45}$$

which is convenient for numerical calculations. Asymptotically the vector $\mathbf{W}$ has a normal distribution $N(n\mathbf{p}_0, \Gamma_k^{1/2})$ [8], and therefore the test statistic (44) has a $\chi^2_{m-1}$ distribution if hypothesis $H_0$ is true:

$$X_k^2 \sim \chi^2_{m-1}. \tag{46}$$

For usual histograms, when $r_i = 1$, $i = 1, \ldots, m$, the statistic (44) is Pearson's chi-square statistic (23). The expectation value of statistic (44), as shown in Ref. [2], is equal to

$$\mathrm{E}[X_k^2] = m - 1, \tag{47}$$

as for Pearson's test [1].

The ratio of moments $r_i = \mu_i/\alpha_{2i}$ that is used for the calculation of the test statistic is not known in the majority of cases. An estimate of $r_i$ can be used instead:

$$\hat{r}_i = W_i/W_{2i}, \tag{48}$$

where $W_{2i} = \sum_{k=1}^{n_i} w_i^2(k)$. Let us now replace $r_i$ with the estimate $\hat{r}_i$ and denote the estimator of the matrix $\Gamma_k$ as $\hat{\Gamma}_k$. Then, for positive definite matrices $\hat{\Gamma}_k$, $k = 1, \ldots, m$, the test statistic is given as

$$\hat{X}_k^2 = \sum_{i\neq k} \hat{r}_i\,\frac{(W_i - np_{i0})^2}{np_{i0}} + \frac{\left(\sum_{i\neq k} \hat{r}_i(W_i - np_{i0})\right)^2}{n\left(1 - \sum_{i\neq k} \hat{r}_i p_{i0}\right)}. \tag{49}$$

For usual histograms, formula (49) does not depend on the choice of the excluded bin, but for weighted histograms there can be a dependence. A test statistic that is invariant to the choice of the excluded bin, and at the same time is Pearson's chi-square statistic (23) for unweighted histograms, can be represented as the median value of the set of statistics $\hat{X}_k^2$ (49) with positive definite matrices $\hat{\Gamma}_k$:

$$\hat{X}^2_{Med} = \mathrm{Med}\,\{\hat{X}_1^2, \hat{X}_2^2, \ldots, \hat{X}_m^2\}. \tag{50}$$

The statistic $\hat{X}^2_{Med}$ was first proposed in Ref. [2] and has approximately a $\chi^2_{m-1}$ distribution if hypothesis $H_0$ is true:

$$\hat{X}^2_{Med} \sim \chi^2_{m-1}. \tag{51}$$

The use of $\hat{X}^2_{Med}$ to test the hypothesis $H_0$ with a given significance level is equivalent to making a decision by voting. It was noticed that the size of the test can be slightly greater than the nominal size even for a large total number of events $n$.

2.3. New generalizations of Pearson's chi-square test for weighted histograms

The set of statistics $\{\hat{X}_1^2, \hat{X}_2^2, \ldots, \hat{X}_m^2\}$, restricted to those with positive definite matrices $\hat{\Gamma}_k$, is used for calculating the median statistic $\hat{X}^2_{Med}$ (50). It can be used for any weighted histograms, including histograms with unweighted entries. One bin is excluded because the full covariance matrix of an unweighted histogram is singular and hence cannot be inverted.

Let us consider the estimation of the full covariance matrix $\hat{\Gamma}$ for the weighted histogram in more detail. A symmetric matrix is positive definite if its minimal eigenvalue is larger than 0. We denote the minimal eigenvalue of the matrix $n^{-1}\hat{\Gamma}$ by $\lambda_{min}$; then, following Ref. [10], it can be shown that

$$\min_i\left\{\frac{p_{i0}}{\hat{r}_i}\right\} - \sum_{i=1}^{m} p_{i0}^2 \;\leq\; \lambda_{min} \;\leq\; \min_i\left\{\frac{p_{i0}}{\hat{r}_i}\right\}, \tag{52}$$

and the eigenvalue $\lambda_{min}$ is a root of the secular equation

$$1 - \sum_{i=1}^{m} \frac{p_{i0}^2}{p_{i0}/\hat{r}_i - \lambda} = 0. \tag{53}$$

In the case of a histogram with unweighted entries, all $\hat{r}_i = 1$ and $\lambda = 0$ is a zero of equation (53). The matrix $\hat{\Gamma}$ for this case is not positive definite and is singular, but the matrix $\hat{\Gamma}_k$ is positive definite and therefore invertible. The numbers of events $n_i$ in the bins of a usual histogram satisfy the equation $n_1 + n_2 + \ldots + n_m = n$; that is why the covariance matrix of the multinomial distribution is not positive definite and is singular.

The matrix $\hat{\Gamma}$ for a histogram with weighted entries can also be non-positive definite. There are two reasons why this can happen. First, the total sums of weights $W_i$ in the bins of a weighted histogram are related to each other because they satisfy the equation $\mathrm{E}[\sum_{i=1}^{m} W_i] = n$; second, the matrix elements fluctuate. The test statistic obtained with the full matrix $\hat{\Gamma}$ is unstable and can have a large variance, especially for a low number $n$ of events in a histogram.

The fact that the matrix is not positive definite is equivalent to the fact that the minimal eigenvalue $\lambda_{min}$ of the matrix $\hat{\Gamma}$ is $\leq 0$. A case in which the minimal eigenvalue is positive but rather small is also undesirable, especially for computer calculations. For these reasons it is wise to use, for weighted histograms, the test statistic

$$\hat{X}^2 = \hat{X}_k^2 = \sum_{i\neq k} \hat{r}_i\,\frac{(W_i - np_{i0})^2}{np_{i0}} + \frac{\left(\sum_{i\neq k} \hat{r}_i(W_i - np_{i0})\right)^2}{n\left(1 - \sum_{i\neq k} \hat{r}_i p_{i0}\right)} \tag{54}$$

for the bin $k$ where

$$\frac{p_{k0}}{\hat{r}_k} = \min_i\left\{\frac{p_{i0}}{\hat{r}_i}\right\}. \tag{55}$$

A secular equation for the new minimal eigenvalue can be solved numerically, by the bisection method, to check whether the matrix $\hat{\Gamma}_k$ is positive definite or not.
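The statistic (54) with the bin choice (55) can be sketched as follows. This is a minimal illustration, not the published program: the function name is hypothetical, every bin is assumed to be non-empty with positive sums $W_i$ and $W_{2i}$, and positive definiteness of $\hat{\Gamma}_k$ is checked through the sign of $1 - \sum_{i\neq k}\hat{r}_i p_{i0}$ (for a positive diagonal matrix minus a rank-one term this sign condition is equivalent to positive definiteness) rather than through the bisection solution of the secular equation mentioned in the text.

```python
def new_weighted_chi2(W, W2, p0, n):
    """Statistic of Eq. (54): returns (value, k) with k the excluded bin of Eq. (55).

    W[i]  : sum of weights in bin i
    W2[i] : sum of squared weights in bin i
    p0[i] : probabilities under H0
    n     : total number of events
    """
    r_hat = [w / w2 for w, w2 in zip(W, W2)]              # Eq. (48)
    m = len(p0)
    k = min(range(m), key=lambda i: p0[i] / r_hat[i])     # Eq. (55)
    kept = [i for i in range(m) if i != k]
    denom = 1.0 - sum(r_hat[i] * p0[i] for i in kept)
    if denom <= 0.0:
        # Swapped-in criterion (see lead-in): Gamma_k estimate is not positive definite.
        raise ValueError("estimated covariance matrix is not positive definite")
    first = sum(r_hat[i] * (W[i] - n * p0[i]) ** 2 / (n * p0[i]) for i in kept)
    second = sum(r_hat[i] * (W[i] - n * p0[i]) for i in kept) ** 2 / (n * denom)
    return first + second, k


# Unweighted entries (all weights 1): W = W2 = counts, so r_hat = 1 everywhere.
value, k = new_weighted_chi2([20, 30, 50], [20, 30, 50], [0.3, 0.3, 0.4], 100)
```

For unweighted entries the result coincides with Pearson's statistic (23), here $100/30 + 100/40 \approx 5.83$, whichever bin is excluded.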
Numerical experiments show that it is very rare that the matrix $\hat{\Gamma}_k$ is not positive definite, and that this happens only for histograms with a small number $n$ of events.

If hypothesis $H_0$ is valid, the statistic $\hat{X}^2$ asymptotically has the distribution

$$\hat{X}^2 \sim \chi^2_{m-1}. \tag{56}$$

It is plausible that the power of the new test is not lower than the power of the tests with statistic $\hat{X}^2_{Med}$ and with the other statistics $\{\hat{X}_i^2, i \neq k\}$. The distribution of the statistic $\hat{X}^2$ is closer to $\chi^2_{m-1}$ than the distribution of the median statistic $\hat{X}^2_{Med}$. The statistic $\hat{X}^2$ is also easier to calculate than the statistic $\hat{X}^2_{Med}$.

3. Goodness of fit tests for weighted histograms with deviations from the main model

Here, different deviations from the main model of weighted histograms are considered, together with goodness of fit tests for those cases.

3.1. Goodness of fit test for a weighted histogram with unknown normalization

In practice one is often faced with the case in which all weights of events are defined up to an unknown normalization constant $C$, see Ref. [2]. This happens because in some cases of computer simulation it is rather difficult to give an analytical formula for the PDF, while the PDF up to a multiplicative constant is available. That is enough for the generation of events according to the PDF, for example by the very popular Neumann's method [11].

For the goodness of fit test this means that, if hypothesis $H_0$ is valid,

$$\mathrm{E}[W_i]\cdot C = np_{i0}, \quad i = 1, \ldots, m, \tag{57}$$

with an unknown constant $C$. Then the test statistic (45) can be written as

$${}_c\hat{X}_k^2 = \sum_{i\neq k} \hat{r}_i\,\frac{(W_i - np_{i0}/C)^2}{np_{i0}/C} + \frac{\left(\sum_{i\neq k} \hat{r}_i(W_i - np_{i0}/C)\right)^2}{n\left(1 - C^{-1}\sum_{i\neq k} \hat{r}_i p_{i0}\right)}. \tag{58}$$

An estimator for the constant $C$ can be found by minimizing Eq. (58):

$$\hat{C}_k = \sum_{i\neq k} \hat{r}_i p_{i0} + \sqrt{\frac{\sum_{i\neq k} \hat{r}_i p_{i0}}{\sum_{i\neq k} \hat{r}_i W_i^2/p_{i0}}}\;\left(n - \sum_{i\neq k} \hat{r}_i W_i\right), \tag{59}$$

where $\hat{C}_k$ is an estimator of $C$.
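A short Python sketch of the estimator (59) and of the statistic (58) evaluated at a given $C$ is shown below. The function names are illustrative, the excluded bin $k$ is passed in as a parameter, and all bins are assumed to have positive $W_i$ and $W_{2i}$.

```python
import math


def _kept_terms(W, W2, p0, k):
    """Return the moment-ratio estimates W_i / W_2i and the indices i != k."""
    r_hat = [w / w2 for w, w2 in zip(W, W2)]
    kept = [i for i in range(len(p0)) if i != k]
    return r_hat, kept


def estimate_normalization(W, W2, p0, n, k):
    """Estimator of the normalization constant C, Eq. (59)."""
    r_hat, kept = _kept_terms(W, W2, p0, k)
    b = sum(r_hat[i] * p0[i] for i in kept)
    big_a = sum(r_hat[i] * W[i] ** 2 / p0[i] for i in kept)
    a = sum(r_hat[i] * W[i] for i in kept)
    return b + math.sqrt(b / big_a) * (n - a)


def chi2_for_normalization(W, W2, p0, n, k, c):
    """Statistic of Eq. (58) for a trial value c of the normalization constant."""
    r_hat, kept = _kept_terms(W, W2, p0, k)
    b = sum(r_hat[i] * p0[i] for i in kept)
    first = sum(r_hat[i] * (W[i] - n * p0[i] / c) ** 2 / (n * p0[i] / c) for i in kept)
    second = sum(r_hat[i] * (W[i] - n * p0[i] / c) for i in kept) ** 2 / (n * (1.0 - b / c))
    return first + second


# 100 events, each with weight 0.5, distributed exactly as n * p0: here C = 2.
W, W2, p0 = [10.0, 15.0, 25.0], [5.0, 7.5, 12.5], [0.2, 0.3, 0.5]
c_hat = estimate_normalization(W, W2, p0, 100, k=0)
```

In this constructed case every weight equals 0.5, so $\mathrm{E}[W_i]\cdot 2 = np_{i0}$; the estimator returns 2 and the statistic (58) vanishes at that value.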
Substituting (59) into (58), we get the test statistic

$${}_c\hat{X}_k^2 = \sum_{i\neq k} \hat{r}_i\,\frac{(W_i - np_{i0}/\hat{C}_k)^2}{np_{i0}/\hat{C}_k} + \frac{\left(\sum_{i\neq k} \hat{r}_i(W_i - np_{i0}/\hat{C}_k)\right)^2}{n\left(1 - \hat{C}_k^{-1}\sum_{i\neq k} \hat{r}_i p_{i0}\right)}. \tag{60}$$

The statistic (60) has a $\chi^2_{m-2}$ distribution if hypothesis $H_0$ is valid. Formula (60) can also be transformed to

$${}_c\hat{X}_k^2 = \frac{s_k^2}{n} + 2s_k, \tag{61}$$

where

$$s_k = \sqrt{\sum_{i\neq k} \hat{r}_i p_{i0} \sum_{i\neq k} \hat{r}_i W_i^2/p_{i0}} - \sum_{i\neq k} \hat{r}_i W_i, \tag{62}$$

which is convenient for calculations, see [2]. A median statistic can be used for the same reason as in subsection 2.2,

$${}_c\hat{X}^2_{Med} = \mathrm{Med}\,\{{}_c\hat{X}_1^2, {}_c\hat{X}_2^2, \ldots, {}_c\hat{X}_m^2\}, \tag{63}$$

and has approximately a $\chi^2_{m-2}$ distribution if hypothesis $H_0$ is valid, see Ref. [2]:

$${}_c\hat{X}^2_{Med} \sim \chi^2_{m-2}. \tag{64}$$

3.2. New goodness of fit test for a weighted histogram with unknown normalization

The new estimator of the constant $C$ is

$$\hat{C} = \sum_{i\neq k} \hat{r}_i p_{i0} + \sqrt{\frac{\sum_{i\neq k} \hat{r}_i p_{i0}}{\sum_{i\neq k} \hat{r}_i W_i^2/p_{i0}}}\;\left(n - \sum_{i\neq k} \hat{r}_i W_i\right) \tag{65}$$

for the bin $k$ where

$$\frac{p_{k0}}{\hat{r}_k} = \min_i\left\{\frac{p_{i0}}{\hat{r}_i}\right\}. \tag{66}$$

The test statistic can then be written as

$${}_c\hat{X}^2 = \sum_{i\neq k} \hat{r}_i\,\frac{(W_i - np_{i0}/\hat{C})^2}{np_{i0}/\hat{C}} + \frac{\left(\sum_{i\neq k} \hat{r}_i(W_i - np_{i0}/\hat{C})\right)^2}{n\left(1 - \hat{C}^{-1}\sum_{i\neq k} \hat{r}_i p_{i0}\right)}. \tag{67}$$

The statistic ${}_c\hat{X}^2$ asymptotically has a $\chi^2_{m-2}$ distribution if hypothesis $H_0$ is valid:

$${}_c\hat{X}^2 \sim \chi^2_{m-2}. \tag{68}$$

3.3. Goodness of fit test for weighted Poisson histograms

A Poisson histogram [12] can be defined as a histogram with a multi-Poisson distribution of the numbers of events for bins,

$$P(n_1, \ldots, n_m) = \prod_{i=1}^{m} e^{-n_0 p_i}\,(n_0 p_i)^{n_i}/n_i!, \tag{69}$$

where $n_0$ is a free parameter.
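Definition (69) says that the bin contents of a Poisson histogram are independent Poisson variables with means $n_0 p_i$. A minimal standard-library sketch of drawing such a histogram is given below; the function names are illustrative, and the Poisson variate is generated by counting unit-rate exponential inter-arrival times, one of several possible methods and not something prescribed by the paper.

```python
import random


def poisson_variate(mean, rng):
    """Draw a Poisson variate by counting unit-rate arrivals in the interval [0, mean)."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sample_poisson_histogram(n0, p, rng):
    """Draw bin contents (n_1, ..., n_m) according to Eq. (69)."""
    return [poisson_variate(n0 * p_i, rng) for p_i in p]


rng = random.Random(11)
counts = sample_poisson_histogram(50.0, [0.2, 0.3, 0.5], rng)
total = sum(counts)          # fluctuates from histogram to histogram around n0
```

Unlike the multinomial case, the total number of events is itself random here, with expectation $n_0$.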
The discrete probability distribution function (probability mass function) of a Poisson histogram can be represented as a product of two probability functions: a Poisson probability mass function for the number of events $n$ with parameter $n_0$, and a multinomial probability mass function of the numbers of events for bins of the histogram with the total number of events equal to $n$, see Ref. [1]:

$$P(n_1, \ldots, n_m) = e^{-n_0}\,(n_0)^n/n! \times \frac{n!}{n_1!\, n_2! \cdots n_m!}\; p_1^{n_1} \cdots p_m^{n_m}. \tag{70}$$

A Poisson histogram can be obtained as a result of two random experiments: the first experiment, with a Poisson probability mass function, gives the total number of events $n$ in the histogram; the histogram is then obtained as a result of a random experiment with PDF $p(x)$ and the total number of events equal to $n$.

As in the case of multinomial histograms, for Poisson histograms there is also the problem of a goodness of fit test with the hypothesis

$$H_0: p_1 = p_{10}, \ldots, p_{m-1} = p_{m-1,0} \quad \text{vs.} \quad H_a: p_i \neq p_{i0} \text{ for some } i, \tag{71}$$

where $p_{i0}$ are specified probabilities and $\sum_{i=1}^{m} p_{i0} = 1$. If $n_0$ is known, then the statistic, see Ref. [13],

$$X^2_{pois} = \sum_{i=1}^{m} \frac{(n_i - n_0 p_{i0})^2}{n_0 p_{i0}} \tag{72}$$

can be used; it has asymptotically a $\chi^2_m$ distribution if the hypothesis $H_0$ is valid:

$$X^2_{pois} \sim \chi^2_m. \tag{73}$$

The hypothesis $H_0$ becomes complex if the parameter $n_0$ is unknown for the Poisson histogram. This is the opposite of the situation for a multinomial histogram, where the hypothesis is simple.
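The equivalence of the two representations (69) and (70), and the statistic (72), can be checked numerically with a short sketch. The function names are illustrative, and logarithms of the probability mass functions are compared to avoid overflow in the factorials.

```python
import math


def log_pmf_independent(counts, p, n0):
    """Logarithm of Eq. (69): product of independent Poisson probabilities."""
    return sum(
        -n0 * p_i + n_i * math.log(n0 * p_i) - math.lgamma(n_i + 1)
        for n_i, p_i in zip(counts, p)
    )


def log_pmf_two_step(counts, p, n0):
    """Logarithm of Eq. (70): Poisson total times multinomial bin contents."""
    n = sum(counts)
    log_poisson = -n0 + n * math.log(n0) - math.lgamma(n + 1)
    log_multinomial = (
        math.lgamma(n + 1)
        - sum(math.lgamma(n_i + 1) for n_i in counts)
        + sum(n_i * math.log(p_i) for n_i, p_i in zip(counts, p))
    )
    return log_poisson + log_multinomial


def chi2_poisson(counts, p0, n0):
    """Statistic of Eq. (72) for a Poisson histogram with known n0."""
    return sum((n_i - n0 * p) ** 2 / (n0 * p) for n_i, p in zip(counts, p0))


lhs = log_pmf_independent([3, 5, 2], [0.2, 0.5, 0.3], 12.0)
rhs = log_pmf_two_step([3, 5, 2], [0.2, 0.5, 0.3], 12.0)   # equal up to rounding
```

The agreement relies on $\sum_i p_i = 1$, which turns $\prod_i e^{-n_0 p_i}$ into $e^{-n_0}$.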
In Ref. [13], statistics are proposed for the goodness of fit test for a weighted Poisson histogram with a known parameter $n_0$,

$$X^2_{corr0} = \sum_{i=1}^{m} \frac{(W_i - n_0 p_{i0})^2}{W_{2i}\, n_0 p_{i0}/W_i}, \tag{74}$$

and for the case in which $n_0$ is not known,

$$X^2_{corr} = \sum_{i=1}^{m} \frac{(W_i - \hat{n}_0 p_{i0})^2}{W_{2i}\, \hat{n}_0 p_{i0}/W_i}, \tag{75}$$

with the estimate of $n_0$ obtained by minimization of expression (74) with respect to $n_0$:

$$\hat{n}_0 = \left(\frac{\sum_{i=1}^{m} W_i^3/(W_{2i}\, p_{i0})}{\sum_{i=1}^{m} W_i\, p_{i0}/W_{2i}}\right)^{1/2}. \tag{76}$$

According to Ref. [13], the distribution of the statistic $X^2_{corr0}$, if hypothesis $H_0$ is valid, is

$$X^2_{corr0} \sim \chi^2_m \tag{77}$$

and for the statistic $X^2_{corr}$ it is

$$X^2_{corr} \sim \chi^2_{m-1}. \tag{78}$$

Generally, the power of the tests for Poisson histograms will be slightly lower than for multinomial histograms with the number of events $n = n_0$, which is explained by the fact that the total number of events for Poisson histograms fluctuates.

The choice of the type of histogram depends on what type of physical experiment is performed. If the number of events $n$ is constant, then it is a multinomial histogram; if the number of events $n$ is a random value that has a Poisson distribution, then it is a Poisson histogram.

A weighted histogram is very often the result of modeling, and the number of simulated events is known exactly; therefore the choice of a multinomial histogram is reasonable. It is also reasonable to use the tests developed for multinomial histograms if the number of events $n$ is a random value with an unknown distribution [14].

4. Restriction for goodness of fit tests applications

For histograms with unweighted entries, the use of Pearson's chi-square test (23) is inappropriate if any expected frequency $np_{i0}$ is below 1 or if the expected frequency is less than 5 in more than 20% of bins [15].
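The applicability rule quoted above for Pearson's test translates directly into a small check. The function name is illustrative, and the sketch covers only this rule for unweighted histograms.

```python
def pearson_test_applicable(n, p0):
    """Apply the rule of Ref. [15] as stated above.

    The test is treated as inappropriate if any expected frequency n * p_i0
    is below 1, or if more than 20% of bins have expected frequency below 5.
    """
    expected = [n * p for p in p0]
    if min(expected) < 1.0:
        return False
    small_bins = sum(1 for e in expected if e < 5.0)
    return small_bins <= 0.2 * len(expected)


ok = pearson_test_applicable(100, [0.2, 0.3, 0.5])          # all expected >= 20
too_sparse = pearson_test_applicable(100, [0.03, 0.03, 0.94])  # 2 of 3 bins below 5
```

For 100 events and probabilities (0.03, 0.03, 0.94), two of the three expected frequencies equal 3, so the rule rejects the use of the test.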
Restrictions for weighted histograms can be made stronger because of the fluctuation of the estimated ratio of moments $\hat{r}_i$. Namely, the use of the new chi-square tests (54) and (67) is inappropriate if any expected frequency $\mathrm{E}[n_i]$ is less than 5.

Following Ref. [16], a disturbance is regarded as unimportant when the nominal size of the test is 5% and the actual size of the test lies between 4% and 6% for a goodness of fit test.

5. Numerical evaluation of the tests' power and sizes

The main parameters that characterize the effectiveness of a test are its size and power. The nominal significance level was taken to be equal to 5% for calculating the size of the tests in the numerical examples presented here. Hypothesis $H_0$ is rejected if the test statistic $\hat{X}^2$ is larger than some threshold. The threshold $k_{0.05}$ for a given nominal size of the test of 5% can be defined from the equation

$$0.05 = P(\chi^2_l > k_{0.05}) = \int_{k_{0.05}}^{+\infty} \frac{x^{l/2-1}\, e^{-x/2}}{2^{l/2}\,\Gamma(l/2)}\,dx, \tag{79}$$

where $l = m - 1$.

Let us define the test size $\alpha$ for a given nominal test size of 5% as the probability

$$\alpha = P(\hat{X}^2 > k_{0.05} \mid H_0). \tag{80}$$

This is the probability that hypothesis $H_0$ will be rejected if the distribution of weights $W_i$ for bins of the histogram satisfies hypothesis $H_0$. The deviation of the test size from the nominal test size is an important test characteristic.

A second important test characteristic is the power. Let us define the test power as

$$P(\hat{X}^2 > k_{0.05} \mid H_a). \tag{81}$$

This is the probability that hypothesis $H_0$ will be rejected if the distribution of weights $W_i$ for bins of the histogram does not satisfy hypothesis $H_0$. Notice that the power calculated by formula (81) can give a misleading result when different tests are compared, because tests whose actual sizes differ are then compared at different effective significance levels. To overcome this problem, we define here the power of the test $\pi$ as

$$\pi = P(\hat{X}^2 > K_{0.05} \mid H_a) \tag{82}$$

with the threshold $K_{0.05}$ calculated by the Monte Carlo method from the equation

$$0.05 = P(\hat{X}^2 > K_{0.05} \mid H_0). \tag{83}$$

All definitions proposed above for the statistic $\hat{X}^2$ can be used for other test statistics with the appropriate number of degrees of freedom $l$ in formula (79).

The size and power of tests depend on the number of events and the binning, which was discussed for usual histograms in Ref. [1]. The power for weighted histograms also depends on the choice of the PDF $g(x)$ (subsection 1.1) or $g_{tr}$ (subsection 1.2) and can be higher as well as lower than for histograms with unweighted entries.

Below we demonstrate two examples of an application of the previously discussed tests. The size and power of the tests are calculated for different total numbers of events in the histograms. The numerical examples demonstrate applications of:

• Pearson's goodness of fit test [4], see subsection 2.1 and the first paragraph of section 2. The test statistic is $X^2$ (23).
• the goodness of fit test for weighted histograms with normalized weights [2], see subsection 2.2. The test statistic is $\hat{X}^2_{Med}$ (50).
• the goodness of fit test for weighted histograms with unnormalized weights [2], see subsection 3.1. The test statistic is ${}_c\hat{X}^2_{Med}$ (63).
• the new goodness of fit test for weighted histograms with normalized weights, see subsection 2.3. The test statistic is $\hat{X}^2$ (54).
• the new goodness of fit test for weighted histograms with unnormalized weights, see subsection 3.2. The test statistic is ${}_c\hat{X}^2$ (67).
• the goodness of fit test for Poisson histograms with unweighted entries and known parameter $n_0$ [13], see subsection 3.3. The test statistic is $X^2_{pois}$ (72).
• the goodness of fit test for weighted Poisson histograms with known parameter $n_0$ [13], see subsection 3.3. The test statistic is $X^2_{corr0}$ (74).
• the goodness of fit test for weighted Poisson histograms with unknown parameter $n_0$ [13], see subsection 3.3. The test statistic is $X^2_{corr}$ (75).
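The definitions of size (80), Monte Carlo threshold (83) and power (82) from this section can be sketched as follows, given lists of test-statistic values simulated under $H_0$ and under $H_a$. The function names are illustrative, the threshold is taken as an empirical quantile of the simulated $H_0$ values, and the chi-square threshold $k_{0.05}$ of Eq. (79) is assumed to be supplied from tables or a statistics library.

```python
import math


def rejection_rate(stats, threshold):
    """Fraction of simulated statistics exceeding the threshold (size or power)."""
    return sum(1 for s in stats if s > threshold) / len(stats)


def monte_carlo_threshold(stats_h0, level=0.05):
    """Empirical K such that about `level` of the H0 statistics exceed it, Eq. (83)."""
    ordered = sorted(stats_h0)
    index = math.ceil((1.0 - level) * len(ordered)) - 1
    return ordered[index]


# Toy inputs standing in for simulated statistics (not results from the paper).
stats_h0 = list(range(1, 101))
stats_ha = list(range(51, 151))
big_k = monte_carlo_threshold(stats_h0)          # 95 for this toy input
size_at_k = rejection_rate(stats_h0, big_k)      # 0.05 by construction
power = rejection_rate(stats_ha, big_k)          # Eq. (82) for the toy input
```

The size $\alpha$ of Eq. (80) is obtained from the same `rejection_rate` function by passing the $H_0$ statistics together with the tabulated threshold $k_{0.05}$ instead of the Monte Carlo one.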
The published program, see Ref. [17], was used for the calculation of the test statistics, with the minor modifications needed for the new tests.

[Figure 1: Probability density functions $g_1(x) = p(x)$, $g_2(x)$, $g_3(x)$ used for event generation and the PDF $p_0(x)$ for hypothesis $H_0$ (dashed line), plotted against $x$ on the interval [4, 16].]

5.1. Numerical example 1

A simulation study was done for the example from Ref. [2]. Weighted histograms as described in subsection 1.1 are used here. The PDF for hypothesis $H_0$ is

$$p_0(x) \propto \frac{2}{(x-10)^2 + 1} + \frac{1.15}{(x-14)^2 + 1} \tag{84}$$

against the alternative $H_a$

$$p(x) \propto \frac{2}{(x-10)^2 + 1} + \frac{1}{(x-14)^2 + 1} \tag{85}$$

represented by the weighted histogram. Both PDFs are defined on the interval [4, 16]. A calculation was done for three cases of the PDF used for event generation, see Fig. 1:

$$g_1(x) = p(x) \tag{86}$$

$$g_2(x) = 1/12 \tag{87}$$

$$g_3(x) \propto \frac{2}{(x-9)^2 + 1} + \frac{2}{(x-15)^2 + 1}. \tag{88}$$

Distribution (86) gives an unweighted histogram. Distribution (87) is a uniform distribution on the interval [4, 16]. Distribution (88) has the same type of parameterization as Eq. (84), but with different values of the parameters. Histograms with 20 bins and equidistant binning were used. Fig. 2 presents the probabilities $p_i$, $i = 1, \ldots, 20$ for the PDF $p(x)$ and $p_{0i}$, $i = 1, \ldots, 20$ for the PDF $p_0(x)$.

[Figure 2: Probabilities $p_i$, $i = 1, \ldots, 20$ for the PDF $p(x)$ (solid line) and $p_{0i}$, $i = 1, \ldots, 20$ for the PDF $p_0(x)$ (dashed line), plotted against $x$ on the interval [4, 16].]

The size and power of the tests with statistics $\hat{X}^2$ (54), ${}_c\hat{X}^2$ (67), $\hat{X}^2_{Med}$ (50) and ${}_c\hat{X}^2_{Med}$ (63) were calculated for weighted histograms with weights of events equal to $p(x)/g_2(x)$ and $p(x)/g_3(x)$.
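The bin probabilities of this example have a closed form, because $\int dx/((x-c)^2+1) = \arctan(x-c)$. The sketch below computes $p_i$ and $p_{0i}$ for the 20 equidistant bins and fills one weighted histogram with events generated from the uniform density $g_2(x)$ and weights $p(x)/g_2(x)$. All function and variable names are illustrative, and the code is not the published program of Ref. [17].

```python
import math
import random

LO, HI, M = 4.0, 16.0, 20


def shape(x, a2):
    """Unnormalized density of the form (84)/(85) with second amplitude a2."""
    return 2.0 / ((x - 10.0) ** 2 + 1.0) + a2 / ((x - 14.0) ** 2 + 1.0)


def shape_integral(lo, hi, a2):
    """Exact integral of shape(x, a2) over [lo, hi] via the arctangent primitive."""
    return (2.0 * (math.atan(hi - 10.0) - math.atan(lo - 10.0))
            + a2 * (math.atan(hi - 14.0) - math.atan(lo - 14.0)))


def bin_probabilities(a2):
    """Probabilities of the M equidistant bins on [LO, HI]."""
    width = (HI - LO) / M
    norm = shape_integral(LO, HI, a2)
    return [shape_integral(LO + i * width, LO + (i + 1) * width, a2) / norm
            for i in range(M)]


def weighted_histogram_uniform(n, rng):
    """Generate n events from g2 = 1/12 and weight them by p(x) / g2(x)."""
    norm = shape_integral(LO, HI, 1.0)
    sums = [0.0] * M        # W_i
    sums_sq = [0.0] * M     # W_2i
    for _ in range(n):
        x = rng.uniform(LO, HI)
        w = (shape(x, 1.0) / norm) / (1.0 / (HI - LO))
        i = min(int((x - LO) / (HI - LO) * M), M - 1)
        sums[i] += w
        sums_sq[i] += w * w
    return sums, sums_sq


p0 = bin_probabilities(1.15)     # hypothesis H0, Eq. (84)
p = bin_probabilities(1.0)       # alternative, Eq. (85)
W, W2 = weighted_histogram_uniform(20000, random.Random(2))
p_hat = [w / 20000 for w in W]   # estimates of p_i, Eq. (7)
```

The sums `W` and `W2` are exactly the inputs needed by the statistics (54) and (67), with `p0` as the tested probabilities.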
For histograms with unweighted entries, the statistics $\hat{X}^2$ (54) and $\hat{X}^2_{Med}$ (50) coincide with Pearson's statistic $X^2$ (23), which was used in that case. The results of the calculations for 100000 runs are presented in Table 1.

Table 1: Numerical example 1. Size ($\alpha$) and power ($\pi$), in percent, of the different test statistics $X^2$ (23), $\hat{X}^2$ (54), ${}_c\hat{X}^2$ (67), $\hat{X}^2_{Med}$ (50), ${}_c\hat{X}^2_{Med}$ (63) obtained for different weight functions $w(x)$. Italic type marks the size of a test with an inappropriate number of events in the bins of the histograms.

| № | Statistic | | n=200 | 400 | 600 | 800 | 1000 | 3000 | 5000 | 7000 | 9000 | w(x) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | $X^2$ | α | 5.7 | 5.4 | 5.3 | 5.2 | 5.2 | 5.1 | 5.0 | 5.0 | 5.1 | 1 |
| | | π | 6.0 | 7.1 | 8.2 | 9.8 | 11.2 | 29.9 | 52.7 | 71.6 | 84.9 | |
| 2 | $\hat{X}^2$ | α | 5.5 | 5.3 | 5.2 | 5.1 | 5.0 | 5.1 | 5.1 | 5.1 | 4.9 | $p(x)/g_2(x)$ |
| | | π | 6.1 | 7.0 | 8.2 | 9.2 | 10.5 | 26.2 | 45.8 | 64.0 | 78.7 | |
| 3 | ${}_c\hat{X}^2$ | α | 5.0 | 5.1 | 5.0 | 5.0 | 4.9 | 5.0 | 5.0 | 5.2 | 4.9 | $p(x)/g_2(x)$ |
| | | π | 6.0 | 7.0 | 8.1 | 9.1 | 10.4 | 26.0 | 45.6 | 63.0 | 78.1 | |
| 4 | $\hat{X}^2_{Med}$ | α | 5.4 | 5.4 | 5.3 | 5.2 | 5.1 | 5.3 | 5.2 | 5.3 | 5.0 | $p(x)/g_2(x)$ |
| | | π | 6.0 | 6.9 | 8.0 | 9.1 | 10.3 | 25.7 | 45.3 | 63.1 | 78.2 | |
| 5 | ${}_c\hat{X}^2_{Med}$ | α | 5.6 | 5.8 | 5.7 | 5.7 | 5.5 | 5.7 | 5.7 | 5.8 | 5.5 | $p(x)/g_2(x)$ |
| | | π | 5.9 | 6.9 | 8.0 | 9.1 | 10.2 | 25.4 | 44.9 | 62.5 | 77.5 | |
| 6 | $\hat{X}^2$ | α | 7.3 | 6.6 | 6.1 | 5.8 | 5.6 | 5.2 | 5.2 | 4.9 | 5.0 | $p(x)/g_3(x)$ |
| | | π | 16.2 | 29.7 | 40.1 | 48.5 | 56.1 | 95.7 | 99.8 | 100.0 | 100.0 | |
| 7 | ${}_c\hat{X}^2$ | α | 4.7 | 4.9 | 5.0 | 5.0 | 5.1 | 5.1 | 5.0 | 4.9 | 5.0 | $p(x)/g_3(x)$ |
| | | π | 6.9 | 8.3 | 9.9 | 11.6 | 13.4 | 36.5 | 61.6 | 80.5 | 91.2 | |
| 8 | $\hat{X}^2_{Med}$ | α | 5.5 | 5.3 | 5.4 | 5.3 | 5.5 | 5.4 | 5.2 | 5.3 | 5.2 | $p(x)/g_3(x)$ |
| | | π | 7.9 | 11.8 | 15.8 | 20.2 | 25.0 | 75.3 | 96.6 | 99.8 | 100.0 | |
| 9 | ${}_c\hat{X}^2_{Med}$ | α | 5.4 | 5.5 | 5.7 | 5.6 | 5.8 | 5.7 | 5.6 | 5.7 | 5.5 | $p(x)/g_3(x)$ |
| | | π | 6.8 | 8.4 | 9.7 | 11.4 | 13.1 | 36.0 | 60.7 | 79.2 | 90.7 | |

Conclusions and interpretation of the results presented in Table 1.

• The sizes of the new tests $\hat{X}^2$ (54) (rows 2, 6) and ${}_c\hat{X}^2$ (67) (rows 3, 7) are generally closer to the nominal value 5% than those of the median tests $\hat{X}^2_{Med}$ (50) (rows 4, 8) and ${}_c\hat{X}^2_{Med}$ (63) (rows 5, 9) when the application of the test satisfies the restrictions formulated in section 4.

• The power of the new tests $\hat{X}^2$ (54) (rows 2, 6) is greater than that of the analogous median tests $\hat{X}^2_{Med}$ (50) (rows 4, 8). The power of the tests ${}_c\hat{X}^2$ (67) (rows 3, 7) is greater than that of the analogous median tests ${}_c\hat{X}^2_{Med}$ (63) (rows 5, 9).

• The power of all tests calculated for histograms with weights of events equal to $p(x)/g_2(x)$ (rows 2-5) is lower than for the histogram with unweighted entries (row 1), but the power of all tests calculated for histograms with weights of events equal to $p(x)/g_3(x)$ (rows 6-9) is greater. The explanation is that in the latter case we increase the statistics of events in the domains where the distribution represented by the histogram deviates strongly from the tested distribution.

The properties of the tests in application to Poisson histograms were investigated with the same weight functions and distributions of events. In this case, the total number of events $n$ is random and was simulated according to a Poisson distribution with a given parameter $n_0$. The size and power of the tests $X^2_{pois}$ (72), $X^2_{corr0}$ (74) with exactly known parameter $n_0$, and $X^2_{corr}$ (75), developed ad hoc for the Poisson histogram in [13], were also calculated. The results of the calculations are presented in Table 2.

Table 2: Numerical example 1. Size ($\alpha$) and power ($\pi$), in percent, of the different test statistics $X^2_{pois}$ (72), $X^2$ (23), $X^2_{corr0}$ (74), $X^2_{corr}$ (75), $\hat{X}^2$ (54), ${}_c\hat{X}^2$ (67) in application to Poisson histograms. Italic type marks the size of a test with an inappropriate number of events in the bins of the histograms.

| № | Statistic | | n0=200 | 400 | 600 | 800 | 1000 | 3000 | 5000 | 7000 | 9000 | w(x) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | $X^2_{pois}$ | α | 6.0 | 5.6 | 5.2 | 5.2 | 5.2 | 5.1 | 5.1 | 5.1 | 5.1 | 1 |
| | | π | 5.9 | 7.0 | 8.3 | 9.6 | 11.1 | 29.2 | 50.9 | 70.0 | 83.8 | |
| 2 | $X^2$ | α | 5.6 | 5.5 | 5.1 | 5.2 | 5.1 | 5.1 | 5.1 | 5.1 | 5.0 | 1 |
| | | π | 6.0 | 7.0 | 8.4 | 9.8 | 11.1 | 30.0 | 52.2 | 71.2 | 85.0 | |
| 3 | $X^2_{corr0}$ | α | 5.4 | 5.3 | 5.2 | 5.1 | 5.1 | 5.0 | 5.0 | 5.1 | 5.0 | $p(x)/g_2(x)$ |
| | | π | 6.0 | 6.7 | 7.8 | 8.8 | 10.0 | 25.0 | 43.9 | 61.5 | 76.2 | |
| 4 | $X^2_{corr}$ | α | 3.9 | 4.4 | 4.6 | 4.7 | 4.7 | 5.0 | 5.0 | 5.0 | 4.9 | $p(x)/g_2(x)$ |
| | | π | 6.0 | 7.0 | 8.0 | 9.0 | 10.3 | 25.5 | 45.0 | 62.9 | 77.4 | |
| 5 | $\hat{X}^2$ | α | 5.5 | 5.2 | 5.2 | 5.1 | 5.0 | 5.1 | 5.0 | 5.0 | 5.0 | $p(x)/g_2(x)$ |
| | | π | 6.1 | 7.1 | 8.1 | 9.2 | 10.6 | 26.3 | 46.0 | 64.1 | 78.5 | |
| 6 | ${}_c\hat{X}^2$ | α | 5.1 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | $p(x)/g_2(x)$ |
| | | π | 6.0 | 7.0 | 8.1 | 9.2 | 10.5 | 26.0 | 45.5 | 63.4 | 77.6 | |
| 7 | $X^2_{corr0}$ | α | 5.1 | 5.0 | 5.2 | 5.0 | 5.2 | 5.1 | 5.1 | 5.1 | 4.9 | $p(x)/g_3(x)$ |
| | | π | 6.3 | 7.5 | 8.8 | 10.7 | 12.3 | 35.3 | 60.5 | 79.6 | 91.3 | |
| 8 | $X^2_{corr}$ | α | 3.5 | 4.1 | 4.5 | 4.7 | 4.8 | 5.0 | 5.0 | 5.0 | 4.9 | $p(x)/g_3(x)$ |
| | | π | 7.0 | 8.4 | 9.7 | 11.6 | 13.4 | 36.0 | 60.4 | 79.2 | 90.8 | |
| 9 | $\hat{X}^2$ | α | 7.2 | 6.5 | 6.0 | 5.6 | 5.6 | 5.3 | 5.1 | 5.1 | 4.9 | $p(x)/g_3(x)$ |
| | | π | 16.4 | 30.1 | 40.1 | 48.8 | 56.1 | 95.7 | 99.8 | 100.0 | 100.0 | |
| 10 | ${}_c\hat{X}^2$ | α | 4.6 | 4.9 | 4.9 | 5.0 | 5.1 | 5.0 | 5.0 | 5.0 | 5.0 | $p(x)/g_3(x)$ |
| | | π | 7.0 | 8.5 | 10.0 | 11.7 | 13.5 | 37.0 | 61.7 | 80.0 | 91.2 | |

Conclusions and interpretation of the results presented in Table 2.

• The sizes of all tests are close to the nominal value 5%.

• The power of the new tests $\hat{X}^2$ (54) (rows 5, 9) and ${}_c\hat{X}^2$ (67) (rows 6, 10) used for Poisson histograms is greater than the power of the tests developed ad hoc for Poisson histograms in Ref. [13]: $X^2_{corr0}$ (74) (rows 3, 7) with the exactly known parameter $n_0$, and $X^2_{corr}$ (75) (rows 4, 8) with the unknown parameter $n_0$.

• The power of Pearson's test $X^2$ (23) (row 2) used for Poisson histograms is greater than that of the test $X^2_{pois}$ (72) (row 1) with the exactly known parameter $n_0$ proposed in Ref. [13].

5.2. Numerical example 2

A simulation study was done for the example described in Ref. [18] and also in Ref. [19]. The weighted histograms described in subsection 1.2 are used here. The PDF $p_0(x)$ for the hypothesis $H_0$ is taken according to formula (11) with:

$$p_{0tr}(x') = 0.4(x' - 0.5) + 1; \quad x' \in [0, 1] \tag{89}$$

$$A(x') = 1 - (x' - 0.5)^2 \tag{90}$$

$$R(x|x') = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - x')^2}{2\sigma^2}\right), \quad \text{with } \sigma = 0.3. \tag{91}$$

For the alternative $H_a$, $p(x)$ is taken with the same acceptance and resolution functions according to formula (11) with:

$$p_{tr}(x') = 0.6666(x' - 0.5) + 1; \quad x' \in [0, 1] \tag{92}$$

which is represented by the weighted histogram. The calculation was done for two choices of the PDF used for event generation, see Fig. 3:

$$h_1(x') = 0.6666(x' - 0.5) + 1; \quad x' \in [0, 1] \tag{93}$$

and

$$h_2(x') = -0.6666(x' - 0.5) + 1; \quad x' \in [0, 1]. \tag{94}$$

In the first case, the weighted histogram is a histogram with weights of events equal to 1 (a histogram with unweighted entries) and, in the second case, the weights of events are equal to $h_1(x')/h_2(x')$.

We use a histogram with 20 bins on the interval $[-0.3, 1.3]$. Fig. 4 presents the probabilities $p_i$, $i = 1, \ldots, 20$ for the PDF $p(x)$ and $p_{0i}$, $i = 1, \ldots, 20$ for the PDF $p_0(x)$. In addition, we add two bins for events with $x \le -0.3$ and $x > 1.3$, as well as one bin for events that were not registered due to the limited acceptance. The total number of bins $m$ used in the test is therefore equal to 23. The results of the calculations of the size and power of the tests for 100000 runs are presented in Table 3.

Conclusions and interpretation of the results presented in Table 3.

• The size of the new tests $\hat{X}^2$ (54) and ${}_c\hat{X}^2$ (67) (rows 2, 3) is closer to the nominal value 5% than the size of the median tests $\hat{X}^2_{Med}$ (50) and ${}_c\hat{X}^2_{Med}$ (63) (rows 4, 5).

• The power of the new tests $\hat{X}^2$ (54) and ${}_c\hat{X}^2$ (67) (rows 2, 3) is roughly the same as that of the analogous median tests $\hat{X}^2_{Med}$ (50) and ${}_c\hat{X}^2_{Med}$ (63) (rows 4, 5).

• All tests demonstrate greater power than Pearson's test $X^2$ (23) (row 1) used for the histogram with unweighted entries.
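The event generation of this example can be illustrated with a short Python sketch. It is not the program used for the paper. It assumes, as a reading of formula (11) that is not reproduced in this section, that an event with true value $x'$ is registered with probability $A(x')$ and that its measured value is $x = x' + \varepsilon$ with Gaussian $\varepsilon$ of width $\sigma$. The function names, the seed, the sample size and the bin ordering (index 0 for $x \le -0.3$, 1 to 20 for the inner bins, 21 for $x > 1.3$, 22 for events that were not registered) are hypothetical choices made here.

```python
import numpy as np

SIGMA = 0.3  # resolution, Eq. (91)

def linear_pdf(t, slope):
    # PDF slope*(t - 0.5) + 1 on [0, 1], cf. Eqs. (92)-(94).
    return slope * (t - 0.5) + 1.0

def sample_linear(rng, slope, size):
    # Inverse-CDF sampling: F(t) = (slope/2) t^2 + (1 - slope/2) t, slope != 0.
    u = rng.uniform(size=size)
    b = 1.0 - 0.5 * slope
    return (-b + np.sqrt(b * b + 2.0 * slope * u)) / slope

def acceptance(t):
    # Acceptance function, Eq. (90).
    return 1.0 - (t - 0.5) ** 2

def fill_histogram(rng, n, gen_slope, target_slope=0.6666):
    # Generate n true values from the generation PDF and weight them to p_tr.
    t = sample_linear(rng, gen_slope, n)
    w = linear_pdf(t, target_slope) / linear_pdf(t, gen_slope)
    # Assumed model: register with probability A(x'), then smear with R(x|x').
    registered = rng.uniform(size=n) < acceptance(t)
    x = t + rng.normal(0.0, SIGMA, size=n)
    edges = np.linspace(-0.3, 1.3, 21)            # 20 inner bins
    idx = np.searchsorted(edges, x, side="left")  # 0: x <= -0.3, 21: x > 1.3
    idx[~registered] = 22                         # bin for unregistered events
    return np.bincount(idx, weights=w, minlength=23)  # m = 23 sums of weights

rng = np.random.default_rng(2015)                  # seed chosen arbitrarily
W_unweighted = fill_histogram(rng, 5000, 0.6666)   # events from h1, weights 1
W_weighted = fill_histogram(rng, 5000, -0.6666)    # events from h2, weights h1/h2
```

In both cases the 23 sums of weights divided by the number of generated events estimate the same bin probabilities; only the statistical precision per bin differs between the two generation PDFs.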
Figure 3: Probability density functions $h_1(x') = p_{tr}(x')$, $h_2(x')$ and $p_{0tr}(x')$ (dashed line).

The properties of the tests in application to Poisson histograms were investigated with the same weight functions and distributions of events. In this case, the number of events $n$ in a histogram was simulated according to a Poisson distribution with a given parameter $n_0$. The size and power of the tests developed for the Poisson histogram in [13] were also calculated. The results of the calculations are presented in Table 4.

Conclusions and interpretation of the results presented in Table 4.

• The sizes of all tests are close to the nominal value 5%.

• In general, the power of the new tests $\hat{X}^2$ (54) and ${}_c\hat{X}^2$ (67) (rows 5, 6) applied to Poisson histograms is greater than the power of the tests developed ad hoc for Poisson histograms in Ref. [13]: $X^2_{corr0}$ (74) with the exactly known parameter $n_0$ and $X^2_{corr}$ (75) with the unknown parameter $n_0$ (rows 3, 4).

• The power of Pearson's test $X^2$ (row 2) used for Poisson histograms is greater than the power of the test $X^2_{pois}$ (72) (row 1) with the exactly known parameter $n_0$.

Figure 4: Probabilities $p_i$, $i = 1, \ldots, 20$ for the PDF $p(x)$ (solid line) and $p_{0i}$, $i = 1, \ldots, 20$ for the PDF $p_0(x)$ (dashed line).

In general, numerical examples 1 and 2 demonstrate the superiority of the new goodness of fit tests over the existing tests for weighted histograms, see Ref. [2], and for weighted Poisson histograms, see Ref. [13].

Table 3: Numerical example 2. Sizes ($\alpha$) and powers ($\pi$), in percent, of the different test statistics $X^2$ (23), $\hat{X}^2$ (54), ${}_c\hat{X}^2$ (67), $\hat{X}^2_{Med}$ (50), ${}_c\hat{X}^2_{Med}$ (63) obtained for different weight functions $w(x)$. Italic type marks the size of a test with an inappropriate number of events in the bins of the histograms.

| № | Statistic | | n=200 | 400 | 600 | 800 | 1000 | 3000 | 5000 | 7000 | 9000 | w(x) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | $X^2$ | α | 5.1 | 5.1 | 5.1 | 5.0 | 5.2 | 5.1 | 5.1 | 5.0 | 5.0 | 1 |
| | | π | 5.6 | 6.6 | 7.5 | 8.8 | 9.8 | 25.9 | 45.7 | 64.9 | 79.4 | |
| 2 | $\hat{X}^2$ | α | 7.0 | 6.2 | 5.8 | 5.6 | 5.5 | 5.1 | 5.0 | 4.9 | 4.9 | $h_1(x')/h_2(x')$ |
| | | π | 8.4 | 9.4 | 10.9 | 12.8 | 14.6 | 40.9 | 67.1 | 85.3 | 94.5 | |
| 3 | ${}_c\hat{X}^2$ | α | 5.6 | 5.6 | 5.5 | 5.4 | 5.3 | 5.1 | 5.0 | 5.0 | 4.9 | $h_1(x')/h_2(x')$ |
| | | π | 6.4 | 7.4 | 8.4 | 9.9 | 11.0 | 28.0 | 47.9 | 66.4 | 80.5 | |
| 4 | $\hat{X}^2_{Med}$ | α | 10.9 | 7.4 | 6.6 | 6.1 | 6.1 | 5.7 | 5.6 | 5.6 | 5.6 | $h_1(x')/h_2(x')$ |
| | | π | 9.1 | 10.1 | 11.5 | 13.9 | 15.8 | 43.7 | 70.9 | 87.8 | 95.8 | |
| 5 | ${}_c\hat{X}^2_{Med}$ | α | 7.8 | 6.6 | 6.3 | 5.9 | 5.9 | 5.7 | 5.7 | 5.7 | 5.6 | $h_1(x')/h_2(x')$ |
| | | π | 6.1 | 7.2 | 8.4 | 9.7 | 10.9 | 27.4 | 46.9 | 65.0 | 79.2 | |

Table 4: Numerical example 2. Size ($\alpha$) and power ($\pi$), in percent, of the different test statistics $X^2_{pois}$ (72), $X^2$, $X^2_{corr0}$ (74), $X^2_{corr}$ (75), $\hat{X}^2$ (54), ${}_c\hat{X}^2$ (67) in application to Poisson histograms. Italic type marks the size of a test with an inappropriate number of events in the bins of the histograms.

| № | Statistic | | n0=200 | 400 | 600 | 800 | 1000 | 3000 | 5000 | 7000 | 9000 | w(x) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | $X^2_{pois}$ | α | 5.5 | 5.3 | 5.3 | 5.0 | 5.2 | 5.1 | 5.1 | 5.0 | 5.2 | 1 |
| | | π | 5.5 | 6.4 | 7.4 | 8.7 | 9.7 | 25.6 | 45.0 | 64.0 | 77.9 | |
| 2 | $X^2$ | α | 5.1 | 5.1 | 5.1 | 4.9 | 5.2 | 5.0 | 5.1 | 5.0 | 5.2 | 1 |
| | | π | 5.5 | 6.5 | 7.5 | 8.9 | 9.8 | 26.3 | 45.9 | 65.0 | 78.8 | |
| 3 | $X^2_{corr0}$ | α | 5.8 | 5.8 | 5.5 | 5.3 | 5.3 | 5.0 | 5.1 | 5.0 | 5.0 | $h_1(x')/h_2(x')$ |
| | | π | 5.8 | 6.6 | 7.7 | 9.0 | 10.2 | 27.2 | 47.0 | 66.5 | 80.7 | |
| 4 | $X^2_{corr}$ | α | 4.2 | 4.9 | 4.9 | 4.8 | 4.9 | 4.9 | 5.0 | 4.9 | 5.0 | $h_1(x')/h_2(x')$ |
| | | π | 6.3 | 7.2 | 8.4 | 9.7 | 11.1 | 27.5 | 46.9 | 65.5 | 79.5 | |
| 5 | $\hat{X}^2$ | α | 6.8 | 6.1 | 5.8 | 5.6 | 5.4 | 5.1 | 5.0 | 5.0 | 4.9 | $h_1(x')/h_2(x')$ |
| | | π | 8.4 | 9.4 | 11.0 | 12.7 | 14.8 | 41.1 | 67.2 | 85.1 | 94.4 | |
| 6 | ${}_c\hat{X}^2$ | α | 5.4 | 5.6 | 5.5 | 5.3 | 5.3 | 5.0 | 4.9 | 5.0 | 4.9 | $h_1(x')/h_2(x')$ |
| | | π | 6.5 | 7.4 | 8.5 | 9.8 | 11.0 | 28.0 | 48.2 | 66.4 | 80.4 | |

6. Conclusion

A review of goodness of fit tests for weighted histograms was presented. The bin content of a weighted histogram was considered as a random sum of random variables, which permits generalization of the classical Pearson's goodness of fit test to histograms with weighted entries. Improvements of the chi-square tests with better statistical properties were proposed.
Evaluation of the size and power of the tests was done numerically for different types of weighted histograms with different numbers of events and different weight functions. In general, the size of the new tests is closer to the nominal value, and their power is not lower than that of the existing tests. Apart from the direct application of the tests in data analysis, see for example Ref. [20], the proposed tests are a necessary basis for generalizing the test to the case when some parameters must be estimated from the data, see Ref. [21], as well as for generalizing the test to the comparison of weighted and unweighted histograms or of two weighted ones (homogeneity test), see Refs. [21, 22, 23]. The parametric fit of data obtained from detectors with finite resolution and limited acceptance is one important application of the methods developed for weighted histograms that can be used for the interpretation of experimental data, see Ref. [19].

Acknowledgements

The author is grateful to Johan Blouw for useful discussions and careful reading of the manuscript, and thanks the University of Akureyri and the MPI for Nuclear Physics for support in carrying out the research.

References

[1] M. G. Kendall, A. S. Stuart, The Advanced Theory of Statistics, Vol. 2, ch. 30, sec. 30.4, Griffin Publishing Company, London, 1973.
[2] N. D. Gagunashvili, Nucl. Instr. Meth. A 596 (2008) 439.
[3] I. M. Sobol, Numerical Monte Carlo methods, ch. 5, Nauka, Moscow, 1973.
[4] K. Pearson, Phil. Mag. 50 (1900) 157.
[5] M. G. Kendall, A. S. Stuart, The Advanced Theory of Statistics, Vol. 1, ch. 15, sec. 15.10, Griffin Publishing Company, London, 1973.
[6] B. V. Gnedenko, V. Yu. Korolev, Random Summation: Limit Theorems and Applications, CRC Press, Boca Raton, Florida, 1996.
[7] H. V. Henderson, S. R. Searle, SIAM Rev. 23 (1981) 53.
[8] H. Robbins, Bull. Amer. Math. Soc. 54 (1948) 1151.
[9] N. D. Gagunashvili, Nucl. Instr. Meth. A 614 (2010) 287.
[10] G. H. Golub, SIAM Rev. 15 (1973) 318.
[11] J. Neumann, Various techniques used in connection with random digits, NBS Appl. Math. series 12 (1951) 36-38.
[12] S. Baker, R. D. Cousins, Nucl. Instr. Meth. 221 (1984) 437.
[13] G. Zech, Nucl. Instr. Meth. A 691 (2012) 178.
[14] W. T. Eadie, D. Drijard, F. E. James, M. Roos, B. Sadoulet, Statistical Methods in Experimental Physics, ch. 4, sec. 4.1.2, North-Holland Publishing Company, Amsterdam, London, 1971.
[15] D. S. Moore, G. P. McCabe, Introduction to the Practice of Statistics, W. H. Freeman Publishing Company, New York, 2007.
[16] W. G. Cochran, Ann. of Math. Stat. 23 (1952) 315.
[17] N. D. Gagunashvili, Comput. Phys. Commun. 183 (2012) 418.
[18] G. Bohm, G. Zech, Introduction to Statistics and Data Analysis for Physicists, Verlag Deutsches Elektronen-Synchrotron, 2010.
[19] N. D. Gagunashvili, Nucl. Instr. Meth. A 635 (2011) 86.
[20] D0 Collaboration, V. M. Abazov et al., Phys. Lett. B 693 (2010) 515.
[21] H. Cramer, Mathematical methods of statistics, ch. 30, Princeton University Press, Princeton, 1999.
[22] N. D. Gagunashvili, Nucl. Instr. Meth. A 614 (2010) 287.
[23] N. D. Gagunashvili, Comput. Phys. Commun. 183 (2012) 193.
