Chi-square goodness of fit tests for weighted histograms. Review and improvements

Weighted histograms are used for the estimation of probability density functions. Computer simulation is the main domain of application of this type of histogram. A review of chi-square goodness of fit tests for weighted histograms is presented in th…

Authors: Nikolai Gagunashvili

N.D. Gagunashvili (*, 1)
University of Akureyri, Borgir, v/Nordurslód, IS-600 Akureyri, Iceland

(*) Tel.: +354-460 8505; fax: +354-4608998. Email address: nikolai@unak.is (N.D. Gagunashvili)
(1) Present address: Max-Planck-Institut für Kernphysik, PO Box 103980, 69029 Heidelberg, Germany

Preprint submitted to Elsevier, February 20, 2015

Abstract

Weighted histograms are used for the estimation of probability density functions. Computer simulation is the main domain of application of this type of histogram. A review of chi-square goodness of fit tests for weighted histograms is presented in this paper. Improvements to these tests are proposed that bring their size closer to the nominal value. Numerical examples are presented to evaluate the tests and to demonstrate various applications of them.

Keywords: probability density function, histogram, goodness of fit test, multinomial distribution, Poisson histogram

PACS: 02.30.Zz, 07.05.Kf, 07.05.Fb

1. Introduction

A histogram with $m$ bins for a given probability density function (PDF) $p(x)$ is used to estimate the probabilities

$$p_i = \int_{S_i} p(x)\,dx, \quad i = 1, \ldots, m \tag{1}$$

that a random event belongs to bin $i$. Integration in (1) is done over the bin $S_i$.

A histogram can be obtained as a result of a random experiment with PDF $p(x)$. Let us denote the number of random events belonging to the $i$th bin of the histogram as $n_i$. The total number of events $n$ in the histogram is equal to

$$n = \sum_{i=1}^{m} n_i. \tag{2}$$

The quantity

$$\hat{p}_i = n_i/n \tag{3}$$

is an estimator of the probability $p_i$ with expectation value

$$\mathrm{E}[\hat{p}_i] = p_i. \tag{4}$$

The distribution of the number of events for bins of the histogram is the multinomial distribution [1], and the probability of the random vector $(n_1, \ldots, n_m)$ is

$$P(n_1, \ldots, n_m) = \frac{n!}{n_1!\, n_2! \cdots n_m!}\; p_1^{n_1} \cdots p_m^{n_m}, \quad \sum_{i=1}^{m} p_i = 1. \tag{5}$$

A weighted histogram, or a histogram of weighted events, is also used for estimating the probabilities $p_i$ (1), see Ref. [2]. It is obtained as a result of a random experiment with probability density function $g(x)$ that generally does not coincide with the PDF $p(x)$. The sum of weights of events for bin $i$ is defined as

$$W_i = \sum_{k=1}^{n_i} w_i(k), \tag{6}$$

where $n_i$ is the number of events in bin $i$ and $w_i(k)$ is the weight of the $k$th event in the $i$th bin. The statistic

$$\hat{p}_i = W_i/n \tag{7}$$

is used to estimate $p_i$, where $n = \sum_{i=1}^{m} n_i$ is the total number of events for the histogram with $m$ bins. Weights of events are chosen in such a way that the estimate (7) is unbiased,

$$\mathrm{E}[\hat{p}_i] = p_i. \tag{8}$$

The usual histogram is a weighted histogram with weights of events equal to 1. Two examples of weighted histograms are considered below.

1.1. Example 1

To define a weighted histogram, let us write the probability $p_i$ (1) for a given PDF $p(x)$ in the form

$$p_i = \int_{S_i} p(x)\,dx = \int_{S_i} w(x)\, g(x)\,dx, \tag{9}$$

where

$$w(x) = p(x)/g(x) \tag{10}$$

is the weight function and $g(x)$ is some other probability density function. The function $g(x)$ must be $> 0$ at points $x$ where $p(x) \neq 0$. The weight $w(x) = 0$ if $p(x) = 0$, see Ref. [3].

The weighted histogram is obtained from a random experiment with probability density function $g(x)$, and the weights of the events are calculated according to (10).

1.2. Example 2

The probability density function $p_{rec}(x)$ of a reconstructed characteristic $x$ of an event obtained from a detector with finite resolution and limited acceptance can be represented as

$$p_{rec}(x) \propto \int_{\Omega'} p_{tr}(x')\, A(x')\, R(x|x')\,dx', \tag{11}$$

where $p_{tr}(x')$ is the true PDF, $A(x')$ is the acceptance of the setup, i.e. the probability of recording an event with a characteristic $x'$, and $R(x|x')$ is the experimental resolution, i.e. the probability of obtaining $x$ instead of $x'$ after the reconstruction of the event. The integration in (11) is carried out over the domain $\Omega'$ of the variable $x'$. The total probability that an event will not be registered is equal to

$$\bar{p} = \int_{\Omega'} p_{tr}(x')\,(1 - A(x'))\,dx'. \tag{12}$$

The sum of probabilities

$$\int_{\Omega}\int_{\Omega'} p_{tr}(x')\, A(x')\, R(x|x')\,dx'\,dx + \int_{\Omega'} p_{tr}(x')\,(1 - A(x'))\,dx' = 1 \tag{13}$$

because

$$\int_{\Omega}\int_{\Omega'} p_{tr}(x')\, A(x')\, R(x|x')\,dx'\,dx = \int_{\Omega'} p_{tr}(x')\, A(x')\,dx', \tag{14}$$

where $\Omega$ is the domain of the variable $x$.

A histogram of the PDF $p_{rec}(x)$ can be obtained as a result of a random experiment (simulation) that has three steps [3]:

1. A random value $x'$ is chosen according to the PDF $p_{tr}(x')$.
2. We go back to step 1 with probability $1 - A(x')$, and to step 3 with probability $A(x')$.
3. A random value $x$ is chosen according to the PDF $R(x|x')$.

The quantity $\hat{p}_i = n_i/n$, where $n_i$ is the number of events belonging to the $i$th bin for a histogram with total number of events $n$ in the random experiment (counted at step 1), is an estimator of $p_i$,

$$p_i = \int_{S_i}\int_{\Omega'} p_{tr}(x')\, A(x')\, R(x|x')\,dx'\,dx, \quad i = 1, \ldots, m, \tag{15}$$

with the expectation value of the estimator

$$\mathrm{E}[\hat{p}_i] = p_i. \tag{16}$$

The quantity $\hat{\bar{p}} = \bar{n}/n$, where $\bar{n}$ is the number of events that were lost, is an estimator of $\bar{p}$ (12) with the expectation value

$$\mathrm{E}[\hat{\bar{p}}] = \bar{p}. \tag{17}$$

Notice that

$$\sum_{i=1}^{m} p_i + \bar{p} = 1 \quad \text{and} \quad \sum_{i=1}^{m} n_i + \bar{n} = n. \tag{18}$$

In experimental particle and nuclear physics, step 3 is the most time-consuming step of the Monte Carlo simulation. This step is related to the simulation of the transport of particles through a medium and through the rather complex registration apparatus.
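The three-step experiment of Example 2 can be sketched as follows. This is a minimal illustration, not the author's program: the true PDF (uniform on [0, 1]), the acceptance, the uniform resolution of half-width 0.1, and the histogram range are all assumptions chosen so that every reconstructed event falls inside the histogram and relation (18) holds exactly.

```python
import random

def simulate_reconstructed_histogram(n, m, seed=0):
    """Run n trials of the three-step experiment.

    Returns (counts, lost): events per bin and the number of lost events.
    All densities below are illustrative assumptions.
    """
    rng = random.Random(seed)
    lo, hi = -0.1, 1.1                 # assumed histogram range, covers every possible x
    counts = [0] * m
    lost = 0
    for _ in range(n):
        # Step 1: draw x' from the true PDF (assumed uniform on [0, 1)).
        x_true = rng.random()
        # Step 2: keep the event with probability A(x'), otherwise it is lost.
        acceptance = 1.0 - (x_true - 0.5) ** 2   # assumed acceptance A(x')
        if rng.random() >= acceptance:
            lost += 1
            continue
        # Step 3: draw x from R(x|x') (assumed uniform smearing of half-width 0.1).
        x_rec = x_true + rng.uniform(-0.1, 0.1)
        i = min(int((x_rec - lo) / (hi - lo) * m), m - 1)
        counts[i] += 1
    return counts, lost

counts, lost = simulate_reconstructed_histogram(10000, 20)
```

With these assumptions the lost fraction `lost / n` estimates the integral of 1 - A(x') over [0, 1], which is 1/12.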
To use the results of the simulation with some PDF $g_{tr}(x')$ for calculating a weighted histogram of events with a true PDF $p_{tr}(x')$, we write the equation for $p_i$ in the form

$$p_i = \int_{S_i}\int_{\Omega'} w(x')\, g_{tr}(x')\, A(x')\, R(x|x')\,dx'\,dx, \tag{19}$$

where

$$w(x') = p_{tr}(x')/g_{tr}(x') \tag{20}$$

is the weight function. The weighted histogram for the PDF $p_{rec}(x)$ can be obtained using events with reconstructed characteristic $x$ and weights calculated according to (20). In this way we avoid step 3 of the simulation procedure, which is important in cases where one needs to calculate Monte Carlo reconstructed histograms for many different true PDFs.

The probability that an event will not be registered can be represented as

$$\bar{p} = \int_{\Omega'} w(x')\, g_{tr}(x')\,(1 - A(x'))\,dx', \tag{21}$$

and is estimated in the same way, using events with weights calculated according to formula (20).

2. Goodness of fit tests

The problem of goodness of fit is to test the hypothesis

$$H_0: p_1 = p_{10}, \ldots, p_{m-1} = p_{m-1,0} \quad \text{vs.} \quad H_a: p_i \neq p_{i0} \text{ for some } i, \tag{22}$$

where $p_{i0}$ are specified probabilities and $\sum_{i=1}^{m} p_{i0} = 1$. The test is used in data analysis for comparing theoretical frequencies $np_{i0}$ with observed frequencies $n_i$. This classical problem remains of current practical interest. The test statistic for a histogram with unweighted entries

$$X^2 = \sum_{i=1}^{m} \frac{(n_i - np_{i0})^2}{np_{i0}} \tag{23}$$

was suggested by Pearson [4]. Pearson showed that the statistic (23) has approximately a $\chi^2_{m-1}$ distribution if the hypothesis $H_0$ is true.

2.1. The contemporary proof of Pearson's result

If hypothesis $H_0$ is valid, the expectation values of the observed frequencies $n_i$ are equal to

$$\mathrm{E}[n_i] = np_{i0}, \quad i = 1, \ldots, m \tag{24}$$

and their covariance matrix $\Gamma$ has elements

$$\gamma_{ij} = \begin{cases} np_{i0}(1 - p_{i0}) & \text{for } i = j \\ -np_{i0}p_{j0} & \text{for } i \neq j. \end{cases}$$

Notice that the covariance matrix $\Gamma$ is singular [5].
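Pearson's statistic (23) is a direct sum over bins. A minimal sketch (the function name `pearson_chi2` is ours, introduced only for illustration):

```python
def pearson_chi2(counts, p0):
    """Pearson's X^2 of Eq. (23) for bin counts n_i and hypothesised probabilities p_i0."""
    n = sum(counts)
    return sum((n_i - n * p) ** 2 / (n * p) for n_i, p in zip(counts, p0))

# Three bins, n = 60, expected frequencies 12, 18, 30.
x2 = pearson_chi2([10, 20, 30], [0.2, 0.3, 0.5])
```

Under $H_0$ the value is compared with a $\chi^2$ distribution with $m - 1$ degrees of freedom, here 2.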
Let us now introduce the multivariate statistic

$$(\mathbf{n} - n\mathbf{p}_0)^t\, \Gamma_k^{-1}\, (\mathbf{n} - n\mathbf{p}_0), \tag{25}$$

where $\mathbf{n} = (n_1, \ldots, n_{k-1}, n_{k+1}, \ldots, n_m)^t$, $\mathbf{p}_0 = (p_{10}, \ldots, p_{k-1,0}, p_{k+1,0}, \ldots, p_{m0})^t$ and $\Gamma_k = (\gamma_{ij})_{(m-1)\times(m-1)}$ is the covariance matrix for a histogram without bin $k$. The matrix $\Gamma_k$ has the form

$$\Gamma_k = n\,\mathrm{diag}(p_{10}, \ldots, p_{k-1,0}, p_{k+1,0}, \ldots, p_{m0}) - n\,\mathbf{p}_0\mathbf{p}_0^t. \tag{26}$$

The special form of this matrix permits one to find $\Gamma_k^{-1}$ analytically [7]:

$$\Gamma_k^{-1} = \frac{1}{n}\,\mathrm{diag}\!\left(\frac{1}{p_{10}}, \ldots, \frac{1}{p_{k-1,0}}, \frac{1}{p_{k+1,0}}, \ldots, \frac{1}{p_{m0}}\right) + \frac{1}{np_{k0}}\,\Theta, \tag{27}$$

where $\Theta$ is the $(m-1)\times(m-1)$ matrix with all elements equal to unity. Finally, the calculation of expression (25) gives the $X^2$ test statistic (23). Notice that the result is the same for any choice of bin number $k$.

Asymptotically the vector $\mathbf{n}$ has a normal distribution $\mathcal{N}(n\mathbf{p}_0, \Gamma_k^{1/2})$, see Ref. [5], and therefore the test statistic (23) has a $\chi^2_{m-1}$ distribution if hypothesis $H_0$ is true:

$$X^2 \sim \chi^2_{m-1}. \tag{28}$$

2.2. Generalization of Pearson's chi-square test for weighted histograms

The total sum of weights of events in the $i$th bin, $W_i$, $i = 1, \ldots, m$, as proposed in Ref. [2], can be considered as a sum of random variables

$$W_i = \sum_{k=1}^{n_i} w_i(k), \tag{29}$$

where the number of events $n_i$ is also a random value and the weights $w_i(k)$, $k = 1, \ldots, n_i$, are independent random variables with the same probability distribution function. The distribution of the number of events for bins of the histogram is the multinomial distribution, and the probability of the random vector $(n_1, \ldots, n_m)$ is

$$P(n_1, \ldots, n_m) = \frac{n!}{n_1!\, n_2! \cdots n_m!}\; g_1^{n_1} \cdots g_m^{n_m}, \quad \sum_{i=1}^{m} g_i = 1, \tag{30}$$

where $g_i$ is the probability that a random event belongs to bin $i$.
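The random-sum model (29)-(30) can be illustrated by filling a weighted histogram event by event. The sketch below is an assumption-laden toy, not the author's code: events are generated from an assumed uniform density g(x) = 1 on [0, 1) and weighted by w(x) = p(x)/g(x) for an assumed target density p(x) = 2x. Besides the sums of weights it accumulates the sums of squared weights, which are needed later to estimate moments of the weight distribution.

```python
import random

def fill_weighted_histogram(n, m, seed=1):
    """Return (counts, W, W2): events, sum of weights, sum of squared weights per bin."""
    rng = random.Random(seed)
    counts = [0] * m
    W = [0.0] * m
    W2 = [0.0] * m
    for _ in range(n):
        x = rng.random()            # event drawn from the assumed g(x) = 1 on [0, 1)
        w = 2.0 * x                 # weight p(x)/g(x) for the assumed p(x) = 2x
        i = min(int(x * m), m - 1)  # equidistant binning
        counts[i] += 1
        W[i] += w
        W2[i] += w * w
    return counts, W, W2

counts, W, W2 = fill_weighted_histogram(20000, 10)
p_hat = [W_i / 20000 for W_i in W]   # estimates of the bin probabilities of p(x)
```

Here the bin counts follow the multinomial distribution with g_i = 1/10, while `p_hat` estimates the bin probabilities of p(x), for example 0.19 for the last bin.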
Let us denote the expectation values of the weights of events from the $i$th bin as

$$\mathrm{E}[w_i] = \mu_i \tag{31}$$

and the variances as

$$\mathrm{Var}[w_i] = \sigma_i^2. \tag{32}$$

The expectation value of the total sum of weights $W_i$, $i = 1, \ldots, m$, see Ref. [6], is

$$\mathrm{E}[W_i] = \mathrm{E}\Big[\sum_{k=1}^{n_i} w_i(k)\Big] = \mathrm{E}[w_i]\,\mathrm{E}[n_i] = n\mu_i g_i. \tag{33}$$

The diagonal elements $\gamma_{ii}$ of the covariance matrix of the vector $(W_1, \ldots, W_m)$, see Ref. [6], are equal to

$$\gamma_{ii} = \sigma_i^2 g_i n + \mu_i^2 g_i (1 - g_i) n = n\alpha_{2i} g_i - n\mu_i^2 g_i^2, \tag{34}$$

where

$$\alpha_{2i} = \mathrm{E}[w_i^2]. \tag{35}$$

The non-diagonal elements $\gamma_{ij}$, $i \neq j$, are equal to

$$\begin{aligned}
\gamma_{ij} &= \sum_{k=0}^{n}\sum_{l=0}^{n} \mathrm{E}\Big[\sum_{u=1}^{k}\sum_{v=1}^{l} w_i(u)\, w_j(v)\Big] h(k,l) - \mathrm{E}[W_i]\,\mathrm{E}[W_j] \\
&= \sum_{k=0}^{n}\sum_{l=0}^{n} \mathrm{E}[w_i w_j]\, h(k,l)\, kl - \mu_i n g_i\, \mu_j n g_j \\
&= \mu_i\mu_j(-g_i g_j n + g_i g_j n^2) - \mu_i n g_i\, \mu_j n g_j = -n\mu_i\mu_j g_i g_j,
\end{aligned} \tag{36}$$

where $h(k,l)$ is the probability that $k$ events belong to bin $i$ and $l$ events to bin $j$.

For weighted histograms the problem of goodness of fit is again to test the hypothesis

$$H_0: p_1 = p_{10}, \ldots, p_{m-1} = p_{m-1,0} \quad \text{vs.} \quad H_a: p_i \neq p_{i0} \text{ for some } i, \tag{37}$$

where $p_{i0}$ are specified probabilities and $\sum_{i=1}^{m} p_{i0} = 1$. If hypothesis $H_0$ is true, then

$$\mathrm{E}[W_i] = n\mu_i g_i = np_{i0}, \quad i = 1, \ldots, m \tag{38}$$

and

$$g_i = p_{i0}/\mu_i, \quad i = 1, \ldots, m. \tag{39}$$

Substituting $g_i$ into Eqs. (34) and (36) gives the covariance matrix $\Gamma$ with elements

$$\gamma_{ij} = \begin{cases} np_{i0}(r_i^{-1} - p_{i0}) & \text{for } i = j \\ -np_{i0}p_{j0} & \text{for } i \neq j, \end{cases}$$

where

$$r_i = \mu_i/\alpha_{2i} \tag{40}$$

is the ratio of the first moment $\mu_i$ of the distribution of weights of events to the second moment $\alpha_{2i}$ for a particular bin $i$. Notice that for usual histograms the ratio of moments $r_i$ is equal to 1 and the covariance matrix coincides with the covariance matrix of the multinomial distribution. The multivariate statistic is represented as

$$(\mathbf{W} - n\mathbf{p}_0)^t\, \Gamma_k^{-1}\, (\mathbf{W} - n\mathbf{p}_0), \tag{41}$$

where $\mathbf{W} = (W_1, \ldots, W_{k-1}, W_{k+1}, \ldots, W_m)^t$, $\mathbf{p}_0 = (p_{10}, \ldots, p_{k-1,0}, p_{k+1,0}, \ldots, p_{m0})^t$ and $\Gamma_k = (\gamma_{ij})_{(m-1)\times(m-1)}$ is the covariance matrix for a histogram without bin $k$. The matrix $\Gamma_k$ has the form

$$\Gamma_k = n\,\mathrm{diag}\!\left(\frac{p_{10}}{r_1}, \ldots, \frac{p_{k-1,0}}{r_{k-1}}, \frac{p_{k+1,0}}{r_{k+1}}, \ldots, \frac{p_{m0}}{r_m}\right) - n\,\mathbf{p}_0\mathbf{p}_0^t. \tag{42}$$

The special form of this matrix permits one to find the inverse matrix analytically:

$$\Gamma_k^{-1} = \frac{1}{n}\,\mathrm{diag}\!\left(\frac{r_1}{p_{10}}, \ldots, \frac{r_{k-1}}{p_{k-1,0}}, \frac{r_{k+1}}{p_{k+1,0}}, \ldots, \frac{r_m}{p_{m0}}\right) + \frac{1}{n\big(1 - \sum_{i\neq k} r_i p_{i0}\big)}\,\mathbf{r}\mathbf{r}^t, \tag{43}$$

where $\mathbf{r} = (r_1, \ldots, r_{k-1}, r_{k+1}, \ldots, r_m)^t$. After that, the multivariate statistic can be written as

$$X_k^2 = \sum_{i\neq k} \frac{r_i (W_i - np_{i0})^2}{np_{i0}} + \frac{\big(\sum_{i\neq k} r_i (W_i - np_{i0})\big)^2}{n\big(1 - \sum_{i\neq k} r_i p_{i0}\big)}, \tag{44}$$

and can also be transformed to the form

$$X_k^2 = \frac{1}{n}\sum_{i\neq k} \frac{r_i W_i^2}{p_{i0}} + \frac{1}{n}\,\frac{\big(n - \sum_{i\neq k} r_i W_i\big)^2}{1 - \sum_{i\neq k} r_i p_{i0}} - n, \tag{45}$$

which is convenient for numerical calculations. Asymptotically the vector $\mathbf{W}$ has a normal distribution $\mathcal{N}(n\mathbf{p}_0, \Gamma_k^{1/2})$ [8], and therefore the test statistic (44) has a $\chi^2_{m-1}$ distribution if hypothesis $H_0$ is true:

$$X_k^2 \sim \chi^2_{m-1}. \tag{46}$$

For usual histograms, when $r_i = 1$, $i = 1, \ldots, m$, the statistic (44) is Pearson's chi-square statistic (23). The expectation value of statistic (44), as shown in Ref. [2], is equal to

$$\mathrm{E}[X_k^2] = m - 1, \tag{47}$$

as for Pearson's test [1].

The ratio of moments $r_i = \mu_i/\alpha_{2i}$ that is used for the calculation of the test statistic is not known in the majority of cases. An estimate of $r_i$ can be used instead:

$$\hat{r}_i = W_i/W_{2i}, \tag{48}$$

where $W_{2i} = \sum_{k=1}^{n_i} w_i^2(k)$. Let us now replace $r_i$ with the estimate $\hat{r}_i$ and denote the estimator of the matrix $\Gamma_k$ as $\hat{\Gamma}_k$. Then, for positive definite matrices $\hat{\Gamma}_k$, $k = 1, \ldots, m$, the test statistic is given as

$$\hat{X}_k^2 = \sum_{i\neq k} \frac{\hat{r}_i (W_i - np_{i0})^2}{np_{i0}} + \frac{\big(\sum_{i\neq k} \hat{r}_i (W_i - np_{i0})\big)^2}{n\big(1 - \sum_{i\neq k} \hat{r}_i p_{i0}\big)}. \tag{49}$$

For usual histograms, formula (49) does not depend on the choice of the excluded bin, but for weighted histograms there can be a dependence. A test statistic that is invariant to the choice of the excluded bin, and at the same time is Pearson's chi-square statistic (23) for unweighted histograms, can be represented as the median value of the set of statistics $\hat{X}_k^2$ (49) with positive definite matrices $\hat{\Gamma}_k$:

$$\hat{X}^2_{Med} = \mathrm{Med}\{\hat{X}_1^2, \hat{X}_2^2, \ldots, \hat{X}_m^2\}. \tag{50}$$

The statistic $\hat{X}^2_{Med}$ was first proposed in Ref. [2] and has approximately a $\chi^2_{m-1}$ distribution if hypothesis $H_0$ is true:

$$\hat{X}^2_{Med} \sim \chi^2_{m-1}. \tag{51}$$

The use of $\hat{X}^2_{Med}$ to test the hypothesis $H_0$ at a given significance level is equivalent to making a decision by voting. It was noticed that the size of the test can be slightly greater than the nominal size even for a large total number of events $n$.

2.3. New generalizations of Pearson's chi-square test for weighted histograms

The set of statistics $\{\hat{X}_1^2, \hat{X}_2^2, \ldots, \hat{X}_m^2\}$, restricted to those with positive definite matrices $\hat{\Gamma}_k$, is used for calculating the median statistic $\hat{X}^2_{Med}$ (50). It can be used for any weighted histogram, including histograms with unweighted entries. One bin is excluded because the full covariance matrix of an unweighted histogram is singular and hence cannot be inverted.

Let us consider the estimation of the full covariance matrix $\hat{\Gamma}$ for a weighted histogram in more detail. A symmetric matrix is positive definite if its minimal eigenvalue is larger than 0. We denote the minimal eigenvalue of the matrix $n^{-1}\hat{\Gamma}$ by $\lambda_{min}$. Following Ref. [10], it can be shown that

$$\min_i \left\{\frac{p_{i0}}{\hat{r}_i}\right\} - \sum_{i=1}^{m} p_{i0}^2 \;\le\; \lambda_{min} \;\le\; \min_i \left\{\frac{p_{i0}}{\hat{r}_i}\right\} \tag{52}$$

and the eigenvalue $\lambda_{min}$ is a root of the secular equation

$$1 - \sum_{i=1}^{m} \frac{p_{i0}^2}{p_{i0}/\hat{r}_i - \lambda} = 0. \tag{53}$$

In the case of a histogram with unweighted entries, all $\hat{r}_i = 1$ and $\lambda = 0$ is a zero of equation (53). The matrix $\hat{\Gamma}$ for this case is not positive definite and is singular, but the matrix $\hat{\Gamma}_k$ is positive definite and therefore invertible. The numbers of events $n_i$ in the bins of a usual histogram satisfy the equation $n_1 + n_2 + \ldots + n_m = n$, which is why the covariance matrix of the multinomial distribution is not positive definite and is singular.

The matrix $\hat{\Gamma}$ for a histogram with weighted entries can also be non-positive definite. There are two reasons for this. First, the total sums of weights $W_i$ in the bins of a weighted histogram are related to each other, because they satisfy the equation $\mathrm{E}[\sum_{i=1}^{m} W_i] = n$; second, the matrix elements fluctuate. The test statistic obtained with the full matrix $\hat{\Gamma}$ is unstable and can have a large variance, especially for a low number $n$ of events in a histogram.

The fact that the matrix is not positive definite is equivalent to the fact that the minimal eigenvalue $\lambda_{min}$ of the matrix $\hat{\Gamma}$ is $\le 0$. A case when the minimal eigenvalue is positive but rather small is also undesirable, especially for computer calculations. For the above reasons it is wise to use, for weighted histograms, the test statistic

$$\hat{X}^2 = \hat{X}_k^2 = \sum_{i\neq k} \frac{\hat{r}_i (W_i - np_{i0})^2}{np_{i0}} + \frac{\big(\sum_{i\neq k} \hat{r}_i (W_i - np_{i0})\big)^2}{n\big(1 - \sum_{i\neq k} \hat{r}_i p_{i0}\big)} \tag{54}$$

for the $k$ where

$$\frac{p_{k0}}{\hat{r}_k} = \min_i \left\{\frac{p_{i0}}{\hat{r}_i}\right\}. \tag{55}$$

A secular equation for the new minimal eigenvalue can be solved numerically, by the bisection method, to check whether the matrix $\hat{\Gamma}_k$ is positive definite or not.
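A minimal sketch of the statistic (54) with the bin choice (55) is given below; it is not the published program. The function name and argument layout are ours. In place of bisection on the secular equation, the sketch checks the sign of the denominator $1 - \sum_{i\neq k}\hat{r}_i p_{i0}$: because $\hat{\Gamma}_k$ in (42) is a positive diagonal matrix minus a rank-one term, it is positive definite exactly when this quantity is positive, provided all $\hat{r}_i > 0$. Bins with zero sum of weights are assumed not to occur.

```python
def weighted_chi2(W, W2, p0, n):
    """Statistic (54): W[i] = sum of weights, W2[i] = sum of squared weights in bin i.

    The excluded bin k minimises p_i0 / r_i, Eq. (55). Assumes W[i] > 0 and W2[i] > 0.
    """
    m = len(W)
    r = [W[i] / W2[i] for i in range(m)]               # estimate (48)
    k = min(range(m), key=lambda i: p0[i] / r[i])      # choice (55)
    keep = [i for i in range(m) if i != k]
    denom = 1.0 - sum(r[i] * p0[i] for i in keep)
    if denom <= 0.0:
        # Equivalent to the estimated matrix for the kept bins not being positive definite.
        raise ValueError("estimated covariance matrix is not positive definite")
    first = sum(r[i] * (W[i] - n * p0[i]) ** 2 / (n * p0[i]) for i in keep)
    linear = sum(r[i] * (W[i] - n * p0[i]) for i in keep)
    return first + linear ** 2 / (n * denom)

# Unweighted entries: W = W2 = counts, so all r_i = 1 and (54) reduces to Pearson's X^2.
x2 = weighted_chi2([10.0, 20.0, 30.0], [10.0, 20.0, 30.0], [0.2, 0.3, 0.5], 60)
```

For the unweighted input above the value agrees with Pearson's statistic (23) for the same counts, as stated in the text for $r_i = 1$.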
Numerical experiments show that it is very rare that the matrix $\hat{\Gamma}_k$ is not positive definite, and that this happens only for histograms with a small number $n$ of events.

If hypothesis $H_0$ is valid, the statistic $\hat{X}^2$ asymptotically has the distribution

$$\hat{X}^2 \sim \chi^2_{m-1}. \tag{56}$$

It is plausible that the power of the new test is not lower than the power of tests with the statistic $\hat{X}^2_{Med}$ and with the other statistics $\{\hat{X}_i^2, i \neq k\}$. The distribution of the statistic $\hat{X}^2$ is closer to $\chi^2_{m-1}$ than the distribution of the median statistic $\hat{X}^2_{Med}$. The statistic $\hat{X}^2$ is also easier to calculate than the statistic $\hat{X}^2_{Med}$.

3. Goodness of fit tests for weighted histograms with deviations from the main model

Here, different deviations from the main model of weighted histograms are considered, together with goodness of fit tests for those cases.

3.1. Goodness of fit test for weighted histograms with unknown normalization

In practice one is often faced with the case that all weights of events are defined up to an unknown normalization constant $C$, see Ref. [2]. This happens because in some cases of computer simulation it is rather difficult to give an analytical formula for the PDF, while the PDF up to a multiplicative constant is available. That is enough for the generation of events according to the PDF, for example, by the very popular Neumann's method [11]. For the goodness of fit test it means that if hypothesis $H_0$ is valid,

$$\mathrm{E}[W_i] \cdot C = np_{i0}, \quad i = 1, \ldots, m, \tag{57}$$

with an unknown constant $C$. Then the test statistic (45) can be written as

$${}_{c}\hat{X}_k^2 = \sum_{i\neq k} \frac{\hat{r}_i (W_i - np_{i0}/C)^2}{np_{i0}/C} + \frac{\big(\sum_{i\neq k} \hat{r}_i (W_i - np_{i0}/C)\big)^2}{n\big(1 - C^{-1}\sum_{i\neq k} \hat{r}_i p_{i0}\big)}. \tag{58}$$

An estimator for the constant $C$ can be found by minimizing Eq. (58):

$$\hat{C}_k = \sum_{i\neq k} \hat{r}_i p_{i0} + \sqrt{\frac{\sum_{i\neq k} \hat{r}_i p_{i0}}{\sum_{i\neq k} \hat{r}_i W_i^2/p_{i0}}}\; \Big(n - \sum_{i\neq k} \hat{r}_i W_i\Big), \tag{59}$$

where $\hat{C}_k$ is an estimator of $C$.
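The estimator (59) and the statistic (58) can be sketched as two small functions. This is an illustration under stated assumptions, not the published program: the function names are ours, the excluded bin `k` is passed explicitly, and the toy data assume every event carries the same weight 2, so that the true constant is C = 0.5.

```python
import math

def chi2_given_C(W, W2, p0, n, k, C):
    """Statistic (58) for a trial value C of the normalization constant."""
    keep = [i for i in range(len(W)) if i != k]
    r = {i: W[i] / W2[i] for i in keep}               # estimated ratios of moments
    q = {i: n * p0[i] / C for i in keep}              # expected sums of weights n*p_i0/C
    first = sum(r[i] * (W[i] - q[i]) ** 2 / q[i] for i in keep)
    linear = sum(r[i] * (W[i] - q[i]) for i in keep)
    return first + linear ** 2 / (n * (1.0 - sum(r[i] * p0[i] for i in keep) / C))

def estimate_C(W, W2, p0, n, k):
    """Estimator (59) of the normalization constant."""
    keep = [i for i in range(len(W)) if i != k]
    r = {i: W[i] / W2[i] for i in keep}
    B = sum(r[i] * p0[i] for i in keep)
    A = sum(r[i] * W[i] ** 2 / p0[i] for i in keep)
    S = sum(r[i] * W[i] for i in keep)
    return B + math.sqrt(B / A) * (n - S)

# Toy data (assumed): counts 11, 19, 30 with every weight equal to 2.
W, W2, p0, n = [22.0, 38.0, 60.0], [44.0, 76.0, 120.0], [0.2, 0.3, 0.5], 60
C_hat = estimate_C(W, W2, p0, n, k=0)
x2 = chi2_given_C(W, W2, p0, n, 0, C_hat)
```

For this input `C_hat` comes out close to the assumed value 0.5, and evaluating (58) at slightly larger or smaller C gives a larger statistic, consistent with (59) being the minimizer.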
Substituting (59) into (58), we get the test statistic

$${}_{c}\hat{X}_k^2 = \sum_{i\neq k} \frac{\hat{r}_i (W_i - np_{i0}/\hat{C}_k)^2}{np_{i0}/\hat{C}_k} + \frac{\big(\sum_{i\neq k} \hat{r}_i (W_i - np_{i0}/\hat{C}_k)\big)^2}{n\big(1 - \hat{C}_k^{-1}\sum_{i\neq k} \hat{r}_i p_{i0}\big)}. \tag{60}$$

The statistic (60) has a $\chi^2_{m-2}$ distribution if hypothesis $H_0$ is valid. Formula (60) can also be transformed to

$${}_{c}\hat{X}_k^2 = \frac{s_k^2}{n} + 2s_k, \tag{61}$$

where

$$s_k = \sqrt{\sum_{i\neq k} \hat{r}_i p_{i0} \sum_{i\neq k} \hat{r}_i W_i^2/p_{i0}} - \sum_{i\neq k} \hat{r}_i W_i, \tag{62}$$

which is convenient for calculations, see [2]. The median statistic can be used for the same reason as in section 2.2,

$${}_{c}\hat{X}^2_{Med} = \mathrm{Med}\{{}_{c}\hat{X}_1^2, {}_{c}\hat{X}_2^2, \ldots, {}_{c}\hat{X}_m^2\}, \tag{63}$$

and has approximately a $\chi^2_{m-2}$ distribution if hypothesis $H_0$ is valid, see Ref. [2]:

$${}_{c}\hat{X}^2_{Med} \sim \chi^2_{m-2}. \tag{64}$$

3.2. New goodness of fit test for weighted histograms with unknown normalization

The new estimator of the constant $C$ is

$$\hat{C} = \sum_{i\neq k} \hat{r}_i p_{i0} + \sqrt{\frac{\sum_{i\neq k} \hat{r}_i p_{i0}}{\sum_{i\neq k} \hat{r}_i W_i^2/p_{i0}}}\; \Big(n - \sum_{i\neq k} \hat{r}_i W_i\Big), \tag{65}$$

for the $k$ where

$$\frac{p_{k0}}{\hat{r}_k} = \min_i \left\{\frac{p_{i0}}{\hat{r}_i}\right\}. \tag{66}$$

The test statistic can then be written as

$${}_{c}\hat{X}^2 = \sum_{i\neq k} \frac{\hat{r}_i (W_i - np_{i0}/\hat{C})^2}{np_{i0}/\hat{C}} + \frac{\big(\sum_{i\neq k} \hat{r}_i (W_i - np_{i0}/\hat{C})\big)^2}{n\big(1 - \hat{C}^{-1}\sum_{i\neq k} \hat{r}_i p_{i0}\big)}. \tag{67}$$

The statistic ${}_{c}\hat{X}^2$ asymptotically has a $\chi^2_{m-2}$ distribution if hypothesis $H_0$ is valid:

$${}_{c}\hat{X}^2 \sim \chi^2_{m-2}. \tag{68}$$

3.3. Goodness of fit test for weighted Poisson histograms

A Poisson histogram [12] can be defined as a histogram with a multi-Poisson distribution of the numbers of events for bins,

$$P(n_1, \ldots, n_m) = \prod_{i=1}^{m} e^{-n_0 p_i} (n_0 p_i)^{n_i}/n_i!, \tag{69}$$

where $n_0$ is a free parameter.
The discrete probability distribution function (probability mass function) of a Poisson histogram can be represented as a product of two probability functions: a Poisson probability mass function for the number of events $n$ with parameter $n_0$, and a multinomial probability mass function of the numbers of events for bins of the histogram with total number of events equal to $n$, see Ref. [1]:

$$P(n_1, \ldots, n_m) = e^{-n_0} (n_0)^n/n! \times \frac{n!}{n_1!\, n_2! \cdots n_m!}\; p_1^{n_1} \cdots p_m^{n_m}. \tag{70}$$

A Poisson histogram can be obtained as a result of two random experiments: the first experiment, with a Poisson probability mass function, gives the total number of events $n$ in the histogram, and then the histogram is obtained as a result of a random experiment with PDF $p(x)$ and total number of events equal to $n$.

As in the case of multinomial histograms, for Poisson histograms there is also the problem of a goodness of fit test with the hypothesis

$$H_0: p_1 = p_{10}, \ldots, p_{m-1} = p_{m-1,0} \quad \text{vs.} \quad H_a: p_i \neq p_{i0} \text{ for some } i, \tag{71}$$

where $p_{i0}$ are specified probabilities and $\sum_{i=1}^{m} p_{i0} = 1$. If $n_0$ is known, then the statistic, see Ref. [13],

$$X^2_{pois} = \sum_{i=1}^{m} \frac{(n_i - n_0 p_{i0})^2}{n_0 p_{i0}} \tag{72}$$

can be used and asymptotically has a $\chi^2_m$ distribution if the hypothesis $H_0$ is valid:

$$X^2_{pois} \sim \chi^2_m. \tag{73}$$

The hypothesis $H_0$ becomes complex (composite) if the parameter $n_0$ is unknown for the Poisson histogram. This is the opposite of the situation for a multinomial histogram, where the hypothesis is simple.
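For a known parameter $n_0$, the statistic (72) differs from Pearson's statistic only in that the expected frequencies use $n_0$ rather than the observed total. A minimal sketch (the function name is ours):

```python
def poisson_chi2(counts, p0, n0):
    """Statistic (72) for an unweighted Poisson histogram with known parameter n0."""
    return sum((n_i - n0 * p) ** 2 / (n0 * p) for n_i, p in zip(counts, p0))

# Observed total is 60 while n0 = 50: expected frequencies are 10, 15, 25.
x2 = poisson_chi2([10, 20, 30], [0.2, 0.3, 0.5], 50)
```

Because the total number of events is itself random, no degree of freedom is lost, and the value is compared with a $\chi^2$ distribution with $m$ degrees of freedom, as in (73).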
In Ref. [13], statistics are proposed for the goodness of fit test for a weighted Poisson histogram with known parameter $n_0$,

$$X^2_{corr0} = \sum_{i=1}^{m} \frac{(W_i - n_0 p_{i0})^2}{W_{2i}\, n_0 p_{i0}/W_i}, \tag{74}$$

and for the case when $n_0$ is not known,

$$X^2_{corr} = \sum_{i=1}^{m} \frac{(W_i - \hat{n}_0 p_{i0})^2}{W_{2i}\, \hat{n}_0 p_{i0}/W_i}, \tag{75}$$

with the estimate of $n_0$ obtained by minimizing (74) with respect to $n_0$:

$$\hat{n}_0 = \left[\frac{\sum_{i=1}^{m} W_i^3/(W_{2i}\, p_{i0})}{\sum_{i=1}^{m} W_i\, p_{i0}/W_{2i}}\right]^{1/2}. \tag{76}$$

According to Ref. [13], the distribution of the statistic $X^2_{corr0}$ when hypothesis $H_0$ is valid is

$$X^2_{corr0} \sim \chi^2_m \tag{77}$$

and for the statistic $X^2_{corr}$ it is

$$X^2_{corr} \sim \chi^2_{m-1}. \tag{78}$$

Generally, the power of the tests for Poisson histograms will be slightly lower than for multinomial histograms with the number of events $n = n_0$, which is explained by the fact that the total number of events for Poisson histograms fluctuates.

The choice of the type of histogram depends on what type of physical experiment is performed. If the number of events $n$ is constant, then it is a multinomial histogram; if the number of events $n$ is a random value that has a Poisson distribution, then it is a Poisson histogram.

A weighted histogram is very often the result of modeling, and the number of simulated events is known exactly; therefore the choice of a multinomial histogram is reasonable. It is also reasonable to use tests developed for multinomial histograms in the case where the number of events $n$ is a random value with unknown distribution [14].

4. Restrictions for goodness of fit test applications

For histograms with unweighted entries, the use of Pearson's chi-square test (23) is inappropriate if any expected frequency $np_{i0}$ is below 1 or if the expected frequency is less than 5 in more than 20% of bins [15].
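The rule for unweighted histograms quoted above can be expressed as a short check. This is a sketch of that rule only; the function name is ours, and it does not cover the stricter requirement for weighted histograms.

```python
def pearson_test_applicable(p0, n):
    """Return True if Pearson's test is appropriate by the rule quoted from Ref. [15]:
    no expected frequency below 1, and at most 20% of bins with expected frequency < 5."""
    expected = [n * p for p in p0]
    if min(expected) < 1.0:
        return False
    small = sum(1 for e in expected if e < 5.0)
    return small <= 0.2 * len(expected)

ok = pearson_test_applicable([0.2, 0.3, 0.5], 60)        # expected 12, 18, 30
too_few = pearson_test_applicable([0.2, 0.3, 0.5], 4)    # expected 0.8 in the first bin
```

In the first call all expected frequencies exceed 5; in the second the smallest expected frequency is 0.8, so the test is flagged as inappropriate.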
Restrictions for weighted histograms can be made stronger because of fluctuations in the estimates of the ratios of moments $\hat{r}_i$. Namely, the use of the new chi-square tests (54) and (67) is inappropriate if any expected frequency $\mathrm{E}[n_i]$ is less than 5.

Following Ref. [16], a disturbance is regarded as unimportant when the nominal size of the test is 5% and the actual size of the test lies between 4% and 6% for goodness of fit tests.

5. Numerical evaluation of the tests' power and sizes

The main parameters that characterize the effectiveness of a test are its size and power. The nominal significance level was taken to be 5% for calculating the size of tests in the numerical examples presented here. Hypothesis $H_0$ is rejected if the test statistic $\hat{X}^2$ is larger than some threshold. The threshold $k_{0.05}$ for a given nominal test size of 5% can be defined from the equation

$$0.05 = P(\chi^2_l > k_{0.05}) = \int_{k_{0.05}}^{+\infty} \frac{x^{l/2-1} e^{-x/2}}{2^{l/2}\,\Gamma(l/2)}\,dx, \tag{79}$$

where $l = m - 1$.

Let us define the test size $\alpha$ for a given nominal test size of 5% as the probability

$$\alpha = P(\hat{X}^2 > k_{0.05} \mid H_0). \tag{80}$$

This is the probability that hypothesis $H_0$ will be rejected if the distribution of weights $W_i$ for bins of the histogram satisfies hypothesis $H_0$. The deviation of the test size from the nominal test size is an important test characteristic.

A second important test characteristic is the power. Let us define the test power as

$$P(\hat{X}^2 > k_{0.05} \mid H_a). \tag{81}$$

This is the probability that hypothesis $H_0$ will be rejected if the distribution of weights $W_i$ for bins of the histogram does not satisfy hypothesis $H_0$. Notice that the power calculated by formula (81) can give misleading results when different tests are compared, since their actual sizes may differ. To overcome this problem, here we define the power of a test $\pi$ as

$$\pi = P(\hat{X}^2 > K_{0.05} \mid H_a) \tag{82}$$

with the threshold $K_{0.05}$ calculated by the Monte Carlo method from the equation

$$0.05 = P(\hat{X}^2 > K_{0.05} \mid H_0). \tag{83}$$

All definitions proposed above for the statistic $\hat{X}^2$ can be used for other test statistics with the appropriate number of degrees of freedom $l$ in formula (79).

The size and power of tests depend on the number of events and on the binning, which was discussed for usual histograms in Ref. [1]. The power for weighted histograms also depends on the choice of the PDF $g(x)$ (subsection 1.1) or $g_{tr}$ (subsection 1.2) and can be either higher or lower than for histograms with unweighted entries.

Below we demonstrate two examples of applications of the previously discussed tests. The size and power of the tests are calculated for different total numbers of events in the histograms. The numerical examples demonstrate applications of:

• Pearson's goodness of fit test [4], see subsection 2.1 and the first paragraph of section 2. The test statistic is $X^2$ (23).

• the goodness of fit test for weighted histograms with normalized weights [2], see subsection 2.2. The test statistic is $\hat{X}^2_{Med}$ (50).

• the goodness of fit test for weighted histograms with unnormalized weights [2], see subsection 3.1. The test statistic is ${}_{c}\hat{X}^2_{Med}$ (63).

• the new goodness of fit test for weighted histograms with normalized weights, see subsection 2.3. The test statistic is $\hat{X}^2$ (54).

• the new goodness of fit test for weighted histograms with unnormalized weights, see subsection 3.2. The test statistic is ${}_{c}\hat{X}^2$ (67).

• the goodness of fit test for Poisson histograms with unweighted entries and known parameter $n_0$ [13], see subsection 3.3. The test statistic is $X^2_{pois}$ (72).

• the goodness of fit test for weighted Poisson histograms with known parameter $n_0$ [13], see subsection 3.3. The test statistic is $X^2_{corr0}$ (74).

• the goodness of fit test for weighted Poisson histograms with unknown parameter $n_0$ [13], see subsection 3.3. The test statistic is $X^2_{corr}$ (75).
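The Monte Carlo definitions (82) and (83) can be sketched as follows. The helper names are ours, and the inputs are assumed to be lists of test statistic values already simulated under $H_0$ and under $H_a$; how those values are produced is outside this sketch.

```python
def mc_threshold(stats_h0, level=0.05):
    """Empirical threshold K of Eq. (83): a fraction `level` of the H0 values exceeds it."""
    ordered = sorted(stats_h0)
    n_exceed = round(level * len(ordered))
    return ordered[len(ordered) - n_exceed - 1]

def rejection_rate(stats, threshold):
    """Fraction of statistic values above the threshold (size under H0, power under Ha)."""
    return sum(1 for s in stats if s > threshold) / len(stats)

# Deterministic toy values standing in for simulated statistics (assumed data).
stats_h0 = list(range(1, 101))
stats_ha = [s + 10 for s in stats_h0]
K = mc_threshold(stats_h0)
power = rejection_rate(stats_ha, K)
```

Calibrating the threshold on the simulated $H_0$ sample fixes the size of every test at the same value, so that the resulting powers are directly comparable between tests.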
The published program, see Ref. [17], was used for the calculation of the test statistics, with the minor modifications needed for the new tests.

Figure 1: Probability density functions $g_1(x) = p(x)$, $g_2(x)$, $g_3(x)$ used for event generation and PDF $p_0(x)$ for hypothesis $H_0$ (dashed line).

5.1. Numerical example 1

A simulation study was done for the example from Ref. [2]. Weighted histograms as described in subsection 1.1 are used here. The PDF for hypothesis $H_0$ is

$$p_0(x) \propto \frac{2}{(x-10)^2 + 1} + \frac{1.15}{(x-14)^2 + 1} \tag{84}$$

against the alternative $H_a$:

$$p(x) \propto \frac{2}{(x-10)^2 + 1} + \frac{1}{(x-14)^2 + 1} \tag{85}$$

represented by the weighted histogram. Both PDFs are defined on the interval $[4, 16]$. The calculation was done for three choices of the PDF used for event generation, see Fig. 1:

$$g_1(x) = p(x) \tag{86}$$

$$g_2(x) = 1/12 \tag{87}$$

$$g_3(x) \propto \frac{2}{(x-9)^2 + 1} + \frac{2}{(x-15)^2 + 1}. \tag{88}$$

Distribution (86) gives an unweighted histogram. Distribution (87) is a uniform distribution on the interval $[4, 16]$. Distribution (88) has the same type of parameterization as Eq. (84), but with different values of the parameters. Histograms with 20 bins and equidistant binning were used. Fig. 2 presents the probabilities $p_i$, $i = 1, \ldots, 20$, for the PDF $p(x)$ and $p_{i0}$, $i = 1, \ldots, 20$, for the PDF $p_0(x)$.

Figure 2: Probabilities $p_i$, $i = 1, \ldots, 20$ for the PDF $p(x)$ (solid line) and $p_{i0}$, $i = 1, \ldots, 20$ for the PDF $p_0(x)$ (dashed line).

The size and power of tests with the statistics $\hat{X}^2$ (54), ${}_{c}\hat{X}^2$ (67), $\hat{X}^2_{Med}$ (50) and ${}_{c}\hat{X}^2_{Med}$ (63) were calculated for weighted histograms with weights of events equal to $p(x)/g_2(x)$ and $p(x)/g_3(x)$.
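The bin probabilities for the densities (84) and (85) can be computed in closed form, since each term $1/((x-c)^2+1)$ integrates to $\arctan(x-c)$. The sketch below is illustrative; the function name and argument layout are ours and are not taken from the published program.

```python
import math

def bin_probabilities(a, c1, b, c2, lo=4.0, hi=16.0, m=20):
    """Bin probabilities on [lo, hi] with m equidistant bins for a density
    proportional to a/((x-c1)^2+1) + b/((x-c2)^2+1)."""
    def antiderivative(x):
        return a * math.atan(x - c1) + b * math.atan(x - c2)
    edges = [lo + (hi - lo) * i / m for i in range(m + 1)]
    norm = antiderivative(hi) - antiderivative(lo)
    return [(antiderivative(edges[i + 1]) - antiderivative(edges[i])) / norm
            for i in range(m)]

p0 = bin_probabilities(2.0, 10.0, 1.15, 14.0)   # hypothesis H0, Eq. (84)
p = bin_probabilities(2.0, 10.0, 1.0, 14.0)     # alternative Ha, Eq. (85)
```

The two vectors differ most around the second peak at x = 14, which is where a generating density concentrated near that peak puts additional events.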

For histograms with unweighted entries, the statistics $\hat{X}^2$ (54) and $\hat{X}^2_{Med}$ (50) coincide with Pearson's statistic $X^2$ (23), which was used in that case. The results of calculations for 100000 runs are presented in Table 1.

Table 1: Numerical example 1. Size ($\alpha$) and power ($\pi$) of different test statistics $X^2$ (23), $\hat{X}^2$ (54), ${}_c\hat{X}^2$ (67), $\hat{X}^2_{Med}$ (50), ${}_c\hat{X}^2_{Med}$ (63) obtained for different weight functions $w(x)$. Italic type marks a size of test with inappropriate number of events in the bins of histograms.

| № | Statistic | | n = 200 | 400 | 600 | 800 | 1000 | 3000 | 5000 | 7000 | 9000 | w(x) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | $X^2$ | $\alpha$ | 5.7 | 5.4 | 5.3 | 5.2 | 5.2 | 5.1 | 5.0 | 5.0 | 5.1 | 1 |
| | | $\pi$ | 6.0 | 7.1 | 8.2 | 9.8 | 11.2 | 29.9 | 52.7 | 71.6 | 84.9 | |
| 2 | $\hat{X}^2$ | $\alpha$ | 5.5 | 5.3 | 5.2 | 5.1 | 5.0 | 5.1 | 5.1 | 5.1 | 4.9 | $p(x)/g_2(x)$ |
| | | $\pi$ | 6.1 | 7.0 | 8.2 | 9.2 | 10.5 | 26.2 | 45.8 | 64.0 | 78.7 | |
| 3 | ${}_c\hat{X}^2$ | $\alpha$ | 5.0 | 5.1 | 5.0 | 5.0 | 4.9 | 5.0 | 5.0 | 5.2 | 4.9 | $p(x)/g_2(x)$ |
| | | $\pi$ | 6.0 | 7.0 | 8.1 | 9.1 | 10.4 | 26.0 | 45.6 | 63.0 | 78.1 | |
| 4 | $\hat{X}^2_{Med}$ | $\alpha$ | 5.4 | 5.4 | 5.3 | 5.2 | 5.1 | 5.3 | 5.2 | 5.3 | 5.0 | $p(x)/g_2(x)$ |
| | | $\pi$ | 6.0 | 6.9 | 8.0 | 9.1 | 10.3 | 25.7 | 45.3 | 63.1 | 78.2 | |
| 5 | ${}_c\hat{X}^2_{Med}$ | $\alpha$ | 5.6 | 5.8 | 5.7 | 5.7 | 5.5 | 5.7 | 5.7 | 5.8 | 5.5 | $p(x)/g_2(x)$ |
| | | $\pi$ | 5.9 | 6.9 | 8.0 | 9.1 | 10.2 | 25.4 | 44.9 | 62.5 | 77.5 | |
| 6 | $\hat{X}^2$ | $\alpha$ | 7.3 | 6.6 | 6.1 | 5.8 | 5.6 | 5.2 | 5.2 | 4.9 | 5.0 | $p(x)/g_3(x)$ |
| | | $\pi$ | 16.2 | 29.7 | 40.1 | 48.5 | 56.1 | 95.7 | 99.8 | 100.0 | 100.0 | |
| 7 | ${}_c\hat{X}^2$ | $\alpha$ | 4.7 | 4.9 | 5.0 | 5.0 | 5.1 | 5.1 | 5.0 | 4.9 | 5.0 | $p(x)/g_3(x)$ |
| | | $\pi$ | 6.9 | 8.3 | 9.9 | 11.6 | 13.4 | 36.5 | 61.6 | 80.5 | 91.2 | |
| 8 | $\hat{X}^2_{Med}$ | $\alpha$ | 5.5 | 5.3 | 5.4 | 5.3 | 5.5 | 5.4 | 5.2 | 5.3 | 5.2 | $p(x)/g_3(x)$ |
| | | $\pi$ | 7.9 | 11.8 | 15.8 | 20.2 | 25.0 | 75.3 | 96.6 | 99.8 | 100.0 | |
| 9 | ${}_c\hat{X}^2_{Med}$ | $\alpha$ | 5.4 | 5.5 | 5.7 | 5.6 | 5.8 | 5.7 | 5.6 | 5.7 | 5.5 | $p(x)/g_3(x)$ |
| | | $\pi$ | 6.8 | 8.4 | 9.7 | 11.4 | 13.1 | 36.0 | 60.7 | 79.2 | 90.7 | |

Conclusions and interpretation of the results presented in Table 1:

• The sizes of the new tests $\hat{X}^2$ (54) (rows 2, 6) and ${}_c\hat{X}^2$ (67) (rows 3, 7) are generally closer to the nominal value 5% than those of the median tests $\hat{X}^2_{Med}$ (50) (rows 4, 8) and ${}_c\hat{X}^2_{Med}$ (63) (rows 5, 9) when the application of the test satisfies the restrictions formulated in section 4.

• The power of the new tests $\hat{X}^2$ (54) (rows 2, 6) is greater than that of the analogous median tests $\hat{X}^2_{Med}$ (50) (rows 4, 8). The power of the tests ${}_c\hat{X}^2$ (67) (rows 3, 7) is greater than that of the analogous median tests ${}_c\hat{X}^2_{Med}$ (63) (rows 5, 9).

• The power of all tests calculated for histograms with weights of events equal to $p(x)/g_2(x)$ (rows 2-5) is lower than for the histogram with unweighted entries (row 1), but the power of all tests calculated for histograms with weights of events equal to $p(x)/g_3(x)$ (rows 6-9) is greater. The explanation is that in the latter case we increase the statistics of events in domains where the distribution represented by the histogram deviates strongly from the tested distribution.

The properties of the tests in application to Poisson histograms with the same weight functions and distributions of events were also investigated. In this case, the total number of events $n$ is random and was simulated according to a Poisson distribution for a given parameter $n_0$. The size and power of the tests $X^2_{pois}$ (72), $X^2_{corr0}$ (74) with exactly known parameter $n_0$, and $X^2_{corr}$ (75), developed ad hoc for the Poisson histogram in Ref. [13], were also calculated. The results of the calculations are presented in Table 2.

Table 2: Numerical example 1. Size ($\alpha$) and power ($\pi$) of different test statistics $X^2_{pois}$ (72), $X^2$ (23), $X^2_{corr0}$ (74), $X^2_{corr}$ (75), $\hat{X}^2$ (54), ${}_c\hat{X}^2$ (67) in application for Poisson histograms. Italic type marks a size of test with inappropriate number of events in the bins of histograms.

| № | Statistic | | $n_0$ = 200 | 400 | 600 | 800 | 1000 | 3000 | 5000 | 7000 | 9000 | w(x) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | $X^2_{pois}$ | $\alpha$ | 6.0 | 5.6 | 5.2 | 5.2 | 5.2 | 5.1 | 5.1 | 5.1 | 5.1 | 1 |
| | | $\pi$ | 5.9 | 7.0 | 8.3 | 9.6 | 11.1 | 29.2 | 50.9 | 70.0 | 83.8 | |
| 2 | $X^2$ | $\alpha$ | 5.6 | 5.5 | 5.1 | 5.2 | 5.1 | 5.1 | 5.1 | 5.1 | 5.0 | 1 |
| | | $\pi$ | 6.0 | 7.0 | 8.4 | 9.8 | 11.1 | 30.0 | 52.2 | 71.2 | 85.0 | |
| 3 | $X^2_{corr0}$ | $\alpha$ | 5.4 | 5.3 | 5.2 | 5.1 | 5.1 | 5.0 | 5.0 | 5.1 | 5.0 | $p(x)/g_2(x)$ |
| | | $\pi$ | 6.0 | 6.7 | 7.8 | 8.8 | 10.0 | 25.0 | 43.9 | 61.5 | 76.2 | |
| 4 | $X^2_{corr}$ | $\alpha$ | 3.9 | 4.4 | 4.6 | 4.7 | 4.7 | 5.0 | 5.0 | 5.0 | 4.9 | $p(x)/g_2(x)$ |
| | | $\pi$ | 6.0 | 7.0 | 8.0 | 9.0 | 10.3 | 25.5 | 45.0 | 62.9 | 77.4 | |
| 5 | $\hat{X}^2$ | $\alpha$ | 5.5 | 5.2 | 5.2 | 5.1 | 5.0 | 5.1 | 5.0 | 5.0 | 5.0 | $p(x)/g_2(x)$ |
| | | $\pi$ | 6.1 | 7.1 | 8.1 | 9.2 | 10.6 | 26.3 | 46.0 | 64.1 | 78.5 | |
| 6 | ${}_c\hat{X}^2$ | $\alpha$ | 5.1 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | 5.0 | $p(x)/g_2(x)$ |
| | | $\pi$ | 6.0 | 7.0 | 8.1 | 9.2 | 10.5 | 26.0 | 45.5 | 63.4 | 77.6 | |
| 7 | $X^2_{corr0}$ | $\alpha$ | 5.1 | 5.0 | 5.2 | 5.0 | 5.2 | 5.1 | 5.1 | 5.1 | 4.9 | $p(x)/g_3(x)$ |
| | | $\pi$ | 6.3 | 7.5 | 8.8 | 10.7 | 12.3 | 35.3 | 60.5 | 79.6 | 91.3 | |
| 8 | $X^2_{corr}$ | $\alpha$ | 3.5 | 4.1 | 4.5 | 4.7 | 4.8 | 5.0 | 5.0 | 5.0 | 4.9 | $p(x)/g_3(x)$ |
| | | $\pi$ | 7.0 | 8.4 | 9.7 | 11.6 | 13.4 | 36.0 | 60.4 | 79.2 | 90.8 | |
| 9 | $\hat{X}^2$ | $\alpha$ | 7.2 | 6.5 | 6.0 | 5.6 | 5.6 | 5.3 | 5.1 | 5.1 | 4.9 | $p(x)/g_3(x)$ |
| | | $\pi$ | 16.4 | 30.1 | 40.1 | 48.8 | 56.1 | 95.7 | 99.8 | 100.0 | 100.0 | |
| 10 | ${}_c\hat{X}^2$ | $\alpha$ | 4.6 | 4.9 | 4.9 | 5.0 | 5.1 | 5.0 | 5.0 | 5.0 | 5.0 | $p(x)/g_3(x)$ |
| | | $\pi$ | 7.0 | 8.5 | 10.0 | 11.7 | 13.5 | 37.0 | 61.7 | 80.0 | 91.2 | |

Conclusions and interpretation of the results presented in Table 2:

• The sizes of all tests are close to the nominal value 5%.

• The power of the new tests $\hat{X}^2$ (54) (rows 5, 9) and ${}_c\hat{X}^2$ (67) (rows 6, 10) used for Poisson histograms is greater than the power of the tests developed ad hoc for Poisson histograms in Ref. [13]: $X^2_{corr0}$ (74) (rows 3, 7) with the exactly known parameter $n_0$ and $X^2_{corr}$ (75) (rows 4, 8) with the unknown parameter $n_0$.

• The power of Pearson's test $X^2$ (23) (row 2) used for Poisson histograms is greater than that of the test $X^2_{pois}$ (72) (row 1) with the exactly known parameter $n_0$ proposed in Ref. [13].

5.2. Numerical example 2

A simulation study was done for the example described in Ref. [18] and also in Ref. [19]. Weighted histograms described in subsection 1.2 are used here. The PDF $p_0(x)$ for the hypothesis $H_0$ is taken according to formula (11) with

$$p_{0tr}(x') = 0.4(x' - 0.5) + 1; \quad x' \in [0, 1] \qquad (89)$$

$$A(x') = 1 - (x' - 0.5)^2 \qquad (90)$$

$$R(x|x') = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - x')^2}{2\sigma^2}\right), \quad \text{with } \sigma = 0.3. \qquad (91)$$

For the alternative $H_a$, $p(x)$, which is represented by the weighted histogram, is taken with the same acceptance and resolution function according to formula (11) with

$$p_{tr}(x') = 0.6666(x' - 0.5) + 1; \quad x' \in [0, 1]. \qquad (92)$$

The calculation was done for two choices of the PDF used for event generation, see Fig. 3:

$$h_1(x') = 0.6666(x' - 0.5) + 1; \quad x' \in [0, 1] \qquad (93)$$

and

$$h_2(x') = -0.6666(x' - 0.5) + 1; \quad x' \in [0, 1]. \qquad (94)$$

In the first case, the weights of events are equal to 1 (histogram with unweighted entries) and, in the second case, the weights of events are equal to $h_1(x')/h_2(x')$.

We use a histogram with 20 bins on the interval [−0.3, 1.3]. Fig. 4 presents the probabilities $p_i$, $i = 1, ..., 20$ for the PDF $p(x)$ and $p_{0i}$, $i = 1, ..., 20$ for the PDF $p_0(x)$. In addition, we add two bins for events with $x \le -0.3$ and $x > 1.3$, as well as one bin for events that were not registered due to limited acceptance. The total number of bins $m$ used in the test is therefore equal to 23. The results of calculations of the size and power of the tests for 100000 runs are presented in Table 3.

Conclusions and interpretation of the results presented in Table 3:

• The size of the new tests $\hat{X}^2$ (54) and ${}_c\hat{X}^2$ (67) (rows 2, 3) is closer to the nominal value 5% than the size of the median tests $\hat{X}^2_{Med}$ (50) and ${}_c\hat{X}^2_{Med}$ (63) (rows 4, 5).

• The power of the new tests $\hat{X}^2$ (54) and ${}_c\hat{X}^2$ (67) (rows 2, 3) is roughly the same as that of the analogous median tests $\hat{X}^2_{Med}$ (50) and ${}_c\hat{X}^2_{Med}$ (63) (rows 4, 5).

• All tests demonstrate greater power than Pearson's test $X^2$ (23) (row 1) used for the histogram with unweighted entries.
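
The event generation of numerical example 2 can be sketched as follows. This is a hedged illustration, not the program of Ref. [17]: formula (11) lies outside this section, so the sketch assumes the usual folding model, in which a true value $x'$ is drawn from the generating density, registered with probability $A(x')$ of Eq. (90), and smeared with the Gaussian resolution of Eq. (91). The names `sample_linear`, `bin_index` and `simulate` are hypothetical, and assigning the weight $h_1(x')/h_2(x')$ to unregistered events is likewise an assumption of the sketch.

```python
import math
import random

SIGMA = 0.3                     # resolution width from Eq. (91)
LO, HI, NBINS = -0.3, 1.3, 20   # inner histogram range and number of inner bins


def linear_pdf(x, slope):
    """h(x') = slope * (x' - 0.5) + 1 on [0, 1], as in Eqs. (93) and (94)."""
    return slope * (x - 0.5) + 1.0


def sample_linear(slope, rng):
    """Inverse-CDF draw from linear_pdf on [0, 1] (slope must be non-zero)."""
    u = rng.random()
    b = 1.0 - 0.5 * slope
    return (-b + math.sqrt(b * b + 2.0 * slope * u)) / slope


def acceptance(x):
    """A(x') of Eq. (90)."""
    return 1.0 - (x - 0.5) ** 2


def bin_index(x):
    """Map x to 0..19 (inner bins), 20 (x <= LO) or 21 (x > HI)."""
    if x <= LO:
        return NBINS
    if x > HI:
        return NBINS + 1
    return min(int((x - LO) / (HI - LO) * NBINS), NBINS - 1)


def simulate(n, rng, gen_slope=-0.6666, true_slope=0.6666):
    """Return the per-bin sums of weights for n events generated from h2."""
    sums = [0.0] * (NBINS + 3)  # 20 inner + 2 outer + 1 unregistered = 23 bins
    for _ in range(n):
        xt = sample_linear(gen_slope, rng)
        w = linear_pdf(xt, true_slope) / linear_pdf(xt, gen_slope)  # h1/h2
        if rng.random() > acceptance(xt):
            sums[NBINS + 2] += w   # event lost due to limited acceptance
            continue
        x = rng.gauss(xt, SIGMA)   # Gaussian smearing, Eq. (91)
        sums[bin_index(x)] += w
    return sums


sums = simulate(20000, random.Random(11))
fractions = [s / 20000 for s in sums]  # weighted estimates of the 23 bin probabilities
```

Because the expectation of $h_1(x')/h_2(x')$ under $h_2$ equals 1, the 23 weighted fractions should add up to approximately 1, which gives a quick sanity check on the generator.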

Figure 3: Probability density functions $h_1(x') = p_{tr}(x')$, $h_2(x')$ and $p_{0tr}(x')$ (dashed line).

Figure 4: Probabilities $p_i$, $i = 1, ..., 20$ for the PDF $p(x)$ (solid line) and $p_{0i}$, $i = 1, ..., 20$ for the PDF $p_0(x)$ (dashed line).

Table 3: Numerical example 2. Sizes ($\alpha$) and powers ($\pi$) of different test statistics $X^2$ (23), $\hat{X}^2$ (54), ${}_c\hat{X}^2$ (67), $\hat{X}^2_{Med}$ (50), ${}_c\hat{X}^2_{Med}$ (63) obtained for different weight functions $w(x)$. Italic type marks a size of test with inappropriate number of events in the bins of histograms.

| № | Statistic | | n = 200 | 400 | 600 | 800 | 1000 | 3000 | 5000 | 7000 | 9000 | w(x) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | $X^2$ | $\alpha$ | 5.1 | 5.1 | 5.1 | 5.0 | 5.2 | 5.1 | 5.1 | 5.0 | 5.0 | 1 |
| | | $\pi$ | 5.6 | 6.6 | 7.5 | 8.8 | 9.8 | 25.9 | 45.7 | 64.9 | 79.4 | |
| 2 | $\hat{X}^2$ | $\alpha$ | 7.0 | 6.2 | 5.8 | 5.6 | 5.5 | 5.1 | 5.0 | 4.9 | 4.9 | $h_1(x')/h_2(x')$ |
| | | $\pi$ | 8.4 | 9.4 | 10.9 | 12.8 | 14.6 | 40.9 | 67.1 | 85.3 | 94.5 | |
| 3 | ${}_c\hat{X}^2$ | $\alpha$ | 5.6 | 5.6 | 5.5 | 5.4 | 5.3 | 5.1 | 5.0 | 5.0 | 4.9 | $h_1(x')/h_2(x')$ |
| | | $\pi$ | 6.4 | 7.4 | 8.4 | 9.9 | 11.0 | 28.0 | 47.9 | 66.4 | 80.5 | |
| 4 | $\hat{X}^2_{Med}$ | $\alpha$ | 10.9 | 7.4 | 6.6 | 6.1 | 6.1 | 5.7 | 5.6 | 5.6 | 5.6 | $h_1(x')/h_2(x')$ |
| | | $\pi$ | 9.1 | 10.1 | 11.5 | 13.9 | 15.8 | 43.7 | 70.9 | 87.8 | 95.8 | |
| 5 | ${}_c\hat{X}^2_{Med}$ | $\alpha$ | 7.8 | 6.6 | 6.3 | 5.9 | 5.9 | 5.7 | 5.7 | 5.7 | 5.6 | $h_1(x')/h_2(x')$ |
| | | $\pi$ | 6.1 | 7.2 | 8.4 | 9.7 | 10.9 | 27.4 | 46.9 | 65.0 | 79.2 | |

The properties of the tests in application to Poisson histograms were investigated with the same weight functions and distributions of events. In this case, the number of events $n$ in a histogram was simulated according to a Poisson distribution with the given parameter $n_0$. The size and power of the tests developed for the Poisson histogram in Ref. [13] were also calculated. The results of the calculations are presented in Table 4.

Table 4: Numerical example 2. Size ($\alpha$) and power ($\pi$) of different test statistics $X^2_{pois}$ (72), $X^2$, $X^2_{corr0}$ (74), $X^2_{corr}$ (75), $\hat{X}^2$ (54), ${}_c\hat{X}^2$ (67) in application for Poisson histograms. Italic type marks a size of test with inappropriate number of events in the bins of histograms.

| № | Statistic | | $n_0$ = 200 | 400 | 600 | 800 | 1000 | 3000 | 5000 | 7000 | 9000 | w(x) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | $X^2_{pois}$ | $\alpha$ | 5.5 | 5.3 | 5.3 | 5.0 | 5.2 | 5.1 | 5.1 | 5.0 | 5.2 | 1 |
| | | $\pi$ | 5.5 | 6.4 | 7.4 | 8.7 | 9.7 | 25.6 | 45.0 | 64.0 | 77.9 | |
| 2 | $X^2$ | $\alpha$ | 5.1 | 5.1 | 5.1 | 4.9 | 5.2 | 5.0 | 5.1 | 5.0 | 5.2 | 1 |
| | | $\pi$ | 5.5 | 6.5 | 7.5 | 8.9 | 9.8 | 26.3 | 45.9 | 65.0 | 78.8 | |
| 3 | $X^2_{corr0}$ | $\alpha$ | 5.8 | 5.8 | 5.5 | 5.3 | 5.3 | 5.0 | 5.1 | 5.0 | 5.0 | $h_1(x')/h_2(x')$ |
| | | $\pi$ | 5.8 | 6.6 | 7.7 | 9.0 | 10.2 | 27.2 | 47.0 | 66.5 | 80.7 | |
| 4 | $X^2_{corr}$ | $\alpha$ | 4.2 | 4.9 | 4.9 | 4.8 | 4.9 | 4.9 | 5.0 | 4.9 | 5.0 | $h_1(x')/h_2(x')$ |
| | | $\pi$ | 6.3 | 7.2 | 8.4 | 9.7 | 11.1 | 27.5 | 46.9 | 65.5 | 79.5 | |
| 5 | $\hat{X}^2$ | $\alpha$ | 6.8 | 6.1 | 5.8 | 5.6 | 5.4 | 5.1 | 5.0 | 5.0 | 4.9 | $h_1(x')/h_2(x')$ |
| | | $\pi$ | 8.4 | 9.4 | 11.0 | 12.7 | 14.8 | 41.1 | 67.2 | 85.1 | 94.4 | |
| 6 | ${}_c\hat{X}^2$ | $\alpha$ | 5.4 | 5.6 | 5.5 | 5.3 | 5.3 | 5.0 | 4.9 | 5.0 | 4.9 | $h_1(x')/h_2(x')$ |
| | | $\pi$ | 6.5 | 7.4 | 8.5 | 9.8 | 11.0 | 28.0 | 48.2 | 66.4 | 80.4 | |

Conclusions and interpretation of the results presented in Table 4:

• The sizes of all tests are close to the nominal value 5%.

• On the whole, the power of the new tests $\hat{X}^2$ (54) and ${}_c\hat{X}^2$ (67) (rows 5, 6) applied to Poisson histograms is greater than the power of the tests developed ad hoc for Poisson histograms in Ref. [13]: $X^2_{corr0}$ (74) with the exactly known parameter $n_0$ and $X^2_{corr}$ (75) with the unknown parameter $n_0$ (rows 3, 4).

• The power of Pearson's test $X^2$ (row 2) used for Poisson histograms is greater than the power of the test $X^2_{pois}$ (72) (row 1) with the exactly known parameter $n_0$.

Overall, numerical examples 1 and 2 demonstrate the superiority of the new goodness of fit tests over the existing tests for weighted histograms, see Ref. [2], and for weighted Poisson histograms, see Ref. [13].

6. Conclusion

A review of goodness of fit tests for weighted histograms was presented. The bin content of a weighted histogram was considered as a random sum of random variables, which permits generalization of the classical Pearson's goodness of fit test to histograms with weighted entries. Improvements of the chi-square tests with better statistical properties were proposed.
Evaluation of the size and power of the tests was done numerically for different types of weighted histograms with different numbers of events and different weight functions. Generally, the size of the new tests is closer to the nominal value, and their power is not lower than that of existing tests. Apart from direct application of the tests in data analysis, see for example Ref. [20], the proposed tests are a necessary basis for generalization of the test to the case when some parameters must be estimated from the data, see Ref. [21], as well as for generalization to tests comparing weighted and unweighted histograms or two weighted ones (homogeneity test), see Refs. [21, 22, 23]. Parametric fitting of data obtained from detectors with finite resolution and limited acceptance is one important application of the methods developed for weighted histograms that can be used for the interpretation of experimental data, see Ref. [19].

Acknowledgements

The author is grateful to Johan Blouw for useful discussions and careful reading of the manuscript and thanks the University of Akureyri and the MPI for Nuclear Physics for support in carrying out the research.

References

[1] M. G. Kendall, A. S. Stuart, The Advanced Theory of Statistics, Vol. 2, ch. 30, sec. 30.4, Griffin Publishing Company, London, 1973.
[2] N. D. Gagunashvili, Nucl. Instr. Meth. A 596 (2008) 439.
[3] I. M. Sobol, Numerical Monte Carlo methods, ch. 5, Nauka, Moscow, 1973.
[4] K. Pearson, Phil. Mag. 50 (1900) 157.
[5] M. G. Kendall, A. S. Stuart, The Advanced Theory of Statistics, Vol. 1, ch. 15, sec. 15.10, Griffin Publishing Company, London, 1973.
[6] B. V. Gnedenko, V. Yu. Korolev, Random Summation: Limit Theorems and Applications, CRC Press, Boca Raton, Florida, 1996.
[7] H. V. Henderson, S. R. Searle, SIAM Rev. 23 (1981) 53.
[8] H. Robbins, Bull. Amer. Math. Soc. 54 (1948) 1151.
[9] N. D. Gagunashvili, Nucl. Instr. Meth. A 614 (2010) 287.
[10] G. H. Golub, SIAM Rev. 15 (1973) 318.
[11] J. Neumann, Various techniques used in connection with random digits, NBS Appl. Math. series, 12 (1951) 36-38.
[12] S. Baker, R. D. Cousins, Nucl. Instr. Meth. 221 (1984) 437.
[13] G. Zech, Nucl. Instr. Meth. A 691 (2012) 178.
[14] W. T. Eadie, D. Dryard, F. E. James, M. Roos, B. Sadoulet, Statistical Methods in Experimental Physics, North-Holland Publishing Company, ch. 4, sec. 4.1.2, Amsterdam, London, 1971.
[15] D. S. Moore, G. P. McCabe, Introduction to the Practice of Statistics, W. H. Freeman Publishing Company, New York, 2007.
[16] W. G. Cochran, Ann. of Math. Stat. 23 (1952) 315.
[17] N. D. Gagunashvili, Comput. Phys. Commun. 183 (2012) 418.
[18] G. Bohm, G. Zech, Introduction to Statistics and Data Analysis for Physicists, Verlag Deutsches Elektronen-Synchrotron, 2010.
[19] N. D. Gagunashvili, Nucl. Instr. Meth. A 635 (2011) 86.
[20] D0 Collaboration, V. M. Abazov et al., Phys. Lett. B 693 (2010) 515.
[21] H. Cramer, Mathematical methods of statistics, ch. 30, Princeton University Press, Princeton, 1999.
[22] N. D. Gagunashvili, Nucl. Instr. Meth. A 614 (2010) 287.
[23] N. D. Gagunashvili, Comput. Phys. Commun. 183 (2012) 193.
