CHIWEI: A code of goodness of fit tests for weighted and unweighted histograms

A self-contained Fortran-77 program for goodness of fit tests for histograms with weighted entries as well as with unweighted entries is presented. The code calculates test statistics for the case of a histogram with normalized weights of events and in cas…

Authors: Nikolai Gagunashvili

N.D. Gagunashvili a,∗
a University of Akureyri, Borgir, v/Nordurslód, IS-600 Akureyri, Iceland

Abstract

A Fortran-77 program for goodness of fit tests for histograms with weighted entries as well as with unweighted entries is presented. The code calculates test statistics for the case of a histogram with normalized weights of events and for the case of unnormalized weights of events.

Keywords: chi-square test generalization, comparison of experimental and simulated data, data interpretation, Monte Carlo method

PROGRAM SUMMARY

Program Title: CHIWEI
Journal Reference:
Catalogue identifier:
Licensing provisions: none
Programming language: Fortran-77
Computer: Any Unix/Linux workstation or PC with a Fortran-77 compiler
Classification: 4.13, 11.9, 16.4, 19.4
External routines/libraries used: FPLSOR (M103) from CERN Program Library
Nature of problem: The program calculates goodness of fit test statistics for weighted histograms.
Solution method: Calculation of test statistics is done according to the formulas presented in Ref. [1].

References
[1] N.G. Gagunashvili, Nucl. Instrum. Meth. A596 (2008) 439.

∗ Corresponding author. E-mail address: nikolai@simnet.is
Preprint submitted to Computer Physics Communications, October 29, 2018

1. Introduction

A histogram with $m$ bins for a given probability density function $p(x)$ is used to estimate the probabilities

$$p_i = \int_{S_i} p(x)\,dx, \quad i = 1, \ldots, m \tag{1}$$

that a random event belongs to bin $i$. Integration in (1) is done over the bin $S_i$. A histogram can be obtained as a result of a random experiment with probability density function $p(x)$. Let us denote the number of random events belonging to the $i$th bin of the histogram as $n_i$. The total number of events in the histogram is equal to $n = \sum_{i=1}^{m} n_i$.
The quantity $\hat{p}_i = n_i/n$ is an estimator of $p_i$ with expectation value $E\,\hat{p}_i = p_i$. The problem of goodness of fit is to test the hypothesis

$$H_0: p_1 = p_{10}, \ldots, p_{m-1} = p_{m-1,0} \quad \text{vs.} \quad H_a: p_i \neq p_{i0} \ \text{for some } i, \tag{2}$$

where $p_{i0}$ are specified probabilities, and $\sum_{i=1}^{m} p_{i0} = 1$. The test is used in data analysis for comparison of the theoretical frequencies $n p_{i0}$ with the observed frequencies $n_i$. The test statistic

$$X^2 = \sum_{i=1}^{m} \frac{(n_i - n p_{i0})^2}{n p_{i0}} \tag{3}$$

was suggested by Pearson [2]. Pearson showed that the statistic (3) has approximately a $\chi^2_{m-1}$ distribution if the hypothesis $H_0$ is true.

To define a weighted histogram let us write the probability $p_i$ (1) for a given probability density function $p(x)$ in the form

$$p_i = \int_{S_i} p(x)\,dx = \int_{S_i} w(x)\,g(x)\,dx, \tag{4}$$

where

$$w(x) = p(x)/g(x) \tag{5}$$

is the weight function and $g(x)$ is some other probability density function. The function $g(x)$ must be $> 0$ for points $x$ where $p(x) \neq 0$. The weight $w(x) = 0$ if $p(x) = 0$, see Ref. [3]. Because of the condition $\sum_i p_i = 1$ we will further call the weights defined above normalized weights, as opposed to the unnormalized weights $\check{w}(x)$, which are $\check{w}(x) = \mathrm{const} \cdot w(x)$.

The histogram with normalized weights was obtained from a random experiment with a probability density function $g(x)$, and the weights of the events were calculated according to (5). Let us denote the total sum of the weights of the events in the $i$th bin of the histogram as

$$W_i = \sum_{k=1}^{n_i} w_i(k) \tag{6}$$

and the total sum of squares of weights as

$$W_{2i} = \sum_{k=1}^{n_i} w_i(k)^2, \tag{7}$$

where $n_i$ is the number of events in bin $i$ and $w_i(k)$ is the weight of the $k$th event in the $i$th bin. The total number of events in the histogram is equal to $n = \sum_{i=1}^{m} n_i$, where $m$ is the number of bins.
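As an illustration of the sums (6) and (7), the following Python sketch accumulates the per-bin event count, sum of weights and sum of squared weights from a list of (x, w) events. It is not part of CHIWEI: the function name `weighted_histogram`, the equal-width binning and the sample events are assumptions made only for this example.

```python
def weighted_histogram(events, lo, hi, m):
    """Return (n_i, W_i, W_2i) per bin for events given as (x, w) pairs.

    Assumption: m equal-width bins on [lo, hi]; events outside are skipped.
    """
    counts = [0] * m   # n_i, number of events in bin i
    w1 = [0.0] * m     # W_i, sum of weights in bin i, Eq. (6)
    w2 = [0.0] * m     # W_2i, sum of squared weights in bin i, Eq. (7)
    width = (hi - lo) / m
    for x, w in events:
        if not (lo <= x <= hi):
            continue
        # An event exactly at the upper edge is placed in the last bin.
        i = min(int((x - lo) / width), m - 1)
        counts[i] += 1
        w1[i] += w
        w2[i] += w * w
    return counts, w1, w2


# Made-up events (x, w), purely illustrative.
events = [(4.5, 0.5), (5.0, 2.0), (9.9, 1.0), (15.0, 0.25), (16.0, 0.75)]
counts, w1, w2 = weighted_histogram(events, 4.0, 16.0, 5)
print(counts)
print(w1)
print(w2)
```

When every weight equals 1, both sums reduce to the bin counts $n_i$, which is the unweighted case.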
The quantity $\hat{p}_i = W_i/n$ for the histogram with normalized weights is the estimator of $p_i$ with the expectation value $E\,\hat{p}_i = p_i$. Note that in the case where $g(x) = p(x)$, the weights of the events are equal to 1 and the histogram with normalized weights is the usual histogram with unweighted entries.

For weighted histograms the problem of goodness of fit is again to test the hypothesis

$$H_0: p_1 = p_{10}, \ldots, p_{m-1} = p_{m-1,0} \quad \text{vs.} \quad H_a: p_i \neq p_{i0} \ \text{for some } i, \tag{8}$$

where $p_{i0}$ are specified probabilities, and $\sum_{i=1}^{m} p_{i0} = 1$. A test statistic that generalizes Pearson's statistic (3) was proposed in [1] for histograms with normalized weights of entries as well as with unnormalized weights of entries. A code for the calculation of the test statistics is presented in this article. As shown in [1], if hypothesis $H_0$ (8) is true then the statistic for a histogram with normalized weighted entries has approximately the $\chi^2_{m-1}$ distribution, and the statistic for a histogram with unnormalized weighted entries has the $\chi^2_{m-2}$ distribution.

Use of the proposed test is inappropriate if any expected count in a bin of the histogram is below 1 or if the expected count is less than 5 in more than 20% of the bins. This empirical restriction, known for the usual chi-square test [4], is quite reasonable for weighted histograms.

Information for readers. Recently, another paper dedicated to weighted histograms has been published in "Computer Physics Communications", see Ref. [6]. The same author has presented a program for calculating test statistics to compare a weighted histogram with an unweighted histogram and two histograms with weighted entries. The test can be used for the comparison of experimental data distributions with simulated data distributions as well as for two simulated data distributions.

2. Computer program

CHIWEI is a subroutine which can be called from a Fortran program for the calculation of test statistics.

Usage

    CALL CHIWEI(P,W1,W2,N,NCHA,MODE,STAT,NDF,IFAIL)

Input data

    P     – one-dimensional real array of probabilities $p_i$
    W1    – one-dimensional array, sum of weights $W_i$ in each bin
    W2    – one-dimensional array, sum of squares of weights $W_{2i}$ in each bin
    N     – number of events $n$
    NCHA  – number of bins $m$
    MODE  – must be equal to 1 for a histogram with normalized weights, and equal to 2 for a histogram with unnormalized weights

Output data

    STAT  – test statistic following a chi-square distribution with NDF degrees of freedom if hypothesis $H_0$ is true
    NDF   – number of degrees of freedom (will be $m$ - MODE)
    IFAIL – will be > 0 if the calculation is not successful

3. Test run

We take a distribution

$$p(x) \propto \frac{2}{(x-10)^2 + 1} + \frac{1}{(x-14)^2 + 1} \tag{9}$$

defined on the interval $[4, 16]$ and representing two so-called Breit-Wigner peaks. Two cases of the probability density function $g(x)$ are considered:

$$g_1(x) = p(x) \tag{10}$$

$$g_2(x) \propto \frac{2}{(x-9)^2 + 1} + \frac{2}{(x-15)^2 + 1} \tag{11}$$

Distribution (10) gives an unweighted histogram, and the method coincides with Pearson's chi-square test. Distribution (11) has the same form of parametrization as (9), but with different values of the parameters.

Three cases of histograms were considered: an unweighted histogram, a histogram with weights $p(x)/g_2(x)$, and a histogram with unnormalized weights $2p(x)/g_2(x)$. Histograms with 5 bins were created by simulating 1000 entries for each case. The results of the calculations are presented below. Program PROB (G100) [5] has been used for calculating p-values.
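The test-run setup can be checked with a short Python sketch, independent of the Fortran code. It integrates distribution (9) analytically over the bins to obtain the probabilities $p_i$, evaluates Pearson's statistic (3) for the unweighted counts of Test 1, and computes a chi-square p-value in closed form. Equal-width bins on $[4, 16]$ are an assumption (the text does not state the bin edges), and all function names are hypothetical helpers, not CHIWEI or CERNLIB routines.

```python
import math


def antiderivative(x):
    # Antiderivative of 2/((x-10)^2+1) + 1/((x-14)^2+1), the shape in Eq. (9).
    return 2.0 * math.atan(x - 10.0) + math.atan(x - 14.0)


def bin_probabilities(lo=4.0, hi=16.0, m=5):
    # Assumption: m equal-width bins on [lo, hi]; normalizing by the
    # integral over the full interval gives p_i as in Eq. (1).
    edges = [lo + (hi - lo) * i / m for i in range(m + 1)]
    total = antiderivative(hi) - antiderivative(lo)
    return [
        (antiderivative(b) - antiderivative(a)) / total
        for a, b in zip(edges[:-1], edges[1:])
    ]


def pearson_statistic(counts, probs):
    # Eq. (3): sum over bins of (n_i - n p_i0)^2 / (n p_i0).
    n = sum(counts)
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))


def chi2_upper_tail_even(x, ndf):
    # Upper-tail probability of a chi-square variable; this closed form
    # holds only for even ndf: exp(-x/2) * sum_{j<ndf/2} (x/2)^j / j!.
    term, total = 1.0, 1.0
    for j in range(1, ndf // 2):
        term *= (x / 2.0) / j
        total += term
    return math.exp(-x / 2.0) * total


probs = bin_probabilities()
print([round(p, 4) for p in probs])

counts = [26, 115, 454, 183, 222]                      # W1 row of Test 1
p_printed = [0.0296, 0.1106, 0.4460, 0.2067, 0.2072]   # P row of Test 1
stat = pearson_statistic(counts, p_printed)
print(round(stat, 2))
print(round(chi2_upper_tail_even(4.5291, 4), 4))
```

Under the equal-width assumption the computed probabilities agree with the P row of the test run to four decimals. Pearson's statistic evaluated with the printed inputs comes out near 4.53, close to but not identical with the reported STAT of 4.5291; the small difference is not investigated here. The closed-form tail probability for STAT = 4.5291 with 4 degrees of freedom is about 0.339, in line with the quoted p-value. The closed form does not cover Test 3, which has an odd number of degrees of freedom.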
Test 1

    INPUT
      P       0.0296    0.1106    0.4460    0.2067    0.2072
      W1     26.0000  115.0000  454.0000  183.0000  222.0000
      W2     26.0000  115.0000  454.0000  183.0000  222.0000
      N      1000
      NCHA   5
      MODE   1
    OUTPUT
      STAT   4.5291 (p-value = 0.3391)
      NDF    4
      IFAIL  0

Test 2

    INPUT
      P       0.0296    0.1106    0.4460    0.2067    0.2072
      W1     36.0112  106.1355  458.3037  197.8123  205.7211
      W2     28.2698   56.9601  938.7897  363.4649  172.2003
      N      1000
      NCHA   5
      MODE   1
    OUTPUT
      STAT   2.3380 (p-value = 0.6738)
      NDF    4
      IFAIL  0

Test 3

    INPUT
      P        0.0296    0.1106     0.4460     0.2067    0.2072
      W1      72.0225  212.2710   916.6075   395.6246  411.4423
      W2     113.0790  227.8403  3755.1587  1453.8595  688.8014
      N      1000
      NCHA   5
      MODE   2
    OUTPUT
      STAT   2.2398 (p-value = 0.5241)
      NDF    3
      IFAIL  0

References

[1] N.G. Gagunashvili, Nucl. Instrum. Meth. A596 (2008) 439.
[2] K. Pearson, Phil. Mag. 5th Ser. 50 (1900) 157.
[3] I. Sobol, A Primer For The Monte Carlo Method, CRC Press, Boca Raton, Florida, 1994.
[4] D.S. Moore, G.P. McCabe, Introduction to the Practice of Statistics, W.H. Freeman Publishing Company, New York, 2005.
[5] CERN Program Library, http://cernlib.web.cern.ch/cernlib/.
[6] N.D. Gagunashvili, Comp. Phys. Comm. CPC-D-11-00196R1.
