On two-sided p-values for non-symmetric distributions
Two-sided statistical tests and p-values are well defined only when the test statistic in question has a symmetric distribution. A new two-sided p-value called conditional p-value $P_C$ is introduced here. It is closely related to the doubled p-value…
Authors: Elena Kulinskaya (Imperial College London)
Elena Kulinskaya∗

October 5, 2008

Abstract

Two-sided statistical tests and p-values are well defined only when the test statistic in question has a symmetric distribution. A new two-sided p-value called conditional p-value $P_C$ is introduced here. It is closely related to the doubled p-value and has an intuitive appeal. Its use is advocated for both continuous and discrete distributions. An important advantage of this p-value is that equivalent 1-sided tests are transformed into $P_C$-equivalent 2-sided tests. It is compared to the widely used doubled and minimum likelihood p-values. Examples include the variance test, the binomial and the Fisher's exact test.

keywords: two-sided tests, Fisher's exact test, variance test, binomial test, F test, minimum likelihood

∗ Statistical Advisory Service, 8 Princes Gardens, Imperial College, London SW7 1NA, UK. Tel +4420 7594 3950. e-mail: e.kulinskaya@ic.ac.uk

1 Introduction

Two-sided statistical tests are widely used and misused in numerous applications of statistics. In fact, some applied journals do not accept papers quoting 1-sided p-values anymore. Examples include The New England Journal of Medicine, Journal of the National Cancer Institute and Journal of Clinical Oncology among others. Unfortunately, two-sided statistical tests and p-values are well defined only when the test statistic in question has a symmetric distribution. The difficulties with two-sided p-values arise in a general case of a non-symmetric distribution, though they are more often commented on for discrete distributions.

The most famous example is an ongoing discussion about how 2-sided p-values should be constructed for the Fisher's exact test. This discussion was started in 1935 by Fisher (1935) and Irwin (1935).
Numerous developments of the next 50 years are summarised in Yates (1984) and the discussion thereof. The more recent contributions include several proposals based on a modified UMPU test: Lloyd (1988), Dunne et al. (1996), Meulepas (1998). See also Agresti and Wackerly (1977), Dupont (1986), Davis (1986), Agresti (1992). A (far from exhaustive) list of 9 different proposals is given in Meulepas (1998). The problem is still not resolved.

Fisher advocated doubling the 1-sided p-value in his letter to Finney in 1946 (Yates, 1984, p.444). This doubled p-value is denoted by $P_F$. Fisher's motivation was an equal prior weight of departure in either direction. Other arguments for doubling include invariance under transformation of the distribution to a normal scale, and ease of approximation by the chi-square distribution (Yates, 1984). One of the evident drawbacks of the doubling rule is that it may result in a p-value greater than 1. The doubled p-value is used in the majority of statistical software in the case of continuously distributed statistics and often in the discrete case.

The primary contribution of this article is the introduction of a new method of defining two-sided p-values, to be called 'conditional two-sided p-values' and denoted by $P_C$. The conditional p-value is closely related to the doubled p-value and has an intuitive appeal. It is demonstrated that this new two-sided p-value has properties which make it a definite improvement on currently used two-sided p-values for both discrete and continuous non-symmetric distributions.

Another popular two-sided p-value for non-symmetric discrete distributions implemented in computer packages, R (R Development Core Team, 2004) in particular, is obtained by adding the probabilities of the points that are no more probable than the observed one (at both tails). This p-value is denoted by $P_{prob}$.
This method was introduced in Freeman and Halton (1951), and is based on the Neyman and Pearson (1931) idea of ordering multinomial probabilities; this is called 'the principle of minimum likelihood' by Gibbons and Pratt (1975), see also George and Mudholkar (1990). Hill and Pike (1965) were the first to use this p-value for Fisher's exact test.

Many statisticians objected to this principle. Gibbons and Pratt (1975) commented that 'The minimum likelihood method can also lead to absurdities, especially when the distribution is U-shaped, J-shaped, or simply not unimodal.' Radlow and Alf (1975) pointed out 'This procedure is justified only if events of lower probability are necessarily more discrepant from the null hypothesis. Unfortunately, this is frequently not true.'

The following example clearly demonstrates another unfortunate feature of this p-value. When a value of density is associated with a high 1-sided p-value at one tail, the value at the opposite tail cannot be rejected even though it may have a very low 1-sided p-value.

Example: Two-sided variance test based on the Chi-square distribution

Suppose we have 6 observations from a perfectly normal population and wish to test the null hypothesis that the variance $\sigma^2 = \sigma_0^2$ against a two-sided alternative. The test statistic $X = (n-1)S^2/\sigma_0^2 \sim (\sigma^2/\sigma_0^2)\chi^2(5)$, where $S^2$ is the sample variance. For $X = 1$ (or $S^2 = 0.2$) the 1-sided p-value on the left tail is 0.0374, the density is 0.0807, the value with the same density on the right tail is $x' = 6.711$, and its 1-sided p-value is 0.2431, see dotted lines on the left plot of Figure 1; similarly for $X = 0.5$ ($S^2 = 0.1$) the density is 0.0366, the p-value on the left tail is 0.0079, the value with the same density is 9.255, with p-value 0.0993 (dashed lines on the same plot). It is very difficult to reject the null for small observed values.
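The calculation in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the distribution function of $\chi^2(5)$ is written in its closed form via the error function (available for odd degrees of freedom), and the equal-density point on the right tail is located by simple bisection rather than by any method used in the paper.

```python
import math

MODE = 3.0  # mode of the chi-square(5) distribution (df - 2)

def chi2_5_pdf(x):
    """Density of the chi-square distribution with 5 degrees of freedom."""
    if x <= 0:
        return 0.0
    return x ** 1.5 * math.exp(-x / 2) / (2 ** 2.5 * math.gamma(2.5))

def chi2_5_cdf(x):
    """Distribution function of chi-square(5) in closed form via erf."""
    if x <= 0:
        return 0.0
    z = x / 2
    tail = 2 * math.sqrt(z) * math.exp(-z) / math.sqrt(math.pi)
    return math.erf(math.sqrt(z)) - tail * (1 + 2 * z / 3)

def conjugate_point(x, hi=200.0):
    """Right-tail point x' > MODE with the same density as x (0 < x < MODE)."""
    if not 0 < x < MODE:
        raise ValueError("x must lie strictly between 0 and the mode")
    target = chi2_5_pdf(x)
    lo = MODE
    for _ in range(200):           # bisection; the density decreases beyond the mode
        mid = (lo + hi) / 2
        if chi2_5_pdf(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def p_prob(x):
    """Minimum likelihood p-value F(x) + 1 - F(x') for x below the mode."""
    return chi2_5_cdf(x) + 1 - chi2_5_cdf(conjugate_point(x))

for x in (1.0, 0.5):
    print(x, chi2_5_cdf(x), conjugate_point(x), p_prob(x))
```

Adding the two one-sided p-values quoted in the example gives $P_{prob}(1) \approx 0.0374 + 0.2431 = 0.2805$ and $P_{prob}(0.5) \approx 0.0079 + 0.0993 = 0.1072$, which is what the sketch should reproduce up to rounding.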
Given critical values on the left and right tail $c_{L,\alpha}$ and $c_{R,\alpha}$, such that $\chi^2_5(c_{L,\alpha}) + 1 - \chi^2_5(c_{R,\alpha}) = \alpha$ (here $\chi^2_5(\cdot)$ denotes the distribution function), the power of a two-sided variance test at level $\alpha$ is calculated as $\chi^2_5(\rho c_{L,\alpha}) + 1 - \chi^2_5(\rho c_{R,\alpha})$, where $\rho = \sigma_0^2/\sigma^2$. The power of four 0.05-level tests is plotted at the right plot of Figure 1. The test based on the $P_{prob}$ is evidently biased, with very low power for $\rho > 1$, i.e. when $\sigma < \sigma_0$. The minimum value of the power is 0.01. The uniformly most powerful unbiased (UMPU) test for this example has the critical region defined by $c_{L,.05} = 0.989$ and $c_{R,.05} = 14.37$, corresponding to critical levels $\alpha_L = .037$ and $\alpha_R = .013$ at the left and right tail, respectively. Finally, the generalized likelihood ratio (GLR) test is based on the statistic $\Lambda = [(X/n)\exp(1 - (X/n))]^{n/2}$, and it is biased (Stuart and Ord, 1991, Example 23.5, p.882). This is not exceptional; Bar-Lev et al. (2002) showed that for a continuous exponential family $F$ on the real line, the GLR and UMPU tests coincide if and only if, up to an affine transformation, $F$ is either a normal, inverse Gaussian or gamma family.

Figure 1: Two-sided variance test with the statistic $X \sim \chi^2(5)$. On the left plot, the density of $\chi^2(5)$ distribution, with dotted/dashed lines illustrating the calculation of the $P_{prob}$ for $X = 1$ and $X = 0.5$. On the right plot, the power of the 5%-level variance tests based on the p-values $P_{prob}(x)$ (solid line), $P_F(x)$ (dashed line), $P^E_C(x)$ (dotted line), and the UMPU test (long-dashed line). The horizontal line at 0.05 corresponds to the significance level.

The new conditional 2-sided p-value $P_C$ is formally defined in the next section. The power of the tests based on the doubled and conditional p-value for the chi-square example is also plotted in Figure 1. They are much less biased, with minimum power of 0.045 and 0.048 respectively.

The formal definition of the conditional 2-sided p-value $P_C$ and the comparison of its properties to those of the doubled p-value and the $P_{prob}$ for a case of continuous distributions is given in Section 2, and for discrete distributions (binomial and hypergeometric) in Section 3. Discussion is in Section 4. The use of the conditional 2-sided p-value $P_C$ is advocated for both continuous and discrete distributions. An important advantage of this p-value is that equivalent 1-sided tests are transformed into $P_C$-equivalent 2-sided tests.

2 Two-sided p-values for continuous asymmetric distributions

Consider a general case of a statistic $X$ with a strictly increasing continuous null distribution $F(x)$ with continuous density $f(x)$. For an observed value $x$ of $X$, the one-sided p-value on the left tail is defined as $P(X' \le x \mid X = x) = F(x)$, where $X' \sim F(x)$ is independent of $X$. Similarly, on the right tail the p-value is $P(X' \ge x \mid X = x) = 1 - F(x)$.

Denote by $A$ a generic location parameter chosen to separate the two tails of the distribution $F$. Particular examples include the mean $E = \mathrm{E}(X)$, the mode $M = \arg\sup_x f(x)$, or the median $m = F^{-1}(1/2)$. Which parameter should be used to separate the two tails depends on the context; the mean seems to be the most appropriate when a test statistic is based on an estimate of a natural parameter in an exponential family, as is the case with binomial or Fisher's exact test. General theory below is applicable regardless of the parameter chosen, though the details of examples may differ. Interestingly, it does not matter much for the most important non-symmetric discrete distributions: the mean when attainable coincides with the mode (or one of the two modes) for Poisson, binomial and hypergeometric distributions. The latter two distributions are discussed in Section 3.
Definition 1. Weighted two-tailed p-value centered at $A$ with weights $w = (w_L, w_R)$ satisfying $w_L + w_R = 1$ is defined as

$$P^A_w(x) = \min\left( \frac{F(x)}{w_L}\Big|_{x<A} + \frac{1 - F(x)}{w_R}\Big|_{x>A},\ 1 \right). \qquad (1)$$

Doubled p-value denoted by $P^A_F(x)$ has weights 1/2. Without loss of generality assume that $A > m$. Then the doubled p-value $P^A_F(x)$ is equal to $2F(x)$ for $x < m$, 1 for $m \le x \le A$, and $2(1 - F(x))$ for $x > A$. Thus the doubled p-value is not continuous at $A$ unless $m = A$, and its derivative is also discontinuous at $m$. Similarly, a weighted p-value $P^A_w(x)$ is continuous at $A$ iff $w_L/w_R = F(A)/(1 - F(A))$, and an additional requirement of $P^A_w(A) = 1$ results in $w_L = F(A)$ and $w_R = 1 - F(A)$, arriving at the next definition.

Definition 2. Conditional 2-sided p-value centered at $A$ is defined as

$$P^A_C(x) = P^A_{\{F(A),\, 1 - F(A)\}}(x) = P(X' \le x \mid X = x \le A) + P(X' \ge x \mid X = x \ge A). \qquad (2)$$

By equation (1) this equals $F(x)/F(A)$ for $x < A$ and $(1 - F(x))/(1 - F(A))$ for $x > A$. This is a smooth function of $x$ (except at $A$), with a maximum of 1 at $A$. It strictly increases for $x < A$ and decreases for $x > A$. The conditional p-value is conceptually close to the doubled p-value, the only difference being that the two tails are weighted inversely proportionally to their probabilities. This results in inflated p-values on the thin tail, and deflated p-values on the thick tail when compared to the doubled p-value. When the tails are defined with respect to the median, the two p-values coincide: $P^m_F(x) = P^m_C(x)$. Thus the conditional p-value is equal to the usual doubled p-value for a symmetric distribution. It is easy to see that under the null hypothesis $P^A_C(x)$ is uniformly distributed on $[0, 1]$ given a particular tail, i.e. $P_0(P^A_C(X) \le p \mid X \le A) = p$, similar to a 1-sided p-value.

There is an evident connection between a choice of a two-sided p-value and a critical region (CR) for a two-sided test at level $\alpha$.
A CR is defined through critical values corresponding to probabilities $\alpha_1 = w_L \alpha$ and $\alpha_2 = w_R \alpha$, where the weights of the two tails satisfy $w_L + w_R = 1$. It can equivalently be defined through a weighted p-value as $\{x : P^A_w(x) < \alpha\}$. The doubled p-value corresponds to $w_L = w_R = 1/2$. The conditional p-value is equivalent to the choice $w_L = F(A)$, $w_R = 1 - F(A)$. For a two-sided test, critical values $c_{L,\alpha}$ and $c_{R,\alpha}$ satisfy $F(c_{L,\alpha}) = w_L \alpha$ and $1 - F(c_{R,\alpha}) = w_R \alpha$. Thus $w_L = w_L(\alpha) = F(c_{L,\alpha})/\alpha$. Define $A = A(\alpha) = F^{-1}(F(c_{L,\alpha})/\alpha)$. Then the CR is $\{x : P^A_C(x) < \alpha\}$. Therefore any 2-sided test, a UMPU test inclusive, is a test based on conditional p-value centered at some $A = A(\alpha)$. Conversely, if the $A$ value is chosen to be independent of $\alpha$, the resulting test is, in general, biased. Since an independence from level $\alpha$ is a natural requirement for a p-value, some bias cannot be escaped.

Lemma 1. For a one-parameter exponential family $F(x, \theta)$, a two-sided level-$\alpha$ test based on the conditional p-value $P^A_C$ is less biased in the neighborhood of the null value $\theta_0$ than the standard equal tails test based on the doubled p-value whenever $F(A) \in (1/2, w^*_{L,\alpha}]$, where $w^*_{L,\alpha}$ is the weight at the left tail of the UMPU test.

Proof. Denote the test critical function by $\phi(x)$. This is an indicator function of the CR, so $\mathrm{E}_0[\phi(X)] = \alpha$, and the power is $\beta(\theta) = \mathrm{E}_\theta[\phi(X)]$. Without loss of generality $X$ is the sufficient statistic. The derivative of the power is (Lehmann, 1959, p.127)

$$\beta'(\theta) = \mathrm{E}_\theta[X \phi(X)] - \mathrm{E}_\theta(X)\,\mathrm{E}_\theta[\phi(X)]. \qquad (3)$$

For an UMPU test $\beta'(\theta_0) = 0$. For a test with weight $w_L$ at the left tail,

$$\beta'(\theta_0) = \int_{-\infty}^{F^{-1}(\alpha w_L)} x\,dF + \int_{F^{-1}(1 - \alpha(1 - w_L))}^{\infty} x\,dF - \alpha E.$$

For $\alpha < 1$, this is a strictly decreasing function of $w_L$, equal to zero at $w^*_{L,\alpha}$.
When $1/2 < w^*_{L,\alpha}$, any $w_L \in (1/2, w^*_{L,\alpha}]$ provides positive values of $\beta'(\theta_0)$, and when $1/2 > w^*_{L,\alpha}$, the values of $\beta'(\theta_0)$ are negative; in any case the gradient is the steepest and the bias is the largest at 1/2, as required.

Lemma 1 provides a sufficient condition for the $P^E_C$-based test to be less biased than the equal tails test, but this condition is not necessary. This condition holds for the $\chi^2$ distribution, and the variance test based on $P^E_C(x)$ is uniformly (in $n$) less biased than the test based on the doubled p-value, see the left plot of Figure 2. The test based on the doubled p-value is asymptotically UMPU (Shao, 1999), and so is the $P_C$-based test. In the two-sample case, the equal-tails F-test of the equality of variances is UMPU for equal sample sizes, and the $P_C$-based test is less biased when the ratio of sample sizes is larger than 1.7 (starting from $n = 6$), whereas Lemma 1 holds for even more unbalanced sample sizes with the ratio of 2.5 or above, see the right plot of Figure 2.

Figure 2: Bias of the $P_F(x)$-based variance test at 5% level (dashed line), and $P^E_C(x)$-based test (dotted line) in the 1-sample case ($\chi^2$-test, left plot) and in the 2-sample case with $n_1 = 6$ (F-test, right plot).
[Axes: left plot, degrees of freedom vs. bias of the variance test; right plot, difference in sample sizes vs. bias of the 2-sample variance test.]

Finally, consider the minimum likelihood p-value.

Definition 3. Minimum likelihood p-value is $P_{prob}(x) = P(f(X) \le f(x))$.

$P_{prob}(x)$ reaches 1 at the mode, and $P_{prob}(A) < 1$ whenever $A \ne M$. It is not a unimodal function of $x$ when the density $f(x)$ is not unimodal. In a case of a unimodal distribution, for a pair of conjugate points $(x, x')$ with $x < M < x'$ and $f(x') = f(x)$, it is calculated as $P_{prob}(x) = F(x) + 1 - F(x')$.
It has a $\mathrm{Unif}(0, 1)$ distribution under the null. When used to define a test, the acceptance region defined as $\{x : P_{prob}(x) > \alpha\}$ contains the points with the highest density, and is therefore of minimum length. Inverting this test results in the shortest confidence intervals, see Sterne (1954) for the binomial and Baptista and Pike (1977) for the hypergeometric distribution. It is also related to Bayes shortest posterior confidence intervals, see Wilson and Tonascia (1971) for the intervals for the standard deviation $\sigma$ and the ratio of variances in normal populations, based on inverse chi and F distribution, respectively. The next three examples clarify the properties of the conditional p-value $P_C$ in comparison to $P_{prob}$.

Example: Triangular distribution

Suppose that the null density is given by $f(x) = 2(x + a)/[a(a + b)]$ for $-a \le x \le 0$ and $f(x) = 2(b - x)/[b(a + b)]$ for $0 \le x \le b$. The mode $M = 0$, and $F(0) = a/(a + b)$. Then for $x < 0$, $P_{prob} = F(x)/F(0)$ and for $x > 0$, $P_{prob} = (1 - F(x))/(1 - F(0))$ (George and Mudholkar, 1990). Thus, $P^M_C(x) = P_{prob}(x)$. This is the only unimodal distribution for which this equality holds, as it requires the linearity of the density $f(x)$.

Example: Uniform distribution

Consider a $\mathrm{Unif}(0, 1)$ distribution. This is a symmetric distribution with $E = m = 1/2$, and $P_C = P_F = 2x$ for $x \le 1/2$, and $P_C = P_F = 2(1 - x)$ for $x \ge 1/2$, whereas $P_{prob} \equiv 1$ for all values of $x \in [0, 1]$. This example shows the cardinal difference between the two p-values. $P_C$ acknowledges unusual values of $x$ at the ends of the interval, and the $P_{prob}$ does not. This is a somewhat extreme example, because the uniform distribution has a whole interval of modes. The next example deals with a unimodal distribution but shows exactly the same properties of the respective p-values.
Example: Left-truncated normal distribution

Denote the standard normal distribution function and density by $\Phi(x)$ and $\phi(x)$, respectively. Consider a normal distribution left-truncated at $-L < 0$, $G_L(x) = (\Phi(x) - \Phi(-L))/(1 - \Phi(-L))$, defined for $x \ge -L$. The mode is at zero. Then $P_{prob}(x) = 2G_L(-|x|) + 1 - G_L(L)$ for $-L \le x \le L$, and $P_{prob}(x) = 1 - G_L(x)$ for $x > L$. $P_{prob}$ reaches 1 at 0 and equals $1 - G_L(L)$ at $\pm L$; it is continuous at $L$, but its derivative is not. The mean is $E = E(L) = \phi(-L)/(1 - \Phi(-L))$, and the conditional p-value $P_C(x)$ reaches 1 at $E(L)$. An example for $L = 0.5$ is plotted in the right plot in Figure 3. For this example $E(L) = 0.509$ and the weight of the left tail is $w_L = 0.558$. The main difference between the two p-values is that $P_{prob} \ge 1 - G_L(L)$ at the left tail, so even the low values of $x$ in the vicinity of $-L$ have rather high p-values. On the other hand, $P_C$ is very close to zero for these values, recognizing that it is rather unusual to get close to $-L$. It seems that a small two-sided p-value at the left tail makes more sense.

The above two examples show the properties of the $P_C$ which are perhaps clear from its definition: it compares a value $x$ to other values at the same tail. On the other hand, $P_{prob}$ depends on the values at both tails. The same circumstances arise in the variance test example which was introduced in the Introduction.

Figure 3: Plot of $P_{prob}(x)$ (solid line), $P^E_F(x)$ (dashed line), and $P^E_C(x)$ (dotted line) for the $\chi^2(5)$ distribution (left plot) and for a standard normal distribution truncated at $-0.5$ (right plot). The plotted doubled p-value $P^E_F(x)$ is not truncated at 1.
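The numbers quoted for the left-truncated normal example with $L = 0.5$ can be checked with a short Python sketch. The function and variable names below are illustrative assumptions, and the normal distribution function is computed from the error function in the standard library; `p_prob` simply codes the formula for $P_{prob}$ given in the example.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

L = 0.5  # truncation point is -L

def G(x):
    """Distribution function of the standard normal left-truncated at -L."""
    if x <= -L:
        return 0.0
    return (Phi(x) - Phi(-L)) / (1 - Phi(-L))

E = phi(-L) / (1 - Phi(-L))  # mean of the truncated distribution
w_L = G(E)                   # weight of the left tail, F(E)

def p_cond(x):
    """Conditional two-sided p-value centered at the mean E."""
    return G(x) / w_L if x <= E else (1 - G(x)) / (1 - w_L)

def p_prob(x):
    """Minimum likelihood p-value, following the formula in the example."""
    if x <= L:
        return 2 * G(-abs(x)) + 1 - G(L)
    return 1 - G(x)

print(E, w_L)
print(p_cond(-0.45), p_prob(-0.45))
```

For an observation close to the truncation point, such as $x = -0.45$, the sketch gives a small conditional p-value while $P_{prob}$ stays above the floor $1 - G_L(L)$, which is the contrast described in the example.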
Example: variance test based on the Chi-square distribution (continued)

Recall that for $X = 0.5$ ($s_n^2 = 0.1$) the 1-sided p-value is .0079, and the value of $X$ with equal density is $X'_p = 9.256$ with the 1-sided p-value of .0993. The mean $E = 5$, and the conditional p-values are $P^E_C(0.5) = 0.0135$ and $P^E_C(9.256) = 0.239$; the weight of the left tail is $w_L = F(E) = 0.584$. The value with the same conditional p-value on the opposite tail is $X'_C = F^{-1}(1 - (1 - w_L) P_C(0.5)) = 16.48$ with the 1-sided p-value of 0.0056. Clearly, $X'_C$ is more comparable to $X$ than the value $X'_p$. The three p-values are plotted at the left plot in Figure 3.

The power of the three tests and of the UMPU test (all at 5% level) is shown in the right plot in Figure 1. The UMPU test is the conditional test with $A = 6.403$, corresponding to the weight $w^*_L = 0.731$. All three tests are biased, with the bias $B$ defined as the minimum difference between the power and level being $B_F = -0.0046$ for the doubled and $B_C = -0.0020$ for the conditional test. This agrees with Lemma 1. The doubled test is slightly less powerful on the right, and slightly more on the left. The test based on $P_{prob}$ has very large bias and such low power for the alternatives $\sigma < \sigma_0$ that it does not deserve to be called a two-sided test.

The main difficulty associated with the two-sided tests is that two equivalent 1-sided tests may result in distinct 2-sided tests. For the variance test example the tests based on $|s_n^2 - \sigma_0^2|$ and on $|\log(s_n^2/\sigma_0^2)|$ are not equivalent. Let $D(x, A)$ be a measure of distance from $A$. It imposes an equivalence of points at two sides of $A$: each value $x < A$ has an equidistant value $x'_D$ such that $D(x, A) = D(x'_D, A)$.
Two-sided tests based on $|X - A|$ and $D(X, A)$ are not equivalent, generally speaking, because for $x < A$, the equidistant value $x'_D \ne 2A - x$. This results in different rejection regions for the two tests. The main advantages of the conditional p-value $P^A_C(x)$ are given in the next Lemma.

Lemma 2. (i) For a strictly increasing function $T(x)$, the conditional p-value $P_C(T(x) \mid T(A)) = P^A_C(x)$.
(ii) Suppose $D(x, A)$ strictly decreases for $x < A$ and strictly increases for $x > A$, and $D(A, A) = 0$. Define the conditional p-value for the distance $D(x, A)$ as $P_C(D(x, A)) = P(D(x', A) \ge D(x, A) \mid X = x \le A) + P(D(x', A) \ge D(x, A) \mid X = x \ge A)$. Then $P_C(D(x, A)) = P_C(|x - A|)$.

The first statement of the lemma easily follows from the definition of $P^A_C(x)$, and for the second statement take $T(x) = D(x, A)\,\mathrm{sign}(x - A)$. This is a strictly increasing function of $x$, and the proof follows from part (i).

The first part of the lemma ensures that equivalent 1-sided tests are transformed into $P_C(x)$-equivalent 2-sided tests. The second part states that the 2-sided tests based on any measure of distance from $A$ are $P_C(x)$-equivalent. This is true because the conditional p-value ignores any equivalence between the points at different tails.

3 Discrete distributions

In this section the 2-sided conditional p-value $P_C$ is defined for a discrete distribution. It is also compared to $P_F$ and $P_{prob}$ for two important cases: binomial and hypergeometric distributions.

The definition of the conditional p-value $P_C$ (2) is also applicable in a discrete case, but it may require a modification when the value $A$ is attainable. Since the value $A$ belongs to both tails, the previously defined weights of the tails $w_L = P(x \le A)$ and $w_R = P(x \ge A)$ add up to $1 + P(A) > 1$.
The modified weights of the tails are $w^{A(m)}_L = P(x \le A)/(1 + P(A))$ and $w^{A(m)}_R = P(x \ge A)/(1 + P(A))$. This modification is akin to continuity correction. The formal definition of $P_C(x)$ is

Definition 4. Conditional two-sided p-value for a discrete distribution is

$$P^A_C(x) = \frac{P(X \le x)}{w_L}\Big|_{(x<A)} + \frac{P(X \ge x)}{w_R}\Big|_{(x>A)}, \qquad (4)$$

where the weights are $w_L = P(x \le A)$ and $w_R = P(x \ge A)$. Modified conditional p-value $P^{A(m)}_C(x)$ is defined with weights $w^m_L = P(x \le A)/(1 + P(A))$ and $w^m_R = P(x \ge A)/(1 + P(A))$ in equation (4).

The two definitions coincide when the value $A$ is not attainable. In a discrete symmetric case when $A = E = m$ is an attainable value, the values of the modified p-value $P^{(m)}_C(x) = P_F(x)$ are doubled 1-sided values, and the values of $P_C(x)$ are $(1 + P(A))$ times smaller, so the $P_C(x)$-based test is more liberal. The conditional p-value has a mode of 1 at $A$ when this value is attainable, and two modes of 1 at the attainable values above and below $A$ when $A$ is not an attainable value. It has a discrete uniform distribution when restricted to values at a particular tail, though not overall. In what follows we consider the case of $A = E$, and use the notation $P_C(x) = P^E_C(x)$.

3.1 Binomial distribution

For the $\mathrm{Binom}(n, p)$ distribution the mode is $M = \lfloor (n + 1)p \rfloor = \lfloor E + p \rfloor$. When $(n + 1)p$ is an integer, $M = (n + 1)p$ and $M - 1$ are both modes, and the mean $E = np \in (M - 1, M)$ is unattainable. When $E$ is an integer, $M = E$. In all cases the distance $|M - E| < 1$. The median is one of $\lfloor np \rfloor$ or $\lfloor np \rfloor \pm 1$.

Consider first the symmetric case $p = 0.5$. For odd $n$, the value $(n + 1)p$ is an integer, both tails of the distribution have weight 0.5 and $P_C(x) = P_{prob}(x)$. For even $n$, the mean $E = np$ is an integer, $w_L > 0.5$, but $w^m_L = 0.5$.
The unmodified version $P_C(x)$ is symmetric at $E$ with values $P_C(x) < P_{prob}(x)$ for $x \ne E$. The modified version $P^{(m)}_C(x) = P_{prob}(x)$.

Statistical packages differ in regard to the 2-sided p-values for the binomial test: R (R Development Core Team, 2004) uses $P_{prob}(x)$ and StatXact (www.cytel.com) uses the doubled p-value.

The three p-values, $P_C$, $P^{(m)}_C$, and $P_{prob}$ are plotted in Figure 4 for $p = 0.2$ and two values of $n$, $n = 10$ and $n = 11$. In the first case $E = 2$ is an attainable value. It can be seen that $P^{(m)}_C > P_C$ on the left plot. The weight of the left tail is $w_L = 0.678$ vs $w^m_L = 0.521$. Consequently, $P^{(m)}_C(x) = 1.3\,P_C(x)$ for all $x$ but $E$. Modified conditional p-value $P^{(m)}_C$ is considerably closer to $P_{prob}$ at the left tail, and $P_{prob} < P_C < P^{(m)}_C$ at the right (thin) tail. In fact, in this example for $n = 10$, $p = 0.2$, $P_{prob}$ provides exact 1-sided p-values for $x \ge 4$, with $P_C(x) = 1.60\,P_{prob}(x)$ and $P^{(m)}_C(x) = 2.09\,P_{prob}(x)$ for $x \ge 4$. So $P_{prob}(5) = 0.033$, $P_F(5) = 0.066$, $P_C(5) = 0.052$, and $P^{(m)}_C(5) = 0.068$. The two-sided binomial test as programmed in R uses $P_{prob}$ and would reject the null hypothesis of $p = 0.2$ at 5% level given an observed value of 5, whereas a test based on the doubled or conditional p-value would not reject. The same thing may happen for much larger values of $n$. For example, for $n = 101$, $p = 0.1$ and the observed value of $x = 17$ the values are $P_{prob} = 0.030$, $P_F = 0.06$ and $P_C = P^{(m)}_C = 0.052$.

For $n = 11$ (right plot) $E = 2.2$ is not attainable. $P_C = P^{(m)}_C$ has two modes at 2 and 3. Here $w_L = 0.617$ and $P_C(x) = 1.62\,F(x)$ for $x \le 2$, whereas $P_C(x) = 2.61\,(1 - F(x - 1))$ for $x \ge 3$. Typically, $P_C(x) < P_{prob}(x)$ at the thick tail, and $P_C(x) > P_{prob}(x)$ at the thin tail.
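The four two-sided p-values compared above can be computed directly from the binomial probabilities. The following Python sketch is one possible way to do so; the function names, the returned dictionary keys and the small tolerances are assumptions made for this illustration (the tolerances only guard against floating-point error when deciding whether the mean is attainable and when comparing probabilities).

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binom(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_pvalues(x, n, p, eps=1e-9):
    """Two-sided p-values of an observed count x, tails split at the mean E = n*p."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    mean = n * p
    lower = sum(pmf[: x + 1])   # P(X <= x)
    upper = sum(pmf[x:])        # P(X >= x)
    # Tail weights; an attainable mean is counted in both tails.
    w_left = sum(q for k, q in enumerate(pmf) if k <= mean + eps)
    w_right = sum(q for k, q in enumerate(pmf) if k >= mean - eps)
    total = w_left + w_right    # 1 + P(E) if E is attainable, otherwise 1

    if abs(x - mean) <= eps:    # observed value equals an attainable mean
        p_c = p_c_mod = 1.0
    elif x < mean:
        p_c = lower / w_left
        p_c_mod = lower / (w_left / total)
    else:
        p_c = upper / w_right
        p_c_mod = upper / (w_right / total)

    return {
        # minimum likelihood p-value: points no more probable than x
        "P_prob": sum(q for q in pmf if q <= pmf[x] * (1 + 1e-7)),
        "P_F": min(1.0, 2 * min(lower, upper)),   # doubled p-value, capped at 1
        "P_C": p_c,                               # conditional p-value
        "P_C_mod": p_c_mod,                       # modified conditional p-value
        "w_L": w_left,
        "w_L_mod": w_left / total,
    }

result = two_sided_pvalues(5, 10, 0.2)
print(result)
```

For $n = 10$, $p = 0.2$ and $x = 5$ this should agree, up to rounding, with the values 0.033, 0.066, 0.052 and 0.068 quoted above, and with the weights $w_L = 0.678$ and $w^m_L = 0.521$.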
Even for large $n$ the difference between $P_C$ and $P_{prob}$ is rather large. For example, for $n = 101$ and $p = 0.1$ the values are $P_C(17) = 0.052$ and $P_{prob}(17) = 0.030$ in comparison to the 1-sided p-value of 0.023.

For the binomial distribution the weight of the tails converges to 0.5 rather slowly, and $P_C(x) \to P_F(x)$, see Table 1. Whenever the mean is attainable, the weight of the thin right tail is also more than 0.5, and $P_C(x) < P_F(x)$. If $E$ is not attainable, the weight of the thin tail is less than 0.5, and then $P_C > P_F$. This is always true for $P^{(m)}_C(x)$. The distribution is more symmetric when the mean is attainable. Otherwise even for $n = 1001$, the weight of the left tail is $w_L = 0.522$ for $p = 0.1$.

Figure 4: Plot of $P_{prob}$ (solid line, circles), $P^{(m)}_C$ (long dash, filled circles), $P_C$ (dotted line, squares) and $P_F$ (dashed line, triangles) for the $\mathrm{Binom}(10, 0.2)$ distribution (left), and $\mathrm{Binom}(11, 0.2)$ (right). On the right plot $P^{(m)}_C = P_C$.
[Both plots: number of successes (horizontal axis) vs. p-value (vertical axis).]

3.2 Hypergeometric distribution

Consider a crosstabulation of two binary variables A and B. We shall refer to numbers of observations in the cell $(i, j)$ and respective probabilities as $n_{ij}$ and $p_{ij}$, $i, j = 1, 2$. The value $n_{11}$ is the statistic of Fisher's exact test used to test for association of A and B given fixed margins $n_{1+}$, $n_{+1}$, $n$. The value $n_{11}$ defines all the other entries in a table with given margins. A parameter of primary importance is the odds ratio $\rho = p_{11}p_{22}/(p_{12}p_{21})$ estimated by $\hat\rho = n_{11}n_{22}/(n_{12}n_{21})$. The case of no association $p_{ij} = p_{i+}p_{+j}$ is equivalent to $\rho = 1$. Denote the expected values $m_{ij} = \mathrm{E}(n_{ij}) = n_{i+}n_{+j}/n$, with $E = m_{11} = \mathrm{E}(n_{11})$. The number $n_{11} > E$ iff $\hat\rho > 1$.
Table 1: Weight of the left tail $w_L = P(x \le A)$ and the ratio of the weights of two tails $w_L/w_R$ for the $\mathrm{Binom}(n, p)$ distribution. $w^m_L$ stands for the modified weight $w^m_L = P(X \le A)/(1 + P(A))$.

          p = 0.1                      p = 0.2
n         w_L     w_L/w_R   w^m_L      w_L     w_L/w_R   w^m_L
10        0.736   1.130     0.531      0.678   1.086     0.521
11        0.697   2.304     0.697      0.617   1.614     0.617
20        0.677   1.113     0.527      0.630   1.070     0.517
21        0.648   1.844     0.648      0.586   1.416     0.586
50        0.616   1.083     0.520      0.584   1.049     0.512
51        0.598   1.485     0.598      0.556   1.250     0.556
100       0.583   1.063     0.515      0.559   1.036     0.509
101       0.570   1.325     0.570      0.540   1.172     0.540
200       0.559   1.046     0.511      0.542   1.026     0.507
201       0.550   1.221     0.550      0.528   1.119     0.528
500       0.538   1.030     0.507      0.527   1.017     0.504
501       0.532   1.135     0.532      0.518   1.074     0.518
1000      0.527   1.022     0.505      0.519   1.012     0.503
1001      0.522   1.094     0.522      0.513   1.052     0.513

Fisher (1935) derived the distribution of $n_{11}$ as

$$f(n_{11}; n_{1+}, n_{+1}; \rho) = \frac{\binom{n_{1+}}{n_{11}} \binom{n - n_{1+}}{n_{+1} - n_{11}} \rho^{n_{11}}}{\sum_u \binom{n_{1+}}{u} \binom{n - n_{1+}}{n_{+1} - u} \rho^{u}}.$$

The null distribution (standard hypergeometric) is for $\rho = 1$. For testing $H_0: \rho = 1$ vs $H_1: \rho > 1$ the p-value is $p_+ = \sum_{u \ge n_{11}} f(u; n_{1+}, n_{+1}; \rho)$, evaluated at the null value $\rho = 1$. For $H_1: \rho < 1$ the p-value is $p_- = \sum_{u \le n_{11}} f(u; n_{1+}, n_{+1}; \rho)$, again at $\rho = 1$. For a two-sided test, $P_{prob}$ seems to be the p-value of choice, implemented both in R and in StatXact.

Sometimes other one-sided test statistics are used to test for association; they may be based on the differences of proportions in rows or columns (e.g. $n_{11}/n_{+1} - n_{12}/n_{+2}$) or on $\log\hat\rho$. Nevertheless, all other possible 1-sided tests are equivalent to Fisher's exact test since their statistics are strictly increasing functions of $n_{11}$, as shown by Davis (1986). The Fisher's exact test is also the UMPU test if randomization is allowed (Tocher, 1950).
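As an illustration of the distribution of $n_{11}$ and the one-sided p-values defined above, here is a minimal Python sketch. The function names and argument order are hypothetical, and the margins used in the usage lines ($n_{1+} = 9$, $n_{+1} = 5$, $n = 30$) are chosen only as an example.

```python
from math import comb

def support(n1p, np1, n):
    """Smallest and largest attainable values of n11 for fixed margins."""
    return max(0, n1p + np1 - n), min(n1p, np1)

def fisher_pmf(k, n1p, np1, n, rho=1.0):
    """P(n11 = k) for margins n1+, n+1, total n and odds ratio rho."""
    lo, hi = support(n1p, np1, n)
    if not lo <= k <= hi:
        return 0.0

    def weight(u):
        # unnormalised probability of the table with n11 = u
        return comb(n1p, u) * comb(n - n1p, np1 - u) * rho**u

    return weight(k) / sum(weight(u) for u in range(lo, hi + 1))

def one_sided(k, n1p, np1, n):
    """One-sided p-values (p_minus, p_plus) under the null rho = 1."""
    lo, hi = support(n1p, np1, n)
    p_minus = sum(fisher_pmf(u, n1p, np1, n) for u in range(lo, k + 1))
    p_plus = sum(fisher_pmf(u, n1p, np1, n) for u in range(k, hi + 1))
    return p_minus, p_plus

# Illustrative margins: n1+ = 9, n+1 = 5, n = 30
print([round(fisher_pmf(k, 9, 5, 30), 3) for k in range(6)])
print(one_sided(3, 9, 5, 30))
```

With `rho=1.0` the function reduces to the standard hypergeometric distribution; other values of `rho` give the distribution under an alternative, which is what a power calculation would need.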
For the $\mathrm{Hyper}(x; n_{1+}, n_{+1}, n)$ distribution, the full range of values $x$ for fixed margins $(n_{1+}, n_{+1}, n)$ is $\{x = m_-, \cdots, m_+\}$, where $m_- = \max(0, n_{1+} + n_{+1} - n)$ and $m_+ = \min(n_{1+}, n_{+1})$. The mode is $M = \lfloor (n_{1+} + 1)(n_{+1} + 1)/(n + 2) \rfloor = \lfloor \frac{n}{n+2}\left(p_{1+}(1 - p_{+1}) + p_{+1}(1 - p_{1+}) + 1/n\right) + E \rfloor$. Therefore $\lfloor E \rfloor \le M \le \lfloor E + 1/2 \rfloor$. When $(n_{1+} + 1)(n_{+1} + 1)/(n + 2)$ is an integer, $M - 1$ and $M$ are both modes and the mean $E \in (M - 1, M)$ is unattainable. When $E$ is an integer, $M = E$. In all cases the distance $|M - E| < 1$.

Exact 2-sided tests for association are used when both positive and negative associations are of interest. However, there is ongoing controversy about how 2-sided p-values should be constructed for the hypergeometric distribution (Yates, 1984; Agresti and Wackerly, 1977; Meulepas, 1998; Dunne et al., 1996).

Davis (1986) compares the p-values associated with the following 6 statistics:
$T_1 = -P(n_{11})$,
$T_2 = |n_{11}/n_{+1} - n_{12}/n_{+2}| = n\,(n_{+1}n_{+2})^{-1}|n_{11} - m_{11}|$,
$T_3 = |n_{11}/n_{1+} - n_{21}/n_{2+}| = n\,(n_{1+}n_{2+})^{-1}|n_{11} - m_{11}|$,
$T_4 = |\log(\hat\rho)|$,
$T_5 = \sum_{ij}(n_{ij} - m_{ij})^2/m_{ij} = n^3(n_{11} - m_{11})^2(n_{1+}n_{2+}n_{+1}n_{+2})^{-1}$,
$T_6 = 2\sum_{ij} n_{ij}\log(n_{ij}/m_{ij})$.
Statistic $T_1$ orders the tables according to their probability, and corresponds to a test based on $P_{prob}$; $T_2$ and $T_3$ are standard large-sample tests for homogeneity of proportions; $T_4$ (Hill and Pike, 1965) rejects for small and large values of the observed log-odds ratio; $T_5$ is Pearson's chi-square test statistic; and $T_6$ is the likelihood ratio statistic (Agresti and Wackerly, 1977). It can be seen that $T_2$, $T_3$ and $T_5$ are strictly increasing functions of $|n_{11} - m_{11}|$, and therefore the p-values for them do not differ.
Further, all of the statistics $T_j$, $j = 1, \cdots, 6$ are strictly decreasing functions of $n_{11}$ for $n_{11} \le m_{11}$, and strictly increasing functions of $n_{11}$ for $n_{11} \ge m_{11}$. Davis (1986) further shows that the 2-sided tests $T_1$, $T_4$, $T_5$ and $T_6$ are not equivalent due to differing ordering of the tables at the opposite tail.

Consider the table with margins $(n_{1+}, n_{2+}, n_{+1}, n_{+2}) = (9, 21, 5, 25)$ used as an example in Davis (1986). The possible $n_{11}$ values are 0 through 5 and $E(n_{11}) = 1.5$, so the left tail has two tables only, for $n_{11} = 0$ and 1, with total probability $w_L = .521$. Tables with $n_{11} = 2, \cdots, 5$ are on the right tail, with total probability $w_R = .479$. The two tails are rather close in probability. Davis (1986) looks at the orderings of tables according to the increasing values of the test statistics, as follows:

$T_1$: 1 2 0 3 4 5
$T_4$: 2 1 3 4 0 5
$T_5$: {1 2} {0 3} 4 5
$T_6$: 2 1 3 0 4 5

Due to the monotonicity of all statistics $T_j$, $j = 1, \cdots, 6$ on both sides of the mean $m_{11}$, the conditional p-values for all 6 statistics do not differ (Lemma 2). Therefore all 6 2-sided tests are equivalent. This is the main advantage of the conditional p-value for the hypergeometric distribution. Fisher's exact test is usually superseded by the chi-square test for large cell numbers. Equivalence of these two tests is of practical importance, for example when testing for linkage disequilibrium in genetics.

The probabilities of the 6 tables along with their one-sided p-values, $P_{prob}$ and $P_C$ values are given in columns 2-5 of Table 2. Conditional p-values are very close to doubled 1-sided p-values. The second set of tables in Table 2 corresponds to margins $(n_{1+}, n_{2+}, n_{+1}, n_{+2}) = (9, 31, 5, 35)$. Here the left tail probability is 0.689, and the thin right tail has probability 0.311. The probabilities and the p-values are given in columns 6-9.
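The two-sided p-values compared in this example can be recomputed for any single table. The sketch below is a minimal illustration under stated assumptions rather than the paper's own code. It assumes that the conditional p-value is the one-sided tail probability divided by the weight of the tail, relative to the mean, on which the observation falls (the formal definition is given earlier in the paper). It adds the doubled p-value, capped at 1, for comparison, and it refuses to choose a version when the mean is an attainable value. All function and key names are hypothetical.

```python
from math import comb


def two_sided_p_values(n11, n1p, np1, n):
    """Doubled, minimum likelihood and conditional p-values for one 2x2 table."""
    lo, hi = max(0, n1p + np1 - n), min(n1p, np1)
    pmf = {u: comb(n1p, u) * comb(n - n1p, np1 - u) / comb(n, np1)
           for u in range(lo, hi + 1)}
    if (n1p * np1) % n == 0:
        # Assumption: the attainable-mean case is not handled in this sketch.
        raise ValueError("mean is attainable; the version of P_C to use is left open")
    mean = n1p * np1 / n

    p_minus = sum(p for u, p in pmf.items() if u <= n11)   # left one-sided p-value
    p_plus = sum(p for u, p in pmf.items() if u >= n11)    # right one-sided p-value
    w_left = sum(p for u, p in pmf.items() if u < mean)    # weight of the left tail
    w_right = 1.0 - w_left                                 # weight of the right tail

    # Minimum likelihood: total probability of tables no more likely than the observed one.
    tol = 1e-12
    p_prob = sum(p for p in pmf.values() if p <= pmf[n11] * (1 + tol))
    # Conditional: tail probability rescaled by the weight of the observed tail.
    p_cond = p_minus / w_left if n11 < mean else p_plus / w_right
    return {
        "doubled": min(1.0, 2 * min(p_minus, p_plus)),
        "P_prob": min(1.0, p_prob),
        "P_C": min(1.0, p_cond),
    }


# Margins n1+ = 9, n+1 = 5, n = 30, i.e. (9, 21, 5, 25) in the notation of the text.
for x in range(6):
    print(x, {k: round(v, 3) for k, v in two_sided_p_values(x, 9, 5, 30).items()})
```

The printed values can be compared with columns 2-5 of Table 2; calling the function with `n = 40` gives the corresponding values for the margins (9, 31, 5, 35).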
For the second set of margins, the inflation of the conditional p-values on the right tail is more prominent.

             margins (9, 21, 5, 25)                  margins (9, 31, 5, 35)
 n_11    P(n_11)  p_1-sided  P_prob    P_C       P(n_11)  p_1-sided  P_prob    P_C
   0      .143      .143      .286     .274       .258      .258      .570     .374
   1      .378      .521      1        1          .430      .689      1        1
   2      .336      .479      .622     1          .246      .311      .311     1
   3      .124      .143      .143     .299       .059      .065      .065     .209
   4      .019      .019      .019     .040       .006      .006      .006     .028
   5      .001      .001      .001     .002       .0002     .0002     .0002    .0006

Table 2: The 6 possible tables, their probabilities and various p-values for Fisher's exact test for a table with margins $(n_{1+}, n_{2+}, n_{+1}, n_{+2}) = (9, 21, 5, 25)$ are given in columns 2-5. The same information for a table with margins $(n_{1+}, n_{2+}, n_{+1}, n_{+2}) = (9, 31, 5, 35)$ is given in columns 6-9.

4 Discussion

Two-sided testing in non-symmetric distributions is not straightforward. The UMPU tests are not implemented in the mainstream software packages even for continuous problems, and require randomization in the discrete case. The non-asymptotic GLR tests are also not implemented, and are, in general, biased (Bar-Lev et al., 2002). At the same time the two-sided tests are the staple in all applications. The importance of a conceptually and computationally simple approach to two-sided testing is self-evident.

The conditional 2-sided p-value $P_C$ introduced in Section 1 is closely related to the doubled p-value and has an intuitive appeal. Its use is advocated for both continuous and discrete distributions. An important advantage of this p-value is that equivalent 1-sided tests are transformed into $P_C$-equivalent 2-sided tests. This helps to resolve the ongoing controversy about which 2-sided tests should be used for association in 2 by 2 tables. The properties of this p-value compare favorably to the doubled p-value and to the minimum likelihood p-value $P_{prob}$, the two main implemented options in statistical tests for non-symmetric distributions.
For the variance test, the bias of the $P_C$-based test is smaller than the bias of the standard equal tails test based on the doubled p-value, and much smaller than the bias of the $P_{prob}$-based test. For considerably unbalanced sample sizes, the $P_C$-based test is also less biased than the equal tails F-test of the equality of variances.

We did not compare the power and the bias of the resulting tests for the binomial and the hypergeometric cases. This is difficult to do for tests at different levels without recourse to randomisation. For asymptotically normal tests, both p-values should result in asymptotically UMPU tests, though the minimum likelihood p-value may require more stringent conditions to ensure the convergence of the density to the normal density. The proof of these statements is a matter for further research.

Another open question is which version, $P_C^{A}(x)$ or $P_C^{A(m)}(x)$, should be used for an attainable value of $A$. The motivation for $P_C^{A(m)}(x)$ is less clear; it also results in a more conservative test on top of the inescapable conservativeness due to the discrete distribution.

Gibbons and Pratt (1975) consider a large number of 2-sided p-values and find them lacking. They recommend reporting the one-tailed p-value with the direction of the observed departure from the null hypothesis. In this spirit, the conditional p-value conditions on this direction.

References

Agresti, A. (1992). A survey of exact inference for contingency tables. Statistical Science, 7:131–153.
Agresti, A. and Wackerly, D. (1977). Some exact conditional tests of independence for r × c cross-classification tables. Psychometrika, 42:111–125.
Baptista, J. and Pike, M. C. (1977). Exact two-sided confidence limits for the odds ratio in a 2 × 2 table. J. Roy. Statist. Soc. Ser. C, 26:214–220.
Bar-Lev, S. K., Bshouty, D., and Letac, G. (2002). Normal, gamma and inverse-Gaussian are the only NEFs where the bilateral UMPU and GLR tests coincide. The Annals of Statistics, 30:1524–1534.
Davis, L. (1986). Exact tests for 2 × 2 contingency tables. The American Statistician, 40:139–141.
Dunne, A., Pawitan, Y., and Doody, L. (1996). Two-sided p-values from discrete asymmetric distributions based on uniformly most powerful unbiased tests. The Statistician, 45:397–405.
Dupont, W. (1986). Sensitivity of Fisher's exact test to minor perturbations in 2 × 2 contingency tables. Statistics in Medicine, 5:629–635.
Fisher, R. A. (1935). The logic of inductive inference. Journal of the Royal Statistical Society, Series A, 98:39–54.
Freeman, G. H. and Halton, J. H. (1951). Note on an exact treatment of contingency, goodness of fit and other problems of significance. Biometrika, 38:141–149.
George, E. and Mudholkar, G. (1990). P-values for two-sided tests. Biometrical Journal, 32:747–751.
Gibbons, J. D. and Pratt, J. W. (1975). P-values: interpretation and methodology. The American Statistician, 20:20–25.
Hill, I. D. and Pike, M. C. (1965). Algorithm 4: Twobytwo. Computer Bulletin, 9:56–63.
Irwin, J. O. (1935). Tests of significance for differences between percentages based on small numbers. Metron, 12:83–94.
Lehmann, E. (1959). Testing Statistical Hypotheses. A Wiley publication in mathematical statistics. Wiley.
Lloyd, C. (1988). Doubling the one-sided p-value in testing independence in 2 × 2 tables against a two-sided alternative. Statistics in Medicine, 7:1297–1306.
MacGillivray, H. (1981). The mean, median, mode inequality for a class of densities. Australian Journal of Statistics, 23:247–250.
Meulepas, E. (1998). A two-tailed p-value for Fisher's exact test. Biometrical Journal, 40:3–10.
Neyman, J. and Pearson, E. S. (1931). Further notes on the $\chi^2$ distribution. Biometrika, 22:298–305.
R Development Core Team (2004). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-00-3.
Radlow, R. and Alf, E. J. (1975). An alternate multinomial assessment of the accuracy of the $\chi^2$ test of goodness of fit. Journal of the American Statistical Association, 70:811–813.
Shao, J. (1999). Mathematical statistics. Springer Series in Statistics. Springer.
Sterne, T. (1954). Some remarks on confidence or fiducial limits. Biometrika, 41:275–278.
Stuart, A. and Ord, J. (1991). Kendall's Advanced Theory of Statistics.
Tocher, K. (1950). Extension of the Neyman-Pearson theory of tests to discontinuous variates. Biometrika, 37:130–144.
Wilson, D. and Tonascia, J. (1971). Tables for shortest confidence intervals on the standard deviation and variance ratio from normal distributions. Journal of the American Statistical Association, 66:909–912.
Yates, F. (1984). Tests of significance for 2 × 2 contingency tables (with discussion). Journal of the Royal Statistical Society, Series A, 147:426–463.