The Inverse Simpson Paradox (How To Win Without Overtly Cheating)

Ora E. Percus and Jerome K. Percus
Courant Institute of Mathematical Sciences
New York University
251 Mercer Street
New York, NY 10012
Email: percus@cims.nyu.edu

November 9, 2018

Abstract

Given two sets of data which lead to a similar statistical conclusion, the Simpson Paradox [10] describes the tactic of combining these two sets and achieving the opposite conclusion. Depending upon the given data, this may or may not succeed. Inverse Simpson is a method of decomposing a given set of comparison data into two disjoint sets and achieving the opposite conclusion for each one. This is always possible; however, the statistical significance of the conclusions does depend upon the details of the given data.

1 Introduction

Anyone contemplating a statistical analysis is warned, at an early stage of the game, "but don't combine the statistics of monkey wrenches and watermelons", or the equivalent. Failure to heed this instruction – at a more sophisticated level, to be sure – frequently gives rise to Simpson's Paradox (here, in its 2-trial sequence version): if choice A is "statistically better" than choice B in each of two sets of trials under differing circumstances, then it may happen that merging the two sets of data produces the opposite conclusion. Consider the following specially constructed example for the sake of illustration:

Table 1: Simpson Paradox Prototype

| | Trial #1 | Trial #2 | Total |
|---|---|---|---|
| $S_A \equiv$ A successes | 60 | 60 | 120 |
| $F_A \equiv$ A failures | 20 | 140 | 160 |
| $S_B \equiv$ B successes | 140 | 20 | 160 |
| $F_B \equiv$ B failures | 60 | 60 | 120 |

60/80 > 140/200 and 60/200 > 20/80, but 120/280 < 160/280.

In Fig. 1, we pictorially represent trial sequence #1 by a solid line and trial sequence #2 by a dashed line; trial #1 tests drug A $N_1$ times and drug B $N_2$ times, while trial #2 reverses the number of tests.
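The three inequalities quoted under Table 1 can be checked with exact fractions. The following is only a small sketch; the variable and function names are illustrative and do not come from the paper.

```python
from fractions import Fraction

# Successes and failures from Table 1, stored as (trial #1, trial #2).
S_A, F_A = (60, 60), (20, 140)
S_B, F_B = (140, 20), (60, 60)

def rate(successes, failures):
    """Success rate S / (S + F) as an exact fraction."""
    return Fraction(successes, successes + failures)

# Per-trial success rates for A and B, then the pooled rates.
per_trial = [(rate(S_A[i], F_A[i]), rate(S_B[i], F_B[i])) for i in range(2)]
pooled_A = rate(sum(S_A), sum(F_A))
pooled_B = rate(sum(S_B), sum(F_B))

for i, (r_a, r_b) in enumerate(per_trial, start=1):
    print(f"trial #{i}: A = {float(r_a):.3f}, B = {float(r_b):.3f}, A ahead: {r_a > r_b}")
print(f"pooled:   A = {float(pooled_A):.3f}, B = {float(pooled_B):.3f}, A ahead: {pooled_A > pooled_B}")
```

Drug A leads in each trial taken separately, while the pooled rates 120/280 and 160/280 put drug B ahead.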
The successes $S$ and failures $F$ are shown for each drug in each trial sequence. If $a < b$, so that $1 - a > 1 - b$, then clearly the $S/N$ ratio of drug A is larger than that of B in both trial sequences, so drug A certainly seems better. But in the combined trials, $S_A/N_A = ((1-a)N_1 + bN_2)/(N_1 + N_2)$ is lower than $S_B/N_B = ((1-b)N_2 + aN_1)/(N_1 + N_2)$ if $(1-a)N_1 + bN_2 < (1-b)N_2 + aN_1$, or (for $a < 1/2$)

$$N_1 < \frac{1-2b}{1-2a}\, N_2, \tag{1.1}$$

a quite feasible circumstance, so that drug A has now become inferior to B!

[Figure 1: Simpson Paradox Prototype. Drug A: $S_{A1} = (1-a)N_1$, $F_{A1} = aN_1$ in trial #1 and $S_{A2} = bN_2$, $F_{A2} = (1-b)N_2$ in trial #2. Drug B: successes $(1-b)N_2$, failures $bN_2$ in trial #1 and successes $aN_1$, failures $(1-a)N_1$ in trial #2.]

This phenomenon is well-known and well-documented [5] [6] [7] [8] [9] [10] – but hope springs eternal. Only recently [1], a drug manufacturer, whose current potential blockbuster drug (Xinlay) failed to better a placebo in two clinical trials with uncorrelated protocols, proposed to a regulatory agency to pool the two sequences. If accepted, their drug would then outperform the placebo, allowing them to move forward. The regulatory agency panel was not unaware of the forced paradox, and denied the reinterpretation of the data.

2 Inverse Simpson

The Simpson Paradox is data-driven. As in (1.1), it may, or may not, hold in a given situation. However, what we may term the inverse Simpson paradox is a different story: can we take a long pair of data streams – say successes and failures with drug A, and similarly with drug B – and decompose them into two pairs of subsequences, each of which reverses the conclusion of the original pair?
This can be carried out in different ways and for different purposes:

a) Most directly and legitimately, it may be realized that data from two sources were combined for simplicity, and so there is a unique decomposition called for, which may indeed reverse the conclusion. This appears to be the case in the oft-quoted Berkeley sex discrimination controversy [5].

b) Least directly and least legitimately – but perhaps an effective strategy in litigation – one can ask for that decomposition that maximally reverses the conclusion, and then use ingenuity to characterize the subsets thus obtained.

c) Putting a different spin on b), one can ask for that decomposition that maximally comes jointly to either conclusion, and use this as an investigative tool to recognize a hidden characterization of significant subsets of related entities.

At first blush, inverse Simpson, in contexts b) and c), is trivially accomplished. Fig. 2 illustrates the principle.

[Figure 2: Inverse Simpson Prototype. Successes $S_A$, $S_B$ and failures $F_A$, $F_B$ for Drug A and Drug B, with dotted, solid and dashed lines as described in the text.]

The dotted lines refer to the assertedly pooled data, clearly indicating that A loses to B. The hypothetical trial 1 data is represented by solid lines, and since A has only successes, it is surely superior. And the dashed lines refer to trial 2, in which B has only failures, and so surely loses.

But Fig. 2 is a suspiciously extreme version of a strategy that can be made to look more reasonable. To put it in context, let us consider the well-known Berkeley sex discrimination case [5], which we will paraphrase for numerical simplicity. The original data is that in one division, $S_A = 41$ out of $N_A = 100$ male applicants were admitted, a success rate of $P_A = .41$. On the other hand, $S_B = 29$ of $N_B = 100$ female applicants were admitted, a success rate of only $P_B = .29$. Clearly, it would seem that the admission process discriminated against females.
This was not the case. In fact, the aggregate data was arrived at by combining that of two departments, say 1 and 2, as shown in Table 2, Simplified Berkeley Admission Data.

Table 2: Simplified Berkeley Admission Data

| | Dept. 1 | Dept. 2 |
|---|---|---|
| Male Applicants | 30 | 70 |
| Males Admitted | 6 | 35 |
| Female Applicants | 70 | 30 |
| Females Admitted | 14 | 15 |

Total Male Admissions/Applicants: 41/100 = .41
Total Female Admissions/Applicants: 29/100 = .29

Referring to Table 2, we see that the success rates of males in the two departments were $P_{A1} = .2$, $P_{A2} = .5$, with the corresponding female success rates of $P_{B1} = .2$, $P_{B2} = .5$. There was no demonstrable discrimination in either department, but "mixing watermelons and monkey wrenches" created very much of a statistical artifact.

Let us proceed to a general situation. We are given $N_A$ and $P_A = S_A/N_A$, $N_B$, and $P_B = S_B/N_B$ for which, without loss of generality, $P_A > P_B$. We then imagine compartmentalizing the A-pool as $N_{A1} = \alpha N_A$, $N_{A2} = (1-\alpha)N_A$, and the B-pool as $N_{B1} = \beta N_B$, $N_{B2} = (1-\beta)N_B$; the success rates are to be given via $S_{A1} = P_{A1}N_{A1}$, $S_{A2} = P_{A2}N_{A2}$, $S_{B1} = P_{B1}N_{B1}$, $S_{B2} = P_{B2}N_{B2}$. The question then is whether $\alpha$ and $\beta$ can be chosen so that

$$P_{A1} = \lambda = P_{B1}, \qquad P_{A2} = \mu = P_{B2}, \tag{2.1}$$

indicating no advantage to A or B in either case. This is trivial. Since $S_{A1} = \alpha\lambda N_A$, $S_{A2} = (1-\alpha)\mu N_A$, $S_{B1} = \beta\lambda N_B$, $S_{B2} = (1-\beta)\mu N_B$, we must have

$$P_A = \alpha\lambda + (1-\alpha)\mu, \qquad P_B = \beta\lambda + (1-\beta)\mu. \tag{2.2}$$

[Figure 3: Placement of Averaging Parameters $\lambda$ and $\mu$ on the interval from 0 to 1, relative to $P_B$ and $P_A$.]

Thus, $P_A$ and $P_B$ are both averages of $\lambda$ and $\mu$, which therefore must lie outside the interval $(P_B, P_A)$ as in Fig. 3. Explicitly, of course, we have

$$\alpha = \frac{\mu - P_A}{\mu - \lambda}, \quad \beta = \frac{\mu - P_B}{\mu - \lambda}, \quad 1 - \alpha = \frac{P_A - \lambda}{\mu - \lambda}, \quad 1 - \beta = \frac{P_B - \lambda}{\mu - \lambda}. \tag{2.3}$$

In situations not as clear cut as the Berkeley case, we would want to invent a hypothetical decomposition in which, e.g., $\lambda$ is roughly in the middle of the $(0, P_B)$ interval and $\mu$ roughly in the middle of $(P_A, 1)$, in order to allay suspicion. In the Berkeley case, we see that $\lambda = .2$, $\mu = .5$ do satisfy this criterion. With (2.3), we find that a suitable decomposition removes the apparent bias against females: no assertion can then be made. But Fig. 2 illustrates a proactive strategy, in which a suitable decomposition reverses the original assertion and appears to establish the superiority of A. What is wrong with the construction of Fig. 2, aside from its suspiciously extreme nature? Nothing, but the conclusion is questionable because we have not attended to the statistical significance of the new assertions, a point that was emphasized by the FDA panel cited above. Doing so forms the substance of our ensuing discussion.

3 Statistical Significance

A prototypical situation calling for statistical assessment is this. A sequence of $N$ independent Bernoulli trials – successes or failures – is carried out on the same object, resulting in $S$ successes. Given $\epsilon$, with what probability, or confidence, can we claim that $p$, the intrinsic success probability parameter, satisfies

$$|p - S/N| \le \epsilon/N^{1/2}\,? \tag{3.1}$$

The standard approach is to start with the elementary result that, regarding $S$ as a random variable and defining $q \equiv 1 - p$,

$$\Pr\left(|S - Np| \le N^{1/2}\epsilon \mid p\right) = \sum_{j=[Np - N^{1/2}\epsilon]}^{[Np + N^{1/2}\epsilon]} \binom{N}{j} p^j q^{N-j}, \tag{3.2}$$

where $[\ ]$ denotes integer part. The device then is to identify (3.2), which is a probability on $S$-space, with a probability on $p$-space:

$$\Pr\left(|p - S/N| \le \epsilon/N^{1/2} \mid S\right) = \Pr\left(|S - Np| \le N^{1/2}\epsilon \mid p\right) \tag{3.3}$$

signifying our confidence that (3.1) holds.

The sort of information that will interest us will, however, in the context of this prototype, be more like: with what confidence, based upon the observed value of $S$, can we claim that

$$p \ge 1/2\,? \tag{3.4}$$

Now, the above recipe is not readily applicable, since we are no longer questioning a relationship between $p$ and $S$ that makes possible the sub rosa journey from $S$-space to $p$-space. But this is indeed the province of the Bayes approach [4] which – ignoring the controversy that continues to swirl around it – is what we will use. First of all, let us recall what (3.1) would become in a Bayesian context: we imagine joint $(p, S)$-space and quote the obvious

$$\Pr(p = p' \mid S = S') = \Pr(S = S' \mid p = p')\, f(p')/Z, \quad \text{where } Z = \int_0^1 \Pr(S = S' \mid p = p'')\, f(p'')\, dp'', \tag{3.5}$$

$f$ here referring to probability density. If $f(p')$ is the prior density on $p$-space, then

$$\Pr\left(|p - S/N| \le \epsilon/N^{1/2} \mid S = S'\right) = \int_{S'/N - \epsilon/N^{1/2}}^{S'/N + \epsilon/N^{1/2}} f(p')\, p'^{S'} q'^{N-S'}\, dp' \Big/ Z, \qquad Z = \int_0^1 f(p')\, p'^{S'} q'^{N-S'}\, dp'. \tag{3.6}$$

But suppose we choose a uniform prior, $f(p) = 1$; then (3.6) becomes

$$\Pr\left(|p - S/N| \le \epsilon/N^{1/2}\right) = \int_{\max(0,\, S/N - \epsilon/N^{1/2})}^{\min(S/N + \epsilon/N^{1/2},\, 1)} p'^{S} q'^{N-S}\, dp' \Big/ Z, \qquad Z = \int_0^1 p'^{S} q'^{N-S}\, dp' = \left((N+1)\binom{N}{S}\right)^{-1}. \tag{3.7}$$

Eqs. (3.2, 3.3) and (3.7) are certainly not identical, but if we go to the large sample regime, i.e. the normal approximation to the binomial, then (3.2, 3.3) aver that

$$\Pr\left(|p - S/N| \le \epsilon/N^{1/2}\right) = \int_{-\epsilon/\sqrt{pq}}^{\epsilon/\sqrt{pq}} e^{-\frac{1}{2}s'^2}\, ds' \Big/ \sqrt{2\pi}, \tag{3.8}$$

which, it is easy to show, is identical with the large $N$, fixed $S/N$, steepest descent expansion [3] of (3.7) around $p' = S/N$. On the basis of the above equivalence, we now go immediately to the question indicated by (3.4).
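The agreement between the uniform-prior expression (3.7) and the normal form (3.8) can be checked numerically. The sketch below uses only the standard library; the function names, the midpoint quadrature, and the choice $S/N = 0.6$, $\epsilon = 1$ are illustrative assumptions, and $p$ in (3.8) is replaced by the observed $S/N$.

```python
import math

def posterior_interval_prob(S, N, eps, steps=20000):
    """Pr(|p - S/N| <= eps / sqrt(N)) under a uniform prior on p, as in (3.7)."""
    lo = max(0.0, S / N - eps / math.sqrt(N))
    hi = min(1.0, S / N + eps / math.sqrt(N))
    # log Z, with Z = B(S + 1, N - S + 1) = 1 / ((N + 1) * C(N, S))
    log_Z = math.lgamma(S + 1) + math.lgamma(N - S + 1) - math.lgamma(N + 2)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        p = lo + (i + 0.5) * h  # midpoint rule keeps p strictly inside (0, 1)
        total += math.exp(S * math.log(p) + (N - S) * math.log1p(-p) - log_Z)
    return total * h

def normal_interval_prob(S, N, eps):
    """Large-sample value (3.8), with p replaced by the observed S / N."""
    P = S / N
    return math.erf(eps / math.sqrt(2.0 * P * (1.0 - P)))

for N in (20, 200, 2000):
    S = 3 * N // 5  # observed success fraction 0.6 (illustrative)
    exact = posterior_interval_prob(S, N, eps=1.0)
    approx = normal_interval_prob(S, N, eps=1.0)
    print(f"N = {N:5d}: (3.7) gives {exact:.4f}, (3.8) gives {approx:.4f}")
```

The two values should approach each other as $N$ grows with $S/N$ held fixed, which is the large-sample statement made in the text.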
Using Bayes with a uniform prior, precisely as in (3.7), we have

$$\Pr\left(p \ge \tfrac{1}{2}\right) = \int_{1/2}^{1} p'^{S} q'^{N-S}\, dp' \Big/ \int_0^1 p'^{S} q'^{N-S}\, dp' = 1 - B_{1/2}(S+1, N+1-S)/B(S+1, N+1-S), \tag{3.9}$$

where $B$ is the Beta function and $B_{1/2}$ the corresponding incomplete Beta function [2]. Eq. (3.9) can also be written in the neat form

$$\Pr\left(p \ge \tfrac{1}{2}\right) = 1 - \sum_{j=0}^{N-S} \binom{N+1}{j} p^{N+1-j} q^{j}\Big|_{p = 1/2} = 1 - \sum_{j=0}^{N-S} \binom{N+1}{j} \Big/ 2^{N+1}. \tag{3.10}$$

The important point however is that this construction leads quite directly to evaluation of quantities such as $\Pr(p_A \ge p_B)$, that are appropriate to the Simpson paradox.

4 Level of Significance of the Inverse Paradox

The effect we are studying is not very subtle, and so it is sufficient to take a large sample limit, which strategy we adopt. However, there are several sample parameters, leading to the meaningful use of additional limiting operations. Consider first the prototype, Eq. (3.10); here,

$$\alpha_N(S) = \sum_{j=0}^{N-S} \binom{N+1}{j} \Big/ 2^{N+1} \tag{4.1}$$

expresses the level of significance of the assertion that $p \ge \frac{1}{2}$, and it is not until such an assessment is made that one can declare meaningful comparisons. Let us evaluate (4.1) in the large sample limit in a familiar fashion that extends at once to the question of $\Pr(p_A \ge p_B)$ relevant to the Simpson paradox.

Although (4.1) is finite and explicit, its implementation for large $N$ and $S$ – while trivial numerically – is a bit complex. For this purpose, the expression (3.9) is more useful; it says that

$$\alpha_N(S) = \int_0^{1/2} p^S (1-p)^{N-S}\, dp \Big/ \int_0^1 p^S (1-p)^{N-S}\, dp. \tag{4.2}$$

By the large sample limit, we will mean that in which

$$s = N^{-1/2}\left(S - \tfrac{1}{2}N\right) \tag{4.3}$$

is fixed (to within $N^{-1/2}$) as $N \to \infty$, and we then ask for

$$\alpha(s) = \lim_{N \to \infty} \alpha_N(S). \tag{4.4}$$

This is obtained quite directly by a steepest descent evaluation [3] of (4.2).
The relevant integrand is now

$$I(p) \equiv p^S (1-p)^{N-S} = \exp\left[\left(\tfrac{N}{2} + N^{1/2}s\right)\ln p + \left(\tfrac{N}{2} - N^{1/2}s\right)\ln(1-p)\right], \tag{4.5}$$

with a maximum at

$$p_0 = \tfrac{1}{2} + N^{-1/2}s, \tag{4.6}$$

and a corresponding expansion starting as

$$I(p) = I(p_0)\exp\left[-\frac{N}{2}(p - p_0)^2 \Big/ \left(\frac{1}{4} - \frac{s^2}{N}\right)\right]. \tag{4.7}$$

Hence

$$
\begin{aligned}
\alpha(s) &= \lim_{N\to\infty} \int_0^{1/2} e^{-2N(p-p_0)^2}\, dp \Big/ \int_0^1 e^{-2N(p-p_0)^2}\, dp \\
&= \lim_{N\to\infty} \int_{-N^{1/2}-2s}^{-2s} e^{-x^2/2}\, dx \Big/ \int_{-N^{1/2}-2s}^{N^{1/2}-2s} e^{-x^2/2}\, dx \\
&= \int_{-\infty}^{-2s} e^{-x^2/2}\, dx \Big/ \int_{-\infty}^{\infty} e^{-x^2/2}\, dx,
\end{aligned}
\tag{4.8}
$$

immediately recognizable in a normal distribution context.

We can then proceed to the desired evaluation of

$$
\Pr(p_A \ge p_B \mid S_A, S_B, N_A, N_B) = \frac{\displaystyle\iint_{1 \ge p_A \ge p_B \ge 0} f(p_A, p_B)\, \Pr(S_A, S_B \mid p_A, p_B, N_A, N_B)\, dp_A\, dp_B}{\displaystyle\iint_{1 \ge p_A \ge 0,\ 1 \ge p_B \ge 0} f(p_A, p_B)\, \Pr(S_A, S_B \mid p_A, p_B, N_A, N_B)\, dp_A\, dp_B}. \tag{4.9}
$$

This is carried out in Appendix A, where we choose Bayes with uniform prior on $p_A$, $p_B$ space and process (4.9) as we did (4.2). The result is that for large $N_A$, $N_B$,

$$
\Pr(p_A \ge p_B) = \phi\left(\left(\frac{S_A}{N_A} - \frac{S_B}{N_B}\right) \Big/ \left(\frac{S_A(N_A - S_A)}{N_A^3} + \frac{S_B(N_B - S_B)}{N_B^3}\right)^{1/2}\right), \quad \text{where } \phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{1}{2}y^2}\, dy. \tag{4.10}
$$

Unsurprisingly, we can obtain (4.10) as well by a version of the probability space equivalence assertion employed in (3.3). It is only necessary to consider the random variable

$$\xi = \frac{S_A}{N_A} - \frac{S_B}{N_B} \tag{4.11}$$

where $S_A$ and $S_B$ are binomially distributed with success probabilities $p_A$ and $p_B$.
Since we find at once that

$$E\left(e^{\gamma\left(\frac{S_A}{N_A} - \frac{S_B}{N_B}\right)}\right) = \left(p_A e^{\gamma/N_A} + q_A\right)^{N_A}\left(p_B e^{-\gamma/N_B} + q_B\right)^{N_B}, \tag{4.12}$$

it follows directly that

$$E(\xi \mid p_A, p_B) = p_A - p_B, \qquad \mathrm{Var}(\xi \mid p_A, p_B) = \frac{p_A q_A}{N_A} + \frac{p_B q_B}{N_B} \tag{4.13}$$

and then from the central limit theorem that in the limit $N_A, N_B \to \infty$,

$$\Pr\left(\frac{S_A}{N_A} - \frac{S_B}{N_B} \ge p_A - p_B + \Delta \,\Big|\, p_A, p_B\right) = \phi\left(-\Delta \Big/ \left(\frac{p_A q_A}{N_A} + \frac{p_B q_B}{N_B}\right)^{1/2}\right). \tag{4.14}$$

The same sleight of hand as in (3.3) then converts this to

$$\Pr\left(p_A - p_B \le \frac{S_A}{N_A} - \frac{S_B}{N_B} - \Delta \,\Big|\, S_A, S_B\right) = \phi\left(-\Delta \Big/ \left(\frac{S_A(N_A - S_A)}{N_A^3} + \frac{S_B(N_B - S_B)}{N_B^3}\right)^{1/2}\right), \tag{4.15}$$

and so, setting $\Delta = \frac{S_A}{N_A} - \frac{S_B}{N_B}$, to (4.10), as was to be shown.

5 Realizations of the Inverse Paradox

Now let us make use of the result (4.10). If our initial data is characterized by $S_A$, $S_B$, $N_A + N_B = N$, and $P_A = S_A/N_A$, $P_B = S_B/N_B$, then the confidence level with which we can assert that $p_A \ge p_B$ is given by

$$\phi\left(N^{1/2} C_{AB}\right), \qquad C_{AB} = (P_A - P_B)/\sigma_{AB} > 0, \qquad \sigma_{AB}^2 = \frac{P_A(1 - P_A)}{N_A/N} + \frac{P_B(1 - P_B)}{N_B/N}. \tag{5.1}$$

Our objective is to supply a decomposition into two hypothetical trials $(S_{A1}, N_{A1}, S_{B1}, N_{B1})$ and $(S_{A2}, N_{A2}, S_{B2}, N_{B2})$ such that if

$$C'_i = (P_{Bi} - P_{Ai})/\sigma_i, \quad i = 1, 2, \qquad \text{where } P_{Ai} = S_{Ai}/N_{Ai},\ P_{Bi} = S_{Bi}/N_{Bi},\ \sigma_i^2 = \frac{P_{Ai}(1 - P_{Ai})}{N_{Ai}/N} + \frac{P_{Bi}(1 - P_{Bi})}{N_{Bi}/N},$$

then

$$C'_i > 0 \quad \text{for } i = 1, 2. \tag{5.2}$$

In fact, to be definite, we suppose that the two pairs of trials reverse the initial assertion at a common level of confidence

$$(P_{B1} - P_{A1})/\sigma_1 = C' = (P_{B2} - P_{A2})/\sigma_2 \tag{5.3}$$

with $C' > 0$. To start, we need to find the restrictions on $C'$ under which the required $(P_{A1}, P_{A2}, P_{B1}, P_{B2})$ satisfying (5.2) can be found. The solution is direct but algebraically cumbersome, and is presented in detail in Appendices B and C.
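The quantities in (5.1) are easy to evaluate, and the large-sample confidence can be compared with a direct quadrature of (4.9) under uniform priors. The sketch below applies both to the simplified Berkeley data of Table 2. The function names and the midpoint quadrature are illustrative assumptions, $\sigma_{AB}^2$ is taken as $P_A(1-P_A)N/N_A + P_B(1-P_B)N/N_B$ so that $\phi(N^{1/2}C_{AB})$ agrees with (4.10), and the code assumes this variance is nonzero.

```python
import math

def phi(x):
    """Standard normal CDF, the function phi of (4.10)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def confidence(S_A, N_A, S_B, N_B):
    """Return (C_AB, phi(sqrt(N) * C_AB)) as in (5.1); assumes sigma_AB > 0."""
    N = N_A + N_B
    P_A, P_B = S_A / N_A, S_B / N_B
    sigma2 = P_A * (1 - P_A) * N / N_A + P_B * (1 - P_B) * N / N_B
    C = (P_A - P_B) / math.sqrt(sigma2)
    return C, phi(math.sqrt(N) * C)

def exact_posterior(S_A, N_A, S_B, N_B, steps=4000):
    """Pr(p_A >= p_B) from (4.9) with uniform priors, by midpoint quadrature."""
    def beta_pdf(p, S, N):
        log_Z = math.lgamma(S + 1) + math.lgamma(N - S + 1) - math.lgamma(N + 2)
        return math.exp(S * math.log(p) + (N - S) * math.log1p(-p) - log_Z)

    h = 1.0 / steps
    total, cdf_B = 0.0, 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        dens_A, dens_B = beta_pdf(p, S_A, N_A), beta_pdf(p, S_B, N_B)
        # Posterior mass of p_B below p: full cells so far plus half of this cell.
        total += (cdf_B + 0.5 * dens_B * h) * dens_A * h
        cdf_B += dens_B * h
    return total

# Aggregate data: 41 of 100 males (A) and 29 of 100 females (B) admitted.
C_AB, conf = confidence(41, 100, 29, 100)
print(f"aggregate: C_AB = {C_AB:.4f}, confidence = {conf:.4f}, "
      f"quadrature of (4.9) = {exact_posterior(41, 100, 29, 100):.4f}")

# Department-level data: equal admission rates, so neither group is favoured.
for dept, (s_a, n_a, s_b, n_b) in {1: (6, 30, 14, 70), 2: (35, 70, 15, 30)}.items():
    print(f"dept {dept}: confidence = {confidence(s_a, n_a, s_b, n_b)[1]:.4f}")
```

For the aggregate data the confidence that $p_A \ge p_B$ is high, while within each department the rates coincide and the confidence drops to one half, which is the sense in which the decomposition removes the apparent bias.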
The conclusion of Appendix B is that if $\alpha \ge \beta$, then

$$C' \le \min\left\{\frac{\bar\beta\bar P_A - \bar\alpha\bar P_B}{\bar\alpha\,\sigma_\beta},\ \frac{\bar\beta\bar P_A - \bar\alpha\bar P_B}{\bar\beta\,\sigma_\alpha},\ \frac{\alpha P_B - \beta P_A}{\alpha\,\sigma_\beta},\ \frac{\alpha P_B - \beta P_A}{\beta\,\sigma_\alpha}\right\}, \tag{5.4}$$

with $\sigma_\alpha$, $\sigma_\beta$ as defined in (B.7). Since we require $C' \ge 0$, this implies that

$$\alpha/\beta \ge P_A/P_B \ge 1, \qquad \bar\beta/\bar\alpha \ge \bar P_B/\bar P_A \ge 1. \tag{5.5}$$

In (5.4) and (5.5), we uniformly adopt the notation:

$$\text{if } 0 \le x \le 1, \text{ then } \bar x \equiv 1 - x. \tag{5.6}$$

Eq. (5.4) is a bit involved and, even worse, contains the unknown parameters $P_{Ai}$, $P_{Bi}$ implicitly (through $\sigma_\alpha$ and $\sigma_\beta$). But it can be simplified by reducing its right hand side and thereby strengthening the requirement on $C'$ a bit. This is carried out in Appendix C, with the conclusion that, if $\alpha \ge \beta$, then

$$
\begin{aligned}
P_A + P_B \ge 1:&\quad C' \le 2(\gamma\bar\gamma)^{1/2}\left(\left(\bar P_B/\bar P_A\right)^2 - 1\right)(P_A - P_B)(P_B/P_A) \Big/ \left[\left(\frac{P_A}{P_B}\frac{\bar P_B}{\bar P_A}\right)^2 - 1\right] \\
P_A + P_B \le 1:&\quad C' \le 2(\gamma\bar\gamma)^{1/2}\left(\left(P_A/P_B\right)^2 - 1\right)(P_A - P_B)(\bar P_A/\bar P_B) \Big/ \left[\left(\frac{P_A}{P_B}\frac{\bar P_B}{\bar P_A}\right)^2 - 1\right]
\end{aligned}
\qquad \text{where } \gamma = N_A/N \tag{5.7}
$$

are sufficient to carry out the apparent reversal of ranking of A and B.

Let us take a simple example that has been previously quoted [4] [8]. We will paraphrase it and use rounded off data. Hospitals A and B specialize in treating a certain deadly disease. $N_A = 1000$ patients are treated at A and $N_B = 1000$ at B. Of these, $S_A = 900$ recover, while $S_B = 800$ recover, so that $P_A = .9$, $P_B = .8$ and Hospital A is apparently the place to go. In fact, one computes $C_{AB} = .05$, so that this conclusion is supported at the $.05 \times (2000)^{1/2} = 2.24$ standard deviation level. Detailed investigation shows that matters are not so simple. Some patients enter in otherwise good shape, others in poor shape. Of the former, $N_{A1} = 900$ enter hospital A, and 870 recover; of the latter, $N_{A2} = 100$ enter and 30 recover, so $P_{A1} = .967$, $P_{A2} = .3$.
Table 3: Simplified Hospital Recovery Data

| | Good Shape | Poor Shape |
|---|---|---|
| Admissions to Hospital A | 900 | 100 |
| Recovered in Hospital A | 870 | 30 |
| Admissions to Hospital B | 600 | 400 |
| Recovered in Hospital B | 590 | 210 |

Total Recovered/Admissions in A: 900/1000 = .9
Total Recovered/Admissions in B: 800/1000 = .8

On the other hand, $N_{B1} = 600$ enter Hospital B in good shape and $S_{B1} = 590$ recover, whereas $N_{B2} = 400$, $S_{B2} = 210$. Thus, $P_{B1} = .983$, $P_{B2} = .525$. We see that by not mixing the two classes of patients, Hospital B is superior for each class – at levels $C'_1 = .038$ (1.7 standard deviations) and $C'_2 = .176$ (7.9 standard deviations). Simpson, or inverse Simpson, depending upon one's point of view, is certainly exemplified.

Of course, the criteria as to which patients entered in good shape, and which in poor shape, are a bit fuzzy. Given the aggregate data, the decomposition into the two classes could, as we have seen, have been planned with the intention of most convincingly asserting the opposite of the conclusion from the aggregate data. If this had been done according to the prescription of (5.7), then with the same input data, we would have found $\alpha = .935$, $\beta = .738$ (not far from the $\alpha = .9$, $\beta = .6$ corresponding to the additional data presented) and concluded with the superiority of Hospital B at a confidence level corresponding to $C' \le .107$ or 4.79 standard deviations for each class of patients.

6 Concluding Remarks

The Simpson paradox, one of the simplest examples of the common misuse of statistics (think meta-analysis?), has received increasing attention, since the consequences of its use – or misuse – can be quite severe (as well as profitable). In the classical Simpson Paradox, the only question is whether or not to combine data from different sources (and trying to justify the decision to combine).
What we have seen here is that the inverse Simpson paradox, even in its most "sophisticated" version in which mean differences are weighted by appropriate standard deviations, is nearly universally applicable. This can be an effective analytical tool, but can equally well be an effective technique for distorting statistical data.

A Evaluation of (4.9)

Choosing Bayes with a uniform prior on $p_A$, $p_B$ space, (4.9) becomes

$$
\begin{aligned}
&\iint_{1 \ge p_A \ge p_B \ge 0} \Pr(S_A, S_B \mid p_A, p_B, N_A, N_B)\, dp_A\, dp_B \Big/ \iint_{1 \ge p_A, p_B \ge 0} \Pr(S_A, S_B \mid p_A, p_B, N_A, N_B)\, dp_A\, dp_B \\
&\quad= \iint_{1 \ge p_A \ge p_B \ge 0} p_A^{S_A} p_B^{S_B} q_A^{F_A} q_B^{F_B}\, dp_A\, dp_B \Big/ \iint_{1 \ge p_A, p_B \ge 0} p_A^{S_A} p_B^{S_B} q_A^{F_A} q_B^{F_B}\, dp_A\, dp_B \\
&\quad= \int_0^1 \left[\int_0^{p_A} p_B^{S_B} q_B^{F_B}\, dp_B\right] p_A^{S_A} q_A^{F_A}\, dp_A \Big/ \int_0^1\!\!\int_0^1 p_B^{S_B} q_B^{F_B}\, p_A^{S_A} q_A^{F_A}\, dp_B\, dp_A \\
&\quad= \int_0^1 B_{p_A}(S_B + 1, F_B + 1)\, p_A^{S_A} q_A^{F_A}\, dp_A \Big/ B(S_B + 1, F_B + 1)\, B(S_A + 1, F_A + 1).
\end{aligned}
\tag{A.1}
$$

Applying the known expansion of the incomplete Beta function [2], this reduces after a little algebra to

$$\Pr(p_A \ge p_B \mid S_A, S_B, N_A, N_B) = \sum_{j=0}^{F_A + F_B} \binom{S_A + S_B + 1 + j}{S_A}\binom{F_A + F_B - j}{F_A} \Big/ \binom{N_A + N_B + 2}{N_A + 1}, \tag{A.2}$$

or, introducing $S = S_A + S_B$, $F = F_A + F_B$, $N = N_A + N_B$ for notational convenience,

$$\Pr(p_A \ge p_B \mid S_A, S_B, N_A, N_B) = \sum_{j=0}^{F} \binom{S + 1 + j}{S_A}\binom{F - j}{F_A} \Big/ \binom{N + 2}{N_A + 1}. \tag{A.3}$$

But we will go to the large sample limit defined by fixed

$$s = N_A^{-1/2}\left(S_A - \tfrac{1}{2}N_A\right), \quad \gamma = N_A/N, \quad s' = N_B^{-1/2}\left(S_B - \tfrac{1}{2}N_B\right), \quad 1 - \gamma \equiv N_B/N \tag{A.4}$$

as $N \to \infty$. We could proceed precisely as in (4.5 – 4.8), but if we imagine a large sample limit from the outset, the derivation is brief and standard. Consider drug A. A uniform prior for $p_A$ is given by the beta distribution $f(p_A) = b(1, 1; p_A)$, where

$$b(m, n; p_A) = p_A^{m-1} q_A^{n-1}/B(m, n), \qquad B(m, n) = (m-1)!\,(n-1)!/(m+n-1)!, \tag{A.5}$$

which, after $S_A$ successes in $N_A$ trials, creates the posterior distribution

$$b(1 + S_A, 1 + N_A - S_A; p_A). \tag{A.6}$$

Drug B works the same way. It follows that

$$
\begin{aligned}
E(p_A - p_B) &= \frac{S_A + 1}{N_A + 2} - \frac{S_B + 1}{N_B + 2}, \\
\mathrm{Var}(p_A - p_B) &= \frac{(S_A + 1)(N_A + 1 - S_A)}{(N_A + 2)^2 (N_A + 3)} + \frac{(S_B + 1)(N_B + 1 - S_B)}{(N_B + 2)^2 (N_B + 3)},
\end{aligned}
\tag{A.7}
$$

and so by the central limit theorem for large $N_A$, $N_B$,

$$\Pr(p_A \ge p_B) = \phi\left(\left(\frac{S_A}{N_A} - \frac{S_B}{N_B}\right) \Big/ \left(\frac{S_A(N_A - S_A)}{N_A^3} + \frac{S_B(N_B - S_B)}{N_B^3}\right)^{1/2}\right), \quad \text{where } \phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-\frac{1}{2}y^2}\, dy. \tag{A.8}$$

B Restrictions on $C'$

Eq. (5.3) itself imposes two conditions. Aside from the crucial $0 \le P_{A1}, P_{A2}, P_{B1}, P_{B2} \le 1$, there are just two more due to the composition conditions that $S_{A1} + S_{A2} = S_A$, $N_{A1} + N_{A2} = N_A$, $S_{B1} + S_{B2} = S_B$, $N_{B1} + N_{B2} = N_B$. We reintroduce the notation of Section 2:

$$N_{A1} = \alpha N_A, \qquad N_{B1} = \beta N_B \tag{B.1}$$

and hereafter uniformly adopt the notation that

$$\text{if } 0 \le x \le 1, \text{ then } \bar x \equiv 1 - x. \tag{B.2}$$

Thus $S_{A1} + S_{A2} = S_A$ implies $P_{A1}N_{A1} + P_{A2}N_{A2} = P_A N_A$, or

$$\alpha P_{A1} + \bar\alpha P_{A2} = P_A \tag{B.3}$$

and similarly

$$\beta P_{B1} + \bar\beta P_{B2} = P_B. \tag{B.4}$$

We also append (5.3) in the form

$$P_{B1} - P_{A1} = C'\sigma_1, \qquad P_{B2} - P_{A2} = C'\sigma_2, \tag{B.5}$$

and solve (B.3), (B.4), (B.5) to yield

$$
\begin{aligned}
P_{A1} &= K_1 + \frac{\bar\alpha}{\alpha - \beta}\, C'\sigma_\beta, & P_{A2} &= K_2 - \frac{\alpha}{\alpha - \beta}\, C'\sigma_\beta, \\
P_{B1} &= K_1 + \frac{\bar\beta}{\alpha - \beta}\, C'\sigma_\alpha, & P_{B2} &= K_2 - \frac{\beta}{\alpha - \beta}\, C'\sigma_\alpha,
\end{aligned}
\tag{B.6}
$$

where

$$
\begin{aligned}
K_1 &= (\bar\beta P_A - \bar\alpha P_B)/(\alpha - \beta), & \sigma_\alpha &= \alpha\sigma_1 + \bar\alpha\sigma_2, \\
K_2 &= (\alpha P_B - \beta P_A)/(\alpha - \beta), & \sigma_\beta &= \beta\sigma_1 + \bar\beta\sigma_2.
\end{aligned}
\tag{B.7}
$$

Eqs. (B.6), (B.7) are realizable if the requirements $0 \le P_{A1}, P_{A2}, P_{B1}, P_{B2} \le 1$ are satisfied. Since we are asserting, without loss of generality, that $p_A \ge p_B$, we of course have the condition

$$P_A \ge P_B, \qquad \bar P_B \ge \bar P_A. \tag{B.8}$$

There are then two cases to consider.
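If $\alpha \ge \beta$, it is easily seen that $K_1 \ge 0$, $K_2 \le 1$, so that $P_{A1}, P_{B1} \ge 0$ and $P_{A2}, P_{B2} \le 1$ are already satisfied. The remaining four conditions $P_{A1}, P_{B1} \le 1$ and $P_{A2}, P_{B2} \ge 0$ can then be gathered together as

$$\text{if } \alpha \ge \beta \text{ then } C' \le \min\left\{\frac{(\alpha - \beta)(1 - K_1)}{\bar\alpha\,\sigma_\beta},\ \frac{(\alpha - \beta)(1 - K_1)}{\bar\beta\,\sigma_\alpha},\ \frac{(\alpha - \beta)K_2}{\alpha\,\sigma_\beta},\ \frac{(\alpha - \beta)K_2}{\beta\,\sigma_\alpha}\right\}, \tag{B.9}$$

or, inserting (B.7),

$$C' \le \min\left\{\frac{\bar\beta\bar P_A - \bar\alpha\bar P_B}{\bar\alpha\,\sigma_\beta},\ \frac{\bar\beta\bar P_A - \bar\alpha\bar P_B}{\bar\beta\,\sigma_\alpha},\ \frac{\alpha P_B - \beta P_A}{\alpha\,\sigma_\beta},\ \frac{\alpha P_B - \beta P_A}{\beta\,\sigma_\alpha}\right\}. \tag{B.10}$$

Similarly,

$$\text{if } \alpha \le \beta \text{ then } C' \le \min\left\{\frac{\bar\alpha P_B - \bar\beta P_A}{\bar\alpha\,\sigma_\beta},\ \frac{\bar\alpha P_B - \bar\beta P_A}{\bar\beta\,\sigma_\alpha},\ \frac{\beta\bar P_A - \alpha\bar P_B}{\alpha\,\sigma_\beta},\ \frac{\beta\bar P_A - \alpha\bar P_B}{\beta\,\sigma_\alpha}\right\}. \tag{B.11}$$

Since we require $C' \ge 0$, immediate consequences are that

$$
\begin{aligned}
\text{if } \alpha \ge \beta, \text{ then } &\frac{\alpha}{\beta} \ge \frac{P_A}{P_B} \ge 1, \quad \frac{\bar\beta}{\bar\alpha} \ge \frac{\bar P_B}{\bar P_A} \ge 1; \\
\text{if } \alpha \le \beta, \text{ then } &\frac{\bar\alpha}{\bar\beta} \ge \frac{P_A}{P_B} \ge 1, \quad \frac{\beta}{\alpha} \ge \frac{\bar P_B}{\bar P_A} \ge 1
\end{aligned}
\tag{B.12}
$$

must hold.

C Simplification of (5.4)

The major step is the observation, from (5.2), that

$$\sigma_i^2 \le \frac{N}{4}\left(\frac{1}{N_{Ai}} + \frac{1}{N_{Bi}}\right), \tag{C.1}$$

so that

$$\sigma_1^2 \le \frac{N}{4}\left(\frac{1}{\alpha N_A} + \frac{1}{\beta N_B}\right), \qquad \sigma_2^2 \le \frac{N}{4}\left(\frac{1}{\bar\alpha N_A} + \frac{1}{\bar\beta N_B}\right). \tag{C.2}$$

Hence, if $\alpha \ge \beta$,

$$\sigma_1^2 \le \frac{1}{4\beta}\left(\frac{N}{N_A} + \frac{N}{N_B}\right), \qquad \sigma_2^2 \le \frac{1}{4\bar\alpha}\left(\frac{N}{N_A} + \frac{N}{N_B}\right), \tag{C.3}$$

yielding

$$\sigma_\alpha,\ \sigma_\beta \le \max(\sigma_1, \sigma_2) \le \frac{1}{2}\left(\frac{N}{N_A} + \frac{N}{N_B}\right)^{1/2}\max\left(\frac{1}{\beta^{1/2}}, \frac{1}{\bar\alpha^{1/2}}\right). \tag{C.4}$$

Setting $N_A/N = \gamma$, condition (5.4) can therefore be strengthened to

$$\alpha \ge \beta:\quad C' \le 2(\gamma\bar\gamma)^{1/2}\min(\beta, \bar\alpha)^{1/2}\min\left\{\frac{1}{\bar\beta}\left(\bar\beta\bar P_A - \bar\alpha\bar P_B\right),\ \frac{1}{\alpha}\left(\alpha P_B - \beta P_A\right)\right\}. \tag{C.5}$$

And in the same way, we obtain the strengthened

$$\alpha \le \beta:\quad C' \le 2(\gamma\bar\gamma)^{1/2}\min(\alpha, \bar\beta)^{1/2}\min\left\{\frac{1}{\bar\alpha}\left(\bar\alpha P_B - \bar\beta P_A\right),\ \frac{1}{\beta}\left(\beta\bar P_A - \alpha\bar P_B\right)\right\}. \tag{C.6}$$

Eqs. (C.5) and (C.6) are valid for all $\alpha$, $\beta$, and we may indeed find the largest feasible range for $C'$ by maximizing their right hand sides over $\alpha$ and $\beta$.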
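The maximization over $\alpha$ and $\beta$ mentioned above can be explored by brute force. The following sketch evaluates the right hand side of (C.5) on a grid restricted to $0 < \beta \le \alpha < 1$. The function names, the grid resolution, and the sample values $P_A = 0.6$, $P_B = 0.4$, $\gamma = 0.5$ are illustrative assumptions, not data from the paper.

```python
import math

def rhs_C5(alpha, beta, P_A, P_B, gamma):
    """Right hand side of (C.5); intended for 0 < beta <= alpha < 1."""
    a_bar, b_bar = 1 - alpha, 1 - beta
    scale = 2 * math.sqrt(gamma * (1 - gamma)) * math.sqrt(min(beta, a_bar))
    term1 = (b_bar * (1 - P_A) - a_bar * (1 - P_B)) / b_bar
    term2 = (alpha * P_B - beta * P_A) / alpha
    return scale * min(term1, term2)

def best_on_grid(P_A, P_B, gamma, n=200):
    """Largest value of rhs_C5 over the grid alpha = i/n, beta = j/n, j <= i."""
    best = (0.0, None, None)
    for i in range(1, n):
        alpha = i / n
        for j in range(1, i + 1):
            beta = j / n
            value = rhs_C5(alpha, beta, P_A, P_B, gamma)
            if value > best[0]:
                best = (value, alpha, beta)
    return best

value, alpha, beta = best_on_grid(P_A=0.6, P_B=0.4, gamma=0.5)
print(f"largest bound on C' = {value:.4f} at alpha = {alpha:.3f}, beta = {beta:.3f}")
```

Any grid point with a positive value automatically satisfies the $\alpha \ge \beta$ line of (B.12), since both entries of the inner minimum must then be positive. A grid search is only a rough guide to the maximizing pair.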
Again, to reduce complexity, let us take the special case in which:

$$\alpha \ge \beta:\quad \bar\alpha/\bar\beta = (\bar P_A/\bar P_B)^2, \qquad \beta/\alpha = (P_B/P_A)^2 \tag{C.7}$$

so that

$$
\begin{aligned}
\alpha &= \left[(P_A/P_B)^2(\bar P_B/\bar P_A)^2 - (P_A/P_B)^2\right] \Big/ \left[(P_A/P_B)^2(\bar P_B/\bar P_A)^2 - 1\right], \\
\beta &= \left[(\bar P_B/\bar P_A)^2 - 1\right] \Big/ \left[(P_A/P_B)^2(\bar P_B/\bar P_A)^2 - 1\right],
\end{aligned}
\tag{C.8}
$$

converting (C.5) and (C.6) to

$$\alpha \ge \beta:\quad C' \le \frac{2(\gamma\bar\gamma)^{1/2}}{(P_A/P_B)^2(\bar P_B/\bar P_A)^2 - 1}\,\min\left[(\bar P_B/\bar P_A)^2 - 1,\ (P_A/P_B)^2 - 1\right] \cdot \min\left(\bar P_A - \bar P_A^2/\bar P_B,\ P_B - P_B^2/P_A\right). \tag{C.9}$$

But

$$\left(\bar P_A - \bar P_A^2/\bar P_B\right) - \left(P_B - P_B^2/P_A\right) = (1 - P_A - P_B)(P_A - P_B)^2 / (P_A\bar P_B)$$

and

$$\left((\bar P_B/\bar P_A)^2 - 1\right) - \left((P_A/P_B)^2 - 1\right) = (P_A + P_B - 1)\,\frac{P_A - P_B}{\bar P_A P_B}\left(\frac{P_A}{P_B} + \frac{\bar P_B}{\bar P_A}\right),$$

so it follows that in the $\alpha \ge \beta$ case,

$$
\begin{aligned}
P_A + P_B \ge 1:&\quad C' \le 2(\gamma\bar\gamma)^{1/2}\left(\left(\frac{\bar P_B}{\bar P_A}\right)^2 - 1\right)(P_A - P_B)\frac{P_B}{P_A} \Big/ \left[\left(\frac{P_A}{P_B}\frac{\bar P_B}{\bar P_A}\right)^2 - 1\right] \\
P_A + P_B \le 1:&\quad C' \le 2(\gamma\bar\gamma)^{1/2}\left(\left(\frac{P_A}{P_B}\right)^2 - 1\right)(P_A - P_B)\frac{\bar P_A}{\bar P_B} \Big/ \left[\left(\frac{P_A}{P_B}\frac{\bar P_B}{\bar P_A}\right)^2 - 1\right]
\end{aligned}
\tag{C.10}
$$

are sufficient to carry out the apparent reversal of ranking of A and B. The decomposition corresponding to the choice $\alpha \le \beta$ can of course be similarly specialized.

References

[1] Abboud, L. (2005). Abbott Seeks to Clear Stalled Drug. Wall Street Journal, Sept. 12.
[2] Abramowitz, M. and Stegun, I. A. (1965). Handbook of Mathematical Functions, Dover Publications, New York.
[3] Beckenbach, E. F., editor (1956). Modern Mathematics for the Engineer, McGraw-Hill, New York, Chapter 18.
[4] Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer-Verlag, New York.
[5] Bickel, P. J., Hammel, E. A., and O'Connell, J. W. (1975). Sex Bias in Graduate Admissions: Data from Berkeley. Science 187, 398 – 404.
[6] Capocci, A. and Colaiori, F. (2006). Mixing properties of growing networks and Simpson's Paradox. Phys. Rev. E 74, 026122.
[7] Moore, D. S. and McCabe, G. P. (1998). Introduction to the Practice of Statistics, 100 – 201. W. Freeman and Co., New York.
[8] Moore, T. Simpson and Simpson-like Paradox Examples. See www.math.grinnell.edu/~mooret/reports/SimpsonExamples.pdf
[9] Saari, D. (2001). Decisions and Elections, Cambridge University Press, Cambridge.
[10] Simpson, E. H. (1951). The interpretation of interaction in Contingency Tables. J. Roy. Stat. Soc. B 13, 238 – 241.
