A Counterexample to Theorems of Cox and Fine

Journal of Artificial Intelligence Research 10 (1999) 67-85. Submitted 6/98; published 2/99.

Joseph Y. Halpern (halpern@cs.cornell.edu)
Cornell University, Computer Science Department, Ithaca, NY 14853
http://www.cs.cornell.edu/home/halpern

© 1999 AI Access Foundation and Morgan Kaufmann Publishers. All rights reserved.

Abstract

Cox's well-known theorem justifying the use of probability is shown not to hold in finite domains. The counterexample also suggests that Cox's assumptions are insufficient to prove the result even in infinite domains. The same counterexample is used to disprove a result of Fine on comparative conditional probability.

1. Introduction

One of the best-known and seemingly most compelling justifications of the use of probability is given by Cox (1946). Suppose we have a function $\mathrm{Bel}$ that associates a real number with each pair $(U, V)$ of subsets of a domain $W$ such that $U \neq \emptyset$. We write $\mathrm{Bel}(V | U)$ rather than $\mathrm{Bel}(U, V)$, since we think of $\mathrm{Bel}(V | U)$ as the credibility or likelihood of $V$ given $U$.$^1$ Cox further assumes that $\mathrm{Bel}(\overline{V} | U)$ is a function of $\mathrm{Bel}(V | U)$ (where $\overline{V}$ denotes the complement of $V$ in $W$), that is, there is a function $S$ such that

A1. $\mathrm{Bel}(\overline{V} | U) = S(\mathrm{Bel}(V | U))$ if $U \neq \emptyset$,

and that $\mathrm{Bel}(V \cap V' | U)$ is a function of $\mathrm{Bel}(V' | V \cap U)$ and $\mathrm{Bel}(V | U)$, that is, there is a function $F$ such that

A2. $\mathrm{Bel}(V \cap V' | U) = F(\mathrm{Bel}(V' | V \cap U), \mathrm{Bel}(V | U))$ if $V \cap U \neq \emptyset$.

Footnote 1: Cox writes $V | U$ rather than $\mathrm{Bel}(V | U)$, and takes $U$ and $V$ to be propositions in some language rather than events, i.e., subsets of a given set. This difference is minor; there are well-known mappings from propositions to events, and vice versa. I use events here since they are more standard in the probability literature.

Notice that if $\mathrm{Bel}$ is a probability function, then we can take $S(x) = 1 - x$ and $F(x, y) = xy$. Cox makes much weaker assumptions: he assumes that $F$ is twice differentiable, with a continuous second derivative, and that $S$ is twice differentiable. Under these assumptions, he shows that $\mathrm{Bel}$ is isomorphic to a probability distribution in the sense that there is a continuous one-to-one onto function $g : \mathbb{R} \to \mathbb{R}$ such that $g \circ \mathrm{Bel}$ is a probability distribution on $W$, and

$g(\mathrm{Bel}(V | U)) \times g(\mathrm{Bel}(U)) = g(\mathrm{Bel}(V \cap U))$ if $U \neq \emptyset$,   (1)

where $\mathrm{Bel}(U)$ is an abbreviation for $\mathrm{Bel}(U | W)$.

Not surprisingly, Cox's result has attracted a great deal of interest, particularly in the maximum entropy community and, more recently, in the AI community. For example:

• Cheeseman (1988) has called it the "strongest argument for use of standard (Bayesian) probability theory". Similar sentiments are expressed by Jaynes (1978, p. 24); indeed, Cox's Theorem is one of the cornerstones of Jaynes' recent book (1996).
• Horvitz, Heckerman, and Langlotz (1986) used it as a basis for comparison of probability and other nonprobabilistic approaches to reasoning about uncertainty.
• Heckerman (1988) used it as a basis for providing an axiomatization for belief update.

The main contribution of this paper is to show (by means of an explicit counterexample) that Cox's result does not hold in finite domains, even under strong assumptions on $S$ and $F$ (stronger than those made by Cox and those made in all papers proving variants of Cox's results). Since finite domains are arguably those of most interest in AI applications, this suggests that arguments for using probability based on Cox's result, and other justifications similar in spirit, must be taken with a grain of salt, and their proofs carefully reviewed.
Moreover, the counterexample suggests that Cox's assumptions are insufficient to prove the result even in infinite domains.

It is known that some assumptions regarding $F$ and $S$ must be made to prove Cox's result. Dubois and Prade (1990) give an example of a function $\mathrm{Bel}$, defined on a finite domain, that is not isomorphic to a probability distribution. For this choice of $\mathrm{Bel}$, we can take $F(x, y) = \min(x, y)$ and $S(x) = 1 - x$. Since $\min$ is not twice differentiable, Cox's assumptions block the Dubois-Prade example.

Other authors have made different assumptions. Aczél (1966, Section 7 (Theorem 1)) does not make any assumptions about $F$, but he does make two other assumptions, each of which blocks the Dubois-Prade example. The first is that $\mathrm{Bel}(V | U)$ takes on every value in some range $[e, E]$, with $e < E$. In the Dubois-Prade example, the domain is finite, so this certainly cannot hold. The second is that there is a continuous function $G : \mathbb{R}^2 \to \mathbb{R}$, strictly increasing in each argument, such that, whenever $V$ and $V'$ are disjoint,

A3. $\mathrm{Bel}(V \cup V' | U) = G(\mathrm{Bel}(V | U), \mathrm{Bel}(V' | U))$.

With these assumptions, he gives a proof much in the spirit of that of Cox to show that $\mathrm{Bel}$ is essentially a probability distribution. Dubois and Prade point out that, in their example, there is no function $G$ satisfying A3 (even if we drop the requirement that $G$ be continuous and strictly increasing in each argument).$^2$ Reichenbach (1949) earlier proved a result similar to Aczél's, under somewhat stronger assumptions. In particular, he assumed A3, with $G$ being $+$.

Other variants of Cox's result have also been considered in the literature. For example, Heckerman (1988) and Horvitz, Heckerman, and Langlotz (1986) assume that $F$ is continuous and strictly increasing in each argument and $S$ is continuous and strictly decreasing.
Since $\min$ is not strictly increasing in each argument, it fails this restriction too.$^3$ Aleliunas (1988) gives yet another collection of assumptions and claims that they suffice to guarantee that $\mathrm{Bel}$ is essentially a probability distribution.

Footnote 2: In fact, Aczél allows there to be a different function $G_U$ for each set $U$ on the right-hand side of the conditional. However, the Dubois-Prade example does not even satisfy this weaker condition.

Footnote 3: Actually, the restriction that $F$ be strictly increasing in each argument is a little too strong. If $e = \mathrm{Bel}(\emptyset)$, then it can be shown that $F(e, x) = F(x, e) = e$ for all $x$, so that $F$ is not strictly increasing if one of its arguments is $e$.

The first to observe potential problems with Cox's result is Paris (1994). As he puts it, "Cox's proof is not, perhaps, as rigorous as some pedants might prefer and when an attempt is made to fill in all the details some of the attractiveness of the original is lost." Paris provides a rigorous proof of the result, assuming that the range of $\mathrm{Bel}$ is contained in $[0, 1]$ and using assumptions similar to those of Horvitz, Heckerman, and Langlotz. In particular, he assumes that $F$ is continuous and strictly increasing in $(0, 1]^2$ and that $S$ is decreasing. However, he makes use of one additional assumption that, as he himself says, is not very appealing:

A4. For all $0 \le \alpha, \beta, \gamma \le 1$ and $\epsilon > 0$, there are sets $U_1 \supseteq U_2 \supseteq U_3 \supseteq U_4$ such that $U_3 \neq \emptyset$, and each of $|\mathrm{Bel}(U_4 | U_3) - \alpha|$, $|\mathrm{Bel}(U_3 | U_2) - \beta|$, and $|\mathrm{Bel}(U_2 | U_1) - \gamma|$ is less than $\epsilon$.

Notice that this assumption forces the range of $\mathrm{Bel}$ to be dense in $[0, 1]$. This means that, in particular, the domain $W$ on which $\mathrm{Bel}$ is defined cannot be finite.

Is this assumption really necessary? Paris suggests that Aczél needs something like it. (This issue is discussed in further detail below.)
The counterexample of this paper gives further evidence. It shows that Cox's result fails in finite domains, even if we assume that the range of $\mathrm{Bel}$ is in $[0, 1]$, $S(x) = 1 - x$ (so that, in particular, $S$ is twice differentiable and monotonically decreasing), $G(x, y) = x + y$, and $F$ is infinitely differentiable and strictly increasing on $(0, 1]^2$. We can further assume that $F$ is commutative, $F(0, x) = F(x, 0) = 0$, and that $F(x, 1) = F(1, x) = x$. The example emphasizes the point that the applicability of Cox's result is far narrower than was previously believed. It remains an open question as to whether there is an appropriate strengthening of the assumptions that does give us Cox's result in finite settings. There is further discussion of this issue in Section 5.

In fact, the example shows even more. In the course of his proof, Cox claims to show that $F$ must be an associative function, that is, that $F(x, F(y, z)) = F(F(x, y), z)$. For the $\mathrm{Bel}$ of the counterexample, there can be no associative function $F$ satisfying A2. It is this observation that is the key to showing that there is no probability distribution isomorphic to $\mathrm{Bel}$.

What is going on here? Actually, Cox's proof just shows that $F(x, F(y, z)) = F(F(x, y), z)$ only for those triples $(x, y, z)$ such that, for some sets $U_1$, $U_2$, $U_3$, and $U_4$, we have $x = \mathrm{Bel}(U_4 | U_3 \cap U_2 \cap U_1)$, $y = \mathrm{Bel}(U_3 | U_2 \cap U_1)$, and $z = \mathrm{Bel}(U_2 | U_1)$. If the set of such triples $(x, y, z)$ is dense in $[0, 1]^3$, then we conclude by continuity that $F$ is associative. The content of A4 is precisely that the set of such triples is dense in $[0, 1]^3$. Of course, if $W$ is finite, we cannot have density. As my counterexample shows, we do not in general have associativity in finite domains. Moreover, this lack of associativity can result in the failure of Cox's theorem.
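To make the density point concrete, the following sketch enumerates every triple on which associativity is actually forced in a small finite domain. It is only an illustration under stated assumptions: the three-point domain, the uniform weights, and the helper name `constrained_triples` are introduced here and do not come from the paper. The sketch uses nested chains $U_1 \supseteq U_2 \supseteq U_3 \supseteq U_4$, which is the form the condition takes in A4.

```python
from fractions import Fraction
from itertools import product


def constrained_triples(weights):
    """Return every (x, y, z) with x = Pr(U4|U3), y = Pr(U3|U2), z = Pr(U2|U1)
    for nested sets U1 >= U2 >= U3 >= U4 with U3 nonempty."""
    points = list(weights)

    def mass(subset):
        return sum((Fraction(weights[p]) for p in subset), Fraction(0))

    triples = set()
    # Give each point a depth 0..4; U_k holds the points of depth >= k,
    # so every assignment yields exactly one nested chain.
    for depths in product(range(5), repeat=len(points)):
        u1, u2, u3, u4 = (
            [p for p, d in zip(points, depths) if d >= k] for k in (1, 2, 3, 4)
        )
        if not u3:
            continue  # conditioning on the empty set is undefined
        triples.add(
            (mass(u4) / mass(u3), mass(u3) / mass(u2), mass(u2) / mass(u1))
        )
    return triples


# Hypothetical example: the uniform measure on a three-point domain.
triples = constrained_triples({"a": 1, "b": 1, "c": 1})
half = Fraction(1, 2)
print(len(triples), "constrained triples")
print((half, half, half) in triples)  # False: A2 never tests this triple
```

Only finitely many triples arise, so continuity of $F$ gives no way to extend associativity from these triples to all of $[0, 1]^3$.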
A similar problem seems to exist in Aczél's proof (as already observed by Paris (1994)). While Aczél's proof does not involve showing that $F$ is associative, it does involve showing that $G$ is associative. Again, it is not hard to show that $G$ is associative for appropriate triples, just as is the case for $F$. But it seems that Aczél also needs an assumption that guarantees that the appropriate set of triples is dense, and it is not clear that his assumptions do in fact guarantee this.$^4$ As shown in Section 2, the problem also arises in Reichenbach's proof.

The counterexample to Cox's theorem, with slight modifications, can also be used to show that another well-known result in the literature is not completely correct. In his seminal book on probability and qualitative probability (1973), Fine considers a non-numeric notion of comparative (conditional) probability, which allows us to say "$U$ given $V$ is at least as probable as $U'$ given $V'$", denoted $U | V \succeq U' | V'$. Conditions on $\succeq$ are given that are claimed to force the existence of (among other things) a function $\mathrm{Bel}$ such that $U | V \succeq U' | V'$ iff $\mathrm{Bel}(U | V) \ge \mathrm{Bel}(U' | V')$ and an associative function $F$ satisfying A2. (This is Theorem 8 of Chapter II in (Fine, 1973).) However, the $\mathrm{Bel}$ defined in my counterexample to Cox's theorem can be used to give a counterexample to this result as well.

Interestingly, this is not the first time a similar error has been noted in the use of functional equations. Falmagne (1981) gives another example (in a case involving a utility model of choice behavior) and mentions that he knows "of at least two similar examples in the psychological literature".

The remainder of this paper is organized as follows. In the next section there is a more detailed discussion of the problem in Cox's proof. The counterexample to Cox's theorem is given in Section 3.
The following section shows that it is also a counterexample to Fine's theorem. Section 5 concludes with some discussion, particularly of assumptions under which Cox's theorem might hold.

Footnote 4: I should stress that my counterexample is not a counterexample to Aczél's theorem, since he explicitly assumes that the range of $\mathrm{Bel}$ is infinite. However, it does point out potential problems with his proof, and certainly shows that his argument does not apply to finite domains. Aczél is in fact aware of the problems with his proof [private communication, 1996]. He later proved results in a similar spirit with the aid of a requirement of nonatomicity (Aczél & Daróczy, 1975, pp. 5-6), which is in fact a stronger requirement than A4, and thus also requires the domain to be infinite.

2. The Problem With Cox's Proof

To understand the problems with Cox's proof, I actually consider Reichenbach's proof, which is similar in spirit to Cox's proof (it is actually even closer to Aczél's proof), but uses some additional assumptions, which makes it easier to explain in detail. Aczél, Cox, and Reichenbach all make critical use of functional equations in their proofs, and they make the same (seemingly unjustified) leap at corresponding points in their proofs.

In the notation of this paper, Reichenbach (1949, pp. 65-67) assumes (1) that the range of $\mathrm{Bel}(\cdot | \cdot)$ is a subset of $[0, 1]$, (2) that $\mathrm{Bel}(V | U) = 1$ if $U \subseteq V$, (3) that if $V$ and $V'$ are disjoint, then $\mathrm{Bel}(V \cup V' | U) = \mathrm{Bel}(V | U) + \mathrm{Bel}(V' | U)$ (thus, he assumes that A3 holds, with $G$ being $+$), and (4) that A2 holds with a function $F$ that is differentiable. (He remarks that the result holds even without assumption (4), although the proof is more complicated; Aczél in fact does not make an assumption like (4).)

Reichenbach's proof proceeds as follows. Replacing $V'$ in A2 by $V_1 \cup V_2$, where $V_1$ and $V_2$ are disjoint, we get that

$\mathrm{Bel}(V \cap (V_1 \cup V_2) | U) = F(\mathrm{Bel}(V_1 \cup V_2 | V \cap U), \mathrm{Bel}(V | U))$.   (2)

Using the fact that $G$ is $+$, we immediately get

$\mathrm{Bel}(V \cap (V_1 \cup V_2) | U) = \mathrm{Bel}(V \cap V_1 | U) + \mathrm{Bel}(V \cap V_2 | U)$   (3)

and

$F(\mathrm{Bel}(V_1 \cup V_2 | V \cap U), \mathrm{Bel}(V | U)) = F(\mathrm{Bel}(V_1 | V \cap U) + \mathrm{Bel}(V_2 | V \cap U), \mathrm{Bel}(V | U))$.   (4)

Moreover, by A2, we also have, for $i = 1, 2$,

$\mathrm{Bel}(V \cap V_i | U) = F(\mathrm{Bel}(V \cap V_i | V \cap U), \mathrm{Bel}(V | U))$.   (5)

Putting together (2), (3), (4), and (5), we get that

$F(\mathrm{Bel}(V \cap V_1 | V \cap U), \mathrm{Bel}(V | U)) + F(\mathrm{Bel}(V \cap V_2 | V \cap U), \mathrm{Bel}(V | U)) = F(\mathrm{Bel}(V \cap V_1 | V \cap U) + \mathrm{Bel}(V \cap V_2 | V \cap U), \mathrm{Bel}(V | U))$.   (6)

Taking $x = \mathrm{Bel}(V \cap V_1 | V \cap U)$, $y = \mathrm{Bel}(V \cap V_2 | V \cap U)$, and $z = \mathrm{Bel}(V | U)$ in (6), we get the functional equation

$F(x, z) + F(y, z) = F(x + y, z)$.   (7)

Suppose that we assume (as Reichenbach implicitly does) that this functional equation holds for all $(x, y, z) \in P = \{(x, y, z) \in [0, 1]^3 : x + y \le 1\}$. The rest of the proof now follows easily. First, taking $x = 0$ in (7), it follows that

$F(0, z) + F(y, z) = F(y, z)$,

from which we get that

$F(0, z) = 0$.

Next, fix $z$ and let $g_z(x) = F(x, z)$. Since $F$ is, by assumption, differentiable, from (7) we have that

$g_z'(x) = \lim_{y \to 0} (F(x + y, z) - F(x, z))/y = \lim_{y \to 0} F(y, z)/y$.

It thus follows that $g_z'(x)$ is a constant, independent of $x$. Since the constant may depend on $z$, there is some function $h$ such that $g_z'(x) = h(z)$.
Using the fact that $F(0, z) = 0$, elementary calculus tells us that

$g_z(x) = F(x, z) = h(z) x$.

Using the assumption that for all $U, V$, we have $\mathrm{Bel}(V | U) = 1$ if $U \subseteq V$, we get that

$\mathrm{Bel}(V | U) = \mathrm{Bel}(V \cap V | U) = F(\mathrm{Bel}(V | V \cap U), \mathrm{Bel}(V | U)) = F(1, \mathrm{Bel}(V | U))$.

Thus, we have that

$F(1, z) = h(z) = z$.

We conclude that $F(x, z) = xz$.

Note, however, that this conclusion depends in a crucial way on the assumption that the functional equation (7) holds for all $(x, y, z) \in P$.$^5$ In fact, all that we can conclude from (6) is that it holds for all $(x, y, z)$ such that there exist $U$, $V$, $V_1$, and $V_2$, with $V_1$ and $V_2$ disjoint, such that $x = \mathrm{Bel}(V \cap V_1 | V \cap U)$, $y = \mathrm{Bel}(V \cap V_2 | V \cap U)$, and $z = \mathrm{Bel}(V | U)$. Let us say that a triple that satisfies this condition is R-constrained (since it must satisfy certain constraints imposed by the $F$ and $G$ functions; the R here is for Reichenbach, to distinguish this notion from a similar one defined in the next section).

Footnote 5: Actually, using the continuity of $F$, it suffices that the functional equation holds for a set of triples which is dense in $P$.

As I mentioned earlier, Aczél also assumes that $\mathrm{Bel}(V | U)$ takes on all values in $[e, E]$, where $e = \mathrm{Bel}(\emptyset | U)$ and $E = \mathrm{Bel}(U | U)$. (In Reichenbach's formulation, $e = 0$ and $E = 1$.) There are two ways to interpret this assumption. The weak interpretation is that for each $x \in [0, 1]$, there exist $U, V$ such that $\mathrm{Bel}(V | U) = x$. The strong interpretation is that for each $U$ and $x$, there exists $V$ such that $\mathrm{Bel}(V | U) = x$. It is not clear which interpretation is intended by Aczél. Neither one obviously suffices to prove that every triple in $P$ is R-constrained, although it does seem plausible that it might follow from the second assumption.
In any case, neither Aczél nor Reichenbach sees a need to check that Equation (7) holds throughout $P$. (Nor does Cox for his analogous functional equation, nor do the authors of more recent and polished presentations of Cox's result, such as Jaynes (1996) and Tribus (1969).) However, it turns out to be quite necessary to do this. Moreover, it is clear that if $W$ is finite, there are only finitely many tuples in $P$ that are R-constrained, and it is not the case that all of $P$ is. As we shall see in the next section, this observation has serious consequences as far as all these proofs are concerned.

3. The Counterexample to Cox's Theorem

The goal of this section is to prove

Theorem 3.1: There is a function $\mathrm{Bel}_0$, a finite domain $W$, and functions $S$, $F$, and $G$ satisfying A1, A2, and A3 respectively such that

• $\mathrm{Bel}_0(V | U) \in [0, 1]$ for $U \neq \emptyset$,
• $S(x) = 1 - x$ (so that $S$ is strictly decreasing and infinitely differentiable),
• $G(x, y) = x + y$ (so that $G$ is strictly increasing in each argument and is infinitely differentiable),
• $F$ is infinitely differentiable, nondecreasing in each argument in $[0, 1]^2$, and strictly increasing in each argument in $(0, 1]^2$. Moreover, $F$ is commutative, $F(x, 0) = F(0, x) = 0$, and $F(x, 1) = F(1, x) = x$.

However, there is no one-to-one onto function $g : [0, 1] \to [0, 1]$ satisfying (1).

Note that the hypotheses on $\mathrm{Bel}_0$, $S$, $G$, and $F$ are at least as strong as those made in all the other variants of Cox's result, while the assumptions on $g$ are weaker than those made in the variants. For example, there is no requirement that $g$ be continuous or increasing nor that $g \circ \mathrm{Bel}_0$ is a probability distribution (although Paris and Aczél both prove that, under their assumptions, $g$ can be taken to satisfy all these requirements). This serves to make the counterexample quite strong.
The proof of Theorem 3.1 is constructive. Consider a domain $W$ with 12 points: $w_1, \ldots, w_{12}$. We associate with each point $w \in W$ a weight $f(w)$, as follows.

$f(w_1) = 3$    $f(w_4) = 5 \times 10^4$    $f(w_7) = 3 \times 10^8$    $f(w_{10}) = 3 \times 10^{18}$
$f(w_2) = 2$    $f(w_5) = 6 \times 10^4$    $f(w_8) = 8 \times 10^8$    $f(w_{11}) = 2 \times 10^{18}$
$f(w_3) = 6$    $f(w_6) = 8 \times 10^4$    $f(w_9) = 8 \times 10^8$    $f(w_{12}) = 14 \times 10^{18}$

For a subset $U$ of $W$, we define $f(U) = \sum_{w \in U} f(w)$. Thus, we can define a probability distribution $\Pr$ on $W$ by taking $\Pr(U) = f(U)/f(W)$. Let $f'$ be identical to $f$, except that $f'(w_{10}) = (3 - \delta) \times 10^{18}$ and $f'(w_{11}) = (2 + \delta) \times 10^{18}$, where $\delta$ is defined below. Again, we extend $f'$ to subsets of $W$ by defining $f'(U) = \sum_{w \in U} f'(w)$. Let $W_0 = \{w_{10}, w_{11}, w_{12}\}$. If $U \neq \emptyset$, define

$\mathrm{Bel}_0(V | U) = \begin{cases} f'(V \cap U)/f(U) & \text{if } W_0 \subseteq U \\ f(V \cap U)/f(U) & \text{otherwise.} \end{cases}$

$\mathrm{Bel}_0$ is clearly very close to $\Pr$. If $U \neq \emptyset$, then it is easy to see that $|\mathrm{Bel}_0(V | U) - \Pr(V | U)| = |f'(V \cap U) - f(V \cap U)|/f(U) \le \delta$. We choose $\delta > 0$ so that

if $\Pr(V | U) > \Pr(V' | U')$, then $\mathrm{Bel}_0(V | U) > \mathrm{Bel}_0(V' | U')$.   (8)

Since the range of $\Pr$ is finite, all sufficiently small $\delta$ satisfy (8).

The exact choice of weights above is not particularly important. One thing that is important though is the following collection of equalities:

$\Pr(w_1 | \{w_1, w_2\}) = \Pr(w_{10} | \{w_{10}, w_{11}\}) = 3/5$
$\Pr(\{w_1, w_2\} | \{w_1, w_2, w_3\}) = \Pr(w_4 | \{w_4, w_5\}) = 5/11$
$\Pr(\{w_4, w_5\} | \{w_4, w_5, w_6\}) = \Pr(\{w_7, w_8\} | \{w_7, w_8, w_9\}) = 11/19$
$\Pr(w_4 | \{w_4, w_5, w_6\}) = \Pr(\{w_{10}, w_{11}\} | \{w_{10}, w_{11}, w_{12}\}) = 5/19$
$\Pr(w_1 | \{w_1, w_2, w_3\}) = \Pr(w_7 | \{w_7, w_8\}) = 3/11$.   (9)

It is easy to check that exactly the same equalities hold if we replace $\Pr$ by $\mathrm{Bel}_0$.
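The equalities in (9) can be checked mechanically with exact rational arithmetic. The sketch below is one way to do so; the names `WEIGHTS`, `f_prime`, `pr`, and `bel0` are chosen here for illustration, and the value of `DELTA` is an arbitrary placeholder. The equalities in (9) do not depend on the value of $\delta$, and the sketch makes no attempt to check that this placeholder is small enough for (8).

```python
from fractions import Fraction

# Weights f(w_1), ..., f(w_12) as listed above, keyed by the index of the point.
WEIGHTS = {
    1: 3, 2: 2, 3: 6,
    4: 5 * 10**4, 5: 6 * 10**4, 6: 8 * 10**4,
    7: 3 * 10**8, 8: 8 * 10**8, 9: 8 * 10**8,
    10: 3 * 10**18, 11: 2 * 10**18, 12: 14 * 10**18,
}
W0 = {10, 11, 12}
# Placeholder perturbation (an assumption): not checked against condition (8).
DELTA = Fraction(1, 10**6)


def f(subset):
    return sum((Fraction(WEIGHTS[w]) for w in subset), Fraction(0))


def f_prime(subset):
    """Same as f, except that delta * 10^18 is moved from w_10 to w_11."""
    total = f(subset)
    if 10 in subset:
        total -= DELTA * 10**18
    if 11 in subset:
        total += DELTA * 10**18
    return total


def pr(v, u):
    return f(set(v) & set(u)) / f(u)


def bel0(v, u):
    v, u = set(v), set(u)
    numerator = f_prime(v & u) if W0 <= u else f(v & u)
    return numerator / f(u)


# Each row: (V, U), (V', U'), and the common value claimed in (9).
EQUALITIES = [
    (({1}, {1, 2}), ({10}, {10, 11}), Fraction(3, 5)),
    (({1, 2}, {1, 2, 3}), ({4}, {4, 5}), Fraction(5, 11)),
    (({4, 5}, {4, 5, 6}), ({7, 8}, {7, 8, 9}), Fraction(11, 19)),
    (({4}, {4, 5, 6}), ({10, 11}, {10, 11, 12}), Fraction(5, 19)),
    (({1}, {1, 2, 3}), ({7}, {7, 8}), Fraction(3, 11)),
]

for measure in (pr, bel0):
    for (v1, u1), (v2, u2), value in EQUALITIES:
        assert measure(v1, u1) == measure(v2, u2) == value
```

The two functions differ only when the conditioning set contains all of $W_0$. For instance, `bel0({10}, W0)` evaluates to $(3 - \delta)/19$ while `pr({10}, W0)` evaluates to $3/19$.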
We show that $\mathrm{Bel}_0$ satisfies the requirements of Theorem 3.1 by a sequence of lemmas. The first lemma is the key to showing that $\mathrm{Bel}_0$ cannot be isomorphic to a probability function. It uses the fact (proved in Lemma 3.3) that if $\mathrm{Bel}_0$ were isomorphic to a probability function, then there would have to be a function $F$ satisfying A2 that is associative. Although, as is shown in Lemma 3.7, the function $F$ satisfying A2 can be taken to be infinitely differentiable and increasing in each argument, the equalities in (9) suffice to guarantee that it cannot be taken to be associative, that is, we do not in general have

$F(x, F(y, z)) = F(F(x, y), z)$.

Indeed, there is no associative function $F$ satisfying A2, even if we drop the requirements that $F$ be differentiable or increasing.

Lemma 3.2: For $\mathrm{Bel}_0$ as defined above, there is no associative function $F$ satisfying A2.

Proof: Suppose there were such a function $F$. From (9), we must have that

$F(5/11, 11/19) = F(\mathrm{Bel}_0(w_4 | \{w_4, w_5\}), \mathrm{Bel}_0(\{w_4, w_5\} | \{w_4, w_5, w_6\})) = \mathrm{Bel}_0(w_4 | \{w_4, w_5, w_6\}) = 5/19$

and that

$F(3/5, 5/11) = F(\mathrm{Bel}_0(w_1 | \{w_1, w_2\}), \mathrm{Bel}_0(\{w_1, w_2\} | \{w_1, w_2, w_3\})) = \mathrm{Bel}_0(w_1 | \{w_1, w_2, w_3\}) = 3/11$.

It follows that

$F(3/5, F(5/11, 11/19)) = F(3/5, 5/19)$

and that

$F(F(3/5, 5/11), 11/19) = F(3/11, 11/19)$.

Thus, if $F$ were associative, we would have

$F(3/5, 5/19) = F(3/11, 11/19)$.

On the other hand, from (9) again, we see that

$F(3/5, 5/19) = F(\mathrm{Bel}_0(w_{10} | \{w_{10}, w_{11}\}), \mathrm{Bel}_0(\{w_{10}, w_{11}\} | \{w_{10}, w_{11}, w_{12}\})) = \mathrm{Bel}_0(w_{10} | \{w_{10}, w_{11}, w_{12}\}) = (3 - \delta)/19$,

while

$F(3/11, 11/19) = F(\mathrm{Bel}_0(w_7 | \{w_7, w_8\}), \mathrm{Bel}_0(\{w_7, w_8\} | \{w_7, w_8, w_9\})) = \mathrm{Bel}_0(w_7 | \{w_7, w_8, w_9\}) = 3/19$.

It follows that $F$ cannot be associative. $\Box$
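The argument of Lemma 3.2 can be replayed numerically. The sketch below records, for four chains of sets, the value that A2 forces on $F$, and then evaluates both sides of the associativity equation at $(3/5, 5/11, 11/19)$. The dictionary `F`, the helper `forced_value`, and the placeholder `DELTA` are assumptions made for this illustration. Any positive $\delta$ exhibits the mismatch, and the sketch does not check that the placeholder satisfies (8).

```python
from fractions import Fraction

# Weights f(w_1), ..., f(w_12) from the construction, keyed by point index.
WEIGHTS = {
    1: 3, 2: 2, 3: 6,
    4: 5 * 10**4, 5: 6 * 10**4, 6: 8 * 10**4,
    7: 3 * 10**8, 8: 8 * 10**8, 9: 8 * 10**8,
    10: 3 * 10**18, 11: 2 * 10**18, 12: 14 * 10**18,
}
W0 = {10, 11, 12}
DELTA = Fraction(1, 10**6)  # placeholder value (an assumption)


def f(subset):
    return sum((Fraction(WEIGHTS[w]) for w in subset), Fraction(0))


def bel0(v, u):
    """Bel_0(V|U): uses the perturbed weights f' only when W0 is a subset of U."""
    v, u = set(v), set(u)
    numerator = f(v & u)
    if W0 <= u:
        if 10 in v:
            numerator -= DELTA * 10**18
        if 11 in v:
            numerator += DELTA * 10**18
    return numerator / f(u)


def forced_value(u1, u2, u3):
    """For nested sets U1 >= U2 >= U3, A2 forces
    F(Bel_0(U3|U2), Bel_0(U2|U1)) = Bel_0(U3|U1)."""
    return (bel0(u3, u2), bel0(u2, u1)), bel0(u3, u1)


CHAINS = [
    ({4, 5, 6}, {4, 5}, {4}),        # forces F(5/11, 11/19) = 5/19
    ({1, 2, 3}, {1, 2}, {1}),        # forces F(3/5, 5/11) = 3/11
    ({10, 11, 12}, {10, 11}, {10}),  # forces F(3/5, 5/19) = (3 - delta)/19
    ({7, 8, 9}, {7, 8}, {7}),        # forces F(3/11, 11/19) = 3/19
]
F = dict(forced_value(*chain) for chain in CHAINS)

x, y, z = Fraction(3, 5), Fraction(5, 11), Fraction(11, 19)
left = F[(x, F[(y, z)])]   # F(x, F(y, z)) = F(3/5, 5/19)
right = F[(F[(x, y)], z)]  # F(F(x, y), z) = F(3/11, 11/19)
print(left, right, left == right)
```

The two sides differ by exactly $\delta/19$, so no function agreeing with these four forced values can be associative.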
To understand how Lemma 3.2 relates to our discussion in Section 2 of the problems with Reichenbach's proof, we say $(x, y, z)$ is a constrained triple if there exist sets $U_1 \supseteq U_2 \supseteq U_3 \supseteq U_4$ with $U_3 \neq \emptyset$ such that $x = \mathrm{Bel}_0(U_4 | U_3)$, $y = \mathrm{Bel}_0(U_3 | U_2)$, and $z = \mathrm{Bel}_0(U_2 | U_1)$. It is easy to see that A2 forces $F$ to be associative on constrained triples, since if $w = \mathrm{Bel}_0(U_3 | U_1)$ and $w' = \mathrm{Bel}_0(U_4 | U_2)$, by A2, we have $F(x, F(y, z)) = F(x, w) = \mathrm{Bel}_0(U_4 | U_1)$ and $F(F(x, y), z) = F(w', z) = \mathrm{Bel}_0(U_4 | U_1)$. A4 says that the set of constrained triples is dense in $[0, 1]^3$.

We similarly define $(x, y)$ to be a constrained pair if there exist sets $U_1 \supseteq U_2 \supseteq U_3$ with $U_2 \neq \emptyset$ such that $x = \mathrm{Bel}_0(U_3 | U_2)$ and $y = \mathrm{Bel}_0(U_2 | U_1)$. We say that $(U_1, U_2, U_3)$ corresponds to the constrained pair $(x, y)$. (Note that there may be more than one triple of sets corresponding to a constrained pair.) If $(U_1, U_2, U_3)$ corresponds to the constrained pair $(x, y)$ and $F$ satisfies A2, then we must have $F(x, y) = \mathrm{Bel}_0(U_3 | U_1)$. Note that both $(3/5, 5/11)$ and $(5/11, 11/19)$ are constrained pairs, although the triple $(3/5, 5/11, 11/19)$ is not constrained. It is this fact that we use in Lemma 3.2.

The next lemma shows that $\mathrm{Bel}_0$ cannot be isomorphic to a probability function.

Lemma 3.3: For $\mathrm{Bel}_0$ as defined above, there is no one-to-one onto function $g : [0, 1] \to [0, 1]$ satisfying (1).

Proof: Suppose there were such a function $g$. First note that $g(\mathrm{Bel}_0(U)) \neq 0$ if $U \neq \emptyset$. For if $g(\mathrm{Bel}_0(U)) = 0$, then it follows from (1) that for all $V \subseteq U$, we have

$g(\mathrm{Bel}_0(V)) = g(\mathrm{Bel}_0(V | U)) \times g(\mathrm{Bel}_0(U)) = g(\mathrm{Bel}_0(V | U)) \times 0 = 0$.

Thus, $g(\mathrm{Bel}_0(V)) = g(\mathrm{Bel}_0(U))$ for all subsets $V$ of $U$.
Since the definition of $\mathrm{Bel}_0$ guarantees that $\mathrm{Bel}_0(V) \neq \mathrm{Bel}_0(U)$ if $V$ is a strict subset of $U$, this contradicts the assumption that $g$ is one-to-one. Thus, $g(\mathrm{Bel}_0(U)) \neq 0$ if $U \neq \emptyset$. It now follows from (1) that if $U \neq \emptyset$, then

$g(\mathrm{Bel}_0(V | U)) = g(\mathrm{Bel}_0(V \cap U)) / g(\mathrm{Bel}_0(U))$.   (10)

Now define $F(x, y) = g^{-1}(g(x) \times g(y))$. We show that $F$ defined in this way satisfies A2 and is associative. This will give us a contradiction to Lemma 3.2. To see that $F$ satisfies A2, notice that, by applying the observation above repeatedly, if $V \cap U \neq \emptyset$, we get

$F(\mathrm{Bel}_0(V' | V \cap U), \mathrm{Bel}_0(V | U))$
$= g^{-1}(g(\mathrm{Bel}_0(V' | V \cap U)) \times g(\mathrm{Bel}_0(V | U)))$
$= g^{-1}((g(\mathrm{Bel}_0(V' \cap V \cap U)) / g(\mathrm{Bel}_0(V \cap U))) \times (g(\mathrm{Bel}_0(V \cap U)) / g(\mathrm{Bel}_0(U))))$
$= g^{-1}(g(\mathrm{Bel}_0(V' \cap V \cap U)) / g(\mathrm{Bel}_0(U)))$
$= g^{-1}(g(\mathrm{Bel}_0(V' \cap V | U)))$
$= \mathrm{Bel}_0(V' \cap V | U)$.

Thus, $F$ satisfies A2. To see that $F$ is associative, note that

$F(F(x, y), z) = g^{-1}(g(g^{-1}(g(x) \times g(y))) \times g(z))$
$= g^{-1}(g(x) \times g(y) \times g(z))$
$= g^{-1}(g(x) \times g(g^{-1}(g(y) \times g(z))))$
$= F(x, F(y, z))$.

This gives us the desired contradiction to Lemma 3.2. It follows that $\mathrm{Bel}_0$ cannot be isomorphic to a probability function. $\Box$

Despite the fact that $\mathrm{Bel}_0$ is not isomorphic to a probability function, functions $S$, $F$, and $G$ can be defined that satisfy A1, A2, and A3, respectively, and all the other requirements stated in Theorem 3.1. The argument for $S$ and $G$ is easy; all the work goes into proving that an appropriate $F$ exists.

Lemma 3.4: There exists an infinitely differentiable, strictly decreasing function $S : [0, 1] \to [0, 1]$ such that $\mathrm{Bel}_0(\overline{V} | U) = S(\mathrm{Bel}_0(V | U))$ for all sets $U, V \subseteq W$ with $U \neq \emptyset$. In fact, we can take $S(x) = 1 - x$.

Proof: This is immediate from the observation that $\mathrm{Bel}_0(\overline{V} | U) = 1 - \mathrm{Bel}_0(V | U)$ for $U, V \subseteq W$.
$\Box$

Lemma 3.5: There exists an infinitely differentiable function $G : [0, 1]^2 \to [0, 1]$, increasing in each argument, such that if $U, V, V' \subseteq W$, $V \cap V' = \emptyset$, and $U \neq \emptyset$, then $\mathrm{Bel}_0(V \cup V' | U) = G(\mathrm{Bel}_0(V | U), \mathrm{Bel}_0(V' | U))$. In fact, we can take $G(x, y) = x + y$.

Proof: This is immediate from the definition of $\mathrm{Bel}_0$. $\Box$

Thus, all that remains is to show that an appropriate $F$ exists. The key step is provided by the following lemma, which essentially shows that there is a well defined $F$ that is increasing.

Lemma 3.6: If $U_2 \cap U_1 \neq \emptyset$ and $V_2 \cap V_1 \neq \emptyset$, then

(a) if $\mathrm{Bel}_0(V_3 | V_2 \cap V_1) \le \mathrm{Bel}_0(U_3 | U_2 \cap U_1)$ and $\mathrm{Bel}_0(V_2 | V_1) \le \mathrm{Bel}_0(U_2 | U_1)$, then $\mathrm{Bel}_0(V_3 \cap V_2 | V_1) \le \mathrm{Bel}_0(U_3 \cap U_2 | U_1)$,

(b) if $\mathrm{Bel}_0(V_3 | V_2 \cap V_1) < \mathrm{Bel}_0(U_3 | U_2 \cap U_1)$, $\mathrm{Bel}_0(V_2 | V_1) \le \mathrm{Bel}_0(U_2 | U_1)$, $\mathrm{Bel}_0(U_3 | U_2 \cap U_1) > 0$, and $\mathrm{Bel}_0(U_2 | U_1) > 0$, then $\mathrm{Bel}_0(V_3 \cap V_2 | V_1) < \mathrm{Bel}_0(U_3 \cap U_2 | U_1)$,

(c) if $\mathrm{Bel}_0(V_3 | V_2 \cap V_1) \le \mathrm{Bel}_0(U_3 | U_2 \cap U_1)$, $\mathrm{Bel}_0(V_2 | V_1) < \mathrm{Bel}_0(U_2 | U_1)$, $\mathrm{Bel}_0(U_3 | U_2 \cap U_1) > 0$, and $\mathrm{Bel}_0(U_2 | U_1) > 0$, then $\mathrm{Bel}_0(V_3 \cap V_2 | V_1) < \mathrm{Bel}_0(U_3 \cap U_2 | U_1)$.

Proof: First observe that if $\mathrm{Bel}_0(V_3 | V_2 \cap V_1) \le \mathrm{Bel}_0(U_3 | U_2 \cap U_1)$ and $\mathrm{Bel}_0(V_2 | V_1) \le \mathrm{Bel}_0(U_2 | U_1)$, then from (8), it follows that $\Pr(V_3 | V_2 \cap V_1) \le \Pr(U_3 | U_2 \cap U_1)$ and $\Pr(V_2 | V_1) \le \Pr(U_2 | U_1)$. If we have either $\Pr(V_3 | V_2 \cap V_1) < \Pr(U_3 | U_2 \cap U_1)$ or $\Pr(V_2 | V_1) < \Pr(U_2 | U_1)$, then we have either $\Pr(V_3 \cap V_2 | V_1) < \Pr(U_3 \cap U_2 | U_1)$ or $\Pr(U_3 | U_2 \cap U_1) = 0$ or $\Pr(U_2 | U_1) = 0$. It follows that either $\mathrm{Bel}_0(V_3 \cap V_2 | V_1) < \mathrm{Bel}_0(U_3 \cap U_2 | U_1)$ (this uses (8) again) or that $\mathrm{Bel}_0(V_3 \cap V_2 | V_1) = \mathrm{Bel}_0(U_3 \cap U_2 | U_1) = 0$. In either case, the lemma holds.
Thus, it remains to deal with the case that $\Pr(V_3 | V_2 \cap V_1) = \Pr(U_3 | U_2 \cap U_1)$ and $\Pr(V_2 | V_1) = \Pr(U_2 | U_1)$, and hence $\Pr(V_3 \cap V_2 | V_1) = \Pr(U_3 \cap U_2 | U_1)$. The details of this analysis are left to the appendix. $\Box$

Lemma 3.7: There exists a function $F : [0, 1]^2 \to [0, 1]$ satisfying all the assumptions of Theorem 3.1 (with respect to $\mathrm{Bel}_0$).

Proof: Define a partial function $F'$ on $[0, 1]^2$ whose domain $D$ consists of all constrained pairs. For a constrained pair, we define $F'$ in the unique way required to satisfy A2. A priori, $F'$ may not be well defined; it is possible that there exist triples $(U_1, U_2, U_3)$ and $(V_1, V_2, V_3)$ that both correspond to $(x, y)$ (i.e., $x = \mathrm{Bel}_0(U_3 | U_2) = \mathrm{Bel}_0(V_3 | V_2)$ and $y = \mathrm{Bel}_0(U_2 | U_1) = \mathrm{Bel}_0(V_2 | V_1)$) such that $\mathrm{Bel}_0(U_3 | U_1) \neq \mathrm{Bel}_0(V_3 | V_1)$. If this were the case, then $F'(x, y)$ would not be well defined. However, Lemma 3.6 says that this cannot happen. Moreover, Lemma 3.6 assures us that $F'$ is increasing on $D$, and strictly increasing as long as one of its arguments is not 0. Indeed, if there is a triple $(U_1, U_2, U_3)$ corresponding to $(x, y)$ such that $\{w_{10}, w_{11}, w_{12}\} \not\subseteq U_1$, then we must have $F'(x, y) = xy$.

The domain $D$ of $F'$ is finite. Let $D'$ be the commutative closure of $D$, so that $D'$ consists of $D$ and all pairs $(y, x)$ such that $(x, y)$ is in $D$. Extend $F'$ to a commutative function $F''$ on $D'$ by defining $F''(y, x) = F'(x, y)$ if $(x, y) \in D$. $F''$ is well defined because, as can easily be verified, if $(x, y)$ and $(y, x)$ are both in $D$, one of $x$ or $y$ must be 1, and $F'(x, 1) = F'(1, x) = x$. Clearly $F''$ is commutative. It is also increasing. For suppose $(x, y), (x', y') \in D'$, $x \le x'$, and $y \le y'$.
If both $(x, y)$ and $(x', y')$ are in $D$, we must have $F''(x, y) \le F''(x', y')$, since $F'$ is increasing. Similarly, if both $(y, x)$ and $(y', x')$ are in $D$, we must have $F''(x, y) = F'(y, x) \le F'(y', x') = F''(x', y')$. Finally, if $(x, y)$ and $(y', x')$ are in $D$, a straightforward check over all possible elements in $D$ shows that this can happen only if the triples $(U_1, U_2, U_3)$ and $(V_1, V_2, V_3)$ corresponding to $(x, y)$ and $(y', x')$ are such that $\{w_{10}, w_{11}, w_{12}\}$ is not a subset of either $U_1$ or $V_1$. It follows that $F'(x, y) = xy$ and $F'(y', x') = x'y'$, so again we get that $F''$ is increasing. A similar argument shows that $F''$ is strictly increasing as long as one of its arguments is not 0.

It is straightforward to extend $F''$ to a commutative, infinitely differentiable, and increasing function $F$ defined on all of $[0, 1]^2$, which is strictly increasing on $(0, 1]^2$, and satisfies $F(x, 1) = F(1, x) = x$ and $F(x, 0) = F(0, x) = 0$. We proceed as follows. We first extend $F''$ so that it is defined for all pairs $(x, y) \in [0, 1]^2$ such that $x \ge y$ and has the required properties. If $x < y$, we then define $F$ so that $F(x, y) = F(y, x)$. Since $F''$ is commutative, this definition agrees with $F''(x, y)$ for $x < y$. Clearly $F$ is commutative and infinitely differentiable. To see that $F$ is increasing, suppose that $x \le x'$ and $y \le y'$. Just as in the case of $F''$, it is immediate that $F$ is increasing if both $x \ge y$ and $x' \ge y'$ or both $x < y$ and $x' < y'$. Otherwise, suppose $x \ge y$ and $y' \ge x'$. Then we have $y \le x \le x' \le y'$. Since $F$ is increasing on $\{(x, y) : x \ge y\}$, we have $F(x, y) \le F(x', y) \le F(x', x') \le F(y', x') = F(x', y')$. A similar argument shows that $F$ is strictly increasing unless one of its arguments is 0.
Finally, $F$ clearly satisfies A2, since (by construction) $F'$ does, and A2 puts constraints only on the domain of $F'$. $\square$

Theorem 3.1 now follows from Lemmas 3.3, 3.4, 3.5, and 3.7.

4. The Counterexample to Fine's Theorem

Fine is interested in what he calls comparative conditional probability. Thus, rather than associating a real number with each "conditional object" $V \mid U$, he puts an ordering $\succeq$ on such objects. As usual, $V \mid U \succ V' \mid U'$ is taken to be an abbreviation for $V \mid U \succeq V' \mid U'$ and not($V' \mid U' \succeq V \mid U$). Fine is interested in when such an ordering is induced by a real-valued belief function with reasonable properties. He says that a real-valued function $P$ on such objects agrees with $\succeq$ if $P(V \mid U) \ge P(V' \mid U')$ iff $V \mid U \succeq V' \mid U'$. Fine then considers a number of axioms that $\succeq$ might satisfy. For our purposes, the most relevant are the ones Fine denotes QCC1, QCC2, QCC5, and QCC7.

QCC1 just says that $\succeq$ is a linear order:

QCC1. $V \mid U \succeq V' \mid U'$ or $V' \mid U' \succeq V \mid U$.

QCC2 says that $\succeq$ is transitive:

QCC2. If $V_1 \mid U_1 \succeq V_2 \mid U_2$ and $V_2 \mid U_2 \succeq V_3 \mid U_3$, then $V_1 \mid U_1 \succeq V_3 \mid U_3$.

QCC5 is a technical condition involving notions of order topology. The relevant definitions are omitted here (see (Fine, 1973) for details), since QCC5, as Fine observes, holds vacuously in finite domains (the only ones of interest here).

QCC5. The set $\{V \mid U\}$ has a countable basis in the order topology induced by $\succeq$.

Finally, QCC7 essentially says that $\succeq$ is increasing, in the sense of Lemma 3.6.

QCC7.
(a) If $V_3 \mid V_2 \cap V_1 \succeq U_3 \mid U_2 \cap U_1$ and $V_2 \mid V_1 \succeq U_2 \mid U_1$, then $V_3 \cap V_2 \mid V_1 \succeq U_3 \cap U_2 \mid U_1$.
(b) If $V_3 \mid V_2 \cap V_1 \succeq U_2 \mid U_1$ and $V_2 \mid V_1 \succeq U_3 \mid U_2 \cap U_1$, then $V_3 \cap V_2 \mid V_1 \succeq U_3 \cap U_2 \mid U_1$.
(c) If $V_3 \mid V_2 \cap V_1 \succ U_3 \mid U_2 \cap U_1$, $V_2 \mid V_1 \succeq U_2 \mid U_1$, and $V_2 \mid V_1 \succ \emptyset \mid W$, then $V_3 \cap V_2 \mid V_1 \succ U_3 \cap U_2 \mid U_1$.
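To get a feel for QCC7, it can be checked by brute force for an ordering induced by an ordinary probability function on a very small domain. The sketch below assumes a hypothetical three-element domain with hand-picked weights. It also assumes a particular reading of part (c): a strict first hypothesis, together with $V_2 \mid V_1$ lying strictly above $\emptyset \mid W$, gives a strict conclusion. All function names are introduced for this example only.

```python
from fractions import Fraction
from itertools import combinations, product

W = ("a", "b", "c")
weight = {"a": 1, "b": 2, "c": 4}  # hypothetical weights, not from the paper


def subsets(xs):
    """All subsets of xs as frozensets."""
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]


def pr(V, U):
    """Pr(V | U) for nonempty U, as an exact fraction."""
    return Fraction(sum(weight[w] for w in V & U), sum(weight[w] for w in U))


def triples():
    """For each (T1, T2, T3) with T2 & T1 nonempty, the three values that
    QCC7 compares: Pr(T3 | T2 & T1), Pr(T2 | T1), Pr(T3 & T2 | T1)."""
    out = []
    for T1, T2, T3 in product(subsets(W), repeat=3):
        if T1 & T2:
            out.append((pr(T3, T2 & T1), pr(T2, T1), pr(T3 & T2, T1)))
    return out


def qcc7_holds():
    """Check parts (a), (b), (c) for the ordering induced by pr."""
    ts = triples()
    for (v3, v2, v32), (u3, u2, u32) in product(ts, repeat=2):
        if v3 >= u3 and v2 >= u2 and not v32 >= u32:  # part (a)
            return False
        if v3 >= u2 and v2 >= u3 and not v32 >= u32:  # part (b)
            return False
        # part (c), under the reading stated in the lead-in (an assumption)
        if v3 > u3 and v2 >= u2 and v2 > 0 and not v32 > u32:
            return False
    return True
```

For a probability function the check succeeds because the third value is always the product of the first two; what matters for the counterexample is that $\mathrm{Bel}_0$ also passes such a check without being a product everywhere.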
Fine then claims the following theorem:

Fine's Theorem: (Fine, 1973, Chapter II, Theorem 8) If $\succeq$ satisfies QCC1, QCC2, QCC5, then there exists some agreeing function $P$. There exists a function $F$ of two variables such that

1. $P(V \cap V' \mid U) = F(P(V' \mid V \cap U), P(V \mid U))$,$^6$
2. $F(x, y) = F(y, x)$,
3. $F(x, y)$ is increasing in $x$ for $y > P(\emptyset \mid W)$,
4. $F(x, F(y, z)) = F(F(x, y), z)$,
5. $F(P(W \mid U), y) = y$,
6. $F(P(\emptyset \mid U), y) = P(\emptyset \mid U)$,

iff $\succeq$ also satisfies QCC7.

6. Fine assumes that $P(V \cap V' \mid U) = F(P(V \mid U), P(V' \mid V \cap U))$. I have reordered the arguments here for consistency with Cox's theorem.

The only relevant clauses for our purposes are Clause (1), which is just A2, and Clause (4), which says that $F$ is associative. As Lemma 3.2 shows, there is no associative function satisfying A2 for $\mathrm{Bel}_0$. As I now show, this means that Fine's theorem does not quite hold either.

Before doing so, let me briefly touch on a subtle issue regarding the domain of $\succeq$. In the counterexample of the previous section, $\mathrm{Bel}_0(V \mid U)$ is defined as long as $U \ne \emptyset$. Fine does not assume that the $\succeq$ relation is necessarily defined on all objects $V \mid U$ such that $U, V \subseteq W$ and $U \ne \emptyset$. He assumes that there is an algebra $\mathcal{F}$ of subsets of $W$ (that is, a set of subsets closed under finite intersections and complementation) and a subset $\mathcal{F}'$ of $\mathcal{F}$ closed under finite intersections and not containing the empty set such that $\succeq$ is defined on conditional objects $V \mid U$ such that $V \in \mathcal{F}$ and $U \in \mathcal{F}'$. Since $\mathcal{F}'$ is closed under intersection and does not contain the empty set, $\mathcal{F}'$ cannot contain disjoint sets. If $W$ is finite, then the only way a collection $\mathcal{F}'$ can meet Fine's restriction is if there is some nonempty set $U_0$ such that all elements in $\mathcal{F}'$ contain $U_0$ (take $U_0$ to be the intersection of all the sets in $\mathcal{F}'$). This restriction is clearly too strong to the extent that comparative conditional probability is intended to generalize probability. If $\Pr$ is a probability function, then it certainly makes sense to compare $\Pr(V \mid U)$ and $\Pr(V' \mid U')$ even if $U$ and $U'$ are disjoint sets. Fine [private communication, 1995] suggested that it might be better to constrain QCC7 so that we do not condition on events $U$ that are equivalent to $\emptyset$ (where $U$ is equivalent to $\emptyset$ if $\emptyset \succeq U$ and $U \succeq \emptyset$). Since the only event equivalent to $\emptyset$ in the counterexample of the previous section is $\emptyset$ itself, this means that the counterexample can be used without change. This is what is done in the proof below. After the proof, I show how to modify the counterexample so that it satisfies Fine's original restrictions.

Theorem 4.1: There exists an ordering $\succeq$ satisfying QCC1, QCC2, QCC5, and QCC7, such that for every function $P$ agreeing with $\succeq$, there is no associative function $F$ of two variables such that $P(V \cap V' \mid U) = F(P(V' \mid V \cap U), P(V \mid U))$.

Proof: Let $W$ and $\mathrm{Bel}_0$ be as in the counterexample in the previous section. Define $\succeq$ so that $\mathrm{Bel}_0$ agrees with $\succeq$. Thus, $V \mid U \succeq V' \mid U'$ iff $\mathrm{Bel}_0(V \mid U) \ge \mathrm{Bel}_0(V' \mid U')$. Clearly $\succeq$ satisfies QCC1 and QCC2. As was mentioned earlier, since $W$ is finite, $\succeq$ vacuously satisfies QCC5. Lemma 3.6 shows that $\succeq$ satisfies parts (a) and (c) of QCC7. To show that $\succeq$ also satisfies part (b) of QCC7, we must prove that if $\mathrm{Bel}_0(V_3 \mid V_2 \cap V_1) \ge \mathrm{Bel}_0(U_2 \mid U_1)$ and $\mathrm{Bel}_0(V_2 \mid V_1) \ge \mathrm{Bel}_0(U_3 \mid U_2 \cap U_1)$, then $\mathrm{Bel}_0(V_3 \cap V_2 \mid V_1) \ge \mathrm{Bel}_0(U_3 \cap U_2 \mid U_1)$. The proof of this is almost identical to that of Lemma 3.6; we simply exchange the roles of $\Pr(V_2 \mid V_1)$ and $\Pr(V_3 \mid V_2 \cap V_1)$ in that proof. I leave the details to the reader.

Lemma 3.2 shows that there is no associative function $F$ satisfying A2 for $\mathrm{Bel}_0$. All that was used in the proof was the fact that $\mathrm{Bel}_0$ satisfied the equalities of (9).
But these equalities must hold for any function agreeing with $\succeq$. Thus, exactly the same proof shows that if $P$ is any function agreeing with $\succeq$, then there is no associative function $F$ satisfying $P(V \cap V' \mid U) = F(P(V' \mid V \cap U), P(V \mid U))$. $\square$

I conclude this section by briefly sketching how the counterexample can be modified so that it satisfies Fine's original restriction. Redefine $W$ by adding one more element $w_0$. Redefine $f$ and $f'$ so that $f(w_0) = f'(w_0) = 10^{-5}$; in addition, redefine $f$ and $f'$ on $w_3$, $w_6$, $w_9$, and $w_{12}$, so as to decrease their weight by $10^{-5}$, the weight of $w_0$. Thus,

• $f(w_3) = f'(w_3) = 6 - 10^{-5}$,
• $f(w_6) = f'(w_6) = 8 \times 10^{4} - 10^{-5}$,
• $f(w_9) = f'(w_9) = 8 \times 10^{8} - 10^{-5}$, and
• $f(w_{12}) = f'(w_{12}) = 14 \times 10^{18} - 10^{-5}$.

Finally, redefine $W'$ to be $\{w_0, w_{10}, w_{11}, w_{12}\}$. The definition of $\mathrm{Bel}_0$ in terms of $f$, $f'$, and $W'$ remains the same. With these redefinitions, the proofs of the previous section go through essentially unchanged. In particular, the equalities in (9) now hold if we add $w_0$ to every set. Let $\mathcal{F}'$ consist of all subsets of $W$ containing $w_0$. Notice that $\mathcal{F}'$ is closed under intersection and does not contain the empty set. The lack of associativity in Lemma 3.2 can now be demonstrated by conditioning on sets in $\mathcal{F}'$. As a consequence, we get a counterexample to Fine's theorem even when restricting to conditional objects that satisfy his restriction.

5. Discussion

Let me summarize the status of various results in the light of the counterexample of this paper:

• Cox's theorem as originally stated does not hold in finite domains. Moreover, even in infinite domains, the counterexample and the discussion in Section 2 suggest that more assumptions are required for its correctness. In particular, the claim in his proof that $F$ is associative does not follow.
• Although the counterexample given here is not a counterexample to Aczél's theorem, his assumptions do not seem strong enough to guarantee that the function $G$ is associative, as he claims it is.
• The variants of Cox's theorem stated by Heckerman (1988), Horvitz, Heckerman, and Langlotz (1986), and Aleliunas (1988) all succumb to the counterexample.
• The claim that the function $F$ must be associative in Fine's theorem is incorrect. Fine has an analogous result (Fine, 1973, Chapter II, Theorem 4) for unconditional comparative probability involving a function $G$ as in Aczél's theorem. This function too is claimed to be associative, and again, this does not seem to follow (although my counterexample does not apply to that theorem).

Of course, the interesting question now is what it would take to recover Cox's theorem. Paris's assumption A4 suffices, as does the stronger assumption of nonatomicity (see Footnote 4). As we have observed, A4 forces the domain of Bel to be infinite, as does the assumption that the range of Bel is all of $[0,1]$. We can always extend a domain to an infinite (indeed, uncountable) domain by assuming that we have an infinite collection of independent fair coins, and that we can talk about outcomes of coin tosses as well as the original events in the domain. (This type of "extendibility" assumption is fairly standard; for example, it is made by Savage (1954) in quite a different context.) In such an extended domain, it seems reasonable to also assume that Bel varies uniformly between 0 (certain falsehood) and 1 (certain truth). If we also assume A4 (or something like it), we can then recover Cox's theorem. Notice, however, that this viewpoint disallows a notion of belief that takes on only finitely many gradations.

Another possibility is to observe that we are not interested in just one domain in isolation. Rather, what we are interested in is a notion of belief Bel that applies uniformly to all domains. Thus, even if $(U, V)$ and $(U', V')$ are pairs of subsets of different (perhaps even disjoint) domains, if $\mathrm{Bel}(V \mid U)$ and $\mathrm{Bel}(V' \mid U')$ are both $1/2$, then we would expect this to denote the same relative strength of belief. In this setting, an analogue of A4 seems more reasonable. That is, we can assume that for all $0 \le \alpha, \beta, \gamma \le 1$ and $\epsilon > 0$, there is some domain $W$ and subsets $U_1$, $U_2$, $U_3$, and $U_4$ of $W$ such that the conclusion of A4 holds. If we further assume that the functions $F$, $G$, and $S$ are also uniform across domains (that is, that A1, A2, and A3 hold for the same choice of $F$, $G$, and $S$ in every domain), then we can again recover Cox's theorem.$^7$

7. This point was independently observed by Jeff Paris [private communication, 1996].

The idea of having a notion of uncertainty that applies uniformly in all domains seems implicit in some of the discussion in Jaynes' recent book on probability theory (1996). Jaynes focuses almost exclusively on finite domains.$^8$ As he says, "In principle, every problem must start with such finite set probabilities; extensions to infinite sets is permitted only when this is the result of a well-defined and well-behaved limiting process from a finite set." To make sense of this limiting process, it seems that Jaynes must be assuming that the same notion of uncertainty applies in all domains. Moreover, one can make arguments appealing to continuity that when we consider such limiting processes, we can always find subsets $U_1$, $U_2$, $U_3$, and $U_4$ in some sufficiently rich (but finite) extension of the original domain such that A4 holds. While this seems like perhaps the most reasonable additional assumption required to get Cox's result, it does require us to consider many domains at once.
Moreover, it does not allow a notion of belief that has only finitely many gradations, let alone a notion of belief that allows some events to be considered incomparable in likelihood.$^9$

Suppose we really are interested in one particular finite domain, and we do not want to extend it or consider all other possible domains. What assumptions do we then need to get Cox's theorem? The counterexample given here could be circumvented by requiring that $F$ be associative on all tuples (rather than just on the constrained triples). However, if we really are interested in a single domain, the motivation for making requirements on the behavior of $F$ on belief values that do not arise is not so clear. Moreover, it is far from clear that assuming that $F$ is associative suffices to prove the theorem. For example, Cox's proof makes use of various functional equations involving $F$ and $S$, analogous to the equation (7) that appears in Section 2. These functional equations are easily seen to hold for certain tuples. However, as we saw in Section 2, the proof really requires that they hold for all tuples. Just assuming that $F$ is associative does not appear to suffice to guarantee that the functional equations involving $S$ hold for all tuples. Further assumptions appear necessary.

Nir Friedman [private communication] has conjectured that the following condition, which says that essentially all beliefs are distinct, suffices:

• if $\emptyset \subset U \subset V$, $\emptyset \subset U' \subset V'$, and $(U, V) \ne (U', V')$, then $\mathrm{Bel}(U \mid V) \ne \mathrm{Bel}(U' \mid V')$.

Even if this condition suffices, note that it precludes, for example, a uniform probability distribution, and thus again seems unduly restrictive.

Another possibly interesting line of research is that of characterizing the functions that satisfy Cox's assumptions.
As the example given here shows, the class of such functions includes functions that are not isomorphic to any probability function. I conjecture that in fact it includes only functions that are in some sense "close" to a function isomorphic to a probability distribution, although it is not clear exactly how "close" should be defined (nor how interesting this class really is in practice).

So what does all this say regarding the use of probability? Not much. Although I have tried to argue here that Cox's justification of probability is not quite as strong as previously believed, and the assumptions underlying the variants of it need clarification, I am not trying to suggest that probability should be abandoned. There are many other justifications for its use.

8. Actually, Jaynes assigns probability to propositions, not sets, but, as noted earlier, there is essentially no difference between the two.

9. Interestingly, Jaynes (1996, Appendix A) admits that having plausibility values be elements of a partially-ordered lattice may be a reasonable alternative to traditional probability theory. Nir Friedman and I (1995, 1996, 1997) have recently developed such a theory and shown that it provides a useful basis for thinking about default reasoning and belief revision.

Acknowledgments

I'd like to thank Janos Aczél, Peter Cheeseman, Terry Fine, Ron Fagin, Nir Friedman, David Heckerman, Eric Horvitz, Christopher Meek, Jeff Paris, and the anonymous referees for useful comments on the paper. I'd also like to thank Judea Pearl for pointing out Reichenbach's work to me and Janos Aczél for pointing out Falmagne's paper. This work was largely carried out while I was at the IBM Almaden Research Center. IBM's support is gratefully acknowledged.
The work was also supported in part by the NSF, under grants IRI-95-03109 and IRI-96-25901, and the Air Force Office of Scientific Research (AFSC), under grant F94620-96-1-0323. A preliminary version of this paper appears in Proc. National Conference on Artificial Intelligence (AAAI '96), pp. 1313–1319.

Appendix A. Proof of Lemma 3.6

Recall that all that remains in the proof of Lemma 3.6 is to deal with the case that $\Pr(V_3 \mid V_2 \cap V_1) = \Pr(U_3 \mid U_2 \cap U_1)$ and $\Pr(V_2 \mid V_1) = \Pr(U_2 \mid U_1)$, and hence $\Pr(V_3 \cap V_2 \mid V_1) = \Pr(U_3 \cap U_2 \mid U_1)$.

Before proceeding with the proof, it is useful to collect some general facts about $\Pr$. A set $U$ is said to be standard if $U$ is a subset of one of $\{w_1, w_2, w_3\}$, $\{w_4, w_5, w_6\}$, $\{w_7, w_8, w_9\}$, or $\{w_{10}, w_{11}, w_{12}\}$. A real number $a$ is said to be relevant if there exists some standard $U$ and some arbitrary $V$ such that $a = \Pr(V \mid U)$. Notice that even if $U \ne \emptyset$ is nonstandard, taking $U'$ to be the standard subset of $U$ that has the greatest weight, we have $|\Pr(V \mid U) - \Pr(V \mid U')| < .002$. (This is the reason that the weights are multiplied by factors such as $10^4$, $10^8$, and $10^{18}$.) Thus, for any subsets $V$ and $U$ of $W$, we have that $\Pr(V \mid U)$ is close to a relevant number (where "close" means "within .002").

Call a triple $(U, V, V')$ of subsets of $W$ good if $\mathrm{Bel}_0(V' \cap V \mid U) = \mathrm{Bel}_0(V' \mid V \cap U) \times \mathrm{Bel}_0(V \mid U)$. Clearly if both $(U_1, U_2, U_3)$ and $(V_1, V_2, V_3)$ are good, then the lemma holds. Notice that if $(U, V, V')$ is not good, then $U \supseteq \{w_{10}, w_{11}, w_{12}\}$ and $f(V \cap \{w_{10}, w_{11}, w_{12}\}) \ne f'(V \cap \{w_{10}, w_{11}, w_{12}\})$, which means that $V \cap \{w_{10}, w_{11}, w_{12}\}$ must contain one of $w_{10}$ and $w_{11}$, but not both, and thus must be one of $\{w_{10}\}$, $\{w_{11}\}$, $\{w_{10}, w_{12}\}$, or $\{w_{11}, w_{12}\}$.
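The characterization of the sets on which $f$ and $f'$ disagree is easy to confirm by enumeration. The sketch below works in units of $10^{18}$ and assumes the Section 3 weights $f(w_{10}) = 3$, $f(w_{11}) = 2$, $f(w_{12}) = 14$, with $f'$ swapping the weights of $w_{10}$ and $w_{11}$; these values are not restated in the appendix, so they are assumptions of the example, as is the helper name `differing_subsets`.

```python
from itertools import combinations

# Weights on {w10, w11, w12} in units of 10**18 (assumed from Section 3).
f = {"w10": 3, "w11": 2, "w12": 14}
# Assumption: f' swaps the weights of w10 and w11 and agrees with f on w12.
f_prime = {"w10": 2, "w11": 3, "w12": 14}


def weight(g, S):
    """Total weight of the set S under the weight function g."""
    return sum(g[w] for w in S)


def differing_subsets():
    """Map each subset S of {w10, w11, w12} with f(S) != f'(S) to f(S)."""
    names = sorted(f)
    result = {}
    for r in range(len(names) + 1):
        for S in combinations(names, r):
            if weight(f, S) != weight(f_prime, S):
                result[S] = weight(f, S)
    return result


if __name__ == "__main__":
    for S, a in differing_subsets().items():
        print(S, a)
```

Under these assumptions exactly four subsets are returned, and their $f$-weights are the values of $a$ that appear in the case analysis that follows.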
Thus, we may as well assume that at least one of $(U_1, U_2, U_3)$ or $(V_1, V_2, V_3)$ is not good. In that case, I claim that one of the following must hold:

• $\mathrm{Bel}_0(V_3 \cap V_2 \mid V_1) = \mathrm{Bel}_0(V_3 \mid V_2 \cap V_1) = \mathrm{Bel}_0(U_3 \mid U_2 \cap U_1) = \mathrm{Bel}_0(U_3 \cap U_2 \mid U_1) = 0$
• $U_3 \cap U_2 \cap U_1 = U_2 \cap U_1$ and $V_3 \cap V_2 \cap V_1 = V_2 \cap V_1$
• $f(U_1) = f(V_1)$ and $f(U_1 \cap U_2) = f(V_1 \cap V_2)$

In the first case, we have already seen that the lemma holds. In the second case, we have $\mathrm{Bel}_0(V_3 \cap V_2 \mid V_1) = \mathrm{Bel}_0(V_2 \mid V_1)$, $\mathrm{Bel}_0(U_3 \cap U_2 \mid U_1) = \mathrm{Bel}_0(U_2 \mid U_1)$, and $\mathrm{Bel}_0(V_3 \mid V_2 \cap V_1) = \mathrm{Bel}_0(U_3 \mid U_2 \cap U_1) = 1$, so the lemma is easily seen to hold. Finally, in the third case, notice that since $\Pr(U_2 \cap U_3 \mid U_1) = \Pr(V_2 \cap V_3 \mid V_1)$, we must also have that $f(U_1 \cap U_2 \cap U_3) = f(V_1 \cap V_2 \cap V_3)$. Moreover, it is easy to see that all these equalities must hold if $f$ is replaced by $f'$. Again, the lemma immediately follows.

To prove the claim, for definiteness, assume that $(U_1, U_2, U_3)$ is not good (an identical argument works if $(V_1, V_2, V_3)$ is not good). From the characterization above of triples that are not good, it follows that $f(U_1 \cap U_2) = a \times 10^{18} + b$ and $f(U_1) = 19 \times 10^{18} + c$, where $a \in \{2, 3, 16, 17\}$ (depending on $U_2 \cap \{w_{10}, w_{11}, w_{12}\}$), and both $b, c < 20 \times 10^{8}$. Clearly, the relevant number closest to $\Pr(U_2 \mid U_1)$ is $a/19$. Since $\Pr(V_2 \mid V_1) = \Pr(U_2 \mid U_1)$ by assumption, $\Pr(V_2 \mid V_1)$ is also close to $a/19$. Thus, we must have that $f(V_1 \cap V_2) = a \times 10^{k} + b'$ and $f(V_1) = 19 \times 10^{k} + c'$, where $k \in \{0, 4, 8, 18\}$. In fact, it is easy to see that $k$ is either 8 or 18, since there are no relevant numbers of the form $a/19$ (for $a \in \{2, 3, 16, 17\}$) that are close to $\Pr(V \mid U)$ if $U \subseteq \{w_1, w_2, w_3, w_4, w_5, w_6\}$.
In addition, if $k = 18$, then $b', c' < 20 \times 10^{8}$, while if $k = 8$, then $b', c' < 20 \times 10^{4}$. By standard arithmetic manipulation (cross-multiplying the equation $\Pr(U_2 \mid U_1) = \Pr(V_2 \mid V_1)$ and cancelling the common term $19a \times 10^{18+k}$), we have that

$10^{18}(ac' - 19b') + 10^{k}(19b - ac) + (bc' - b'c) = 0.$

If $k = 8$, then it is easy to see that we must have

$ac' - 19b' = 0$, $19b - ac = 0$, and $bc' - b'c = 0$, (11)

while if $k = 18$, then we must have

$19(b - b') + a(c' - c) = 0$ and $bc' - b'c = 0$. (12)

Now comes a case analysis. First suppose that $k = 8$. Then we must have $b' = c' = 0$, since if $c' \ne 0$, then from (11) we have that $b'/c' = a/19$, and it is easy to see that there do not exist sets $T_1$ and $T_2$ such that $f(T_1) = b'$, $f(T_2) = c'$, and $b'/c' = a/19$, with $b', c' < 20 \times 10^{4}$. Thus, it follows that $\Pr(U_2 \mid U_1) = \Pr(V_2 \mid V_1) = a/19$. Moreover, we must have $V_1 = \{w_7, w_8, w_9\}$ and $V_2 \cap V_1$ either $\{w_7\}$ or $\{w_8, w_9\}$, depending on $a$. It follows that $\Pr(V_3 \mid V_2 \cap V_1)$ must be one of $\{0, 1/2, 1\}$. Since $\Pr(U_3 \mid U_2 \cap U_1) = \Pr(V_3 \mid V_2 \cap V_1)$, we must have that $\Pr(U_3 \mid U_2 \cap U_1) \in \{0, 1/2, 1\}$. Since $U_2 \cap U_1$ contains exactly one of $w_{10}$ and $w_{11}$, it is easy to see that $\Pr(U_3 \mid U_2 \cap U_1)$ cannot be $1/2$. If $\Pr(U_3 \mid U_2 \cap U_1) = \Pr(V_3 \mid V_2 \cap V_1) = 0$, then $U_3 \cap U_2 \cap U_1 = V_3 \cap V_2 \cap V_1 = \emptyset$, and we must have $\mathrm{Bel}_0(U_3 \cap U_2 \mid U_1) = \mathrm{Bel}_0(V_3 \cap V_2 \mid V_1) = 0$, so the claim follows. On the other hand, if $\Pr(U_3 \mid U_2 \cap U_1) = \Pr(V_3 \mid V_2 \cap V_1) = 1$, then $U_3 \cap U_2 \cap U_1 = U_2 \cap U_1$ and $V_3 \cap V_2 \cap V_1 = V_2 \cap V_1$, and the claim again follows.

Now suppose $k = 18$. If $c = c'$, then by (12), we must have that $b = b'$. It immediately follows that $f(U_1) = f(V_1)$ and $f(U_1 \cap U_2) = f(V_1 \cap V_2)$, so the claim holds. Thus, we can suppose $c \ne c'$. Suppose that $c' \ne 0$ (an identical argument works if $c \ne 0$). Then there exists some $x \ne 1$ such that $c = xc'$.
Since $bc' - b'c = 0$, it follows that $b = xb'$. Substituting $xb'$ for $b$ and $xc'$ for $c$ in (12), we get that $19(1 - x)b' = a(1 - x)c'$, from which it follows that $b'/c' = a/19$. Moreover, we also get that either $b = c = 0$ or $b/c = a/19$. It is easy to check that $a$ must be either 3 or 16. If $b/c = a/19$, then we must have $b = b'$ and $c = c'$. As we have seen, this suffices to prove the claim. Thus, we can assume that $b = c = 0$. But this means that $U_1 = \{w_{10}, w_{11}, w_{12}\}$, and that $U_1 \cap U_2$ is either $\{w_{10}\}$ or $\{w_{11}, w_{12}\}$. It follows that the only possibilities for $\Pr(U_3 \mid U_2 \cap U_1)$ are 0, $1/8$, $7/8$, or 1. It is easy to see that $\Pr(V_3 \mid V_2 \cap V_1)$ cannot be $1/8$ or $7/8$, while the cases where it is either 0 or 1 are easily taken care of, as above. This completes the proof of the claim and of the lemma. $\square$

References

Aczél, J. (1966). Lectures on Functional Equations and Their Applications. Academic Press, New York.

Aczél, J., & Daróczy, Z. (1975). On Measures of Information and Their Characterizations. Academic Press, New York.

Aleliunas, R. (1988). A summary of a new normative theory of probabilistic logic. In Proceedings of the Fourth Workshop on Uncertainty in Artificial Intelligence, Minneapolis, MN, pp. 8–14. Also in R. Shachter, T. Levitt, L. Kanal, and J. Lemmer, editors, Uncertainty in Artificial Intelligence 4, pages 199–206. North-Holland, New York, 1990.

Cheeseman, P. (1988). An inquiry into computer understanding. Computational Intelligence, 4(1), 58–66.

Cox, R. (1946). Probability, frequency, and reasonable expectation. American Journal of Physics, 14(1), 1–13.

Dubois, D., & Prade, H. (1990). The logical view of conditioning and its application to possibility and evidence theories. International Journal of Approximate Reasoning, 4(1), 23–46.

Falmagne, J. C. (1981). On a recurrent misuse of a classical functional equation result. Journal of Mathematical Psychology, 23(2), 190–193.

Fine, T. L. (1973). Theories of Probability. Academic Press, New York.

Friedman, N., & Halpern, J. Y. (1995). Plausibility measures: a user's manual. In Proc. Eleventh Conference on Uncertainty in Artificial Intelligence (UAI '95), pp. 175–184.

Friedman, N., & Halpern, J. Y. (1996). Plausibility measures and default reasoning. In Proceedings, Thirteenth National Conference on Artificial Intelligence (AAAI '96), pp. 1297–1304.

Friedman, N., & Halpern, J. Y. (1997). Modeling belief in dynamic systems. Part I: foundations. Artificial Intelligence, 95(2), 257–316.

Heckerman, D. (1988). An axiomatic framework for belief updates. In Lemmer, J. F., & Kanal, L. N. (Eds.), Uncertainty in Artificial Intelligence 2, pp. 11–22. North-Holland, Amsterdam.

Horvitz, E. J., Heckerman, D., & Langlotz, C. P. (1986). A framework for comparing alternative formalisms for plausible reasoning. In Proceedings, Fifth National Conference on Artificial Intelligence (AAAI '86), pp. 210–214.

Jaynes, E. T. (1978). Where do we stand on maximum entropy? In Levine, R. D., & Tribus, M. (Eds.), The Maximum Entropy Formalism, pp. 15–118. MIT Press, Cambridge, Mass.

Jaynes, E. T. (1996). Probability Theory: The Logic of Science. Unpublished; available at http://bayes.wustl.edu.

Paris, J. B. (1994). The Uncertain Reasoner's Companion. Cambridge University Press, Cambridge, U.K.

Reichenbach, H. (1949). The Theory of Probability. University of California Press, Berkeley. This is a translation and revision of the German edition, published as Wahrscheinlichkeitslehre, in 1935.

Savage, L. J. (1954). Foundations of Statistics. John Wiley & Sons, New York.

Tribus, M. (1969). Rational Descriptions, Decisions, and Designs. Pergamon Press, New York.
