Conditional information and definition of neighbor in categorical random fields

Reza Hosseini
University of British Columbia, 333-6356 Agricultural Road, Vancouver, BC, Canada, V6T 1Z2
reza1317@gmail.com

Abstract

We show that the definition of neighbor in Markov random fields, as defined by Besag (1974), is not well-defined when the joint distribution of the sites is not positive. In a random field with a finite number of sites, we study the conditions under which giving the value at extra sites changes the belief of an agent about one site. We also study the conditions under which the information from some sites is equivalent to giving the value at all other sites. These concepts provide an alternative to the concept of neighbor for the general case, where the positivity condition of the joint distribution does not hold.

Keywords: Markov random fields; Neighbor; Conditional probability; Information

1 Introduction

This paper studies conditional probabilities and the definition of neighbor in categorical random fields. These can be used to describe spatial processes, e.g. in plant ecology. We start with the common definition of neighbor in Markov random fields and show that this definition is not well-defined when the joint distribution is not positive. We then provide a framework to study conditional probabilities given various amounts of "information", for example the conditional probability of one site given some others. Since the usual definition of neighbor is not well-defined when the "positivity" condition of the joint distribution does not hold, we introduce the new concepts of "uninformative set", "sufficient information set" and "minimal information set".

Suppose we have a finite random field consisting of $n$ sites. The belief of an agent about one site can be summarized by a probability distribution, and it changes to a conditional distribution upon receiving new information, which can be the value at some other sites. We study when the new information changes the agent's belief, and what information is "sufficient" for the agent, in the sense that giving that information is equivalent to giving the value at all other sites. We answer some interesting questions along the way. For example, suppose agent 1 has less information than agent 2 regarding an event $A$, and a new piece of information is released. Suppose that agent 1 does not change his belief about $A$. One might conjecture that since agent 2 has more information, he too will not change his belief after receiving the new information. We show by counterexamples that this conjecture is wrong.

2 Neighbor in categorical random fields

Suppose $(\Omega, \Sigma, P)$ is a probability space and $\{X_i\}_{i=1}^{n}$ is a stochastic process, where each $X_i$ takes values in $M_i$, $|M_i| = m_i < \infty$, and $P(x_i) > 0$ for all $x_i \in M_i$. We use the shorthand notation
$$P(x_i \mid x_{i_1}, \cdots, x_{i_k}) = P(X_i = x_i \mid X_{i_1} = x_{i_1}, \cdots, X_{i_k} = x_{i_k}).$$
Besag (1974) and Cressie and Subash (1992) defined a neighbor as follows.

Definition 2.1 For site $i$, $i = 1, \cdots, n$, site $j \neq i$ is called a neighbor of $i$ if and only if the functional form of $P(x_i \mid x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_n)$ depends on $x_j$.

Note that in the above definition we need to make sure that the conditional probability is defined. It is defined on
$$E_i = \{(x_1, \cdots, x_n) \mid P(x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_n) > 0\}.$$
The following example shows that this definition is not well-defined in general, since the functional form is not unique.

Example 2.1 Let $U_1, \cdots, U_4$ be a random sample from the uniform distribution that takes only the values 0 and 1/2, each with probability 1/2. Define
$$X_1 = U_1 + U_2, \quad X_2 = [X_1] + U_3, \quad X_3 = [X_2] + U_4,$$
where $[\,\cdot\,]$ denotes the integer part of a real number. By the last equality, once we know the value of $X_2$, the value of $X_1$ gives us no extra information about $X_3$. Hence $P(x_3 \mid x_2, x_1) = P(x_3 \mid x_2)$. But since $[X_2] = [X_1]$, we also have $P(x_3 \mid x_2, x_1) = P(x_3 \mid x_1)$, wherever the conditional probability is defined. This shows the definition of neighbor is not well-defined in general.
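Since the field in Example 2.1 is finite, the claim can be checked by exhaustive enumeration. The following sketch is ours, not the paper's; it hard-codes the reading forced by the integer-part operations, namely that each $U_i$ takes the values 0 and 1/2.

```python
# Enumerate the 16 equally likely outcomes of (U1, ..., U4), build the joint
# distribution of (X1, X2, X3), and verify that P(x3 | x1, x2) can be written
# both as a function of x2 alone and as a function of x1 alone.
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import floor

half = Fraction(1, 2)
joint = defaultdict(Fraction)          # (x1, x2, x3) -> P(X1=x1, X2=x2, X3=x3)
for u1, u2, u3, u4 in product([Fraction(0), half], repeat=4):
    x1 = u1 + u2
    x2 = floor(x1) + u3                # [X1] + U3
    x3 = floor(x2) + u4                # [X2] + U4
    joint[(x1, x2, x3)] += Fraction(1, 16)

def cond(x3, given):                   # P(X3 = x3 | X_i = v for i, v in given)
    den = sum(p for k, p in joint.items()
              if all(k[i] == v for i, v in given.items()))
    num = sum(p for k, p in joint.items()
              if k[2] == x3 and all(k[i] == v for i, v in given.items()))
    return None if den == 0 else num / den

x3_values = {k[2] for k in joint}
for x1, x2, _ in joint:                # every (x1, x2) with P(x1, x2) > 0
    for x3 in x3_values:
        full = cond(x3, {0: x1, 1: x2})     # P(x3 | x1, x2)
        assert full == cond(x3, {1: x2})    # equals P(x3 | x2) ...
        assert full == cond(x3, {0: x1})    # ... and also equals P(x3 | x1)
```

Every assertion passes, so on the domain of definition the conditional distribution of $X_3$ is simultaneously a function of $x_2$ alone and of $x_1$ alone: both $\{2\}$ and $\{1\}$ qualify as the set of neighbors of site 3 under Definition 2.1.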
Next we show that positivity of the joint distribution implies that the definition of neighbor is well-defined. By positivity of the joint distribution we mean
$$P(X_1 = x_1, \cdots, X_n = x_n) > 0, \quad \forall (x_1, \cdots, x_n) \in \Pi_{i=1}^{n} M_i.$$

Lemma 2.1 Suppose $X_1, \cdots, X_n$ is a categorical random field. If the joint distribution is strictly positive, then the concept of neighbor is well-defined for this field.

Proof. Suppose $J = \{j_1, \cdots, j_J\}$ and $H = \{h_1, \cdots, h_H\}$ are both sets of neighbors of site $i$. Hence
$$P(x_i \mid x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_n) = f(x_{j_1}, \cdots, x_{j_J})$$
and also
$$P(x_i \mid x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_n) = g(x_{h_1}, \cdots, x_{h_H})$$
for some functions $f, g$. By the positivity condition, the conditional probability is defined everywhere. Hence
$$f(x_{j_1}, \cdots, x_{j_J}) = g(x_{h_1}, \cdots, x_{h_H}), \quad \forall (x_1, \cdots, x_n) \in \Pi_{i=1}^{n} M_i.$$
Suppose $h \in H - J$. Then $x_h$ does not appear on the left-hand side, so $g$ does not depend on $x_h$. We conclude $H - J = \emptyset$. Similarly, $J - H = \emptyset$.

3 Uninformative information sets

In the following we consider the general case (when the positivity condition need not hold) and define some useful concepts which are well-defined even though the concept of neighbor, as defined by Besag (1974), is not. We start with some useful definitions and lemmas regarding conditional probabilities. Consider the conditional probability $P(A \mid B)$, where $A, B$ are two events and $P(B) > 0$, and consider a third event $C$. It is interesting to study when $C$ changes (or does not change) our belief about the probability of $A$. Formally, we have the following definition.

Definition 3.1 We call $C$ uninformative for $A$ given $B$ if $P(A \mid B, C) = P(A \mid B)$ or $P(B, C) = 0$. We let $UN(A \mid B)$ be the set of all events $C$ such that $P(B, C) = 0$ or $P(A \mid B, C) = P(A \mid B)$.

Lemma 3.1 $UN(A \mid B)$ is closed under countable disjoint union.

Proof. Suppose $\{C_i\}_{i=1}^{\infty} \subset UN(A \mid B)$ with $C_i \cap C_j = \emptyset$ for $i \neq j$. If $P(B \cap C_i) = 0$ for all $i$, the result is trivial. Otherwise, let $I = \{i \mid P(B \cap C_i) \neq 0\}$. Then
$$P(A \mid B, \cup_{i=1}^{\infty} C_i) = \frac{P(A, B, \cup_{i=1}^{\infty} C_i)}{P(B, \cup_{i=1}^{\infty} C_i)} = \frac{\sum_{i \in I} P(A, B, C_i)}{\sum_{i \in I} P(B, C_i)} = \frac{\sum_{i \in I} P(A \mid B, C_i) P(B, C_i)}{\sum_{i \in I} P(B, C_i)} = \frac{\sum_{i \in I} P(A \mid B) P(B, C_i)}{\sum_{i \in I} P(B, C_i)} = P(A \mid B).$$

One might also conjecture that $UN(A \mid B)$ is closed under intersection. We show by counterexamples that this is not true.

Example 3.1 Let $\Omega = \{1, 2, \cdots, 8\}$, $A = \{1, 2, 3, 4\}$, $B = \Omega$, $C_1 = \{2, 4, 6, 8\}$, $C_2 = \{1, 3, 5, 8\}$, and consider the uniform probability distribution on $\Omega$. Then $P(A \mid B) = P(A) = 1/2$ and $P(A \mid B, C_1) = P(A \mid B, C_2) = 1/2$, hence $C_1, C_2 \in UN(A \mid B)$. But $P(A \mid B, C_1, C_2) = 0$, while $P(B, C_1, C_2) = 1/8 \neq 0$.
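The counterexample is small enough to verify mechanically; below is a minimal sketch (the helper names are ours, not the paper's).

```python
# Check Example 3.1: C1 and C2 are each uninformative for A given B,
# yet their intersection is not.
from fractions import Fraction

omega = set(range(1, 9))               # uniform distribution on {1, ..., 8}
A, B = {1, 2, 3, 4}, set(range(1, 9))
C1, C2 = {2, 4, 6, 8}, {1, 3, 5, 8}

def prob(E):
    return Fraction(len(E & omega), len(omega))

def cond(E, given):                    # P(E | given); assumes prob(given) > 0
    return prob(E & given) / prob(given)

assert cond(A, B) == Fraction(1, 2)
assert cond(A, B & C1) == cond(A, B)         # C1 in UN(A | B)
assert cond(A, B & C2) == cond(A, B)         # C2 in UN(A | B)
assert prob(B & C1 & C2) == Fraction(1, 8)   # P(B, C1, C2) != 0 ...
assert cond(A, B & C1 & C2) == 0             # ... yet it changes the belief in A
```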
Example 3.2 Consider the joint distribution for $(X, Y, Z)$ given in Table 1, where every row has the same probability, 1/4. Suppose that two agents want to predict the value of $X$. The first agent does not have any information, and the second one knows that $Z = 0$. Now assume that we provide extra information to both agents: the value of $Y$. For the first agent, before the information about $Y$ is given,
$$P(X = 0) = P(X = 1) = 1/2.$$
After he knows the value of $Y$,
$$P(X = 1 \mid Y = 0) = P(X = 1 \mid Y = 1) = 1/2.$$
Hence the extra information does not change the belief of the first agent about $X$. One might conjecture that since the second agent has more information than the first, and the new information did not help the first agent update his belief, it should not change the belief of the second agent either. This is not true! In fact, after getting the extra information we have the following inequality for the second agent:
$$0 = P(X = 1 \mid Z = 0, Y = 1) \neq P(X = 1 \mid Z = 0, Y = 0) = 1/2.$$

X Y Z
1 1 1
1 0 0
0 1 0
0 0 0

Table 1: The joint distribution of $(X, Y, Z)$; each row has probability 1/4.
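Table 1 is again easy to check by enumeration. The sketch below encodes the four rows directly; the helpers are ours.

```python
# Check Example 3.2: observing Y is uninformative for the agent with no
# prior knowledge, but informative for the agent who already knows Z = 0.
from fractions import Fraction

rows = [(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 0)]   # (X, Y, Z), each w.p. 1/4

def prob(pred):
    return Fraction(sum(1 for r in rows if pred(r)), len(rows))

def cond(pred, given):                 # P(pred | given); assumes prob(given) > 0
    return prob(lambda r: pred(r) and given(r)) / prob(given)

X_is_1 = lambda r: r[0] == 1
# Agent 1 (no information): learning Y changes nothing.
assert prob(X_is_1) == Fraction(1, 2)
assert cond(X_is_1, lambda r: r[1] == 0) == Fraction(1, 2)
assert cond(X_is_1, lambda r: r[1] == 1) == Fraction(1, 2)
# Agent 2 (knows Z = 0): the same observation now changes his belief.
assert cond(X_is_1, lambda r: r[2] == 0 and r[1] == 1) == 0
assert cond(X_is_1, lambda r: r[2] == 0 and r[1] == 0) == Fraction(1, 2)
```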
We now prove a seemingly trivial fact about conditional probabilities in the following lemma.

Lemma 3.2 Suppose $P(A \mid B)$ is defined. Also suppose $\{C_i\}_{i=1}^{k}$, $k = 1, 2, \cdots, \infty$, is a (finite or countable) collection of disjoint sets such that $\cup_{i=1}^{k} C_i = \Omega$, and assume that for each $i$, either $P(B, C_i) = 0$ or $P(A \mid B, C_i) = c$; in other words, $P(A \mid B, C_i)$, where defined, does not depend on $C_i$. Then $C_i \in UN(A \mid B)$ for every $i$: $P(A \mid B, C_i) = P(A \mid B)$ or $P(B, C_i) = 0$.

Proof. Let $I = \{i \mid 1 \leq i \leq k, \; P(B, C_i) > 0\}$. Then we have
$$P(A \mid B) = \frac{\sum_{i=1}^{k} P(A, B, C_i)}{\sum_{i=1}^{k} P(B, C_i)} = \frac{\sum_{i \in I} P(A, B, C_i)}{\sum_{i \in I} P(B, C_i)} = \frac{\sum_{i \in I} P(A \mid B, C_i) P(B, C_i)}{\sum_{i \in I} P(B, C_i)} = \frac{\sum_{i \in I} c \, P(B, C_i)}{\sum_{i \in I} P(B, C_i)} = c.$$

Corollary 3.1 Suppose $P(x_i \mid x_{i_1}, \cdots, x_{i_I})$ depends only on $x_{j_1}, \cdots, x_{j_J}$, where $\{j_1, \cdots, j_J\} \subset \{i_1, \cdots, i_I\}$, whenever the conditional probability $P(x_i \mid x_{i_1}, \cdots, x_{i_I})$ is defined. Then
$$P(x_i \mid x_{i_1}, \cdots, x_{i_I}) = P(x_i \mid x_{j_1}, \cdots, x_{j_J}),$$
whenever $P(x_i \mid x_{i_1}, \cdots, x_{i_I})$ is defined.

Proof. Fix $(x'_{j_1}, \cdots, x'_{j_J})$. Let $A = \{X_i = x_i\}$ and $B = \{X_{j_1} = x'_{j_1}, \cdots, X_{j_J} = x'_{j_J}\}$. Let $\{k_1, \cdots, k_K\} = \{i_1, \cdots, i_I\} - \{j_1, \cdots, j_J\}$ and consider the sets
$$C_{x_{k_1}, \cdots, x_{k_K}} = \{X_{k_1} = x_{k_1}, \cdots, X_{k_K} = x_{k_K}\}, \quad x_{k_l} \in M_{k_l}.$$
These sets are disjoint, there are finitely many of them, and their union is $\Omega$. By the assumption,
$$P(A \mid B, C_{x_{k_1}, \cdots, x_{k_K}}) = c \quad \text{or} \quad P(B, C_{x_{k_1}, \cdots, x_{k_K}}) = 0.$$
Now apply Lemma 3.2 to $A$, $B$ and the $C_{x_{k_1}, \cdots, x_{k_K}}$.

4 Sufficient and minimal information sets

This section introduces minimal and sufficient information sets. Suppose we have $n$ sites in the random field, indexed by $1, 2, \cdots, n$. We denote a site by $i$ and let $i^c = \{1, 2, \cdots, n\} - \{i\}$ be the set of all sites other than site $i$. Let $I = \{i_1, \cdots, i_I\} \subset \{1, 2, \cdots, n\}$ be a collection of sites and let
$$D_I = D_{i_1, \cdots, i_I} = \{(x_{i_1}, \cdots, x_{i_I}) \mid P(x_{i_1}, \cdots, x_{i_I}) > 0\}.$$
Note that $D_I$ depends on the set of subscripts and not on their order; it is the domain on which the conditional probability given the values at the sites in $I$ is defined. By $P(i \mid I)$ we mean the conditional probability of site $i$ given $I$, defined on $E_{i;I} = M_i \times D_I$; that is, $P(i \mid I)$ is the function
$$P(i \mid I): M_i \times D_I \to [0, 1], \quad P(x_i \mid x_{i_1}, \cdots, x_{i_I}) = P(X_i = x_i \mid X_{i_1} = x_{i_1}, \cdots, X_{i_I} = x_{i_I}).$$
Note also that under the positivity assumption on the joint distribution,
$$D_I = D_{i_1, \cdots, i_I} = \Pi_{j=1}^{I} M_{i_j}.$$
Since the concept of neighbor is not well-defined in the general case, we seek other useful definitions to study that case.

Definition 4.1 (Sufficient information set) Suppose $J \subset I \subset \{1, 2, \cdots, n\}$. $J$ is called a sufficient information set for $i$ given $I$ if $P(i \mid I) = P(i \mid J)$ on $E_{i;I}$. We denote the set of all such sets by $SI(i, I)$.

Definition 4.2 (Minimal information set) $I \subset \{1, 2, \cdots, n\}$ is called a minimal information set for $i$ if $P(i \mid I) \neq P(i \mid J)$ for any $J \subset I$, $J \neq I$. We denote the set of all such sets by $MI(i)$.

In the following we study the properties of $SI$ (sufficient information) and $MI$ (minimal information) sets. First, let us see what happens if $i \in I$: in this case $\{i\} \in SI(i, I)$. Also note that in general $\{i\} \in MI(i)$ if $m_i > 1$ (if $m_i = 1$, we need no information to say what the value of site $i$ is), and that $\emptyset \in MI(i)$ always, vacuously. One might conjecture that a smaller set than a given minimal information set is minimal as well. This is not true! In Example 3.2, $\{Y, Z\} \in MI(X)$, but $\{Y\}$ is not minimal, since $P(X \mid Y) = P(X \mid \emptyset)$.

Proposition 4.1 Suppose $J \in SI(i, I)$ and $H = I - J = \{h_1, \cdots, h_H\}$. Also assume $\emptyset \neq N_{h_1} \subset M_{h_1}, \cdots, \emptyset \neq N_{h_H} \subset M_{h_H}$. Then
$$P(i \mid J) = P(i \mid J, x_{h_1} \in N_{h_1}, \cdots, x_{h_H} \in N_{h_H}),$$
whenever the right-hand side is defined.

Proof. Fix $(x'_{j_1}, \cdots, x'_{j_J})$. We want to show
$$P(x_i \mid x'_{j_1}, \cdots, x'_{j_J}, x_{h_1} \in N_{h_1}, \cdots, x_{h_H} \in N_{h_H}) = P(x_i \mid x'_{j_1}, \cdots, x'_{j_J}),$$
whenever the left-hand side is defined. Since $J$ is sufficient,
$$P(x_i \mid x'_{j_1}, \cdots, x'_{j_J}, x_{h_1}, \cdots, x_{h_H}) = P(x_i \mid x'_{j_1}, \cdots, x'_{j_J}) \quad \text{or} \quad P(x'_{j_1}, \cdots, x'_{j_J}, x_{h_1}, \cdots, x_{h_H}) = 0.$$
Now use the fact that $UN$ is closed under disjoint union (Lemma 3.1), with $A = \{X_i = x_i\}$ and $B = \{X_{j_1} = x'_{j_1}, \cdots, X_{j_J} = x'_{j_J}\}$, and take the union of the disjoint events $\{X_{h_1} = x_{h_1}, \cdots, X_{h_H} = x_{h_H}\}$ over $x_{h_1} \in N_{h_1}, \cdots, x_{h_H} \in N_{h_H}$.
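On small fields, $SI$ and $MI$ can be computed by brute force directly from Definitions 4.1 and 4.2. The sketch below (the helper names `cond`, `is_sufficient` and `is_minimal` are ours) does this for the field of Example 3.2 and confirms the earlier remark: $\{Y, Z\} \in MI(X)$, while its subset $\{Y\}$ is not minimal.

```python
# Brute-force check of Definitions 4.1 and 4.2 on the field of Table 1.
# Sites are indexed 0, 1, 2 for X, Y, Z; all values are binary.
from fractions import Fraction
from itertools import combinations

joint = {(1, 1, 1): Fraction(1, 4), (1, 0, 0): Fraction(1, 4),
         (0, 1, 0): Fraction(1, 4), (0, 0, 0): Fraction(1, 4)}

def cond(i, xi, sites, vals):
    """P(X_i = xi | X_s = v for (s, v) in zip(sites, vals)); None if undefined."""
    den = sum(p for x, p in joint.items()
              if all(x[s] == v for s, v in zip(sites, vals)))
    num = sum(p for x, p in joint.items()
              if x[i] == xi and all(x[s] == v for s, v in zip(sites, vals)))
    return None if den == 0 else num / den

def is_sufficient(i, J, I):
    """Definition 4.1: is J a sufficient information set for i given I?"""
    return all(cond(i, xi, I, tuple(x[s] for s in I)) ==
               cond(i, xi, J, tuple(x[s] for s in J))
               for x in joint            # restrictions of these x cover D_I
               for xi in (0, 1))

def is_minimal(i, I):
    """Definition 4.2: no proper subset of I carries the same information."""
    return not any(is_sufficient(i, J, I)
                   for r in range(len(I)) for J in combinations(I, r))

assert is_minimal(0, (1, 2))             # {Y, Z} is in MI(X) ...
assert not is_minimal(0, (1,))           # ... but its subset {Y} is not minimal
assert is_sufficient(0, (2,), (2,))      # and trivially I is in SI(i, I)
```

The search over subsets is exponential in $n$, but for a 3-site field it is immediate.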
Lemma 4.1 Suppose $J \in SI(i, I)$ and $J \subset H \subset I$. Then
a) $J \in SI(i, H)$;
b) $H \in SI(i, I)$.

Proof. Let $K = I - H = \{k_1, \cdots, k_K\}$. We want to show that for a fixed $(x'_{i_1}, \cdots, x'_{i_I}) \in D_I$,
a) $P(x_i \mid x'_{h_1}, \cdots, x'_{h_H}) = P(x_i \mid x'_{j_1}, \cdots, x'_{j_J})$,
b) $P(x_i \mid x'_{i_1}, \cdots, x'_{i_I}) = P(x_i \mid x'_{h_1}, \cdots, x'_{h_H})$.
By assumption, for every $(x_{i_1}, \cdots, x_{i_I})$ whose restriction to the indices in $H$ is $(x'_{h_1}, \cdots, x'_{h_H})$, either $P(x_{i_1}, \cdots, x_{i_I}) = 0$ or
$$P(x_i \mid x_{i_1}, \cdots, x_{i_I}) = P(x_i \mid x'_{j_1}, \cdots, x'_{j_J}).$$
On the left-hand side, take the union over the events $\{X_{k_1} = x_{k_1}, \cdots, X_{k_K} = x_{k_K}\}$, $x_{k_l} \in M_{k_l}$. We get
$$P(x_i \mid x'_{h_1}, \cdots, x'_{h_H}) = P(x_i \mid x'_{j_1}, \cdots, x'_{j_J}) = P(x_i \mid x'_{i_1}, \cdots, x'_{i_I}),$$
where the second equality holds since $J \in SI(i, I)$; this proves both a) and b).

To generalize the concept of neighbor, we can use the sufficient information and minimal information sets. We call a set efficiently sufficient for site $i$ if it is minimal and sufficient for $i$ given $i^c$; that is, $I$ is efficiently sufficient for $i$ if and only if $I \in MI(i) \cap SI(i, i^c)$. We denote the set of all such sets by $ES(i)$. If for some $i$, $ES(i)$ has only one element, we call that element the neighbor set of site $i$. Note that this definition coincides with the definition of neighbor given by Besag (1974) and Cressie and Subash (1992) when the positivity condition holds. In the following example we show that positivity is not necessary.

Example 4.1 Consider the joint distribution of $X, Y$ given by Table 2, where every row is equally probable. The positivity condition does not hold, since $P(X = 1, Y = 0) = 0$. But for $X$, site $Y$ is a neighbor, since $\{Y\} \in MI(X) \cap SI(X, \{Y\})$, so $ES(X) = \{\{Y\}\}$. Similarly, for $Y$, $X$ is a neighbor.

X Y
1 1
0 1
0 0

Table 2: The joint distribution of $(X, Y)$; each row has probability 1/3.
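The same brute-force style verifies Example 4.1. The short standalone sketch below (sites 0 and 1 stand for $X$ and $Y$) checks that positivity fails while $\{Y\}$ is still minimal and sufficient for $X$ given $X^c = \{Y\}$.

```python
# Check Example 4.1: Y is a neighbor of X although the joint law is not positive.
from fractions import Fraction

joint = {(1, 1): Fraction(1, 3), (0, 1): Fraction(1, 3), (0, 0): Fraction(1, 3)}

def cond(i, xi, sites, vals):          # P(X_i = xi | ...); None if undefined
    den = sum(p for x, p in joint.items()
              if all(x[s] == v for s, v in zip(sites, vals)))
    num = sum(p for x, p in joint.items()
              if x[i] == xi and all(x[s] == v for s, v in zip(sites, vals)))
    return None if den == 0 else num / den

assert (1, 0) not in joint             # P(X=1, Y=0) = 0: positivity fails
# The empty set is not sufficient for X given {Y}, so {Y} is minimal for X:
assert cond(0, 1, (1,), (1,)) != cond(0, 1, (), ())
# {Y} = X^c, so {Y} is trivially in SI(X, X^c); hence ES(X) = {{Y}}.
```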

References

J. Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B, pages 192–225, 1974.

N. Cressie and L. Subash. New models for Markov random fields. Journal of Applied Probability, pages 877–884, 1992.