Conditional information and definition of neighbor in categorical random fields

Reza Hosseini
University of British Columbia, 333-6356 Agricultural Road, Vancouver, BC, Canada, V6T 1Z2
reza1317@gmail.com

Abstract

We show that the definition of neighbor in Markov random fields, as defined by Besag (1974), is not well-defined when the joint distribution of the sites is not positive. In a random field with a finite number of sites, we study the conditions under which giving the value at extra sites changes the belief of an agent about one site. We also study the conditions under which the information from some sites is equivalent to giving the value at all other sites. These concepts provide an alternative to the concept of neighbor for the general case, where the positivity condition of the joint distribution does not hold.

Keywords: Markov random fields; Neighbor; Conditional probability; Information

1 Introduction

This paper studies conditional probabilities and the definition of neighbor in categorical random fields. These can be used to describe spatial processes, e.g. in plant ecology. We start with the common definition of neighbor in Markov random fields and show that this definition is not well-defined when the joint distribution is not positive. We then provide a framework to study conditional probabilities given various amounts of "information", for example the conditional probability of one site given some others. Since the usual definition of neighbor is not well-defined when the "positivity" condition of the joint distribution does not hold, we introduce the new concepts of "uninformative set", "sufficient information set" and "minimal information set".

Suppose we have a finite random field consisting of $n$ sites. The belief of an agent about one site can be summarized by a probability distribution, and it changes to a conditional distribution upon receiving new information, which can be the value at some other sites. We study when the new information changes the agent's belief, and what information is "sufficient" for the agent, in the sense that giving that information is equivalent to giving the value at all other sites. We answer some interesting questions along the way. For example, suppose agent 1 has less information than agent 2 regarding an event $A$, and a new piece of information is released. Suppose that agent 1 does not change his belief about $A$. One might conjecture that since agent 2 has more information, he too will not change his belief after receiving the new information. We show by counterexamples that this conjecture is wrong.

2 Neighbor in categorical random fields

Suppose $(\Omega, \Sigma, P)$ is a probability space and $\{X_i\}_{i=1}^{n}$ is a stochastic process, where each $X_i$ takes values in $M_i$, $|M_i| = m_i < \infty$, and $P(x_i) > 0$ for all $x_i \in M_i$. We use the shorthand notation
$$P(x_i \mid x_{i_1}, \cdots, x_{i_k}) = P(X_i = x_i \mid X_{i_1} = x_{i_1}, \cdots, X_{i_k} = x_{i_k}).$$
Besag (1974) and Cressie and Subash (1992) defined a neighbor as follows.

Definition 2.1 For site $i$, $i = 1, \cdots, n$, site $j \neq i$ is called a neighbor of $i$ if and only if the functional form of $P(x_i \mid x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_n)$ depends on $x_j$.

Note that in the above definition we need to make sure that the conditional probability is defined. It is defined on
$$E_i = \{(x_1, \cdots, x_n) \mid P(x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_n) > 0\}.$$
The following example shows that this definition is not well-defined in general, since the functional form is not unique.

Example 2.1 Let $U_1, \cdots, U_4$ be a random sample from the uniform distribution that takes only the values 0 and 1/2, each with probability 1/2. Define
$$X_1 = U_1 + U_2, \quad X_2 = [X_1] + U_3, \quad X_3 = [X_2] + U_4,$$
where $[\,\cdot\,]$ denotes the integer part of a real number. By the last equality, once we know the value of $X_2$, the value of $X_1$ gives us no extra information about $X_3$. Hence $P(x_3 \mid x_2, x_1) = P(x_3 \mid x_2)$. But since $[X_2] = [X_1]$, we also have $P(x_3 \mid x_2, x_1) = P(x_3 \mid x_1)$, wherever the conditional probability is defined. This shows the definition of neighbor is not well-defined in general.
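Since the field in Example 2.1 is finite, the claim can be checked by exhaustive enumeration. The following sketch is ours, not the paper's; it hard-codes the reading forced by the integer-part operations, namely that each $U_i$ takes the values 0 and 1/2.

```python
# Enumerate the 16 equally likely outcomes of (U1, ..., U4), build the joint
# distribution of (X1, X2, X3), and verify that P(x3 | x1, x2) can be written
# both as a function of x2 alone and as a function of x1 alone.
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import floor

half = Fraction(1, 2)
joint = defaultdict(Fraction)          # (x1, x2, x3) -> P(X1=x1, X2=x2, X3=x3)
for u1, u2, u3, u4 in product([Fraction(0), half], repeat=4):
    x1 = u1 + u2
    x2 = floor(x1) + u3                # [X1] + U3
    x3 = floor(x2) + u4                # [X2] + U4
    joint[(x1, x2, x3)] += Fraction(1, 16)

def cond(x3, given):                   # P(X3 = x3 | X_i = v for i, v in given)
    den = sum(p for k, p in joint.items()
              if all(k[i] == v for i, v in given.items()))
    num = sum(p for k, p in joint.items()
              if k[2] == x3 and all(k[i] == v for i, v in given.items()))
    return None if den == 0 else num / den

x3_values = {k[2] for k in joint}
for x1, x2, _ in joint:                # every (x1, x2) with P(x1, x2) > 0
    for x3 in x3_values:
        full = cond(x3, {0: x1, 1: x2})     # P(x3 | x1, x2)
        assert full == cond(x3, {1: x2})    # equals P(x3 | x2) ...
        assert full == cond(x3, {0: x1})    # ... and also equals P(x3 | x1)
```

Every assertion passes, so on the domain of definition the conditional distribution of $X_3$ is simultaneously a function of $x_2$ alone and of $x_1$ alone: both $\{2\}$ and $\{1\}$ qualify as the set of neighbors of site 3 under Definition 2.1.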
Next we show that positivity of the joint distribution implies that the definition of neighbor is well-defined. By positivity of the joint distribution we mean
$$P(X_1 = x_1, \cdots, X_n = x_n) > 0, \quad \forall (x_1, \cdots, x_n) \in \Pi_{i=1}^{n} M_i.$$

Lemma 2.1 Suppose $X_1, \cdots, X_n$ is a categorical random field. If the joint distribution is strictly positive, then the concept of neighbor is well-defined for this field.

Proof. Suppose $J = \{j_1, \cdots, j_J\}$ and $H = \{h_1, \cdots, h_H\}$ are both sets of neighbors of site $i$. Hence
$$P(x_i \mid x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_n) = f(x_{j_1}, \cdots, x_{j_J})$$
and also
$$P(x_i \mid x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_n) = g(x_{h_1}, \cdots, x_{h_H})$$
for some functions $f, g$. By the positivity condition, the conditional probability is defined everywhere. Hence
$$f(x_{j_1}, \cdots, x_{j_J}) = g(x_{h_1}, \cdots, x_{h_H}), \quad \forall (x_1, \cdots, x_n) \in \Pi_{i=1}^{n} M_i.$$
Suppose $h \in H - J$. Then $x_h$ does not appear on the left-hand side, so $g$ does not depend on $x_h$. We conclude $H - J = \emptyset$. Similarly, $J - H = \emptyset$.

3 Uninformative information sets

In the following we consider the general case (when the positivity condition need not hold) and define some useful concepts which are well-defined even though the concept of neighbor, as defined by Besag (1974), is not. We start with some useful definitions and lemmas regarding conditional probabilities. Consider the conditional probability $P(A \mid B)$, where $A, B$ are two events and $P(B) > 0$, and consider a third event $C$. It is interesting to study when $C$ changes (or does not change) our belief about the probability of $A$. Formally, we have the following definition.

Definition 3.1 We call $C$ uninformative for $A$ given $B$ if $P(A \mid B, C) = P(A \mid B)$ or $P(B, C) = 0$. We let $UN(A \mid B)$ be the set of all events $C$ such that $P(B, C) = 0$ or $P(A \mid B, C) = P(A \mid B)$.

Lemma 3.1 $UN(A \mid B)$ is closed under countable disjoint union.

Proof. Suppose $\{C_i\}_{i=1}^{\infty} \subset UN(A \mid B)$ with $C_i \cap C_j = \emptyset$ for $i \neq j$. If $P(B \cap C_i) = 0$ for all $i$, the result is trivial. Otherwise, let $I = \{i \mid P(B \cap C_i) \neq 0\}$. Then
$$P(A \mid B, \cup_{i=1}^{\infty} C_i) = \frac{P(A, B, \cup_{i=1}^{\infty} C_i)}{P(B, \cup_{i=1}^{\infty} C_i)} = \frac{\sum_{i \in I} P(A, B, C_i)}{\sum_{i \in I} P(B, C_i)} = \frac{\sum_{i \in I} P(A \mid B, C_i) P(B, C_i)}{\sum_{i \in I} P(B, C_i)} = \frac{\sum_{i \in I} P(A \mid B) P(B, C_i)}{\sum_{i \in I} P(B, C_i)} = P(A \mid B).$$

One might also conjecture that $UN(A \mid B)$ is closed under intersection. We show by counterexamples that this is not true.

Example 3.1 Let $\Omega = \{1, 2, \cdots, 8\}$, $A = \{1, 2, 3, 4\}$, $B = \Omega$, $C_1 = \{2, 4, 6, 8\}$, $C_2 = \{1, 3, 5, 8\}$, and consider the uniform probability distribution on $\Omega$. Then $P(A \mid B) = P(A) = 1/2$ and $P(A \mid B, C_1) = P(A \mid B, C_2) = 1/2$, hence $C_1, C_2 \in UN(A \mid B)$. But $P(A \mid B, C_1, C_2) = 0$, while $P(B, C_1, C_2) = 1/8 \neq 0$.
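The counterexample is small enough to verify mechanically; below is a minimal sketch (the helper names are ours, not the paper's).

```python
# Check Example 3.1: C1 and C2 are each uninformative for A given B,
# yet their intersection is not.
from fractions import Fraction

omega = set(range(1, 9))               # uniform distribution on {1, ..., 8}
A, B = {1, 2, 3, 4}, set(range(1, 9))
C1, C2 = {2, 4, 6, 8}, {1, 3, 5, 8}

def prob(E):
    return Fraction(len(E & omega), len(omega))

def cond(E, given):                    # P(E | given); assumes prob(given) > 0
    return prob(E & given) / prob(given)

assert cond(A, B) == Fraction(1, 2)
assert cond(A, B & C1) == cond(A, B)         # C1 in UN(A | B)
assert cond(A, B & C2) == cond(A, B)         # C2 in UN(A | B)
assert prob(B & C1 & C2) == Fraction(1, 8)   # P(B, C1, C2) != 0 ...
assert cond(A, B & C1 & C2) == 0             # ... yet it changes the belief in A
```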
Example 3.2 Consider the joint distribution for $(X, Y, Z)$ given in Table 1, where every row has the same probability, 1/4. Suppose that two agents want to predict the value of $X$. The first agent does not have any information, and the second one knows that $Z = 0$. Now assume that we provide extra information to both agents: the value of $Y$. For the first agent, before the information about $Y$ is given,
$$P(X = 0) = P(X = 1) = 1/2.$$
After he knows the value of $Y$,
$$P(X = 1 \mid Y = 0) = P(X = 1 \mid Y = 1) = 1/2.$$
Hence the extra information does not change the belief of the first agent about $X$. One might conjecture that since the second agent has more information than the first, and the new information did not help the first agent update his belief, it should not change the belief of the second agent either. This is not true! In fact, after getting the extra information we have the following inequality for the second agent:
$$0 = P(X = 1 \mid Z = 0, Y = 1) \neq P(X = 1 \mid Z = 0, Y = 0) = 1/2.$$

X Y Z
1 1 1
1 0 0
0 1 0
0 0 0

Table 1: The joint distribution of $(X, Y, Z)$; each row has probability 1/4.
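Table 1 is again easy to check by enumeration. The sketch below encodes the four rows directly; the helpers are ours.

```python
# Check Example 3.2: observing Y is uninformative for the agent with no
# prior knowledge, but informative for the agent who already knows Z = 0.
from fractions import Fraction

rows = [(1, 1, 1), (1, 0, 0), (0, 1, 0), (0, 0, 0)]   # (X, Y, Z), each w.p. 1/4

def prob(pred):
    return Fraction(sum(1 for r in rows if pred(r)), len(rows))

def cond(pred, given):                 # P(pred | given); assumes prob(given) > 0
    return prob(lambda r: pred(r) and given(r)) / prob(given)

X_is_1 = lambda r: r[0] == 1
# Agent 1 (no information): learning Y changes nothing.
assert prob(X_is_1) == Fraction(1, 2)
assert cond(X_is_1, lambda r: r[1] == 0) == Fraction(1, 2)
assert cond(X_is_1, lambda r: r[1] == 1) == Fraction(1, 2)
# Agent 2 (knows Z = 0): the same observation now changes his belief.
assert cond(X_is_1, lambda r: r[2] == 0 and r[1] == 1) == 0
assert cond(X_is_1, lambda r: r[2] == 0 and r[1] == 0) == Fraction(1, 2)
```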
We now prove a seemingly trivial fact about conditional probabilities in the following lemma.

Lemma 3.2 Suppose $P(A \mid B)$ is defined. Also suppose $\{C_i\}_{i=1}^{k}$, $k = 1, 2, \cdots, \infty$, is a (finite or countable) collection of disjoint sets such that $\cup_{i=1}^{k} C_i = \Omega$, and assume that for each $i$, either $P(B, C_i) = 0$ or $P(A \mid B, C_i) = c$; in other words, $P(A \mid B, C_i)$, where defined, does not depend on $C_i$. Then $C_i \in UN(A \mid B)$ for every $i$: $P(A \mid B, C_i) = P(A \mid B)$ or $P(B, C_i) = 0$.

Proof. Let $I = \{i \mid 1 \leq i \leq k, \; P(B, C_i) > 0\}$. Then we have
$$P(A \mid B) = \frac{\sum_{i=1}^{k} P(A, B, C_i)}{\sum_{i=1}^{k} P(B, C_i)} = \frac{\sum_{i \in I} P(A, B, C_i)}{\sum_{i \in I} P(B, C_i)} = \frac{\sum_{i \in I} P(A \mid B, C_i) P(B, C_i)}{\sum_{i \in I} P(B, C_i)} = \frac{\sum_{i \in I} c \, P(B, C_i)}{\sum_{i \in I} P(B, C_i)} = c.$$

Corollary 3.1 Suppose $P(x_i \mid x_{i_1}, \cdots, x_{i_I})$ depends only on $x_{j_1}, \cdots, x_{j_J}$, where $\{j_1, \cdots, j_J\} \subset \{i_1, \cdots, i_I\}$, whenever the conditional probability $P(x_i \mid x_{i_1}, \cdots, x_{i_I})$ is defined. Then
$$P(x_i \mid x_{i_1}, \cdots, x_{i_I}) = P(x_i \mid x_{j_1}, \cdots, x_{j_J}),$$
whenever $P(x_i \mid x_{i_1}, \cdots, x_{i_I})$ is defined.

Proof. Fix $(x'_{j_1}, \cdots, x'_{j_J})$. Let $A = \{X_i = x_i\}$ and $B = \{X_{j_1} = x'_{j_1}, \cdots, X_{j_J} = x'_{j_J}\}$. Let $\{k_1, \cdots, k_K\} = \{i_1, \cdots, i_I\} - \{j_1, \cdots, j_J\}$ and consider the sets
$$C_{x_{k_1}, \cdots, x_{k_K}} = \{X_{k_1} = x_{k_1}, \cdots, X_{k_K} = x_{k_K}\}, \quad x_{k_l} \in M_{k_l}.$$
These sets are disjoint, there are finitely many of them, and their union is $\Omega$. By the assumption,
$$P(A \mid B, C_{x_{k_1}, \cdots, x_{k_K}}) = c \quad \text{or} \quad P(B, C_{x_{k_1}, \cdots, x_{k_K}}) = 0.$$
Now apply Lemma 3.2 to $A$, $B$ and the $C_{x_{k_1}, \cdots, x_{k_K}}$.

4 Sufficient and minimal information sets

This section introduces minimal and sufficient information sets. Suppose we have $n$ sites in the random field, indexed by $1, 2, \cdots, n$. We denote a site by $i$ and let $i^c = \{1, 2, \cdots, n\} - \{i\}$ be the set of all sites other than site $i$. Let $I = \{i_1, \cdots, i_I\} \subset \{1, 2, \cdots, n\}$ be a collection of sites and let
$$D_I = D_{i_1, \cdots, i_I} = \{(x_{i_1}, \cdots, x_{i_I}) \mid P(x_{i_1}, \cdots, x_{i_I}) > 0\}.$$
Note that $D_I$ depends on the set of subscripts and not on their order; it is the domain on which the conditional probability given the values at the sites in $I$ is defined. By $P(i \mid I)$ we mean the conditional probability of site $i$ given $I$, defined on $E_{i;I} = M_i \times D_I$; that is, $P(i \mid I)$ is the function
$$P(i \mid I): M_i \times D_I \to [0, 1], \quad P(x_i \mid x_{i_1}, \cdots, x_{i_I}) = P(X_i = x_i \mid X_{i_1} = x_{i_1}, \cdots, X_{i_I} = x_{i_I}).$$
Note also that under the positivity assumption on the joint distribution,
$$D_I = D_{i_1, \cdots, i_I} = \Pi_{j=1}^{I} M_{i_j}.$$
Since the concept of neighbor is not well-defined in the general case, we seek other useful definitions to study that case.

Definition 4.1 (Sufficient information set) Suppose $J \subset I \subset \{1, 2, \cdots, n\}$. $J$ is called a sufficient information set for $i$ given $I$ if $P(i \mid I) = P(i \mid J)$ on $E_{i;I}$. We denote the set of all such sets by $SI(i, I)$.

Definition 4.2 (Minimal information set) $I \subset \{1, 2, \cdots, n\}$ is called a minimal information set for $i$ if $P(i \mid I) \neq P(i \mid J)$ for any $J \subset I$, $J \neq I$. We denote the set of all such sets by $MI(i)$.

In the following we study the properties of $SI$ (sufficient information) and $MI$ (minimal information) sets. First, let us see what happens if $i \in I$: in this case $\{i\} \in SI(i, I)$. Also note that in general $\{i\} \in MI(i)$ if $m_i > 1$ (if $m_i = 1$, we need no information to say what the value of site $i$ is), and that $\emptyset \in MI(i)$ always, vacuously. One might conjecture that a smaller set than a given minimal information set is minimal as well. This is not true! In Example 3.2, $\{Y, Z\} \in MI(X)$, but $\{Y\}$ is not minimal, since $P(X \mid Y) = P(X \mid \emptyset)$.

Proposition 4.1 Suppose $J \in SI(i, I)$ and $H = I - J = \{h_1, \cdots, h_H\}$. Also assume $\emptyset \neq N_{h_1} \subset M_{h_1}, \cdots, \emptyset \neq N_{h_H} \subset M_{h_H}$. Then
$$P(i \mid J) = P(i \mid J, x_{h_1} \in N_{h_1}, \cdots, x_{h_H} \in N_{h_H}),$$
whenever the right-hand side is defined.

Proof. Fix $(x'_{j_1}, \cdots, x'_{j_J})$. We want to show
$$P(x_i \mid x'_{j_1}, \cdots, x'_{j_J}, x_{h_1} \in N_{h_1}, \cdots, x_{h_H} \in N_{h_H}) = P(x_i \mid x'_{j_1}, \cdots, x'_{j_J}),$$
whenever the left-hand side is defined. Since $J$ is sufficient,
$$P(x_i \mid x'_{j_1}, \cdots, x'_{j_J}, x_{h_1}, \cdots, x_{h_H}) = P(x_i \mid x'_{j_1}, \cdots, x'_{j_J}) \quad \text{or} \quad P(x'_{j_1}, \cdots, x'_{j_J}, x_{h_1}, \cdots, x_{h_H}) = 0.$$
Now use the fact that $UN$ is closed under disjoint union (Lemma 3.1), with $A = \{X_i = x_i\}$ and $B = \{X_{j_1} = x'_{j_1}, \cdots, X_{j_J} = x'_{j_J}\}$, and take the union of the disjoint events $\{X_{h_1} = x_{h_1}, \cdots, X_{h_H} = x_{h_H}\}$ over $x_{h_1} \in N_{h_1}, \cdots, x_{h_H} \in N_{h_H}$.
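On small fields, $SI$ and $MI$ can be computed by brute force directly from Definitions 4.1 and 4.2. The sketch below (the helper names `cond`, `is_sufficient` and `is_minimal` are ours) does this for the field of Example 3.2 and confirms the earlier remark: $\{Y, Z\} \in MI(X)$, while its subset $\{Y\}$ is not minimal.

```python
# Brute-force check of Definitions 4.1 and 4.2 on the field of Table 1.
# Sites are indexed 0, 1, 2 for X, Y, Z; all values are binary.
from fractions import Fraction
from itertools import combinations

joint = {(1, 1, 1): Fraction(1, 4), (1, 0, 0): Fraction(1, 4),
         (0, 1, 0): Fraction(1, 4), (0, 0, 0): Fraction(1, 4)}

def cond(i, xi, sites, vals):
    """P(X_i = xi | X_s = v for (s, v) in zip(sites, vals)); None if undefined."""
    den = sum(p for x, p in joint.items()
              if all(x[s] == v for s, v in zip(sites, vals)))
    num = sum(p for x, p in joint.items()
              if x[i] == xi and all(x[s] == v for s, v in zip(sites, vals)))
    return None if den == 0 else num / den

def is_sufficient(i, J, I):
    """Definition 4.1: is J a sufficient information set for i given I?"""
    return all(cond(i, xi, I, tuple(x[s] for s in I)) ==
               cond(i, xi, J, tuple(x[s] for s in J))
               for x in joint            # restrictions of these x cover D_I
               for xi in (0, 1))

def is_minimal(i, I):
    """Definition 4.2: no proper subset of I carries the same information."""
    return not any(is_sufficient(i, J, I)
                   for r in range(len(I)) for J in combinations(I, r))

assert is_minimal(0, (1, 2))             # {Y, Z} is in MI(X) ...
assert not is_minimal(0, (1,))           # ... but its subset {Y} is not minimal
assert is_sufficient(0, (2,), (2,))      # and trivially I is in SI(i, I)
```

The search over subsets is exponential in $n$, but for a 3-site field it is immediate.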
Lemma 4.1 Suppose $J \in SI(i, I)$ and $J \subset H \subset I$. Then
a) $J \in SI(i, H)$;
b) $H \in SI(i, I)$.

Proof. Let $K = I - H = \{k_1, \cdots, k_K\}$. We want to show that for a fixed $(x'_{i_1}, \cdots, x'_{i_I}) \in D_I$,
a) $P(x_i \mid x'_{h_1}, \cdots, x'_{h_H}) = P(x_i \mid x'_{j_1}, \cdots, x'_{j_J})$,
b) $P(x_i \mid x'_{i_1}, \cdots, x'_{i_I}) = P(x_i \mid x'_{h_1}, \cdots, x'_{h_H})$.
By assumption, for every $(x_{i_1}, \cdots, x_{i_I})$ whose restriction to the indices in $H$ is $(x'_{h_1}, \cdots, x'_{h_H})$, either $P(x_{i_1}, \cdots, x_{i_I}) = 0$ or
$$P(x_i \mid x_{i_1}, \cdots, x_{i_I}) = P(x_i \mid x'_{j_1}, \cdots, x'_{j_J}).$$
On the left-hand side, take the union over the events $\{X_{k_1} = x_{k_1}, \cdots, X_{k_K} = x_{k_K}\}$, $x_{k_l} \in M_{k_l}$. We get
$$P(x_i \mid x'_{h_1}, \cdots, x'_{h_H}) = P(x_i \mid x'_{j_1}, \cdots, x'_{j_J}) = P(x_i \mid x'_{i_1}, \cdots, x'_{i_I}),$$
where the second equality holds since $J \in SI(i, I)$; this proves both a) and b).

To generalize the concept of neighbor, we can use the sufficient information and minimal information sets. We call a set efficiently sufficient for site $i$ if it is minimal and sufficient for $i$ given $i^c$; that is, $I$ is efficiently sufficient for $i$ if and only if $I \in MI(i) \cap SI(i, i^c)$. We denote the set of all such sets by $ES(i)$. If for some $i$, $ES(i)$ has only one element, we call that element the neighbor set of site $i$. Note that this definition coincides with the definition of neighbor given by Besag (1974) and Cressie and Subash (1992) when the positivity condition holds. In the following example we show that positivity is not necessary.

Example 4.1 Consider the joint distribution of $X, Y$ given by Table 2, where every row is equally probable. The positivity condition does not hold, since $P(X = 1, Y = 0) = 0$. But for $X$, site $Y$ is a neighbor, since $\{Y\} \in MI(X) \cap SI(X, \{Y\})$, so $ES(X) = \{\{Y\}\}$. Similarly, for $Y$, $X$ is a neighbor.

X Y
1 1
0 1
0 0

Table 2: The joint distribution of $(X, Y)$; each row has probability 1/3.
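The same brute-force style verifies Example 4.1. The short standalone sketch below (sites 0 and 1 stand for $X$ and $Y$) checks that positivity fails while $\{Y\}$ is still minimal and sufficient for $X$ given $X^c = \{Y\}$.

```python
# Check Example 4.1: Y is a neighbor of X although the joint law is not positive.
from fractions import Fraction

joint = {(1, 1): Fraction(1, 3), (0, 1): Fraction(1, 3), (0, 0): Fraction(1, 3)}

def cond(i, xi, sites, vals):          # P(X_i = xi | ...); None if undefined
    den = sum(p for x, p in joint.items()
              if all(x[s] == v for s, v in zip(sites, vals)))
    num = sum(p for x, p in joint.items()
              if x[i] == xi and all(x[s] == v for s, v in zip(sites, vals)))
    return None if den == 0 else num / den

assert (1, 0) not in joint             # P(X=1, Y=0) = 0: positivity fails
# The empty set is not sufficient for X given {Y}, so {Y} is minimal for X:
assert cond(0, 1, (1,), (1,)) != cond(0, 1, (), ())
# {Y} = X^c, so {Y} is trivially in SI(X, X^c); hence ES(X) = {{Y}}.
```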

References

J. Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B, pages 192–225, 1974.

N. Cressie and L. Subash. New models for Markov random fields. Journal of Applied Probability, pages 877–884, 1992.