Is SP BP?
Authors: Ronghui Tu, Yongyi Mao, Jiying Zhao
Ronghui Tu, Yongyi Mao and Jiying Zhao
School of Information Technology and Engineering
University of Ottawa
800 King Edward Avenue, Ottawa, Ontario, K1N 6N5, Canada
Email: {rtu, yymao, jyzhao}@site.uottawa.ca

Abstract
The Survey Propagation (SP) algorithm for solving k-SAT problems has recently been shown to be an instance of the Belief Propagation (BP) algorithm. In this paper, we show that for general constraint-satisfaction problems, SP may not be reducible from BP. We also establish the conditions under which such a reduction is possible. Along our development, we present a unification of the existing SP algorithms in terms of a probabilistically interpretable iterative procedure: weighted Probabilistic Token Passing.

Index Terms
Survey Propagation, Belief Propagation, constraint satisfaction, Markov random field, factor graph, message-passing algorithm, k-SAT, q-COL

I. INTRODUCTION

Survey Propagation (SP) [1] is a recent algorithmic breakthrough in solving certain hard families of constraint satisfaction problems (CSPs). Derived from statistical physics, SP first demonstrated its power in solving a classic family of prototypical NP-complete problems, the k-SAT problems [2]. For random instances of these problems in the hard regime, SP is shown to be the first efficient solver [1]. Recently, SP has also been applied to other CSPs, including other NP-complete problem families such as graph coloring (or q-COL) problems [3], as well as problems arising in communications and data compression, examples being coding for Blackwell channels [4] and quantization of Bernoulli sequences [5]. In all these cases, great successes have been demonstrated.
Powerful as it appears, SP largely remains a heuristic algorithm to date: an analytic understanding of its algorithmic nature and a rigorous characterization of its performance are widely open and of great research importance. Similar to the well-known Belief Propagation (BP) algorithm used in iterative decoding [6] and statistical inference [7], SP operates by iteratively passing "messages" in a factor graph representation [8] of the problem instance, where each variable vertex corresponds to a variable whose value is to be decided and each function vertex corresponds to a local constraint imposed on the variables. This observation has inspired a recent research effort toward understanding whether SP may be viewed as a special case of BP. The significance of questions of this kind has been witnessed repeatedly in the history of communications research, for example, in understanding the Viterbi algorithm as a dynamic programming algorithm [9], in understanding the turbo decoding algorithm [10] as an instance of Belief Propagation [11], and in unifying the BCJR algorithm [12] and the Viterbi algorithm under the umbrella of the generalized distributive law [13]. These unified frameworks have on one hand provided additional insight into the nature of the algorithms, and on the other hand made the algorithms accessible to much wider research communities. Specific to the question "is SP BP?": if SP may be understood as an instance of BP, then the existing analytic techniques for BP are readily applicable to analyzing SP; if SP cannot be characterized as a special case of BP, one is then motivated to seek a different algorithmic framework to which SP belongs, or to discover the unique algorithmic nature of SP. The first result reporting that SP is an instance of BP is the work of [14] in the context of k-SAT problems.
This result is generalized in [15] to an extended version of SP for solving k-SAT problems. Briefly, the authors of [15] present a Markov Random Field (MRF) [16] formalism for k-SAT problems; a parameter, denoted by γ in this paper, is used to parametrize the MRF. When the BP algorithm is derived on such an MRF, the BP message-update equations result in a family of SP algorithms, referred to as weighted SP or SP(γ) in this paper, parametrized by γ ∈ [0, 1]; when γ = 1, SP(γ) is the original (non-weighted) SP. In addition to extending SP, in the context of k-SAT problems, to a family of SP algorithms with tunable performance, another significance of this result is a conclusive answer to the titular question in that context, namely that SP is BP for the k-SAT problem family. This result was re-developed in our earlier work [17], where a simpler MRF formalism using Forney graphs [18] is presented and a more transparent reduction of BP messages to weighted SP messages is given. The objective of this paper is to answer the question whether SP, and more generally weighted SP, are special cases of BP for arbitrary CSPs beyond k-SAT problems. It is worth noting that weighted SP has only been presented for k-SAT problems, although its principle may be extended to other CSPs involving binary variables (see, e.g., [5]). Furthermore, since it results from BP on a properly defined MRF, weighted SP, unlike the original (non-weighted) SP, lacks a probabilistic interpretation that is independent of the MRF constructed in the style of [15] or [17] and the BP algorithm derived thereby. Thus, to answer the question whether weighted SP is BP for general CSPs, it is necessary to formulate weighted SP for arbitrary CSPs in a way that generalizes non-weighted SP without relying on any MRF and BP formalism.
For this reason, this research, and hence the structure of this paper, roughly splits into two parts. The first part answers the question of what SP and weighted SP exactly are, by presenting a probabilistically interpretable formulation of both non-weighted and weighted SP for arbitrary CSPs. The second part presents an MRF formalism for general CSPs in the style of [15] or [17], derives the BP update equations, and answers the question whether and how BP under such an MRF formalism may be reduced to SP, if at all. Although this paper focuses on the second part, namely on answering whether SP algorithms are instances of BP on a properly defined MRF, our effort in establishing what SP algorithms are and how to formulate these algorithms for general CSPs is noteworthy. First, the notion of weighted SP, as noted earlier, has only been presented for k-SAT problems as in [15] and in sporadic example applications involving only binary variables, such as [5]. As will become clear in this paper, the design philosophy of weighted SP for CSPs involving binary variables (such as in [15] and [5]) is not readily extendable to arbitrary CSPs with arbitrary variable alphabets, since an important notion underlying SP, namely an appropriate extension of the variable alphabets, is blurred in the binary special cases. Second, for non-weighted SP, we note that its formulation in the context of general CSPs primarily exists in the statistical physics literature (see, e.g., [19]). Although its design recipe has been laid out for arbitrary CSPs, its exposition in statistical physics language has made it rather difficult for readers with primarily engineering or computer science backgrounds.
Thus, in addition to serving as the basis for the investigation of BP-to-SP reduction, the first part of the paper also aims at providing a clean, transparent and easily accessible formulation of SP algorithms in their most general form for arbitrary CSPs, without resorting to statistical physics concepts.

II. MAIN RESULTS AND PAPER ORGANIZATION

The main results of this paper are summarized as follows.

In the first part, we formulate SP and weighted SP for general CSPs as what we call "probabilistic token passing" (PTP) and "weighted probabilistic token passing" (weighted PTP) respectively, where a message is a distribution (or non-negative function) on the set of "tokens" associated with a variable. Here a "token" is a non-empty subset of the variable's alphabet.¹ It has been previously observed that in SP applied to various problems, a "joker" symbol is added to the original variable alphabet. Here we point out that extending the alphabet by simply adding a joker symbol is not sufficient for general CSPs, particularly for those involving non-binary variables. We stress that the right extension of the variable alphabet is to replace it with the set of all non-empty subsets of the original alphabet. Although an equivalent treatment has been described in some previous literature for non-weighted SP [19], this perspective is for the first time made explicit beyond the statistical physics context, and for both non-weighted and weighted SP. Based on this notion of alphabet extension, we generalize weighted SP for arbitrary CSPs in the form of weighted PTP. In other words, the weighted PTP formulation presented in this paper serves as a recipe for designing weighted SP algorithms for arbitrary CSPs.

In the second part, we present an MRF formalism, which we refer to as a "normally realized MRF", for arbitrary CSPs using Forney graphs, generalizing the MRF construction in the style of [15] and [17] presented for k-SAT problems. States, each consisting of a left state and a right state, are introduced in the MRF, where the left state corresponds to the token passed from the variable and the right state corresponds to the token passed from the constraint. For any given CSP, the MRF is parametrized by a collection of weighting functions, each corresponding to a variable in the CSP; in the k-SAT special case, these weighting functions may reduce to a single parameter, γ. Noting the combinatorial importance of such an MRF in the context of k-SAT problems [15], one expects that this general formulation of the MRF for arbitrary CSPs may serve a similar role, namely providing a combinatorial framework describing the topology of the solution space [15]. This direction, clearly deserving further investigation, is however out of the scope of this paper. On the normally realized MRF formalism, we then proceed to derive the BP update equations and investigate the reduction of BP to weighted PTP (noting that weighted PTP is weighted SP and that non-weighted SP is a special case of weighted SP).

¹ More rigorously, a token is a non-empty subset of the set of all possible assignments of a variable. In this paper, for mathematical rigor and clarity, we make a distinction between the alphabet of a variable and the set of all assignments to the variable, where an assignment to variable x_v is treated as a function mapping the singleton set {v} to the alphabet of x_v. Nevertheless, one may always identify the set of all assignments to x_v with the alphabet of x_v via a one-to-one correspondence, and loosely refer to the set of all assignments of a variable as the alphabet of the variable.
Primarily re-developing the results of [15] and [17] on BP-to-SP reduction, we show that for k-SAT problems, BP is readily reducible to weighted PTP as long as a condition, which we refer to as the state-decoupling condition, is imposed on the BP messages at initialization. An interesting fact about this condition in the context of k-SAT problems is that as long as the condition is satisfied in the first BP iteration, it will continue to be satisfied in all subsequent iterations. This forms the basis on which BP messages may be simplified to the form of weighted PTP messages. This condition, also arising in [15] and [17] as a peculiar and curious construction, had not been explained prior to this work. In this paper, we argue that the state-decoupling condition serves a critical role in the reduction of the weighted PTP messages from the BP messages derived from the MRF formalism in the style of [15] and [17], or from the normally realized MRF presented in this paper. Using the example of 3-COL problems, we show that such a condition is also needed in all BP iterations in order for BP to reduce to PTP. However, in that case, we show that this condition cannot be made to hold in every BP iteration (except in the trivial cases in which the BP messages contain no useful information), and one must manually impose this condition by manipulating the BP messages in each iteration. This result on one hand justifies the important role of the state-decoupling condition in the reduction of BP to PTP, and on the other hand asserts that BP is not PTP, and hence not SP, for 3-COL problems. At that point, one is ready to conclude that weighted PTP, or weighted SP, is not a special case of BP for general CSPs.
The manual manipulation of BP messages in 3-COL problems, which results in what we call state-decoupled BP, brings up a further question, namely whether, for general CSPs, PTP and weighted PTP are readily expressed as state-decoupled BP. We proceed to show that for general CSPs, the reduction of weighted PTP from BP requires yet another condition pertaining to the structure of the CSP. Briefly, this additional condition demands that the constraints in the CSP be "locally compatible" with each other in some sense. We show that the local compatibility condition of the CSP is the necessary and sufficient condition for state-decoupled BP to reduce to weighted PTP or weighted SP. At that end, we complete the answer to the titular question "is SP BP?".

As mentioned earlier, in addition to answering whether SP is BP, another objective of this paper is to explain SP as simply as possible. For this purpose, we have made an effort to present this paper in a pedagogical manner and to carry the examples of k-SAT and 3-COL problems throughout the paper.

The remainder of this paper is organized as follows. In Section III, we present a generic formulation of CSPs while also introducing various notations that will be used in later parts of the paper. In Section IV, we introduce the existing SP algorithms using the examples of k-SAT problems and 3-COL problems, where we purposefully avoid SP formulations in statistical physics language. We then proceed in Section V to present a general formulation of SP algorithms in terms of PTP and weighted PTP. In Section VI, we present the normally realized MRF formalism and present results concerning the reduction of BP messages to SP messages. At this time, how SP algorithms behave over iterations and how they solve a CSP are important open problems.
Although such questions are not of particular importance for the purpose of this paper, completely ignoring them appears unsatisfactory to us, and perhaps also to some readers. For this reason, we present some preliminary results along those lines for understanding the dynamics of PTP. These results are included in the Appendix so as to maintain the focus of this paper. The paper is briefly concluded in Section VII.

III. A GENERIC FORMULATION OF CONSTRAINT SATISFACTION PROBLEMS

Let V be a finite set, in which each element will be referred to as a coordinate. Associated with each v ∈ V, there is a finite alphabet χ_v. We will assume throughout this paper that every χ_v is identical, and the common alphabet is therefore denoted by χ. We note that this slight loss of generality is made only to lighten the upcoming notations, and that there is no difficulty in extending the results of this paper to the more general case where the χ_v's differ from each other. For any subset U ⊆ V, a χ-assignment x_U on U is a function mapping U into the set χ. That is, a χ-assignment x_U specifies a way to assign each coordinate u ∈ U a value in χ. The set of all χ-assignments on U will be denoted by χ^U. When U is a singleton set {u}, which contains a single coordinate u, we will call the χ-assignment x_{u} on {u} an elementary (χ-)assignment and write it as x_u for simplicity. Clearly, any given elementary χ-assignment x_u is uniquely specified by a value r ∈ χ, which is the value in χ assigned to coordinate u. In this case, this assignment is denoted by r_u; for example, if χ := {0, 1}, then the only possible χ-assignments on {u} are 0_u and 1_u, which are the elementary assignments assigning 0 and 1 to coordinate u, respectively. Suppose that U ⊂ W ⊆ V and that x_W is a χ-assignment on W.
We will use x_{W:U} to denote the (function) restriction of x_W to U. For any subset of χ-assignments Ω ⊆ χ^W on W, we denote the projection of Ω on U by Ω_{:U}. That is, Ω_{:U} := {x_{W:U} : x_W ∈ Ω}. If coordinate set U can be partitioned into disjoint subsets A and B, then it is obvious that assignment x_U decomposes into assignments x_{U:A} and x_{U:B}, and x_U may be written as (x_{U:A}, x_{U:B}) (in any order). Evidently, x_U may be decomposed according to any partition of U, not necessarily two-fold partitions. In particular, if a collection of sets {U_i : i ∈ I}, for some index set I, forms a partition of U, then we may write assignment x_U as ⟨x_{U:U_i}⟩_{i∈I}. For simplicity, we will write (x_A, x_B) and ⟨x_{U_i}⟩_{i∈I} in place of (x_{U:A}, x_{U:B}) and ⟨x_{U:U_i}⟩_{i∈I} respectively. In fact, unless particular clarity is needed, we will always write x_{W:U} simply as x_U, leaving the underlying x_W implicit. Furthermore, when U is a singleton set {u}, as mentioned earlier, we will simply denote it by x_u, which reduces to the conventional "variable" notation standard in the graphical models literature.

Given χ and V, the objective of a constraint satisfaction problem (CSP) is to find a global χ-assignment x_V that satisfies a given set of constraints, or to conclude that no such assignment exists. Formally, we will use a set C to index the set of constraints {Γ_c : c ∈ C}. Each constraint Γ_c, c ∈ C, applies to a subset of the coordinates V, which will be denoted by V(c). Specifically, each constraint Γ_c is identified with a subset of χ^{V(c)}, and the constraint is satisfied by a global χ-assignment x_V if x_{V:V(c)} ∈ Γ_c.
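As a concrete illustration (our own sketch, not from the paper; the function names are hypothetical), the restriction and projection operations can be expressed in Python by representing a χ-assignment x_U as a dictionary from coordinates to values:

```python
def restrict(x_W, U):
    """Restriction x_{W:U}: keep only the coordinates in U."""
    return {u: x_W[u] for u in U}

def project(Omega, U):
    """Projection Omega_{:U} := {x_{W:U} : x_W in Omega}.
    Assignments are stored as frozensets of (coordinate, value) pairs
    so that the projected set contains no duplicates."""
    return {frozenset(restrict(dict(x), U).items()) for x in Omega}

# A chi-assignment on W = {1, 2, 3} over chi = {0, 1}:
x_W = {1: 0, 2: 1, 3: 1}
assert restrict(x_W, {1, 3}) == {1: 0, 3: 1}

# Projecting a set of two assignments on U = {1}: since both agree on
# coordinate 1, they collapse to a single projected assignment.
Omega = {frozenset({1: 0, 2: 1}.items()), frozenset({1: 0, 2: 0}.items())}
assert project(Omega, {1}) == {frozenset({1: 0}.items())}
```

Note that projection may shrink the set: distinct assignments with equal restrictions are identified, exactly as in the set-builder definition of Ω_{:U}.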
Thus any CSP may be formulated by specifying V, C, χ, {V(c) : c ∈ C} and {Γ_c : c ∈ C}, where the objective of the CSP is to find a χ-assignment x_V such that

  ∏_{c ∈ C} [x_{V:V(c)} ∈ Γ_c] = 1,   (1)

or to conclude that no such assignment exists. Here the notation [P], for any Boolean proposition P, is Iverson's convention [8], namely, evaluating to 1 if P holds, and to 0 otherwise. It is now easy to verify that the factorization structure of (1) can be represented by a factor graph [8]: in the factor graph, "variable vertices" are indexed by V, where the "variable" indexed by v ∈ V represents an elementary assignment x_{V:{v}} on {v}, or simply x_v; "function vertices" are indexed by C, where the function indexed by c ∈ C is [x_{V:V(c)} ∈ Γ_c], which, with a slight overloading of notation, will also be denoted by Γ_c(x_{V(c)}); and there is an edge connecting variable vertex x_v with function vertex Γ_c if and only if v ∈ V(c). Inspired by its correspondence (to an edge) in the factor graph, we will use (v−c) to denote a coordinate-constraint pair (v, c) where coordinate v is involved in constraint Γ_c in the CSP. For notational symmetry, we denote the set {c : v ∈ V(c)} by C(v); namely, C(v) indexes the set of all constraints involving coordinate v, or the set of all function vertices connecting to variable vertex x_v. We will assume that |C(v)| ≥ 2 for all v ∈ V. Such an assumption is without loss of generality, since if a variable x_v is involved in only one constraint, one may always modify the constraint and remove the variable from the problem. Similarly, we will assume that |V(c)| ≥ 2 for every c ∈ C.
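To make the generic formulation concrete, the following minimal sketch (ours, with hypothetical names; not from the paper) evaluates condition (1) by computing the Iverson bracket [x_{V:V(c)} ∈ Γ_c] for each constraint, where each Γ_c is stored as a set of allowed value tuples on V(c):

```python
def satisfies(x_V, constraints):
    """Evaluate the product over c of [x_{V:V(c)} in Gamma_c] as in (1).
    `constraints` maps c -> (V(c) as a tuple, Gamma_c as a set of value tuples)."""
    prod = 1
    for scope, gamma in constraints.values():
        restriction = tuple(x_V[v] for v in scope)  # x_{V:V(c)}
        prod *= 1 if restriction in gamma else 0    # Iverson bracket [P]
    return prod

# A toy CSP on V = {1, 2} with chi = {0, 1} and one constraint requiring x_1 != x_2:
constraints = {"c": ((1, 2), {(0, 1), (1, 0)})}
assert satisfies({1: 0, 2: 1}, constraints) == 1
assert satisfies({1: 1, 2: 1}, constraints) == 0
```

The product form mirrors (1) exactly: the assignment is a solution if and only if `satisfies` returns 1.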
This is also without loss of generality, since if a constraint Γ_c involves only a single variable x_v, it is always possible to "absorb" this constraint into the other constraints involving x_v (noting that x_v must be involved in another constraint since |C(v)| ≥ 2).

A. k-SAT

The k-SAT problems are a classic family of CSPs, known to be NP-complete for k ≥ 3 [2]. An instance of a k-SAT problem consists of a set of variables {x_v : v ∈ V}, each of which takes on values from the set χ := {0, 1}, and a set of constraints {Γ_c : c ∈ C}, each of which involves exactly k variables. For each constraint Γ_c and every v ∈ V(c), there is a value L_{v,c} ∈ {0, 1} which we will refer to as the preferred value of v in constraint Γ_c. The k-SAT problem is then to decide on an assignment x_V such that for each constraint Γ_c, at least one of its involved coordinates is assigned its preferred value in Γ_c. To map back to the aforementioned set-theoretic formulation of constraints, in a k-SAT problem, for each c ∈ C, let l_c denote the χ-assignment on V(c) in which every coordinate v ∈ V(c) is assigned the negated value L̄_{v,c} of its preferred value L_{v,c} in Γ_c, namely l_{c:{v}} = L̄_{v,c} for every (v−c); constraint Γ_c is then defined as Γ_c := χ^{V(c)} \ {l_c}. The factor-graph representation of a toy 3-SAT problem is shown in Fig. 1. For k-SAT problems, it is convenient to treat each preferred value L_{v,c} as the label of edge (x_v, Γ_c) in the factor graph, using a dashed edge to represent label 0 and a solid edge to represent label 1. We note that it is customary in this paper that variable vertices in a factor graph are listed on the left side and function (constraint) vertices on the right side.

Fig. 1. A factor graph for the 3-SAT problem specified by the formula (x_1 ∨ x̄_2 ∨ x̄_4) ∧ (x_1 ∨ x_3 ∨ x̄_5) ∧ (x_2 ∨ x_4 ∨ x_5).
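The set-theoretic definition Γ_c := χ^{V(c)} \ {l_c} is easy to mechanize. The sketch below (our own illustration, hypothetical function name) builds a clause constraint from its preferred values and checks it against Γ_a of the toy instance in Fig. 1:

```python
from itertools import product

def ksat_constraint(scope, preferred):
    """Gamma_c := chi^{V(c)} \\ {l_c}, where l_c assigns each coordinate
    the negation of its preferred value L_{v,c} in the clause."""
    chi = (0, 1)
    l_c = tuple(1 - preferred[v] for v in scope)  # the single forbidden assignment
    return set(product(chi, repeat=len(scope))) - {l_c}

# Clause Gamma_a of Fig. 1: preferred values L_{1,a}=1, L_{2,a}=0, L_{4,a}=0,
# i.e. the clause (x1 OR not-x2 OR not-x4).
gamma_a = ksat_constraint((1, 2, 4), {1: 1, 2: 0, 4: 0})
assert (0, 1, 1) not in gamma_a  # the excluded assignment l_a = (0_1, 1_2, 1_4)
assert len(gamma_a) == 7         # 2^3 - 1 satisfying local assignments
```

This makes the "at least one preferred value" reading explicit: exactly one of the 2^k local assignments, namely l_c, violates the clause.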
Logical operation notations are used here to define the problem, where ∨ denotes logical OR, ∧ denotes logical AND, and the horizontal bar over a variable denotes its negation. The function represented by the factor graph is [(x_1, x_2, x_4) ∈ Γ_a] · [(x_1, x_3, x_5) ∈ Γ_b] · [(x_2, x_4, x_5) ∈ Γ_c], where Γ_a = χ^{{1,2,4}} \ {(0_1, 1_2, 1_4)}, Γ_b = χ^{{1,3,5}} \ {(0_1, 0_3, 1_5)}, and Γ_c = χ^{{2,4,5}} \ {(0_2, 0_4, 0_5)}.

B. Graph Coloring

Graph coloring or q-COL problems are another family of NP-complete problems. Given an undirected graph (Δ, Ξ) with vertex set Δ and edge set Ξ, the objective of the q-COL problem on (Δ, Ξ) is to assign each vertex in Δ a color from q available colors such that every pair of adjacent vertices have different colors. To use the above generic formulation of CSPs, we will denote the set of all q colors by χ := {1, 2, ..., q}. We will denote every undirected edge in Ξ, say the edge connecting vertices u and v, by the set {u, v}. The set V of all coordinates is then identified with the set Δ, and the set C indexing all constraints is identified with Ξ. Note in particular that every c ∈ C is then identified with some {u, v} ∈ Ξ, and V(c) is identified with c, i.e., with the corresponding set {u, v}. Suppose that c = {u, v} ∈ Ξ; then constraint Γ_c is identified with χ^{{u,v}} \ {(1_u, 1_v), (2_u, 2_v), ..., (q_u, q_v)}. Fig. 2(b) shows the factor-graph representation of a q-COL problem on the undirected graph shown in Fig. 2(a).

Fig. 2. (a) An undirected graph. (b) The factor graph for a q-COL problem on graph (a).
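As a small self-check (our own Python sketch, not from the paper), the edge constraint Γ_{u,v} can be built directly from its definition as χ^{{u,v}} minus the q monochromatic pairs:

```python
from itertools import product

def qcol_constraint(q):
    """Gamma_{u,v} := chi^{u,v} minus the q "equal color" pairs (i_u, i_v)."""
    chi = range(1, q + 1)
    return {(a, b) for a, b in product(chi, chi) if a != b}

# For q = 3 colors, each edge constraint allows 3*3 - 3 = 6 assignments.
gamma = qcol_constraint(3)
assert len(gamma) == 6
assert (2, 2) not in gamma and (1, 3) in gamma
```

In general the edge constraint allows q² − q of the q² local assignments, which is exactly the "adjacent vertices get different colors" condition.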
The global function represented by the factor graph is [(x_1, x_2) ∈ Γ_{1,2}] · [(x_1, x_3) ∈ Γ_{1,3}] · [(x_2, x_3) ∈ Γ_{2,3}] · [(x_3, x_4) ∈ Γ_{3,4}], where Γ_{u,v} := χ^{{u,v}} \ {(1_u, 1_v), (2_u, 2_v), ..., (q_u, q_v)}.

IV. SURVEY PROPAGATION ALGORITHMS

A. Survey Propagation for k-SAT Problems

Extensive study has been carried out to understand the hardness of k-SAT problems (for k ≥ 3) and to develop efficient solvers. A parameter α := |C|/|V| is observed to be critically related to the hardness of random k-SAT problems. There appear to be two thresholds of α, denoted by α_d and α_c (α_d < α_c), marking two "phase transitions" [1]. When α > α_c, random k-SAT problems are unsatisfiable (i.e., have no satisfying assignment) with high probability; when α_d < α < α_c, the satisfying assignments form exponentially many disjoint "clusters", making the problem extremely difficult; when α < α_d, the satisfying assignments merge into one huge cluster and the problems are easier. In the regime α < α_d, local search algorithms, such as BP, may find a satisfying assignment. In the regime α_d < α < α_c, local search algorithms usually fail. The discovery and first application of survey propagation (SP) were in solving k-SAT problems in the hard regime, where messages are passed on the above-defined factor graphs [1]. In SP, a "joker" symbol "∗" is added to the variable alphabet χ of the k-SAT problem, where x_v equal to the joker indicates that the variable is free to take any value from its original alphabet, and x_v equal to a non-joker symbol indicates that it is constrained to take the designated value.
Briefly, SP on k-SAT problems may be viewed as an iterative method for estimating the "biases" of each variable x_v toward 0, 1 and ∗ respectively; a variable that is highly biased toward 0 or 1 can be fixed to that value, thereby simplifying the problem. It is shown that in the hard regime of random k-SAT problems, the "joker" symbol connects the disconnected clusters, making SP remain very effective even for α very close to α_c [15]. For k-SAT problems, the original version of SP [1] is generalized in [15] to what we call weighted SP² or SP(γ) in this paper. SP(γ) is a family of algorithms parametrized by a real number γ ∈ [0, 1], where SP(1) is the original SP, and for some judicious choice of γ ∈ (0, 1), SP(γ) may have further improved performance. We note that the generalization of SP to the family of weighted SP algorithms has only been reported for k-SAT problems to date, and one of the objectives of this paper is to extend such a generalization to arbitrary CSPs. Similar to BP, in the SP algorithms, messages are passed between variable vertices and function vertices. For the purpose of describing the SP message-update rule for k-SAT problems, we introduce the following notations. For any (v−c), C^u_c(v) denotes the set {b ∈ C(v) \ {c} : L_{v,b} ≠ L_{v,c}}, and C^s_c(v) denotes the set {b ∈ C(v) \ {c} : L_{v,b} = L_{v,c}}.

² In [15], weighted SP is referred to as generalized SP. In this paper, we would like to reserve the term "generalized SP" for SP algorithms generalized to arbitrary CSPs beyond k-SAT problems.

Following [15], the message-update rule of SP(γ) is described as follows.
The message passed from variable vertex x_v to function vertex Γ_c, also referred to as a left message, is a triplet of real numbers (Π^u_{v→c}, Π^s_{v→c}, Π^∗_{v→c}), and the message passed from function vertex Γ_c to variable vertex x_v, also referred to as a right message, is a real number η_{c→v} ∈ [0, 1]. These messages are updated respectively according to the following equations:

  Π^u_{v→c} := [1 − γ ∏_{b ∈ C^u_c(v)} (1 − η_{b→v})] · ∏_{b ∈ C^s_c(v)} (1 − η_{b→v})   (2)

  Π^s_{v→c} := [1 − ∏_{b ∈ C^s_c(v)} (1 − η_{b→v})] · ∏_{b ∈ C^u_c(v)} (1 − η_{b→v})   (3)

  Π^∗_{v→c} := ∏_{b ∈ C^s_c(v)} (1 − η_{b→v}) · ∏_{b ∈ C^u_c(v)} (1 − η_{b→v})   (4)

  η_{c→v} := ∏_{u ∈ V(c)\{v}} Π^u_{u→c} / (Π^u_{u→c} + Π^s_{u→c} + Π^∗_{u→c})   (5)

The initialization of SP messages is usually random, and the message-passing schedule is typically similar to the flooding schedule [8] in BP message passing, namely, each iteration may be defined by all variable vertices passing messages followed by all function vertices passing messages. We note that throughout this paper all message-passing schedules are restricted to the flooding schedule for convenience, where each iteration is defined as first updating all "left messages" and then updating all "right messages".³ Similar to BP, at the end of an iteration, SP may compute a "summary message" at each variable vertex. For any v ∈ V, define C^1(v) := {b ∈ C(v) : L_{v,b} = 1} and C^0(v) := {b ∈ C(v) : L_{v,b} = 0}; the "summary message" at x_v is then a triplet (ζ^1_v, ζ^0_v, ζ^∗_v) of real numbers, computed by

³ An iteration may also include updating all summary messages after updating the right messages; see the description of summary messages.
  ζ^1_v := [1 − γ ∏_{b ∈ C^1(v)} (1 − η_{b→v})] · ∏_{b ∈ C^0(v)} (1 − η_{b→v})   (6)

  ζ^0_v := [1 − γ ∏_{b ∈ C^0(v)} (1 − η_{b→v})] · ∏_{b ∈ C^1(v)} (1 − η_{b→v})   (7)

  ζ^∗_v := γ ∏_{b ∈ C^1(v)} (1 − η_{b→v}) · ∏_{b ∈ C^0(v)} (1 − η_{b→v})   (8)

where the summary message (ζ^1_v, ζ^0_v, ζ^∗_v) is typically normalized to a scaled version (ζ^1_{v,norm}, ζ^0_{v,norm}, ζ^∗_{v,norm}) such that ζ^1_{v,norm} + ζ^0_{v,norm} + ζ^∗_{v,norm} = 1.

Equations (2) to (8), together with the subsequent normalization procedure, completely specify the message-update rule of SP(γ). Usually, SP is applied in conjunction with a heuristic "decimation" procedure, which is carried out after SP converges or after a certain number of SP iterations. In the decimation procedure, the "polarity" B(v) := ζ^0_{v,norm} − ζ^1_{v,norm} at each v ∈ V is calculated, and the most polarized variable (namely, the one having the highest |B(v)|) is fixed to 0 or 1 according to the sign of B(v): x_v is set to 0 if B(v) > 0, and to 1 otherwise. The k-SAT problem is then simplified and SP is applied again. This process iterates until the reduced problem is simple enough for a local search algorithm. When γ = 1, it is shown in [19] and [15] that the passed messages in (2) through (5) can be interpreted probabilistically, namely, η_{c→v} may be interpreted as the probability that a "warning" symbol is sent from Γ_c to x_v, and Π^u_{v→c}, Π^s_{v→c} and Π^∗_{v→c} are respectively the probabilities that x_v sends to Γ_c the symbol L̄_{v,c}, the symbol L_{v,c}, and the symbol ∗. When γ < 1, however, SP(γ) can no longer be interpreted probabilistically. We now present a slightly modified formulation of SP(γ), referred to as SP*(γ), which is completely equivalent to SP(γ) as defined in [15], and which will be shown in a later section to have a natural probabilistic interpretation.
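Before turning to SP*(γ), the recursion (2)-(8) and the decimation polarity B(v) can be sketched in code. This is our own illustrative Python (dictionary-based data structures on the toy instance of Fig. 1), not the authors' implementation:

```python
import math
import random

def sp_gamma_iteration(eta, L, V_of, C_of, gamma):
    """One flooding iteration of SP(gamma): left updates (2)-(4), then (5).
    eta: (c, v) -> eta_{c->v};  L: (v, c) -> preferred value L_{v,c};
    V_of: c -> coordinates of constraint c;  C_of: v -> constraints involving v."""
    Pi = {}
    for (v, c) in L:
        pu = math.prod(1 - eta[(b, v)] for b in C_of[v]
                       if b != c and L[(v, b)] != L[(v, c)])  # product over C^u_c(v)
        ps = math.prod(1 - eta[(b, v)] for b in C_of[v]
                       if b != c and L[(v, b)] == L[(v, c)])  # product over C^s_c(v)
        Pi[(v, c)] = ((1 - gamma * pu) * ps,  # Pi^u, eq. (2)
                      (1 - ps) * pu,          # Pi^s, eq. (3)
                      ps * pu)                # Pi^*, eq. (4)
    return {(c, v): math.prod(Pi[(u, c)][0] / sum(Pi[(u, c)])
                              for u in V_of[c] if u != v)     # eq. (5)
            for (c, v) in eta}

def polarity(v, eta, L, C_of, gamma):
    """Normalized summary message (6)-(8) and polarity B(v) = zeta^0 - zeta^1."""
    p1 = math.prod(1 - eta[(b, v)] for b in C_of[v] if L[(v, b)] == 1)
    p0 = math.prod(1 - eta[(b, v)] for b in C_of[v] if L[(v, b)] == 0)
    z1, z0, zs = (1 - gamma * p1) * p0, (1 - gamma * p0) * p1, gamma * p1 * p0
    return (z0 - z1) / (z1 + z0 + zs)

# Toy 3-SAT instance of Fig. 1; the edge labels L_{v,c} are the preferred values.
L = {(1, 'a'): 1, (2, 'a'): 0, (4, 'a'): 0,
     (1, 'b'): 1, (3, 'b'): 1, (5, 'b'): 0,
     (2, 'c'): 1, (4, 'c'): 1, (5, 'c'): 1}
V_of = {'a': [1, 2, 4], 'b': [1, 3, 5], 'c': [2, 4, 5]}
C_of = {1: ['a', 'b'], 2: ['a', 'c'], 3: ['b'], 4: ['a', 'c'], 5: ['b', 'c']}
random.seed(0)
eta = {(c, v): random.random() for (v, c) in L}  # random initialization
for _ in range(20):
    eta = sp_gamma_iteration(eta, L, V_of, C_of, gamma=1.0)
assert all(0.0 <= x <= 1.0 for x in eta.values())
assert all(abs(polarity(v, eta, L, C_of, 1.0)) <= 1.0 for v in C_of)
```

On an instance this small the messages simply relax to an uninformative fixed point; the sketch is meant only to show the shape of one flooding iteration, not the behavior of SP in the hard regime.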
In SP*(γ), the left message (Π^u_{v→c}, Π^s_{v→c}, Π^∗_{v→c}) passed from variable vertex x_v to function vertex Γ_c is modified to the equations given in (9) to (11), while the right message η_{c→v} passed from function vertex Γ_c to variable vertex x_v and the summary message (ζ^1_v, ζ^0_v, ζ^∗_v) at variable x_v stay unchanged:

  Π^u_{v→c} := [1 − γ ∏_{b ∈ C^u_c(v)} (1 − η_{b→v})] · ∏_{b ∈ C^s_c(v)} (1 − η_{b→v})   (9)

  Π^s_{v→c} := [1 − γ ∏_{b ∈ C^s_c(v)} (1 − η_{b→v})] · ∏_{b ∈ C^u_c(v)} (1 − η_{b→v})   (10)

  Π^∗_{v→c} := γ ∏_{b ∈ C^s_c(v)} (1 − η_{b→v}) · ∏_{b ∈ C^u_c(v)} (1 − η_{b→v})   (11)

The following lemma shows that SP(γ) and SP*(γ) are equivalent.

Lemma 1: For the same initialization of {η_{c→v} : ∀(v−c)}, at any given iteration, SP*(γ) and SP(γ) give rise to identical results in η_{c→v} for every (v−c), and in (ζ^1_v, ζ^0_v, ζ^∗_v) for every v ∈ V.

Proof: The lemma follows from the fact that in the computation of η_{c→v}, and hence of (ζ^1_v, ζ^0_v, ζ^∗_v), Π^s_{v→c} and Π^∗_{v→c} always appear together in the form Π^s_{v→c} + Π^∗_{v→c}. But it is easy to see that in SP(γ) and in SP*(γ), Π^s_{v→c} + Π^∗_{v→c} has the same parametric form, both equal to ∏_{b ∈ C^u_c(v)} (1 − η_{b→v}).

We conclude this subsection by remarking that it is possible to verify that all results concerning SP(γ) in [15] hold for SP*(γ).⁴ As such, in the rest of this paper, SP*(γ) rather than SP(γ) will be taken as the weighted SP for k-SAT problems.

⁴ Specifically, we note that BP on the MRF formulated in [15] will also reduce to SP*(γ). We leave this for the interested reader to verify.

B. Survey Propagation for q-COL Problems

Similar to SP developed for k-SAT problems, in q-COL problems SP passes messages between the variable vertices and the function (constraint) vertices in the factor-graph representation of the problem. Some notable differences however exist.
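The proof of Lemma 1 can also be checked numerically. The sketch below (our own, with hypothetical helper names) feeds the same products over C^u_c(v) and C^s_c(v) into the SP(γ) updates (2)-(4) and the SP*(γ) updates (9)-(11), and confirms that the right messages (5) coincide:

```python
import math
import random

def right_update(pi_list):
    """eta_{c->v} as in (5), from the left messages of the other neighbors."""
    return math.prod(pu / (pu + ps + pj) for (pu, ps, pj) in pi_list)

def left_sp(pu_prod, ps_prod, gamma):
    """SP(gamma) left message, eqs. (2)-(4), given the products over C^u and C^s."""
    return ((1 - gamma * pu_prod) * ps_prod,
            (1 - ps_prod) * pu_prod,
            ps_prod * pu_prod)

def left_sp_star(pu_prod, ps_prod, gamma):
    """SP*(gamma) left message, eqs. (9)-(11)."""
    return ((1 - gamma * pu_prod) * ps_prod,
            (1 - gamma * ps_prod) * pu_prod,
            gamma * ps_prod * pu_prod)

# In both variants Pi^s + Pi^* equals the product over C^u_c(v), so (5) gives
# identical right messages: a numeric check of Lemma 1 over random inputs.
random.seed(1)
for _ in range(100):
    g = random.random()
    pairs = [(random.random(), random.random()) for _ in range(3)]
    eta_a = right_update([left_sp(pu, ps, g) for pu, ps in pairs])
    eta_b = right_update([left_sp_star(pu, ps, g) for pu, ps in pairs])
    assert abs(eta_a - eta_b) < 1e-12
```

The check mirrors the proof exactly: only the split of the mass ∏_{C^u}(1 − η) between Π^s and Π^∗ differs between the two variants, and (5) is insensitive to that split.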
First, weighted SP has not been developed for $q$-COL problems to date, and it is not even clear whether such an algorithm family, if it exists, can be developed in a manner similar to that for $k$-SAT in [15], namely, via reducing the BP algorithm derived from a properly defined MRF. As we answer this question in a later section, we here only review the original version of SP applied to 3-COL problems, following the formulation in [3], which is analogous to SP$(1)$, or the non-weighted SP, in the context of $k$-SAT.

Second, the SP messages for $q$-COL problems can be expressed more compactly, due to a specific feature of the problem, on which we now elaborate. For $q$-COL problems, each constraint vertex has degree 2. This allows the combination of the message passed from variable $x_u$ to a neighboring constraint, say $\Gamma_c$, with the message passed from constraint $\Gamma_c$ to the other neighbor of $\Gamma_c$, say $x_v$. As a consequence, $\Gamma_c$ may be suppressed in the factor graph, and messages are directly passed between variable vertices that are distance 2 apart (footnote 5), or equivalently, messages are passed on graph $(\Delta, \Xi)$. Following [3], a compact version of the SP message-passing rule for 3-COL problems is given as follows, where the message passed from variable $x_u$ to variable $x_v$ is a quadruplet of real numbers $(\eta^1_{u \to v}, \eta^2_{u \to v}, \eta^3_{u \to v}, \eta^*_{u \to v})$.

Footnote 4: Specifically, we note that BP on the MRF formulated in [15] will also reduce to SP$^*(\gamma)$. We leave this for the interested readers to verify.
For $i = 1, 2, 3$,

$$\eta^i_{u \to v} := \frac{\prod_{w \in N(u) \setminus \{v\}} (1 - \eta^i_{w \to u}) - \sum_{j \neq i} \prod_{w \in N(u) \setminus \{v\}} (\eta^*_{w \to u} + \eta^j_{w \to u}) + \prod_{w \in N(u) \setminus \{v\}} \eta^*_{w \to u}}{\sum_{j=1,2,3} \prod_{w \in N(u) \setminus \{v\}} (1 - \eta^j_{w \to u}) - \sum_{j=1,2,3} \prod_{w \in N(u) \setminus \{v\}} (\eta^*_{w \to u} + \eta^j_{w \to u}) + \prod_{w \in N(u) \setminus \{v\}} \eta^*_{w \to u}} \qquad (12)$$

where $N(u)$ is the set $\{v : v \in V, \{u, v\} \in \Xi\}$, namely, the set of neighboring vertices of vertex $u$ on graph $(\Delta, \Xi)$; and

$$\eta^*_{u \to v} := 1 - \sum_{j=1,2,3} \eta^j_{u \to v}. \qquad (13)$$

For 3-COL problems, the "summary message" computed at each variable vertex $x_v$ is a quadruplet of real numbers, denoted by $(\zeta^1_v, \zeta^2_v, \zeta^3_v, \zeta^*_v)$, where for $i = 1, 2, 3$,

$$\zeta^i_v := \frac{\prod_{u \in N(v)} (1 - \eta^i_{u \to v}) - \sum_{j \neq i} \prod_{u \in N(v)} (\eta^*_{u \to v} + \eta^j_{u \to v}) + \prod_{u \in N(v)} \eta^*_{u \to v}}{\sum_{j=1,2,3} \prod_{u \in N(v)} (1 - \eta^j_{u \to v}) - \sum_{j=1,2,3} \prod_{u \in N(v)} (\eta^*_{u \to v} + \eta^j_{u \to v}) + \prod_{u \in N(v)} \eta^*_{u \to v}}$$

and

$$\zeta^*_v := 1 - \sum_{j=1,2,3} \zeta^j_v.$$

Similar to the case of $k$-SAT problems, the summary message for a 3-COL problem at variable $x_v$ may indicate the "bias" of variable $x_v$ toward each letter in $\{1, 2, 3, *\}$. In the decimation procedure for 3-COL problems, carried out in a way similar to that for $k$-SAT problems, a variable is fixed to a color $i \in \{1, 2, 3\}$ if it is highly biased to that color. The reader is referred to [3] for a detailed account of a heuristic decimation rule used in solving 3-COL problems using SP. We note that this paper primarily focuses on SP update equations, and the decimation aspect of SP is largely ignored.

Footnote 5: Still implementing the flooding schedule, the SP message-update rule for 3-COL problems however suppresses the passing of one set of messages (say, for example, the right messages) by including the computation of these messages in updating the other set of messages.
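The update (12)-(13) can be implemented directly as a sanity check. This is a sketch under our own conventions, with made-up incoming messages: each incoming message is a tuple $(\eta^1, \eta^2, \eta^3, \eta^*)$ from one $w \in N(u) \setminus \{v\}$.

```python
from math import prod

def sp_3col_update(in_msgs):
    """One application of Eqs. (12)-(13): in_msgs holds the messages
    (eta1, eta2, eta3, eta_star) from all w in N(u) \\ {v}."""
    def numerator(i):
        others = [j for j in range(3) if j != i]
        return (prod(1 - m[i] for m in in_msgs)
                - sum(prod(m[3] + m[j] for m in in_msgs) for j in others)
                + prod(m[3] for m in in_msgs))
    denom = (sum(prod(1 - m[j] for m in in_msgs) for j in range(3))
             - sum(prod(m[3] + m[j] for m in in_msgs) for j in range(3))
             + prod(m[3] for m in in_msgs))
    etas = [numerator(i) / denom for i in range(3)]
    return (*etas, 1 - sum(etas))        # eta_star per Eq. (13)
```

For example, with two symmetric incoming messages $(1/3, 1/3, 1/3, 0)$, the outgoing message works out to $(2/9, 2/9, 2/9, 1/3)$.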
V. SP AS PROBABILISTIC TOKEN PASSING

To date, SP algorithms have been applied to various CSPs, for example, in coding for Blackwell channels [4], in quantization of Bernoulli sources [5], and in solving graph coloring problems [3]. However, a general formulation of SP, particularly that of weighted SP, for solving arbitrary non-binary CSPs has been largely missing. Specifically, we note the following milestones in the formulation of SP algorithms.

• The work of [19] presents a non-weighted version of the SP formulas for general CSPs beyond those involving only binary variables. However, the exposition of [19] uses the language of statistical physics, rather remote to the engineering community, and a cleaner and more friendly formulation of SP, and particularly of weighted SP, is desirable for general problems.

• The work of [15] presents weighted SP for $k$-SAT problems, in which weighted SP is treated as a special case of BP on a properly defined MRF. This treatment of SP and the corresponding principle for developing weighted SP are conceivably applicable to all binary CSPs. However, it has remained open, prior to this work, whether such an approach to understanding and developing weighted SP is applicable to arbitrary non-binary CSPs.

The line of development in this section is summarized below.

We will first present an understanding of non-weighted SP for arbitrary CSPs (namely, that formulated in [19]) in terms of "probabilistic token passing" (PTP). Although a similar understanding has been previously reported in various contexts, we here stress the role of extending the variable alphabet in SP algorithms, and explicitly point out that the alphabet extension is not simply the inclusion of an extra joker symbol, but the replacement of the variable alphabet with its power set (excluding the empty-set element).
To make the PTP procedure more intuitively sensible, prior to defining PTP we will introduce a precursor of PTP, which we call "deterministic token passing" (DTP). After introducing PTP, we then show that the probabilistic interpretation of non-weighted SP in terms of PTP makes it naturally generalizable to a weighted version, which we call weighted PTP. For a brief preview, the generalization of PTP to weighted PTP essentially involves generalizing a functional dependency in the PTP message-update rule to a probabilistic dependency. Interestingly, as we will show, it turns out that for $k$-SAT problems weighted PTP precisely coincides with the weighted SP of [15]. This should convincingly demonstrate that weighted PTP is a generalization of weighted SP to arbitrary CSPs.

The outline of this section is given as follows. Subsection V-A introduces the notion of alphabet extension and related concepts. Subsection V-B defines DTP as a precursor of PTP. In Subsection V-C, we introduce PTP. In Subsection V-D, we show that PTP is equivalent to SP, using 3-COL problems as an example. In Subsection V-E, we introduce weighted PTP. In Subsection V-F, we show that weighted PTP generalizes weighted SP, using $k$-SAT problems as an example.

A. Alphabet Extension

For a given CSP with variable alphabet $\chi$, we define the extended alphabet $\chi^*$ as the power set of $\chi$ excluding the empty set $\emptyset$. That is, $\chi^* = \{t : t \subseteq \chi, t \neq \emptyset\}$. The extended alphabet $\chi^*$ of $k$-SAT problems is then the set $\{\{0\}, \{1\}, \{0,1\}\}$. For 3-COL problems, $\chi^*$ is the set $\{\{1\}, \{2\}, \{3\}, \{1,2\}, \{1,3\}, \{2,3\}, \{1,2,3\}\}$. Each element $t$ of $\chi^*$ will be written as a string, in bold font, containing the elements of $t$. For example, we may write $\{1,2\}$ as 12, $\{1,2,3\}$ as 123, and $\{1\}$ simply as 1. Given any subset $U \subseteq V$, a $\chi^*$-assignment $y_U$ on $U$ is referred to as a rectangle on $U$.
The set of all rectangles on $U$ is denoted by $(\chi^*)^U$. Given rectangle $y_U \in (\chi^*)^U$, for every $v \in U$, $y_U : \{v\}$, or simply written as $y_v$ following an earlier convention of this paper, is referred to as the $v$-side of $y_U$. Apparently, rectangle $y_U$ has $|U|$ sides, and may also be written as the concatenation of all its sides, namely, as $\langle y_v \rangle_{v \in U}$.

For any $v \in V$, an elementary $\chi^*$-assignment $t_v \in (\chi^*)^{\{v\}}$ will be referred to as a token on $v$. Using this nomenclature, the $v$-side of any rectangle is a token on $v$. We note that a token $t_v$ may be interpreted as a set of elementary $\chi$-assignments on $\{v\}$, namely, the set of all elementary $\chi$-assignments on $\{v\}$ that assign $v$ a value in the set $t_v(v) \subseteq \chi$. For example, suppose that $\chi := \{1, 2, 3\}$; then token $12_v$ may be identified with the set $\{1_v, 2_v\}$ of elementary $\chi$-assignments on $\{v\}$.

It is worth noting that when a token $t_v$ is identified with a set of elementary $\chi$-assignments on $v$, a rectangle $\langle t_v \rangle_{v \in U}$ may be identified with the Cartesian product of all its sides. For example, rectangle $(12_v, 23_u)$ may be interpreted as the following set of $\chi$-assignments on $\{v, u\}$: $\{(1_v, 2_u), (1_v, 3_u), (2_v, 2_u), (2_v, 3_u)\}$. Under this interpretation, we will also make frequent use of the Cartesian product notation, writing rectangle $(12_v, 23_u)$ as $12_v \times 23_u$, and rectangle $\langle t_v \rangle_{v \in U}$ as $\prod_{v \in U} t_v$. We note that this interpretation is in fact the reason for which we chose the terminologies "rectangle" and "side".

For simplicity, from here on we shall reserve the term "assignment" for a $\chi$-assignment only, and a $\chi^*$-assignment will be referred to as a "rectangle", "side" or "token". We say that an assignment $x_U$ on $U$ is contained in rectangle $y_U$ if $x_U : \{v\}(v) \in y_U : \{v\}(v)$ for every $v \in U$.
For example, assignment $(1_v, 2_u)$ is contained in rectangle $(13_v, 23_u)$. We will use $x_U \in y_U$ to denote this containment relationship, since this notation is precise when the rectangle $y_U$ is interpreted as a set of assignments on $U$.

Given a CSP and a $(v-c)$ pair, we define function $F^v_c : (\chi^*)^{V(c) \setminus \{v\}} \to (\chi^*)^{\{v\}}$ as follows: for every rectangle $\prod_{u \in V(c) \setminus \{v\}} t_u$ on $V(c) \setminus \{v\}$,

$$F^v_c\Big(\prod_{u \in V(c) \setminus \{v\}} t_u\Big) := \Big(\Big(\chi^{\{v\}} \times \prod_{u \in V(c) \setminus \{v\}} t_u\Big) \cap \Gamma_c\Big) : \{v\}.$$

We often write $F^v_c$ in short as $F_c$, since the domain and co-domain of the function may be recovered from the form of its argument. Given rectangle $\prod_{u \in V(c) \setminus \{v\}} t_u$ on $V(c) \setminus \{v\}$, we call $F_c\big(\prod_{u \in V(c) \setminus \{v\}} t_u\big)$ the forced token by rectangle $\prod_{u \in V(c) \setminus \{v\}} t_u$ via constraint $\Gamma_c$. It is easy to verify that the forced token $F_c\big(\prod_{u \in V(c) \setminus \{v\}} t_u\big)$ is simply the set of all (elementary) assignments on $\{v\}$ which, when concatenated with an assignment on $V(c) \setminus \{v\}$ contained in rectangle $\prod_{u \in V(c) \setminus \{v\}} t_u$, make local constraint $\Gamma_c$ satisfied.

We now give some examples using the toy 3-SAT problem shown in Fig. 1 to illustrate this definition. Consider constraint $\Gamma_a$: if rectangle $t_{\{1,2\}}$ on $\{1, 2\}$ is defined as $(1_1, 01_2)$, then the forced token $F_a(t_{\{1,2\}}) = 01_4$, since when variable $x_4$ is assigned either value 0 or 1, it is possible to find an assignment of variables $x_1$ and $x_2$ in rectangle $t_{\{1,2\}}$ that makes $\Gamma_a$ satisfied; on the other hand, if $t_{\{1,2\}} = (0_1, 1_2)$, then the forced token $F_a(t_{\{1,2\}}) = 0_4$, since rectangle $t_{\{1,2\}}$ contains a single assignment of $x_1$ and $x_2$ (namely $(0_1, 1_2)$), and the only assignment of $x_4$ that makes constraint $\Gamma_a$ satisfied is the one assigning 0 to $x_4$, namely $0_4$.
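The forced-token map $F^v_c$ can be computed by brute force over the satisfying set of the constraint. The sketch below encodes tokens as Python sets and a constraint as its list of satisfying assignments; since Fig. 1 is not reproduced here, we assume for illustration that $\Gamma_a$ is the clause $x_1 \vee \bar{x}_2 \vee \bar{x}_4$, which is consistent with the two examples above.

```python
from itertools import product as cartesian

def forced_token(rect, sat_set, v):
    """F^v_c: project the satisfying assignments compatible with the
    rectangle `rect` (a dict u -> token for u in V(c)\\{v}) onto v."""
    return {s[v] for s in sat_set
            if all(s[u] in rect[u] for u in rect)}

# Gamma_a assumed to be the 3-SAT clause x1 OR (NOT x2) OR (NOT x4):
# every assignment satisfies it except (x1, x2, x4) = (0, 1, 1).
gamma_a = [dict(zip(("x1", "x2", "x4"), bits))
           for bits in cartesian((0, 1), repeat=3) if bits != (0, 1, 1)]

print(forced_token({"x1": {1}, "x2": {0, 1}}, gamma_a, "x4"))  # {0, 1}
print(forced_token({"x1": {0}, "x2": {1}}, gamma_a, "x4"))     # {0}
```

The two printed tokens reproduce the forced tokens $01_4$ and $0_4$ from the examples in the text.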
A "monotonicity property" of function $F_c$, stated in the following lemma, follows immediately from the definition of the function.

Lemma 2: Suppose that $x_v$ and $\Gamma_c$ are a pair of neighboring variable and constraint vertices in the factor graph, and that $y_{V(c) \setminus \{v\}}$ and $y'_{V(c) \setminus \{v\}}$ are two rectangles on $V(c) \setminus \{v\}$. Then $y_{V(c) \setminus \{v\}} \subset y'_{V(c) \setminus \{v\}}$ implies that $F_c\big(y_{V(c) \setminus \{v\}}\big) \subseteq F_c\big(y'_{V(c) \setminus \{v\}}\big)$.

B. Deterministic Token Passing (DTP)

Since we will introduce, for arbitrary CSPs, a probabilistic interpretation of non-weighted SP (namely, PTP) and generalize it to a weighted version (namely, weighted PTP), in this subsection we first introduce an algorithmic procedure that we call deterministic token passing, or DTP. We note that the purpose of introducing DTP is to provide easier access to PTP, the procedure to be introduced in the next subsection.

In DTP, messages are tokens passed along the edges of the factor graph representing the CSP of interest. Specifically, the token passed from and to each variable $x_v$ is a token on $v$, or equivalently, a set of (elementary) assignments on $\{v\}$. For any pair of neighboring vertices $x_v$ and $\Gamma_c$ on the factor graph, the token, or left message, $t_{v \to c}$ passed from variable $x_v$ to constraint $\Gamma_c$ depends on all incoming tokens (right messages) passed to $x_v$ except that passed from $\Gamma_c$. Similarly, the token, or right message, $t_{c \to v}$ passed from constraint $\Gamma_c$ to variable $x_v$ depends on all incoming tokens (left messages) passed to $\Gamma_c$ except that passed from $x_v$. Each iteration of token passing in DTP is defined by every variable passing a token on each of its edges, followed by every constraint passing a token on each of its edges. Within any iteration, the token-passing rule of DTP is given as follows:

$$t_{v \to c} := \bigcap_{b \in C(v) \setminus \{c\}} t_{b \to v} \qquad (14)$$

$$t_{c \to v} := F_c\Big(\prod_{u \in V(c) \setminus \{v\}} t_{u \to c}\Big). \qquad (15)$$
That is, the token passed from a variable is the intersection of its incoming tokens from the upstream, whereas the token passed from a constraint is the forced token via the constraint by the rectangle formed with the upstream incoming tokens as sides.

It is intuitive to illuminate this message-passing rule using the following analogy. We may view the token sent from a variable as the "intention" of the variable, indicating the possible values that the variable intends to take. On the other hand, we may view the token sent from a constraint as the "command" from the constraint, indicating the possible values that the constraint allows the destination variable to take. If $a$ is an intention and $b$ is a command, where both are tokens on the same coordinate, then the relationship $a \subseteq b$ may be viewed as "intention $a$ obeys command $b$". Under this perspective, the token sent from a variable is the "maximal" intention of the variable that obeys all incoming commands from the upstream constraints; on the other hand, the token sent from a constraint is the "maximal" command that is "compatible" with all incoming intentions from the upstream variables. Here "maximality" is in the sense of maximizing the cardinality of the subset of assignments, and "compatibility" is in the sense of satisfying the local constraint. Examples of token passing for a 3-COL problem are illustrated in Fig. 3.

Fig. 3. Examples of deterministic token passing for a 3-COL problem. (a) Token $t_{c \to v}$ passed from constraint $\Gamma_c$ to variable $x_v$. (b) Token $t_{v \to c}$ passed from variable $x_v$ to constraint $\Gamma_c$.

A summary message or "summary token" at variable vertex $x_v$ may be computed according to the rule in (16), for each $v \in V$, at any iteration after all constraint vertices have passed tokens:

$$t_v := \bigcap_{b \in C(v)} t_{b \to v}. \qquad (16)$$
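The variable-side updates (14) and (16) are plain set intersections. The sketch below mimics the situation of Fig. 3(b), where incoming tokens 12 and 23 yield the outgoing token 2; the third incoming token (from the destination constraint, ignored by (14)) is our own addition for illustration, and the alphabet is that of 3-COL.

```python
def dtp_variable_update(incoming, dest):
    """t_{v->c} per Eq. (14): intersect incoming tokens, excluding the
    one from the destination constraint `dest`."""
    out = {1, 2, 3}                  # 3-COL alphabet chi
    for b, tok in incoming.items():
        if b != dest:
            out &= tok
    if not out:                      # empty set is not a well-defined token
        raise ValueError("DTP terminates")
    return out

incoming = {"a": {1, 2}, "b": {2, 3}, "c": {1, 2, 3}}
print(dtp_variable_update(incoming, "c"))    # {2}, as in Fig. 3(b)
# summary token per Eq. (16): intersect over ALL incoming tokens
print(set.intersection(*incoming.values()))  # {2}
```

Raising an exception on the empty intersection mirrors the forced termination of DTP discussed below.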
Using the "intention/command" analogy, the summary token at a variable is the "maximal" intention of the variable that obeys the incoming commands from all directions.

Some caution is needed regarding the well-definedness of the update rules for passed tokens and summary tokens: in (14), (15) and (16), the right-hand side can be equal to the empty set $\emptyset$, which is not a well-defined token. Whenever, in an iteration, a not-well-defined token (i.e., the empty set) arises from the update rule, we may force DTP to terminate. As we will see later, in the "random" version of DTP (i.e., PTP and weighted PTP), we will eventually condition on the event that this does not happen.

At any iteration, one may read out the summary tokens at all variable vertices and form a rectangle on $V$ using these tokens as its sides. It is clear that at any given iteration, the resulting rectangle formed by the summary tokens depends on the initialization of DTP. Although our primary purpose in introducing DTP is to smooth the transition to understanding PTP, in Appendix A we present some elementary results concerning the dynamics of DTP. We note that those results will also be used to derive some insights on the dynamics of PTP, an algorithmic procedure that we introduce next as a simple formulation of SP.

C. Probabilistic Token Passing (PTP)

We now introduce the "probabilistic token passing" (or PTP) procedure. The key distinction between PTP and DTP is that, on each edge and along each direction, PTP passes a random token, and the messages being updated in PTP are the distributions of the random tokens. Specifically, the PTP message-update rule can be constructed by considering the following mechanism of passing random tokens.
1) On each edge connecting variable $x_v$ and constraint $\Gamma_c$ in the factor graph, the token $t_{v \to c}$ passed to constraint $\Gamma_c$ and the token $t_{c \to v}$ passed to variable $x_v$ are both random variables, distributed over $(\chi^*)^{\{v\}}$.

2) For any given vertex in the factor graph, all of its incoming random tokens are assumed to be independent.

3) For any given vertex in the factor graph, the outgoing random token sent along any edge is a function of all the incoming random tokens from the upstream, where the functional dependency is precisely that specified in DTP, namely, (14) or (15), depending on whether the vertex is a variable vertex or a function (constraint) vertex.

4) The summary (random) token $t_v$ at each variable vertex $x_v$ is a function of all incoming random tokens, where the functional dependency is precisely that specified in DTP, namely, (16).

Building on this mechanism, we then define each PTP (passed or summary) message as the distribution of the corresponding random token conditioned on the token being well defined (namely, not equal to the empty set). We note that such "conditioning" merely involves a normalization (namely, scaling) of each message so that it sums to 1 over all valid tokens. We will use $\lambda_{v \to c}$ to denote the message sent from $x_v$ to $\Gamma_c$, also referred to as a left message; $\rho_{c \to v}$ to denote the message sent from $\Gamma_c$ to $x_v$, also referred to as a right message; and $\mu_v$ to denote the summary message at variable vertex $x_v$. It is then straightforward to derive the message-update rule of PTP as follows, where the superscript "norm" on a message indicates that the message has been normalized.
PTP Message-Update Rule

$$\lambda_{v \to c}(t_{v \to c}) := \sum_{\substack{\langle t_{b \to v} \rangle_{b \in C(v) \setminus \{c\}}:\\ t_{v \to c} = \bigcap_{b \in C(v) \setminus \{c\}} t_{b \to v}}} \; \prod_{b \in C(v) \setminus \{c\}} \rho^{\mathrm{norm}}_{b \to v}(t_{b \to v}) \qquad (17)$$

$$\rho_{c \to v}(t_{c \to v}) := \sum_{\substack{\langle t_{u \to c} \rangle_{u \in V(c) \setminus \{v\}}:\\ t_{c \to v} = F_c\big(\prod_{u \in V(c) \setminus \{v\}} t_{u \to c}\big)}} \; \prod_{u \in V(c) \setminus \{v\}} \lambda^{\mathrm{norm}}_{u \to c}(t_{u \to c}) \qquad (18)$$

$$\mu_v(t_v) := \sum_{\substack{\langle t_{c \to v} \rangle_{c \in C(v)}:\\ t_v = \bigcap_{c \in C(v)} t_{c \to v}}} \; \prod_{c \in C(v)} \rho^{\mathrm{norm}}_{c \to v}(t_{c \to v}), \qquad (19)$$

and the normalized messages are defined as

$$\lambda^{\mathrm{norm}}_{v \to c}(t_{v \to c}) := \lambda_{v \to c}(t_{v \to c}) \Big/ \sum_{t \in (\chi^*)^{\{v\}}} \lambda_{v \to c}(t) \qquad (20)$$

$$\rho^{\mathrm{norm}}_{c \to v}(t_{c \to v}) := \rho_{c \to v}(t_{c \to v}) \Big/ \sum_{t \in (\chi^*)^{\{v\}}} \rho_{c \to v}(t) \qquad (21)$$

$$\mu^{\mathrm{norm}}_v(t_v) := \mu_v(t_v) \Big/ \sum_{t \in (\chi^*)^{\{v\}}} \mu_v(t). \qquad (22)$$

We note that the update of messages in each PTP iteration proceeds by first computing the un-normalized messages and then computing their normalized versions.

D. SP as PTP

We now show that SP is precisely PTP, using the example of 3-COL problems. Here we note that it is possible (and entails little additional difficulty) to show the equivalence between PTP and the general formulation of non-weighted SP [19] for arbitrary CSPs. However, as we feel it unnecessary to distract the readers with the additional statistical-physics terminology presented in [19], we choose not to repeat the exposition of SP in [19] and only show that SP is PTP for the special case of 3-COL problems.

In the factor graph representing a 3-COL problem, noting that each constraint vertex has degree 2, we will make a slight abuse of notation: for any $(v-c)$ pair, we will use $V(c) \setminus \{v\}$ to also denote the index of the unique other variable vertex (besides $x_v$) connecting to $\Gamma_c$, although $V(c) \setminus \{v\}$ originally refers to the singleton set containing that index. Whether $V(c) \setminus \{v\}$ should be treated as the index of a variable or as the singleton set containing the index should be clear from the context.
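Before specializing to 3-COL, the left-message update (17) can be checked by brute-force enumeration over joint incoming tokens. This is a sketch for a 3-letter alphabet; the incoming right messages are made up and assumed already normalized.

```python
from itertools import combinations, product as cartesian

CHI = (1, 2, 3)
TOKENS = [frozenset(s) for r in (1, 2, 3)
          for s in combinations(CHI, r)]       # chi*: nonempty subsets

def ptp_left(rho_norms):
    """Un-normalized lambda_{v->c} per Eq. (17); rho_norms is a list of
    dicts token -> probability, one per b in C(v) \\ {c}."""
    lam = {t: 0.0 for t in TOKENS}
    for combo in cartesian(TOKENS, repeat=len(rho_norms)):
        t = frozenset.intersection(*combo)
        if t:                                  # skip the empty set
            w = 1.0
            for rho, tb in zip(rho_norms, combo):
                w *= rho[tb]
            lam[t] += w
    return lam
```

With two deterministic incoming messages concentrated on tokens 12 and 23, all mass lands on token 2, as the DTP intuition suggests.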
For notational simplicity, from here on, for every element in the token set $(\chi^*)^{\{v\}}$, when no ambiguity results we will suppress the subscript indicating the coordinate of the element. For example, we will write $12_v$ as 12 when the subscript can be recovered from the context. Additionally, we will use $i$, $j$ and $k$ to denote the three distinct colors 1, 2 and 3 in the 3-COL problem, so that token $i$ can refer to any token that is a singleton set, token $ij$ can refer to any token that contains a pair of assignments, and token $ijk$ refers to the token containing all three assignments. Using these notations, the PTP message-update rule for 3-COL problems can be easily derived; it is presented in the following lemma.

Lemma 3: For 3-COL problems, the PTP message-update rule is:

$$\lambda_{v \to c}(i) := \prod_{b \in C(v) \setminus \{c\}} \big(\rho^{\mathrm{norm}}_{b \to v}(ij) + \rho^{\mathrm{norm}}_{b \to v}(ik) + \rho^{\mathrm{norm}}_{b \to v}(ijk)\big) - \prod_{b \in C(v) \setminus \{c\}} \big(\rho^{\mathrm{norm}}_{b \to v}(ij) + \rho^{\mathrm{norm}}_{b \to v}(ijk)\big) - \prod_{b \in C(v) \setminus \{c\}} \big(\rho^{\mathrm{norm}}_{b \to v}(ik) + \rho^{\mathrm{norm}}_{b \to v}(ijk)\big) + \prod_{b \in C(v) \setminus \{c\}} \rho^{\mathrm{norm}}_{b \to v}(ijk) \qquad (23)$$

$$\lambda_{v \to c}(ij) := \prod_{b \in C(v) \setminus \{c\}} \big(\rho^{\mathrm{norm}}_{b \to v}(ij) + \rho^{\mathrm{norm}}_{b \to v}(ijk)\big) - \prod_{b \in C(v) \setminus \{c\}} \rho^{\mathrm{norm}}_{b \to v}(ijk) \qquad (24)$$

$$\lambda_{v \to c}(ijk) := \prod_{b \in C(v) \setminus \{c\}} \rho^{\mathrm{norm}}_{b \to v}(ijk) \qquad (25)$$

$$\rho_{c \to v}(ij) := \lambda^{\mathrm{norm}}_{V(c) \setminus \{v\} \to c}(k) \qquad (26)$$

$$\rho_{c \to v}(ijk) := \lambda^{\mathrm{norm}}_{V(c) \setminus \{v\} \to c}(ij) + \lambda^{\mathrm{norm}}_{V(c) \setminus \{v\} \to c}(ik) + \lambda^{\mathrm{norm}}_{V(c) \setminus \{v\} \to c}(jk) + \lambda^{\mathrm{norm}}_{V(c) \setminus \{v\} \to c}(ijk) \qquad (27)$$

$$\mu_v(i) := \prod_{c \in C(v)} \big(\rho^{\mathrm{norm}}_{c \to v}(ij) + \rho^{\mathrm{norm}}_{c \to v}(ik) + \rho^{\mathrm{norm}}_{c \to v}(ijk)\big) - \prod_{c \in C(v)} \big(\rho^{\mathrm{norm}}_{c \to v}(ij) + \rho^{\mathrm{norm}}_{c \to v}(ijk)\big) - \prod_{c \in C(v)} \big(\rho^{\mathrm{norm}}_{c \to v}(ik) + \rho^{\mathrm{norm}}_{c \to v}(ijk)\big) + \prod_{c \in C(v)} \rho^{\mathrm{norm}}_{c \to v}(ijk) \qquad (28)$$

$$\mu_v(ij) := \prod_{c \in C(v)} \big(\rho^{\mathrm{norm}}_{c \to v}(ij) + \rho^{\mathrm{norm}}_{c \to v}(ijk)\big) - \prod_{c \in C(v)} \rho^{\mathrm{norm}}_{c \to v}(ijk) \qquad (29)$$

$$\mu_v(ijk) := \prod_{c \in C(v)} \rho^{\mathrm{norm}}_{c \to v}(ijk). \qquad (30)$$
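Equations (23) to (25) can be verified numerically against the generic rule (17): for 3-COL the right messages are supported on the pair tokens and token 123 only (cf. (26) and (27)), and the inclusion-exclusion formulas reproduce the brute-force intersection probabilities. The uniform incoming messages below are made up for the check.

```python
from itertools import product as cartesian
from math import prod

FULL = frozenset((1, 2, 3))
PAIRS = [frozenset(p) for p in ((1, 2), (1, 3), (2, 3))]
SUPPORT = PAIRS + [FULL]      # right messages live on tokens ij and ijk

def lam_23_25(rhos, i, j, k):
    """Eqs. (23)-(25), with (i, j, k) a permutation of the colors."""
    ij, ik = frozenset((i, j)), frozenset((i, k))
    l_i = (prod(r[ij] + r[ik] + r[FULL] for r in rhos)
           - prod(r[ij] + r[FULL] for r in rhos)
           - prod(r[ik] + r[FULL] for r in rhos)
           + prod(r[FULL] for r in rhos))
    l_ij = prod(r[ij] + r[FULL] for r in rhos) - prod(r[FULL] for r in rhos)
    l_ijk = prod(r[FULL] for r in rhos)
    return l_i, l_ij, l_ijk

rhos = [{t: 0.25 for t in SUPPORT}] * 2   # two uniform incoming messages
# brute force per Eq. (17): P(intersection of incoming tokens equals t)
brute = {}
for a, b in cartesian(SUPPORT, repeat=2):
    brute[a & b] = brute.get(a & b, 0.0) + rhos[0][a] * rhos[1][b]
l_i, l_ij, l_ijk = lam_23_25(rhos, 1, 2, 3)
```

Here $\lambda(1) = 1/8$, $\lambda(12) = 3/16$ and $\lambda(123) = 1/16$ under the uniform messages, matching the enumeration.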
It is then possible to relate the PTP messages and the (non-weighted) SP messages for 3-COL problems and show their equivalence.

Theorem 1: For 3-COL problems, the correspondence between the SP and PTP message-update rules is

$$\eta^i_{u \to v} \leftrightarrow \lambda^{\mathrm{norm}}_{u \to \{u,v\}}(i)$$
$$\eta^*_{u \to v} \leftrightarrow 1 - \sum_{i=1,2,3} \lambda^{\mathrm{norm}}_{u \to \{u,v\}}(i) = \lambda^{\mathrm{norm}}_{u \to \{u,v\}}(ij) + \lambda^{\mathrm{norm}}_{u \to \{u,v\}}(ik) + \lambda^{\mathrm{norm}}_{u \to \{u,v\}}(jk) + \lambda^{\mathrm{norm}}_{u \to \{u,v\}}(ijk)$$
$$\eta^i_u \leftrightarrow \mu^{\mathrm{norm}}_u(i)$$
$$\eta^*_u \leftrightarrow 1 - \sum_{i=1,2,3} \mu^{\mathrm{norm}}_u(i) \qquad (31)$$

Proof: First we identify $c$ in the subscript of $\lambda^{\mathrm{norm}}_{u \to c}$ with $\{u, v\}$, in which $v$ indexes the destination vertex in the subscript of $\eta_{u \to v}$. For any $c = \{u, v\}$, let

$$\alpha_{u,v} = \lambda^{\mathrm{norm}}_{u \to c}(ij) + \lambda^{\mathrm{norm}}_{u \to c}(ik) + \lambda^{\mathrm{norm}}_{u \to c}(jk) + \lambda^{\mathrm{norm}}_{u \to c}(ijk).$$

When applying PTP update equations (26) and (27) to equations (23) to (25) and re-writing the update rule in terms of left messages only, the un-normalized left messages are updated as follows:

$$\lambda_{u \to c}(i) = \prod_{b \in C(u) \setminus \{c\}} \big(1 - \lambda^{\mathrm{norm}}_{V(b) \setminus \{u\} \to b}(i)\big) - \prod_{b \in C(u) \setminus \{c\}} \big(\lambda^{\mathrm{norm}}_{V(b) \setminus \{u\} \to b}(j) + \alpha_{V(b) \setminus \{u\}, u}\big) - \prod_{b \in C(u) \setminus \{c\}} \big(\lambda^{\mathrm{norm}}_{V(b) \setminus \{u\} \to b}(k) + \alpha_{V(b) \setminus \{u\}, u}\big) + \prod_{b \in C(u) \setminus \{c\}} \alpha_{V(b) \setminus \{u\}, u} \qquad (32)$$

$$\lambda_{u \to c}(ij) = \prod_{b \in C(u) \setminus \{c\}} \big(\lambda^{\mathrm{norm}}_{V(b) \setminus \{u\} \to b}(k) + \alpha_{V(b) \setminus \{u\}, u}\big) - \prod_{b \in C(u) \setminus \{c\}} \alpha_{V(b) \setminus \{u\}, u} \qquad (33)$$

$$\lambda_{u \to c}(ijk) = \prod_{b \in C(u) \setminus \{c\}} \alpha_{V(b) \setminus \{u\}, u}. \qquad (34)$$
After normalization, we have

$$\lambda^{\mathrm{norm}}_{u \to c}(i) = \frac{1}{\beta} \cdot \Big[\prod_{b \in C(u) \setminus \{c\}} \big(1 - \lambda^{\mathrm{norm}}_{V(b) \setminus \{u\} \to b}(i)\big) - \prod_{b \in C(u) \setminus \{c\}} \big(\lambda^{\mathrm{norm}}_{V(b) \setminus \{u\} \to b}(j) + \alpha_{V(b) \setminus \{u\}, u}\big) - \prod_{b \in C(u) \setminus \{c\}} \big(\lambda^{\mathrm{norm}}_{V(b) \setminus \{u\} \to b}(k) + \alpha_{V(b) \setminus \{u\}, u}\big) + \prod_{b \in C(u) \setminus \{c\}} \alpha_{V(b) \setminus \{u\}, u}\Big] \qquad (35)$$

$$\lambda^{\mathrm{norm}}_{u \to c}(ij) = \frac{1}{\beta} \cdot \Big[\prod_{b \in C(u) \setminus \{c\}} \big(\lambda^{\mathrm{norm}}_{V(b) \setminus \{u\} \to b}(k) + \alpha_{V(b) \setminus \{u\}, u}\big) - \prod_{b \in C(u) \setminus \{c\}} \alpha_{V(b) \setminus \{u\}, u}\Big] \qquad (36)$$

$$\lambda^{\mathrm{norm}}_{u \to c}(ijk) = \frac{1}{\beta} \cdot \prod_{b \in C(u) \setminus \{c\}} \alpha_{V(b) \setminus \{u\}, u}, \qquad (37)$$

where $\beta := \sum_{t \in (\chi^*)^{\{u\}}} \lambda_{u \to c}(t)$. It is easy to see that

$$\beta = \sum_{i=1,2,3} \prod_{b \in C(u) \setminus \{c\}} \big(1 - \lambda^{\mathrm{norm}}_{V(b) \setminus \{u\} \to b}(i)\big) - \sum_{i=1,2,3} \prod_{b \in C(u) \setminus \{c\}} \big(\lambda^{\mathrm{norm}}_{V(b) \setminus \{u\} \to b}(i) + \alpha_{V(b) \setminus \{u\}, u}\big) + \prod_{b \in C(u) \setminus \{c\}} \alpha_{V(b) \setminus \{u\}, u}.$$

For any $c = \{u, v\}$, it is clear that when identifying $\lambda^{\mathrm{norm}}_{u \to c}(i)$ with $\eta^i_{u \to v}$ and identifying $\alpha_{\{u,v\}} = 1 - \sum_{i=1,2,3} \lambda^{\mathrm{norm}}_{u \to \{u,v\}}(i)$ with $\eta^*_{u \to v}$, the update rule for passed message $(\eta^1_{u \to v}, \eta^2_{u \to v}, \eta^3_{u \to v}, \eta^*_{u \to v})$ in SP results.

To prove the equivalence of the PTP and SP summary messages, we follow the same procedure as for the left messages. When applying message-update equations (26) and (27) to equations (28) to (30) and re-writing the summary messages in terms of left messages, the PTP summary messages are updated as follows:

$$\mu_u(i) = \prod_{c \in C(u)} \big(1 - \lambda^{\mathrm{norm}}_{V(c) \setminus \{u\} \to c}(i)\big) - \prod_{c \in C(u)} \big(\lambda^{\mathrm{norm}}_{V(c) \setminus \{u\} \to c}(j) + \alpha_{V(c) \setminus \{u\}, u}\big) - \prod_{c \in C(u)} \big(\lambda^{\mathrm{norm}}_{V(c) \setminus \{u\} \to c}(k) + \alpha_{V(c) \setminus \{u\}, u}\big) + \prod_{c \in C(u)} \alpha_{V(c) \setminus \{u\}, u} \qquad (38)$$

$$\mu_u(ij) = \prod_{c \in C(u)} \big(\lambda^{\mathrm{norm}}_{V(c) \setminus \{u\} \to c}(k) + \alpha_{V(c) \setminus \{u\}, u}\big) - \prod_{c \in C(u)} \alpha_{V(c) \setminus \{u\}, u} \qquad (39)$$

$$\mu_u(ijk) = \prod_{c \in C(u)} \alpha_{V(c) \setminus \{u\}, u}. \qquad (40)$$
After normalization, we have

$$\mu^{\mathrm{norm}}_u(i) = \frac{1}{\beta'} \cdot \Big[\prod_{c \in C(u)} \big(1 - \lambda^{\mathrm{norm}}_{V(c) \setminus \{u\} \to c}(i)\big) - \prod_{c \in C(u)} \big(\lambda^{\mathrm{norm}}_{V(c) \setminus \{u\} \to c}(j) + \alpha_{V(c) \setminus \{u\}, u}\big) - \prod_{c \in C(u)} \big(\lambda^{\mathrm{norm}}_{V(c) \setminus \{u\} \to c}(k) + \alpha_{V(c) \setminus \{u\}, u}\big) + \prod_{c \in C(u)} \alpha_{V(c) \setminus \{u\}, u}\Big] \qquad (41)$$

$$\mu^{\mathrm{norm}}_u(ij) = \frac{1}{\beta'} \cdot \Big[\prod_{c \in C(u)} \big(\lambda^{\mathrm{norm}}_{V(c) \setminus \{u\} \to c}(k) + \alpha_{V(c) \setminus \{u\}, u}\big) - \prod_{c \in C(u)} \alpha_{V(c) \setminus \{u\}, u}\Big] \qquad (42)$$

$$\mu^{\mathrm{norm}}_u(ijk) = \frac{1}{\beta'} \cdot \prod_{c \in C(u)} \alpha_{V(c) \setminus \{u\}, u}, \qquad (43)$$

where $\beta' := \sum_{t \in (\chi^*)^{\{u\}}} \mu_u(t)$. It is easy to show that

$$\beta' = \sum_{i=1,2,3} \prod_{c \in C(u)} \big(1 - \lambda^{\mathrm{norm}}_{V(c) \setminus \{u\} \to c}(i)\big) - \sum_{i=1,2,3} \prod_{c \in C(u)} \big(\lambda^{\mathrm{norm}}_{V(c) \setminus \{u\} \to c}(i) + \alpha_{V(c) \setminus \{u\}, u}\big) + \prod_{c \in C(u)} \alpha_{V(c) \setminus \{u\}, u}.$$

For any $u \in V$, it is clear that when identifying $\mu^{\mathrm{norm}}_u(i)$ with $\eta^i_u$ and identifying $1 - \sum_{i=1,2,3} \mu^{\mathrm{norm}}_u(i)$ with $\eta^*_u$, the update rule for summary message $(\eta^1_u, \eta^2_u, \eta^3_u, \eta^*_u)$ in SP results.

This theorem shows that for 3-COL problems, SP is PTP. Similar results can be shown for $k$-SAT problems; instead of showing this result, we will, in a later section, show a more general result, namely that weighted SP is weighted PTP for $k$-SAT problems. It should then be convincing that the general principle for designing SP algorithms for arbitrary CSPs is the recipe specified in the PTP message-update rule.

In the correspondence between SP and PTP for 3-COL problems established in this theorem, it is worth noting that symbol $i$ in the SP messages corresponds to the singleton token $i$ that contains the single element $i$, and symbol $*$ in the SP messages corresponds to the group of all non-singleton tokens. We note that the fact that all non-singleton tokens can be represented by a single symbol $*$ is rather a coincidence, intrinsically related to the structure of 3-COL problems, and should not be understood as a general principle.
Specifically, for 3-COL problems each constraint vertex has degree 2, and as long as a non-singleton token is passed to a constraint vertex, the outgoing token from the constraint vertex will be token 123. It is precisely due to this fact that all non-singleton tokens can be represented by the same symbol, the joker symbol $*$, as it is conventionally termed. This observation then implies that for general CSPs with non-binary alphabets, SP, or equivalently PTP, may be expected to contain more than one "joker" symbol, each corresponding to one or several non-singleton tokens. In other words, this suggests that the notion of a "joker" symbol in SP messages is not a fundamental one, and that the truly fundamental perspective on SP is the extension of the variable alphabet to its power set with the empty set excluded, or equivalently, via a one-to-one correspondence, to the set of all tokens associated with the variable.

Finally, we remark that there can be a caveat on whether SP and PTP are exactly equivalent when taking into account the decimation procedure associated with the SP algorithms. Specifically, we note that decimation is performed based on summary messages in SP. For 3-COL problems, each SP summary message contains "biases" on four different symbols, but each PTP summary message contains "biases" on seven different tokens. The natural decimation procedure for PTP is then to fix one "highly biased" variable to one of the seven tokens, rather than to one of the four symbols. Although it is not clear at this point whether this finer procedure may provide gains in algorithm performance, it nevertheless suggests that PTP is slightly more general than SP. Investigation of the possible benefit of this slight generality can be an interesting direction of research.
E. Weighted PTP

In the mechanism of passing random tokens that underlies the PTP message-passing rule, the outgoing token sent from a variable vertex is a function of all incoming tokens from its upstream. A natural angle from which to generalize the dependency of these outgoing tokens on the incoming tokens is to relax this functional dependency to a probabilistic dependency. Specifically, using the "intention/command" analogy, this probabilistic dependency allows the intention of a variable, conditioned on all incoming commands from the upstream, to take any set of values (not necessarily the maximal set) that obeys the commands, and the probabilistic dependency is specified via the probability of each allowed intention. This results in what we call weighted PTP.

In weighted PTP, we assume that the token $t_{v \to c}$ passed from variable vertex $x_v$ to constraint vertex $\Gamma_c$ may be any subset of the intersection of all incoming tokens passed to $x_v$ except that passed from $\Gamma_c$, and the probability that token $t_{v \to c}$ equals each such subset is specified via a non-negative function $\omega_v(a \mid b)$ defined on $(\chi^*)^{\{v\}} \times \big((\chi^*)^{\{v\}} \cup \{\emptyset_v\}\big)$ for each $v \in V$. We will restrict $\omega_v(a \mid b)$ to an obedience conditional on $(\chi^*)^{\{v\}}$, the definition of which is given as follows.

Definition 1 (Obedience Conditional): A non-negative function $h(a \mid b)$ on $(\chi^*)^{\{v\}} \times \big((\chi^*)^{\{v\}} \cup \{\emptyset_v\}\big)$ is said to be an obedience conditional on $(\chi^*)^{\{v\}}$ if $h(a \mid \emptyset_v) = 0$ for all $a \in (\chi^*)^{\{v\}}$, and $h(a \mid b) = 0$ for any $a, b \in (\chi^*)^{\{v\}}$ with $a \not\subseteq b$.

First we note that in the definition, variable $a$ in $h(\cdot)$ is intended to refer to an "intention", variable $b$ is intended to refer to a "command", and function $h$ evaluates to zero if the command is null or if the intention does not obey the command. This is the reason for which we name such a function an "obedience" conditional.
Second, it is also worth noting that an obedience conditional $h$ as defined above is not a true conditional distribution, since it is not the case that $\sum_a h(a \mid b) = 1$ for all $b$. However, it is a minor technicality to modify the definition of $h$ (without impacting the development of any result in this paper) so that it is indeed a conditional distribution (footnote 6). Thus, for the purpose of this paper, one may always regard an obedience conditional as a conditional distribution of an intention given a command.

Apparently, the function $[a = b]$ is a special case of an obedience conditional, characterizing a special functional dependency of intention $a$ on command $b$, namely, that the intention set $a$ is exactly the command set $b$.

We now give the precise message-update rule of weighted PTP, where the only difference from PTP is in the left messages and summary messages.

Weighted PTP Message-Update Rule

$$\lambda_{v \to c}(t_{v \to c}) := \sum_{\langle t_{b \to v} \rangle_{b \in C(v) \setminus \{c\}}} \omega_v\Big(t_{v \to c} \,\Big|\, \bigcap_{b \in C(v) \setminus \{c\}} t_{b \to v}\Big) \prod_{b \in C(v) \setminus \{c\}} \rho^{\mathrm{norm}}_{b \to v}(t_{b \to v}) \qquad (44)$$

$$\rho_{c \to v}(t_{c \to v}) := \sum_{\substack{\langle t_{u \to c} \rangle_{u \in V(c) \setminus \{v\}}:\\ t_{c \to v} = F_c\big(\prod_{u \in V(c) \setminus \{v\}} t_{u \to c}\big)}} \; \prod_{u \in V(c) \setminus \{v\}} \lambda^{\mathrm{norm}}_{u \to c}(t_{u \to c}) \qquad (45)$$

$$\mu_v(t_v) := \sum_{\langle t_{c \to v} \rangle_{c \in C(v)}} \omega_v\Big(t_v \,\Big|\, \bigcap_{c \in C(v)} t_{c \to v}\Big) \prod_{c \in C(v)} \rho^{\mathrm{norm}}_{c \to v}(t_{c \to v}), \qquad (46)$$

and the normalized messages are defined as

$$\lambda^{\mathrm{norm}}_{v \to c}(t_{v \to c}) := \lambda_{v \to c}(t_{v \to c}) \Big/ \sum_{t \in (\chi^*)^{\{v\}}} \lambda_{v \to c}(t) \qquad (47)$$

$$\rho^{\mathrm{norm}}_{c \to v}(t_{c \to v}) := \rho_{c \to v}(t_{c \to v}) \Big/ \sum_{t \in (\chi^*)^{\{v\}}} \rho_{c \to v}(t) \qquad (48)$$

$$\mu^{\mathrm{norm}}_v(t_v) := \mu_v(t_v) \Big/ \sum_{t \in (\chi^*)^{\{v\}}} \mu_v(t). \qquad (49)$$

Footnote 6: Given an obedience conditional $h$, we may define a conditional distribution $\tilde{h}(a \mid b)$. Let $Z$ be $\max_{b \in (\chi^*)^{\{v\}}} \sum_{a \in (\chi^*)^{\{v\}}} h(a \mid b)$.
Let non-negati ve function ˜ h ( a | b ) on “ ( χ ∗ ) { v } ∪ {∅ v } ” × “ ( χ ∗ ) { v } ∪ {∅ v } ” be defined as follows: ˜ h ( a |∅ v ) := [ a = ∅ v ] ; ˜ h ( ∅ v | b ) := 1 − P a ∈ ( χ ∗ ) { v } h ( a | b ) / Z for all b 6 = ∅ v ; and for all other ( a, b ) , ˜ h ( a | b ) := h ( a | b ) / Z . It is easy to see that ˜ h ( a | b ) is a conditional distribution. Since ev entually we wil l condition on that a 6 = ∅ , it is straight-forward to verify that the role of h is equi valen t to ˜ h . 30 It is easily seen that weighted PTP is a family of algorithms, p arametrized by a coll ection of obedience conditionals , { ω v : v ∈ V } , each for a coordinate. The fact that conditi onal distribution ω v ( a | b ) generalizes indicator fun ction [ a = b ] im mediately impl ies that weighted PTP g eneralizes PTP , as stated in the following lemma. Lemma 4: If ω v ( a | b ) := [ a = b ] for all v ∈ V , th en weighted PTP is PTP . F . W eighted PTP Generalizes W eighted SP Now we will show t hat the weighted SP de veloped for k -SA T problems [15 ] is a special case of weighted PTP . That is, for k -SA T probl ems, when setting functions { ω v : v ∈ V } i n weig hted PTP to a p articular form, weighted SP , or SP ∗ ( γ ) i s resul ted. For a k -SA T problem, let functio n ω v ( a | b ) for ever y v ∈ V in weig hted PTP b e defined via a single real nu mber γ ∈ [0 , 1 ] as follows. ω v ( a | b ) := γ , if a = b = 01 1 − γ , if a ⊂ b = 01 1 , if a = b 6 = 01 0 , otherwise (50) Lemma 5: Let { ω v : v ∈ V } in k -SA T be defined as in (50). 
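The weighting function (50) is easy to write down directly; the following sketch (our own illustration, with tokens encoded as frozensets) also makes two earlier remarks concrete: at $\gamma=1$ it collapses to the indicator $[a=b]$ of Lemma 4, and given the command $\mathbf{01}$ its values sum to $2-\gamma$, so an obedience conditional need not normalize to one.

```python
# Tokens over the binary k-SAT alphabet, as frozensets.
ZERO, ONE, BOTH = frozenset({0}), frozenset({1}), frozenset({0, 1})

def make_omega(gamma):
    """The weighting function omega_v(a | b) of (50), for gamma in [0, 1]."""
    def omega(a, b):
        if a == b == BOTH:
            return gamma
        if a < b == BOTH:        # a is a proper subset of the command 01
            return 1.0 - gamma
        if a == b != BOTH:
            return 1.0
        return 0.0               # includes the null command b = frozenset()
    return omega
```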
The message-update rule of weighted PTP is then:

$$\lambda_{v\to c}(\mathbf{0}) := \prod_{b\in C(v)\setminus\{c\}}\big(\rho^{\rm norm}_{b\to v}(\mathbf{0})+\rho^{\rm norm}_{b\to v}(\mathbf{01})\big)-\gamma\prod_{b\in C(v)\setminus\{c\}}\rho^{\rm norm}_{b\to v}(\mathbf{01}) \qquad(51)$$

$$\lambda_{v\to c}(\mathbf{1}) := \prod_{b\in C(v)\setminus\{c\}}\big(\rho^{\rm norm}_{b\to v}(\mathbf{1})+\rho^{\rm norm}_{b\to v}(\mathbf{01})\big)-\gamma\prod_{b\in C(v)\setminus\{c\}}\rho^{\rm norm}_{b\to v}(\mathbf{01}) \qquad(52)$$

$$\lambda_{v\to c}(\mathbf{01}) := \gamma\prod_{b\in C(v)\setminus\{c\}}\rho^{\rm norm}_{b\to v}(\mathbf{01}) \qquad(53)$$

$$\rho_{c\to v}(\mathbf{0}) := [L_{v,c}=0]\cdot\prod_{u\in V(c)\setminus\{v\}:\,L_{u,c}=1}\lambda^{\rm norm}_{u\to c}(\mathbf{0})\cdot\prod_{u\in V(c)\setminus\{v\}:\,L_{u,c}=0}\lambda^{\rm norm}_{u\to c}(\mathbf{1}) \qquad(54)$$

$$\rho_{c\to v}(\mathbf{1}) := [L_{v,c}=1]\cdot\prod_{u\in V(c)\setminus\{v\}:\,L_{u,c}=1}\lambda^{\rm norm}_{u\to c}(\mathbf{0})\cdot\prod_{u\in V(c)\setminus\{v\}:\,L_{u,c}=0}\lambda^{\rm norm}_{u\to c}(\mathbf{1}) \qquad(55)$$

$$\rho_{c\to v}(\mathbf{01}) := 1-\prod_{u\in V(c)\setminus\{v\}:\,L_{u,c}=1}\lambda^{\rm norm}_{u\to c}(\mathbf{0})\cdot\prod_{u\in V(c)\setminus\{v\}:\,L_{u,c}=0}\lambda^{\rm norm}_{u\to c}(\mathbf{1}) \qquad(56)$$

$$\mu_v(\mathbf{0}) := \prod_{c\in C(v)}\big(\rho^{\rm norm}_{c\to v}(\mathbf{0})+\rho^{\rm norm}_{c\to v}(\mathbf{01})\big)-\gamma\prod_{c\in C(v)}\rho^{\rm norm}_{c\to v}(\mathbf{01}) \qquad(57)$$

$$\mu_v(\mathbf{1}) := \prod_{c\in C(v)}\big(\rho^{\rm norm}_{c\to v}(\mathbf{1})+\rho^{\rm norm}_{c\to v}(\mathbf{01})\big)-\gamma\prod_{c\in C(v)}\rho^{\rm norm}_{c\to v}(\mathbf{01}) \qquad(58)$$

$$\mu_v(\mathbf{01}) := \gamma\prod_{c\in C(v)}\rho^{\rm norm}_{c\to v}(\mathbf{01}). \qquad(59)$$

Proof: These update equations are immediately obtained from the weighted PTP message-update equations (44) to (46), where (56) follows from

$$\rho_{c\to v}(\mathbf{01}) = \prod_{u\in V(c)\setminus\{v\}}\big(\lambda^{\rm norm}_{u\to c}(\mathbf{0})+\lambda^{\rm norm}_{u\to c}(\mathbf{1})+\lambda^{\rm norm}_{u\to c}(\mathbf{01})\big)-\prod_{u:\,L_{u,c}=1}\lambda^{\rm norm}_{u\to c}(\mathbf{0})\cdot\prod_{u:\,L_{u,c}=0}\lambda^{\rm norm}_{u\to c}(\mathbf{1}) = 1-\prod_{u:\,L_{u,c}=1}\lambda^{\rm norm}_{u\to c}(\mathbf{0})\cdot\prod_{u:\,L_{u,c}=0}\lambda^{\rm norm}_{u\to c}(\mathbf{1}).$$

Theorem 2: Let $\{\omega_v : v\in V\}$ in a $k$-SAT problem be defined as in (50). Denote by $(\Pi^{s,\rm norm}_{v\to c},\Pi^{u,\rm norm}_{v\to c},\Pi^{*,\rm norm}_{v\to c})$ the normalized version of the SP message $(\Pi^s_{v\to c},\Pi^u_{v\to c},\Pi^*_{v\to c})$, namely $\Pi^{s,\rm norm}_{v\to c}=\Pi^s_{v\to c}/(\Pi^s_{v\to c}+\Pi^u_{v\to c}+\Pi^*_{v\to c})$, $\Pi^{u,\rm norm}_{v\to c}=\Pi^u_{v\to c}/(\Pi^s_{v\to c}+\Pi^u_{v\to c}+\Pi^*_{v\to c})$, and $\Pi^{*,\rm norm}_{v\to c}=\Pi^*_{v\to c}/(\Pi^s_{v\to c}+\Pi^u_{v\to c}+\Pi^*_{v\to c})$.
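The right update (54) to (56) is a one-liner per token. A minimal sketch on a toy clause neighborhood (the dict-based token and label encodings are our own assumption); note that when the incoming left messages are normalized, the three outgoing values sum to one, a fact used in the proof of Theorem 2:

```python
# Right-message update (54)-(56) for a k-SAT clause c toward variable v.
def rho_update_ksat(lam, L, Lvc):
    """lam[u]: normalized left message of neighbor u in V(c)\\{v}, as a dict
    over tokens '0', '1', '01'; L[u]: edge label L_{u,c}; Lvc: label L_{v,c}."""
    p = 1.0
    for u in lam:                         # product over u in V(c) \ {v}
        p *= lam[u]['0'] if L[u] == 1 else lam[u]['1']
    return {'0': (1.0 if Lvc == 0 else 0.0) * p,
            '1': (1.0 if Lvc == 1 else 0.0) * p,
            '01': 1.0 - p}
```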
Then the correspondence between the SP$^*(\gamma)$ message-update rule and the weighted PTP message-update rule is

$$\Pi^{s,\rm norm}_{v\to c} \leftrightarrow [L_{v,c}=0]\cdot\lambda^{\rm norm}_{v\to c}(\mathbf{0})+[L_{v,c}=1]\cdot\lambda^{\rm norm}_{v\to c}(\mathbf{1}) \qquad(60)$$
$$\Pi^{u,\rm norm}_{v\to c} \leftrightarrow [L_{v,c}=0]\cdot\lambda^{\rm norm}_{v\to c}(\mathbf{1})+[L_{v,c}=1]\cdot\lambda^{\rm norm}_{v\to c}(\mathbf{0}) \qquad(61)$$
$$\Pi^{*,\rm norm}_{v\to c} \leftrightarrow \lambda^{\rm norm}_{v\to c}(\mathbf{01}) \qquad(62)$$
$$\eta_{c\to v} \leftrightarrow \rho^{\rm norm}_{c\to v}(\mathbf{0})+\rho^{\rm norm}_{c\to v}(\mathbf{1}) \qquad(63)$$
$$\zeta^0_v \leftrightarrow \mu_v(\mathbf{0}) \qquad(64)$$
$$\zeta^1_v \leftrightarrow \mu_v(\mathbf{1}) \qquad(65)$$
$$\zeta^*_v \leftrightarrow \mu_v(\mathbf{01}). \qquad(66)$$

Prior to proving the theorem, we introduce some notation and a simple lemma that will be useful in the proof. For any neighboring variable vertex $x_v$ and constraint vertex $\Gamma_c$, we denote by $\mathbf{L}_{v,c}$ the singleton token containing the single elementary assignment that assigns coordinate $v$ the edge label $L_{v,c}$. Similarly, we denote by $\bar{\mathbf{L}}_{v,c}$ the singleton token containing the single elementary assignment that assigns coordinate $v$ the negated edge label $\bar L_{v,c}$. With these notations, the following lemma immediately follows from Lemma 5.

Lemma 6: For any $(v-c)$ pair in a $k$-SAT problem, the right message $\rho^{\rm norm}_{c\to v}$ satisfies:

$$\rho^{\rm norm}_{c\to v}(\mathbf{L}_{v,c})+\rho^{\rm norm}_{c\to v}(\mathbf{01}) = 1 \qquad(67)$$
$$\rho^{\rm norm}_{c\to v}(\bar{\mathbf{L}}_{v,c})+\rho^{\rm norm}_{c\to v}(\mathbf{01}) = \rho^{\rm norm}_{c\to v}(\mathbf{01}). \qquad(68)$$

Now we are ready to prove Theorem 2.

Proof: We will refer to the message correspondence in equations (60) to (62) as the "left correspondence", the correspondence in (63) as the "right correspondence", and the correspondence in equations (64) to (66) as the "summary correspondence". We will prove the theorem by first showing that if the left correspondence holds then the right correspondence holds, and conversely that if the right correspondence holds then the left correspondence holds. This proves the correspondence between SP$^*(\gamma)$ and weighted PTP in their passed messages. We will then complete the proof by showing the summary correspondence.
First suppose that the left correspondence holds, namely that $\Pi^{s,\rm norm}_{v\to c}=[L_{v,c}=0]\cdot\lambda^{\rm norm}_{v\to c}(\mathbf{0})+[L_{v,c}=1]\cdot\lambda^{\rm norm}_{v\to c}(\mathbf{1})$, $\Pi^{u,\rm norm}_{v\to c}=[L_{v,c}=0]\cdot\lambda^{\rm norm}_{v\to c}(\mathbf{1})+[L_{v,c}=1]\cdot\lambda^{\rm norm}_{v\to c}(\mathbf{0})$, and $\Pi^{*,\rm norm}_{v\to c}=\lambda^{\rm norm}_{v\to c}(\mathbf{01})$. In each iteration, by Lemma 5 and the fact that $[L_{v,c}=1]+[L_{v,c}=0]=1$ for every $(v-c)$ pair, the right messages satisfy (all products being over $u\in V(c)\setminus\{v\}$)

$$\rho_{c\to v}(\mathbf{0})+\rho_{c\to v}(\mathbf{1})+\rho_{c\to v}(\mathbf{01}) = [L_{v,c}=0]\cdot\prod_{u:\,L_{u,c}=1}\lambda^{\rm norm}_{u\to c}(\mathbf{0})\prod_{u:\,L_{u,c}=0}\lambda^{\rm norm}_{u\to c}(\mathbf{1}) + [L_{v,c}=1]\cdot\prod_{u:\,L_{u,c}=1}\lambda^{\rm norm}_{u\to c}(\mathbf{0})\prod_{u:\,L_{u,c}=0}\lambda^{\rm norm}_{u\to c}(\mathbf{1}) + 1 - \prod_{u:\,L_{u,c}=1}\lambda^{\rm norm}_{u\to c}(\mathbf{0})\prod_{u:\,L_{u,c}=0}\lambda^{\rm norm}_{u\to c}(\mathbf{1}) = 1.$$

That is, each right message $\rho_{c\to v}$ is already normalized, i.e. $\rho_{c\to v}=\rho^{\rm norm}_{c\to v}$. Then

$$\begin{aligned}
\rho^{\rm norm}_{c\to v}(\mathbf{0})+\rho^{\rm norm}_{c\to v}(\mathbf{1}) &= \rho_{c\to v}(\mathbf{0})+\rho_{c\to v}(\mathbf{1}) \\
&= \prod_{u:\,L_{u,c}=1}\lambda^{\rm norm}_{u\to c}(\mathbf{0})\prod_{u:\,L_{u,c}=0}\lambda^{\rm norm}_{u\to c}(\mathbf{1}) \\
&= \prod_{u\in V(c)\setminus\{v\}}\big([L_{u,c}=1]\cdot\lambda^{\rm norm}_{u\to c}(\mathbf{0})+[L_{u,c}=0]\cdot\lambda^{\rm norm}_{u\to c}(\mathbf{1})\big) \\
&\overset{(a)}{=} \prod_{u\in V(c)\setminus\{v\}}\Pi^{u,\rm norm}_{u\to c}
\overset{(b)}{=} \prod_{u\in V(c)\setminus\{v\}}\frac{\Pi^u_{u\to c}}{\Pi^u_{u\to c}+\Pi^s_{u\to c}+\Pi^*_{u\to c}} = \eta_{c\to v},
\end{aligned}$$

where equality $(a)$ is due to the assumed left correspondence, and equality $(b)$ follows from the definition of $\Pi^{u,\rm norm}_{u\to c}$.
Thus we have shown that if the left correspondence holds, then the right correspondence holds.

Now suppose that the right correspondence holds, namely that $\eta_{c\to v}=\rho^{\rm norm}_{c\to v}(\mathbf{0})+\rho^{\rm norm}_{c\to v}(\mathbf{1})$ for every $(v-c)$ pair. Following the PTP message-update equations (51) to (53), we have

$$\begin{aligned}
&[L_{v,c}=0]\cdot\lambda_{v\to c}(\mathbf{0})+[L_{v,c}=1]\cdot\lambda_{v\to c}(\mathbf{1}) \\
&= [L_{v,c}=0]\cdot\prod_{b\in C(v)\setminus\{c\}}\big(\rho^{\rm norm}_{b\to v}(\mathbf{0})+\rho^{\rm norm}_{b\to v}(\mathbf{01})\big) + [L_{v,c}=1]\cdot\prod_{b\in C(v)\setminus\{c\}}\big(\rho^{\rm norm}_{b\to v}(\mathbf{1})+\rho^{\rm norm}_{b\to v}(\mathbf{01})\big) - \gamma\prod_{b\in C(v)\setminus\{c\}}\rho^{\rm norm}_{b\to v}(\mathbf{01}) \\
&\overset{(68)}{=} [L_{v,c}=0]\cdot\prod_{b\in C^s_c(v)}\big(\rho^{\rm norm}_{b\to v}(\mathbf{0})+\rho^{\rm norm}_{b\to v}(\mathbf{01})\big)\prod_{b\in C^u_c(v)}\rho^{\rm norm}_{b\to v}(\mathbf{01}) + [L_{v,c}=1]\cdot\prod_{b\in C^s_c(v)}\big(\rho^{\rm norm}_{b\to v}(\mathbf{1})+\rho^{\rm norm}_{b\to v}(\mathbf{01})\big)\prod_{b\in C^u_c(v)}\rho^{\rm norm}_{b\to v}(\mathbf{01}) - \gamma\prod_{b\in C(v)\setminus\{c\}}\rho^{\rm norm}_{b\to v}(\mathbf{01}) \\
&\overset{(67)}{=} [L_{v,c}=0]\cdot\prod_{b\in C^u_c(v)}\rho^{\rm norm}_{b\to v}(\mathbf{01}) + [L_{v,c}=1]\cdot\prod_{b\in C^u_c(v)}\rho^{\rm norm}_{b\to v}(\mathbf{01}) - \gamma\prod_{b\in C(v)\setminus\{c\}}\rho^{\rm norm}_{b\to v}(\mathbf{01}) \\
&= \prod_{b\in C^u_c(v)}\rho^{\rm norm}_{b\to v}(\mathbf{01}) - \gamma\prod_{b\in C(v)\setminus\{c\}}\rho^{\rm norm}_{b\to v}(\mathbf{01}) \\
&= \prod_{b\in C^u_c(v)}\rho^{\rm norm}_{b\to v}(\mathbf{01})\cdot\Big(1-\gamma\prod_{b\in C^s_c(v)}\rho^{\rm norm}_{b\to v}(\mathbf{01})\Big) \\
&= \prod_{b\in C^u_c(v)}\big(1-\rho^{\rm norm}_{b\to v}(\mathbf{0})-\rho^{\rm norm}_{b\to v}(\mathbf{1})\big)\cdot\Big(1-\gamma\prod_{b\in C^s_c(v)}\big(1-\rho^{\rm norm}_{b\to v}(\mathbf{0})-\rho^{\rm norm}_{b\to v}(\mathbf{1})\big)\Big) \\
&\overset{(c)}{=} \prod_{b\in C^u_c(v)}(1-\eta_{b\to v})\cdot\Big(1-\gamma\prod_{b\in C^s_c(v)}(1-\eta_{b\to v})\Big) = \Pi^s_{v\to c},
\end{aligned}$$

where equality $(c)$ above is due to the assumed right correspondence. We will denote this result by (A).
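This chain can be spot-checked numerically. The following toy neighborhood (one same-sign neighbor $b_1$ and one opposite-sign neighbor $b_2$ of $v$ besides $c$, with made-up message values of our own) verifies that the weighted PTP left update (51) agrees with $\Pi^s_{v\to c}$ computed from the $\eta$ messages, when the incoming right messages are normalized and satisfy (67) and (68):

```python
gamma = 0.5
# Normalized right messages with rho(negated edge label) = 0, per (68).
rho = {'b1': {'0': 0.7, '1': 0.0, '01': 0.3},   # b1 in C_c^s(v): L_{v,b1} = L_{v,c} = 0
       'b2': {'0': 0.0, '1': 0.4, '01': 0.6}}   # b2 in C_c^u(v): L_{v,b2} = 1

# Weighted PTP left update (51): lambda_{v->c}(0), with L_{v,c} = 0.
lam0, prod01 = 1.0, 1.0
for b in rho:
    lam0 *= rho[b]['0'] + rho[b]['01']
    prod01 *= rho[b]['01']
lam0 -= gamma * prod01

# SP*(gamma): Pi^s = prod_{C^u}(1 - eta) * (1 - gamma * prod_{C^s}(1 - eta)),
# with eta_{b->v} = rho_b(0) + rho_b(1) per the right correspondence (63).
eta = {b: rho[b]['0'] + rho[b]['1'] for b in rho}
pi_s = (1 - eta['b2']) * (1 - gamma * (1 - eta['b1']))
```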
Following very similar procedures, it can be shown that

$$[L_{v,c}=0]\cdot\lambda_{v\to c}(\mathbf{1})+[L_{v,c}=1]\cdot\lambda_{v\to c}(\mathbf{0}) = \prod_{b\in C^s_c(v)}\big(1-\rho^{\rm norm}_{b\to v}(\mathbf{0})-\rho^{\rm norm}_{b\to v}(\mathbf{1})\big)\cdot\Big(1-\gamma\prod_{b\in C^u_c(v)}\big(1-\rho^{\rm norm}_{b\to v}(\mathbf{0})-\rho^{\rm norm}_{b\to v}(\mathbf{1})\big)\Big) = \Pi^u_{v\to c}.$$

We will denote this result by (B). Similarly,

$$\lambda_{v\to c}(\mathbf{01}) = \gamma\prod_{b\in C^s_c(v)}\big(1-\rho^{\rm norm}_{b\to v}(\mathbf{0})-\rho^{\rm norm}_{b\to v}(\mathbf{1})\big)\cdot\prod_{b\in C^u_c(v)}\big(1-\rho^{\rm norm}_{b\to v}(\mathbf{0})-\rho^{\rm norm}_{b\to v}(\mathbf{1})\big) = \Pi^*_{v\to c}.$$

We will denote this result by (C). Combining results (A), (B) and (C), we have

$$\lambda_{v\to c}(\mathbf{0})+\lambda_{v\to c}(\mathbf{1})+\lambda_{v\to c}(\mathbf{01}) = \Pi^u_{v\to c}+\Pi^s_{v\to c}+\Pi^*_{v\to c}.$$

That is, the scaling constant for normalizing $(\lambda_{v\to c}(\mathbf{0}),\lambda_{v\to c}(\mathbf{1}),\lambda_{v\to c}(\mathbf{01}))$ and that for normalizing $(\Pi^u_{v\to c},\Pi^s_{v\to c},\Pi^*_{v\to c})$ are identical. Then results (A), (B) and (C) respectively translate to

$$[L_{v,c}=1]\cdot\lambda^{\rm norm}_{v\to c}(\mathbf{1})+[L_{v,c}=0]\cdot\lambda^{\rm norm}_{v\to c}(\mathbf{0}) = \Pi^{s,\rm norm}_{v\to c}$$
$$[L_{v,c}=0]\cdot\lambda^{\rm norm}_{v\to c}(\mathbf{1})+[L_{v,c}=1]\cdot\lambda^{\rm norm}_{v\to c}(\mathbf{0}) = \Pi^{u,\rm norm}_{v\to c}$$
$$\lambda^{\rm norm}_{v\to c}(\mathbf{01}) = \Pi^{*,\rm norm}_{v\to c}.$$

At this point we have established the correspondence between the passed messages in weighted PTP and those in weighted SP. We now prove the summary correspondence.
Starting from Lemma 5, we have

$$\begin{aligned}
\mu_v(\mathbf{0}) &= \prod_{c\in C(v)}\big(\rho^{\rm norm}_{c\to v}(\mathbf{0})+\rho^{\rm norm}_{c\to v}(\mathbf{01})\big)-\gamma\prod_{c\in C(v)}\rho^{\rm norm}_{c\to v}(\mathbf{01}) \\
&= \prod_{c\in C_1(v)}\big(\rho^{\rm norm}_{c\to v}(\mathbf{0})+\rho^{\rm norm}_{c\to v}(\mathbf{01})\big)\prod_{c\in C_0(v)}\big(\rho^{\rm norm}_{c\to v}(\mathbf{0})+\rho^{\rm norm}_{c\to v}(\mathbf{01})\big)-\gamma\prod_{c\in C(v)}\rho^{\rm norm}_{c\to v}(\mathbf{01}) \\
&\overset{(67),(68)}{=} \prod_{c\in C_1(v)}\rho^{\rm norm}_{c\to v}(\mathbf{01})-\gamma\prod_{c\in C(v)}\rho^{\rm norm}_{c\to v}(\mathbf{01}) \\
&= \Big(1-\gamma\prod_{c\in C_0(v)}\rho^{\rm norm}_{c\to v}(\mathbf{01})\Big)\prod_{c\in C_1(v)}\rho^{\rm norm}_{c\to v}(\mathbf{01}) \\
&= \Big(1-\gamma\prod_{c\in C_0(v)}\big(1-\rho^{\rm norm}_{c\to v}(\mathbf{0})-\rho^{\rm norm}_{c\to v}(\mathbf{1})\big)\Big)\prod_{c\in C_1(v)}\big(1-\rho^{\rm norm}_{c\to v}(\mathbf{0})-\rho^{\rm norm}_{c\to v}(\mathbf{1})\big) \\
&\overset{(d)}{=} \Big(1-\gamma\prod_{c\in C_0(v)}(1-\eta_{c\to v})\Big)\prod_{c\in C_1(v)}(1-\eta_{c\to v}) = \zeta^0_v,
\end{aligned}$$

where $(d)$ above is due to the right correspondence that we just proved. Symmetrically, it can be shown that

$$\mu_v(\mathbf{1}) = \Big(1-\gamma\prod_{c\in C_1(v)}(1-\eta_{c\to v})\Big)\prod_{c\in C_0(v)}(1-\eta_{c\to v}) = \zeta^1_v.$$

Finally, it is straightforward to see that

$$\mu_v(\mathbf{01}) = \gamma\prod_{c\in C_0(v)}(1-\eta_{c\to v})\prod_{c\in C_1(v)}(1-\eta_{c\to v}) = \zeta^*_v.$$

This proves the summary correspondence and completes the proof.

This theorem asserts that the weighted SP developed for $k$-SAT problems is an instance of the weighted PTP that we propose in this paper; alternatively phrased, weighted PTP generalizes weighted SP from the context of $k$-SAT problems to arbitrary CSPs with arbitrary variable alphabets. When parameter $\gamma$ is set to $1$, this result immediately implies that non-weighted SP is non-weighted PTP for $k$-SAT problems.
Additionally, we note that in the correspondence between the summary messages of weighted PTP and weighted SP in the above theorem, it is clear that symbols $0$, $1$ and $*$ in weighted SP (or SP) correspond to tokens (sets) $\mathbf{0}$, $\mathbf{1}$ and $\mathbf{01}$ respectively. In addition, if we use notation $\mathbf{L}_{v,c}$, we may rewrite the correspondence between the left messages of weighted SP and those of weighted PTP in the above theorem as

$$\Pi^s_{v\to c} \leftrightarrow \lambda_{v\to c}(\mathbf{L}_{v,c})$$
$$\Pi^u_{v\to c} \leftrightarrow \lambda_{v\to c}(\bar{\mathbf{L}}_{v,c})$$
$$\Pi^*_{v\to c} \leftrightarrow \lambda_{v\to c}(\mathbf{01})$$

That is, symbols "$s$" and "$u$" in SP respectively correspond to the singleton sets $\mathbf{L}_{v,c}$ and $\bar{\mathbf{L}}_{v,c}$. These observations suggest that, although blurred by the addition of the single symbol $*$ to the variable alphabet, the true alphabet used as the support of SP messages is the set of all tokens associated with the variable, or equivalently, the power set of the original alphabet with the empty set removed.

At this point, questions may naturally arise as to what PTP and weighted PTP do towards the goal of solving a CSP. Although a rigorous answer to this question remains largely open at this point, we present some preliminary results in Appendix B. Intuitively, one may view PTP or weighted PTP as essentially updating a random rectangle whose sides are independently distributed random variables; as PTP iterates, it drives some side of the random rectangle to being deterministically biased towards a singleton that contains the solution of the CSP. The reader is referred to Appendix B for a more detailed exposition.

VI. THE REDUCTION OF SP FROM BP

At this point, we have identified SP with an equivalent but probabilistically interpretable algorithmic procedure, PTP, and generalized weighted SP from the special case of $k$-SAT and binary problems to arbitrary CSPs, in terms of weighted PTP.
Now we are in a position to discuss the reduction of SP from BP, where we will refer to SP exclusively as PTP, and to weighted SP exclusively as weighted PTP. As is well known, the derivation of the BP algorithm is based on a well-defined factoring function, or, seen from a probabilistic perspective, a Markov random field (MRF). Thus, whether PTP or weighted PTP may be reduced from BP boils down to whether there is an MRF formulation on which the derived BP algorithm coincides with PTP or weighted PTP. In [15], an MRF is constructed for $k$-SAT problems, on which BP reduces to what we now call weighted PTP. In [17], similar results are shown using a different MRF formalism, where (generalized) states are introduced and the MRF is represented by a Forney graph or normal realization [18]. Although in some sense the normally realized MRF formalism of [17] is equivalent to the MRF of [15], the Forney-graph formalism in [17] makes the development cleaner and more transparent, and the explicit introduction of states provides a better correspondence with the weighted PTP messages.

In this section, we first generalize the MRF formalism, in the style of [15] or [17], to arbitrary CSPs, and derive the corresponding BP algorithm. We then investigate whether the derived BP algorithm may be reduced to PTP or weighted PTP. We begin this investigation with the special case of $k$-SAT problems, and then proceed to the $3$-COL problems and to general CSPs. Re-developing the results of [15] and [17] for $k$-SAT problems, we show that the BP algorithm on the normally realized MRF is readily reducible to weighted PTP as long as the BP messages are initialized to satisfy a certain condition. We note that this condition, when satisfied in the first BP iteration, will necessarily be satisfied in later iterations in $k$-SAT problems.
Identifying the important role of this condition, we call it the state-decoupling condition. However, as we proceed to show, in $3$-COL problems it is impossible for the state-decoupling condition to hold non-trivially across all BP iterations. Nevertheless, if one manually manipulates the BP messages to impose this condition in every iteration, which results in a modified BP message-update rule referred to as state-decoupled BP, or SDBP in short, then the (SD)BP messages will still reduce to PTP. This on one hand justifies the role of the state-decoupling condition in the BP-to-PTP reduction, and on the other hand suggests that for general CSPs, PTP (or SP) is not a special case of the BP algorithm. We then proceed further by investigating whether the state-decoupling condition is sufficient for BP to reduce to PTP or weighted PTP for general CSPs. To that end, we show that yet another "local compatibility" condition, concerning the structure of the CSP (in terms of the interaction between neighboring constraints), is required for SDBP to reduce to PTP or weighted PTP.

A. Normally Realized Markov Random Field

Given a CSP represented by factor graph $G$, we now define its corresponding normally realized Markov random field $\tilde G$ using a Forney graph representation [18]. We note that the random variables involved in the probability mass function (PMF) represented by $\tilde G$ are no longer those associated with factor graph (or equivalently MRF) $G$, but rather a new set of random variables, each distributed over the set of tokens associated with a coordinate. Additionally, as the central component of the Forney graph, another set of random variables, typically called generalized states or simply states, are also included. Specifically, as a graph, $\tilde G$ can be constructed by adding a "half-edge" to each variable vertex of $G$.
As a factor graph, $\tilde G$ uses a different notation: edges and half-edges are interpreted as "variables" and vertices are interpreted as local functions; a variable is an argument of a function if and only if the corresponding edge or half-edge is incident on the corresponding vertex. We now define each variable and local function in $\tilde G$.

• Each local function (or vertex) in $\tilde G$ corresponding to variable vertex $x_v$ in $G$ will be denoted by $g_v(\cdot)$, and referred to as a left function.
• Each local function (or vertex) in $\tilde G$ corresponding to function vertex $\Gamma_c$ will be denoted by $f_c(\cdot)$, and referred to as a right function.
• The half-edge incident on $g_v$ represents variable $y_v$, referred to as a side, taking values from $(\chi^*)^{\{v\}}$.
• The edge connecting left function $g_v$ and right function $f_c$ represents variable $s_{v,c}$, referred to as a state, taking values from $(\chi^*)^{\{v\}}\times(\chi^*)^{\{v\}}$. We will also write state $s_{v,c}$ as the pair $(s^L_{v,c}, s^R_{v,c})$ of left state $s^L_{v,c}$ and right state $s^R_{v,c}$.
• Left function $g_v$ for $v\in V$ is defined as
$$g_v(y_v, s_{v,C(v)}) := \omega_v\Big(y_v\,\Big|\,\bigcap_{c\in C(v)} s^R_{v,c}\Big)\cdot\prod_{c\in C(v)}[s^L_{v,c}=y_v], \qquad(69)$$
where $s_{v,C(v)}$ is the short-hand notation for $\langle s_{v,c}\rangle_{c\in C(v)}$ and $\omega_v$ is an obedience conditional on $(\chi^*)^{\{v\}}$.
• Right function $f_c$ for each $c\in C$ is defined as
$$f_c(s_{V(c),c}) := \prod_{v\in V(c)}\big[s^R_{v,c}=F_c(s^L_{V(c)\setminus\{v\},c})\big], \qquad(70)$$
where $s_{V(c),c}$ is the short-hand notation for $\langle s_{v,c}\rangle_{v\in V(c)}$.
• The global function represented by $\tilde G$ is
$$F(y_V, s_{V,C}) := \prod_{v\in V} g_v(y_v, s_{v,C(v)})\cdot\prod_{c\in C} f_c(s_{V(c),c}), \qquad(71)$$
where $s_{V,C}$ is the short-hand notation for $\{s_{v,c} : \forall (v-c)\}$.

It is clear that upon normalization, function $F$ may represent a PMF, and the factorization of $F$ encoded by $\tilde G$ realizes an MRF. An example of such a normally realized MRF, corresponding to the toy 3-SAT problem in Figure 1, is given in Fig.
4. Using the "intention-command" analogy, one may view that for any $v$, both $y_v$ and each left state $s^L_{v,c}$ store the intention of variable $x_v$, and that for any given $c$, each right state $s^R_{v,c}$ stores the command of constraint $\Gamma_c$ sent to variable $x_v$. The intention of variable $x_v$ depends on the intersection of all incoming commands probabilistically, via the obedience conditional $\omega_v$. The command of $\Gamma_c$ sent to each variable $x_v$ needs to equal the token forced by the rectangle formed by the intentions of all other neighboring variables.

We say that a configuration of $(y_V, s_{V,C})$ is valid under $F$ if it is in the support of function $F$ (namely, if it gives rise to a non-zero value of function $F$). Further, rectangle $y_V$ is said to be valid under $F$ if there exists a configuration of $s_{V,C}$ such that $(y_V, s_{V,C})$ is valid under $F$. It then immediately follows that the PMF represented by MRF $\tilde G$, upon marginalizing over states $s_{V,C}$, characterizes the set of all valid rectangles under $F$ (via the support of the marginal of $F$ on $y_V$). This gives an intuitive explanation of the MRF as defining the distribution of rectangle $y_V$. A simple property of such MRFs is given in the following lemma, which immediately follows from the definition of the left functions.

Lemma 7: If configuration $(y_V, s_{V,C})$ is valid under $F$, then it holds for every $(v-c)$ that $s^L_{v,c}=y_v\subseteq s^R_{v,c}$.

[Fig. 4. The Forney graph representing the normal realization of the toy problem in Figure 1.]

Now we consider applying the BP message-update rule on the Forney graph $\tilde G$ we just defined, where we will use $\rho_{c\to v}$ (referred to as a right message) and $\lambda_{v\to c}$ (referred to as a left message)
to denote the message passed from a right function $f_c$ to a left function $g_v$ and the message passed from left function $g_v$ to right function $f_c$, respectively, and use $\mu_v$ to denote the summary message at variable $y_v$. We note that both the right message $\rho_{c\to v}$ and the left message $\lambda_{v\to c}$ are functions on the state space $(\chi^*)^{\{v\}}\times(\chi^*)^{\{v\}}$.

Lemma 8: The BP message-update rule on Forney graph $\tilde G$ is:

$$\lambda_{v\to c}(s^L_{v,c}, s^R_{v,c}) := \sum_{s^R_{v,C(v)\setminus\{c\}}} \omega_v\Big(s^L_{v,c}\,\Big|\,\bigcap_{b\in C(v)} s^R_{v,b}\Big)\prod_{b\in C(v)\setminus\{c\}}\rho_{b\to v}(s^L_{v,c}, s^R_{v,b}) \qquad(72)$$

$$\rho_{c\to v}(s^L_{v,c}, s^R_{v,c}) := \sum_{s^L_{V(c)\setminus\{v\},c}} \big[s^R_{v,c}=F_c(s^L_{V(c)\setminus\{v\},c})\big]\prod_{u\in V(c)\setminus\{v\}}\lambda_{u\to c}\big(s^L_{u,c}, F_c(s^L_{V(c)\setminus\{u\},c})\big) \qquad(73)$$

$$\mu_v(y_v) := \sum_{s^R_{v,C(v)}} \omega_v\Big(y_v\,\Big|\,\bigcap_{c\in C(v)} s^R_{v,c}\Big)\prod_{c\in C(v)}\rho_{c\to v}(y_v, s^R_{v,c}). \qquad(74)$$

Before proving this lemma, it is useful to note the following elementary results.

Lemma 9: 1) For any function $\varphi$,
$$\sum_y \varphi(x,y)[y=z] = \varphi(x,z). \qquad(75)$$
2) For any collection of functions $\varphi_1,\varphi_2,\ldots,\varphi_n$,
$$\sum_{x_1,x_2,\ldots,x_n}\prod_{i=1}^n\varphi_i(x_i) = \prod_{i=1}^n\sum_{x_i}\varphi_i(x_i). \qquad(76)$$

We now prove Lemma 8.

Proof:
$$\begin{aligned}
\lambda_{v\to c}(s^L_{v,c}, s^R_{v,c}) &= \sum_{y_v}\sum_{s_{v,C(v)\setminus\{c\}}} \omega_v\Big(y_v\,\Big|\,\bigcap_{b\in C(v)} s^R_{v,b}\Big)\prod_{b\in C(v)}[s^L_{v,b}=y_v]\prod_{b\in C(v)\setminus\{c\}}\rho_{b\to v}(s^L_{v,b}, s^R_{v,b}) \\
&= \sum_{y_v}[s^L_{v,c}=y_v]\sum_{s^R_{v,C(v)\setminus\{c\}}} \omega_v\Big(y_v\,\Big|\,\bigcap_{b\in C(v)} s^R_{v,b}\Big)\sum_{s^L_{v,C(v)\setminus\{c\}}}\prod_{b\in C(v)\setminus\{c\}}\rho_{b\to v}(s^L_{v,b}, s^R_{v,b})\cdot[s^L_{v,b}=y_v] \\
&\overset{(76)}{=} \sum_{y_v}[s^L_{v,c}=y_v]\sum_{s^R_{v,C(v)\setminus\{c\}}} \omega_v\Big(y_v\,\Big|\,\bigcap_{b\in C(v)} s^R_{v,b}\Big)\prod_{b\in C(v)\setminus\{c\}}\sum_{s^L_{v,b}}\rho_{b\to v}(s^L_{v,b}, s^R_{v,b})\cdot[s^L_{v,b}=y_v] \\
&\overset{(75)}{=} \sum_{y_v}[s^L_{v,c}=y_v]\sum_{s^R_{v,C(v)\setminus\{c\}}} \omega_v\Big(y_v\,\Big|\,\bigcap_{b\in C(v)} s^R_{v,b}\Big)\prod_{b\in C(v)\setminus\{c\}}\rho_{b\to v}(y_v, s^R_{v,b}) \\
&= \sum_{s^R_{v,C(v)\setminus\{c\}}} \omega_v\Big(s^L_{v,c}\,\Big|\,\bigcap_{b\in C(v)} s^R_{v,b}\Big)\prod_{b\in C(v)\setminus\{c\}}\rho_{b\to v}(s^L_{v,c}, s^R_{v,b}).
\end{aligned}$$
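Both identities of Lemma 9 are elementary but are applied repeatedly in the proofs; a quick numeric sanity check (our own illustration) over small random tables:

```python
import itertools
import random

random.seed(0)
X, Y = range(3), range(4)
phi = {(x, y): random.random() for x in X for y in Y}

# (75): summing phi(x, y)[y = z] over y just evaluates phi at y = z.
err75 = max(abs(sum(phi[x, y] * (1 if y == z else 0) for y in Y) - phi[x, z])
            for x in X for z in Y)

# (76): a sum over all joint configurations of a product of single-variable
# factors equals the product of the single-variable sums.
fs = [{x: random.random() for x in X} for _ in range(3)]
lhs76 = sum(fs[0][a] * fs[1][b] * fs[2][c]
            for a, b, c in itertools.product(X, X, X))
rhs76 = 1.0
for f in fs:
    rhs76 *= sum(f.values())
```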
$$\begin{aligned}
\rho_{c\to v}(s^L_{v,c}, s^R_{v,c}) &= \sum_{s_{V(c)\setminus\{v\},c}}\prod_{u\in V(c)}\big[s^R_{u,c}=F_c(s^L_{V(c)\setminus\{u\},c})\big]\prod_{u\in V(c)\setminus\{v\}}\lambda_{u\to c}(s^L_{u,c}, s^R_{u,c}) \\
&= \sum_{s^L_{V(c)\setminus\{v\},c}}\big[s^R_{v,c}=F_c(s^L_{V(c)\setminus\{v\},c})\big]\sum_{s^R_{V(c)\setminus\{v\},c}}\prod_{u\in V(c)\setminus\{v\}}\big[s^R_{u,c}=F_c(s^L_{V(c)\setminus\{u\},c})\big]\cdot\lambda_{u\to c}(s^L_{u,c}, s^R_{u,c}) \\
&\overset{(76)}{=} \sum_{s^L_{V(c)\setminus\{v\},c}}\big[s^R_{v,c}=F_c(s^L_{V(c)\setminus\{v\},c})\big]\prod_{u\in V(c)\setminus\{v\}}\sum_{s^R_{u,c}}\big[s^R_{u,c}=F_c(s^L_{V(c)\setminus\{u\},c})\big]\cdot\lambda_{u\to c}(s^L_{u,c}, s^R_{u,c}) \\
&\overset{(75)}{=} \sum_{s^L_{V(c)\setminus\{v\},c}}\big[s^R_{v,c}=F_c(s^L_{V(c)\setminus\{v\},c})\big]\prod_{u\in V(c)\setminus\{v\}}\lambda_{u\to c}\big(s^L_{u,c}, F_c(s^L_{V(c)\setminus\{u\},c})\big).
\end{aligned}$$

$$\begin{aligned}
\mu_v(y_v) &= \sum_{s_{v,C(v)}} \omega_v\Big(y_v\,\Big|\,\bigcap_{c\in C(v)} s^R_{v,c}\Big)\prod_{c\in C(v)}[s^L_{v,c}=y_v]\prod_{c\in C(v)}\rho_{c\to v}(s^L_{v,c}, s^R_{v,c}) \\
&= \sum_{s^R_{v,C(v)}} \omega_v\Big(y_v\,\Big|\,\bigcap_{c\in C(v)} s^R_{v,c}\Big)\sum_{s^L_{v,C(v)}}\prod_{c\in C(v)}[s^L_{v,c}=y_v]\cdot\rho_{c\to v}(s^L_{v,c}, s^R_{v,c}) \\
&\overset{(76)}{=} \sum_{s^R_{v,C(v)}} \omega_v\Big(y_v\,\Big|\,\bigcap_{c\in C(v)} s^R_{v,c}\Big)\prod_{c\in C(v)}\sum_{s^L_{v,c}}[s^L_{v,c}=y_v]\cdot\rho_{c\to v}(s^L_{v,c}, s^R_{v,c}) \\
&\overset{(75)}{=} \sum_{s^R_{v,C(v)}} \omega_v\Big(y_v\,\Big|\,\bigcap_{c\in C(v)} s^R_{v,c}\Big)\prod_{c\in C(v)}\rho_{c\to v}(y_v, s^R_{v,c}).
\end{aligned}$$

B. Weighted PTP as BP for $k$-SAT

Now we show that for $k$-SAT problems, weighted PTP is an instance of BP when the parametrization of weighted PTP is consistent with the parametrization of the normally realized MRF from which BP is derived. We begin by introducing a simplification of notation. For any $(v-c)$ and edge label $L_{v,c}$, we will write $\mathbf{L}_{v,c}$ as $\mathbf{L}$, and $\bar{\mathbf{L}}_{v,c}$ as $\bar{\mathbf{L}}$. This suppression of the dependency of $\mathbf{L}_{v,c}$ and $\bar{\mathbf{L}}_{v,c}$ on their subscripts should not result in any ambiguity when the context clearly indicates the subscript $(v,c)$ or the edge to which the edge label $L_{v,c}$ refers. Additionally, for any $v\in V$, we will write $\mathbf{01}_v$ as $\mathbf{*}$.
Thus, each left or right state takes configurations from the set $\{\mathbf{L},\bar{\mathbf{L}},\mathbf{*}\}$, where the interpretation of $\mathbf{L}$ and $\bar{\mathbf{L}}$ depends on the edge with which the state is associated. For any given configuration of a state $(s^L_{v,c}, s^R_{v,c})$, we will suppress the comma between the left-state configuration and the right-state configuration. For example, state configurations $(\mathbf{L},\mathbf{*})$, $(\bar{\mathbf{L}},\mathbf{*})$, $(\mathbf{*},\mathbf{*})$ and $(\mathbf{L},\mathbf{L})$ will be written respectively as $\mathbf{L*}$, $\bar{\mathbf{L}}\mathbf{*}$, $\mathbf{**}$ and $\mathbf{LL}$.

Lemma 10: Let $F$ be defined via (69), (70) and (71), where each weighting function $\omega_v$ is defined in (50). If $(y_V, s_{V,C})$ is valid under $F$, then

1) for every $(v-c)$, it holds that $s^R_{v,c}\neq\bar{\mathbf{L}}$, $s_{v,c}\neq\bar{\mathbf{L}}\mathbf{L}$ and $s_{v,c}\neq\mathbf{*L}$, and

2) $F(y_V, s_{V,C}) = \gamma^{n_{*|*}(y_V,s_{V,C})}\cdot(1-\gamma)^{n_{\cdot|*}(y_V,s_{V,C})}$, where $n_{*|*}(y_V,s_{V,C})$ and $n_{\cdot|*}(y_V,s_{V,C})$ are respectively the cardinalities of the sets $\{v\in V : y_v=\bigcap_{c\in C(v)} s^R_{v,c}=\mathbf{*}\}$ and $\{v\in V : y_v\subset\bigcap_{c\in C(v)} s^R_{v,c}=\mathbf{*}\}$.

Proof: For part 1, we first observe that $s^R_{v,c}\neq\bar{\mathbf{L}}$, directly following from the definition of the right function (70). Then by Lemma 7, it is easy to see that $s_{v,c}\neq\bar{\mathbf{L}}\mathbf{L}$ and that $s_{v,c}\neq\mathbf{*L}$. For part 2, we may proceed as follows.

$$\begin{aligned}
F(y_V, s_{V,C}) &= \prod_{v\in V} g_v(y_v, s_{v,C(v)})\cdot\prod_{c\in C} f_c(s_{V(c),c}) \\
&\overset{(69),(70)}{=} \prod_{v\in V}\omega_v\Big(y_v\,\Big|\,\bigcap_{c\in C(v)} s^R_{v,c}\Big)\cdot\prod_{c\in C(v)}[s^L_{v,c}=y_v]\cdot\prod_{c\in C}\prod_{v\in V(c)}\big[s^R_{v,c}=F_c(s^L_{V(c)\setminus\{v\},c})\big] \\
&\overset{(a)}{=} \prod_{v\in V}\omega_v\Big(y_v\,\Big|\,\bigcap_{c\in C(v)} s^R_{v,c}\Big)
\overset{(b)}{=} \gamma^{n_{*|*}(y_V,s_{V,C})}\cdot(1-\gamma)^{n_{\cdot|*}(y_V,s_{V,C})},
\end{aligned}$$

where equality $(a)$ is due to the fact that $(y_V, s_{V,C})$ is valid under $F$, and equality $(b)$ follows from the definition of the weighting function $\omega_v\big(y_v\,\big|\,\bigcap_{c\in C(v)} s^R_{v,c}\big)$ in (50).
The second part of this lemma, as a slight digression, suggests that the PMF under this MRF model is identical to that of [15], since an equivalent result is shown for the MRF in [15]. We note that the MRF in [15] serves as a combinatorial framework for the study of $k$-SAT problems, which leads to further insights into SP for $k$-SAT problems (the reader is referred to [15] for additional results). To a certain extent, one may expect that the normally realized MRF presented here may serve similar purposes for general CSPs.

The first part of this lemma suggests that although each state takes on values from $\{\mathbf{L},\bar{\mathbf{L}},\mathbf{*}\}\times\{\mathbf{L},\bar{\mathbf{L}},\mathbf{*}\}$, there are in fact only four possible state configurations that contribute to defining a valid rectangle. When applying the BP message-update rule on the Forney graph representation of a $k$-SAT problem, this implies that messages $\lambda_{v\to c}$, $\rho_{c\to v}$ and $\mu_v$ are all supported on $\{\mathbf{LL}, \bar{\mathbf{L}}\mathbf{*}, \mathbf{L*}, \mathbf{**}\}$. The BP message-update rule is given in Lemma 11, which directly follows from equations (72) to (74).
Lemma 11: The BP message-update rule applied on the Forney graph $\tilde G$ of a $k$-SAT problem gives rise to:

$$\lambda_{v\to c}(\mathbf{LL}) := \prod_{b\in C^u_c(v)}\rho_{b\to v}(\bar{\mathbf{L}}\mathbf{*})\prod_{b\in C^s_c(v)}\big(\rho_{b\to v}(\mathbf{LL})+\rho_{b\to v}(\mathbf{L*})\big) \qquad(77)$$

$$\lambda_{v\to c}(\bar{\mathbf{L}}\mathbf{*}) := \prod_{b\in C^s_c(v)}\rho_{b\to v}(\bar{\mathbf{L}}\mathbf{*})\cdot\Big(\prod_{b\in C^u_c(v)}\big(\rho_{b\to v}(\mathbf{L*})+\rho_{b\to v}(\mathbf{LL})\big)-\gamma\prod_{b\in C^u_c(v)}\rho_{b\to v}(\mathbf{L*})\Big) \qquad(78)$$

$$\lambda_{v\to c}(\mathbf{L*}) := \prod_{b\in C^u_c(v)}\rho_{b\to v}(\bar{\mathbf{L}}\mathbf{*})\cdot\Big(\prod_{b\in C^s_c(v)}\big(\rho_{b\to v}(\mathbf{L*})+\rho_{b\to v}(\mathbf{LL})\big)-\gamma\prod_{b\in C^s_c(v)}\rho_{b\to v}(\mathbf{L*})\Big) \qquad(79)$$

$$\lambda_{v\to c}(\mathbf{**}) := \gamma\prod_{b\in C^u_c(v)\cup C^s_c(v)}\rho_{b\to v}(\mathbf{**}) \qquad(80)$$

$$\rho_{c\to v}(\mathbf{LL}) := \prod_{u\in V(c)\setminus\{v\}}\lambda_{u\to c}(\bar{\mathbf{L}}\mathbf{*}) \qquad(81)$$

$$\begin{aligned}\rho_{c\to v}(\bar{\mathbf{L}}\mathbf{*}) :=\ & \prod_{u\in V(c)\setminus\{v\}}\big(\lambda_{u\to c}(\mathbf{L*})+\lambda_{u\to c}(\mathbf{**})+\lambda_{u\to c}(\bar{\mathbf{L}}\mathbf{*})\big) \\
&+\sum_{u\in V(c)\setminus\{v\}}\big(\lambda_{u\to c}(\mathbf{LL})-\lambda_{u\to c}(\mathbf{L*})-\lambda_{u\to c}(\mathbf{**})\big)\prod_{w\in V(c)\setminus\{u,v\}}\lambda_{w\to c}(\bar{\mathbf{L}}\mathbf{*})-\prod_{u\in V(c)\setminus\{v\}}\lambda_{u\to c}(\bar{\mathbf{L}}\mathbf{*}) \qquad(82)\end{aligned}$$

$$\rho_{c\to v}(\mathbf{L*}) := \prod_{u\in V(c)\setminus\{v\}}\big(\lambda_{u\to c}(\mathbf{L*})+\lambda_{u\to c}(\mathbf{**})+\lambda_{u\to c}(\bar{\mathbf{L}}\mathbf{*})\big)-\prod_{u\in V(c)\setminus\{v\}}\lambda_{u\to c}(\bar{\mathbf{L}}\mathbf{*}) \qquad(83)$$

$$\rho_{c\to v}(\mathbf{**}) := \prod_{u\in V(c)\setminus\{v\}}\big(\lambda_{u\to c}(\mathbf{L*})+\lambda_{u\to c}(\mathbf{**})+\lambda_{u\to c}(\bar{\mathbf{L}}\mathbf{*})\big)-\prod_{u\in V(c)\setminus\{v\}}\lambda_{u\to c}(\bar{\mathbf{L}}\mathbf{*}) \qquad(84)$$

$$\mu_v(\mathbf{0}) := \prod_{c\in C_1(v)}\rho_{c\to v}(\bar{\mathbf{L}}\mathbf{*})\cdot\Big(\prod_{c\in C_0(v)}\big(\rho_{c\to v}(\mathbf{LL})+\rho_{c\to v}(\mathbf{L*})\big)-\gamma\prod_{c\in C_0(v)}\rho_{c\to v}(\mathbf{L*})\Big) \qquad(85)$$

$$\mu_v(\mathbf{1}) := \prod_{c\in C_0(v)}\rho_{c\to v}(\bar{\mathbf{L}}\mathbf{*})\cdot\Big(\prod_{c\in C_1(v)}\big(\rho_{c\to v}(\mathbf{LL})+\rho_{c\to v}(\mathbf{L*})\big)-\gamma\prod_{c\in C_1(v)}\rho_{c\to v}(\mathbf{L*})\Big) \qquad(86)$$

$$\mu_v(\mathbf{*}) := \gamma\prod_{c\in C(v)}\rho_{c\to v}(\mathbf{**}). \qquad(87)$$

Now we are ready to investigate how these BP messages may be reduced to (weighted) PTP messages. It turns out that the following condition plays a special role in this reduction.

$$\rho_{c\to v}(\mathbf{L*}) = \rho_{c\to v}(\bar{\mathbf{L}}\mathbf{*}) = \rho_{c\to v}(\mathbf{**}) \qquad(88)$$

Proposition 1: In $k$-SAT problems, if the BP messages are initialized to satisfy condition (88), then this condition is satisfied in every BP iteration.
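The invariance claimed in Proposition 1 can be checked numerically: start from right messages satisfying (88), apply the left update (77) to (80) and then the right update (81) to (84), and observe that (88) still holds (because $\lambda(\mathbf{LL})=\lambda(\mathbf{L*})+\lambda(\mathbf{**})$ makes the correction sum in (82) vanish). A sketch under our own encoding, where the dict key 'Lb*' stands for $\bar{\mathbf{L}}\mathbf{*}$:

```python
import math

gamma = 0.3
prod = math.prod

def lam_update(rho_s, rho_u):
    """Left update (77)-(80) at v toward c. rho_s / rho_u hold the incoming
    right messages from same-sign / opposite-sign neighbors of v besides c."""
    return {
        'LL':  prod(r['Lb*'] for r in rho_u) * prod(r['LL'] + r['L*'] for r in rho_s),
        'Lb*': prod(r['Lb*'] for r in rho_s) *
               (prod(r['L*'] + r['LL'] for r in rho_u) - gamma * prod(r['L*'] for r in rho_u)),
        'L*':  prod(r['Lb*'] for r in rho_u) *
               (prod(r['L*'] + r['LL'] for r in rho_s) - gamma * prod(r['L*'] for r in rho_s)),
        '**':  gamma * prod(r['**'] for r in rho_s + rho_u),
    }

def rho_update(lams):
    """Right update (81)-(84) at clause c toward v; lams are the left
    messages from the other variables of the clause."""
    full = prod(l['L*'] + l['**'] + l['Lb*'] for l in lams)
    barp = prod(l['Lb*'] for l in lams)
    corr = sum((l['LL'] - l['L*'] - l['**']) *
               prod(m['Lb*'] for m in lams if m is not l) for l in lams)
    return {'LL': barp, 'Lb*': full + corr - barp,
            'L*': full - barp, '**': full - barp}

# Initial right messages satisfying (88): the three starred entries are equal.
r1 = {'LL': 0.4, 'L*': 0.2, 'Lb*': 0.2, '**': 0.2}
r2 = {'LL': 0.5, 'L*': 0.1, 'Lb*': 0.1, '**': 0.1}

lam_a = lam_update([r1], [r2])   # a variable with one same- and one opposite-sign neighbor
lam_b = lam_update([r2], [r1])
rho_next = rho_update([lam_a, lam_b])
```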
Proof: We only need to show that if (88) is satisfied at initialization, then it is satisfied in the first iteration after initialization. In fact, noting that $\rho_{c\to v}(\mathbf{L*})=\rho_{c\to v}(\mathbf{**})$ necessarily holds in each BP iteration due to (83) and (84), we only need to prove that $\rho_{c\to v}(\bar{\mathbf{L}}\mathbf{*})=\rho_{c\to v}(\mathbf{L*})$ holds in the first iteration, provided the BP messages are initialized to satisfy (88). Under this initialization condition, we have, in the first BP iteration after initialization,

$$\begin{aligned}
\lambda_{v\to c}(\mathbf{L*})+\lambda_{v\to c}(\mathbf{**}) &= \prod_{b\in C^u_c(v)}\rho_{b\to v}(\bar{\mathbf{L}}\mathbf{*})\cdot\Big(\prod_{b\in C^s_c(v)}\big(\rho_{b\to v}(\mathbf{L*})+\rho_{b\to v}(\mathbf{LL})\big)-\gamma\prod_{b\in C^s_c(v)}\rho_{b\to v}(\mathbf{L*})\Big)+\gamma\prod_{b\in C^u_c(v)\cup C^s_c(v)}\rho_{b\to v}(\mathbf{**}) \\
&= \prod_{b\in C^u_c(v)}\rho_{b\to v}(\bar{\mathbf{L}}\mathbf{*})\prod_{b\in C^s_c(v)}\big(\rho_{b\to v}(\mathbf{L*})+\rho_{b\to v}(\mathbf{LL})\big)-\gamma\prod_{b\in C^u_c(v)}\rho_{b\to v}(\bar{\mathbf{L}}\mathbf{*})\prod_{b\in C^s_c(v)}\rho_{b\to v}(\mathbf{L*})+\gamma\prod_{b\in C^u_c(v)\cup C^s_c(v)}\rho_{b\to v}(\mathbf{**}) \\
&\overset{(a)}{=} \prod_{b\in C^u_c(v)}\rho_{b\to v}(\bar{\mathbf{L}}\mathbf{*})\prod_{b\in C^s_c(v)}\big(\rho_{b\to v}(\mathbf{L*})+\rho_{b\to v}(\mathbf{LL})\big) = \lambda_{v\to c}(\mathbf{LL}),
\end{aligned}$$

where equality $(a)$ is due to the initialization condition (88). Then in the subsequent update of the right messages, we have

$$\begin{aligned}
\rho_{c\to v}(\bar{\mathbf{L}}\mathbf{*}) &= \prod_{u\in V(c)\setminus\{v\}}\big(\lambda_{u\to c}(\mathbf{L*})+\lambda_{u\to c}(\mathbf{**})+\lambda_{u\to c}(\bar{\mathbf{L}}\mathbf{*})\big) \\
&\quad+\sum_{u\in V(c)\setminus\{v\}}\big(\lambda_{u\to c}(\mathbf{LL})-\lambda_{u\to c}(\mathbf{L*})-\lambda_{u\to c}(\mathbf{**})\big)\prod_{w\in V(c)\setminus\{u,v\}}\lambda_{w\to c}(\bar{\mathbf{L}}\mathbf{*})-\prod_{u\in V(c)\setminus\{v\}}\lambda_{u\to c}(\bar{\mathbf{L}}\mathbf{*}) \\
&\overset{(b)}{=} \prod_{u\in V(c)\setminus\{v\}}\big(\lambda_{u\to c}(\mathbf{L*})+\lambda_{u\to c}(\mathbf{**})+\lambda_{u\to c}(\bar{\mathbf{L}}\mathbf{*})\big)-\prod_{u\in V(c)\setminus\{v\}}\lambda_{u\to c}(\bar{\mathbf{L}}\mathbf{*}) = \rho_{c\to v}(\mathbf{L*}),
\end{aligned}$$

where equality $(b)$ is due to the above result $\lambda_{v\to c}(\mathbf{LL})=\lambda_{v\to c}(\mathbf{L*})+\lambda_{v\to c}(\mathbf{**})$.

Theorem 3: In a $k$-SAT problem, suppose that the following two conditions are imposed on the BP messages. 1) For every $(v-c)$, the BP messages are initialized such that (88) is satisfied.
2) In each BP iteration, $\lambda_{v\to c}$ is scaled to $\lambda^{\rm norm}_{v\to c}$ such that $\lambda^{\rm norm}_{v\to c}(L*)+\lambda^{\rm norm}_{v\to c}(\bar L*)+\lambda^{\rm norm}_{v\to c}(**)=1$ before it is passed along the edge; that is,
$$\lambda^{\rm norm}_{v\to c}(s^L_{v,c}, s^R_{v,c}) := \frac{1}{\sum_{s^L_{v,c}} \lambda_{v\to c}(s^L_{v,c}, *)}\cdot \lambda_{v\to c}(s^L_{v,c}, s^R_{v,c})$$
for every $(s^L_{v,c}, s^R_{v,c})$ in the support of $\lambda_{v\to c}$, and the right messages are updated based on the normalized left messages, namely,
$$\rho_{c\to v}(LL) := \prod_{u\in V(c)\setminus\{v\}} \lambda^{\rm norm}_{u\to c}(\bar L*) \tag{89}$$
$$\rho_{c\to v}(\bar L*) := \prod_{u\in V(c)\setminus\{v\}} \big(\lambda^{\rm norm}_{u\to c}(L*)+\lambda^{\rm norm}_{u\to c}(**)+\lambda^{\rm norm}_{u\to c}(\bar L*)\big) + \sum_{u\in V(c)\setminus\{v\}} \big(\lambda^{\rm norm}_{u\to c}(LL)-\lambda^{\rm norm}_{u\to c}(L*)-\lambda^{\rm norm}_{u\to c}(**)\big)\prod_{w\in V(c)\setminus\{u,v\}} \lambda^{\rm norm}_{w\to c}(\bar L*) - \prod_{u\in V(c)\setminus\{v\}} \lambda^{\rm norm}_{u\to c}(\bar L*) \tag{90}$$
$$\rho_{c\to v}(L*) := \prod_{u\in V(c)\setminus\{v\}} \big(\lambda^{\rm norm}_{u\to c}(L*)+\lambda^{\rm norm}_{u\to c}(**)+\lambda^{\rm norm}_{u\to c}(\bar L*)\big) - \prod_{u\in V(c)\setminus\{v\}} \lambda^{\rm norm}_{u\to c}(\bar L*) \tag{91}$$
$$\rho_{c\to v}(**) := \prod_{u\in V(c)\setminus\{v\}} \big(\lambda^{\rm norm}_{u\to c}(L*)+\lambda^{\rm norm}_{u\to c}(**)+\lambda^{\rm norm}_{u\to c}(\bar L*)\big) - \prod_{u\in V(c)\setminus\{v\}} \lambda^{\rm norm}_{u\to c}(\bar L*). \tag{92}$$
Then the correspondence between BP messages and weighted PTP messages is
$$\lambda^{\rm norm(BP)}_{v\to c}(L*) \leftrightarrow [L_{v,c}=0]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(0) + [L_{v,c}=1]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(1) \tag{93}$$
$$\lambda^{\rm norm(BP)}_{v\to c}(\bar L*) \leftrightarrow [L_{v,c}=0]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(1) + [L_{v,c}=1]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(0) \tag{94}$$
$$\lambda^{\rm norm(BP)}_{v\to c}(**) \leftrightarrow \lambda^{\rm norm(PTP)}_{v\to c}(*) \tag{95}$$
$$\rho^{\rm (BP)}_{c\to v}(L*) \leftrightarrow \rho^{\rm norm(PTP)}_{c\to v}(*) \tag{96}$$
$$\rho^{\rm (BP)}_{c\to v}(LL) \leftrightarrow \rho^{\rm norm(PTP)}_{c\to v}(0) + \rho^{\rm norm(PTP)}_{c\to v}(1) \tag{97}$$
$$\mu^{\rm (BP)}_v(0) \leftrightarrow \mu^{\rm (PTP)}_v(0) \tag{98}$$
$$\mu^{\rm (BP)}_v(1) \leftrightarrow \mu^{\rm (PTP)}_v(1) \tag{99}$$
$$\mu^{\rm (BP)}_v(*) \leftrightarrow \mu^{\rm (PTP)}_v(*). \tag{100}$$
Proof: Note that based on Proposition 1, the condition $\rho^{\rm (BP)}_{c\to v}(L*)=\rho^{\rm (BP)}_{c\to v}(\bar L*)=\rho^{\rm (BP)}_{c\to v}(**)$ holds in every BP iteration.
From the proof of Proposition 1, it also holds in every BP iteration that
$$\lambda^{\rm norm(BP)}_{v\to c}(L*) + \lambda^{\rm norm(BP)}_{v\to c}(**) = \lambda^{\rm norm(BP)}_{v\to c}(LL). \tag{101}$$
We will prove this theorem by first proving that the "left correspondence" ((93) to (95)) implies the "right correspondence" ((96) and (97)), and conversely that the right correspondence implies the left correspondence, thereby establishing the correspondence between the passed messages. We then prove the summary correspondence ((98) to (100)).

First suppose that the left correspondence holds, namely that
$$\begin{aligned}
\lambda^{\rm norm(BP)}_{v\to c}(L*) &= [L_{v,c}=0]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(0) + [L_{v,c}=1]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(1),\\
\lambda^{\rm norm(BP)}_{v\to c}(\bar L*) &= [L_{v,c}=0]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(1) + [L_{v,c}=1]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(0),\\
\lambda^{\rm norm(BP)}_{v\to c}(**) &= \lambda^{\rm norm(PTP)}_{v\to c}(*).
\end{aligned}$$
Following the PTP message-update equations (54) to (56), we have
$$\begin{aligned}
\rho^{\rm norm(PTP)}_{c\to v}(0)+\rho^{\rm norm(PTP)}_{c\to v}(1)
&\stackrel{(a)}{=} \rho^{\rm (PTP)}_{c\to v}(0)+\rho^{\rm (PTP)}_{c\to v}(1)\\
&= [L_{v,c}=0]\cdot\prod_{\substack{u\in V(c)\setminus\{v\}:\\ L_{u,c}=1}}\lambda^{\rm norm(PTP)}_{u\to c}(0)\prod_{\substack{u\in V(c)\setminus\{v\}:\\ L_{u,c}=0}}\lambda^{\rm norm(PTP)}_{u\to c}(1) + [L_{v,c}=1]\cdot\prod_{\substack{u\in V(c)\setminus\{v\}:\\ L_{u,c}=1}}\lambda^{\rm norm(PTP)}_{u\to c}(0)\prod_{\substack{u\in V(c)\setminus\{v\}:\\ L_{u,c}=0}}\lambda^{\rm norm(PTP)}_{u\to c}(1)\\
&= \prod_{\substack{u\in V(c)\setminus\{v\}:\\ L_{u,c}=1}}\lambda^{\rm norm(PTP)}_{u\to c}(0)\prod_{\substack{u\in V(c)\setminus\{v\}:\\ L_{u,c}=0}}\lambda^{\rm norm(PTP)}_{u\to c}(1)\\
&= \prod_{u\in V(c)\setminus\{v\}}\Big([L_{u,c}=0]\cdot\lambda^{\rm norm(PTP)}_{u\to c}(1)+[L_{u,c}=1]\cdot\lambda^{\rm norm(PTP)}_{u\to c}(0)\Big)\\
&\stackrel{(b)}{=} \prod_{u\in V(c)\setminus\{v\}}\lambda^{\rm norm(BP)}_{u\to c}(\bar L*)\\
&= \rho^{\rm (BP)}_{c\to v}(LL),
\end{aligned}$$
where equality $(a)$ is due to the fact that $\rho^{\rm norm(PTP)}_{c\to v} = \rho^{\rm (PTP)}_{c\to v}$, as shown in the proof of Theorem 2, and equality $(b)$ is due to the assumed left correspondence.
Similarly, we have
$$\begin{aligned}
\rho^{\rm norm(PTP)}_{c\to v}(*) = \rho^{\rm (PTP)}_{c\to v}(*)
&= 1-\prod_{\substack{u\in V(c)\setminus\{v\}:\\ L_{u,c}=1}}\lambda^{\rm norm(PTP)}_{u\to c}(0)\prod_{\substack{u\in V(c)\setminus\{v\}:\\ L_{u,c}=0}}\lambda^{\rm norm(PTP)}_{u\to c}(1)\\
&= 1-\prod_{u\in V(c)\setminus\{v\}}\lambda^{\rm norm(BP)}_{u\to c}(\bar L*)\\
&\stackrel{(c)}{=} \prod_{u\in V(c)\setminus\{v\}}\big(\lambda^{\rm norm(BP)}_{u\to c}(L*)+\lambda^{\rm norm(BP)}_{u\to c}(\bar L*)+\lambda^{\rm norm(BP)}_{u\to c}(**)\big) - \prod_{u\in V(c)\setminus\{v\}}\lambda^{\rm norm(BP)}_{u\to c}(\bar L*)\\
&= \rho^{\rm (BP)}_{c\to v}(L*),
\end{aligned}$$
where equality $(c)$ is due to the fact that $\lambda^{\rm norm(BP)}_{u\to c}(L*)+\lambda^{\rm norm(BP)}_{u\to c}(\bar L*)+\lambda^{\rm norm(BP)}_{u\to c}(**)=1$. Thus we have proved that if the left correspondence holds, then the right correspondence holds.

Now suppose that the right correspondence holds, namely that $\rho^{\rm (BP)}_{c\to v}(L*)=\rho^{\rm norm(PTP)}_{c\to v}(*)$ and $\rho^{\rm (BP)}_{c\to v}(LL)=\rho^{\rm norm(PTP)}_{c\to v}(0)+\rho^{\rm norm(PTP)}_{c\to v}(1)$. We then have
$$\rho^{\rm (BP)}_{c\to v}(L*)+\rho^{\rm (BP)}_{c\to v}(LL) = \rho^{\rm norm(PTP)}_{c\to v}(*)+\rho^{\rm norm(PTP)}_{c\to v}(0)+\rho^{\rm norm(PTP)}_{c\to v}(1) = 1.$$
Following the PTP message-update equations (51) to (53), we have
$$\begin{aligned}
[L_{v,c}=0]&\cdot\lambda^{\rm (PTP)}_{v\to c}(0)+[L_{v,c}=1]\cdot\lambda^{\rm (PTP)}_{v\to c}(1)\\
&= [L_{v,c}=0]\cdot\Big(\prod_{b\in C(v)\setminus\{c\}}\big(\rho^{\rm norm(PTP)}_{b\to v}(0)+\rho^{\rm norm(PTP)}_{b\to v}(*)\big)-\gamma\prod_{b\in C(v)\setminus\{c\}}\rho^{\rm norm(PTP)}_{b\to v}(*)\Big)\\
&\quad+[L_{v,c}=1]\cdot\Big(\prod_{b\in C(v)\setminus\{c\}}\big(\rho^{\rm norm(PTP)}_{b\to v}(1)+\rho^{\rm norm(PTP)}_{b\to v}(*)\big)-\gamma\prod_{b\in C(v)\setminus\{c\}}\rho^{\rm norm(PTP)}_{b\to v}(*)\Big)\\
&= [L_{v,c}=0]\cdot\prod_{b\in C(v)\setminus\{c\}}\big(\rho^{\rm norm(PTP)}_{b\to v}(0)+\rho^{\rm norm(PTP)}_{b\to v}(*)\big) +[L_{v,c}=1]\cdot\prod_{b\in C(v)\setminus\{c\}}\big(\rho^{\rm norm(PTP)}_{b\to v}(1)+\rho^{\rm norm(PTP)}_{b\to v}(*)\big) -\gamma\prod_{b\in C(v)\setminus\{c\}}\rho^{\rm norm(PTP)}_{b\to v}(*)\\
&\stackrel{(68)}{=} [L_{v,c}=0]\cdot\prod_{b\in C_c^s(v)}\big(\rho^{\rm norm(PTP)}_{b\to v}(0)+\rho^{\rm norm(PTP)}_{b\to v}(*)\big)\prod_{b\in C_c^u(v)}\rho^{\rm norm(PTP)}_{b\to v}(*) +[L_{v,c}=1]\cdot\prod_{b\in C_c^s(v)}\big(\rho^{\rm norm(PTP)}_{b\to v}(1)+\rho^{\rm norm(PTP)}_{b\to v}(*)\big)\prod_{b\in C_c^u(v)}\rho^{\rm norm(PTP)}_{b\to v}(*) -\gamma\prod_{b\in C(v)\setminus\{c\}}\rho^{\rm norm(PTP)}_{b\to v}(*)\\
&\stackrel{(67)}{=} [L_{v,c}=0]\cdot\prod_{b\in C_c^u(v)}\rho^{\rm norm(PTP)}_{b\to v}(*) + [L_{v,c}=1]\cdot\prod_{b\in C_c^u(v)}\rho^{\rm norm(PTP)}_{b\to v}(*) - \gamma\prod_{b\in C(v)\setminus\{c\}}\rho^{\rm norm(PTP)}_{b\to v}(*)\\
&= \prod_{b\in C_c^u(v)}\rho^{\rm norm(PTP)}_{b\to v}(*) - \gamma\prod_{b\in C(v)\setminus\{c\}}\rho^{\rm norm(PTP)}_{b\to v}(*)\\
&= \prod_{b\in C_c^u(v)}\rho^{\rm norm(PTP)}_{b\to v}(*)\Big(1-\gamma\prod_{b\in C_c^s(v)}\rho^{\rm norm(PTP)}_{b\to v}(*)\Big)\\
&\stackrel{(d)}{=} \prod_{b\in C_c^u(v)}\rho^{\rm (BP)}_{b\to v}(L*)\Big(1-\gamma\prod_{b\in C_c^s(v)}\rho^{\rm (BP)}_{b\to v}(L*)\Big)\\
&\stackrel{(e)}{=} \prod_{b\in C_c^u(v)}\rho^{\rm (BP)}_{b\to v}(L*)\Big(\prod_{b\in C_c^s(v)}\big(\rho^{\rm (BP)}_{b\to v}(L*)+\rho^{\rm (BP)}_{b\to v}(LL)\big)-\gamma\prod_{b\in C_c^s(v)}\rho^{\rm (BP)}_{b\to v}(L*)\Big)\\
&\stackrel{(f)}{=} \prod_{b\in C_c^u(v)}\rho^{\rm (BP)}_{b\to v}(\bar L*)\Big(\prod_{b\in C_c^s(v)}\big(\rho^{\rm (BP)}_{b\to v}(L*)+\rho^{\rm (BP)}_{b\to v}(LL)\big)-\gamma\prod_{b\in C_c^s(v)}\rho^{\rm (BP)}_{b\to v}(L*)\Big)\\
&= \lambda^{\rm (BP)}_{v\to c}(L*),
\end{aligned}$$
where equality $(d)$ is due to the assumed right correspondence, equality $(e)$ is due to the fact that $\rho^{\rm (BP)}_{c\to v}(L*)+\rho^{\rm (BP)}_{c\to v}(LL)=1$, and equality $(f)$ is due to the fact that the condition $\rho^{\rm (BP)}_{b\to v}(L*)=\rho^{\rm (BP)}_{b\to v}(\bar L*)$ is satisfied in every iteration. We will denote this result by $(A)$.

Similarly, we have
$$\begin{aligned}
[L_{v,c}=0]\cdot\lambda^{\rm (PTP)}_{v\to c}(1)+[L_{v,c}=1]\cdot\lambda^{\rm (PTP)}_{v\to c}(0)
&= [L_{v,c}=0]\cdot\Big(\prod_{b\in C(v)\setminus\{c\}}\big(\rho^{\rm norm(PTP)}_{b\to v}(1)+\rho^{\rm norm(PTP)}_{b\to v}(*)\big)-\gamma\prod_{b\in C(v)\setminus\{c\}}\rho^{\rm norm(PTP)}_{b\to v}(*)\Big)\\
&\quad+[L_{v,c}=1]\cdot\Big(\prod_{b\in C(v)\setminus\{c\}}\big(\rho^{\rm norm(PTP)}_{b\to v}(0)+\rho^{\rm norm(PTP)}_{b\to v}(*)\big)-\gamma\prod_{b\in C(v)\setminus\{c\}}\rho^{\rm norm(PTP)}_{b\to v}(*)\Big)\\
&= \lambda^{\rm (BP)}_{v\to c}(\bar L*).
\end{aligned}$$
We will denote this result by $(B)$. Finally, we have
$$\lambda^{\rm (PTP)}_{v\to c}(*) = \gamma\prod_{b\in C(v)\setminus\{c\}}\rho^{\rm norm(PTP)}_{b\to v}(*) = \gamma\prod_{b\in C(v)\setminus\{c\}}\rho^{\rm (BP)}_{b\to v}(**) = \lambda^{\rm (BP)}_{v\to c}(**).$$
We will denote this result by $(C)$. Combining results $(A)$, $(B)$ and $(C)$, we have
$$\lambda^{\rm (PTP)}_{v\to c}(0)+\lambda^{\rm (PTP)}_{v\to c}(1)+\lambda^{\rm (PTP)}_{v\to c}(*) = \lambda^{\rm (BP)}_{v\to c}(L*)+\lambda^{\rm (BP)}_{v\to c}(\bar L*)+\lambda^{\rm (BP)}_{v\to c}(**).$$
That is, the scaling constant for normalizing $\big(\lambda^{\rm (PTP)}_{v\to c}(0), \lambda^{\rm (PTP)}_{v\to c}(1), \lambda^{\rm (PTP)}_{v\to c}(*)\big)$ and that for normalizing $\big(\lambda^{\rm (BP)}_{v\to c}(L*), \lambda^{\rm (BP)}_{v\to c}(\bar L*), \lambda^{\rm (BP)}_{v\to c}(**)\big)$ are identical. Therefore, results $(A)$, $(B)$ and $(C)$ respectively translate to
$$\begin{aligned}
[L_{v,c}=0]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(0)+[L_{v,c}=1]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(1) &= \lambda^{\rm norm(BP)}_{v\to c}(L*)\\
[L_{v,c}=0]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(1)+[L_{v,c}=1]\cdot\lambda^{\rm norm(PTP)}_{v\to c}(0) &= \lambda^{\rm norm(BP)}_{v\to c}(\bar L*)\\
\lambda^{\rm norm(PTP)}_{v\to c}(*) &= \lambda^{\rm norm(BP)}_{v\to c}(**).
\end{aligned}$$
At this point we have proved the correspondence between the passed messages in BP and those in weighted PTP.

We now prove the summary correspondence. Following the PTP message-update equations (57) to (59), we have
$$\begin{aligned}
\mu^{\rm (PTP)}_v(0) &= \prod_{c\in C(v)}\big(\rho^{\rm norm(PTP)}_{c\to v}(0)+\rho^{\rm norm(PTP)}_{c\to v}(*)\big)-\gamma\prod_{c\in C(v)}\rho^{\rm norm(PTP)}_{c\to v}(*)\\
&= \prod_{c\in C_1(v)}\big(\rho^{\rm norm(PTP)}_{c\to v}(0)+\rho^{\rm norm(PTP)}_{c\to v}(*)\big)\prod_{c\in C_0(v)}\big(\rho^{\rm norm(PTP)}_{c\to v}(0)+\rho^{\rm norm(PTP)}_{c\to v}(*)\big)-\gamma\prod_{c\in C(v)}\rho^{\rm norm(PTP)}_{c\to v}(*)\\
&\stackrel{(67),(68)}{=} \prod_{c\in C_1(v)}\rho^{\rm norm(PTP)}_{c\to v}(*)-\gamma\prod_{c\in C(v)}\rho^{\rm norm(PTP)}_{c\to v}(*)\\
&= \prod_{c\in C_1(v)}\rho^{\rm norm(PTP)}_{c\to v}(*)\Big(1-\gamma\prod_{c\in C_0(v)}\rho^{\rm norm(PTP)}_{c\to v}(*)\Big)\\
&= \prod_{c\in C_1(v)}\rho^{\rm (BP)}_{c\to v}(\bar L*)\Big(\prod_{c\in C_0(v)}\big(\rho^{\rm (BP)}_{c\to v}(LL)+\rho^{\rm (BP)}_{c\to v}(L*)\big)-\gamma\prod_{c\in C_0(v)}\rho^{\rm (BP)}_{c\to v}(L*)\Big)\\
&= \mu^{\rm (BP)}_v(0).
\end{aligned}$$
Following a similar procedure, we have
$$\mu^{\rm (PTP)}_v(1) = \prod_{c\in C(v)}\big(\rho^{\rm norm(PTP)}_{c\to v}(1)+\rho^{\rm norm(PTP)}_{c\to v}(*)\big)-\gamma\prod_{c\in C(v)}\rho^{\rm norm(PTP)}_{c\to v}(*) = \prod_{c\in C_0(v)}\rho^{\rm (BP)}_{c\to v}(\bar L*)\Big(\prod_{c\in C_1(v)}\big(\rho^{\rm (BP)}_{c\to v}(LL)+\rho^{\rm (BP)}_{c\to v}(L*)\big)-\gamma\prod_{c\in C_1(v)}\rho^{\rm (BP)}_{c\to v}(L*)\Big) = \mu^{\rm (BP)}_v(1).$$
Finally, we have
$$\mu^{\rm (PTP)}_v(*) = \gamma\prod_{c\in C(v)}\rho^{\rm norm(PTP)}_{c\to v}(*) = \gamma\prod_{c\in C(v)}\rho^{\rm (BP)}_{c\to v}(**) = \mu^{\rm (BP)}_v(*),$$
which proves the summary correspondence.

C. State-Decoupled BP

In this subsection, we consider reducing PTP from BP for $3$-COL problems, where we focus only on the non-weighted version of PTP, namely that in which each weighting function $\omega_v$ is defined as
$$\omega_v(a\,|\,b) := [a=b]. \tag{102}$$
This gives the BP messages the form specified in the following lemma, easily obtainable from the BP update equations (72) to (74).

Lemma 12: The BP message-update rule for $3$-COL problems is as follows:
$$\lambda_{v\to c}(i, ij) := \prod_{b\in C(v)\setminus\{c\}}\big(\rho_{b\to v}(i,ij)+\rho_{b\to v}(i,ik)+\rho_{b\to v}(i,ijk)\big) - \prod_{b\in C(v)\setminus\{c\}}\big(\rho_{b\to v}(i,ij)+\rho_{b\to v}(i,ijk)\big) \tag{103}$$
$$\lambda_{v\to c}(i, ijk) := \prod_{b\in C(v)\setminus\{c\}}\big(\rho_{b\to v}(i,ij)+\rho_{b\to v}(i,ik)+\rho_{b\to v}(i,ijk)\big) - \prod_{b\in C(v)\setminus\{c\}}\big(\rho_{b\to v}(i,ij)+\rho_{b\to v}(i,ijk)\big) - \prod_{b\in C(v)\setminus\{c\}}\big(\rho_{b\to v}(i,ik)+\rho_{b\to v}(i,ijk)\big) + \prod_{b\in C(v)\setminus\{c\}}\rho_{b\to v}(i,ijk) \tag{104}$$
$$\lambda_{v\to c}(ij, ij) := \prod_{b\in C(v)\setminus\{c\}}\big(\rho_{b\to v}(ij,ij)+\rho_{b\to v}(ij,ijk)\big) \tag{105}$$
$$\lambda_{v\to c}(ij, ijk) := \prod_{b\in C(v)\setminus\{c\}}\big(\rho_{b\to v}(ij,ij)+\rho_{b\to v}(ij,ijk)\big) - \prod_{b\in C(v)\setminus\{c\}}\rho_{b\to v}(ij,ijk) \tag{106}$$
$$\lambda_{v\to c}(ijk, ijk) := \prod_{b\in C(v)\setminus\{c\}}\rho_{b\to v}(ijk,ijk) \tag{107}$$
$$\rho_{c\to v}(i, ij) := \lambda_{V(c)\setminus\{v\}\to c}(k, jk) \tag{108}$$
$$\rho_{c\to v}(i, ijk) := \lambda_{V(c)\setminus\{v\}\to c}(jk, jk) \tag{109}$$
$$\rho_{c\to v}(ij, ij) := \lambda_{V(c)\setminus\{v\}\to c}(k, ijk) \tag{110}$$
$$\rho_{c\to v}(ij, ijk) := \lambda_{V(c)\setminus\{v\}\to c}(ij, ijk)+\lambda_{V(c)\setminus\{v\}\to c}(jk, ijk)+\lambda_{V(c)\setminus\{v\}\to c}(ik, ijk)+\lambda_{V(c)\setminus\{v\}\to c}(ijk, ijk) \tag{111}$$
$$\rho_{c\to v}(ijk, ijk) := \lambda_{V(c)\setminus\{v\}\to c}(ij, ijk)+\lambda_{V(c)\setminus\{v\}\to c}(jk, ijk)+\lambda_{V(c)\setminus\{v\}\to c}(ik, ijk)+\lambda_{V(c)\setminus\{v\}\to c}(ijk, ijk) \tag{112}$$
$$\mu_v(i) := \prod_{c\in C(v)}\big(\rho_{c\to v}(i,ij)+\rho_{c\to v}(i,ik)+\rho_{c\to v}(i,ijk)\big) - \prod_{c\in C(v)}\big(\rho_{c\to v}(i,ij)+\rho_{c\to v}(i,ijk)\big) - \prod_{c\in C(v)}\big(\rho_{c\to v}(i,ik)+\rho_{c\to v}(i,ijk)\big) + \prod_{c\in C(v)}\rho_{c\to v}(i,ijk) \tag{113}$$
$$\mu_v(ij) := \prod_{c\in C(v)}\big(\rho_{c\to v}(ij,ij)+\rho_{c\to v}(ij,ijk)\big) - \prod_{c\in C(v)}\rho_{c\to v}(ij,ijk) \tag{114}$$
$$\mu_v(ijk) := \prod_{c\in C(v)}\rho_{c\to v}(ijk,ijk). \tag{115}$$

Before we begin to consider the BP-to-PTP reduction for $3$-COL problems, it is helpful to take a closer look at the BP-to-PTP reduction mechanism for $k$-SAT problems. In Theorem 3, one may notice the two conditions governing the BP-to-PTP reduction for $k$-SAT problems, namely, the initialization condition and the normalization condition. It is arguable that the normalization condition imposed on the BP messages, although serving to simplify the form of the BP messages and possibly to alter their interpretation, does not have a critical impact on the message-passing dynamics. This is because the normalization condition merely involves a scaling operation, without which the BP messages and PTP messages for $k$-SAT would still be equivalent up to a scaling factor. On the other hand, the initialization condition in Theorem 3 plays an important role in the message-passing dynamics. In essence, the initialization condition ensures that any right message depends only on the right state it involves. Using the "intention-command" analogy, in which one views each right state as storing the "command" sent from a constraint and each left state as storing the "intention" of a variable, this condition simply requires that the distribution of the command sent to any variable not depend on the intention of that variable. It is remarkable that this interpretation of the initialization condition in Theorem 3 (or (88)) is consistent with the PTP message-passing rule, in which any right message (i.e., the outgoing distribution of commands) sent to a variable is independent of (that is, not a function of) the incoming intention from that variable. This is, however, not the case for the right messages of BP in general.
We are then motivated to formalize this condition for general CSPs as what we call the "state-decoupling" condition and impose it on the right messages of BP, so as to achieve consistency with PTP. It is intuitively sensible that such consistency is needed in the reduction of PTP from BP.

Definition 2 (State-Decoupling Condition): For an arbitrary CSP and at any given iteration, the BP messages based on the MRF formalism defined by (69), (70) and (71) are said to satisfy the state-decoupling condition if for every $(v-c)$, the right message $\rho_{c\to v}(s_{v,c})$ is only a function of the right state $s^R_{v,c}$, namely, if for any fixed $s^R_{v,c}\in(\chi^*)^{\{v\}}$ and any $s^L_{v,c}\subset s^R_{v,c}$,
$$\rho_{c\to v}(s^L_{v,c}, s^R_{v,c}) = \rho_{c\to v}(s^R_{v,c}, s^R_{v,c}).$$

It is clear that the initialization condition for the BP-to-PTP reduction for $k$-SAT in Theorem 3 is equivalent to this condition, where we note that the condition in Theorem 3 only puts restrictions on the right messages with right state equal to $*$, since for the remaining case with right state equal to $L$ this condition is trivially satisfied. It is interesting to observe, as shown in Proposition 1, that for $k$-SAT problems, as long as the state-decoupling condition is imposed in the initialization of the BP messages, the condition is preserved in every iteration. This serves as the basis for BP to reduce to PTP, as shown in Theorem 3 and its proof. For $3$-COL problems, however, the result corresponding to Proposition 1 does not hold.

Lemma 13: For $3$-COL problems, if the state-decoupling condition holds for the BP messages both in iteration $l$ and in iteration $l+1$, then the right messages in iteration $l$ must satisfy, for every $(v-c)$,
$$\rho_{c\to v}(s^L, s^R) = 0$$
whenever the right state $s^R \neq 123$.
Proof: In $3$-COL problems, the state-decoupling condition can be expressed as
$$\begin{aligned}
\rho_{c\to v}(i, ij) &= \rho_{c\to v}(ij, ij)\\
\rho_{c\to v}(i, ijk) &= \rho_{c\to v}(ij, ijk) = \rho_{c\to v}(ijk, ijk).
\end{aligned}$$
Note that we only need to prove the lemma for $s^R$ being a pair of assignments, since when $s^R$ is a singleton, all right messages equal $0$ by the construction of the MRF and Lemma 12 describing the BP message-update rule for $3$-COL. In iteration $l+1$, following the $3$-COL message-update equations (103) to (112) and using a superscript to denote the iteration number, we have
$$\begin{aligned}
\rho^{(l+1)}_{c\to v}(i, ij) = \lambda^{(l+1)}_{V(c)\setminus\{v\}\to c}(k, jk)
&= \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}}\Big(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,jk)+\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ik)+\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ijk)\Big)\\
&\quad- \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}}\Big(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,jk)+\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ijk)\Big),
\end{aligned} \tag{116}$$
$$\begin{aligned}
\rho^{(l+1)}_{c\to v}(ij, ij) = \lambda^{(l+1)}_{V(c)\setminus\{v\}\to c}(k, ijk)
&= \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}}\Big(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,jk)+\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ik)+\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ijk)\Big)\\
&\quad- \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}}\Big(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ik)+\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ijk)\Big)\\
&\quad- \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}}\Big(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,jk)+\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ijk)\Big)\\
&\quad+ \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}}\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ijk).
\end{aligned} \tag{117}$$
Now suppose that the state-decoupling condition as expressed above is satisfied both in iteration $l$ and in iteration $l+1$.
Then we may equate the right-hand sides of (116) and (117), namely,
$$\begin{aligned}
&\prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}}\Big(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,jk)+\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ik)+\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ijk)\Big) - \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}}\Big(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,jk)+\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ijk)\Big)\\
&= \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}}\Big(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,jk)+\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ik)+\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ijk)\Big) - \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}}\Big(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ik)+\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ijk)\Big)\\
&\quad- \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}}\Big(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,jk)+\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ijk)\Big) + \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}}\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ijk),
\end{aligned}$$
which implies
$$\prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}}\Big(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,jk)+\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ijk)\Big) = \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}}\Big(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ik)+\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ijk)\Big) + \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}}\Big(\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,jk)+\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ijk)\Big) - \prod_{b\in C(V(c)\setminus\{v\})\setminus\{c\}}\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ijk).$$
Since every right message must be non-negative, when the state-decoupling condition is satisfied in iteration $l$, the only way to make the above equality hold is the case where $\rho^{(l)}_{b\to V(c)\setminus\{v\}}(k,ik) = 0$. Under the state-decoupling condition, this also means $\rho^{(l)}_{b\to V(c)\setminus\{v\}}(ik,ik) = 0$. Thus we establish the lemma.

This lemma suggests that when the BP messages satisfy the state-decoupling condition in two consecutive iterations, the right messages must take a trivial form, equal to $[s^R = 123]$ up to scale, and hence contain no information.
At this point, one is left with either the option of concluding that PTP (or SP) is not an instance of BP for $3$-COL problems (and hence for general CSPs), or the option of doubting the usefulness of the state-decoupling condition in the BP-to-SP reduction. In the remainder of this subsection, we will clear this doubt and assert the usefulness of the state-decoupling condition by showing that when the state-decoupling condition is manually imposed on the BP messages in each iteration, BP still reduces to PTP for $3$-COL problems. That will allow us to conclude that PTP (or SP) is not a special case of BP.

To force the state-decoupling condition to be satisfied in each BP iteration, we now modify the message-passing rule of BP on the Forney graph representation of general CSPs and introduce a "new" message-passing procedure, which we refer to as state-decoupled BP, or SDBP. We note that this "new" message-passing procedure is introduced solely for the purpose of verifying the usefulness of the state-decoupling condition and, hopefully, arriving at a unified reduction mechanism for PTP to reduce from BP (or, more precisely, from SDBP). Beyond this purpose, we have no intention to justify the introduction of SDBP.

Identical to BP at the local function vertices, SDBP differs from BP in that messages passed from the right functions need an additional processing step (so that the state-decoupling condition is satisfied) before they are passed to the left functions. In SDBP, there are three kinds of messages: the right message $\rho_{c\to v}$ is computed at the right function $f_c$ to pass along the edge to $g_v$; the state-decoupled right message $\rho^*_{c\to v}$, which satisfies the state-decoupling condition, is computed at the edge connecting $f_c$ and $g_v$, based only on the right message $\rho_{c\to v}$ on the same edge, and is passed to the left function $g_v$; the left message $\lambda_{v\to c}$ is computed at the left function $g_v$ to pass along the edge connecting to $f_c$.
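The extra processing step, the computation of $\rho^*_{c\to v}$ from $\rho_{c\to v}$ on the same edge, amounts to keeping only the diagonal entries of the right message and renormalizing. A minimal sketch, with our own encoding of tokens as strings:

```python
def decouple(rho):
    # Keep the diagonal entries rho(sR, sR) and renormalize them by delta,
    # so the result is a function of the right state alone.
    diag = {sR: value for (sL, sR), value in rho.items() if sL == sR}
    delta = 1.0 / sum(diag.values())
    return {sR: delta * value for sR, value in diag.items()}

# A right message over (left state, right state) pairs for 3-COL.
rho = {('12', '12'): 0.3, ('1', '12'): 0.1,
       ('123', '123'): 0.5, ('12', '123'): 0.2}
rho_star = decouple(rho)
# rho_star == {'12': 0.375, '123': 0.625}: off-diagonal entries dropped,
# diagonal rescaled to sum to one.
```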
The precise definition of the SDBP message-update rule is given next.

Definition 3: The SDBP message-update rule is defined as follows:
$$\lambda_{v\to c}(s^L_{v,c}, s^R_{v,c}) := \sum_{s^R_{v,C(v)\setminus\{c\}}} \omega_v\Big(s^L_{v,c}\,\Big|\,\bigcap_{b\in C(v)} s^R_{v,b}\Big)\cdot \prod_{b\in C(v)\setminus\{c\}} \rho^*_{b\to v}(s^R_{v,b}) \tag{118}$$
$$\rho^*_{c\to v}(s^R_{v,c}) := \delta\cdot \rho_{c\to v}(s^R_{v,c}, s^R_{v,c}) \tag{119}$$
$$\rho_{c\to v}(s^L_{v,c}, s^R_{v,c}) := \sum_{\substack{s^L_{V(c)\setminus\{v\},c}:\\ s^R_{v,c} = F_c(s^L_{V(c)\setminus\{v\},c})}} \prod_{u\in V(c)\setminus\{v\}} \lambda_{u\to c}\big(s^L_{u,c}, F_c(s^L_{V(c)\setminus\{u\},c})\big) \tag{120}$$
$$\mu_v(y_v) := \sum_{s^R_{v,C(v)}} \omega_v\Big(y_v\,\Big|\,\bigcap_{c\in C(v)} s^R_{v,c}\Big)\prod_{c\in C(v)} \rho^*_{c\to v}(s^R_{v,c}) \tag{121}$$
where $\delta = 1\big/\sum_{s^R_{v,c}\in(\chi^*)^{\{v\}}} \rho_{c\to v}(s^R_{v,c}, s^R_{v,c})$.

Comparing this definition with the BP message-update rule in Lemma 8, the following remarks are in order. First, the expression of the right messages $\rho$ in terms of the left messages $\lambda$ is identical to that in BP. Second, each state-decoupled message $\rho^*_{c\to v}$ may be regarded as a function of $(s^L_{v,c}, s^R_{v,c})$ whose value depends only on the $s^R_{v,c}$ component, namely, the (state-decoupled) right message satisfies the state-decoupling condition. Furthermore, the expression of $\lambda$ in terms of $\rho^*$ is precisely the same as the expression of $\lambda$ in terms of $\rho$ in BP (see footnote 7). Following this definition, the next lemma summarizes the SDBP message-update rule for $3$-COL problems.

Lemma 14: Let $\{\omega_v : v\in V\}$ in $3$-COL problems be defined as in (102).
The SDBP message-update rule is then:
$$\lambda_{v\to c}(i, ij) := \prod_{b\in C(v)\setminus\{c\}}\big(\rho^*_{b\to v}(ij)+\rho^*_{b\to v}(ik)+\rho^*_{b\to v}(ijk)\big) - \prod_{b\in C(v)\setminus\{c\}}\big(\rho^*_{b\to v}(ij)+\rho^*_{b\to v}(ijk)\big) \tag{122}$$
$$\lambda_{v\to c}(i, ijk) := \prod_{b\in C(v)\setminus\{c\}}\big(\rho^*_{b\to v}(ij)+\rho^*_{b\to v}(ik)+\rho^*_{b\to v}(ijk)\big) - \prod_{b\in C(v)\setminus\{c\}}\big(\rho^*_{b\to v}(ij)+\rho^*_{b\to v}(ijk)\big) - \prod_{b\in C(v)\setminus\{c\}}\big(\rho^*_{b\to v}(ik)+\rho^*_{b\to v}(ijk)\big) + \prod_{b\in C(v)\setminus\{c\}}\rho^*_{b\to v}(ijk) \tag{123}$$
$$\lambda_{v\to c}(ij, ij) := \prod_{b\in C(v)\setminus\{c\}}\big(\rho^*_{b\to v}(ij)+\rho^*_{b\to v}(ijk)\big) \tag{124}$$
$$\lambda_{v\to c}(ij, ijk) := \prod_{b\in C(v)\setminus\{c\}}\big(\rho^*_{b\to v}(ij)+\rho^*_{b\to v}(ijk)\big) - \prod_{b\in C(v)\setminus\{c\}}\rho^*_{b\to v}(ijk) \tag{125}$$
$$\lambda_{v\to c}(ijk, ijk) := \prod_{b\in C(v)\setminus\{c\}}\rho^*_{b\to v}(ijk) \tag{126}$$
$$\rho^*_{c\to v}(ij) := \delta\cdot\lambda_{V(c)\setminus\{v\}\to c}(k, ijk) \tag{127}$$
$$\rho^*_{c\to v}(ijk) := \delta\cdot\big(\lambda_{V(c)\setminus\{v\}\to c}(ij, ijk)+\lambda_{V(c)\setminus\{v\}\to c}(ik, ijk)+\lambda_{V(c)\setminus\{v\}\to c}(jk, ijk)+\lambda_{V(c)\setminus\{v\}\to c}(ijk, ijk)\big) \tag{128}$$
$$\mu_v(i) := \prod_{c\in C(v)}\big(\rho^*_{c\to v}(ij)+\rho^*_{c\to v}(ik)+\rho^*_{c\to v}(ijk)\big) - \prod_{c\in C(v)}\big(\rho^*_{c\to v}(ij)+\rho^*_{c\to v}(ijk)\big) - \prod_{c\in C(v)}\big(\rho^*_{c\to v}(ik)+\rho^*_{c\to v}(ijk)\big) + \prod_{c\in C(v)}\rho^*_{c\to v}(ijk) \tag{129}$$
$$\mu_v(ij) := \prod_{c\in C(v)}\big(\rho^*_{c\to v}(ij)+\rho^*_{c\to v}(ijk)\big) - \prod_{c\in C(v)}\rho^*_{c\to v}(ijk) \tag{130}$$
$$\mu_v(ijk) := \prod_{c\in C(v)}\rho^*_{c\to v}(ijk), \tag{131}$$
where $\delta$ is such that $\rho^*_{c\to v}(ijk) + \sum_{ij} \rho^*_{c\to v}(ij) = 1$.

Footnote 7: Although it is possible to formulate SDBP in a more compact form by, for example, suppressing $\rho$ and expressing the message-update rule using only $\rho^*$ and $\lambda$, we feel the current way of formulating SDBP makes it easier to compare SDBP with BP.

It is now possible to establish a correspondence between PTP and SDBP messages for $3$-COL problems.
Theorem 4: For $3$-COL problems, the correspondence between the PTP and SDBP message-update rules is
$$\lambda^{\rm (PTP)}_{v\to c}(i) \leftrightarrow \lambda^{\rm (SDBP)}_{v\to c}(i, ijk) \tag{132}$$
$$\lambda^{\rm (PTP)}_{v\to c}(ij) \leftrightarrow \lambda^{\rm (SDBP)}_{v\to c}(ij, ijk) \tag{133}$$
$$\lambda^{\rm (PTP)}_{v\to c}(ijk) \leftrightarrow \lambda^{\rm (SDBP)}_{v\to c}(ijk, ijk) \tag{134}$$
$$\rho^{\rm norm(PTP)}_{c\to v}(ij) \leftrightarrow \rho^{*\rm (SDBP)}_{c\to v}(ij) \tag{135}$$
$$\rho^{\rm norm(PTP)}_{c\to v}(ijk) \leftrightarrow \rho^{*\rm (SDBP)}_{c\to v}(ijk) \tag{136}$$
$$\mu^{\rm (PTP)}_v(i) \leftrightarrow \mu^{\rm (SDBP)}_v(i) \tag{137}$$
$$\mu^{\rm (PTP)}_v(ij) \leftrightarrow \mu^{\rm (SDBP)}_v(ij) \tag{138}$$
$$\mu^{\rm (PTP)}_v(ijk) \leftrightarrow \mu^{\rm (SDBP)}_v(ijk). \tag{139}$$
Proof: We will first prove that if the "right correspondence" (namely, (135) and (136)) holds, then the "left correspondence" (namely, (132) to (134)) holds. Suppose that the right correspondence holds (where the symbol $\leftrightarrow$ in (135) and (136) is understood as equality). Then
$$\begin{aligned}
\lambda^{\rm (SDBP)}_{v\to c}(i, ijk)
&= \prod_{b\in C(v)\setminus\{c\}}\big(\rho^{*\rm (SDBP)}_{b\to v}(ij)+\rho^{*\rm (SDBP)}_{b\to v}(ik)+\rho^{*\rm (SDBP)}_{b\to v}(ijk)\big) - \prod_{b\in C(v)\setminus\{c\}}\big(\rho^{*\rm (SDBP)}_{b\to v}(ij)+\rho^{*\rm (SDBP)}_{b\to v}(ijk)\big)\\
&\quad- \prod_{b\in C(v)\setminus\{c\}}\big(\rho^{*\rm (SDBP)}_{b\to v}(ik)+\rho^{*\rm (SDBP)}_{b\to v}(ijk)\big) + \prod_{b\in C(v)\setminus\{c\}}\rho^{*\rm (SDBP)}_{b\to v}(ijk)\\
&= \prod_{b\in C(v)\setminus\{c\}}\big(\rho^{\rm norm(PTP)}_{b\to v}(ij)+\rho^{\rm norm(PTP)}_{b\to v}(ik)+\rho^{\rm norm(PTP)}_{b\to v}(ijk)\big) - \prod_{b\in C(v)\setminus\{c\}}\big(\rho^{\rm norm(PTP)}_{b\to v}(ij)+\rho^{\rm norm(PTP)}_{b\to v}(ijk)\big)\\
&\quad- \prod_{b\in C(v)\setminus\{c\}}\big(\rho^{\rm norm(PTP)}_{b\to v}(ik)+\rho^{\rm norm(PTP)}_{b\to v}(ijk)\big) + \prod_{b\in C(v)\setminus\{c\}}\rho^{\rm norm(PTP)}_{b\to v}(ijk)\\
&= \lambda^{\rm (PTP)}_{v\to c}(i).
\end{aligned}$$
Similarly, we can prove that $\lambda^{\rm (SDBP)}_{v\to c}(ij, ijk) = \lambda^{\rm (PTP)}_{v\to c}(ij)$ and $\lambda^{\rm (SDBP)}_{v\to c}(ijk, ijk) = \lambda^{\rm (PTP)}_{v\to c}(ijk)$. It then follows that the left correspondence holds. Now we prove that if the left correspondence holds, then the right correspondence holds.
Suppose that the left correspondence holds. Then we have
$$\rho^{\rm norm(PTP)}_{c\to v}(ij) = \alpha\cdot\rho^{\rm (PTP)}_{c\to v}(ij) = \alpha\cdot\lambda^{\rm norm(PTP)}_{V(c)\setminus\{v\}\to c}(k) = \alpha\beta\cdot\lambda^{\rm (PTP)}_{V(c)\setminus\{v\}\to c}(k) = \alpha\beta\cdot\lambda^{\rm (SDBP)}_{V(c)\setminus\{v\}\to c}(k, ijk),$$
where $\alpha = 1\big/\sum_{t\in(\chi^*)^{\{v\}}} \rho^{\rm (PTP)}_{c\to v}(t)$ and $\beta = 1\big/\sum_{t\in(\chi^*)^{V(c)\setminus\{v\}}} \lambda^{\rm (PTP)}_{V(c)\setminus\{v\}\to c}(t)$. We also have
$$\rho^{*\rm (SDBP)}_{c\to v}(ij) = \delta\cdot\lambda^{\rm (SDBP)}_{V(c)\setminus\{v\}\to c}(k, ijk).$$
Since both $\rho^{*\rm (SDBP)}_{c\to v}$ and $\rho^{\rm norm(PTP)}_{c\to v}$ are normalized, it must hold that $\alpha\beta = \delta$. This indicates that $\rho^{\rm norm(PTP)}_{c\to v}(ij) = \rho^{*\rm (SDBP)}_{c\to v}(ij)$. Following a similar procedure, one can show that $\rho^{\rm norm(PTP)}_{c\to v}(ijk) = \rho^{*\rm (SDBP)}_{c\to v}(ijk)$. This implies that the right correspondence holds. At this point, we have established the correspondence between the passed messages in PTP and those in SDBP.

Now we prove the summary correspondence (namely, (137) to (139)):
$$\begin{aligned}
\mu^{\rm (SDBP)}_v(i)
&= \prod_{c\in C(v)}\big(\rho^{*\rm (SDBP)}_{c\to v}(ij)+\rho^{*\rm (SDBP)}_{c\to v}(ik)+\rho^{*\rm (SDBP)}_{c\to v}(ijk)\big) - \prod_{c\in C(v)}\big(\rho^{*\rm (SDBP)}_{c\to v}(ij)+\rho^{*\rm (SDBP)}_{c\to v}(ijk)\big)\\
&\quad- \prod_{c\in C(v)}\big(\rho^{*\rm (SDBP)}_{c\to v}(ik)+\rho^{*\rm (SDBP)}_{c\to v}(ijk)\big) + \prod_{c\in C(v)}\rho^{*\rm (SDBP)}_{c\to v}(ijk)\\
&= \prod_{c\in C(v)}\big(\rho^{\rm norm(PTP)}_{c\to v}(ij)+\rho^{\rm norm(PTP)}_{c\to v}(ik)+\rho^{\rm norm(PTP)}_{c\to v}(ijk)\big) - \prod_{c\in C(v)}\big(\rho^{\rm norm(PTP)}_{c\to v}(ij)+\rho^{\rm norm(PTP)}_{c\to v}(ijk)\big)\\
&\quad- \prod_{c\in C(v)}\big(\rho^{\rm norm(PTP)}_{c\to v}(ik)+\rho^{\rm norm(PTP)}_{c\to v}(ijk)\big) + \prod_{c\in C(v)}\rho^{\rm norm(PTP)}_{c\to v}(ijk)\\
&= \mu^{\rm (PTP)}_v(i).
\end{aligned}$$
Similarly, we can prove that $\mu^{\rm (SDBP)}_v(ij) = \mu^{\rm (PTP)}_v(ij)$ and $\mu^{\rm (SDBP)}_v(ijk) = \mu^{\rm (PTP)}_v(ijk)$. This proves the summary correspondence.

At this point, it should be convincing that the state-decoupling condition is an important ingredient in the reduction of BP to PTP. It is worth noting that in the case of $k$-SAT problems, this condition can be imposed simply by the initialization of the BP messages.
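That the $k$-SAT initialization alone suffices can also be checked numerically. By the proof of Proposition 1, an (88)-satisfying initialization forces $\lambda(LL)=\lambda(L*)+\lambda(**)$ on every left message; feeding such messages through the right-message update (82)-(84) reproduces (88). A sketch with our own state labels and containers:

```python
from math import prod
import random

def right_update(lams):
    # Right-message update (82)-(84); lams are the left messages from
    # the other variables of the clause.
    S = prod(l['L*'] + l['**'] + l['Lb*'] for l in lams)
    P = prod(l['Lb*'] for l in lams)
    corr = sum((l['LL'] - l['L*'] - l['**'])
               * prod(m['Lb*'] for m in lams if m is not l)
               for l in lams)
    return {'Lb*': S + corr - P, 'L*': S - P, '**': S - P}

random.seed(1)
# Random left messages constrained to lambda(LL) = lambda(L*) + lambda(**).
lams = []
for _ in range(3):
    a, b, c = (random.random() for _ in range(3))
    lams.append({'L*': a, '**': b, 'Lb*': c, 'LL': a + b})

rho = right_update(lams)
# Condition (88) holds for the output: rho(Lb*) = rho(L*) = rho(**).
```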
However, in the case of $3$-COL problems, one needs to impose this condition manually at each iteration, namely, by carrying out SDBP instead of BP, so as to arrive at an equivalence to the PTP messages. This extra complexity involved in $3$-COL problems then suggests that for $3$-COL problems, PTP and hence SP are not a special case of BP. Thus one may conclude that SP is not BP for general CSPs. It now remains to investigate, for general CSPs, whether the state-decoupling condition is sufficient for PTP or weighted PTP to reduce from BP, or equivalently, whether and when PTP and weighted PTP are SDBP.

D. The Reduction of Weighted PTP from SDBP for General CSPs

Up to this point, we have seen that the state-decoupling condition critically governs the reduction of BP to PTP (or weighted PTP) for $k$-SAT problems and $3$-COL problems. In this subsection, we will however show that the state-decoupling condition is not sufficient for BP (more precisely, SDBP) to reduce to PTP, and that an additional condition is needed in the general context.

Definition 4 (Forceable Token): For any $(v-c)$, we say that a token $t_v\in(\chi^*)^{\{v\}}$ is forceable by $\Gamma_c$ if there exists a rectangle $\prod_{u\in V(c)\setminus\{v\}} t_u$ on $V(c)\setminus\{v\}$ such that
$$F_c\Big(\prod_{u\in V(c)\setminus\{v\}} t_u\Big) = t_v.$$
We will denote by $\mathcal F_c(v)$ the set of all tokens on $v$ that are forceable by $\Gamma_c$. Let $A_c(v) := \bigcup_{t\in\mathcal F_c(v)} t$. Since $A_c(v) = F_c\Big(\prod_{u\in V(c)\setminus\{v\}}(\chi^*)^{\{u\}}\Big)$, it follows that $A_c(v)$ is always forceable. In fact, it is easy to see that $A_c(v)$ is the "largest" token on $v$ forceable by $\Gamma_c$ (in the sense of containing all other forceable tokens as its subsets), due to the monotonicity of $F_c(\cdot)$. In $k$-SAT problems, for any $(v-c)$, it is easy to see that $\mathcal F_c(v) = \{*, L\}$ and $A_c(v) = *$. In $3$-COL problems, for any $(v-c)$, it is easy to see that $\mathcal F_c(v) = \{123, 12, 23, 13\}$ and $A_c(v) = 123$.
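The $k$-SAT case of this claim can be reproduced by brute force over all rectangles. The sketch below is ours and assumes a natural semantics for $F_c$ on a hypothetical 3-SAT clause: the clause forces $v$ to its satisfying value exactly when no other variable's token contains that variable's satisfying value, and otherwise forces nothing (the token $*=\{0,1\}$).

```python
from itertools import product

# Hypothetical clause (x1 OR ~x2 OR x3): sat_value maps each variable to
# the assignment that satisfies the clause.
sat_value = {1: 1, 2: 0, 3: 1}

def F_c(other_tokens, v):
    # Assumed semantics of F_c: if some other token contains its
    # variable's satisfying value, nothing is forced on v.
    if any(sat_value[u] in t for u, t in other_tokens.items()):
        return frozenset({0, 1})           # the token '*'
    return frozenset({sat_value[v]})       # the token 'L'

tokens = [frozenset({0}), frozenset({1}), frozenset({0, 1})]
forceable = {F_c({2: t2, 3: t3}, 1) for t2, t3 in product(tokens, repeat=2)}
A = frozenset().union(*forceable)          # the largest forceable token
# forceable == { {1}, {0,1} }, i.e. F_c(v) = {L, *}; A == {0,1}, i.e. A_c(v) = *.
```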
For any $(c-v)$, let $A^\sim_c(v)$ be defined by
$$A^\sim_c(v) := \bigcap_{b\in C(v)\setminus\{c\}} A_b(v).$$

Definition 5 (Locally Compatible Constraint): A constraint $\Gamma_c$ is said to be locally compatible if for any $v\in V(c)$, any forceable token $t_v\in\mathcal F_c(v)$, any rectangle $t'\in F_c^{-1}(t_v)$ on $V(c)\setminus\{v\}$ (where $F_c^{-1}(t_v)$ is the set of all rectangles $y_{V(c)\setminus\{v\}}$ on $V(c)\setminus\{v\}$ such that $F_c(y_{V(c)\setminus\{v\}})=t_v$) and any $u\in V(c)\setminus\{v\}$, it holds that
$$A^\sim_c(u) \subseteq F_c\big(t_v\times t'_{:V(c)\setminus\{u,v\}}\big).$$
We note that the local compatibility of a constraint $\Gamma_c$ as defined above is not simply a property of $\Gamma_c$ itself; it also relies on the structure of all constraints that are distance-2 away from $\Gamma_c$ in the factor graph.

Theorem 5: Let the set of obedience conditionals $\{\omega_v : v\in V\}$ be given, where each $v\in V$ corresponds to a coordinate of a CSP. Let both the MRF of the CSP (that specified via (69), (70) and (71)) and the weighted PTP for the CSP be parametrized by $\{\omega_v : v\in V\}$. Then if every constraint of the CSP is locally compatible, the SDBP derived from the MRF is equivalent to the weighted PTP, where the correspondence is $\rho^{\rm norm(PTP)}_{c\to v} \leftrightarrow \rho^{*\rm (SDBP)}_{c\to v}$. Conversely, if such an equivalence holds for every choice of $\{\omega_v : v\in V\}$, then every constraint of the CSP must be locally compatible.

Alternatively phrased, Theorem 5 states that when the state-decoupling condition is satisfied in every iteration of BP, local compatibility of all constraints is the necessary and sufficient condition for weighted PTP to reduce from BP. We note that Theorem 5 refers only to the equivalence of the right messages. It is, however, straightforward to verify (as seen in earlier proofs of equivalent results in this paper) that the right equivalence implies the summary equivalence. This theorem answers the question of when SP is SDBP in a general setting.
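Local compatibility can be checked by brute force for small constraints. The sketch below is ours and applies the same assumed semantics for $F_c$ as before to a hypothetical 3-SAT clause; $A^\sim_c(u)$ is taken to be the full token $\{0,1\}$, its value $*$ for $k$-SAT. For every variable $v$, every rectangle $t'$ (whose image $t_v=F_c(t')$ is a forceable token with $t'$ in its preimage), and every other variable $u$, it verifies the containment required by Definition 5:

```python
from itertools import product

# Hypothetical clause (x1 OR ~x2 OR x3), as in the earlier sketch.
sat_value = {1: 1, 2: 0, 3: 1}

def F_c(other_tokens, target):
    # Assumed semantics of F_c: target is forced to its satisfying value
    # exactly when no other token contains its variable's satisfying value.
    if any(sat_value[u] in t for u, t in other_tokens.items()):
        return frozenset({0, 1})
    return frozenset({sat_value[target]})

tokens = [frozenset({0}), frozenset({1}), frozenset({0, 1})]
A_tilde = frozenset({0, 1})   # A~_c(u) = * for k-SAT

locally_compatible = True
for v in sat_value:
    others = [u for u in sat_value if u != v]
    for rect in product(tokens, repeat=len(others)):
        t_prime = dict(zip(others, rect))
        t_v = F_c(t_prime, v)             # the token this rectangle forces on v
        for u in others:
            rest = {w: t for w, t in t_prime.items() if w != u}
            forced_on_u = F_c({**rest, v: t_v}, u)
            locally_compatible &= A_tilde <= forced_on_u
```

Since every forceable token on $v$ contains $v$'s satisfying value, the inner $F_c$ always returns $*$, so the containment holds for every rectangle, matching the claim at the end of this section that local compatibility of $k$-SAT clauses is independent of the factor graph structure.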
Proof: Following the message-update rule of SDBP,
$$\rho^{*\mathrm{(SDBP)}}_{c \to v}(s^R_{v,c}) \propto \sum_{\substack{s^L_{V(c)\setminus\{v\},c}:\\ s^R_{v,c} = F_c(s^L_{V(c)\setminus\{v\},c})}} \;\prod_{u \in V(c)\setminus\{v\}} \lambda^{\mathrm{(SDBP)}}_{u \to c}\Big(s^L_{u,c},\, F_c\big(s^L_{V(c)\setminus\{u\},c}\big)\Big)$$
$$= \sum_{\substack{s^L_{V(c)\setminus\{v\},c}:\\ s^R_{v,c} = F_c(s^L_{V(c)\setminus\{v\},c})}} \;\prod_{u \in V(c)\setminus\{v\}} \sum_{s^R_{u,C(u)\setminus\{c\}}} \omega_u\Big(s^L_{u,c} \,\Big|\, \bigcap_{b \in C(u)\setminus\{c\}} s^R_{u,b} \cap F_c\big(s^L_{V(c)\setminus\{u,v\},c} \times s^R_{v,c}\big)\Big) \cdot \prod_{b \in C(u)\setminus\{c\}} \rho^{*\mathrm{(SDBP)}}_{b \to u}(s^R_{u,b}). \qquad (140)$$
Similarly, following the message-update rule of weighted PTP, we have
$$\rho^{\mathrm{norm(PTP)}}_{c \to v}(t_{c \to v}) \propto \sum_{\substack{t_{V(c)\setminus\{v\} \to c}:\\ t_{c \to v} = F_c(t_{V(c)\setminus\{v\} \to c})}} \;\prod_{u \in V(c)\setminus\{v\}} \sum_{t_{C(u)\setminus\{c\} \to u}} \omega_u\Big(t_{u \to c} \,\Big|\, \bigcap_{b \in C(u)\setminus\{c\}} t_{b \to u}\Big) \cdot \prod_{b \in C(u)\setminus\{c\}} \rho^{\mathrm{norm(PTP)}}_{b \to u}(t_{b \to u}). \qquad (141)$$
Identifying every right state $s^R_{v,c}$ in (140) with the token $t_{c \to v}$ in (141), and every left state $s^L_{v,c}$ in (140) with the token $t_{v \to c}$ in (141), the only difference between (140) and (141) is the argument of the function $\omega_u$. (We note that since both $\rho^{*\mathrm{(SDBP)}}_{c \to v}$ and $\rho^{\mathrm{norm(PTP)}}_{c \to v}$ are normalized, the scaling constants in (140) and (141) are necessarily the same.)

We now prove the sufficiency and necessity of the local compatibility condition for the equivalence between $\rho^{\mathrm{norm(PTP)}}_{c \to v}$ and $\rho^{*\mathrm{(SDBP)}}_{c \to v}$ via the following chain of two-way implications.

$\rho^{*\mathrm{(SDBP)}}_{c \to v} \leftrightarrow \rho^{\mathrm{norm(PTP)}}_{c \to v}$, $\forall v \in V(c)$

$\Leftrightarrow$ $\omega_u\Big(s^L_{u,c} \,\Big|\, \bigcap_{b \in C(u)\setminus\{c\}} s^R_{u,b} \cap F_c\big(s^L_{V(c)\setminus\{u,v\},c} \times s^R_{v,c}\big)\Big) = \omega_u\Big(s^L_{u,c} \,\Big|\, \bigcap_{b \in C(u)\setminus\{c\}} s^R_{u,b}\Big)$, for all $v \in V(c)$ and every $\big(s^R_{v,c}, s^L_{V(c)\setminus\{v\},c}\big)$ in the support of $\big[s^R_{v,c} = F_c(s^L_{V(c)\setminus\{v\},c})\big]$, for all $u \in V(c) \setminus \{v\}$ and every choice of $|C(u)\setminus\{c\}|$ tokens on $\{u\}$, $\big(s^R_{u,b} : b \in C(u)\setminus\{c\}\big)$, with each $s^R_{u,b}$ in the support of $\rho^{\mathrm{(PTP)}}_{b \to u}$.
$\Leftrightarrow$ $\bigcap_{b \in C(u)\setminus\{c\}} s^R_{u,b} \cap F_c\big(s^L_{V(c)\setminus\{u,v\},c} \times s^R_{v,c}\big) = \bigcap_{b \in C(u)\setminus\{c\}} s^R_{u,b}$, for all $v \in V(c)$ and every $\big(s^R_{v,c}, s^L_{V(c)\setminus\{v\},c}\big)$ such that $s^R_{v,c} \in \mathcal{F}_c(v)$ and $s^L_{V(c)\setminus\{v\},c} \in F_c^{-1}(s^R_{v,c})$, for all $u \in V(c) \setminus \{v\}$ and every choice of $|C(u)\setminus\{c\}|$ tokens on $\{u\}$, $\big(s^R_{u,b} : b \in C(u)\setminus\{c\}\big)$, with each $s^R_{u,b} \in \mathcal{F}_b(u)$.

$\Leftrightarrow$ $\bigcap_{b \in C(u)\setminus\{c\}} s^R_{u,b} \subseteq F_c\big(s^L_{V(c)\setminus\{u,v\},c} \times s^R_{v,c}\big)$, under the same quantification as above.

$\Leftrightarrow$ $\bigcap_{b \in C(u)\setminus\{c\}} A_b(u) \subseteq F_c\big(s^L_{V(c)\setminus\{u,v\},c} \times s^R_{v,c}\big)$, for all $v \in V(c)$, every $\big(s^R_{v,c}, s^L_{V(c)\setminus\{v\},c}\big)$ such that $s^R_{v,c} \in \mathcal{F}_c(v)$ and $s^L_{V(c)\setminus\{v\},c} \in F_c^{-1}(s^R_{v,c})$, and every $u \in V(c) \setminus \{v\}$.

$\Leftrightarrow$ $A^{\sim}_c(u) \subseteq F_c\big(s^L_{V(c)\setminus\{u,v\},c} \times s^R_{v,c}\big)$, for all $v \in V(c)$, every $\big(s^R_{v,c}, s^L_{V(c)\setminus\{v\},c}\big)$ such that $s^R_{v,c} \in \mathcal{F}_c(v)$ and $s^L_{V(c)\setminus\{v\},c} \in F_c^{-1}(s^R_{v,c})$, and every $u \in V(c) \setminus \{v\}$.

$\Leftrightarrow$ Constraint $\Gamma_c$ is locally compatible.

Thus $\rho^{\mathrm{norm(PTP)}}_{c \to v} \leftrightarrow \rho^{*\mathrm{(SDBP)}}_{c \to v}$ for every $(x_v, \Gamma_c) \in E(G)$ $\Leftrightarrow$ every constraint $\Gamma_c$ is locally compatible.

Now it is easy to verify that, for both $k$-SAT and 3-COL problems, the fact that PTP or weighted PTP can be reduced from BP with the state-decoupling condition imposed is due to the fact that every constraint is locally compatible. For $k$-SAT problems, as noted earlier, $\mathcal{F}_c(v) = \{L, *\}$. If we pick $t_v$ to be either token from $\mathcal{F}_c(v)$, then for any $t' \in F_c^{-1}(t_v)$ and any $u \in V(c) \setminus \{v\}$, it can be verified that $F_c\big(t'_{:V(c)\setminus\{u,v\}} \times t_v\big) = *$. This makes $A^{\sim}_c(u) \subseteq F_c\big(t'_{:V(c)\setminus\{u,v\}} \times t_v\big)$ always satisfied, independent of the factor-graph structure of the problem instance.
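The verification just described can be mechanized for small instances. The sketch below is our own construction (`project` is a hypothetical helper implementing $F_c$ as projection of the satisfying set, per the Appendix definition): for every coordinate $v$ of a 3-SAT clause, every rectangle $t'$ whose image is a forceable token $t_v$, and every $u \neq v$, it checks that $F_c\big(t_v \times t'_{:V(c)\setminus\{u,v\}}\big)$ is the full token, so $A^{\sim}_c(u)$ is trivially contained in it.

```python
from itertools import product, combinations

def project(constraint, order, target, rectangle):
    """F_c at coordinate `target`: project the satisfying assignments
    consistent with `rectangle` (dict: coordinate -> token) onto `target`."""
    out = set()
    for assignment in constraint:
        vals = dict(zip(order, assignment))
        if all(vals[w] in rectangle[w] for w in rectangle):
            out.add(vals[target])
    return frozenset(out)

tokens = [frozenset(s) for r in (1, 2) for s in combinations((0, 1), r)]
FULL = frozenset({0, 1})

# 3-SAT clause x1 OR x2 OR x3: all binary triples except (0, 0, 0).
order = ("x1", "x2", "x3")
clause = [a for a in product((0, 1), repeat=3) if a != (0, 0, 0)]

ok = True
for v in order:
    others = [u for u in order if u != v]
    for rect in product(tokens, repeat=len(others)):
        r = dict(zip(others, rect))
        t_v = project(clause, order, v, r)   # the forceable token F_c(r)
        if not t_v:
            continue
        for u in others:
            # the rectangle t_v x t' restricted to V(c) \ {u}
            r2 = {v: t_v, **{w: r[w] for w in others if w != u}}
            if project(clause, order, u, r2) != FULL:
                ok = False
print(ok)  # True: the clause alone guarantees local compatibility
```

Since the image is always the full token regardless of the surrounding graph, the containment holds for any neighborhood, which is exactly why local compatibility comes for free in $k$-SAT.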
For 3-COL problems, as noted earlier, $\mathcal{F}_c(v) = \{123, 12, 23, 13\}$. Suppose that $u$ is the only other coordinate (besides $v$) involved in constraint $\Gamma_c$. If we pick $t_v$ to be any token from $\mathcal{F}_c(v)$, then $F^u_c(t_v) = 123$. This again makes $A^{\sim}_c(u) \subseteq F^u_c(t_v)$ always satisfied, independent of the factor-graph structure of the problem instance.

That is, in both $k$-SAT and 3-COL problems, the structure of each local constraint alone guarantees that the local compatibility condition is satisfied by every constraint, irrespective of how a constraint interacts with other constraints (those at distance 2), as is generally required by the local compatibility condition. We generalize this fact in the following corollary, which follows immediately from Theorem 5 and provides a sufficient condition for SDBP to reduce to PTP without relying on the interaction of neighboring constraints. For CSPs constructed from a generic local constraint via a random factor-graph structure, the corollary may turn out to be useful.

Corollary 1: Let both the MRF of the CSP (specified via (69), (70) and (71)) and the weighted PTP for the CSP be parametrized by the same $\{\omega_v : v \in V\}$. Suppose that every constraint $\Gamma_c$ is such that for any $v \in V(c)$, any forceable token $t_v \in \mathcal{F}_c(v)$, any rectangle $t' \in F_c^{-1}(t_v)$ on $V(c) \setminus \{v\}$, and any $u \in V(c) \setminus \{v\}$, it holds that $F_c\big(t_v \times t'_{:V(c)\setminus\{u,v\}}\big) = (\chi^*)^{\{u\}}$. Then the SDBP derived from the MRF is equivalent to the weighted PTP, where the correspondence is $\rho^{\mathrm{norm(PTP)}}_{c \to v} \leftrightarrow \rho^{*\mathrm{(SDBP)}}_{c \to v}$.

For completeness, we conclude this section by constructing an example of a CSP in which the local compatibility condition is not satisfied by every constraint.

[Fig. 5. A portion of a factor graph $G$: variable $x_v$ connects to constraint $\Gamma_c$, which connects to $x_u$, which connects to constraint $\Gamma_b$, which connects to $x_w$.]
Suppose that $\Gamma_c$ and $\Gamma_b$ are two of the constraints defining a CSP, and the factor graph representing the CSP locally obeys the structure shown in Figure 5. Suppose that each variable of the CSP has alphabet $\chi = \{0, 1, 2\}$, that $\Gamma_c := \{(0_v, 0_u), (0_v, 1_u), (1_v, 2_u), (2_v, 2_u)\}$, and that $\Gamma_b := \{(0_u, 0_w), (1_u, 1_w), (2_u, 1_w)\}$. Note that $\mathcal{F}_c(v) = \{0_v, 12_v, 012_v\}$, and it is easy to verify that $A^{\sim}_c(u) = A_b(u) = F_b(012_w) = 012_u$. Now if we pick $t_v = 0_v$, then we have $A^{\sim}_c(u) \not\subseteq F_c(t_v) = 01_u$. Thus constraint $\Gamma_c$ is not locally compatible, and by Theorem 5, PTP or weighted PTP cannot be reduced from SDBP for this CSP. With this example, we see that it is not always the case that SDBP is SP.

VII. CONCLUDING REMARKS

In this paper, we study the question of whether SP algorithms (non-weighted and weighted) are special cases of BP for general constraint satisfaction problems. The first contribution of this paper is a simple formulation of SP algorithms for general CSPs as the weighted PTP algorithm. An advantage of this formulation is that it has a probabilistically interpretable update rule, which allows SP algorithms to be developed for arbitrary CSPs.

The second and main contribution of this paper is the answer to the titular question in the most general context. We show that in general, SP algorithms cannot be reduced from the BP algorithm derived from the MRF formalism in the style of [15] and [17]. Such a reduction is only possible in certain special cases, where the state-decoupling condition and the local compatibility condition are both satisfied. It is worth noting that our answer to whether SP is BP is restricted to the MRF formalism in the style of [15] or [17].
Although this restriction is not completely satisfactory, it appears to us that such an MRF formalism is the most natural, in light of the natural correspondence between the states in the MRF and the SP messages (namely, that left states correspond to the "intentions" of variables and right states correspond to the "commands" of the constraints). An additional, and perhaps even stronger, justification of this MRF is its combinatorial descriptive power, as elaborated in [15] for $k$-SAT problems, which, using the terminology of this paper, captures the connectivity of the solution space within the space of all "rectangles". In fact, we conjecture that further investigation of this perspective may provide useful insights into algorithm design for solving hard instances of CSPs, whether or not SP or BP is considered as the choice of algorithm.^8

Further, we note that the BP algorithm has been understood as a special case of Generalized Belief Propagation (GBP) [20]. In that perspective, BP may be derived from iterative minimization of the Bethe approximation of the free energy [20]. The framework of GBP allows a variety of ways (unified under the notion of "region graphs") to approximate the free energy, thereby leading to a much richer family of BP-like algorithms. Given the results of this paper, one may not want to exclude the possibility that a certain choice of free-energy approximation allows the corresponding GBP to reduce to SP algorithms for general CSPs. Research along that direction may still be of interest.

As a final remark, however, the authors of this paper would like to raise a philosophical question, in light of the simplicity of the (weighted) PTP formulation of SP and, in contrast, the complexity involved in reducing BP to SP: should we attempt to seek a complicated explanation for a simple algorithm?
Does the simplicity of SP (understood in terms of weighted PTP) imply a more natural, simpler, but very different underlying graphical model, beyond the MRF, that may better explain SP?

^8 In [15], under the MRF formalism, a Gibbs-sampling-based approach has also been presented as an algorithm for solving random $k$-SAT problems.

APPENDIX

We now present some results concerning the dynamics of SP, based on the formulations of PTP and weighted PTP. These results, although rather elementary, should help provide intuition regarding what PTP is doing in solving a CSP. We will start with the deterministic precursor of PTP, namely DTP.

A. On the Dynamics of DTP

We will refer to a subgraph $H$ of factor graph $G$ as a factor-subgraph of $G$ if for every constraint vertex $\Gamma_c$ in $H$, all neighboring variable vertices of $\Gamma_c$ in $G$ are also in $H$. It is apparent that a factor-subgraph $H$ is a factor graph representing a CSP involving precisely a subset of the constraints in $G$. We will denote by $C[H]$ the index set of all constraint vertices in $H$, by $V[H]$ the index set of all variable vertices in $H$, and by $\Gamma_H$ the set of all assignments on $V[H]$ that satisfy every constraint $\Gamma_c$, $c \in C[H]$.

If a factor-subgraph $H$ is a tree, it is also referred to as a factor tree of $G$. For any factor tree $T$ of $G$, we will denote by $L[T]$ the index set of all leaf vertices of $T$. Since we have assumed that factor graph $G$ contains no degree-1 constraint vertices, the leaf vertices of any factor tree $T$ of $G$ are necessarily all variable vertices, i.e., $L[T]$ contains no index of any constraint vertex.

Suppose that $T$ is a factor tree of factor graph $G$, $U \subset V[T]$, and $v \in V[T] \setminus U$. For any rectangle $t_U$ on $U$, define
$$F^{U \to v}_T(t_U) := \Big(t_U \times (\chi^*)^{V[T] \setminus U} \cap \Gamma_T\Big)_{:\{v\}}.$$
It is easy to see that the function $F^{U \to v}_T(\cdot)$ reduces to $F^v_c(\cdot)$ introduced earlier when $T$ contains a single factor and $U$ is $V(c) \setminus \{v\}$.

Given a factor tree $T$ of $G$ and two vertices in $T$ indexed by $a$ and $b$ respectively, we introduce another message-index notation, $a \xrightarrow{T} b$, which indexes the message sent by the vertex with index $a$ along its only edge that lies on the path (in $T$) leading to the vertex with index $b$. For example, suppose that in factor tree $T$, constraint vertex $\Gamma_c$ is a neighbor of $x_u$ and is on the path from $x_u$ to $x_v$ in $T$; then the message index $u \xrightarrow{T} v$ is equivalent to $u \to c$, and $t_{u \xrightarrow{T} v}$ is equivalent to $t_{u \to c}$.

A factor tree $T$ of $G$ will be referred to as a $(v, l)$-tree of $G$ if the variable vertex $x_v$ is in $T$, every leaf vertex in $T$ is at distance $2l$ from vertex $x_v$, and all vertices in $G$ whose distance to $x_v$ is no larger than $2l$ are contained in $T$. It is clear that given $G$, $v \in V$ and a positive integer $l$, if a $(v, l)$-tree of $G$ exists, it is unique; we therefore denote it by $T^l_v$.

Given $T^l_v$ of factor graph $G$, factor tree $T^l_{v-c}$ of $G$ is the subgraph of $T^l_v$ induced by vertex $x_v$ and all vertices of $T^l_v$ whose paths to $x_v$ (in $T^l_v$) traverse vertex $\Gamma_c$. On the other hand, factor tree $T^l_{v \not- c}$ is the subgraph of $T^l_v$ induced by vertex $x_v$ and all vertices of $T^l_v$ whose paths to $x_v$ (in $T^l_v$) do not traverse vertex $\Gamma_c$.

In what follows, we will use the superscript $(l)$ on a message to refer to the message in the $l$-th iteration.

Proposition 2: Suppose that $l \ge 1$ and that the factor tree $T^l_v$ of factor graph $G$ exists. Then in iteration $l$ of DTP,
$$t^{(l)}_{c \to v} = F^{L[T^l_{v-c}] \to v}_{T^l_{v-c}}\Big(\prod_{u \in L[T^l_{v-c}]} t^{(1)}_{u \xrightarrow{T^l_{v-c}} v}\Big).$$

Proof: We prove this result by induction on $l$.
For the base case, we have
$$t^{(1)}_{c \to v} = F^v_c\Big(\prod_{u \in V(c) \setminus \{v\}} t^{(1)}_{u \to c}\Big) = F^{L[T^1_{v-c}] \to v}_{T^1_{v-c}}\Big(\prod_{u \in L[T^1_{v-c}]} t^{(1)}_{u \xrightarrow{T^1_{v-c}} v}\Big).$$
As the inductive hypothesis, suppose that the result of this proposition holds for a given iteration number $l \ge 1$. This implies specifically that for every $u \in V(c) \setminus \{v\}$ and every $b \in C(u) \setminus \{c\}$,
$$t^{(l)}_{b \to u} = F^{L[T^l_{u-b}] \to u}_{T^l_{u-b}}\Big(\prod_{w \in L[T^l_{u-b}]} t^{(1)}_{w \xrightarrow{T^l_{u-b}} u}\Big).$$
Then
$$t^{(l+1)}_{u \to c} = \bigcap_{b \in C(u) \setminus \{c\}} t^{(l)}_{b \to u} = \bigcap_{b \in C(u) \setminus \{c\}} F^{L[T^l_{u-b}] \to u}_{T^l_{u-b}}\Big(\prod_{w \in L[T^l_{u-b}]} t^{(1)}_{w \xrightarrow{T^l_{u-b}} u}\Big) = F^{L[T^l_{u \not- c}] \to u}_{T^l_{u \not- c}}\Big(\prod_{w \in L[T^l_{u \not- c}]} t^{(1)}_{w \xrightarrow{T^l_{u \not- c}} u}\Big).$$
Finally,
$$t^{(l+1)}_{c \to v} = F^v_c\Big(\prod_{u \in V(c) \setminus \{v\}} t^{(l+1)}_{u \to c}\Big) = F^v_c\bigg(\prod_{u \in V(c) \setminus \{v\}} F^{L[T^l_{u \not- c}] \to u}_{T^l_{u \not- c}}\Big(\prod_{w \in L[T^l_{u \not- c}]} t^{(1)}_{w \xrightarrow{T^l_{u \not- c}} u}\Big)\bigg) = F^{L[T^{l+1}_{v-c}] \to v}_{T^{l+1}_{v-c}}\Big(\prod_{w \in L[T^{l+1}_{v-c}]} t^{(1)}_{w \xrightarrow{T^{l+1}_{v-c}} v}\Big).$$
This completes the proof.

Translating this result to summary tokens, the following result is obtained immediately.

Corollary 2: Suppose that $l \ge 1$ and that the factor tree $T^l_v$ of factor graph $G$ exists. Then in iteration $l$ of DTP,
$$t^{(l)}_v = F^{L[T^l_v] \to v}_{T^l_v}\Big(\prod_{u \in L[T^l_v]} t^{(1)}_{u \xrightarrow{T^l_v} v}\Big).$$

The implication of this result is that on a factor graph with sufficiently large girth, DTP is in fact very well-behaved: the summary token at any variable $x_v$ in iteration $l$ depends precisely on the initial tokens passed by the variables that are at distance $2l$ from $x_v$. Specifically, one may view those tokens as forming a rectangle on $L[T^l_v]$; the summary token at $x_v$ in iteration $l$ is then precisely the set of all assignments on $\{v\}$ that can make $\Gamma_{T^l_v}$ satisfied, given that the assignment on $L[T^l_v]$ is from that rectangle.

Next, we develop some results on DTP that require no "local cycle-freeness" of the factor graph.
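As a concrete illustration of the DTP updates in exactly this cycle-containing setting (and a preview of Lemma 15 below), the following sketch, our own toy construction, runs DTP on a two-variable CSP whose two constraints form a cycle, so no local cycle-freeness holds. At each iteration it checks that the summary token at each variable equals the intersection of that variable's outgoing left messages in the next iteration, and the summary rectangle shrinks onto the unique solution $(0, 0)$.

```python
ALPH = frozenset({0, 1, 2})

# A tiny cyclic CSP: two variables v, u share two binary constraints,
# an equality constraint "c" and a constraint "b"; the unique
# satisfying assignment is (x_v, x_u) = (0, 0).
order = ("v", "u")
constraints = {
    "c": {(0, 0), (1, 1), (2, 2)},
    "b": {(0, 0), (0, 1), (1, 2)},
}
neighbors = {"v": ["c", "b"], "u": ["c", "b"]}
other = {"v": "u", "u": "v"}

def F(c_name, target, token_other):
    """Right-message update: project the satisfying pairs of the constraint
    whose other coordinate lies in `token_other` onto `target`."""
    i = order.index(target)
    return frozenset(a[i] for a in constraints[c_name] if a[1 - i] in token_other)

# Left messages start as the all-permissive (full-alphabet) tokens.
left = {(v, c): ALPH for v in order for c in neighbors[v]}
for _ in range(4):
    right = {(c, v): F(c, v, left[(other[v], c)])
             for v in order for c in neighbors[v]}
    summary = {v: right[("c", v)] & right[("b", v)] for v in order}
    new_left = {(v, c): frozenset.intersection(
                    *[right[(b, v)] for b in neighbors[v] if b != c])
                for v in order for c in neighbors[v]}
    # summary token = intersection of next iteration's outgoing left messages
    for v in order:
        assert summary[v] == new_left[(v, "c")] & new_left[(v, "b")]
    left = new_left

print(summary)  # {'v': frozenset({0}), 'u': frozenset({0})}
```

After four iterations the summary rectangle has contracted to the singleton $\{0\} \times \{0\}$, consistent with Proposition 3 below: the satisfying assignment, contained in the initial all-permissive rectangle, is never lost.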
Lemma 15: At every $v \in V$ and for any $l$,
$$t^{(l)}_v = \bigcap_{c \in C(v)} t^{(l+1)}_{v \to c}.$$

Proof: Suppose that $x_v \in t^{(l)}_v$. Then $x_v \in t^{(l)}_{c \to v}$ for every $c \in C(v)$, by the definition of summary messages. It follows that $x_v \in t^{(l+1)}_{v \to c}$ for every $c \in C(v)$, and hence $x_v \in \bigcap_{c \in C(v)} t^{(l+1)}_{v \to c}$. This shows that $t^{(l)}_v \subseteq \bigcap_{c \in C(v)} t^{(l+1)}_{v \to c}$.

On the other hand, suppose that $x_v \in \bigcap_{c \in C(v)} t^{(l+1)}_{v \to c}$. Then $x_v \in t^{(l+1)}_{v \to c} = \bigcap_{b \in C(v) \setminus \{c\}} t^{(l)}_{b \to v}$ for every $c \in C(v)$. It follows that $x_v \in t^{(l)}_{b \to v}$ for every $b \in C(v)$, giving rise to $x_v \in \bigcap_{b \in C(v)} t^{(l)}_{b \to v} = t^{(l)}_v$. Thus $\bigcap_{c \in C(v)} t^{(l+1)}_{v \to c} \subseteq t^{(l)}_v$. Therefore $t^{(l)}_v = \bigcap_{c \in C(v)} t^{(l+1)}_{v \to c}$.

Lemma 16: Suppose that $\hat{x}_V$ is a satisfying assignment on $V$, namely that $\hat{x}_V$ satisfies (1). If $\hat{x}_V \in \prod_{v \in V} \bigcap_{c \in C(v)} t^{(l)}_{v \to c}$ in some iteration $l$, then $\hat{x}_V \in \prod_{v \in V} t^{(l)}_v$.

Proof: The fact that $\hat{x}_V \in \prod_{v \in V} \bigcap_{c \in C(v)} t^{(l)}_{v \to c}$ implies that for every $v \in V$ and $c \in C(v)$, $\hat{x}_{V:\{v\}} \in \bigcap_{c \in C(v)} t^{(l)}_{v \to c} \subseteq t^{(l)}_{v \to c}$, and hence, via the "monotonicity" of the function $F_c$,
$$F_c\big(\{\hat{x}_{V:V(c) \setminus \{v\}}\}\big) \subseteq F_c\Big(\prod_{u \in V(c) \setminus \{v\}} t^{(l)}_{u \to c}\Big) = t^{(l)}_{c \to v}.$$
Incorporating the fact that $\hat{x}_V$ is a satisfying assignment, we see that $\hat{x}_{V:\{v\}} \in F_c\big(\{\hat{x}_{V:V(c) \setminus \{v\}}\}\big) \subseteq t^{(l)}_{c \to v}$ for every $v \in V$ and $c \in C(v)$. Thus $\hat{x}_{V:\{v\}} \in \bigcap_{c \in C(v)} t^{(l)}_{c \to v} = t^{(l)}_v$. It then follows that $\hat{x}_V \in \prod_{v \in V} t^{(l)}_v$.

Proposition 3: Suppose that $\hat{x}_V$ is a satisfying assignment and that the initialization of DTP is such that $\hat{x}_{V:\{v\}} \in t^{(1)}_{v \to c}$ for every $v \in V$ and $c \in C(v)$. Then in any iteration $l$, the rectangle $\prod_{v \in V} t^{(l)}_v$ formed by the summary tokens contains $\hat{x}_V$.
Proof: At iteration 1, the fact that $\hat{x}_{V:\{v\}} \in t^{(1)}_{v \to c}$ for every $v \in V$ and $c \in C(v)$ implies that $\hat{x}_V \in \prod_{v \in V} \bigcap_{c \in C(v)} t^{(1)}_{v \to c}$. By Lemma 16, we then have $\hat{x}_V \in \prod_{v \in V} t^{(1)}_v$. As the inductive hypothesis, suppose that $\hat{x}_V \in \prod_{v \in V} t^{(l)}_v$ at iteration $l$. At iteration $l+1$, by Lemma 15, we have $\hat{x}_V \in \prod_{v \in V} \bigcap_{c \in C(v)} t^{(l+1)}_{v \to c}$. Then by Lemma 16, $\hat{x}_V \in \prod_{v \in V} t^{(l+1)}_v$. The proposition thus follows by induction.

At this point, we have shown that if DTP is initialized so as to "contain" a satisfying assignment, then this assignment is contained in the rectangle formed by the summary tokens in all iterations. That is, the solution of the CSP will never get "lost" during the DTP iterations, provided that it is contained in the initial rectangle. This result (Proposition 3) and Corollary 2 presented earlier will become useful when we discuss the dynamics of PTP.

B. On the Dynamics of PTP and Weighted PTP

We now turn our attention to (non-weighted) PTP. Denote by $G^l_v$ the factor-subgraph of $G$ that contains all factors whose messages have propagated to variable $x_v$ by the end of PTP iteration $l$. That is, $G^l_v$ is the factor-subgraph of $G$ that contains variable vertex $x_v$ and all vertices whose distances to $x_v$ are no larger than $2l$. It is apparent that if $G^l_v$ is a tree, then it is the $(v, l)$-tree $T^l_v$. Let $l^*$ be the smallest $l$ such that for at least one $v \in V$, $T^l_v$ does not exist. Denote
$$m_v(l) := \big|\Gamma_{G^l_v} : \{v\}\big|.$$
That is, $m_v(l)$ is the number of assignments of variable $x_v$ that can make all constraints in $G^l_v$ satisfied. Clearly, $m_v(l)$ is a non-increasing function of $l$.

We will first restrict the CSP to a "single-solution CSP", i.e., one having exactly one satisfying assignment, which we denote by $\hat{x}_V$. Let $\hat{l}$ be the smallest $l$ for which $\min_v m_v(l) = 1$.
It is worth noting that such an $\hat{l}$ exists, since the CSP has precisely one solution. Let $\hat{v}$ satisfy $m_{\hat{v}}(\hat{l}) = 1$.

Proposition 4: Let factor graph $G$ represent a single-solution CSP. Suppose that the initialization of PTP is such that every left message $\lambda^{(1)}_{v \to c}(t)$ is strictly positive for every $t \in (\chi^*)^{\{v\}}$. If $\hat{l} < l^*$, then
$$\mu^{\mathrm{norm}(\hat{l})}_{\hat{v}}(t) = \big[t = \{\hat{x}_{V:\{\hat{v}\}}\}\big].$$

Proof: This result relies on Corollary 2. First, $\hat{l} < l^*$ implies that the $(\hat{v}, \hat{l})$-tree $T^{\hat{l}}_{\hat{v}}$ exists. Then, by Corollary 2, if DTP is initialized such that the tokens sent from the leaves of $T^{\hat{l}}_{\hat{v}}$ form $\prod_{u \in L[T^{\hat{l}}_{\hat{v}}]} t^{(1)}_{u \xrightarrow{T^{\hat{l}}_{\hat{v}}} \hat{v}}$, then the summary token at $\hat{v}$ in the $\hat{l}$-th iteration is $F^{L[T^{\hat{l}}_{\hat{v}}] \to \hat{v}}_{T^{\hat{l}}_{\hat{v}}}\big(\prod_{u \in L[T^{\hat{l}}_{\hat{v}}]} t^{(1)}_{u \xrightarrow{T^{\hat{l}}_{\hat{v}}} \hat{v}}\big)$. Since $\hat{v}$ satisfies $m_{\hat{v}}(\hat{l}) = 1$, this summary token is necessarily either the token $\{\hat{x}_{V:\{\hat{v}\}}\}$ or $\emptyset$, depending on the initialized rectangle.

Now PTP on $T^{\hat{l}}_{\hat{v}}$, with respect to $x_{\hat{v}}$, may be understood as initializing a random rectangle on $L[T^{\hat{l}}_{\hat{v}}]$ (the distribution of which is characterized by the product of the initial messages), transforming the random rectangle into a random token on $\hat{v}$ via the functional mapping $F^{L[T^{\hat{l}}_{\hat{v}}] \to \hat{v}}_{T^{\hat{l}}_{\hat{v}}}(\cdot)$, and conditioning on the resulting token being valid (a non-empty set). The fact that the initial messages of PTP are strictly positive ensures that every rectangle on $L[T^{\hat{l}}_{\hat{v}}]$ has non-zero probability at initialization. After conditioning on the resulting token being valid, the token $\emptyset$ is removed from the allowed realizations of the resulting token, and thus the resulting token equals $\{\hat{x}_{V:\{\hat{v}\}}\}$ with probability 1. This completes the proof.
This result and its proof can easily be extended to a somewhat larger family of CSPs, each containing multiple solutions, as shown in the next proposition.

Proposition 5: Suppose that in the CSP there exist a coordinate $\hat{v} \in V$ and an assignment $\hat{x}_{\hat{v}} \in \chi^{\{\hat{v}\}}$ such that every satisfying configuration $\tilde{x}_V \in \Gamma$ satisfies $\tilde{x}_{V:\{\hat{v}\}} = \hat{x}_{\hat{v}}$. If for some integer $\hat{l}$, $T^{\hat{l}}_{\hat{v}}$ exists and $m_{\hat{v}}(\hat{l}) = 1$, then $\mu^{\mathrm{norm}(\hat{l})}_{\hat{v}}(t) = [t = \{\hat{x}_{\hat{v}}\}]$.

The proof is similar to that of Proposition 4; it essentially relies on Corollary 2 and on the local tree rooted at $\hat{v}$ being large enough. Skipping the proof, we note that Proposition 4 may be viewed as a special case of Proposition 5.

Based on the results above, we provide some remarks concerning the dynamics of PTP and argue intuitively how it solves a CSP.

1) Similar to what was argued in the proof of Proposition 4, the key insight regarding what PTP is doing is that PTP updates a random rectangle whose sides are distributed independently. At the initialization stage, PTP defines a random rectangle on $V$, where the sides of the random rectangle are treated as independent random variables. In every iteration, PTP maps this random rectangle to a new random rectangle in the following steps.

a) Apply a functional mapping defined by the right-message update rule and the left-message update rule.

b) Eliminate the resulting empty rectangles (via conditioning on each side of the resulting random rectangle not being the empty set, and re-normalizing).

c) Take the marginal distribution of the resulting random rectangle on each side variable, and treat all sides as independent random variables. This defines a new random rectangle.

PTP iterates over these steps to continuously update the random rectangle.
2) For single-solution CSPs, by Proposition 4, if the girth of the graph is large enough, at least one side of the new rectangle becomes deterministic after some iterations, namely the singleton set containing the correct assignment for that variable. This allows the decimation procedure to fix this variable to the correct assignment and reduce the problem. Similar results hold for CSPs having more than one solution but in which all solutions share a single assignment on some coordinate. By Proposition 5, in this case, when the local tree rooted at that variable is sufficiently large, PTP will find that variable and its correct assignment.

Of course, the conditions of Propositions 4 and 5, namely that there is a sufficiently large local tree rooted at a variable and that the variable has only one correct assignment, may not hold in reality. As a consequence, no side of the random rectangle is deterministically a singleton. In that case, the decimation procedure must deal with this ambiguity, which results from the non-ideal factor graph structure and the complexity of the solution space, and make a good guess to fix a variable.

3) Propositions 4 and 2 also suggest that when the graph has large girth (and when the solutions share one common assignment on some coordinate), as PTP iterates, the rectangles containing no solutions are gradually removed from the sample space of the random rectangle.

4) Proposition 3 implies that regardless of the cycle structure of the graph, all solution-containing rectangles are kept (possibly in a form combining each other) over the PTP iterations.

5) Combining 3) and 4) above, one may view each PTP iteration as performing a "filtering" operation on the distribution of the random rectangle.
As the distribution of the random rectangle evolves, the probability mass gradually becomes biased towards some solution-containing rectangles. When the graph has large girth and some coordinate is in a "favorable" position (in a sense combining its location in the graph and its role in the solution space), the summary message at this coordinate may become more deterministically biased towards a singleton token, making decimation possible.

Finally, we briefly remark on weighted PTP. Similar to PTP, weighted PTP also updates a random rectangle. However, instead of using a functional mapping in step a) of the above procedure, it uses a conditional distribution. By examining the form of the obedience conditionals, it is intuitive that, compared with PTP, weighted PTP shifts the distribution of each side of the random rectangle more towards "smaller" tokens on each coordinate. (Here $t_v$ is said to be smaller than $t'_v$ if $t_v \subset t'_v$.) This gives the algorithm a better opportunity to drive some side of the random rectangle more deterministically towards a singleton.

REFERENCES

[1] M. Mézard, G. Parisi, and R. Zecchina, "Analytic and algorithmic solution of random satisfiability problems," Science, vol. 297, pp. 812–815, 2002.
[2] S. A. Cook, "The complexity of theorem-proving procedures," in Proc. 3rd Annual ACM Symposium on Theory of Computing. Association for Computing Machinery, 1971, pp. 151–158.
[3] A. Braunstein, R. Mulet, A. Pagnani, M. Weigt, and R. Zecchina, "Polynomial iterative algorithms for coloring and analyzing random graphs," Physical Review E, vol. 68, no. 3, p. 036702, 2003.
[4] W. Yu and M. Aleksic, "Coding for the Blackwell channel: a survey propagation approach," in Proc. IEEE Int. Symp. Inform. Theory, Adelaide, Australia, 2005, pp. 1583–1587.
[5] M. J. Wainwright and E.
Maneva, "Lossy source encoding via message-passing and decimation over generalized codewords of LDGM codes," in Proc. IEEE Int. Symp. Inform. Theory, Adelaide, Australia, 2005, pp. 1493–1497.
[6] T. Richardson and R. Urbanke, "The capacity of low-density parity-check codes under message-passing decoding," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 599–618, Feb. 2001.
[7] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 1st ed. San Mateo, CA: Morgan Kaufmann, 1988.
[8] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.
[9] G. D. Forney, Jr., "The Viterbi algorithm," Proc. IEEE, vol. 61, no. 3, pp. 268–278, Mar. 1973.
[10] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: turbo codes," IEEE Trans. Comm., vol. 44, no. 10, pp. 1261–1271, Oct. 1996.
[11] R. J. McEliece, D. J. C. MacKay, and J. Cheng, "Turbo decoding as an instance of Pearl's 'belief propagation' algorithm," IEEE JSAC, vol. 16, no. 2, pp. 140–152, 1998.
[12] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inform. Theory, vol. 20, no. 2, pp. 284–287, Mar. 1974.
[13] S. M. Aji and R. J. McEliece, "The generalized distributive law," IEEE Trans. Inform. Theory, vol. 46, no. 2, pp. 325–343, Mar. 2000.
[14] A. Braunstein and R. Zecchina, "Survey propagation as local equilibrium equations," J. Stat. Mech., June 2004.
[15] E. Maneva, E. Mossel, and M. J. Wainwright, "A new look at survey propagation and its generalizations," in SODA, 2005, pp. 1089–1098.
[16] R. Kindermann and J. L. Snell, Markov Random Fields and Their Applications. Providence, RI: American Mathematical Society, 1980.
[17] R. Tu, Y. Mao, and J.
Zhao, "On generalized survey propagation: normal realization and sum-product interpretation," in Proc. IEEE Int. Symp. Inform. Theory, 2006.
[18] G. D. Forney, Jr., "Codes on graphs: normal realizations," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 520–548, Feb. 2001.
[19] A. Braunstein, M. Mézard, M. Weigt, and R. Zecchina, Constraint satisfaction by survey propagation, Sep. 2003, http://arxiv.org/abs/cond-mat/0212451.
[20] J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Constructing free-energy approximations and generalized belief propagation algorithms," IEEE Trans. Inform. Theory, vol. 51, no. 7, pp. 2282–2312, July 2005.