Equivalence of LP Relaxation and Max-Product for Weighted Matching in General Graphs

Sujay Sanghavi
LIDS, MIT
sanghavi@mit.edu

Abstract: Max-product belief propagation is a local, iterative algorithm to find the mode/MAP estimate of a probability distribution. While it has been successfully employed in a wide variety of applications, there are relatively few theoretical guarantees of convergence and correctness for general loopy graphs that may have many short cycles. Of these, even fewer provide exact "necessary and sufficient" characterizations.

In this paper we investigate the problem of using max-product to find the maximum weight matching in an arbitrary graph with edge weights. This is done by first constructing a probability distribution whose mode corresponds to the optimal matching, and then running max-product. Weighted matching can also be posed as an integer program, for which there is an LP relaxation. This relaxation is not always tight. In this paper we show that
1) If the LP relaxation is tight, then max-product always converges, and it converges to the correct answer.
2) If the LP relaxation is loose, then max-product does not converge.
This provides an exact, data-dependent characterization of max-product performance, and a precise connection to LP relaxation, which is a well-studied optimization technique. Also, since LP relaxation is known to be tight for bipartite graphs, our results generalize other recent results on using max-product to find weighted matchings in bipartite graphs.

I. INTRODUCTION

Message-passing algorithms, like Belief Propagation and its variants and generalizations, have been shown empirically to be very effective in solving many instances of hard/computationally intensive problems in a wide range of fields. These algorithms were originally designed for exact inference (i.e.
calculation of marginals/max-marginals) in tree-structured probability distributions. Their application to general graphs involves replicating their iterative local update rules on the general graph. In this case however, there are no guarantees of either convergence or correctness in general.

Understanding and characterizing the performance of message-passing algorithms in general graphs remains an active research area. [1, 2] show correctness for graphs with at most one cycle. [3, 4] show that for Gaussian problems the sum-product algorithm finds the correct means upon convergence, but does not always find the correct variances. [5, 6] show asymptotic correctness for random graphs associated with decoding. [7] shows that if max-product converges, then it is optimal in a relatively large "local" neighborhood.

In this paper we consider the problem of using max-product to find the maximum weight matching in an arbitrary graph with arbitrary edge weights. This problem can be formulated as an integer program, which has a natural LP relaxation. In this paper we prove the following:
1) If the LP relaxation is tight, then max-product always converges, and it converges to the correct answer.
2) If the LP relaxation is loose, then max-product does not converge.

Bayati, Shah and Sharma [8] were the first to investigate max-product for the weighted matching problem. They showed that if the graph is bipartite then max-product always converges to the correct answer. Recently, this result has been extended to b-matchings on bipartite graphs [9]. Since the LP relaxation is always tight for bipartite graphs, the first part of our results recovers their results and can be viewed as the correct generalization to arbitrary graphs, since in this case the tightness is a function of structure as well as weights.
We would like to point out three features of our work:
1) It provides a necessary and sufficient condition for convergence of max-product in arbitrary problem instances. There are very few non-trivial classes of problems for which there is such a tight characterization of message-passing performance.
2) The characterization is data dependent: it is decided based not only on the graph structure but also on the weights of the particular instance.
3) Tightness of LP relaxations is well-studied for broad classes of problems, making this characterization promising in terms of both understanding and development of new algorithms.

Relations, similarities and comparisons between max-product and linear programming have been used/mentioned by several authors [10–12], and an exact characterization of this relationship in general remains an interesting endeavor. In particular, it would be interesting to investigate the implications of these results as regards elucidating the relationship between iterative decoding of channel codes and LP decoding [13].

II. WEIGHTED MATCHING AND ITS LP RELAXATION

A matching in a graph is a set of edges such that no two edges in the set are incident on the same node. Given a graph $G = (V, E)$, with non-negative weights $w_e$ on the edges $e \in E$, the weighted matching problem is to find the matching $M^*$ whose edges have the highest total weight. In this paper we find it convenient to refer to edges both as $e \in E$ and as $(i, j)$, where $i, j \in V$.

Weighted matching can be written as the following integer program (IP):

$$\max \sum_{e \in E} w_e x_e \quad \text{s.t.} \quad \sum_{j \in \mathcal{N}(i)} x_{ij} \le 1 \ \text{ for all } i \in V, \qquad x_e \in \{0, 1\} \ \text{ for all } e \in E \tag{1}$$

The LP relaxation of the above problem is to replace the constraint $x_e \in \{0, 1\}$ with the constraint $x_e \ge 0$. This relaxation is in general not tight, i.e.
there might exist non-integer solutions with strictly higher value than any integral solution. It is known however that the LP relaxation is always tight for bipartite graphs: no matter what the edge weights, the bipartiteness ensures tightness of the LP relaxation. If a graph is not bipartite, the tightness of the LP relaxation will depend on the edge weights: the same graph may have tightness for one set of weights and looseness for another set.

The dual of the above linear program is the vertex cover problem: minimize the total of the weights $z_i$ that need to be placed on nodes so as to "cover" the edge weights:

$$\text{(DP)} \quad \min \sum_{i \in V} z_i \quad \text{s.t.} \quad w_{ij} \le z_i + z_j \ \text{ for all } (i, j) \in E, \qquad z_i \ge 0 \ \text{ for all } i$$

Lemma 1 (complementary slackness): When the LP relaxation is tight, the optimal matching $M^*$ and the optimal dual variables $z$ satisfy the following properties:
1) if $(i, j) \in M^*$ then $w_{ij} = z_i + z_j$
2) if $(i, j) \notin M^*$ then $w_{ij} \le z_i + z_j$
3) if no edge in $M^*$ is incident on node $i$, then $z_i = 0$
4) $z_i \le \max_e w_e$ for all $i$

III. BACKGROUND ON THE MAX-PRODUCT ALGORITHM

The factor graph [14] of a probability distribution represents the conditional independencies of the distribution. The Max-Product (MP) algorithm is a simple, local, iterative message passing algorithm that can be used (in an attempt) to find the mode/MAP estimate of a probability distribution. Nodes and factors pass messages to each other, and nodes maintain "beliefs", which represent the max-marginals. When max-product is applied to problems involving general "loopy" graphs, one of the following three scenarios may result:
1) The algorithm may not converge.
2) The algorithm may converge, but to an incorrect answer.
3) The algorithm may converge to the correct answer.
As has been mentioned, there has been significant work attempting to understand the properties of MP for loopy graphs.
For the results in this paper, we will use the following two insights:
1) At any time, the belief of the max-product algorithm for a given variable corresponds to the belief at the root of the corresponding computation tree distribution [2] associated with that variable at that time. We describe what this computation tree distribution corresponds to for the weighted matching problem in the next section.
2) If max-product does converge, the resulting beliefs are optimal in a large "local" neighborhood [7]: let $\hat{x}$ be the assignment as given by the converged max-product and $\tilde{x}$ be any other assignment. If the variables assigned different values in $\hat{x}$ and $\tilde{x}$ form an induced graph containing at most one cycle in each component, then $p(\hat{x}) \ge p(\tilde{x})$.

IV. MAX-PRODUCT FOR WEIGHTED MATCHING

The problem of finding $M^*$ can be formulated as the problem of finding the mode of a suitably (artificially) constructed probability distribution $p$. In fact, there are in general several ways to construct this distribution for the same instance of a graph $G$. We now present one construction (see footnote 1). Associate a binary variable $x_e \in \{0, 1\}$ with each edge $e \in E$, and let

$$p(x) = \frac{1}{Z} \prod_{i \in V} \mathbf{1}_{\{\sum_{j \in \mathcal{N}(i)} x_{ij} \le 1\}} \prod_{e \in E} e^{w_e x_e} \tag{2}$$

Here $\mathcal{N}(i)$ represents the neighborhood of node $i$ in $G$, and $Z$ is a normalizing constant. The variable $x_e$ can be interpreted as follows: $x_e = 1$ indicates that $e \in M^*$, while $x_e = 0$ indicates $e \notin M^*$. The term $\mathbf{1}_{\{\sum_{j \in \mathcal{N}(i)} x_{ij} \le 1\}}$ enforces the constraint that of the edges incident to node $i$, at most one can be assigned the value "1". Thus, it is easy to see that $p(x) > 0$ if and only if the edges with $x_e = 1$ constitute a matching in $G$. Furthermore, the mode of $p$ corresponds to the max-weight matching $M^*$.

The factor graph max-product involves messages between variables and factors.
In our case the variables are the edges $(i, j) \in E$, and the factors are nodes $i \in V$. Thus at any time $t$ there will be messages $m^t_{i \to (i,j)}$ from node (factor) $i$ to edge (variable) $(i, j)$, as well as messages $m^t_{(i,j) \to i}$. Each message will be a length-two vector of real numbers, indexed by 0 and 1. The message update rules can be simplified to the following:

$$m^{t+1}_{(i,j) \to i}[1] = e^{w_{ij}} \, m^t_{j \to (i,j)}[1]$$
$$m^{t+1}_{(i,j) \to i}[0] = m^t_{j \to (i,j)}[0]$$
$$m^{t+1}_{i \to (i,j)}[1] = \prod_{k \in \mathcal{N}(i) - j} m^t_{(k,i) \to i}[0]$$
$$m^{t+1}_{i \to (i,j)}[0] = \max \Big\{ \prod_{k \in \mathcal{N}(i) - j} m^t_{(k,i) \to i}[0], \ \max_{k \in \mathcal{N}(i) - j} m^t_{(k,i) \to i}[1] \Big\}$$

Also, at every time each edge (variable) maintains a belief vector $b^t_{(i,j)}$ as follows:

$$b^t_{(i,j)}[0] = m^t_{i \to (i,j)}[0] \times m^t_{j \to (i,j)}[0]$$
$$b^t_{(i,j)}[1] = e^{w_{ij}} \, m^t_{i \to (i,j)}[1] \times m^t_{j \to (i,j)}[1]$$

Footnote 1: This construction is different from the one in [8], which had a pairwise model with variables corresponding to nodes in the graph. However, the results of this paper continue to hold when the construction in [8] is modified to be applicable to general graphs.

The $p$ defined above can be used to find $M^*$ as follows: first run max-product. At any time $t$ and for each edge $e$ there will be two beliefs $b^t_e[0]$ and $b^t_e[1]$. If max-product converges, assign to each variable the value (i.e. "0" or "1") that corresponds to the stronger belief. Then, declare the set of all edges set to "1" to be the max-product output.

A. The Computation Tree for Weighted Matching

Our proofs rely on the computation tree interpretation [2, 15] of the max-product beliefs. We now describe this interpretation when max-product is applied to $p$ as given in (2). For an edge $e$ let $T_e(k)$ be the full depth-$k$ computation tree rooted at $e$.
This is generated recursively: take $T_e(k-1)$ and to each leaf $v$ add as children a copy of each of the neighbors of $v$ in $G$, except for the unique neighbor of $v$ which is already present in $T_e(k-1)$. Also, each new edge has the same weight as its copy in the original $G$. The recursion is started with the single-edge tree $T_e(1) = e$, both of whose endpoints are leaves. This initial edge is the root of $T_e$.

Consider now the "full synchronous" max-product, where at each time every message in the network is updated. In this case the computation tree for edge $e$ at time $k$ will be the full tree $T_e(k)$. Alternatively, max-product may be executed asynchronously with only a subset of the messages updated in every time slot. In this case the computation tree at time $k$ will be a sub-tree of the full tree $T_e(k)$. In either case, the computation tree interpretation states that at time $k$ we have $b^k_e[1] > b^k_e[0]$ if and only if the root of the computation tree is a member of a max-weight matching on that tree.

The figure below shows an example where on the left is $G$: the four-cycle $abcd$ and the chord $ac$, with a matching $M = \{(a, b), (c, d)\}$ depicted in bold. On the right is the computation tree $T_{(a,b)}(4)$, which is the full tree of depth 4 rooted at edge $(a, b)$. The bold edges depict the projection $M_T$ of $M$ onto $T_{(a,b)}(4)$: an edge $e$ in the tree is in $M_T$ if and only if its copy in $G$ is in $M$.

[Figure: the graph $G$ with matching $M$ in bold (left) and the computation tree $T_{(a,b)}(4)$ with the projection $M_T$ in bold (right).]

Lemma 2: Let $M$ be a matching in $G$ and $T_e(k)$ be a computation tree. Let $M_T$ be the set of all copies in $T_e(k)$ of all edges in $M$. Then, $M_T$ is a matching in $T_e(k)$. Also, if $M$ is maximal in $G$, $M_T$ is maximal in $T_e$.

Of course $T_e$ will also contain other matchings that are not projections of matchings in $G$. Finally, we say that a (possibly not full) tree $T_e(k)$ is full up to depth $k_1$ if the full tree $T_e(k_1)$ is contained in $T_e(k)$.

V. EQUIVALENCE OF MAX-PRODUCT AND LP RELAXATION

We are now ready to prove the main result of this paper: the equivalence of Max-Product and LP Relaxation. Before we proceed, we define the following terms:
1) We say that the LP relaxation is tight if the linear program (LP) obtained by relaxing the integer program (1) has a unique optimal solution at which all values $x_e$ are either 0 or 1.
2) We say that max-product converges by step $k$ if the variable assignments (0 or 1) that maximize the beliefs at each node remain constant once the associated computation tree is full up to depth at least $k$. Note that this includes both synchronous and asynchronous message updates. We say that max-product converges if there exists some $k < \infty$ such that max-product converges by step $k$. Finally, we say that max-product converges to the correct answer if the beliefs $b_e$ at convergence are such that $b_e[1] > b_e[0]$ if and only if $e \in M^*$, and $b_e[1] < b_e[0]$ if and only if $e \notin M^*$.

We also need to make some uniqueness assumptions. It is well-recognized that max-product may perform poorly in the presence of multiple optima, and that characterizing performance in this case is hard. For the rest of this paper we will assume the following:
A1 $M^*$ is the unique optimal matching.
A2 The linear program always has a unique optimal solution. Note that this can be fractional, but it has to be unique.

A. Max-product is as Powerful as LP Relaxation

In this section we prove that if the LP relaxation is tight then max-product converges to the correct answer. Recall that when the LP is tight, part 2 of Lemma 1 says that if $(i, j) \notin M^*$ then $w_{ij} \le z_i + z_j$. The uniqueness assumptions A1-2 further imply that the inequality is strict: $w_{ij} < z_i + z_j$.
Another way of saying this is that there exists an $\epsilon > 0$ such that

$$w_{ij} \le z_i + z_j - \epsilon \quad \text{for all } (i, j) \notin M^* \tag{3}$$

Theorem 1: Consider a weighted graph $G$ for which the LP relaxation is tight. Then max-product converges to the correct answer by step $\frac{2 w_{\max}}{\epsilon}$, where $w_{\max} = \max_e w_e$ is the weight of the heaviest edge, and $\epsilon$ satisfies (3).

Proof: Let $M^*$ be the optimal matching on $G$. For max-product to be convergent and correct, we need that $b^t_e[1] > b^t_e[0]$ for all $e \in M^*$ and $b^t_e[1] < b^t_e[0]$ for all $e \notin M^*$, and for all $t$ such that $T_e(t)$ is full up to depth $\frac{2 w_{\max}}{\epsilon}$.

So suppose that for such a $t$ there exists an $e \notin M^*$ such that $b^t_e[1] > b^t_e[0]$. Then, there exists a matching $M$ in $T_e(t)$ such that (a) the root $e \in M$, and (b) $M$ has the largest weight among matchings on $T_e(t)$. Let $M^*_T$ be the set of all edges in $T_e(t)$ that are copies of edges in $M^*$. By Lemma 2, $M^*_T$ is a maximal matching on $T_e(t)$. Also, the root $e \notin M^*_T$ by assumption.

The symmetric difference $M^*_T \triangle M$ consists of disjoint alternating paths in $T_e(t)$: each path will have every alternate edge in $M^*_T$ and all other edges in $M$. Let $P$ be the path that contains the root $e$. We now show that $w(P \cap M^*_T) > w(P \cap M)$.

Recall that the optimal dual solution assigns to each node $i$ in $G$ a "dual value" $z_i \ge 0$. Associate now with each node in $T_e(t)$ the dual value of its copy in $G$. Then, by Lemma 1 we have that $w_{ij} = z_i + z_j$ for each $(i, j) \in P \cap M^*_T$. Suppose now that neither endpoint of $P$ is a leaf of $T_e(t)$. In this case, we have

$$w(P \cap M^*_T) = \sum_{i \in P} z_i$$

On the other hand, we know that (3) holds for each edge in $P \cap M$. Adding these up gives

$$w(P \cap M) \le \sum_{i \in P} z_i - \epsilon \, |P \cap M|$$

By assumption, the root $e \in P \cap M$, so $|P \cap M| \ge 1$ and hence $w(P \cap M^*_T) > w(P \cap M)$ when no endpoints of $P$ are leaves.
Suppose now that exactly one of the endpoints $v$ of $P$ is a leaf of $T_e(t)$. In this case, we have that

$$w(P \cap M^*_T) \ge \sum_{i \in P} z_i - z_v \ge \sum_{i \in P} z_i - w_{\max}$$

where the last inequality follows from part 4 of Lemma 1. Also, $T_e(t)$ is assumed to be full up to depth $k$, so this implies that $|P \cap M| \ge \frac{k}{2}$. This means that

$$w(P \cap M) \le \sum_{i \in P} z_i - \epsilon \, \frac{k}{2}$$

Now, since $k \ge \frac{2 w_{\max}}{\epsilon}$, this implies that $w(P \cap M^*_T) > w(P \cap M)$. The final case, where both endpoints of $P$ are leaves, works out in the same way, except that now $|P \cap M| \ge k$ and $w(P \cap M^*_T) \ge \sum_{i \in P} z_i - 2 w_{\max}$.

Thus, in any case, we have that $w(P \cap M^*_T) > w(P \cap M)$. Consider now the set of edges $M - (P \cap M) + (P \cap M^*_T)$. This set forms a matching on $T_e(t)$, and has higher weight than $M$. This contradicts the choice of $M$, and so establishes that $b^t_e[1] < b^t_e[0]$ for all $e \notin M^*$. A similar contradiction argument can be used to establish that $b^t_e[1] > b^t_e[0]$ for all $e \in M^*$. This completes the proof. □

B. LP Relaxation is as Powerful as Max-product

In this section we prove that if the LP relaxation is loose then max-product does not converge to the correct answer. Before we do so however, we note that this implies a stronger result: that when LP is loose then in fact max-product does not converge at all.

Lemma 3: Consider the distribution $p(x)$ as given in (2). If max-product converges, then its output exactly corresponds to the true optimal matching $M^*$.

The proof of this lemma uses the "local optimality" result of Weiss and Freeman [7]. In particular, for $p$ it turns out that local optimality implies global optimality. This means that it is not possible for max-product to converge to an incorrect answer: it will either not converge at all, or converge to $M^*$. We do not use this explicitly in the proofs below, but it strengthens the results as mentioned above.
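To make "loose" concrete, the following small worked example may help. It is an illustration only and does not appear in the original text: the triangle on nodes $a, b, c$ and its weights are hypothetical, chosen so that both the optimal matching and the LP optimum are unique, as required by A1 and A2.

```latex
% Hypothetical instance (not from the paper): triangle on nodes a, b, c.
\begin{align*}
&\text{Weights: } w_{ab} = 3, \quad w_{bc} = w_{ac} = 2.5 \\
&\text{Integral optimum: } M^* = \{(a,b)\}, \quad w(M^*) = 3 \\
&\text{Fractional feasible point: } x_{ab} = x_{bc} = x_{ac} = \tfrac{1}{2},
  \quad \textstyle\sum_e w_e x_e = \tfrac{1}{2}(3 + 2.5 + 2.5) = 4 > 3 \\
&\text{Dual feasible point: } z_a = z_b = 1.5, \ z_c = 1,
  \quad z_a + z_b = 3, \ z_b + z_c = z_a + z_c = 2.5,
  \quad \textstyle\sum_i z_i = 4
\end{align*}
```

The primal and dual values agree at 4, so by LP duality the fractional point is optimal for the relaxation and strictly beats every matching: the LP relaxation is loose on this instance. If the weight $w_{ab}$ were raised above 5, the same calculation would instead favor the integral solution $\{(a,b)\}$, which illustrates how tightness depends on the weights and not only on the graph.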
We now proceed with showing that max-product does not converge to the correct $M^*$ when LP is loose. As a first step, we need a combinatorial characterization of when the LP relaxation is loose. We now make some definitions.

We say that a node $v$ is saturated by a matching $M$ if there exists an edge $e \in M$ that is incident to $v$. A blossom with respect to a matching $M$ is an odd cycle $C$ with $\frac{|C| - 1}{2}$ edges in $M$ (see footnote 2). Note that a blossom has a unique base: a node not saturated by any edge in $C \cap M$.

A stemmed blossom $B_1$ (w.r.t. $M$) is a blossom $C$, along with an alternating path (stem) $P$ that starts at the base of $C$, and starts with an edge in $M$. Also, $P$ should be such that the set $M - (P \cap M) + (P - M)$ remains a matching in $G$. A bad stemmed blossom is one in which the edge weights satisfy

$$w(C \cap M) + 2 w(P \cap M) < w(C - M) + 2 w(P - M)$$

Note that it may well be the case that $|P| = 0$, in which case $B_1$ is just an odd cycle. The following is an example of a bad stemmed blossom. The bold edges are the ones in $M$, the numbers denote the weights of the corresponding edges, and the last node $i$ has no edge of $M$ incident on it. The blossom $C$ in this case is the cycle $abcde$, and node $c$ is its base. The path/stem $P$ is $cfghi$.

[Figure: a bad stemmed blossom with blossom $abcde$, base $c$, and stem $cfghi$; edges in $M$ in bold, edge weights shown on the edges.]

A blossom pair $B_2$ is two blossoms $C_1$ and $C_2$ and an alternating path $P$ between the bases of the two blossoms such that $P$ begins and ends with edges in $M$. A bad blossom pair is one in which the edge weights satisfy

$$w(C_1 \cap M) + w(C_2 \cap M) + 2 w(P \cap M) < w(C_1 - M) + w(C_2 - M) + 2 w(P - M)$$

The following is an example of a bad blossom pair.

[Figure: a bad blossom pair; edges in $M$ in bold, edge weights shown on the edges.]

Footnote 2: Blossoms were first defined in [16], which also provided the first efficient algorithm for weighted matching in arbitrary graphs.
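The "bad stemmed blossom" inequality defined above is a plain weight comparison, so it can be checked mechanically. The sketch below is illustrative only: the helper names (`edge`, `is_blossom`, `is_bad_stemmed_blossom`) and the triangle weights are assumptions made here, not part of the paper. It also assumes the caller supplies a genuine cycle and a valid alternating stem; it does not verify that structure.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

Edge = FrozenSet[str]


def edge(u: str, v: str) -> Edge:
    """Undirected edge as a frozenset, so (u, v) and (v, u) coincide."""
    return frozenset((u, v))


def total_weight(edges: Iterable[Edge], w: Dict[Edge, float]) -> float:
    """Sum of the weights of the given edges (0 for an empty collection)."""
    return sum(w[e] for e in edges)


def is_blossom(cycle: List[Edge], matching: Set[Edge]) -> bool:
    """Odd cycle C with exactly (|C| - 1) / 2 of its edges in the matching.

    Only the edge counts are checked; the caller must pass an actual cycle.
    """
    n = len(cycle)
    matched = sum(1 for e in cycle if e in matching)
    return n % 2 == 1 and matched == (n - 1) // 2


def is_bad_stemmed_blossom(
    cycle: List[Edge],
    stem: List[Edge],
    matching: Set[Edge],
    w: Dict[Edge, float],
) -> bool:
    """Check w(C∩M) + 2 w(P∩M) < w(C−M) + 2 w(P−M) for blossom C, stem P."""
    if not is_blossom(cycle, matching):
        return False
    c_in = [e for e in cycle if e in matching]
    c_out = [e for e in cycle if e not in matching]
    p_in = [e for e in stem if e in matching]
    p_out = [e for e in stem if e not in matching]
    lhs = total_weight(c_in, w) + 2 * total_weight(p_in, w)
    rhs = total_weight(c_out, w) + 2 * total_weight(p_out, w)
    return lhs < rhs


# Hypothetical instance: a triangle with an empty stem (|P| = 0).
w = {edge("a", "b"): 3.0, edge("b", "c"): 2.5, edge("a", "c"): 2.5}
matching = {edge("a", "b")}
cycle = [edge("a", "b"), edge("b", "c"), edge("a", "c")]

print(is_bad_stemmed_blossom(cycle, [], matching, w))
```

On this triangle the matched edge weighs 3 while the two unmatched cycle edges weigh 5 in total, so the odd cycle alone already satisfies the "bad" inequality. A check for bad blossom pairs could be written the same way by adding the second cycle's terms to both sides.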
The following proposition provides a combinatorial characterization of when the LP relaxation is loose, and is crucial to the proof of the subsequent theorem.

Proposition 1: If the LP relaxation is loose, then there exists a bad stemmed blossom, or a bad blossom pair, with respect to the optimal matching $M^*$.

Proof: In appendix.

We use the presence of these "bad" subgraphs in $G$ to show that max-product does not converge to the correct answer. Before we do so, we need one additional lemma. This states that if max-product converges by step $k$ to some matching $M$ on $G$, then the optimal matching $M_T$ on the computation tree looks like $M$ in the neighborhood of the root.

Lemma 4: Suppose max-product converges to a matching $M$ in $G$ by step $k$. Consider any edge $e$, some $m \ge 1$ and a corresponding computation tree $T_e$ which is full up to depth $k + m$. Let $M_T$ be the max-weight matching on the tree. Then, for any edge $f \in T_e$ that is within distance $m$ of the root $e$, $f \in M_T$ if and only if its copy $f_1$ in $G$ is such that $f_1 \in M$.

Note that the above lemma also applies to the root $e$ of the tree. We are now ready to state and prove the main result of this section. Recall that the belief $b_e$ on an edge at convergence is incorrect if either $e \in M^*$ but $b_e[0] > b_e[1]$, or $e \notin M^*$ but $b_e[1] > b_e[0]$.

Theorem 2: Consider a weighted graph $G$ for which the LP relaxation is loose. Then, the max-product beliefs do not converge to the correct $M^*$: for any given $k$, there exists a $k_1 \ge k$ and computation trees $T_e$, $e \in E$ such that each $T_e$ is full up to depth $k_1$, but the beliefs on some of the edges are incorrect. Lemma 3 further implies that in fact in this case max-product does not converge at all.

Proof: Let $M^*$ be the max-weight matching on $G$. Since the LP relaxation is loose, by Prop. 1, there exists either a bad stemmed blossom or a bad blossom pair w.r.t. $M^*$.
Suppose first that it contains a bad stemmed blossom $B_1$, and consider some $e \in C \cap M^*$ that is in the "blossom" part of $B_1$ (as opposed to the stem) and also in $M^*$. From the two nodes of $e$, make maximal alternating paths $P_1$ and $P_2$ that remain in $B_1$ and start out in opposite directions on $C$. For the stemmed blossom example above, if $e$ is the edge $(a, b)$ then the two paths will be $bcfghi$ and $aedcfghi$.

Let $d_1 = w(P_1 - M^*) - w(P_1 \cap M^*)$, and similarly $d_2$ for $P_2$. $d_1$ represents the change in the weight of the matching if each edge in $P_1$ were "switched", i.e. its membership in the matching was reversed from its original value. It is easy to see that

$$d_1 + d_2 - w(e) = w(C - M^*) + 2 w(P - M^*) - w(C \cap M^*) - 2 w(P \cap M^*)$$

By assumption $B_1$ is a bad blossom and hence we have that $d_1 + d_2 - w(e) > 0$.

Suppose max-product converges to $M^*$ by step $k$. Consider now the computation tree $T_e$ which is full up to depth $k + |V|$, where $|V|$ is the number of nodes in $G$. Let $M_T$ be the max-weight matching on $T_e$. Lemma 4 implies that $M_T$ will be a projection of $M^*$ in a distance-$|V|$ neighborhood of the root. Also, starting from the root $e$, each of $P_1$ and $P_2$ will have a unique copy, say $R_1$ and $R_2$ respectively, in $T_e$, with $|R_1|, |R_2| < |V|$.

Since $P_1$ and $P_2$ are alternating w.r.t. $M^*$, it follows that $R_1$ and $R_2$ will be alternating with respect to $M_T$. Also, the set $S = R_1 \cup e \cup R_2$ forms an alternating path on $T_e$ with respect to $M_T$, and this begins and ends in nodes unsaturated by $M_T$. Thus, $M_T$ can be augmented by this path: the set $M_T - (S \cap M_T) + (S - M_T)$ will be a matching on $T_e$. Also, the weight gain from doing this augmentation will be exactly $d_1 + d_2 - w(e)$, which we know is strictly positive. Thus, this shows that $M_T$ is not the optimal matching on $T_e$, which contradicts the choice of $M_T$.
This means that our assumption about max-product convergence to $M^*$ is incorrect. Thus, we see that if there exists a bad stemmed blossom w.r.t. $M^*$ in $G$ then max-product does not converge to $M^*$.

A similar argument holds for the case of a bad blossom pair $B_2$, except that instead of paths $P_1$ and $P_2$ above we now have to look at alternating walks $W_1$ and $W_2$ that live in $B_2$ and are long enough. These walks can then be mapped to an augmenting path on $T_e$ which strictly improves $M_T$, leading to a contradiction as was seen in the case of the paths $P_1$ and $P_2$. This completes the proof. □

VI. DISCUSSION

The results of this paper can be generalized to the case of perfect matchings, b-matchings and perfect b-matchings in general graphs, where similar results hold. In this paper max-product is shown to be as powerful as LP relaxation, but it would be more interesting to outline a direct operational link between max-product and a linear programming algorithm. As an example, [8] shows that for bipartite matching max-product has an operational correspondence with the auction algorithm [17]. Also, the form of the message update equations suggests that it can be implemented via an equivalent message passing update rule between just the nodes of the graph $G$, instead of having messages go from nodes to edges and vice versa.

More generally, it would be interesting to see if the ideas presented in this paper could be used/generalized to show connections between linear programming and belief propagation in other applications.

ACKNOWLEDGEMENTS

The author would like to acknowledge Dmitry Malioutov, whose experiments suggested a strong link between LP relaxation and max-product performance for non-bipartite graphs. Dmitry is also responsible for pointing the author to the local optimality result [7].
APPENDIX

Proof of Proposition 1

We now show that if the LP relaxation is loose then there exists in the graph either a bad stemmed blossom or a bad blossom pair, with respect to the optimal matching $M^*$. Let $x$ be the optimal (fractional) solution to the LP relaxation. Let $E'$ be the set of all edges $e$ such that either (a) $e \in M^*$, or (b) $e \notin M^*$ and $x_e > 0$. Then, $E'$ will contain at least one edge $e \notin M^*$, because if all $e \notin M^*$ had $x_e = 0$ then the LP would be tight. Let $G' = (V, E')$ be the subgraph of $G$ having only the edges in $E'$.

A cycle augmentation is any even cycle in which every alternate edge is in $M^*$. A path augmentation is any path in which every alternate edge is in $M^*$, and which begins and ends in nodes unsaturated by $M^*$. For any augmentation $A$, we have that $M^* - (A \cap M^*) + (A - M^*)$ is also a matching in $G'$. Thus, if $M^*$ is the unique max-weight matching it has to be that $w(A \cap M^*) > w(A - M^*)$.

Lemma 5: $G'$ cannot contain any augmentations: cycles or paths.

Proof: Let $A$ be an augmentation in $G'$. By assumption, $x_e > 0$ for all $e \in A - M^*$, which implies that $x_e < 1$ for all $e \in A \cap M^*$. Thus, there exists some $\epsilon > 0$ such that decreasing each $x_e$, $e \in A - M^*$ by $\epsilon$ and increasing each $x_e$, $e \in A \cap M^*$ by $\epsilon$ represents a valid new feasible point for the LP. The weight of this new point exceeds the weight of $x$ by $\epsilon \, (w(A \cap M^*) - w(A - M^*)) > 0$. However this contradicts the optimality of $x$, and thus $G'$ cannot contain any augmentation. □

Let $S$ be the longest alternating sequence of edges in $G'$, and let $v_1$ and $v_2$ be its endpoints. By the lemma above, both cannot be unsaturated. We say that $v_1$ or $v_2$ is a saturated leaf if it is saturated by $M^*$ and there exist no edges in $G' - M^*$ incident on it. Also, note that an endpoint is saturated if and only if its corresponding edge in $S$ is also in $M^*$.
The fact that $S$ is the longest sequence means that it cannot be extended further beyond $v_1$ and $v_2$. This implies that one of the following cases must occur:
1) $v_1$ and $v_2$ are both saturated leaves. In this case, the constraints at $v_1$ and $v_2$ are loose. So, there exists an $\epsilon$ such that if all $x_e$, $e \in S - M^*$ are decreased by $\epsilon$ and all $x_e$, $e \in S \cap M^*$ are increased by $\epsilon$ then the new solution remains feasible. This new solution will have strictly higher weight than $x$, which is a contradiction. Thus this case cannot occur.
2) $v_1$ is a saturated leaf and $v_2$ is unsaturated. An $\epsilon$-perturbation argument like the one above can be used to show that this case too cannot occur.
3) $v_1$ is saturated by $M^*$, but is not a leaf. $v_2$ is either unsaturated, or a saturated leaf. Since $S$ cannot be extended, it has to be that all edges in $G' - M^*$ incident to $v_1$ have other endpoints in $S$. Let $e$ be one such edge. Then, $e \cup S$ forms a stemmed blossom: the resulting cycle has to be odd, and the remaining part of $S$ will be a stem whose endpoint is $v_2$. Note that in this case it has to be that the constraint at $v_2$ is loose.
4) Both $v_1$ and $v_2$ are saturated by $M^*$, but are not leaves. Applying the above blossom argument to both $v_1$ and $v_2$ yields the existence of a blossom pair.

Thus if the LP relaxation is loose then there exists a stemmed blossom or a blossom pair. Now all that remains to show is that they are "bad". Let $B_1$ be a stemmed blossom in $G'$, consisting of blossom $C$ and stem $P$. Then, there exists some $\epsilon > 0$ such that if $x_e$, $e \in C \cap M^*$ is increased by $\epsilon$, $x_e$, $e \in C - M^*$ is decreased by $\epsilon$, $x_e$, $e \in P \cap M^*$ is increased by $2\epsilon$, and $x_e$, $e \in P - M^*$ is decreased by $2\epsilon$, then the new solution remains feasible for the LP. Also, the new solution weighs $\epsilon \, [w(C \cap M^*) + 2 w(P \cap M^*) - w(C - M^*) - 2 w(P - M^*)]$ more than $x$.
For $x$ to be the unique optimal of the LP, this has to be strictly negative and thus any stemmed blossom $B_1$ is bad. A similar argument shows that any blossom pair is bad. This finishes the proof of the proposition. □

REFERENCES

[1] S. M. Aji, G. B. Horn, and R. J. McEliece, "On the convergence of iterative decoding on graphs with a single cycle," in ISIT, 1998, p. 276.
[2] Y. Weiss, "Correctness of local probability propagation in graphical models with loops," Neural Computation, vol. 12, no. 1, pp. 1–41, 2000.
[3] Y. Weiss and W. Freeman, "Correctness of belief propagation in Gaussian graphical models of arbitrary topology," Neural Computation, vol. 13, no. 10, pp. 2173–2200, 2001.
[4] D. Malioutov, J. Johnson, and A. Willsky, "Walk-sums and belief propagation in Gaussian graphical models," Journal of Machine Learning Research, vol. 7, pp. 2031–2064, Oct. 2006.
[5] T. Richardson and R. Urbanke, "The capacity of low-density parity check codes under message-passing decoding," IEEE Transactions on Information Theory, vol. 47, pp. 599–618, 2001.
[6] P. Rusmevichientong and B. V. Roy, "An analysis of belief propagation on the turbo decoding graph with Gaussian densities," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 745–765, 2001.
[7] Y. Weiss and W. Freeman, "On the optimality of solutions of the max-product belief-propagation algorithm in arbitrary graphs," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 736–744, Feb. 2001.
[8] M. Bayati, D. Shah, and M. Sharma, "Maximum weight matching via max-product belief propagation," in ISIT, Sept. 2005, pp. 1763–1767.
[9] B. Huang and T. Jebara, "Loopy belief propagation for bipartite maximum weight b-matching," in Artificial Intelligence and Statistics (AISTATS), March 2007.
[10] C. Yanover, T. Meltzer, and Y.
Weiss, "Linear programming relaxations and belief propagation – an empirical study," Journal of Machine Learning Research, vol. 7, pp. 1887–1907, 2006.
[11] M. Wainwright, T. Jaakkola, and A. Willsky, "MAP estimation via agreement on (hyper)trees: Message-passing and linear-programming approaches," IEEE Transactions on Information Theory, vol. 51, no. 11, pp. 3697–3717, Nov. 2005.
[12] J. Feldman, D. Karger, and M. Wainwright, "Linear programming-based decoding of turbo-like codes and its relation to iterative approaches," in Allerton Conference on Communication, Control, and Computing, 2002.
[13] J. Feldman, M. Wainwright, and D. Karger, "Using linear programming to decode binary linear codes," IEEE Transactions on Information Theory, vol. 51, pp. 954–972, 2005.
[14] F. Kschischang, B. Frey, and H. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.
[15] S. Tatikonda and M. Jordan, "Loopy belief propagation and Gibbs measures," in Uncertainty in Artificial Intelligence, vol. 18, 2002, pp. 493–500.
[16] J. Edmonds, "Paths, trees and flowers," Canadian Journal of Mathematics, vol. 17, pp. 449–467, 1965.
[17] D. Bertsekas, "Auction algorithms for network flow problems: A tutorial introduction," Computational Optimization and Applications, vol. 1, pp. 7–66, 1992.
