Planar Cycle Covering Graphs

Julian Yarkony, Alexander T. Ihler, Charless C. Fowlkes
Department of Computer Science
University of California, Irvine
{yarkony,ihler,fowlkes}@ics.uci.edu

Abstract

We describe a new variational lower-bound on the minimum energy configuration of a planar binary Markov Random Field (MRF). Our method is based on adding auxiliary nodes to every face of a planar embedding of the graph in order to capture the effect of unary potentials. A ground state of the resulting approximation can be computed efficiently by reduction to minimum-weight perfect matching. We show that optimization of the variational parameters achieves the same lower-bound as dual-decomposition into the set of all cycles of the original graph. We demonstrate that our variational optimization converges quickly and provides high-quality solutions to hard combinatorial problems 10-100x faster than competing algorithms that optimize the same bound.

1 Introduction

Dual-decomposition methods for optimization have emerged as an extremely powerful tool for solving combinatorial problems in graphical models. These techniques can be thought of as decomposing a complex model into a collection of easier-to-solve components, providing a variational bound which can then be optimized over its parameters. A wide variety of algorithms have been proposed, often distinguished by the class of models from which subproblems are constructed, including trees (Wainwright et al., 2005; Kolmogorov, 2006), planar graphs (Globerson and Jaakkola, 2007), outer-planar graphs (Batra et al., 2010), k-fans (Kappes et al., 2010), or some more heterogeneous mix of combinatorial subproblems (e.g., Torresani et al., 2008). While the class of tree-reweighted methods is now fairly well understood, many of the concepts and much of the guidance available for trees are not available for more general classes of decompositions.
In this paper, we analyze reweighting methods that seek to decompose binary MRFs into subproblems consisting of tractable planar subgraphs. We show that the ultimate building blocks of such a decomposition are simple cycles of the original graph and that, to achieve the tightest possible bounds, one must choose a set of subproblems that covers all such cycles. Cycles in planar-reweighted decompositions thus play a role analogous to that of trees in tree-reweighted decompositions.

There are various techniques for enforcing consistency over cycles in an MRF. For example, one can triangulate the graph and introduce constraints over all triplets in the resulting triangulation. However, this involves $O(N^3)$ constraints, which is impractical in large-scale inference problems. A more efficient route is to add only a small number of constraints as needed, e.g., using a cutting-plane approach (Sontag and Jaakkola, 2007).

The contribution of this paper is a graphical construction for a new variational bound that enforces the constraints over all cycles in a planar binary MRF with only a constant-factor overhead. This representation is very simple and efficient to optimize, which we demonstrate in experimental comparisons to existing state-of-the-art cycle-enforcing methods, where we achieve substantial performance gains.

2 Exact Inference for Binary Outer-planar MRFs

Consider the energy function $E(X)$ associated with a general binary MRF defined over a collection of variables $(X_1, X_2, \ldots, X_N) \in \{0,1\}^N$ with specified unary and pairwise potentials. It is straightforward to show that any such MRF can be reparametrized, up to a constant, using pairwise disagreement costs $\theta_{ij}$ along with unary parameters $\theta_i$ (see, e.g., Kolmogorov and Zabih, 2004; Schraudolph and Kamenetsky, 2008).
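The reparametrization can be made concrete with a short sketch. The following Python is an illustration under stated assumptions, not the authors' code: potentials are given as arbitrary length-2 unary tables and 2×2 pairwise tables, and the helper names `reparametrize`, `table_energy`, and `disagreement_energy` are hypothetical. It uses the binary identity $x_i x_j = (x_i + x_j - [x_i \neq x_j])/2$ to turn each interaction into a disagreement cost, folds the remainder into unary terms $\theta_i [x_i \neq 0]$ and a constant, and then checks by enumeration that both forms agree on every state.

```python
import itertools
import random

# Illustrative sketch; helper names are hypothetical, not from the paper's code.


def reparametrize(unary, pairwise):
    """Rewrite table potentials in disagreement form.

    unary:    dict i -> (psi_i(0), psi_i(1))
    pairwise: dict (i, j) -> [[phi(0,0), phi(0,1)], [phi(1,0), phi(1,1)]]
    Returns (theta_i, theta_ij, const) with
    E(x) = const + sum theta_ij [x_i != x_j] + sum theta_i [x_i != 0].
    """
    theta_i = {i: u1 - u0 for i, (u0, u1) in unary.items()}
    const = sum(u0 for u0, _ in unary.values())
    theta_ij = {}
    for (i, j), phi in pairwise.items():
        a, b = phi[0]
        c, d = phi[1]
        const += a
        # x_i * x_j = (x_i + x_j - [x_i != x_j]) / 2 for binary labels.
        theta_ij[(i, j)] = (b + c - a - d) / 2.0
        theta_i[i] = theta_i.get(i, 0.0) + (c + d - a - b) / 2.0
        theta_i[j] = theta_i.get(j, 0.0) + (b + d - a - c) / 2.0
    return theta_i, theta_ij, const


def table_energy(x, unary, pairwise):
    # Energy from the original tables.
    e = sum(unary[i][x[i]] for i in unary)
    return e + sum(phi[x[i]][x[j]] for (i, j), phi in pairwise.items())


def disagreement_energy(x, theta_i, theta_ij, const):
    # Energy in the disagreement form (Equation 1 plus the dropped constant).
    e = const + sum(t * (x[i] != x[j]) for (i, j), t in theta_ij.items())
    return e + sum(t * (x[i] != 0) for i, t in theta_i.items())


# Usage: a random 4-cycle with arbitrary tables; both forms agree on every state.
rng = random.Random(0)
n = 4
unary = {i: (rng.uniform(-1, 1), rng.uniform(-1, 1)) for i in range(n)}
pairwise = {(i, (i + 1) % n): [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
            for i in range(n)}
th_i, th_ij, c0 = reparametrize(unary, pairwise)
max_err = max(abs(table_energy(x, unary, pairwise) - disagreement_energy(x, th_i, th_ij, c0))
              for x in itertools.product((0, 1), repeat=n))
print(max_err < 1e-9)  # True
```

In this form, a positive $\theta_{ij}$ corresponds to an attractive (submodular, $a + d \le b + c$) interaction and a negative one to a repulsive interaction.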
The energy function can thus be written as

$$E(X,\theta) = \sum_{i>j} \theta_{ij}\,[X_i \neq X_j] + \sum_i \theta_i\,[X_i \neq 0] \qquad (1)$$

where $[\cdot]$ is the indicator function and we have dropped any constant terms.¹

We can express such an energy function without including any unary terms by introducing an auxiliary variable $X_0$ and replacing the unary terms with pairwise connections to $X_0$, so that

$$E_1(X,\theta) = \sum_{i>j} \theta_{ij}\,[X_i \neq X_j] + \sum_i \theta_i\,[X_i \neq X_0] \qquad (2)$$

If we fix $X_0 = 0$, then $E_1$ is clearly equivalent to our original energy function $E$. Since the potentials in $E_1$ are symmetric, for any state $X = (X_0, X_1, \ldots)$ there is a state $\bar{X}$ with identical energy, given by flipping the states of every $X_i$, including $X_0$. Thus any $X$ that minimizes $E_1$ can be easily mapped to a minimizer of $E$.

Minimizing the energy function $E_1$ can be interpreted as the problem of finding a bi-partition of a graph $G_1$ which has a vertex $i$ corresponding to each variable $X_i$ and an edge for every pair $(i,j)$ with $\theta_{ij} \neq 0$ (including the pairs $(i,0)$, whose weight is $\theta_i$). The cost of a partition is simply the sum of the weights $\theta_{ij}$ of the edges cut. Given a minimal-weight partition, we can find a corresponding optimal state $X$ by assigning all the nodes in the partition containing $X_0$ to state 0 and the complement to state 1.

Figure 1: (a) shows a standard planar MRF which is represented by an energy function containing unary and pairwise potentials. (b) shows an equivalent MRF in which the unary terms have been replaced by an auxiliary node (square). Both (a) and (b) are intractable in general. (c) shows a decomposition which gives a lower-bound on the ground state of (a) by using a collection of outer-planar graphs whose ground states can be computed efficiently using minimum-weight perfect matching. (d) shows the new lower-bound construction introduced in this paper, which uses multiple auxiliary nodes, one for each face of the original graph.
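The auxiliary-node construction of Equation (2) and the flip argument can be checked exhaustively on a tiny instance. This is a minimal Python sketch, not the authors' implementation: the names `energy`, `energy_aux`, and `to_original` are hypothetical, brute-force enumeration stands in for the perfect-matching solver, and the random complete graph on five nodes is used only because the identity holds for any graph (planarity matters only for the matching step).

```python
import itertools
import random

# Illustrative sketch; helper names are hypothetical, not from the paper's code.


def energy(x, theta_i, theta_ij):
    # Equation (1): disagreement costs plus unary costs [x_i != 0].
    e = sum(t * (x[i] != x[j]) for (i, j), t in theta_ij.items())
    return e + sum(t * (x[i] != 0) for i, t in theta_i.items())


def energy_aux(x0, x, theta_i, theta_ij):
    # Equation (2): each unary term becomes a pairwise link to the auxiliary node x0.
    e = sum(t * (x[i] != x[j]) for (i, j), t in theta_ij.items())
    return e + sum(t * (x[i] != x0) for i, t in theta_i.items())


def to_original(x0, x):
    # Flip every label (x0 included) when x0 = 1, so that x0 = 0 afterwards.
    return tuple(xi ^ x0 for xi in x)


rng = random.Random(1)
n = 5
theta_i = {i: rng.uniform(-1, 1) for i in range(n)}
theta_ij = {(i, j): rng.uniform(-1, 1) for i in range(n) for j in range(i + 1, n)}
states = list(itertools.product((0, 1), repeat=n))

# Minimize E1 jointly over (x0, x); exhaustive search stands in for matching.
x0_best, x_best = min(((x0, x) for x0 in (0, 1) for x in states),
                      key=lambda s: energy_aux(s[0], s[1], theta_i, theta_ij))
x_star = to_original(x0_best, x_best)
e_min = min(energy(x, theta_i, theta_ij) for x in states)
print(abs(energy(x_star, theta_i, theta_ij) - e_min) < 1e-12)  # True
```

Whichever side of the cut the minimizer puts $X_0$ on, mapping back with a global flip recovers a minimizer of $E$.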
Since the edge weights $\theta_{ij}$ may be negative, such a minimal-weight cut is typically non-empty.

While minimizing $E(X,\theta)$ is computationally intractable in general (Barahona, 1982), a clever construction due to Kasteleyn (1961, 1967) and Fisher (1961, 1966) allows one to find minimizing states when the graph corresponding to $E_1$ is planar. This is based on the complementary relation between states of the nodes $X$ and perfect matchings in the so-called expanded dual of the graph $G_1$. A minimizing state for a planar problem can thus be found efficiently, e.g., using Edmonds' blossom algorithm (Edmonds, 1965) to compute minimum-weight perfect matchings.² We use the Blossom V implementation of Kolmogorov (2009), which is quite efficient in practice, easily handling problems with a million nodes in a few seconds. Furthermore, for planar problems, one can also compute the partition function associated with $E$ in polynomial time. See the report of Schraudolph and Kamenetsky (2008) for an in-depth discussion and implementation details.

While this reduction to perfect matching provides a unique tool for energy minimization and probabilistic inference, the requirement that $G_1$ be planar is a serious restriction. In particular, even if the original graph $G$ corresponding to $E$ is planar, e.g., in the case of the grid graphs commonly used in computer vision applications, $G_1$ typically is not, since adding edges from every node to the auxiliary node $X_0$ renders the graph non-planar. Assuming arbitrary values of $\theta_i$, the energy functions $E$ to which this method can be applied are exactly those whose graphs $G$ are outer-planar.

¹ We assume in the rest of this paper that all MRFs are parameterized in this manner. In particular, an MRF without unary parameters is one in which all the pairwise terms are symmetric.
An outer-planar graph is a graph with a planar embedding in which all vertices share a common face (e.g., the exterior face). For such a graph, every vertex can be connected to a single auxiliary node placed inside the common face without any edges crossing, so that the resulting graph $G_1$ is still planar. See examples in Figure 1.³

² Matchings in planar graphs can be found somewhat more efficiently than in general graphs, which yields the best known worst-case running time of $O(N^{3/2} \log N)$ for max-cut in planar graphs (Shih et al., 1990).

3 Inference with Dual Decomposition

Dual decomposition is a general approach for leveraging such islands of tractability in order to perform inference in more general MRFs. The application of dual decomposition to inference in graphical models was popularized by the work of Wainwright et al. (2003, 2005) on Tree-Reweighted Belief Propagation (TRW). TRW finds an optimal decomposition of an MRF into a collection of tree-structured problems for which exact inference is tractable. More formally, let $t$ index a collection of subproblems defined over the same set of variables $X$ whose parameters sum up to the original parameter values, so that $\theta = \sum_t \theta^t$. The energy function is linear in $\theta$, so we have

$$E_{MAP} = \min_X E(X,\theta) = \min_X \sum_t E(X,\theta^t) \qquad (3)$$
$$\geq \max_{\{\theta^t\}:\, \sum_t \theta^t = \theta} \; \sum_t \min_{X^t} E(X^t,\theta^t) \qquad (4)$$

The inequality arises because each subproblem $t$ is solved independently and thus may yield a different solution. On the other hand, if the solutions to the subproblems all happen to agree, then the bound is tight.
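The inequality in Equation (4) holds for every split of the parameters, not only the optimal one. The following minimal Python sketch illustrates this under simplifying assumptions: subproblems are arbitrary random splits rather than trees (TRW would restrict each $\theta^t$ to a tree), each is solved by brute force instead of a tractable solver, and the helper names `min_energy`, `split`, and `decomposition_bound` are hypothetical.

```python
import itertools
import random

# Illustrative sketch; helper names are hypothetical, not from the paper's code.


def energy(x, theta_i, theta_ij):
    e = sum(t * (x[i] != x[j]) for (i, j), t in theta_ij.items())
    return e + sum(t * (x[i] != 0) for i, t in theta_i.items())


def min_energy(n, theta_i, theta_ij):
    # Exhaustive search stands in for an exact solver of each subproblem.
    return min(energy(x, theta_i, theta_ij) for x in itertools.product((0, 1), repeat=n))


def split(theta, k, rng):
    # Random decomposition theta = sum_t theta^t (the constraint in Equation 4).
    parts = [{} for _ in range(k)]
    for key, value in theta.items():
        shares = [rng.uniform(-1, 1) for _ in range(k - 1)]
        for t, s in enumerate(shares):
            parts[t][key] = s
        parts[k - 1][key] = value - sum(shares)
    return parts


def decomposition_bound(n, theta_i, theta_ij, k, rng):
    # Right-hand side of Equation (4) for one (not necessarily optimal) split.
    return sum(min_energy(n, u, p)
               for u, p in zip(split(theta_i, k, rng), split(theta_ij, k, rng)))


rng = random.Random(2)
n = 6
theta_i = {i: rng.uniform(-1, 1) for i in range(n)}
theta_ij = {(i, (i + 1) % n): rng.uniform(-1, 1) for i in range(n)}  # a 6-cycle
e_map = min_energy(n, theta_i, theta_ij)
bounds = [decomposition_bound(n, theta_i, theta_ij, 2, rng) for _ in range(50)]
print(max(bounds) <= e_map + 1e-9)  # True
```

Random sampling only probes the maximization in Equation (4); in practice the split is optimized by message passing or subgradient methods, as discussed next.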
The problem of maximizing the lower-bound over possible decompositions $\{\theta^t\}$ is convex, and when inference for each subproblem is tractable (for example, when $\theta^t$ is tree-structured) the bound can be optimized efficiently using message passing (fixed-point iterations) based on computing min-marginals in each subproblem (Wainwright et al., 2003) or by projected subgradient methods (Komodakis et al., 2007).

A powerful tool for understanding the optimization in Equation 4 is to work with the Lagrangian dual. Equation 3 is an integer linear program over $X$, but the integrality constraints can be relaxed to a linear program over continuous variables $\mu$ representing marginals, which are constrained to lie within the marginal polytope, $\mu \in \mathcal{M}(G)$. The constraints that define $\mathcal{M}(G)$ are a function of the graph structure $G$: an (exponentially large) set of linear constraints that restrict $\mu$ to the set of marginals achievable by some consistent joint distribution (see Wainwright and Jordan, 2008). Lower-bounds of the form in Equation 4 correspond to relaxing this set of constraints to the intersection of the constraints enforced by the structure of each subproblem. For the tree-structured subproblems of TRW, this relaxation results in the so-called local polytope $\mathcal{L}(G)$, which enforces marginalization constraints on each edge. Since $\mathcal{L}(G)$ is an outer bound on $\mathcal{M}(G)$, minimization over it yields a lower-bound on the original problem.

³ Note that outer-planar graphs have treewidth at most two, and hence the minimum energy solution can also be found efficiently using the standard junction tree algorithm. However, the reduction to matching is still of interest for general planar graphs without unary potentials, which have a treewidth of $O(\sqrt{N})$.
For any relaxed set of constraints, the values of $\mu$ may not correspond to the marginals of any valid distribution, and so are referred to as pseudo-marginals. One can tighten the bound in Equation 4 by adding subproblems to the primal (or, equivalently, constraints to the dual) which enforce consistency over larger sets of variables. This has been explored, e.g., by Sontag and Jaakkola (2007), who suggest adding cycle inequalities to the dual which enforce consistency of pseudo-marginals around a cycle. Since a large number of potential cycles are present in the graph, Sontag suggests either using a cutting-plane algorithm to successively add violated cycle constraints (Sontag and Jaakkola, 2007) or adding only small cycles such as triplets or quadruplets (Sontag et al., 2008), which can be enumerated with relative ease and optimized using local message passing rather than general LP solvers.

For binary problems, it is natural to consider replacing Wainwright's tree subproblems with tractable outer-planar subgraphs. This has been explored by Globerson and Jaakkola (2007) and Batra et al. (2010), who proposed decomposing a graph into a set of planar graphs for the purposes of estimating the partition function⁴ and the minimum energy state, respectively. For energy minimization, it is well known that any set of subproblems that covers every edge is sufficient to achieve the TRW bound; but what is the best set of planar graphs to use? Is it necessary to use all outer-planar or even all planar subgraphs? It turns out that the set of all outer-planar or planar subgraphs is equivalent to the set of all cycle constraints in $G$, which can be enforced by any so-called cycle basis of the graph.
This observation leads to algorithms such as reweighted perfect matching (Schraudolph, 2010), which explicitly constructs a set of subproblems that form a complete cycle basis, or incremental algorithms that enforce cycle constraints (Sontag and Jaakkola, 2007; Sontag et al., 2008; Komodakis and Paragios, 2008).

In the following sections, we focus on the case in which the original MRF is planar but the addition of the auxiliary unary node makes it non-planar. We describe a novel, compactly expressed variational approximation. We then prove that it achieves as tight a bound as decomposition into any collection of cycles or outer-planar graphs. This also gives a relatively simple proof that the tightest bounds achievable by sets of planar, outer-planar, or cycle subproblems are equivalent, and that the set of subproblems that is necessary and sufficient to achieve this bound forms a cycle basis, i.e., covers every chordless cycle in the original graph at least once.

⁴ More precisely, Globerson and Jaakkola (2007) consider the inclusion of any binary, planar subgraph of $G_1$. This may include subgraphs with treewidth greater than two.

4 Planar Cycle Coverings

Consider a planar embedding of the graph $G$ corresponding to an MRF. Since we cannot directly connect the unary node $X_0$ to every node in the graph without losing planarity, we propose the following relaxation. For each face $f$ of $G$, add an independent copy $X_0^f$ of the unary node and connect it to all vertices on the boundary of the face with weights $\theta_i^f$. Let $N_i$ be the set of unary node copies attached to node $i$. We split the original unary potential $\theta_i$ across all the unary face nodes connected to $i$ while maintaining the constraint that $\sum_{f \in N_i} \theta_i^f = \theta_i$; see Figure 1(d).
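For the grid graphs used later in the experiments, the face-copy construction is easy to write down. The following Python is a minimal sketch under stated assumptions: the standard planar embedding of an $n \times n$ grid (unit squares plus one outer face), an even split of each $\theta_i$ over $N_i$ as an initialization only, and hypothetical helper names `grid_edges`, `grid_faces`, and `attach_face_copies`. It also checks that the $(n-1)^2$ bounded faces number exactly $|E| - |V| + 1$, the dimension of the cycle space.

```python
import random

# Illustrative sketch; helper names are hypothetical, not from the paper's code.


def grid_edges(n):
    """Edges of an n x n grid; node (r, c) has index r * n + c."""
    edges = []
    for r in range(n):
        for c in range(n):
            v = r * n + c
            if c + 1 < n:
                edges.append((v, v + 1))
            if r + 1 < n:
                edges.append((v, v + n))
    return edges


def grid_faces(n):
    """Boundary vertices of each face in the standard embedding (n >= 2).

    The (n - 1)^2 unit squares come first; the last entry is the outer face.
    """
    faces = [[v, v + 1, v + n + 1, v + n]
             for v in (r * n + c for r in range(n - 1) for c in range(n - 1))]
    faces.append([r * n + c for r in range(n) for c in range(n)
                  if r in (0, n - 1) or c in (0, n - 1)])
    return faces


def attach_face_copies(theta_i, faces):
    """One copy X_0^f per face, linked to every boundary vertex of f.

    Returns theta_if[(i, f)], initialised by splitting theta_i evenly over N_i,
    so that sum over f in N_i of theta_if[(i, f)] equals theta_i up to rounding.
    """
    incident = {i: [] for i in theta_i}
    for f, boundary in enumerate(faces):
        for i in boundary:
            incident[i].append(f)
    return {(i, f): theta_i[i] / len(fs) for i, fs in incident.items() for f in fs}


n = 4
rng = random.Random(3)
theta_i = {i: rng.uniform(-1, 1) for i in range(n * n)}
edges, faces = grid_edges(n), grid_faces(n)
theta_if = attach_face_copies(theta_i, faces)
# The (n - 1)^2 bounded faces form a cycle basis: |E| - |V| + 1 of them.
print(len(faces) - 1 == len(edges) - n * n + 1)  # True
```

Interior grid nodes touch four face copies, border nodes three, and corners two (one square plus the outer face).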
Using this system we have the following relaxation:

$$E_{MAP} = \min_{X:\, X_0^f = X_0\ \forall f} \; \sum_{i>j} \theta_{ij}\,[X_i \neq X_j] + \sum_{i,f} \theta_i^f\,[X_i \neq X_0^f]$$
$$\geq \min_X \; \sum_{i>j} \theta_{ij}\,[X_i \neq X_j] + \sum_{i,f} \theta_i^f\,[X_i \neq X_0^f] \qquad (5)$$

The inequality arises because we have dropped the constraint that all copies of $X_0$ take on the same value. On the other hand, since the graph corresponding to the relaxation in Equation 5 is planar, we can compute the minimum exactly. Furthermore, we have the freedom to adjust the $\theta_i^f$ parameters so long as they sum up to our original parameters. This yields the variational problem

$$E_{PCC} = \max_{\theta:\, \sum_f \theta_i^f = \theta_i} \; \min_X \; \sum_{i>j} \theta_{ij}\,[X_i \neq X_j] + \sum_{i,f} \theta_i^f\,[X_i \neq X_0^f] \qquad (6)$$

where $E_{MAP} \geq E_{PCC}$. We refer to this construction as a planar cycle covering of the original graph, since the unary potentials for each face cycle are covered by some auxiliary node (and, as we shall see, all other cycles are also covered in a precise sense).

Although this planar decomposition includes duplicate copies of nodes from the original problem, it differs from dual decomposition in that there are not multiple independent subproblems but just a single, larger planar problem to be solved. This is in some ways analogous to the work of Yarkony et al. (2010), which replaces the collection of spanning trees in TRW with a single "covering tree".

As with dual decomposition, the parameters may be optimized using subgradient or marginal fixed-point updates. For example, the subgradient updates for $\theta_i^f$ at a given setting of $X$ can be easily computed by taking a gradient and enforcing the summation constraint. This yields the update rule

$$\theta_i^f \leftarrow \theta_i^f + \lambda \Big( [X_i \neq X_0^f] - \frac{1}{|N_i|} \sum_{g \in N_i} [X_i \neq X_0^g] \Big) \qquad (7)$$

where $|N_i|$ is the number of auxiliary face nodes attached to $X_i$ and $\lambda$ is a stepsize parameter. After each such gradient step, one must recompute the optimal setting of $X$, which can be done efficiently using perfect matching.
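The relaxation of Equation (5) and the update of Equation (7) can be exercised together on a very small grid. This Python sketch is illustrative only and rests on several assumptions: a 2×3 grid, exhaustive search in place of perfect matching, a simple diminishing step $0.1/(t+1)$ rather than the step rule used in the experiments, and hypothetical helper names (`grid`, `pcc_energy`, `pcc_minimize`, `subgradient_step`). Because each update subtracts the mean over $N_i$, the constraint $\sum_f \theta_i^f = \theta_i$ is preserved, so every iterate remains a valid lower bound on $E_{MAP}$.

```python
import itertools
import random

# Illustrative sketch; helper names are hypothetical, not from the paper's code.


def grid(rows, cols):
    """Edges and face boundaries (unit squares, then the outer face) of a grid."""
    idx = lambda r, c: r * cols + c
    edges = [(idx(r, c), idx(r, c + 1)) for r in range(rows) for c in range(cols - 1)]
    edges += [(idx(r, c), idx(r + 1, c)) for r in range(rows - 1) for c in range(cols)]
    faces = [[idx(r, c), idx(r, c + 1), idx(r + 1, c + 1), idx(r + 1, c)]
             for r in range(rows - 1) for c in range(cols - 1)]
    faces.append([idx(r, c) for r in range(rows) for c in range(cols)
                  if r in (0, rows - 1) or c in (0, cols - 1)])
    return edges, faces


def pcc_energy(x, y, theta_ij, theta_if):
    # Objective of Equation (5); y[f] is the label of the face copy X_0^f.
    e = sum(t * (x[i] != x[j]) for (i, j), t in theta_ij.items())
    return e + sum(t * (x[i] != y[f]) for (i, f), t in theta_if.items())


def pcc_minimize(n_nodes, n_faces, theta_ij, theta_if):
    # Exhaustive search stands in for the perfect-matching solver.
    return min(((pcc_energy(x, y, theta_ij, theta_if), x, y)
                for x in itertools.product((0, 1), repeat=n_nodes)
                for y in itertools.product((0, 1), repeat=n_faces)),
               key=lambda s: s[0])


def subgradient_step(theta_if, x, y, step):
    # Equation (7): indicator minus its mean over N_i, so sum_f theta_i^f is preserved.
    incident = {}
    for i, f in theta_if:
        incident.setdefault(i, []).append(f)
    new = dict(theta_if)
    for i, fs in incident.items():
        mean = sum(x[i] != y[g] for g in fs) / len(fs)
        for f in fs:
            new[(i, f)] += step * ((x[i] != y[f]) - mean)
    return new


rng = random.Random(4)
rows, cols = 2, 3
edges, faces = grid(rows, cols)
n = rows * cols
theta_ij = {e: rng.uniform(-1, 1) for e in edges}
theta_i = {i: rng.uniform(-0.5, 0.5) for i in range(n)}
incident = {i: [f for f, b in enumerate(faces) if i in b] for i in range(n)}
theta_if = {(i, f): theta_i[i] / len(incident[i]) for i in range(n) for f in incident[i]}

# Exact E_MAP of Equation (1), i.e. a single unary node fixed to 0.
e_map = min(sum(t * (x[i] != x[j]) for (i, j), t in theta_ij.items())
            + sum(t * x[i] for i, t in theta_i.items())
            for x in itertools.product((0, 1), repeat=n))

bounds = []
for it in range(30):
    e, x, y = pcc_minimize(n, len(faces), theta_ij, theta_if)
    bounds.append(e)
    theta_if = subgradient_step(theta_if, x, y, step=0.1 / (it + 1))
print(max(bounds) <= e_map + 1e-9)  # True
```

When all copies agree with $X_i$ (or all disagree), the bracketed term in Equation (7) is zero and the parameters are left unchanged.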
The subgradient update lends itself to a simple interpretation. If $X_0^f$ disagrees with $X_i$ but the other neighboring copies $\{X_0^g\}$ do not, then the cost of $X_0^f$ and $X_i$ disagreeing is increased. On the other hand, if all the copies $\{X_0^g\}$ take on the same state, then the update leaves the parameters unchanged.

5 Cycle Decompositions and Cycle Covering Bounds

In this section, we show that the planar cycle cover bound $E_{PCC}$ for any planar binary MRF $G$ is equivalent to the lower-bound given by decomposition into the collection of all cycles of $G$.

For a given planar binary MRF with graph $G$, consider the bound $E_{CYCLE}$ given by decomposing the MRF into the collection of all cycles of $G$. By optimizing the allocation of parameters across these subproblems, one produces a lower-bound that is generally tighter than that given by TRW and related algorithms, since the subproblems can correctly account for the energy of frustrated cycles, which is only approximated in the tree-based bound. In fact, for planar graphs without unary potentials, adding cycle subproblems is enough to make the lower-bound tight.

Lemma 5.1 The lower-bound $E_{CYCLE}$ given by the optimal cycle decomposition of a planar MRF with no unary potentials is tight.

For such an MRF, the set of states corresponds exactly (up to the global flip $X \mapsto \bar{X}$) to the set of edge incidence vectors representing cuts in the graph. The convex hull of this set is known as the cut polytope. The connection between the cut polytope and the cycle decomposition is seen by taking the Lagrangian dual of the lower-bound optimization, which yields a constrained optimization of the edge incidence vectors (pseudo-marginals) over a polytope defined by cycle inequalities.
For planar graphs (or, more generally, graphs containing no $K_5$ minor), the set of cycle inequalities is sufficient to completely describe the cut polytope. See Barahona and Mahjoub (1986) for a proof, and Sontag and Jaakkola (2007) for related discussion. Just as local edge consistency implies global consistency for a tree, cycle consistency implies global consistency for a planar binary MRF without unary potentials.

While the number of simple cycles grows exponentially in the size of the graph for general planar graphs, it is still possible to solve such a problem in polynomial time. It is not in fact necessary to include every cycle subproblem, but simply a subset which forms a cycle basis (Barahona, 1993). Furthermore, there exists an efficiently computable witness for identifying a violated cycle (Barahona and Mahjoub, 1986). Sontag and Jaakkola (2007) use this as the basis for a cutting-plane method which successively adds cycle constraints to the dual.⁵

Figure 2: Demonstration that the minimal energy of a cycle is equal to the maximum lower-bound given by an approximation in which unary potentials are represented by a decoupled set of auxiliary variables (squares). At optimality of the variational parameters, all six cuts depicted must have equal energies, and thus it is possible to choose a ground state in which all the duplicate copies of the auxiliary node are in the same state.

We would now like to consider cycles in MRFs which do have unary potentials. We start with the simplest case of a single cycle.

Lemma 5.2 The minimum energy of a single cycle is the same as the maximum lower-bound given by the graph in which the unary potentials have been replaced by a collection of auxiliary nodes (one for each edge in the cycle), where each node in the cycle is connected to the pair of auxiliary nodes corresponding to its incident edges.

Proof Sketch.
Figure 2 provides a visualization of the set of auxiliary nodes (squares) added to the cycle (circles). We refer to this as the "saw" graph. Suppose we have optimized the decomposition of unary parameters across the auxiliary node connections to maximize the lower-bound. We claim that at the optimal decomposition, there always exists a minimal energy configuration in which all the auxiliary nodes take on state 0, making the bound equivalent to that of the cycle with a single auxiliary node.

Suppose we choose a minimum energy configuration of the graph but the duplicate auxiliary nodes take on mixed states. Start at some point along the cycle where there is an auxiliary node in state 0 and proceed clockwise until we find an auxiliary node in state 1. As we continue around the cycle, we will encounter some later point at which the auxiliary nodes return to being in state 0. This is most easily visualized in terms of the cut separating 0 and 1 nodes, as shown in Figure 2. Let $X_i$ be the first node which is attached to a pair of disagreeing auxiliary nodes $X_0^a, X_0^b$, and let $X_j$ be the second, attached to $X_0^e, X_0^f$. Consider the four possible cuts highlighted in red and green in Figure 2. At the optimal decomposition of the parameters, these cuts must have equal costs. If not, then we could transfer weight (e.g., from $\theta_i^a$ to $\theta_i^b$) and increase the energy, contradicting optimality. Let $C_1 = \theta_{ic} + \theta_i^a = \theta_{id} + \theta_i^b$ and $C_2 = \theta_{jh} + \theta_j^f = \theta_{jg} + \theta_j^e$. If one of the four cuts shown is minimal, then it must be that $C_1 + C_2 \leq 0$; otherwise the cut which severs none of these edges (orange) would be preferred.

⁵ It is important to note that a cycle basis for $G_1$ is not sufficient to achieve the bound $E_{CYCLE}$ given by the collection of all cycles in $G$, since a cycle in $G$ corresponds to a wheel in $G_1$.
However, if $C_1 + C_2 < 0$, then there is yet another cut (blue) which would achieve an energy that is lower by the non-zero amount $|C_1 + C_2|$ by cutting both sets of edges. Therefore, it must be the case that $C_1 + C_2 = 0$, and thus either the orange or the blue cut also represents a minimal configuration that leaves the collection of auxiliary nodes in state 0. A similar line of argument works for the cases when $X_c = 1$ or $X_h = 1$, or both. We are thus free to flip the states of the block of disagreeing auxiliary nodes and their neighbors on the cycle without changing the energy. We can then continue around the cycle in this manner until all copies of the auxiliary nodes are in state 0, as desired. ∎

We are now ready to give the main result of this section.

Theorem 5.3 The lower-bound given by the planar cycle covering graph is equal to the lower-bound given by decomposition into the collection of all cycles, so that $E_{PCC} = E_{CYCLE}$.

Proof Sketch. We proceed by showing a circular sequence of inequalities; Figure 3 provides a graphical overview. Take the set of cycles which yields the bound $E_{CYCLE}$. We can apply Lemma 5.2 to transform each cycle subproblem into a corresponding "saw" containing an auxiliary node for each edge while maintaining the bound. We then observe that every such augmented cycle is a subgraph of the planar cycle covering graph. As with any such decomposition into subgraphs, the minimal energy of the cycle covering graph must be at least as large as the sum of the minimal subgraph energies, and hence $E_{CYCLE} \leq E_{PCC}$. On the other hand, since the PCC graph is a planar binary MRF with no unary terms, by Lemma 5.1 we can decompose it exactly into the collection of its constituent cycles with no loss in the bound. Finally, each of these cycles is itself a subgraph of some augmented cycle, and hence we must also have $E_{CYCLE} \geq E_{PCC}$, proving equality.
∎

Batra et al. (2010) and Globerson and Jaakkola (2007) both propose decomposing a binary MRF into a set of tractable planar graphs. The previous result makes clear that, to achieve the best possible bound, such a decomposition must include a subproblem that covers every chordless cycle in the original graph: if consistency along a particular cycle is not enforced, we can always arrange parameters so that the resulting bound is arbitrarily bad. We also show the converse: outer-planar decompositions can do no better than the set of cycles.

Corollary 5.4 The best lower-bound achieved by any outer-planar decomposition of a planar MRF is no larger than $E_{PCC}$.

Proof Sketch. Take any outer-planar decomposition of a planar MRF. We first note that an outer-planar graph may be decomposed into a forest of blocks consisting of either biconnected components or individual edges, where blocks are connected by single vertices (cut vertices). Each biconnected component in turn has a weak dual graph (its dual with the outer face removed) which is a tree, meaning that it consists of face cycles in which adjacent faces share exactly one edge (see, e.g., Syslo (1979) for a more in-depth discussion).

We first split the forest apart into blocks. Consider any pair of blocks connected at a single cut vertex $X_i$. To split them, we introduce copies $X_i^1, X_i^2$ of the cut vertex which are allowed to take on independent states. The unary parameter $\theta_i$ is shared between these two copies with the constraint that $\theta_i^1 + \theta_i^2 = \theta_i$. There exists an optimal decomposition of $\theta_i$ which ensures that the two nodes share an optimizing configuration. For suppose, to the contrary, that the optimal decomposition yielded a minimum energy configuration in which $X_i^1$ and $X_i^2$ took on different states, say $X_i^1 = 0$ and $X_i^2 = 1$. Then shifting weight from $\theta_i^1$ to $\theta_i^2$ would drive up the energy of such a disagreeing configuration, contradicting optimality of the decomposition.
Once the blocks have been split apart, we may apply essentially the same argument to split each biconnected component into its constituent face cycles. Consider a pair of neighboring nodes $X_i, X_j$ which are split into $X_i^1, X_i^2, X_j^1$, and $X_j^2$. At the optimal decomposition of the parameters $\theta_i, \theta_j, \theta_{ij}$, it again must be the case that the copies of the duplicated edge share at least one optimizing configuration. If not, then the parameters could be redistributed by removing weight from one or more unused states in one copy and adding it to the set of optimizing states for the other copy. This would increase the energy and thus contradict optimality of the decomposition.

Thus any outer-planar decomposition is equivalent to a bound given by the set of its constituent cycles and edges. Every one of these subproblems is a subgraph of the cycle covering graph, and so the bound can be no tighter than the PCC graph bound. ∎

6 Experimental Results

We demonstrate the performance of the planar cycle cover bound on randomly generated Ising grid problems, and compare against two state-of-the-art approaches: max-product linear programming (MPLP) with incrementally added cycles (Sontag et al., 2008) and reweighted perfect matching (RPM) (Schraudolph, 2010).

Each problem consists of an $N \times N$ grid with pairwise potentials drawn from a uniform distribution, $\theta_{ij} \sim U(-1, 1)$. The unary potentials are drawn from a uniform distribution $\theta_i \sim U(-a, a)$, where the magnitude $a$ determines the difficulty of the problem. Large values of $a$ yield problems that are relatively easy to solve, since each variable has strong local information about its optimal value; as $a$ becomes smaller, the problems typically become more difficult. We generate three categories of problem, "easy" ($a = 3.2$), "medium" ($a = 0.8$), and "hard" ($a = 0.2$), and show the results on each class of problem separately.
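The problem generator described above is short enough to sketch directly. This Python is an illustration of the stated distributions, not the authors' experimental code; the name `random_ising_grid`, the `CATEGORIES` table, the node indexing $r \cdot n + c$, and the seed are assumptions made here for concreteness.

```python
import random

# Illustrative sketch; names and indexing are hypothetical, not from the paper's code.
# Magnitude a of the unary potentials for the three problem categories.
CATEGORIES = {"easy": 3.2, "medium": 0.8, "hard": 0.2}


def random_ising_grid(n, a, rng):
    """Random n x n Ising grid: theta_ij ~ U(-1, 1) on grid edges, theta_i ~ U(-a, a)."""
    theta_ij = {}
    for r in range(n):
        for c in range(n):
            v = r * n + c
            if c + 1 < n:
                theta_ij[(v, v + 1)] = rng.uniform(-1, 1)
            if r + 1 < n:
                theta_ij[(v, v + n)] = rng.uniform(-1, 1)
    theta_i = {i: rng.uniform(-a, a) for i in range(n * n)}
    return theta_i, theta_ij


rng = random.Random(5)
problems = {name: random_ising_grid(32, a, rng) for name, a in CATEGORIES.items()}
theta_i, theta_ij = problems["hard"]
print(len(theta_i), len(theta_ij))  # 1024 1984
```

A 32 × 32 grid, the size used for the single-instance comparisons, has $2 \cdot 32 \cdot 31 = 1984$ edges.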
To make it easy to test convergence, we scaled the weights by 500 and rounded them to integers. Thus a gap of less than 1 between lower and upper bounds provides a certificate of optimality.

Figure 3: Graphical depiction of Theorem 5.3, demonstrating that the planar cycle covering graph enforces constraints over all cycles of the original graph. (a) depicts the lower bound $E_{CYCLE}$ based on a decomposition into the collection of all simple cycles of the original graph. Lemma 5.2 shows that this bound is equivalent to the bound given by a corresponding collection of graphs (b) in which unary potentials are captured by multiple auxiliary nodes placed along each edge. Since every one of these graphs is a subgraph of the planar cycle covering graph (c), the sum of their minimum energies can be no larger than $E_{PCC}$. Finally, since the planar cycle covering graph (c) has no unary potentials, its bound equals that of its collection of cycles, which are themselves all subgraphs of (b).

We implemented the PCC bound using the Blossom V implementation of Kolmogorov (2009). At each step $t$ we obtain both a lower-bound $E^t_{PCC}$ and a configuration of $X = [X_1, \ldots, X_N]$ and the copies $\{X_0^f\}$. We compute the energy of two possible joint solutions, $X$ and its complement $\bar{X}$, and save the best solution found so far and its energy $\hat{E}^t$ as the current upper bound. The variational parameters are updated using the projected subgradient given in Equation 7, and the step size $\lambda$ is chosen using Polyak's step size rule, i.e., given subgradient $g(\theta)$ we choose $\lambda = \frac{1}{2}(\hat{E}^t - E^t_{PCC}) / \|g\|^2$. The incremental update feature of Blossom V is used to speed up successive optimizations as the variational parameters are modified.

For both MPLP and RPM, we used the original authors' code available online.
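The bookkeeping in this loop (integer weights, upper bounds from $X$ and $\bar{X}$, the Polyak step, and the optimality certificate) can be sketched in a few lines. This Python is a minimal illustration under assumptions, not the authors' implementation: the helper names `to_integer`, `update_upper`, `polyak_step`, and `certified_optimal` are hypothetical, and the PCC minimization itself (Blossom V in the paper) is omitted. The certificate works because, with integer weights, $E_{MAP}$ is an integer lying in $[E^t_{PCC}, \hat{E}^t]$, so a gap below 1 forces $E_{MAP} = \hat{E}^t$.

```python
# Illustrative sketch; helper names are hypothetical, not from the paper's code.


def to_integer(theta, scale=500):
    """Scale weights by 500 and round, so every state has an integer energy."""
    return {k: int(round(scale * v)) for k, v in theta.items()}


def energy(x, theta_i, theta_ij):
    # Equation (1) with X_0 fixed to 0.
    e = sum(t * (x[i] != x[j]) for (i, j), t in theta_ij.items())
    return e + sum(t * (x[i] != 0) for i, t in theta_i.items())


def update_upper(best, x, theta_i, theta_ij):
    """Score the decoded labels X and their complement; keep the best (energy, state)."""
    for cand in (tuple(x), tuple(1 - v for v in x)):
        e = energy(cand, theta_i, theta_ij)
        if best is None or e < best[0]:
            best = (e, cand)
    return best


def polyak_step(upper, lower, subgrad):
    """lambda = 0.5 * (E_hat^t - E_PCC^t) / ||g||^2, as in the text."""
    norm_sq = sum(g * g for g in subgrad.values())
    if norm_sq == 0:
        return 0.0  # a zero projected subgradient certifies the bound is already maximal
    return 0.5 * (upper - lower) / norm_sq


def certified_optimal(upper, lower):
    """Integer energies: E_MAP lies in [lower, upper], so a gap below 1 pins it to upper."""
    return upper - lower < 1


theta_i = to_integer({0: 0.3, 1: -0.7})
theta_ij = to_integer({(0, 1): -0.4})
best = update_upper(None, (1, 0), theta_i, theta_ij)
print(best)  # (-550, (0, 1))
print(certified_optimal(best[0], -550.4))  # True
```

Scoring the complement costs one extra energy evaluation and covers the case where the decoded labels come out globally flipped relative to $X_0 = 0$.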
MPLP first runs an optimization corresponding to the tree-reweighted lower bound (TRW), then successively tightens this bound by trying to identify cycles whose constraints are significantly violated and adding those subproblems to the collection. For grids, it enumerates and checks each square of four variables; we modified the code slightly to ensure that any given square is added only once. Because weak tree agreement can lead to suboptimal fixed points in MPLP, we tried both the standard message updates and a version which used subgradient steps, but found little difference and report only the fixed-point update results. We also note that because this implementation of MPLP explicitly enumerates only a subset of cycles, it may not provide the tightest possible lower-bound, an effect we observe in our experiments.

For RPM, we used the author's implementation, IsInf, which uses a bundle-trust optimization subroutine for its subgradient updates. IsInf does not compute upper bounds (proposed solutions) frequently; in plots showing the change in bounds over time we modified the code to also return such a solution, but used the default behavior for our timing comparisons.

Figure 4 shows the upper and lower bounds found by each algorithm as a function of time for a single 32 × 32 problem instance from each of the three categories. For the "easy" problem, all three methods find and verify the optimal solution (zero duality gap); in this case, MPLP converges more quickly than RPM, and PCC is faster still. For the "medium" problem, MPLP converges more slowly and to a small duality gap, RPM is slightly faster, and PCC is still fastest. For the "hard" problem, MPLP has a large duality gap; in this case RPM and PCC still converge to and verify the optimum. In all cases, PCC is significantly faster than the other methods.
Figure 5 shows timing results as a function of problem size for all three algorithms. Since each method may converge (return a provably optimal solution) on some problems but not others, we report two quantities: the geometric mean of the time over all problems for which the method converged (upper row), and the fraction of problems that the method successfully solved (lower row). As can be seen, PCC is significantly faster than the other two methods across both problem difficulty and size, and successfully solves a greater percentage of the problems.

7 Discussion

We have described a new variational bound for performing inference in planar binary MRFs. Our bound subsumes those given by both the tree-reweighted (TRW) and outer-planar decompositions of such a graph, since it implicitly includes every edge and cycle as a subproblem. Unlike approaches such as MPLP, which successively add cycles, we are able to get the full benefit of all cycle constraints immediately. As a result, we achieve fast convergence in practice.

The PCC graph bound is limited to planar binary problems. We are currently exploring routes to remove these limitations. For example, in general non-planar graphs, we can triangulate the graph to get a cycle basis of triangles and then "glue" those triangles together into the smallest possible planar graph. In addition to MAP inference, it will also be interesting to see how the PCC graph relates to variational approximations to the marginals.

Acknowledgements

This work was supported by a grant from the UC Labs Research Program.

References

F. Barahona. On the computational complexity of Ising spin glass models. Journal of Physics A: Mathematical, Nuclear and General, 15(10):3241–3253, 1982.
F. Barahona. On cuts and matchings in planar graphs. Mathematical Programming, 60:53–68, 1993.
F. Barahona and A. Mahjoub. On the cut polytope. Mathematical Programming, 36:157–173, 1986.
D. Batra, A. Gallagher, D. Parikh, and T. Chen. Beyond trees: MRF inference via outer-planar decomposition. In CVPR, 2010.
J. Edmonds. Paths, trees, and flowers. Canad. J. Math., 17:449–467, 1965.
M. Fisher. Statistical mechanics of dimers on a plane lattice. Physical Review, 124(6):1664–1672, 1961.
M. Fisher. On the dimer solution of planar Ising models. 7(10):1776–1781, 1966.
A. Globerson and T. Jaakkola. Approximate inference using planar graph decomposition. In NIPS, pages 473–480, 2007.
J. Kappes, S. Schmidt, and C. Schnoerr. MRF inference by k-fan decomposition and tight Lagrangian relaxation. In ECCV, 2010.
P. Kasteleyn. The statistics of dimers on a lattice: I. The number of dimer arrangements on a quadratic lattice. Physica, 27(12):1209–1225, 1961.
P. Kasteleyn. Graph theory and crystal physics. In Frank Harary, editor, Graph Theory and Theoretical Physics, pages 43–110, 1967.
V. Kolmogorov. Convergent tree-reweighted message passing for energy minimization. IEEE Trans. Pattern Anal. Machine Intell., 28(10):1568–1583, 2006.
V. Kolmogorov. Blossom V: A new implementation of a minimum cost perfect matching algorithm. Mathematical Programming Computation, 1(1):43–67, 2009.
V. Kolmogorov and R. Zabih. What energy functions can be minimized via graph cuts? IEEE Trans. Pattern Anal. Machine Intell., 26(2):147–159, 2004.
N. Komodakis and N. Paragios. Beyond loose LP-relaxations: Optimizing MRFs by repairing cycles. In ECCV, 2008.
N. Komodakis, N. Paragios, and G. Tziritas. MRF optimization via dual decomposition: Message-passing revisited. In ICCV, Rio de Janeiro, Brazil, Oct. 2007. doi: 10.1109/ICCV.2007.4408890.
N. Schraudolph. Polynomial-time exact inference in NP-hard binary MRFs via reweighted perfect matching. In AISTATS, 2010.
N. Schraudolph and D. Kamenetsky. Efficient exact inference in planar Ising models. Technical Report 0810.4401, Oct. 2008.
W.-K. Shih, S. Wu, and Y. Kuo. Unifying maximum cut and minimum cut of a planar graph. IEEE Transactions on Computers, 39:694–697, 1990.
D. Sontag and T. Jaakkola. New outer bounds on the marginal polytope. In NIPS, 2007.
D. Sontag, T. Meltzer, A. Globerson, Y. Weiss, and T. Jaakkola. Tightening LP relaxations for MAP using message passing. In UAI, 2008.
M. Syslo. Characterizations of outerplanar graphs. Discrete Mathematics, 26(1):47–53, 1979.
L. Torresani, V. Kolmogorov, and C. Rother. Feature correspondence via graph matching: Models and global optimization. In ECCV, pages 596–609, 2008.
M. J. Wainwright and M. I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1:1–305, 2008.
M. J. Wainwright, T. Jaakkola, and A. S. Willsky. Tree-based reparameterization analysis of sum-product and its generalizations. IEEE Trans. Inform. Theory, 49(5):1120–1146, May 2003.
M. J. Wainwright, T. Jaakkola, and A. S. Willsky. MAP estimation via agreement on (hyper)trees: Message-passing and linear programming approaches. IEEE Trans. Inform. Theory, 51(11):3697–3717, 2005.
J. Yarkony, C. Fowlkes, and A. Ihler. Covering trees and lower-bounds on the quadratic assignment. In CVPR, 2010.

[Figure 4 plots: Relative Energy (×10^4) vs. Time (ms, log scale); panels Easy, Medium, Hard; curves PCC, RPM, MPLP]
Figure 4: Average convergence behavior of lower and upper bounds for randomly generated 32 × 32 Ising grid problems. We compare PCC, the planar cycle cover bound (blue), to RPM (green) and MPLP (red) for easy, medium, and hard problems. The problem difficulty is controlled by the relative influence of unary and pairwise potentials. Energies are averaged over 10 random problem instances and plotted relative to a MAP energy of 0.

[Figure 5 plots: top row, Time (ms, log scale) vs. Size (8 to 128); bottom row, Fraction solved vs. Size; panels Easy, Medium, Hard; curves PCC, RPM, MPLP]
Figure 5: Convergence times as a function of problem size for randomly generated Ising grid problems. We compare PCC (blue) to RPM (green) and MPLP (red) for easy, medium, and hard problems. We record times for upper and lower bounds to converge, averaged over 10 problem instances. We only include in the average convergence time those problem instances for which an algorithm was able to find the MAP configuration (a duality gap of less than 1). The second row of plots shows, in each case, the fraction of problems for which this happened.
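The two summary quantities reported for Figure 5 (mean convergence time over solved instances and fraction solved) can be computed as in the sketch below. It follows the text's description of a geometric mean; the `(time_ms, upper, lower)` record format and the function name are hypothetical.

```python
import math


def summarize_runs(runs):
    """Summarize hypothetical (time_ms, upper, lower) records, one per problem.

    A run counts as solved when its final duality gap is below 1 (integer
    weights). Returns (geometric mean time over solved runs, fraction solved);
    the mean is NaN when no run was solved."""
    if not runs:
        return float("nan"), 0.0
    solved = [t for t, upper, lower in runs if upper - lower < 1]
    fraction = len(solved) / len(runs)
    if not solved:
        return float("nan"), fraction
    # geometric mean = exp(mean(log t)); requires strictly positive times
    geo_mean = math.exp(sum(math.log(t) for t in solved) / len(solved))
    return geo_mean, fraction
```

A geometric mean keeps a single very slow solved instance from dominating the summary, which suits timings that span several orders of magnitude as in Figure 5.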
