Efficient Minimization of Higher Order Submodular Functions using Monotonic Boolean Functions

Submodular function minimization is a key problem in a wide variety of applications in machine learning, economics, game theory, computer vision, and many others. The general solver has a complexity of $O(n^3 \log^2 n \cdot E + n^4 \log^{O(1)} n)$ where…

Authors: Srikumar Ramalingam, Chris Russell, Ľubor Ladický, Philip H.S. Torr

Srikumar Ramalingam¹, Chris Russell²,³, Ľubor Ladický⁴, Philip H.S. Torr⁵

¹ University of Utah, USA
² Alan Turing Institute, UK
³ University of Edinburgh, UK
⁴ ETH Zurich, Switzerland
⁵ University of Oxford, Oxford, UK

Abstract. Submodular function minimization is a key problem in a wide variety of applications in machine learning, economics, game theory, computer vision, and many others. The general solver has a complexity of $O(n^3 \log^2 n \cdot E + n^4 \log^{O(1)} n)$, where $E$ is the time required to evaluate the function and $n$ is the number of variables [32]. On the other hand, many computer vision and machine learning problems are defined over special subclasses of submodular functions that can be written as the sum of many submodular cost functions defined over cliques containing few variables. In such functions, the pseudo-Boolean (or polynomial) representation [3] of these subclasses is of degree (or order, or clique size) $k$, where $k \ll n$. In this work, we develop efficient algorithms for the minimization of this useful subclass of submodular functions. To do this, we define novel mappings that transform submodular functions of order $k$ into quadratic ones. The underlying idea is to use auxiliary variables to model the higher order terms, and the transformation is found using a carefully constructed linear program. In particular, we model the auxiliary variables as monotonic Boolean functions, allowing us to obtain a compact transformation using as few auxiliary variables as possible. The transformed quadratic function can be efficiently minimized using the standard max-flow algorithm with a time complexity of $O((n+m)^3)$, where $m$ is the total number of auxiliary variables involved in transforming all the higher order terms to quadratic ones.
Specifically, we show that our approach for fourth order functions requires only 2 auxiliary variables, in contrast to 30 or more variables used in existing approaches. In the general case, we give an upper bound, based on the Dedekind number, on the number of auxiliary variables required to transform a function of order $k$; this bound is substantially lower than the existing bound of $2^{2^k}$.

Keywords: submodular functions, quadratic pseudo-Boolean functions, monotonic Boolean functions, Dedekind number, max-flow/min-cut algorithm

1 Introduction

Many optimization problems in several domains such as operations research, computer vision, machine learning, and computational biology involve submodular function minimization. Submodular functions (see Definition 1) are discrete analogues of convex functions [33]. Examples of such functions include cut capacity functions, matroid rank functions, and entropy functions. Submodular function minimization techniques may be broadly classified into two categories: algorithms for general submodular functions, and efficient customized algorithms for subclasses of submodular functions. This paper falls under the second category.

General solvers: The role of submodular functions in optimization was first discovered by Edmonds when he gave several important results on the related polymatroids [10]. Grötschel, Lovász, and Schrijver gave the first polynomial-time algorithm for minimizing submodular functions, using the ellipsoid method [17]. Several combinatorial and strongly polynomial algorithms [13,22,24,45,36] have since been developed based on the work of Cunningham [9]. The current best strongly polynomial algorithm for minimizing general submodular functions [32] has a run-time complexity of $O(n^3 \log^2 n \cdot E + n^4 \log^{O(1)} n)$, where $E$ is the time taken to evaluate the function and $n$ is the number of variables.
Weakly polynomial time algorithms with a smaller dependence on $n$ also exist. For example, Lee et al. [32] show a method with a run-time complexity of $O(n^2 \log nM \cdot E + n^3 \log^{O(1)} nM)$, where $M$ is the maximum absolute value of the function.

Specialized solvers: Higher order submodular functions are useful in modeling many computer vision and machine learning problems [26,31,21]. Such problems typically involve millions of pixels, making general solvers infeasible. Further, each pixel may take multiple discrete values, and the conversion of such a problem to a Boolean one introduces further variables. On the other hand, the cost functions of many such optimization problems belong to a small subclass of submodular functions. The goal of this paper is to provide an efficient approach for minimizing these subclasses of submodular functions using a max-flow algorithm.

Notations: Let $\mathbb{B}$ denote the Boolean set $\{0,1\}$ and $\mathbb{R}$ the set of reals. Let $\mathbf{x} = (x_1, \dots, x_n) \in \mathbb{B}^n$, and let $V = \{1, 2, \dots, n\}$ be the set of indices of $\mathbf{x}$. We introduce a set representation to denote the labelings of $\mathbf{x}$. Let $S_4 = \{1,2,3,4\}$ and let $P$ be the power set of $S_4$. For example, the labeling $\{x_1 = 1, x_2 = 0, x_3 = 1, x_4 = 1\}$ is denoted by the set $\{1,3,4\}$. For a subset $A \subseteq V$, we denote by $\mathbf{1}_A \in \mathbb{B}^n$ its characteristic vector, i.e.,

$$(\mathbf{1}_A)_j = \begin{cases} 1 & \text{if } j \in A, \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$

Definition 1. A submodular function $f : \mathbb{B}^n \to \mathbb{R}$ satisfies the following condition:

$$f(X) + f(Y) \ge f(X \vee Y) + f(X \wedge Y), \tag{2}$$

where $X$ and $Y$ are elements of $\mathbb{B}^n$, and $\vee$ and $\wedge$ denote the componentwise OR and AND, i.e., the union and intersection of the corresponding sets.

In this paper, we use a pseudo-Boolean polynomial representation for denoting submodular functions.

Definition 2. Pseudo-Boolean functions (PBF) take a Boolean vector as argument and return a real number, i.e., $f : \mathbb{B}^n \to \mathbb{R}$ [3].
These can be uniquely expressed as multilinear polynomials, i.e., for every $f$ there exists a unique set of real numbers $\{a_S : S \subseteq V\}$ such that

$$f(x_1, \dots, x_n) = \sum_{S \subseteq V} a_S \prod_{j \in S} x_j, \quad a_S \in \mathbb{R}, \tag{3}$$

where $a_\emptyset$ is said to be the constant term. The order of the function refers to the maximum degree of the polynomial.

A submodular function of second order involving Boolean variables can be easily represented using a graph such that the minimum cut, computed using a max-flow algorithm, also efficiently minimizes the function. However, max-flow algorithms cannot exactly minimize non-submodular functions, or some submodular functions of order greater than 3 [49]. There is a long history of research on solving subclasses of submodular functions both exactly and efficiently using max-flow algorithms [1,28,18,48,38]. In this paper, we propose a linear programming formulation that, given any pseudo-Boolean function, derives a quadratic submodular function with the same cost if one exists, or otherwise the closest quadratic submodular function (e.g., under the $L_1$ norm). The use of a linear program (LP) to express a given function in terms of other functions (with AVs) was already established in [6]. Compared to existing results, we provide a smaller LP for submodular functions and show that fewer AVs are needed than in existing methods.

Definition 3. $\mathcal{F}^k$ denotes the class of pseudo-Boolean functions of order $k$ such that every function $f(\mathbf{x}) \in \mathcal{F}^k$ satisfies the submodularity property given in Definition 1.

It was first shown in [18] that any function in $\mathcal{F}^2$ can be minimized exactly using a max-flow algorithm. Billionnet and Minoux [1] showed that any function in $\mathcal{F}^3$ can be transformed into a function in $\mathcal{F}^2$ using additional variables.
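For a single clique with small $k$, membership in $\mathcal{F}^k$ (Definition 3) can be tested by brute force directly from Definition 1 and the multilinear form (3). The Python sketch below is our own illustration, and its helper names (`evaluate`, `order`, `is_submodular`, `in_F`) are hypothetical; the coefficients $a_S$ are assumed to be stored in a dictionary keyed by frozensets of 0-based variable indices. The pairwise test loops over $4^n$ pairs $(X, Y)$, so it is only meant for one small clique, not for a whole $n$-variable function.

```python
from itertools import product

def evaluate(coeffs, x):
    """f(x) = sum over S of a_S * prod_{j in S} x_j, as in Eq. (3)."""
    return sum(a for S, a in coeffs.items() if all(x[j] for j in S))

def order(coeffs):
    """Largest |S| with a non-zero coefficient a_S (0 for a constant)."""
    return max((len(S) for S, a in coeffs.items() if a != 0), default=0)

def is_submodular(coeffs, n, tol=1e-9):
    """Check f(X) + f(Y) >= f(X or Y) + f(X and Y) for all X, Y in B^n."""
    points = list(product((0, 1), repeat=n))
    for X in points:
        for Y in points:
            join = tuple(a | b for a, b in zip(X, Y))
            meet = tuple(a & b for a, b in zip(X, Y))
            if (evaluate(coeffs, X) + evaluate(coeffs, Y)
                    < evaluate(coeffs, join) + evaluate(coeffs, meet) - tol):
                return False
    return True

def in_F(coeffs, n, k):
    """Definition 3: the function has order k and is submodular."""
    return order(coeffs) == k and is_submodular(coeffs, n)

neg_cubic = {frozenset({0, 1, 2}): -1.0}           # -x1 x2 x3
print(in_F(neg_cubic, 3, 3))                       # True
print(is_submodular({frozenset({0, 1}): 1.0}, 2))  # False
```

For example, $-x_1x_2x_3$ passes the test, while $x_1x_2$ fails it: a positive bilinear coefficient violates (2) at $X = \mathbf{1}_{\{1\}}$, $Y = \mathbf{1}_{\{2\}}$.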
While transforming a given higher order function to a function in $\mathcal{F}^2$, we use additional variables, which we refer to as auxiliary variables (AVs). As we will see, these AVs are often more difficult to handle than the variables of the original function, and our algorithms are driven by the quest to understand the role of these auxiliary variables and to eliminate unnecessary ones.

Kolmogorov [27] improved the complexity of Iwata's capacity scaling algorithm [23] for special functions that are represented as a sum of submodular terms. This is the first line of research that does not use auxiliary variables to handle higher order terms. Kolmogorov's formulation also closely resembles the approach of Cooper [7], who used a linear program with an exponential number of constraints for submodular function minimization. It has also been shown that decomposable submodular functions, i.e., functions that can be decomposed into a sum of simple submodular functions, can be minimized with a parallelizable algorithm. In [35], it was shown that this algorithm converges linearly, and upper and lower bounds on its rate of convergence were provided.

Recently, Živný et al. [49] made substantial progress in characterizing the class of functions that can be transformed to $\mathcal{F}^2$. Their most notable result is that not all functions in $\mathcal{F}^4$ can be transformed to a function in $\mathcal{F}^2$. This result stands in strong contrast to the third order case, which was positively resolved more than two decades earlier [1]. Using Theorem 5.2 from [37], it is possible to decompose a given submodular function in $\mathcal{F}^4$ into 10 different groups $G_i$, $i \in \{1, \dots, 10\}$, where each $G_i$ is shown in Table 1. Živný et al. showed that one of these groups ($G_{10}$) cannot be expressed using any function in $\mathcal{F}^2$, employing any number of AVs.
Most of these results were obtained by mapping submodular function minimization to a valued constraint satisfaction problem.

1.1 Problem Statement and Main Contributions

Largest subclass of submodular functions: We are interested in transforming a given function in $\mathcal{F}^k$ into a function in $\mathcal{F}^2$ using AVs. As such a transformation is not possible for all submodular functions of order four or more [49], our goal is to implicitly map the largest subclass $\mathcal{F}^k_2$ that can be transformed into $\mathcal{F}^2$. This distinction between the two classes $\mathcal{F}^k_2$ and $\mathcal{F}^k$ will be crucial in the remainder of the paper (see Figure 1).

Definition 4. The class $\mathcal{F}^k_2$ is the largest subclass of $\mathcal{F}^k$ such that every function $f(\mathbf{x}) \in \mathcal{F}^k_2$ has an equivalent quadratic function $h(\mathbf{x}, \mathbf{z}) \in \mathcal{F}^2$ using AVs $\mathbf{z} = (z_1, z_2, \dots, z_m) \in \mathbb{B}^m$ satisfying the following condition:

$$f(\mathbf{x}) = \min_{\mathbf{z} \in \mathbb{B}^m} h(\mathbf{x}, \mathbf{z}), \quad \forall \mathbf{x}. \tag{4}$$

In this paper, we are interested in developing an algorithm to transform every function in the class $\mathcal{F}^k_2$ to a function in $\mathcal{F}^2$.

Fig. 1. All functions in the classes $\mathcal{F}^1$, $\mathcal{F}^2$, $\mathcal{F}^3$ and $\mathcal{F}^k_2$, $k \ge 2$, can be transformed to functions in $\mathcal{F}^2$ and minimized using the max-flow/min-cut algorithm.

Efficient transformation of higher order functions: We propose a linear programming algorithm to transform higher order submodular functions to quadratic ones using monotonic Boolean functions (MBFs) [8]. This framework provides several advantages. First, we show that the state of an AV in a minimum cost labeling is equivalent to an MBF defined over the original variables. This yields an upper bound on the number of AVs given by the Dedekind number [29], which is defined as the total number of MBFs over a set of $n$ binary variables. In the case of fourth order functions, there are 168 such functions.
Using the properties of MBFs and the nature of these AVs in our transformation, we prove that these 168 AVs can be replaced by two AVs.

Minimal use of AVs: One of our goals is to use the minimum number $m$ of AVs in performing the transformation of (4). Although, for a fixed choice of $\mathcal{F}^k_2$, reducing $m$ does not change the asymptotic complexity of the resulting min-cut algorithm, it is crucial in several machine learning and computer vision problems. Most image-based labeling problems involve millions of pixels, and in typical problems the number of fourth order priors is linearly proportional to the number of pixels. Such problems may be infeasible for large values of $m$. It was shown that the transformation of functions in $\mathcal{F}^4_2$ can be achieved using about 30 auxiliary variables [50]. In contrast, we show that the same class of functions can be transformed using only 2 additional nodes. Note that this reduction applies to every fourth order term in the function. A typical vision problem may involve 10000 $\mathcal{F}^4_2$ terms for an image of size $100 \times 100$. Under these parameters, our algorithm uses 20000 AVs, whereas the existing approach [50] would use as many as 300000 AVs. In several practical problems, this improvement makes a significant difference in the running time of the algorithm.

For a function in $\mathcal{F}^k_2$, the previously known upper bound on the number of required AVs is $2^{2^k}$ [6]. We show that the function can be transformed using substantially fewer AVs, bounded by the Dedekind number. In Section 3.1, we show that the Dedekind number is substantially lower than $2^{2^k}$. In [6], an LP-based approach was used to obtain the bound of $2^{2^k}$. We also use an LP-based approach; however, the use of monotonic Boolean functions enables us to improve this bound to the Dedekind number.
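The AV counts quoted above follow from simple arithmetic, which the short Python sketch below reproduces. It assumes, as in the example, that every fourth order term receives its own AVs and that the number of original variables $n$ equals the number of pixels; the $O((n+m)^3)$ expression is the worst-case max-flow bound with constants dropped, so the printed ratio compares bounds, not measured running times. The helper names are hypothetical.

```python
def total_avs(num_terms: int, avs_per_term: int) -> int:
    """AVs needed when each higher order term is transformed independently."""
    return num_terms * avs_per_term

def maxflow_bound(n: int, m: int) -> int:
    """The O((n + m)^3) worst-case max-flow bound, ignoring constants."""
    return (n + m) ** 3

n_pixels = 100 * 100                 # image size used in the text
n_terms = 10_000                     # number of fourth order terms in the text
ours = total_avs(n_terms, 2)         # 2 AVs per term (this paper)
existing = total_avs(n_terms, 30)    # about 30 AVs per term [50]
ratio = maxflow_bound(n_pixels, existing) / maxflow_bound(n_pixels, ours)
print(ours, existing, round(ratio))  # 20000 300000 1103
```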
The idea of reducing the number of AVs in an LP formulation has been explored in other contexts [46]. In [46], combinatorial structures commonly referred to as gadgets were computed using linear programming, enabling the transformation of constraints from one optimization problem to another. In this work, we show that a function with several AVs can be transformed into a function involving far fewer AVs using a linear programming approach.

1.2 Limitations of Current Approaches and Open Problems

Decomposition of submodular functions: Many existing algorithms for transforming higher order functions target the minimization of a single $k$-variable $k$th order function. However, the transformation framework is incomplete without showing that a given $n$-variable submodular function of $k$th order can be decomposed into several individual $k$-variable $k$th order sub-functions. Billionnet proved that it is possible to decompose a function in $\mathcal{F}^3$ involving several variables into 3-variable functions in $\mathcal{F}^3$ [1]. To the best of our knowledge, the decomposition of fourth or higher order functions is still an open problem, and it is likely to remain hard for the following reason. In [16], it was proven that testing the membership of a function $f$ with $n$ variables in $\mathcal{F}^4$ is NP-complete. On the other hand, it is easy to test the submodularity of a fourth order function with 4 variables. Thus, if a function $f$ with $n$ variables is decomposed into several 4-variable fourth order functions and each of these individual 4-variable functions is submodular, then $f$ is submodular. In practice, this is the most likely way in which we know that a function is submodular; thus, it is very unlikely that we know a function is submodular without also knowing its decomposition.
Given this, it seems likely that specialized solvers based on max-flow algorithms will never solve the general class of submodular functions. However, this decomposition problem is not a critical issue in machine learning and vision problems, because the higher order priors from natural statistics already occur in different sub-functions of $k$ nodes; in other words, the decomposition is known a priori. This paper only focuses on the transformation of a single $k$-variable function in $\mathcal{F}^k$. As mentioned above, the solution to this problem is still sufficient to solve large functions with hundreds of nodes and higher order priors in applications.

Non-Boolean problems: The results in this paper are applicable only to set or pseudo-Boolean functions. Many real-world problems involve variables that can take multiple discrete values. Ishikawa showed that it is possible to transform a multi-label second order function to a Boolean second order function, using Boolean variables to encode multi-label variables [20]. To denote a single multi-label variable with $l$ labels, $l$ Boolean variables were used. Ishikawa's method considered functions with convex priors, a class of functions that is slightly more restricted than general submodular functions. Schlesinger and Flach later showed that it is possible to transform general submodular multi-label functions of second order to Boolean second order functions [44]. This approach used $l - 1$ Boolean variables to encode an $l$-label variable. Ramalingam et al. [39] generalized this work to transform multi-label higher order functions into Boolean second order functions. In [39], the transformation does not preserve submodularity for fourth or higher order functions. Živný et al. [49] proved that it is not possible to have a submodularity-preserving transformation for fourth or higher order functions.
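To make the $l - 1$ bit encoding concrete, the Python sketch below uses a threshold (unary) code, which is one common way to encode an $l$-label variable with $l - 1$ Boolean variables: label $y \in \{0, \dots, l-1\}$ maps to bits $b_j = [y > j]$. We assume, but do not claim, that this matches the construction in [44], and the function names are hypothetical. Only Boolean vectors consisting of a block of 1s followed by 0s are valid codes, so a graph construction built on such an encoding must also rule out the remaining vectors, for example with large penalties on the pairs $(b_j, b_{j+1}) = (0, 1)$; that part is not shown here.

```python
def encode(label, num_labels):
    """Threshold code with num_labels - 1 bits: b_j = 1 iff label > j."""
    if not 0 <= label < num_labels:
        raise ValueError("label out of range")
    return [1 if label > j else 0 for j in range(num_labels - 1)]

def is_valid_code(bits):
    """Valid codes are non-increasing: a block of 1s followed by 0s."""
    return all(a >= b for a, b in zip(bits, bits[1:]))

def decode(bits):
    """Inverse of encode for valid codes: the label is the number of 1s."""
    if not is_valid_code(bits):
        raise ValueError("not a valid threshold code")
    return sum(bits)

print(encode(2, 4), decode([1, 1, 0]))  # [1, 1, 0] 2
```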
Excess AVs: The complexity of an efficient max-flow algorithm is $O((n+m)^3)$, where $n$ is the number of variables in the original higher order function and $m$ is the number of AVs. Typically in imaging problems, the number of higher order terms is $O(n)$ and the order $k$ is less than 10. Thus, the minimization of the function corresponding to an entire image with $O(n)$ higher order terms still has a complexity of $O((n+n)^3)$. However, when $m$ becomes at least quadratic in $n$, for example if a higher order term is defined over every triplet of variables in $V$, the complexity of the max-flow algorithm becomes $O((n+n^3)^3)$, exceeding that of a general solver. Thus, in applications involving a very large number of higher order terms, a general solver may be more appropriate.

2 Preliminaries

Definition 5. The (discrete) derivative of a function $f(x_1, \dots, x_n)$ with respect to $x_i$ is given by:

$$\frac{\delta f}{\delta x_i}(x_1, \dots, x_n) = f(x_1, \dots, x_{i-1}, 1, x_{i+1}, \dots, x_n) - f(x_1, \dots, x_{i-1}, 0, x_{i+1}, \dots, x_n). \tag{5}$$

Definition 6. The second discrete derivative $\Delta_{i,j}(\mathbf{x})$ of a function $f$ is given by

$$\begin{aligned}
\Delta_{i,j}(\mathbf{x}) &= \frac{\delta}{\delta x_j} \frac{\delta f}{\delta x_i}(x_1, \dots, x_n) \\
&= \big( f(x_1,\dots,x_{i-1},1,x_{i+1},\dots,x_{j-1},1,x_{j+1},\dots,x_n) - f(x_1,\dots,x_{i-1},0,x_{i+1},\dots,x_{j-1},1,x_{j+1},\dots,x_n) \big) \\
&\quad - \big( f(x_1,\dots,x_{i-1},1,x_{i+1},\dots,x_{j-1},0,x_{j+1},\dots,x_n) - f(x_1,\dots,x_{i-1},0,x_{i+1},\dots,x_{j-1},0,x_{j+1},\dots,x_n) \big).
\end{aligned} \tag{6}$$

Note that it follows from the definition of submodular functions (2) that their second derivative is always non-positive for all $\mathbf{x}$.

3 Transforming Functions in $\mathcal{F}^n_2$ to $\mathcal{F}^2$

Consider the following submodular function $f(\mathbf{x}) \in \mathcal{F}^n_2$ represented as a multilinear polynomial:

$$f(\mathbf{x}) = \sum_{S \subseteq V} a_S \prod_{j \in S} x_j, \quad a_S \in \mathbb{R}. \tag{7}$$

Let us consider a function $h(\mathbf{x}, \mathbf{z}) \in \mathcal{F}^2$, where $\mathbf{z}$ is a set of AVs used to model functions in $\mathcal{F}^n_2$. Any function in $\mathcal{F}^2$ can be represented as a multilinear polynomial consisting of linear and bilinear terms involving all variables:

$$h(\mathbf{x}, \mathbf{z}) = \sum_i a_i x_i - \sum_{i,j:\, i>j} a_{i,j} x_i x_j + \sum_l a_l z_l - \sum_{l,m:\, l>m} a_{l,m} z_l z_m - \sum_{i,l} a_{i,l} x_i z_l. \tag{8}$$

The negative signs in front of the bilinear terms ($x_i x_j$, $z_l x_i$, $z_l z_m$) emphasize that their coefficients ($-a_{i,j}$, $-a_{i,l}$, $-a_{l,m}$) must be non-positive if the function is submodular. We are seeking a function $h$ such that:

$$f(\mathbf{x}) = \min_{\mathbf{z} \in \mathbb{B}^m} h(\mathbf{x}, \mathbf{z}), \quad \forall \mathbf{x}. \tag{9}$$

Here the function $f(\mathbf{x})$ is known. We are interested in computing the coefficients $a$ and in determining the number of auxiliary variables required to express a function as a pairwise submodular function. The problem is challenging due to its inherent instability and dependencies: different choices of parameters cause the auxiliary variables to take different states. To explore the space of possible solutions fully, we must characterize which states an AV can take.

3.1 Auxiliary Variables as Monotonic Boolean Functions

Definition 7. A monotonic (increasing) Boolean function (MBF) $m : \mathbb{B}^n \to \mathbb{B}$ takes a Boolean vector as argument and returns a Boolean, such that if $y_i \le x_i\ \forall i$, then $m(\mathbf{y}) \le m(\mathbf{x})$.

Lemma 1. Let $z_s(\mathbf{x})$ be a function that takes an argument $\mathbf{x}$ and returns a Boolean as shown below:

$$z_s(\mathbf{x}) = \arg\min_{z_s} \Big( \min_{\mathbf{z}'} h(\mathbf{x}, \mathbf{z}', z_s) \Big), \tag{10}$$

where $h(\mathbf{x}, \mathbf{z}', z_s)$ is a submodular function defined in Equation (8) and satisfying Equation (9), and $\mathbf{z}'$ is the set of all auxiliary variables except $z_s$. The function $z_s(\mathbf{x})$ that maps a Boolean vector $\mathbf{x}$ to the Boolean state of $z_s$ is an MBF (see Definition 7).

Proof.
We consider a current labeling $\mathbf{x}$ with an induced labeling $z_s = z_s(\mathbf{x})$. We first note that

$$h'(\mathbf{x}, z_s) = \min_{\mathbf{z}'} h(\mathbf{x}, \mathbf{z}', z_s) \tag{11}$$

is a submodular function, i.e., it satisfies (2). We now consider increasing the value of $\mathbf{x}$, that is, given a current labeling $\mathbf{x}$, we consider a new labeling $\mathbf{x}^{(i)}$ such that

$$x^{(i)}_j = \begin{cases} 1 & \text{if } j = i, \\ x_j & \text{otherwise.} \end{cases} \tag{12}$$

We wish to prove

$$z_s(\mathbf{x}^{(i)}) \ge z_s(\mathbf{x}) \quad \forall \mathbf{x}, i. \tag{13}$$

Note that if $z_s(\mathbf{x}) = 0$ or $x_i = 1$, this result is trivial. This leaves the case $z_s(\mathbf{x}) = 1$ and $x_i = 0$. Below, the last argument of $h'$ is the state of $z_s$. It follows from (6) and the submodularity of $h'$ that:

$$h'(x_1, \dots, x_{i-1}, 1, x_{i+1}, \dots, 0) - h'(x_1, \dots, x_{i-1}, 0, x_{i+1}, \dots, 0) \ge h'(x_1, \dots, x_{i-1}, 1, x_{i+1}, \dots, 1) - h'(x_1, \dots, x_{i-1}, 0, x_{i+1}, \dots, 1). \tag{14}$$

Using Equation (10), with ties resolved in favour of $z_s = 1$, we derive the following from our hypothesis $z_s(\mathbf{x}) = 1$ and $x_i = 0$:

$$h'(x_1, \dots, x_{i-1}, 0, x_{i+1}, \dots, 0) \ge h'(x_1, \dots, x_{i-1}, 0, x_{i+1}, \dots, 1). \tag{15}$$

By (15), substituting $h'(x_1, \dots, x_{i-1}, 0, x_{i+1}, \dots, 0)$ for $h'(x_1, \dots, x_{i-1}, 0, x_{i+1}, \dots, 1)$ on the right-hand side of Equation (14) can only decrease that side, so we have

$$h'(x_1, \dots, x_{i-1}, 1, x_{i+1}, \dots, 0) - h'(x_1, \dots, x_{i-1}, 0, x_{i+1}, \dots, 0) \ge h'(x_1, \dots, x_{i-1}, 1, x_{i+1}, \dots, 1) - h'(x_1, \dots, x_{i-1}, 0, x_{i+1}, \dots, 0). \tag{16}$$

This implies the following:

$$h'(x_1, \dots, x_{i-1}, 1, x_{i+1}, \dots, 0) \ge h'(x_1, \dots, x_{i-1}, 1, x_{i+1}, \dots, 1). \tag{17}$$

Therefore $z_s(\mathbf{x}^{(i)}) = 1$ as per Equation (10). Repeated application of this statement gives $y_i \le x_i\ \forall i \Rightarrow z_s(\mathbf{y}) \le z_s(\mathbf{x})$, as required. ∎

Definition 8. The Dedekind number $M(n)$ (written $D(n)$ in Theorem 1 and Lemma 2) is the number of MBFs of $n$ variables. Finding a closed-form expression for $M(n)$ is known as the Dedekind problem [25,29].
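For small $n$, Definition 8 and Lemma 1 can both be checked by brute force. The Python sketch below is our own illustration, not code from the paper, and all helper names in it are hypothetical. It enumerates MBFs through the recursion that an MBF of $n$ variables is a pair of MBFs $f_0 \le f_1$ of $n-1$ variables (its restrictions to $x_n = 0$ and $x_n = 1$), reproducing $M(0), \dots, M(5)$. For Lemma 1 it uses the well-known reduction $-x_1x_2x_3 = \min_z z\,(2 - x_1 - x_2 - x_3)$, whose bilinear coefficients are all $-1$, so the pairwise function is submodular; this example is our choice and is not taken from this paper.

```python
def mbf_truth_tables(n):
    """All monotone Boolean functions of n variables as truth-table bitmasks.

    Bit `idx` of a mask is the value at the input whose bit i is x_{i+1}.
    A function is monotone iff its restrictions f0 (x_n = 0) and f1 (x_n = 1)
    are monotone functions of n - 1 variables with f0 <= f1 pointwise.
    """
    if n == 0:
        return [0, 1]                       # constant functions 0 and 1
    half = 1 << (n - 1)                     # inputs with x_n = 0 come first
    smaller = mbf_truth_tables(n - 1)
    return [f0 | (f1 << half)
            for f0 in smaller for f1 in smaller
            if (f0 & ~f1) == 0]             # f0 <= f1 at every input

def truth_table(func, n):
    """Bitmask of a predicate func(x), x in B^n, using the same encoding."""
    mask = 0
    for idx in range(1 << n):
        x = tuple((idx >> i) & 1 for i in range(n))
        if func(x):
            mask |= 1 << idx
    return mask

# Toy instance for Lemma 1 (hypothetical example): one AV z with
# h(x, z) = 2z - z*x1 - z*x2 - z*x3, so that min_z h(x, z) = -x1*x2*x3.
def h(x, z):
    return z * (2 - sum(x))

def optimal_z(x, prefer_one):
    """argmin_z h(x, z), breaking ties towards z = 1 or towards z = 0."""
    h0, h1 = h(x, 0), h(x, 1)
    return 1 if h1 < h0 or (prefer_one and h1 == h0) else 0

print([len(mbf_truth_tables(n)) for n in range(6)])  # [2, 3, 6, 20, 168, 7581]
mbfs3 = set(mbf_truth_tables(3))
for prefer_one in (False, True):
    table = truth_table(lambda x: optimal_z(x, prefer_one), 3)
    print(prefer_one, table in mbfs3)  # the second value is True in both cases
```

With ties broken towards $z = 0$, the optimal AV state is $x_1 \wedge x_2 \wedge x_3$; with ties broken towards $z = 1$, it is the majority function. Both are monotone, as Lemma 1 predicts, which also illustrates why a consistent tie-breaking rule is needed for $z_s(\mathbf{x})$ to be well defined.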
The known values of the Dedekind number are shown below. $M(1) = 3$, corresponding to the set of functions

$$M_1(x_1) \in \{0, 1, x_1\}, \tag{18}$$

where $0$ and $1$ are the functions that take any input and return 0 or 1, respectively. $M(2) = 6$, corresponding to the set of functions

$$M_2(x_1, x_2) \in \{0, 1, x_1, x_2, x_1 \vee x_2, x_1 \wedge x_2\}. \tag{19}$$

Similarly, $M(3) = 20$, $M(4) = 168$, $M(5) = 7581$, $M(6) \approx 7.8 \times 10^6$, $M(7) \approx 2.4 \times 10^{12}$, and $M(8) \approx 5.6 \times 10^{22}$.

Theorem 1. When transforming the largest graph-representable subclass of $k$th order functions to pairwise Boolean functions, the number of required AVs is bounded above by the Dedekind number $D(k)$.

Proof. The proof is straightforward. Consider a pairwise function of the form of Equation (8) with more than $D(k)$ AVs. It follows from Lemma 1 that at least 2 of the AVs must correspond to the same MBF, and therefore always take the same values. Hence, all references to one of these AVs in the pseudo-Boolean representation can be replaced with references to the other without changing the associated costs. Repeated application of this process leaves us with a solution with at most $D(k)$ AVs. ∎

Although this upper bound is large even for small values of $k$, it is much tighter than the existing upper bound of $S(k) = 2^{2^k}$ [6] (see also Proposition 24 in [51]).

Lemma 2. Let $D(k)$ denote the Dedekind number for all positive values of $k$, and let $S(k) = 2^{2^k}$. For even values of $k$, we have:

$$S(k) \ge 2^{\sum_{i \in \{0,1,\dots,k\} \setminus \{k/2 - 1,\, k/2\}} \binom{k}{i}} D(k). \tag{20}$$

When $k$ is odd, we have:

$$S(k) \ge 2^{\sum_{i \in \{0,1,\dots,k\} \setminus \{(k-1)/2,\, (k+1)/2\}} \binom{k}{i}} D(k). \tag{21}$$

Proof.
For small values of $k \in \{3, \dots, 8\}$, the upper bound using the Dedekind number is much tighter than $S(k)$: $(M(3) = 20,\ S(3) = 256)$, $(M(4) = 168,\ S(4) = 65536)$, $(M(5) = 7581,\ S(5) \approx 4.29 \times 10^9)$, $(M(6) \approx 7.8 \times 10^6,\ S(6) \approx 1.85 \times 10^{19})$, $(M(7) \approx 2.4 \times 10^{12},\ S(7) \approx 3.4 \times 10^{38})$, and $(M(8) \approx 5.6 \times 10^{22},\ S(8) \approx 1.158 \times 10^{77})$. For $k > 8$, $D(k)$ remains unknown, and the development of a closed-form solution remains an active area of research. Several upper bounds have been derived for $D(k)$, and we use the following bound by Hansel [19,25] to prove our result:

$$D(k) \le 3^{\binom{k}{\lfloor k/2 \rfloor}}, \tag{22}$$

$$D(k) \le 2^{\log_2(3) \binom{k}{\lfloor k/2 \rfloor}}. \tag{23}$$

The proof is given for two different cases depending on whether $k$ is even or odd. First, let us consider the case when $k$ is even:

$$\binom{k}{\lfloor k/2 \rfloor} = \binom{k}{k/2}. \tag{24}$$

The neighbouring binomial coefficient satisfies

$$\binom{k}{k/2 - 1} = \frac{k!}{(k/2 - 1)!\,(k/2 + 1)!} = \binom{k}{k/2} \frac{k/2}{k/2 + 1} = \binom{k}{k/2} \left(1 - \frac{2}{k+2}\right). \tag{25}$$

Using the binomial theorem, we know that

$$\sum_{i \in \{0,1,\dots,k\}} \binom{k}{i} = 2^k. \tag{26}$$

Using Equations (25) and (26), we have:

$$2^k = \sum_{i \in \{0,1,\dots,k\} \setminus \{k/2 - 1,\, k/2\}} \binom{k}{i} + \left(2 - \frac{2}{k+2}\right) \binom{k}{k/2}. \tag{27}$$

Since $\log_2(3) \approx 1.585 < 2 - \frac{2}{k+2}$ for even $k \ge 4$, it follows that:

$$2^k \ge \sum_{i \in \{0,1,\dots,k\} \setminus \{k/2 - 1,\, k/2\}} \binom{k}{i} + \log_2(3) \binom{k}{k/2}. \tag{28}$$

Raising 2 to the power of both sides, we have:

$$2^{2^k} \ge 2^{\sum_{i \in \{0,1,\dots,k\} \setminus \{k/2 - 1,\, k/2\}} \binom{k}{i} + \log_2(3) \binom{k}{k/2}}, \tag{29}$$

$$2^{2^k} \ge 2^{\sum_{i \in \{0,1,\dots,k\} \setminus \{k/2 - 1,\, k/2\}} \binom{k}{i}} \; 2^{\log_2(3) \binom{k}{k/2}}, \tag{30}$$

$$S(k) \ge 2^{\sum_{i \in \{0,1,\dots,k\} \setminus \{k/2 - 1,\, k/2\}} \binom{k}{i}} D(k), \tag{31}$$

where the last step uses (23). For $k = 2$, inequality (20) can be checked directly: $S(2) = 16 \ge 2^{\binom{2}{2}} D(2) = 12$. This implies that $S(k)$ is significantly larger than $D(k)$.

Let us now consider the case when $k$ is odd:

$$\binom{k}{\lfloor k/2 \rfloor} = \binom{k}{(k-1)/2}. \tag{32}$$

Since $(k-1)/2 + (k+1)/2 = k$, the symmetry of binomial coefficients gives

$$\binom{k}{(k+1)/2} = \binom{k}{k - (k+1)/2} = \binom{k}{(k-1)/2}. \tag{33}$$

Using Equations (33) and (26), we have:

$$2^k = \sum_{i \in \{0,1,\dots,k\} \setminus \{(k-1)/2,\, (k+1)/2\}} \binom{k}{i} + 2 \binom{k}{(k-1)/2}. \tag{34}$$

Since $\log_2(3) < 2$, it follows that:

$$2^k \ge \sum_{i \in \{0,1,\dots,k\} \setminus \{(k-1)/2,\, (k+1)/2\}} \binom{k}{i} + \log_2(3) \binom{k}{(k-1)/2}. \tag{35}$$

Raising 2 to the power of both sides and using (23), we have:

$$2^{2^k} \ge 2^{\sum_{i \in \{0,1,\dots,k\} \setminus \{(k-1)/2,\, (k+1)/2\}} \binom{k}{i} + \log_2(3) \binom{k}{(k-1)/2}}, \tag{36}$$

$$2^{2^k} \ge 2^{\sum_{i \in \{0,1,\dots,k\} \setminus \{(k-1)/2,\, (k+1)/2\}} \binom{k}{i}} \; 2^{\log_2(3) \binom{k}{(k-1)/2}}, \tag{37}$$

$$S(k) \ge 2^{\sum_{i \in \{0,1,\dots,k\} \setminus \{(k-1)/2,\, (k+1)/2\}} \binom{k}{i}} D(k). \tag{38}$$

∎

We observe that $S(k)$ is significantly larger than $D(k)$ when $k$ is odd as well. In [52], the problem of improving this upper bound was mentioned as an open problem. Both of these upper bounds are impractically large even for small values of $k$. They are prohibitive because we are looking for an exact transformation that preserves submodularity. By using auxiliary variables, we can also transform a given higher order function to a non-submodular one using far fewer variables [21,12,15]. In Section 5, we will further tighten the bound for fourth order functions.

Note that this representation of AVs as MBFs is over-complete. For example, if the MBF of an auxiliary variable $z_i$ is the constant function $z_i(\mathbf{x}) = 1$, we can replace $\min_{\mathbf{z}, z_i} h(\mathbf{x}, \mathbf{z}, z_i)$ with the simpler function (i.e., one containing fewer auxiliary variables) $\min_{\mathbf{z}} h(\mathbf{x}, \mathbf{z}, 1)$.

Given any function $f$ in $\mathcal{F}^k_2$, the equivalent pairwise form $f' \in \mathcal{F}^2$ can be found by solving a linear program. The construction of the linear program is given in the following section.
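Lemma 2 and the Hansel bound it relies on can be sanity-checked with exact integer arithmetic for $k \le 8$. The Python sketch below is our own check with hypothetical helper names; it uses the exact Dedekind numbers as tabulated in OEIS A000372, whereas the text quotes rounded values.

```python
from math import comb

# Exact Dedekind numbers D(1..8) (OEIS A000372); the text quotes rounded values.
DEDEKIND = {1: 3, 2: 6, 3: 20, 4: 168, 5: 7581, 6: 7828354,
            7: 2414682040998, 8: 56130437228687557907788}

def existing_bound(k):
    """S(k) = 2^(2^k), the earlier bound from [6]."""
    return 2 ** (2 ** k)

def hansel_bound(k):
    """Hansel's bound (22): D(k) <= 3^C(k, floor(k/2))."""
    return 3 ** comb(k, k // 2)

def lemma2_rhs(k):
    """Right-hand side of (20) for even k and of (21) for odd k."""
    if k % 2 == 0:
        excluded = {k // 2 - 1, k // 2}
    else:
        excluded = {(k - 1) // 2, (k + 1) // 2}
    exponent = sum(comb(k, i) for i in range(k + 1) if i not in excluded)
    return 2 ** exponent * DEDEKIND[k]

for k in range(1, 9):
    # Both comparisons hold for every k listed.
    print(k, DEDEKIND[k] <= hansel_bound(k), existing_bound(k) >= lemma2_rhs(k))
```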
4 The Linear Program

A sketch of the formulation is as follows. In general, the presence of AVs whose state, given a labeling $\mathbf{x}$, is not determined in advance makes the optimization non-convex and challenging to solve directly. Instead of optimizing this problem containing AVs of unspecified state, we create an auxiliary variable associated with every MBF. Hence, given any labeling $\mathbf{x}$, the state of every auxiliary variable is fixed a priori, making the problem convex. We show how the constraint that a particular AV must conform to a given MBF can be formulated as linear constraints, and that consequently the problem of finding the closest member $f' \in \mathcal{F}^2$ to any pseudo-Boolean function is a linear program. This program makes use of the max-flow linear program formulation to guarantee that the minimum cost labeling of the AVs corresponds to their MBFs. To do this, we must first rewrite the cost of Equation (8) in a slightly different form. We write:

$$\begin{aligned}
f(\mathbf{x}, \mathbf{z}) = c_\emptyset &+ \sum_i c_{i,s}(1 - x_i) + \sum_i c_{t,i} x_i + \sum_{i,j:\, i>j} c_{i,j} x_i (1 - x_j) \\
&+ \sum_l c_{l,s}(1 - z_l) + \sum_l c_{t,l} z_l + \sum_{l,m:\, l>m} c_{l,m} z_l (1 - z_m) + \sum_{i,l} c_{i,l} x_i (1 - z_l),
\end{aligned} \tag{39}$$

where $c_\emptyset$ is a constant that may be either positive or negative, and all other $c$ are non-negative values referred to as the capacities of edges. By [11], this form is equivalent to that of (8), in that any function that can be written in form (8) can also be written in form (39), and vice versa.
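Because every non-constant term of (39) is a non-negative capacity times a cut indicator, fixing $\mathbf{x}$ collapses the $\mathbf{x}$-dependent terms into a constant and extra source capacities, leaving a cost of the form $d_0 + \sum_l d^s_l (1 - z_l) + \sum_l d^t_l z_l + \sum_{l>m} d_{l,m} z_l (1 - z_m)$, whose minimum over $\mathbf{z}$ is $d_0$ plus an $s$-$t$ min-cut (equivalently, a max-flow). The Python sketch below is our own illustration: it builds that graph with the convention that $z_l = 1$ places node $l$ on the source side, computes the max-flow with a plain Edmonds-Karp routine rather than any particular library, and checks the result against brute-force minimization. All names are hypothetical.

```python
import random
from collections import deque
from itertools import product

def max_flow(cap, s, t):
    """Edmonds-Karp max-flow; cap[u][v] is the capacity of edge u -> v."""
    res = {}                                         # residual capacities
    for u, nbrs in cap.items():
        for v, c in nbrs.items():
            res.setdefault(u, {}).setdefault(v, 0)
            res[u][v] += c
            res.setdefault(v, {}).setdefault(u, 0)   # reverse edge
    flow = 0
    while True:
        parent = {s: None}                           # BFS for a shortest path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res.get(u, {}).items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[u][v] for u, v in path)       # bottleneck capacity
        for u, v in path:
            res[u][v] -= push
            res[v][u] += push
        flow += push

def energy(z, d0, d_s, d_t, d_pair):
    """d0 + sum d_s[l](1 - z_l) + sum d_t[l] z_l + sum d_pair[l, m] z_l (1 - z_m)."""
    e = d0
    for l, zl in enumerate(z):
        e += d_s[l] * (1 - zl) + d_t[l] * zl
    for (l, m), c in d_pair.items():
        e += c * z[l] * (1 - z[m])
    return e

def build_graph(d_s, d_t, d_pair):
    """Edge s->l is cut when z_l = 0, l->t when z_l = 1, l->m when z_l=1, z_m=0."""
    cap = {"s": {l: c for l, c in enumerate(d_s)}}
    for l, c in enumerate(d_t):
        cap.setdefault(l, {})["t"] = c
    for (l, m), c in d_pair.items():
        cap.setdefault(l, {})[m] = c
    return cap

rng = random.Random(0)
k = 4                                                # number of AVs
d_s = [rng.randint(0, 5) for _ in range(k)]
d_t = [rng.randint(0, 5) for _ in range(k)]
d_pair = {(l, m): rng.randint(0, 5) for l in range(k) for m in range(l)}
d0 = -3                                              # the constant may be negative
brute = min(energy(z, d0, d_s, d_t, d_pair) for z in product((0, 1), repeat=k))
print(brute, d0 + max_flow(build_graph(d_s, d_t, d_pair), "s", "t"))  # equal
```

The agreement of the two printed numbers is the max-flow/min-cut duality that Section 4.1 turns into linear constraints.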
4.1 The Max-flow Linear Program

Under the assumption that $\mathbf{x}$ is fixed, we are interested in finding a minimum of

$$\begin{aligned}
f_{\mathbf{x}}(\mathbf{z}) &= c_\emptyset + \sum_i c_{i,s}(1 - x_i) + \sum_i c_{t,i} x_i + \sum_{i,j:\, i>j} c_{i,j} x_i (1 - x_j) \\
&\quad + \sum_l c_{l,s}(1 - z_l) + \sum_l c_{t,l} z_l + \sum_{l,m:\, l>m} c_{l,m} z_l (1 - z_m) + \sum_{i,l} c_{i,l} x_i (1 - z_l) \\
&= d_{\mathbf{x},\emptyset} + \sum_l d_{\mathbf{x},l,s}(1 - z_l) + \sum_l d_{\mathbf{x},t,l} z_l + \sum_{l,m:\, l>m} d_{\mathbf{x},l,m} z_l (1 - z_m),
\end{aligned} \tag{40}$$

where

$$d_{\mathbf{x},\emptyset} = c_\emptyset + \sum_{i:\, x_i = 0} c_{i,s} + \sum_{i:\, x_i = 1} c_{t,i} + \sum_{i,j:\, i>j,\ x_i = 1,\ x_j = 0} c_{i,j}, \tag{41}$$

$$d_{\mathbf{x},l,s} = c_{l,s} + \sum_{i:\, x_i = 1} c_{i,l}, \quad d_{\mathbf{x},t,l} = c_{t,l}, \quad \text{and} \quad d_{\mathbf{x},l,m} = c_{l,m}. \tag{42}$$

Then the minimum of Equation (40) over $\mathbf{z}$ may be found by solving its dual max-flow program. Writing $\nabla_{\mathbf{x},s}$ for the flow out of the source and $\nabla_{\mathbf{x},t}$ for the flow into the sink, we seek

$$\max\ \nabla_{\mathbf{x},s} + d_{\mathbf{x},\emptyset}, \tag{43}$$

subject to the constraints

$$\begin{aligned}
& f_{\mathbf{x},ij} - d_{\mathbf{x},ij} \le 0, && \forall (i,j) \in E, \\
& \textstyle\sum_{j:(j,i) \in E} f_{\mathbf{x},ji} - \sum_{j:(i,j) \in E} f_{\mathbf{x},ij} \le 0, && \forall i \ne s, t, \\
& \nabla_{\mathbf{x},s} + \textstyle\sum_{j:(j,s) \in E} f_{\mathbf{x},js} - \sum_{j:(s,j) \in E} f_{\mathbf{x},sj} \le 0, \\
& \nabla_{\mathbf{x},t} + \textstyle\sum_{j:(j,t) \in E} f_{\mathbf{x},jt} - \sum_{j:(t,j) \in E} f_{\mathbf{x},tj} \le 0, \\
& f_{\mathbf{x},ij} \ge 0, && \forall (i,j) \in E,
\end{aligned} \tag{44}$$

where $E$ is the set of all ordered pairs $(l,m)$ for all $l > m$, $(s,l)$ for all $l$, and $(l,t)$ for all $l$, and $f_{\mathbf{x},ij}$ corresponds to the flow through the edge $(i,j)$. We will not use this exact LP formulation; instead, we rely on the fact that $\mathbf{z}$ is a minimal cost labeling of $f_{\mathbf{x}}$ if and only if there exists a flow satisfying constraints (44) such that

$$f_{\mathbf{x}}(\mathbf{z}) - \nabla_{\mathbf{x},s} - d_{\mathbf{x},\emptyset} \le 0. \tag{45}$$

4.2 Choice of MBF as a Set of Linear Constraints

We are seeking minima of a quadratic pseudo-Boolean function of the form (39), where $\mathbf{x}$ are the variables we are interested in minimizing over and $\mathbf{z}$ are the auxiliary variables. As previously mentioned, formulations that allow the state of the auxiliary variables to vary tend to result in non-convex optimization problems.
To avoid the non-convexity caused by AVs of unspecified state, we specify the location of the minima over $\mathbf{z}$ as a set of hard constraints. We want

$$ \min_{\mathbf{z}} f_{\mathbf{x}}(\mathbf{z}) = f_{\mathbf{x}}\big([m_1(\mathbf{x}), m_2(\mathbf{x}), \ldots, m_{D(k)}(\mathbf{x})]\big) \quad \forall \mathbf{x}, \tag{46} $$

where $f_{\mathbf{x}}$ is defined as in (40), and $m_1, \ldots, m_{D(k)}$ are all possible MBFs defined over $\mathbf{x}$. By setting all of the capacities $d_{i,j}$ to $0$, it can be seen that a solution satisfying (46) must exist. From this and the reduction described in Lemma 1, it follows that every function that can be expressed in pairwise form can also be expressed in a form that satisfies these restrictions. We enforce condition (46) through the linear constraints (44) and (45) for all possible choices of $\mathbf{x}$. Formally, we enforce the condition

$$ f_{\mathbf{x}}\big([m_1(\mathbf{x}), \ldots, m_{D(k)}(\mathbf{x})]\big) - \nabla_{\mathbf{x},s} - d_{\mathbf{x},\emptyset} \le 0. \tag{47} $$

Substituting (40), we have $2^k$ sets of conditions, namely

$$ \sum_l d_{\mathbf{x},l,s}\big(1 - m_l(\mathbf{x})\big) + \sum_l d_{\mathbf{x},t,l}\, m_l(\mathbf{x}) + \sum_{l,m:\,l>m} d_{\mathbf{x},l,m}\, m_l(\mathbf{x})\big(1 - m_m(\mathbf{x})\big) - \nabla_{\mathbf{x},s} \le 0, \tag{48} $$

subject to the constraints (44), for all $\mathbf{x}$. Note that we use the max-flow formulation, rather than the more obvious min-cut formulation, because it remains a linear program even if we allow the edge capacities $d$¹ to vary.

¹ In itself, $d$ is just a notational convenience, being a sum of coefficients in $c$.

Submodularity Constraints. We further require that the quadratic function is submodular or, equivalently, that the capacities of all edges $c_{i,j}$ are non-negative. This can be enforced by the set of linear constraints

$$ c_{i,j} \ge 0, \quad \forall i,j. \tag{49} $$

4.3 Finding the nearest submodular Quadratic Function

We now assume that we have been given an arbitrary function $g(\mathbf{x})$ to minimize, which may or may not lie in $\mathcal{F}_k$. We are interested in finding the closest function to it in $\mathcal{F}_2$. To find the closest function (under the $L_1$ norm), we minimize:

$$
\begin{aligned}
& \min_{c} \sum_{\mathbf{x} \in \mathbb{B}^k} \Big| g(\mathbf{x}) - \min_{\mathbf{z}} f(\mathbf{x},\mathbf{z}) \Big| && (50)\\
= {} & \min_{c} \sum_{\mathbf{x} \in \mathbb{B}^k} \Big| g(\mathbf{x}) - f\big(\mathbf{x},\mathbf{m}(\mathbf{x})\big) \Big| && (51)\\
= {} & \min_{c} \sum_{\mathbf{x} \in \mathbb{B}^k} \Big| g(\mathbf{x}) - \Big( c_\emptyset + \sum_i c_{i,s}(1-x_i) + \sum_i c_{t,i}\,x_i + \sum_{i,j:\,i>j} c_{i,j}\,x_i(1-x_j) && (52)\\
& \qquad + \sum_l c_{l,s}\big(1-m_l(\mathbf{x})\big) + \sum_l c_{t,l}\,m_l(\mathbf{x}) + \sum_{l,m:\,l>m} c_{l,m}\,m_l(\mathbf{x})\big(1-m_m(\mathbf{x})\big) + \sum_{i,l} c_{i,l}\,x_i\big(1-m_l(\mathbf{x})\big) \Big) \Big|
\end{aligned}
$$

where $\mathbf{m}(\mathbf{x}) = [m_1(\mathbf{x}), \ldots, m_{D(k)}(\mathbf{x})]$ is the vector of all MBFs over $\mathbf{x}$. The minimization is subject to the family of constraints set out in the previous subsection. Expressions of the form $\sum_i |g_i|$ can be written as $\sum_i h_i$ subject to the linear constraints $h_i \ge g_i$ and $h_i \ge -g_i$, so this is a linear program. □

4.4 Discussion

Several results follow from the linear program described in the previous section. In particular, consider functions $g$ of the same form as Equation (3). The set of such functions satisfying

$$ \min_{c} \sum_{\mathbf{x} \in \mathbb{B}^k} \Big| g(\mathbf{x}) - \min_{\mathbf{z}} f(\mathbf{x},\mathbf{z}) \Big| = 0 \tag{53} $$

exactly defines a linear polytope for any choice of $|\mathbf{x}| = k$, and this result holds for any choice of basis functions.

Of equal note, the convex-concave procedure [47] is a generic move-making algorithm. It finds local optima by successively minimizing a sequence of convex (i.e., tractable) upper-bound functions that are tight at the current location $\mathbf{x}_0$. [34] showed how this could similarly be done for quadratic Boolean functions, by decomposing them into submodular and supermodular components. The work [30] showed that any function could be decomposed into a quadratic submodular function and an additional overestimated term. However, this decomposition was not optimal, and they did not suggest how to find an optimal overestimation. The optimal overestimation in $\mathcal{F}_2$ of a cost function $g$ defined over a clique may be found by solving the above LP subject to the additional requirements:

$$ g(\mathbf{x}) \le \min_{\mathbf{z}} f(\mathbf{x},\mathbf{z}), \quad \forall \mathbf{x} \ne \mathbf{x}_0, \tag{54} $$

$$ g(\mathbf{x}_0) \ge \min_{\mathbf{z}} f(\mathbf{x}_0,\mathbf{z}). \tag{55} $$

Efficiency concerns: As we consider larger cliques, it becomes less computationally feasible to use the techniques discussed in this section, at least without pruning the number of auxiliary variables considered. As previously mentioned, constant AVs and AVs that correspond to a single variable in $\mathbf{x}$ (i.e., $z_l = x_i$) can be safely discarded without loss of generality. In the following section, we show that a function in $\mathcal{F}^4_2$ can be represented by only two AVs, rather than the 168 suggested by the Dedekind number. In the general case, however, a minimal representation eludes us. As a matter of pragmatism, it may be useful to first solve the LP of the previous section without any AV, and then successively introduce new variables until a minimum cost solution is found.

5 Tighter Bounds: Transforming functions in $\mathcal{F}^4_2$ to $\mathcal{F}_2$

Consider the following submodular function $f(x_1,x_2,x_3,x_4) \in \mathcal{F}_4$, represented as a multi-linear polynomial:

$$ f(x_1,x_2,x_3,x_4) = a_0 + \sum_i a_i x_i + \sum_{i>j} a_{ij} x_i x_j + \sum_{i>j>k} a_{ijk} x_i x_j x_k + a_{1234}\, x_1 x_2 x_3 x_4, \qquad \Delta_{ij}(\mathbf{x}) \le 0, \tag{56} $$

where $i,j,k \in S_4$ and $\Delta_{ij}(\mathbf{x})$ is the discrete second derivative of $f(\mathbf{x})$ with respect to $x_i$ and $x_j$. Consider a function $h(x_1,x_2,x_3,x_4,z_s) \in \mathcal{F}_2$, where $z_s$ is an AV used to model functions in $\mathcal{F}^4_2$. In general, we need several AVs to transform a function in $\mathcal{F}^4_2$ into a function in $\mathcal{F}_2$. Any general function in $\mathcal{F}_2$ that uses one AV can be represented as a multi-linear polynomial consisting of linear and bilinear terms in all five variables:

$$ h(x_1,x_2,x_3,x_4,z_s) = b_0 + \sum_i b_i x_i - \sum_{i>j} b_{ij} x_i x_j + \Big(g_s - \sum_{i=1}^{4} g_{s,i} x_i\Big) z_s, \qquad b_{ij} \ge 0,\ g_{s,i} \ge 0,\ i,j \in S_4. \tag{57} $$

The negative signs in front of the bilinear terms ($x_i x_j$, $z_s x_i$) emphasize that their coefficients ($-b_{ij}$, $-g_{s,i}$) must be non-positive to ensure submodularity. We have the following condition from Equation (4), given on page 3:

$$ f(x_1,x_2,x_3,x_4) = \min_{z_s \in \mathbb{B}} h(x_1,x_2,x_3,x_4,z_s), \quad \forall \mathbf{x}. \tag{58} $$

Here the coefficients $(a_i, a_{ij}, a_{ijk}, a_{ijkl})$ of $f(\mathbf{x})$ are known. We wish to compute the coefficients $(b_i, b_{ij}, g_s, g_{s,n})$, where $i,j \in V$, $i \ne j$, and $n \in S_4$. If we were given $(g_s, g_{s,i})$, then from Equations (57) and (58) we would have:

$$ z_s = \begin{cases} 1 & \text{if } g_s - \sum_{i=1}^{4} g_{s,i} x_i < 0, \\ 0 & \text{otherwise.} \end{cases} \tag{59} $$

Our main result, stated in Theorem 2, is that any function $h \in \mathcal{F}^4_2$ can be transformed into a function $h'(x_1,x_2,x_3,x_4,z_t,z_r) \in \mathcal{F}_2$ that involves only two auxiliary variables, $z_t$ and $z_r$. Let $\mathcal{A}$ be the family of sets corresponding to labelings of $\mathbf{x}$ such that $z_s = 0 = \arg\min_{z_s} h(\mathbf{x}, z_s)$. Similarly, let $\mathcal{B}$ be the family of sets corresponding to labelings of $\mathbf{x}$ such that $z_s = 1 = \arg\min_{z_s} h(\mathbf{x}, z_s)$. The families $\mathcal{A}$ and $\mathcal{B}$ partition the labelings of $\mathbf{x}$, as defined below:

Definition 9. A partition divides $\mathcal{P}$ into sets $\mathcal{A}$ and $\mathcal{B}$ such that $\mathcal{A} = \{S(\mathbf{x}) : 0 = \arg\min_{z \in \mathbb{B}} h(\mathbf{x}, z),\ \mathbf{x} \in \mathbb{B}^4\}$ and $\mathcal{B} = \mathcal{P} \setminus \mathcal{A}$. Note that $\emptyset \in \mathcal{A}$. Here $S(\mathbf{x})$ denotes the set corresponding to $\mathbf{x}$.

In the rest of the paper, we say that the AV $z_s$ is associated with $[\mathcal{A}, \mathcal{B}]$, or denote it by $z_s : [\mathcal{A}, \mathcal{B}]$. We illustrate the concept of a partition in Figure 2. A few partitions that play a key role in our transformation are referred to as forward, backward, and intermediate partitions.

Fig. 2. Examples of partitions shown as Hasse diagrams. We use a set representation for the labelings of $(x_1,x_2,x_3,x_4)$. For example, the set $\{1,2,4\}$ is equivalent to the labeling $\{x_1=1, x_2=1, x_3=0, x_4=1\}$. In (a), $\mathcal{A} = \{\{\}, \{2\}, \{3\}, \{4\}, \{2,3\}, \{2,4\}, \{3,4\}, \{2,3,4\}\}$ and $\mathcal{B} = \{\{1\}, \{1,2\}, \{1,3\}, \{1,4\}, \{1,2,3\}, \{1,2,4\}, \{1,3,4\}, S_4\}$. Both (a) and (b) are examples of partitions. Any AV must be associated with one of these 168 partitions, as given by the Dedekind number $D(k)$.

Definition 10. The forward reference partition $[\mathcal{A}_f, \mathcal{B}_f]$ takes the form:

$$ B \in \mathcal{B}_f \iff |B| \ge 3, \qquad \mathcal{A}_f = \mathcal{P} \setminus \mathcal{B}_f. \tag{60} $$

The backward reference partition $[\mathcal{A}_b, \mathcal{B}_b]$ is shown below:

$$ B \in \mathcal{B}_b \iff |B| \ge 2, \qquad \mathcal{A}_b = \mathcal{P} \setminus \mathcal{B}_b. \tag{61} $$

Figure 3(a) and (b) show the forward and backward partitions, respectively. We consider a set of 18 partitions as intermediate partitions $[\mathcal{A}_i, \mathcal{B}_i]$, as shown in Figure 4. In 6 of these intermediate partitions, five sets in $\mathcal{B}_i$ have cardinality 2 (one such partition is shown in Figure 4(a)). In the other 12, four sets in $\mathcal{B}_i$ have cardinality 2 (one such partition is shown in Figure 4(b)). One might expect more intermediate partitions, obtained by considering every possible choice of cardinality-2 sets in $\mathcal{B}_i$. However, we will see later that such partitions are not necessary for transforming a function in $\mathcal{F}^4_2$ into a function in $\mathcal{F}_2$.

Fig. 3. The two reference partitions: (a) forward and (b) backward.

The basic idea in our work is to replace several AVs with the minimum number of AVs, without changing the values of the function at their respective minima.

Definition 11. We say that a function $h(\mathbf{x}, \mathbf{z})$ can be transformed into another function $h'(\mathbf{x}, \mathbf{z}')$, where $\mathbf{z} \ne \mathbf{z}'$, if the following condition is satisfied:

$$ \min_{\mathbf{z}} h(\mathbf{x}, \mathbf{z}) = \min_{\mathbf{z}'} h'(\mathbf{x}, \mathbf{z}'), \quad \forall \mathbf{x}, \tag{62} $$

where $\mathbf{z}$ and $\mathbf{z}'$ are vectors of auxiliary variables with different partitions. The cardinality of $\mathbf{z}$ need not equal the cardinality of $\mathbf{z}'$.

Fig. 4. We have a total of 18 intermediate partitions. In (a), we show one of the 6 intermediate partitions in which five sets in $\mathcal{B}_i$ have cardinality 2. We denote this partition by $I(34)$, where the index refers to the only cardinality-2 set that is not in $\mathcal{B}_i$. In (b), we show one of the 12 intermediate partitions in which four sets in $\mathcal{B}_i$ have cardinality 2. We denote this partition by $I(24,34)$, where the indices refer to the cardinality-2 sets that are not in $\mathcal{B}_i$.

Through a sequence of transformations of the above form, we start with a general function $h(\mathbf{x},\mathbf{z})$ and finally compute a function $h'(\mathbf{x}, z_s, z_t)$ with only two AVs in reference partitions.

Lemma 3. Let $z_a : [\mathcal{A}_s, \mathcal{B}_s]$ and $z_b : [\mathcal{A}_s, \mathcal{B}_s]$ be two AVs with the same partition. Then $h(\mathbf{x}, z_a, z_b) \in \mathcal{F}_2$ can be transformed into some function $h'(\mathbf{x}, z_s) \in \mathcal{F}_2$ that involves only one AV, $z_s$.

Proof. By Equation (62), we can transform a function $h(\mathbf{x}, z_a, z_b)$ into $h'(\mathbf{x}, z_s)$ if the following condition holds:

$$ \min_{z_a, z_b} h(\mathbf{x}, z_a, z_b) = \min_{z_s} h'(\mathbf{x}, z_s), \quad \forall \mathbf{x}. \tag{63} $$

Since the AVs $z_a$ and $z_b$ have the same partition $[\mathcal{A}_s, \mathcal{B}_s]$, their Boolean values are equal for every configuration of $\mathbf{x}$. Thus we can replace all instances of $z_a$ and $z_b$ with $z_s$. □

Theorem 2. Any function $f(\mathbf{x})$ in $\mathcal{F}^4_2$ can be transformed into some function $h(\mathbf{x}, z_f, z_s)$ in $\mathcal{F}_2$, where $z_f$ corresponds to the forward partition and $z_s$ corresponds either to the backward partition or to one of the 18 intermediate partitions.

Proof. Using Theorem 5.2 from [37], we can decompose a given submodular function in $\mathcal{F}_4$ into functions in 10 different groups $\mathcal{G}_i$, $i \in \{1, \ldots, 10\}$, where each $\mathcal{G}_i$ is shown in Table 1. As shown in [50], the functions in $\mathcal{G}_{10}$ do not belong to $\mathcal{F}^4_2$.
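The partitions of Eq. (59) and Definitions 9 and 10 can be illustrated in a few lines of code. The sketch below does three things:

1. It computes the family $\mathcal{B}$ of sets with $z_s = 1$ from given coefficients $(g_s, g_{s,i})$, using Eq. (59).
2. It checks that $\mathcal{B}$ is upward closed. This is what makes an AV with non-negative $g_{s,i}$ a monotonic Boolean function of $\mathbf{x}$.
3. It counts the MBFs of four variables by brute force, which should reproduce the Dedekind number $D(4) = 168$.

The coefficient values used to realize the forward and backward reference partitions are hypothetical choices for this illustration. Ties ($g_s - \sum_i g_{s,i}x_i = 0$) resolve to $z_s = 0$, as in Eq. (59).

```python
import itertools

# All subsets S of {1, 2, 3, 4}; S stands for the labeling with x_i = 1 iff i in S.
SUBSETS = [frozenset(s) for r in range(5) for s in itertools.combinations(range(1, 5), r)]


def partition_from_coefficients(g_s, g_si):
    """Eq. (59): z_s = 1 iff g_s - sum_{i in S} g_{s,i} < 0. Returns (A, B)."""
    B = [S for S in SUBSETS if g_s - sum(g_si[i] for i in S) < 0]
    A = [S for S in SUBSETS if S not in B]
    return A, B


def is_upward_closed(B):
    """True iff every superset of a member of B is also in B, i.e. the AV is
    a monotonic Boolean function of x."""
    members = set(B)
    return all(T in members for S in members for T in SUBSETS if S <= T)


def count_monotone_boolean_functions(k):
    """Brute-force count of monotone Boolean functions of k variables
    (all 2**(2**k) truth tables; only practical for k <= 4)."""
    n = 1 << k
    # covering relations of the Boolean lattice: (s, s with one more bit set)
    edges = [(s, s | (1 << b)) for s in range(n) for b in range(k) if not s & (1 << b)]
    return sum(
        1 for table in range(1 << n)
        if all(not (table >> lo) & 1 or (table >> hi) & 1 for lo, hi in edges)
    )


ones = {i: 1 for i in range(1, 5)}
A_f, B_f = partition_from_coefficients(2.5, ones)  # hypothetical: B = {S : |S| >= 3}
A_b, B_b = partition_from_coefficients(1.5, ones)  # hypothetical: B = {S : |S| >= 2}
print(sorted(len(S) for S in B_f))                 # [3, 3, 3, 3, 4]
print(is_upward_closed(B_f), is_upward_closed(B_b))  # True True
print(count_monotone_boolean_functions(4))         # 168
```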
[50] further showed, in its Theorem 16(3), that any submodular function containing functions from group $\mathcal{G}_{10}$ does not belong to $\mathcal{F}^4_2$. Thus every function in $\mathcal{F}^4_2$ must be composed of functions from the groups $\mathcal{G}_i$, $i \in \{1, \ldots, 9\}$. Table 1 gives the number of distinct terms in each group $\mathcal{G}_i$. Overall, there are 31 distinct functions in the groups $\mathcal{G}_i$, $i \in \{1, \ldots, 9\}$. The functions in the first group, $\mathcal{G}_1$, have only second-degree terms, so they do not require any AVs. The functions in the next 7 groups, $\mathcal{G}_i$ for $i \in \{2, \ldots, 8\}$, can each be represented by a single AV, which is either $z_f$ or $z_b$. Here $z_f$ and $z_b$ denote AVs in the forward and backward partitions, respectively. The 6 functions in $\mathcal{G}_9$ can be represented using two AVs, $z_f$ and $z_i$, which correspond to the forward partition and to an intermediate partition (denoted by $I(k,l)$ in Figure 4(a)), respectively. Note that the functions in $\mathcal{G}_9$ involve an interaction between $z_f$ and $z_i$: each contains a bilinear term $z_f z_i$. We prove the result by considering two cases.

$$
\begin{array}{llll}
\text{Group } \mathcal{G}_i & |\mathcal{G}_i| & f(\mathbf{x}) & \min_{z_1,z_2} h(\mathbf{x},z_1,z_2),\ h \in \mathcal{F}_2 \\
\hline
\mathcal{G}_1(i,j) & 6 & -x_ix_j & -x_ix_j \\
\mathcal{G}_2(i,j,k) & 4 & -x_ix_jx_k & \min_{z_f}(2-x_i-x_j-x_k)z_f \\
\mathcal{G}_3 & 1 & -x_ix_jx_kx_l & \min_{z_f}(3-x_i-x_j-x_k-x_l)z_f \\
\mathcal{G}_4 & 1 & -x_ix_jx_kx_l + x_ix_jx_k + x_ix_jx_l + x_ix_kx_l + x_jx_kx_l - x_ix_j - x_ix_k - x_ix_l - x_jx_k - x_jx_l - x_kx_l & \min_{z_b}(1-x_i-x_j-x_k-x_l)z_b \\
\mathcal{G}_5(i,j,k) & 4 & x_ix_jx_kx_l - x_ix_jx_k - x_ix_l - x_jx_l - x_kx_l & \min_{z_b}(2-x_i-x_j-x_k-2x_l)z_b \\
\mathcal{G}_6(i,j,k) & 4 & x_ix_jx_k - x_ix_j - x_ix_k - x_jx_k & \min_{z_b}(1-x_i-x_j-x_k)z_b \\
\mathcal{G}_7(i) & 4 & x_ix_jx_kx_l - x_ix_jx_k - x_ix_jx_l - x_ix_kx_l & \min_{z_f}(3-2x_i-x_j-x_k-x_l)z_f \\
\mathcal{G}_8 & 1 & 2x_ix_jx_kx_l - x_ix_jx_k - x_ix_jx_l - x_ix_kx_l - x_jx_kx_l & \min_{z_f}(2-x_i-x_j-x_k-x_l)z_f \\
\mathcal{G}_9(i,j) & 6 & x_ix_jx_kx_l - x_ix_j - x_ix_kx_l - x_jx_kx_l & \min_{z_f,z_i}\big((2-x_k-x_l)z_f + (1-x_i-x_j)z_i - z_fz_i\big) \\
\mathcal{G}_{10} & 6 & -x_ix_jx_kx_l + x_ix_kx_l + x_jx_kx_l - x_ix_k - x_ix_l - x_jx_k - x_jx_l - x_kx_l & f(\mathbf{x}) \notin \mathcal{F}^4_2 \text{ as shown in [50]}
\end{array}
$$

Table 1. The above table is adapted from Figure 2 of [51], where $\{i,j,k,l\} = S_4$. Each group $\mathcal{G}_i$ has several functions depending on the values of $\{i,j,k,l\}$, and $|\mathcal{G}_i|$ gives the number of distinct functions in each group. The groups $\mathcal{G}_3$, $\mathcal{G}_4$ and $\mathcal{G}_8$ involve all four variables and are symmetric, so each contains one function. $z_f$ and $z_b$ correspond to AVs for the forward and backward partitions. $z_i$ corresponds to one of the intermediate partitions, denoted by $I(kl)$ in Figure 4(a). For each group $\mathcal{G}_i$, the index $(\cdot)$ in the first column identifies a specific function within its group.

Absence of $\mathcal{G}_9$ functions: In the first case, we consider functions that can be expressed as a sum of functions in the first 8 groups, $\mathcal{G}_i$ for $i \in \{1, \ldots, 8\}$.
In other words, we study the case where the function is expressed without any function from $\mathcal{G}_9$. Let $f_0(\mathbf{x})$ denote such a function. It can be expressed as a sum of the 25 functions from the 8 groups $\mathcal{G}_i$, $i \in \{1, \ldots, 8\}$:

$$ f_0(\mathbf{x}) = \alpha_1 \mathcal{G}_1(i,j) + \cdots + \alpha_{25} \mathcal{G}_8. \tag{64} $$

The only AVs involved in these functions are $z_f$ and $z_b$. Using Lemma 3, we obtain a function that uses only the two variables $z_f$ and $z_b$:

$$ f_0(\mathbf{x}) = g(\mathbf{x}) + \min_{z_f, z_b}\big(g_f(\mathbf{x})\, z_f + g_b(\mathbf{x})\, z_b\big), \tag{65} $$

where $g(\mathbf{x})$, $g_f(\mathbf{x})$ and $g_b(\mathbf{x})$ are functions of $\mathbf{x}$. Hence any function that can be expressed without a function from $\mathcal{G}_9$ can be expressed using only the forward and backward partitions.

Presence of $\mathcal{G}_9$ functions: Consider an arbitrary function $f(\mathbf{x})$ in $\mathcal{F}^4_2$ that is expressed as a sum of these 31 functions, including functions from $\mathcal{G}_9$:

$$ f(\mathbf{x}) = \alpha_1 \mathcal{G}_1(i,j) + \alpha_2 \mathcal{G}_2(i,j) + \cdots + \alpha_{31} \mathcal{G}_9(k,l). \tag{66} $$

In Table 2, we show that the sum of two functions can always be represented using only two auxiliary variables. Tables 3, 4 and 5 show sums of 3, 4 and 5 functions, respectively. In every combination, the result can be expressed with only 2 auxiliary variables. Without loss of generality, we avoid repeating every index assignment by working with the set $\{i,j,k,l\} = \{1,2,3,4\}$. The case where $\mathcal{G}_9$ is absent has already been proved, so the tables only show sums that involve at least one function from $\mathcal{G}_9$.
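Each row of Table 1 is a pointwise identity over the 16 labelings of $(x_1,\dots,x_4)$, so it can be checked exhaustively. The sketch below checks $\mathcal{G}_2$ to $\mathcal{G}_9$ under every assignment of $\{i,j,k,l\}$ to the four variables, using 0-based indices. Single-AV rows use $\min_{z\in\{0,1\}} cz = \min(0,c)$, and the $\mathcal{G}_9$ row takes a brute-force minimum over $(z_f, z_i)$. This is a verification aid written for this note, not code from the paper.

```python
import itertools


def mono(x, *idx):
    """Monomial x_a * x_b * ... over the given 0-based indices."""
    out = 1
    for a in idx:
        out *= x[a]
    return out


def table1_rows(i, j, k, l):
    """(group, f, h) triples for G2..G9 of Table 1, where f is the polynomial
    column and h evaluates the minimum of the quadratic column over its AVs.
    Single-AV rows use min_{z in {0,1}} c*z = min(0, c)."""
    return [
        ("G2", lambda x: -mono(x, i, j, k),
         lambda x: min(0, 2 - x[i] - x[j] - x[k])),
        ("G3", lambda x: -mono(x, i, j, k, l),
         lambda x: min(0, 3 - x[i] - x[j] - x[k] - x[l])),
        ("G4", lambda x: (-mono(x, i, j, k, l)
                          + sum(mono(x, *t) for t in itertools.combinations((i, j, k, l), 3))
                          - sum(mono(x, *p) for p in itertools.combinations((i, j, k, l), 2))),
         lambda x: min(0, 1 - x[i] - x[j] - x[k] - x[l])),
        ("G5", lambda x: (mono(x, i, j, k, l) - mono(x, i, j, k)
                          - mono(x, i, l) - mono(x, j, l) - mono(x, k, l)),
         lambda x: min(0, 2 - x[i] - x[j] - x[k] - 2 * x[l])),
        ("G6", lambda x: mono(x, i, j, k) - mono(x, i, j) - mono(x, i, k) - mono(x, j, k),
         lambda x: min(0, 1 - x[i] - x[j] - x[k])),
        ("G7", lambda x: (mono(x, i, j, k, l) - mono(x, i, j, k)
                          - mono(x, i, j, l) - mono(x, i, k, l)),
         lambda x: min(0, 3 - 2 * x[i] - x[j] - x[k] - x[l])),
        ("G8", lambda x: (2 * mono(x, i, j, k, l)
                          - sum(mono(x, *t) for t in itertools.combinations((i, j, k, l), 3))),
         lambda x: min(0, 2 - x[i] - x[j] - x[k] - x[l])),
        ("G9", lambda x: (mono(x, i, j, k, l) - mono(x, i, j)
                          - mono(x, i, k, l) - mono(x, j, k, l)),
         lambda x: min((2 - x[k] - x[l]) * zf + (1 - x[i] - x[j]) * zi - zf * zi
                       for zf in (0, 1) for zi in (0, 1))),
    ]


def check_table1():
    """Return the first (group, (i, j, k, l), x) that violates f == h, or None."""
    for i, j, k, l in itertools.permutations(range(4)):
        for name, f, h in table1_rows(i, j, k, l):
            for x in itertools.product((0, 1), repeat=4):
                if f(x) != h(x):
                    return name, (i, j, k, l), x
    return None


print(check_table1())  # None: every listed row holds for every index assignment
```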
In some cases, we do not show the sums with real coefficients $(\alpha, \beta, \gamma, \delta, \eta)$. These cases illustrate special scenarios in which the combination of two functions involving the intermediate partition $z_i$ can be transformed into a function that involves only $z_f$ and $z_b$. In such cases, we can proceed sequentially as follows:

$$ \alpha_1 f_1(\mathbf{x}) + \alpha_2 f_2(\mathbf{x}) = \alpha_1\big(f_1(\mathbf{x}) + f_2(\mathbf{x})\big) + (\alpha_2 - \alpha_1) f_2(\mathbf{x}), \qquad \alpha_1 \le \alpha_2. \tag{67} $$

Let $\beta = \alpha_2 - \alpha_1$ and let $f_3(\mathbf{x}) = f_1(\mathbf{x}) + f_2(\mathbf{x})$, as given by Table 2. We can then apply Table 2 again to $\alpha_1 f_3(\mathbf{x}) + \beta f_2(\mathbf{x})$ to generate other functions.

As Table 1 shows, the function $\mathcal{G}_9(i,j)$ uses two auxiliary variables, $z_f$ and $z_i \in I(k,l)$. As Table 2 shows, adding $\mathcal{G}_9(i,j)$ to other functions leads to the following scenarios:

1. The coefficient of $z_i$ is unaltered and only the coefficients of $z_f$ change. This happens in 6 of the additions in Table 2, given by $(\mathcal{G}_9(i,j), \mathcal{G}_1(i,j))$, $(\mathcal{G}_9(i,j), \mathcal{G}_2(j,k,l))$, $(\mathcal{G}_9(i,j), \mathcal{G}_7(k))$ and $(\mathcal{G}_9(i,j), \mathcal{G}_8)$.
2. We obtain a function that can be expressed with only one AV, $z_f$. This happens in 4 of the additions in Table 2, given by $(\mathcal{G}_9(i,j), \mathcal{G}_2(i,j,k))$, $(\mathcal{G}_9(i,j), \mathcal{G}_3)$, $(\mathcal{G}_9(i,j), \mathcal{G}_7(i))$ and $(\mathcal{G}_9(i,j), \mathcal{G}_9(k,l))$.
3. We obtain a function that can be expressed with only one AV, $z_b$. This happens in 2 of the additions in Table 2, given by $(\mathcal{G}_9(i,j), \mathcal{G}_4)$ and $(\mathcal{G}_9(i,j), \mathcal{G}_6(j,k,l))$.
4. We obtain a function that can be expressed with $z_f$ and $z_b$. This happens in one of the additions in Table 2, given by $(\mathcal{G}_9(i,j), \mathcal{G}_5(i,j,k))$.
5. We obtain a function with $z_f$ and $z_i$ whose coefficients are changed. This happens in 3 of the additions in Table 2, given by $(\mathcal{G}_9(i,j), \mathcal{G}_5(j,k,l))$, $(\mathcal{G}_9(i,j), \mathcal{G}_6(i,j,k))$ and $(\mathcal{G}_9(i,j), \mathcal{G}_9(i,k))$.

There are 6 functions in group $\mathcal{G}_9$. However, as Table 2 shows, adding $\mathcal{G}_9(i,j)$ and $\mathcal{G}_9(k,l)$ produces a function that involves only one auxiliary variable, $z_f$. In other words, some sums that involve $\mathcal{G}_9$ functions can be represented using functions from the first 8 groups ($\mathcal{G}_1$ to $\mathcal{G}_8$). Of the 6 functions in $\mathcal{G}_9$, only two are needed at a time. Without loss of generality, we rewrite $f(\mathbf{x})$ using at most 2 functions from group $\mathcal{G}_9$:

$$ f(\mathbf{x}) = \alpha_1 \mathcal{G}_1(i,j) + \cdots + \alpha_{26} \mathcal{G}_9(i,j) + \alpha_{27} \mathcal{G}_9(i,k). \tag{68} $$

The remaining four functions in $\mathcal{G}_9$ are not needed, for the following reasons:

– $\mathcal{G}_9(i,l)$ is not needed because adding it to $\mathcal{G}_9(i,j)$ and $\mathcal{G}_9(i,k)$ yields a function that involves only $z_f$ and $z_b$ (second-to-last row of Table 3).
– $\mathcal{G}_9(j,k)$ is not needed because adding it to $\mathcal{G}_9(i,j)$ and $\mathcal{G}_9(i,k)$ yields a function that involves only $z_f$ and $z_b$ (last row of Table 3).
– $\mathcal{G}_9(j,l)$ is not needed because its sum with $\mathcal{G}_9(i,k)$ can be represented by a function that involves only $z_f$ (last row of Table 2).
– $\mathcal{G}_9(k,l)$ is not needed because its sum with $\mathcal{G}_9(i,j)$ can be represented by a function that involves only $z_f$ (last row of Table 2).

We therefore need at most two functions from $\mathcal{G}_9$ to represent any function in $\mathcal{F}^4_2$.
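Several of the Table 2 additions cited in the scenarios above can be checked exhaustively over the 16 labelings. The sketch below tests four rows with $(i,j,k,l) = (0,1,2,3)$ (0-based, a choice made for the illustration):

- $\mathcal{G}_9(i,j)+\mathcal{G}_3$, which needs only $z_f$.
- $\mathcal{G}_9(i,j)+\mathcal{G}_4$, which needs only $z_b$.
- $\mathcal{G}_9(i,j)+\mathcal{G}_9(k,l)$, which needs only $z_f$.
- The weighted row $\alpha\mathcal{G}_9(i,j)+\beta\mathcal{G}_9(i,k)$, where the two $\mathcal{G}_9$ functions share one pair $(z_f, z_i)$.

```python
import itertools


def mono(x, *idx):
    """Monomial x_a * x_b * ... over the given 0-based indices."""
    out = 1
    for a in idx:
        out *= x[a]
    return out


def g9(x, a, b):
    """Polynomial of G9(a,b) from Table 1; c, d are the two remaining indices."""
    c, d = sorted(set(range(4)) - {a, b})
    return mono(x, a, b, c, d) - mono(x, a, b) - mono(x, a, c, d) - mono(x, b, c, d)


def g4(x):
    """Polynomial of G4 from Table 1."""
    triples = sum(mono(x, *t) for t in itertools.combinations(range(4), 3))
    pairs = sum(mono(x, *p) for p in itertools.combinations(range(4), 2))
    return -mono(x, 0, 1, 2, 3) + triples - pairs


def check_table2(alpha, beta):
    """Exhaustively test four rows of Table 2 with (i, j, k, l) = (0, 1, 2, 3)."""
    i, j, k, l = 0, 1, 2, 3
    for x in itertools.product((0, 1), repeat=4):
        # G9(i,j) + G3 -> -x_i x_j + min_zf ((2-x_i-x_k-x_l) + (2-x_j-x_k-x_l)) z_f
        if g9(x, i, j) - mono(x, i, j, k, l) != -mono(x, i, j) + min(
                0, (2 - x[i] - x[k] - x[l]) + (2 - x[j] - x[k] - x[l])):
            return False
        # G9(i,j) + G4 -> -x_k x_l + min_zb (2 - 2x_i - 2x_j - x_k - x_l) z_b
        if g9(x, i, j) + g4(x) != -mono(x, k, l) + min(0, 2 - 2 * x[i] - 2 * x[j] - x[k] - x[l]):
            return False
        # G9(i,j) + G9(k,l) -> -x_i x_j - x_k x_l + min_zf (2 - sum x) z_f
        if g9(x, i, j) + g9(x, k, l) != -mono(x, i, j) - mono(x, k, l) + min(0, 2 - sum(x)):
            return False
        # alpha G9(i,j) + beta G9(i,k) with one shared pair (z_f, z_i)
        shared = min((alpha * (2 - x[k] - x[l]) + beta * (2 - x[j] - x[l])) * zf
                     + (alpha * (1 - x[i] - x[j]) + beta * (1 - x[i] - x[k])) * zi
                     - (alpha + beta) * zf * zi
                     for zf in (0, 1) for zi in (0, 1))
        if alpha * g9(x, i, j) + beta * g9(x, i, k) != shared:
            return False
    return True


print(check_table2(alpha=2, beta=3))  # True
```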
Since at most two functions from $\mathcal{G}_9$ are needed, there are two possibilities for $f(\mathbf{x})$. We denote them by $f_1(\mathbf{x})$ and $f_2(\mathbf{x})$, according to whether one or two functions from $\mathcal{G}_9$ are used:

$$ f_1(\mathbf{x}) = \alpha \mathcal{G}_9(i,j) + \beta \mathcal{G}_5(i,k,l) + \gamma \mathcal{G}_5(j,k,l) + \delta \mathcal{G}_6(i,j,k) + \eta \mathcal{G}_6(i,j,l) + \sigma \min_{z_f}\big(g_f(\mathbf{x})\, z_f\big) + g(\mathbf{x}), \tag{69} $$

$$ f_2(\mathbf{x}) = \alpha \mathcal{G}_9(i,j) + \beta \mathcal{G}_9(i,k) + \gamma \mathcal{G}_5(j,k,l) + \delta \mathcal{G}_6(i,j,k) + \eta \min_{z_f}\big(g_f(\mathbf{x})\, z_f\big) + g(\mathbf{x}). \tag{70} $$

We have represented $f_1$ and $f_2$ using 7 and 6 terms, respectively. We now show that no other functions are needed to represent $f_1$ and $f_2$:

– $\mathcal{G}_1$: These can be represented using $g(\mathbf{x})$.
– $\mathcal{G}_2$: These functions involve $z_f$, and by Lemma 3 we can represent them using $\min_{z_f}\big(g_f(\mathbf{x})\, z_f\big)$.
– $\mathcal{G}_3$: This function involves $z_f$, and by Lemma 3 we can represent it using $\min_{z_f}\big(g_f(\mathbf{x})\, z_f\big)$.
– $\mathcal{G}_4$: Adding this function to $\mathcal{G}_9(i,j)$ yields functions in $\mathcal{G}_6$. These $\mathcal{G}_6$ functions can then be added to $\mathcal{G}_9(i,j)$ or $\mathcal{G}_9(i,k)$. If there is no $\mathcal{G}_9$ term, we can represent the function using $z_f$ and $z_b$, as explained earlier for $f_0$.
– $\mathcal{G}_5$: For $f_1$, some functions in $\mathcal{G}_5$ can be added to $\mathcal{G}_9(i,j)$ to obtain $\mathcal{G}_6$ and $\mathcal{G}_8$. The resulting functions can then be added to any $\mathcal{G}_9(i,j)$ terms. If there is no $\mathcal{G}_9(i,j)$ term, we can represent the function using $z_f$ and $z_b$, as explained earlier for $f_0$. The functions $\mathcal{G}_5(i,k,l)$ and $\mathcal{G}_5(j,k,l)$ alter the coefficients of $z_i$, so for now we keep them as separate terms in $f_1(\mathbf{x})$. For $f_2$, some functions in $\mathcal{G}_5$ can be added to $\mathcal{G}_9(i,j)$ or $\mathcal{G}_9(i,k)$ to obtain $\mathcal{G}_6$ and $\mathcal{G}_8$ terms. The resulting functions can then be added to any $\mathcal{G}_9(i,j)$ or $\mathcal{G}_9(i,k)$ terms. If there is no $\mathcal{G}_9(i,j)$ term, we can represent the function using $z_f$ and $z_b$, as explained earlier for $f_0$. Adding $\mathcal{G}_5(j,k,l)$ to $\mathcal{G}_9(i,j)$ or $\mathcal{G}_9(i,k)$ produces functions that alter the coefficients of $z_i$, so for now we keep this function as a separate term in $f_2(\mathbf{x})$.
– $\mathcal{G}_6$: For $f_1$, some functions in $\mathcal{G}_6$ can be added to $\mathcal{G}_9(i,j)$ to obtain functions in $\mathcal{G}_5$, which can then be added to any $\mathcal{G}_9(i,j)$ terms. If there is no $\mathcal{G}_9(i,j)$ term, we can represent the function using $z_f$ and $z_b$, as explained earlier for $f_0$. The functions $\mathcal{G}_6(i,j,k)$ and $\mathcal{G}_6(i,j,l)$ produce functions that alter the coefficients of $z_i$, so for now we keep these two functions as separate terms in $f_1(\mathbf{x})$. For $f_2$, some functions in $\mathcal{G}_6$ can be added to $\mathcal{G}_9(i,j)$ or $\mathcal{G}_9(i,k)$ to obtain $\mathcal{G}_5$, which can then be added to any $\mathcal{G}_9(i,j)$ or $\mathcal{G}_9(i,k)$ terms. If there is no $\mathcal{G}_9(i,j)$ term, we can represent the function using $z_f$ and $z_b$, as explained earlier for $f_0$. Adding $\mathcal{G}_6(i,j,k)$ to $\mathcal{G}_9(i,j)$ or $\mathcal{G}_9(i,k)$ produces functions that alter the coefficients of $z_i$, so for now we keep $\mathcal{G}_6(i,j,k)$ as a separate term in $f_2(\mathbf{x})$.
– $\mathcal{G}_7$: Some functions in $\mathcal{G}_7$ can be added to $\mathcal{G}_9(i,j)$ or $\mathcal{G}_9(i,k)$ to generate functions in $\mathcal{G}_8$, which can then be added to any $\mathcal{G}_9(i,j)$ or $\mathcal{G}_9(i,k)$ terms. In the other cases, the functions in $\mathcal{G}_7$ only modify the coefficients of $z_f$, and can be represented by $\min_{z_f}\big(g_f(\mathbf{x})\, z_f\big)$.
– $\mathcal{G}_8$: This function can be represented by $\min_{z_f}\big(g_f(\mathbf{x})\, z_f\big)$, since it has only the one AV, $z_f$.
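The bullets above repeatedly use Lemma 3 to fold every term that relies on $z_f$ into a single $\min_{z_f}\big(g_f(\mathbf{x})\, z_f\big)$. For $f_0$ in (65), every backward term is likewise folded into one $z_b$. Consider the single-AV rows of Table 1. Each linear coefficient is non-negative on $\mathcal{A}_f$ and non-positive on $\mathcal{B}_f$, or the same with respect to $\mathcal{A}_b$ and $\mathcal{B}_b$. A common minimizer therefore exists, and the sum of the minima equals the minimum of the summed coefficient. The sketch below tests this with random non-negative integer weights; the weights and the 0-based indexing are assumptions of the illustration.

```python
import itertools
import random


def forward_coefficients():
    """Linear coefficients c(x) of the single-AV rows of Table 1 that use z_f
    (0-based indices; sum(x) = x_0 + x_1 + x_2 + x_3)."""
    cs = [lambda x, t=t: 2 - sum(x[a] for a in t)                # G2(i,j,k)
          for t in itertools.combinations(range(4), 3)]
    cs.append(lambda x: 3 - sum(x))                              # G3
    cs += [lambda x, i=i: 3 - x[i] - sum(x) for i in range(4)]   # G7(i): 3 - 2x_i - x_j - x_k - x_l
    cs.append(lambda x: 2 - sum(x))                              # G8
    return cs


def backward_coefficients():
    """Linear coefficients of the single-AV rows of Table 1 that use z_b."""
    cs = [lambda x: 1 - sum(x)]                                  # G4
    cs += [lambda x, l=l: 2 - sum(x) - x[l] for l in range(4)]   # G5: 2 - x_i - x_j - x_k - 2x_l
    cs += [lambda x, t=t: 1 - sum(x[a] for a in t)               # G6(i,j,k)
           for t in itertools.combinations(range(4), 3)]
    return cs


def merged_equals_separate(seed, trials=200):
    """Compare sum_t w_t * min_z c_t(x) z (one AV per term) with the merged
    form min_{z_f, z_b} (g_f(x) z_f + g_b(x) z_b) of Eq. (65)."""
    rng = random.Random(seed)
    fw, bw = forward_coefficients(), backward_coefficients()
    for _ in range(trials):
        wf = [rng.randint(0, 3) for _ in fw]  # non-negative weights
        wb = [rng.randint(0, 3) for _ in bw]
        for x in itertools.product((0, 1), repeat=4):
            separate = (sum(w * min(0, c(x)) for w, c in zip(wf, fw))
                        + sum(w * min(0, c(x)) for w, c in zip(wb, bw)))
            g_f = sum(w * c(x) for w, c in zip(wf, fw))
            g_b = sum(w * c(x) for w, c in zip(wb, bw))
            merged = min(g_f * zf + g_b * zb for zf in (0, 1) for zb in (0, 1))
            if separate != merged:
                return False
    return True


print(merged_equals_separate(seed=0))  # True
```

Merging terms from different partitions into one AV would fail. For example, the $\mathcal{G}_3$ coefficient and the $\mathcal{G}_6(1,2,3)$ coefficient have opposite signs at $\mathbf{x} = (1,1,0,0)$, which is why Lemma 3 requires a shared partition.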
Using Table 5, we rewrite $f_1$ as follows:

$$
\begin{aligned}
f_1(\mathbf{x}) = {} & g(\mathbf{x}) + \min_{z_f,z_i}\Big(\big(\alpha(2-x_k-x_l) + \sigma g_f(\mathbf{x})\big)z_f + \big(\alpha(1-x_i-x_j) + \beta(2-x_i-2x_j-x_k-x_l) && (71)\\
& \quad + \gamma(2-2x_i-x_j-x_k-x_l) + \delta(1-x_i-x_j-x_k) + \eta(1-x_i-x_j-x_l)\big)z_i - \alpha z_f z_i\Big) \\
= {} & g(\mathbf{x}) + \min_{z_f,z_i}\big(g'_f(\mathbf{x})\,z_f + g_i(\mathbf{x})\,z_i - \alpha z_f z_i\big) && (72)
\end{aligned}
$$

$$
\begin{array}{llll}
f_1(\mathbf{x}),\ \alpha \in \mathbb{R}^+ & f_2(\mathbf{x}),\ \beta \in \mathbb{R}^+ & \min_{z_1,z_2} h(\mathbf{x},z_1,z_2),\ h(\mathbf{x},z_1,z_2) = f_1(\mathbf{x}) + f_2(\mathbf{x})\ \forall \mathbf{x} & \text{AVs} \\
\hline
\alpha\mathcal{G}_9(i,j) & \beta\mathcal{G}_1(i,j) & -\beta x_ix_j + \alpha\min_{z_f,z_i}\big((2-x_k-x_l)z_f + (1-x_i-x_j)z_i - z_fz_i\big) & z_f,\ z_i \in I(kl) \\
\mathcal{G}_9(i,j) & \mathcal{G}_2(i,j,k) & -x_ix_j + \mathcal{G}_7(k) = -x_ix_j + \min_{z_f}(3-x_i-x_j-2x_k-x_l)z_f & z_f \\
\alpha\mathcal{G}_9(i,j) & \beta\mathcal{G}_2(j,k,l) & \min_{z_f,z_i}\big((\alpha(2-x_k-x_l) + \beta(2-x_j-x_k-x_l))z_f + \alpha(1-x_i-x_j)z_i - \alpha z_fz_i\big) & z_f,\ z_i \in I(kl) \\
\mathcal{G}_9(i,j) & \mathcal{G}_3 & -x_ix_j + \mathcal{G}_2(i,k,l) + \mathcal{G}_2(j,k,l) = -x_ix_j + \min_{z_f}\big((2-x_i-x_k-x_l) + (2-x_j-x_k-x_l)\big)z_f & z_f \\
\mathcal{G}_9(i,j) & \mathcal{G}_4 & \mathcal{G}_6(i,j,k) + \mathcal{G}_6(i,j,l) - x_kx_l = -x_kx_l + \min_{z_b}(2-2x_i-2x_j-x_k-x_l)z_b & z_b \\
\mathcal{G}_9(i,j) & \mathcal{G}_5(i,j,k) & \mathcal{G}_8 + \mathcal{G}_6(i,j,l) - x_kx_l = -x_kx_l + \min_{z_f}(2-x_i-x_j-x_k-x_l)z_f + \min_{z_b}(1-x_i-x_j-x_l)z_b & z_f,\ z_b \\
\alpha\mathcal{G}_9(i,j) & \beta\mathcal{G}_5(j,k,l) & \min_{z_f,z_i}\big(\alpha(2-x_k-x_l)z_f + (\alpha(1-x_i-x_j) + \beta(2-2x_i-x_j-x_k-x_l))z_i - \alpha z_fz_i\big) & z_f,\ z_i \in I(kl) \\
\alpha\mathcal{G}_9(i,j) & \beta\mathcal{G}_6(i,j,k) & \min_{z_f,z_i}\big(\alpha(2-x_k-x_l)z_f + (\alpha(1-x_i-x_j) + \beta(1-x_i-x_j-x_k))z_i - \alpha z_fz_i\big) & z_f,\ z_i \in I(kl) \\
\mathcal{G}_9(i,j) & \mathcal{G}_6(j,k,l) & \mathcal{G}_5(i,k,l) - x_kx_l = -x_kx_l + \min_{z_b}(2-x_i-2x_j-x_k-x_l)z_b & z_b \\
\mathcal{G}_9(i,j) & \mathcal{G}_7(i) & \mathcal{G}_8 - x_ix_j + \mathcal{G}_2(i,k,l) = -x_ix_j + \min_{z_f}\big((2-x_i-x_j-x_k-x_l) + (2-x_i-x_k-x_l)\big)z_f & z_f \\
\alpha\mathcal{G}_9(i,j) & \beta\mathcal{G}_7(k) & \min_{z_f,z_i}\big((\alpha(2-x_k-x_l) + \beta(3-x_i-x_j-2x_k-x_l))z_f + \alpha(1-x_i-x_j)z_i - \alpha z_fz_i\big) & z_f,\ z_i \in I(kl) \\
\alpha\mathcal{G}_9(i,j) & \beta\mathcal{G}_8 & \min_{z_f,z_i}\big((\alpha(2-x_k-x_l) + \beta(2-x_i-x_j-x_k-x_l))z_f + \alpha(1-x_i-x_j)z_i - \alpha z_fz_i\big) & z_f,\ z_i \in I(kl) \\
\alpha\mathcal{G}_9(i,j) & \beta\mathcal{G}_9(i,k) & \min_{z_f,z_i}\big((\alpha(2-x_k-x_l) + \beta(2-x_j-x_l))z_f + (\alpha(1-x_i-x_j) + \beta(1-x_i-x_k))z_i - \alpha z_fz_i - \beta z_fz_i\big) & z_f,\ z_i \in I(kl,jl) \\
\mathcal{G}_9(i,j) & \mathcal{G}_9(k,l) & \mathcal{G}_8 - x_ix_j - x_kx_l = -x_ix_j - x_kx_l + \min_{z_f}(2-x_i-x_j-x_k-x_l)z_f & z_f
\end{array}
$$

Table 2. The sum of $\mathcal{G}_9(i,j)$ with any other function in Table 1 can be expressed using two auxiliary variables. Here the index set $\{i,j,k,l\} = S_4$ denotes the four distinct integers $S_4 = \{1,2,3,4\}$.

$$
\begin{array}{lllll}
f_1(\mathbf{x}),\ \alpha \in \mathbb{R}^+ & f_2(\mathbf{x}),\ \beta \in \mathbb{R}^+ & f_3(\mathbf{x}),\ \gamma \in \mathbb{R}^+ & \min_{z_1,z_2} h(\mathbf{x},z_1,z_2),\ h(\mathbf{x},z_1,z_2) = f_1(\mathbf{x}) + f_2(\mathbf{x}) + f_3(\mathbf{x})\ \forall \mathbf{x} & \text{AVs} \\
\hline
\alpha\mathcal{G}_9(i,j) & \beta\mathcal{G}_5(i,k,l) & \gamma\mathcal{G}_5(j,k,l) & \min_{z_f,z_i}\big(\alpha(2-x_k-x_l)z_f + (\alpha(1-x_i-x_j) + \beta(2-x_i-2x_j-x_k-x_l) + \gamma(2-2x_i-x_j-x_k-x_l))z_i - \alpha z_fz_i\big) & z_f,\ z_i \in I(kl) \\
\alpha\mathcal{G}_9(i,j) & \beta\mathcal{G}_6(i,j,k) & \gamma\mathcal{G}_6(i,j,l) & \min_{z_f,z_i}\big(\alpha(2-x_k-x_l)z_f + (\alpha(1-x_i-x_j) + \beta(1-x_i-x_j-x_k) + \gamma(1-x_i-x_j-x_l))z_i - \alpha z_fz_i\big) & z_f,\ z_i \in I(kl) \\
\alpha\mathcal{G}_9(i,j) & \beta\mathcal{G}_5(j,k,l) & \gamma\mathcal{G}_6(i,j,l) & \min_{z_f,z_i}\big(\alpha(2-x_k-x_l)z_f + (\alpha(1-x_i-x_j) + \beta(2-2x_i-x_j-x_k-x_l) + \gamma(1-x_i-x_j-x_l))z_i - \alpha z_fz_i\big) & z_f,\ z_i \in I(kl) \\
\alpha\mathcal{G}_9(i,j) & \beta\mathcal{G}_9(i,k) & \gamma\mathcal{G}_5(j,k,l) & \min_{z_f,z_i}\big((\alpha(2-x_k-x_l) + \beta(2-x_j-x_l))z_f + (\alpha(1-x_i-x_j) + \beta(1-x_i-x_k) + \gamma(2-2x_i-x_j-x_k-x_l))z_i - \alpha z_fz_i - \beta z_fz_i\big) & z_f,\ z_i \in I(kl,jl) \\
\alpha\mathcal{G}_9(i,j) & \beta\mathcal{G}_9(i,k) & \gamma\mathcal{G}_6(i,j,k) & \min_{z_f,z_i}\big((\alpha(2-x_k-x_l) + \beta(2-x_j-x_l))z_f + (\alpha(1-x_i-x_j) + \beta(1-x_i-x_k) + \gamma(1-x_i-x_j-x_k))z_i - \alpha z_fz_i - \beta z_fz_i\big) & z_f,\ z_i \in I(kl,jl) \\
\mathcal{G}_9(i,j) & \mathcal{G}_9(i,k) & \mathcal{G}_9(i,l) & -x_jx_kx_l + \mathcal{G}_8 + \mathcal{G}_5(j,k,l) = -x_jx_kx_l + \min_{z_f}(2-x_i-x_j-x_k-x_l)z_f + \min_{z_b}(2-2x_i-x_j-x_k-x_l)z_b & z_f,\ z_b \\
\mathcal{G}_9(i,j) & \mathcal{G}_9(i,k) & \mathcal{G}_9(j,k) & \mathcal{G}_7(l) + \mathcal{G}_8 + \mathcal{G}_6(i,j,k) = \min_{z_f}\big((3-x_i-x_j-x_k-2x_l) + (2-x_i-x_j-x_k-x_l)\big)z_f + \min_{z_b}(1-x_i-x_j-x_k)z_b & z_f,\ z_b
\end{array}
$$

Table 3. The sum of any three functions from Table 1 can be expressed using two auxiliary variables. Here the index set $\{i,j,k,l\} = S_4$ denotes the four distinct integers $S_4 = \{1,2,3,4\}$.

$$
\begin{array}{llllll}
f_1(\mathbf{x}) & f_2(\mathbf{x}) & f_3(\mathbf{x}) & f_4(\mathbf{x}) & \min_{z_1,z_2} h(\mathbf{x},z_1,z_2),\ h(\mathbf{x},z_1,z_2) = f_1(\mathbf{x}) + f_2(\mathbf{x}) + f_3(\mathbf{x}) + f_4(\mathbf{x})\ \forall \mathbf{x} & \text{AVs} \\
\hline
\mathcal{G}_9(i,j) & \mathcal{G}_5(i,k,l) & \mathcal{G}_5(j,k,l) & \mathcal{G}_5(i,j,k) & \big(\mathcal{G}_9(i,j) + \mathcal{G}_5(i,j,k)\big) + \big(\mathcal{G}_5(i,k,l) + \mathcal{G}_5(j,k,l)\big) = \big(\mathcal{G}_8 + \mathcal{G}_6(i,j,l) + \mathcal{G}_1(k,l)\big) + \big(\mathcal{G}_5(i,k,l) + \mathcal{G}_5(j,k,l)\big) & z_f,\ z_b \\
\mathcal{G}_9(i,j) & \mathcal{G}_6(i,j,k) & \mathcal{G}_6(i,j,l) & \mathcal{G}_6(j,k,l) & \big(\mathcal{G}_9(i,j) + \mathcal{G}_6(j,k,l)\big) + \big(\mathcal{G}_6(i,j,k) + \mathcal{G}_6(i,j,l)\big) = \big(\mathcal{G}_5(i,k,l) + \mathcal{G}_1(k,l)\big) + \big(\mathcal{G}_6(i,j,k) + \mathcal{G}_6(i,j,l)\big) & z_f,\ z_b \\
\alpha\mathcal{G}_9(i,j) & \beta\mathcal{G}_5(i,k,l) & \gamma\mathcal{G}_5(j,k,l) & \delta\mathcal{G}_6(i,j,k) & \min_{z_f,z_i}\big(\alpha(2-x_k-x_l)z_f + (\alpha(1-x_i-x_j) + \beta(2-x_i-2x_j-x_k-x_l) + \gamma(2-2x_i-x_j-x_k-x_l) + \delta(1-x_i-x_j-x_k))z_i - \alpha z_fz_i\big) & z_f,\ z_i \in I(k,l) \\
\alpha\mathcal{G}_9(i,j) & \beta\mathcal{G}_6(i,j,k) & \gamma\mathcal{G}_6(i,j,l) & \delta\mathcal{G}_5(i,k,l) & \min_{z_f,z_i}\big(\alpha(2-x_k-x_l)z_f + (\alpha(1-x_i-x_j) + \beta(1-x_i-x_j-x_k) + \gamma(1-x_i-x_j-x_l) + \delta(2-x_i-2x_j-x_k-x_l))z_i - \alpha z_fz_i\big) & z_f,\ z_i \in I(k,l) \\
\alpha\mathcal{G}_9(i,j) & \beta\mathcal{G}_9(i,k) & \gamma\mathcal{G}_5(j,k,l) & \delta\mathcal{G}_6(i,j,k) & \min_{z_f,z_i}\big((\alpha(2-x_k-x_l) + \beta(2-x_j-x_l))z_f + (\alpha(1-x_i-x_j) + \beta(1-x_i-x_k) + \gamma(2-2x_i-x_j-x_k-x_l) + \delta(1-x_i-x_j-x_k))z_i - \alpha z_fz_i - \beta z_fz_i\big) & z_f,\ z_i \in I(k,l)
\end{array}
$$

Table 4. The sum of any four functions from Table 1 can be expressed using two auxiliary variables. Here the index set $\{i,j,k,l\} = S_4$ denotes the four distinct integers $S_4 = \{1,2,3,4\}$.

$$
\begin{array}{llllll}
f_1(\mathbf{x}),\ \alpha \in \mathbb{R}^+ & f_2(\mathbf{x}),\ \beta \in \mathbb{R}^+ & f_3(\mathbf{x}),\ \gamma \in \mathbb{R}^+ & f_4(\mathbf{x}),\ \delta \in \mathbb{R}^+ & f_5(\mathbf{x}),\ \eta \in \mathbb{R}^+ & \min_{z_1,z_2}\big(f_1(\mathbf{x}) + f_2(\mathbf{x}) + f_3(\mathbf{x}) + f_4(\mathbf{x}) + f_5(\mathbf{x})\big)\ \forall \mathbf{x} \\
\hline
\alpha\mathcal{G}_9(i,j) & \beta\mathcal{G}_5(i,k,l) & \gamma\mathcal{G}_5(j,k,l) & \delta\mathcal{G}_6(i,j,k) & \eta\mathcal{G}_6(i,j,l) & \min_{z_f,z_i}\big(\alpha(2-x_k-x_l)z_f + (\alpha(1-x_i-x_j) + \beta(2-x_i-2x_j-x_k-x_l) + \gamma(2-2x_i-x_j-x_k-x_l) + \delta(1-x_i-x_j-x_k) + \eta(1-x_i-x_j-x_l))z_i - \alpha z_fz_i\big)
\end{array}
$$

Table 5. The sum of five functions from Table 1 can be expressed using two auxiliary variables, $z_f$ and $z_i \in I(k,l)$. Here the index set $\{i,j,k,l\} = S_4$ denotes the four distinct integers $S_4 = \{1,2,3,4\}$.

Using the last row of Table 4, we rewrite $f_2$ as follows:

$$
\begin{aligned}
f_2(\mathbf{x}) = {} & g(\mathbf{x}) + \min_{z_f,z_i}\Big(\big(\alpha(2-x_k-x_l) + \beta(2-x_j-x_l) + \eta g_f(\mathbf{x})\big)z_f && (73)\\
& \quad + \big(\alpha(1-x_i-x_j) + \beta(1-x_i-x_k) + \gamma(2-2x_i-x_j-x_k-x_l) + \delta(1-x_i-x_j-x_k)\big)z_i - \alpha z_f z_i - \beta z_f z_i\Big) \\
= {} & g(\mathbf{x}) + \min_{z_f,z_i}\big(g'_f(\mathbf{x})\,z_f + g_i(\mathbf{x})\,z_i - (\alpha+\beta) z_f z_i\big) && (74)
\end{aligned}
$$

Thus $f_1$ and $f_2$ need only two AVs, $z_f$ and $z_s$. For $f_0$, $z_s$ is in the backward partition. For $f_1$ and $f_2$, $z_s$ belongs to one of the 18 intermediate partitions.
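Equations (72) and (74) claim that, for any non-negative weights, $f_1$ and $f_2$ collapse to a single pair $(z_f, z_i)$ with one bilinear term. The sketch below spot-checks both claims exhaustively over $\mathbf{x}$ with random non-negative integer weights. Three simplifications are assumptions of the sketch:

- $g(\mathbf{x})$ is set to zero.
- The forward part $g_f$ is a weighted $\mathcal{G}_8$ coefficient, $2-\sum_i x_i$, used as a simple stand-in.
- Indices are 0-based, with $(i,j,k,l) = (0,1,2,3)$.

```python
import itertools
import random


def mono(x, *idx):
    """Monomial x_a * x_b * ... over the given 0-based indices."""
    out = 1
    for a in idx:
        out *= x[a]
    return out


def rest(*idx):
    """Indices of {0, 1, 2, 3} not listed, in increasing order."""
    return sorted(set(range(4)) - set(idx))


# Polynomial forms of the Table 1 groups used in f1 and f2.
def G5(x, a, b, c):
    (d,) = rest(a, b, c)
    return mono(x, a, b, c, d) - mono(x, a, b, c) - mono(x, a, d) - mono(x, b, d) - mono(x, c, d)


def G6(x, a, b, c):
    return mono(x, a, b, c) - mono(x, a, b) - mono(x, a, c) - mono(x, b, c)


def G8(x):
    return 2 * mono(x, 0, 1, 2, 3) - sum(mono(x, *t) for t in itertools.combinations(range(4), 3))


def G9(x, a, b):
    c, d = rest(a, b)
    return mono(x, a, b, c, d) - mono(x, a, b) - mono(x, a, c, d) - mono(x, b, c, d)


def pair_min(cf, ci, cfi):
    """min over (z_f, z_i) of cf*z_f + ci*z_i - cfi*z_f*z_i."""
    return min(cf * zf + ci * zi - cfi * zf * zi for zf in (0, 1) for zi in (0, 1))


def check_f1_f2(seed=0, trials=300):
    rng = random.Random(seed)
    i, j, k, l = 0, 1, 2, 3
    for _ in range(trials):
        al, be, ga, de, et, si = (rng.randint(0, 4) for _ in range(6))
        for x in itertools.product((0, 1), repeat=4):
            gf = 2 - sum(x)  # G8 coefficient, used as the forward part g_f
            # f1: Eqs. (69) and (71)-(72)
            f1 = (al * G9(x, i, j) + be * G5(x, i, k, l) + ga * G5(x, j, k, l)
                  + de * G6(x, i, j, k) + et * G6(x, i, j, l) + si * G8(x))
            ci = (al * (1 - x[i] - x[j]) + be * (2 - x[i] - 2 * x[j] - x[k] - x[l])
                  + ga * (2 - 2 * x[i] - x[j] - x[k] - x[l])
                  + de * (1 - x[i] - x[j] - x[k]) + et * (1 - x[i] - x[j] - x[l]))
            if f1 != pair_min(al * (2 - x[k] - x[l]) + si * gf, ci, al):
                return False
            # f2: Eqs. (70) and (73)-(74); here et weights the forward term
            f2 = (al * G9(x, i, j) + be * G9(x, i, k) + ga * G5(x, j, k, l)
                  + de * G6(x, i, j, k) + et * G8(x))
            cf = al * (2 - x[k] - x[l]) + be * (2 - x[j] - x[l]) + et * gf
            ci = (al * (1 - x[i] - x[j]) + be * (1 - x[i] - x[k])
                  + ga * (2 - 2 * x[i] - x[j] - x[k] - x[l]) + de * (1 - x[i] - x[j] - x[k]))
            if f2 != pair_min(cf, ci, al + be):
                return False
    return True


print(check_f1_f2())  # True
```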
□

6 Linear Programming solution

For a given function $f(x_1,x_2,x_3,x_4)$ in $\mathcal{F}^4_2$, our goal is to compute a function $h(\mathbf{x},\mathbf{z})$ in $\mathcal{F}_2$. Theorem 2 shows that we need only two AVs, $(z_f, z_s)$. Here $z_f$ corresponds to the forward reference partition, and $z_s$ corresponds either to the backward partition or to one of the 18 intermediate reference partitions. Unfortunately, we do not know which of these 19 partitions is required before we do the transformation. In what follows, we describe the transformation assuming that the partition of $z_s$ is known. Note that $z_b$ is a special case of $z_i$, and we do not use the bilinear term $z_f z_s$ when $z_s = z_b$. To handle this condition, we use a Boolean variable that takes the value 0 when $z_s$ is in the backward reference partition and 1 otherwise:

$$ \delta(z_s) = \begin{cases} 0 & \text{if } z_s \in [\mathcal{A}_b, \mathcal{B}_b], \\ 1 & \text{otherwise.} \end{cases} \tag{75} $$

The required function $h(\mathbf{x},\mathbf{z})$ is:

$$ h(\mathbf{x}, z_f, z_s) = b_0 + \sum_i b_i x_i - \sum_{i>j} b_{ij} x_i x_j + \Big(g_f - \sum_{i=1}^{4} g_{f,i} x_i\Big) z_f + \Big(g_s - \sum_{i=1}^{4} g_{s,i} x_i\Big) z_s - \delta(z_s)\, j_{fs}\, z_f z_s, \tag{76} $$

such that $b_{ij}, g_{f,i}, g_{s,i}, j_{fs} \ge 0$ and $i,j \in S_4$. Since we know the partitions of $(z_f, z_s)$, we know their Boolean values for all labelings of $\mathbf{x}$. To compute $h(x_1,x_2,x_3,x_4,z_f,z_s)$ we need the coefficients $(b_0, b_i, b_{ij}, j_{fs}, g_f, g_s, g_{f,i}, g_{s,i})$, $i \in S_4$. These coefficients must satisfy two sets of constraints. The first are the submodularity constraints: the coefficients of all bilinear terms ($x_i x_j$, $x_i z_f$, $x_i z_s$, $z_f z_s$) must be less than or equal to zero. The second are the constraints imposed by the reference partitions. The submodularity conditions are:

$$ S_p = \begin{bmatrix} b_{ij} & g_{f,i} & g_{s,i} & j_{fs} \end{bmatrix}^T \ge \mathbf{0}, \quad i,j \in S_4,\ i \ne j, \tag{77} $$

where $\mathbf{0}$ denotes a vector of zeros of appropriate length.
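The sketch below makes Eq. (76) and the role of $\delta(z_s)$ concrete. It evaluates $h(\mathbf{x}, z_f, z_s)$ from a coefficient dictionary, checks the sign conditions (77), and minimizes over the two AVs by enumeration. The parameter values reproduce two rows of Table 1: $\mathcal{G}_9(i,j)$ with an intermediate $z_s$ (so $\delta = 1$), and $\mathcal{G}_6(i,j,k)$ with a backward $z_s$ (so $\delta = 0$). The dictionary layout and the 0-based indices are hypothetical choices. In the method described here, such coefficients would come from the LP (85)–(86), not from hand-picking.

```python
import itertools


def h76(x, zf, zs, p, delta):
    """Eq. (76). p is a coefficient dict with illustrative keys: b0, b (4),
    bij {(i, j): w} for i > j, gf, gfi (4), gs, gsi (4), jfs. delta is
    delta(z_s) from Eq. (75): 0 drops the z_f z_s term (backward z_s)."""
    val = p["b0"] + sum(p["b"][i] * x[i] for i in range(4))
    val -= sum(w * x[i] * x[j] for (i, j), w in p["bij"].items())
    val += (p["gf"] - sum(p["gfi"][i] * x[i] for i in range(4))) * zf
    val += (p["gs"] - sum(p["gsi"][i] * x[i] for i in range(4))) * zs
    val -= delta * p["jfs"] * zf * zs
    return val


def satisfies_77(p):
    """Eq. (77): the stored magnitudes of all bilinear coefficients are >= 0."""
    return all(v >= 0 for v in [*p["bij"].values(), *p["gfi"], *p["gsi"], p["jfs"]])


def min_h(x, p, delta):
    """Minimum over the two AVs by enumeration (tiny case only)."""
    return min(h76(x, zf, zs, p, delta) for zf in (0, 1) for zs in (0, 1))


ZERO = {"b0": 0, "b": [0] * 4, "bij": {}, "gf": 0, "gfi": [0] * 4,
        "gs": 0, "gsi": [0] * 4, "jfs": 0}
# G9(i,j) with (i,j,k,l) = (0,1,2,3): intermediate z_s, so delta = 1
p_g9 = dict(ZERO, gf=2, gfi=[0, 0, 1, 1], gs=1, gsi=[1, 1, 0, 0], jfs=1)
# G6(i,j,k) with (i,j,k) = (0,1,2): backward z_s, so delta = 0
p_g6 = dict(ZERO, gs=1, gsi=[1, 1, 1, 0])

X = list(itertools.product((0, 1), repeat=4))
g9 = lambda x: x[0] * x[1] * x[2] * x[3] - x[0] * x[1] - x[0] * x[2] * x[3] - x[1] * x[2] * x[3]
g6 = lambda x: x[0] * x[1] * x[2] - x[0] * x[1] - x[0] * x[2] - x[1] * x[2]
print(satisfies_77(p_g9), satisfies_77(p_g6))  # True True
print(all(min_h(x, p_g9, 1) == g9(x) for x in X),
      all(min_h(x, p_g6, 0) == g6(x) for x in X))  # True True
```

Evaluating the $\mathcal{G}_9$ parameters with $\delta = 0$ instead gives the wrong value at $\mathbf{x} = (1,1,1,1)$ ($-1$ rather than $-2$). The bilinear $z_f z_s$ term is therefore essential for intermediate partitions.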
Next we list the conditions which guarantee $f(x) = \min_{z_f, z_s} h(x, z_f, z_s),\ \forall x$. Let $\eta(S)$ be the value of $z_f z_s$ for $S \in P$. It can be obtained from the partitions of $z_f$ and $z_s$:

$$\eta(S) = \begin{cases} 1 & \text{if } S \in (B_f \cap B_s), \\ 0 & \text{otherwise.} \end{cases} \quad (78)$$

We denote the value of the AV $z_s$ for the different subsets of $S_4$ as follows:

$$z_s^S = \begin{cases} 1 & \text{if } S \in B_s, \\ 0 & \text{if } S \in A_s. \end{cases} \quad (79)$$

Let $G$ and $H$ denote the values of the functions $f$ and $h$, respectively:

$$G = f(1_{S_1}, 1_{S_2}, 1_{S_3}, 1_{S_4}), \quad \forall S \in P \quad (80)$$
$$H = h(1_{S_1}, 1_{S_2}, 1_{S_3}, 1_{S_4}, 0, 0) + \Big(g_f - \sum_{i=1}^{4} g_{f,i} 1_{S_i}\Big) z_f^S + \Big(g_s - \sum_{i=1}^{4} g_{s,i} 1_{S_i}\Big) z_s^S - \delta(z_s)\, \eta(S)\, j_{fs} \quad (81)$$

As a result we have the following 16 linear equations (N.B. there are $2^4 = 16$ different $S$):

$$G = H, \quad \forall S \in P \quad (82)$$

We already know the partitions of $(z_f, z_s)$, and hence their values, a priori. The following constraints ensure that $z_f$ and $z_s$ behave according to their associated partitions:

$$G_g := \begin{bmatrix} g_f - \sum_{i=1}^{4} g_{f,i} 1_{S_i} \\ g_s - \sum_{i=1}^{4} g_{s,i} 1_{D_i} \end{bmatrix} \ge \mathbf{0}, \quad S \in A_f,\ D \in A_s \quad (83)$$
$$G_l := \begin{bmatrix} g_f - \sum_{i=1}^{4} g_{f,i} 1_{S_i} - \delta(z_s)\, \eta(S)\, j_{fs} \\ g_s - \sum_{i=1}^{4} g_{s,i} 1_{D_i} - \delta(z_s)\, \eta(D)\, j_{fs} \end{bmatrix} \le \mathbf{0}, \quad S \in B_f,\ D \in B_s. \quad (84)$$

We need to compute the coefficients $(b_0, b_i, b_{ij}, g_f, g_{f,i}, g_s, g_{s,i}, j_{fs})$ that satisfy Equations (77), (82), (83), and (84). This is equivalent to finding a feasible point of a linear program with a constant objective:

$$\min \ \text{const} \quad (85)$$
$$\text{s.t.} \quad S_p \ge \mathbf{0},\ G = H,\ G_g \ge \mathbf{0},\ G_l \le \mathbf{0} \quad (86)$$

In the above LP formulation we assumed that we know the partitions of the AVs $z_f$ and $z_s$. However, $z_s$ can correspond to any of the 19 partitions, and before the transformation it is not easy to know which one is necessary. We therefore solve the LP 19 times, once for each candidate partition, to identify the necessary one. For the correct partition, the LP has a feasible solution that satisfies all the constraints.
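The partitions fix $z_f^S$, $z_s^S$ and $\eta(S)$ for every subset $S$. As a result, each equation $G = H$ in Eq. (82) is linear in the 22 unknowns $b_0, b_i, b_{ij}, g_f, g_{f,i}, g_s, g_{s,i}, j_{fs}$. The following Python sketch builds those 16 rows. Subsets are written as frozensets of indices in $\{1,2,3,4\}$, and $B_f$, $B_s$ are given as sets of such subsets. The variable ordering in `NAMES` and the function names are hypothetical choices for illustration.

```python
from itertools import combinations, product

N = 4
PAIRS = list(combinations(range(1, N + 1), 2))  # 1-based pairs (i, j) with i < j
# Hypothetical fixed ordering of the 22 LP unknowns
NAMES = (["b0"] + [f"b{i}" for i in range(1, N + 1)]
         + [f"b{i}{j}" for i, j in PAIRS]
         + ["gf"] + [f"gf{i}" for i in range(1, N + 1)]
         + ["gs"] + [f"gs{i}" for i in range(1, N + 1)]
         + ["jfs"])


def equality_row(S, Bf, Bs, delta):
    """Multipliers of the unknowns in H (Eq. 81) for subset S, given the AV partitions."""
    x = {i: int(i in S) for i in range(1, N + 1)}  # labeling indicated by S
    zf, zs = int(S in Bf), int(S in Bs)  # z_f^S and z_s^S, cf. Eq. (79)
    eta = zf * zs  # Eq. (78): 1 iff S lies in both B_f and B_s
    row = {"b0": 1, "gf": zf, "gs": zs, "jfs": -delta * eta}
    for i in range(1, N + 1):
        row[f"b{i}"] = x[i]
        row[f"gf{i}"] = -x[i] * zf
        row[f"gs{i}"] = -x[i] * zs
    for i, j in PAIRS:
        row[f"b{i}{j}"] = -x[i] * x[j]
    return [row[name] for name in NAMES]


def equality_system(f, Bf, Bs, delta):
    """Rows A and right-hand side G of the 16 equations G = H (Eq. 82)."""
    A, G = [], []
    for bits in product((0, 1), repeat=N):
        S = frozenset(i + 1 for i in range(N) if bits[i])
        A.append(equality_row(S, Bf, Bs, delta))
        G.append(f(bits))
    return A, G
```

Consider the toy instance $f(x) = -x_1x_2x_3x_4$ with $B_f = \{S_4\}$, $B_s = \emptyset$ and $\delta(z_s) = 0$. The coefficient vector with $g_f = 3$, $g_{f,i} = 1$ and all other entries zero satisfies all 16 rows. Constraints (77), (83) and (84) are linear inequalities in the same unknowns. The rows could therefore be passed, together with those inequalities and a zero objective, to an off-the-shelf LP solver, for example as `A_eq` and `b_eq` of SciPy's `scipy.optimize.linprog`. This would be repeated for each candidate partition of $z_s$.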
Fig. 5. We generated submodular functions using non-negatively weighted sums of functions from groups $G_i$, $i \in \{2, \ldots, 9\}$. The x-axis denotes the number of functions chosen from $G_i$, $i \in \{2, \ldots, 8\}$, in generating the submodular functions, and the y-axis gives the percentage of transformations requiring $z_i$ in an intermediate partition.

7 Experiments

According to existing results [49], the functions in the class $\mathcal{F}^4_2$ can be transformed to functions in $\mathcal{F}^2$ using 25 AVs. We show that a linear program can perform this transformation using only two AVs. In our Matlab implementation the transformation takes around 0.03 seconds, and an efficient C++ implementation could reduce this further. In the experiments shown in Figure 5, we generated submodular functions as non-negatively weighted sums of functions from $G_9$ and functions from groups $G_i$, $i \in \{2, \ldots, 8\}$. We do not consider functions from group $G_1$ since they do not require any AVs. The number of functions $n_G$ used from groups $G_i$, $i \in \{2, \ldots, 8\}$, is increased from 0 to 19. For each value of $n_G$, we generated 1000 functions, with non-negative weights drawn randomly from the interval $[0, 1]$. We observed that as $n_G$ increases, the generated submodular functions are less likely to use intermediate AVs. This agrees with Table 2: many combinations of $G_9$ with other functions can be represented using functions in the first 8 groups ($G_1$ to $G_8$), which do not require any AVs in an intermediate partition.

8 Discussion and open problems

Reducing higher order functions to quadratic ones is beneficial for developing efficient minimization algorithms. Existing reduction techniques can be broadly classified into two types: submodularity-preserving [1,28,18,48,38,14,40,50,42] and general techniques [41,12,15,2,21].
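This discussion compares two AV bounds. The first is Ishikawa's bound $G(k) = 2^{k-2}(k-3) + 1$ for general reductions. The second is the Dedekind number $D(k)$, the number of monotone Boolean functions of $k$ variables. As a small sanity check, the Python sketch below evaluates $G(k)$ and counts monotone Boolean functions by brute force; the function names are our own. Enumeration is only practical up to $k = 4$, where it gives $D(4) = 168$.

```python
from itertools import product


def ishikawa_bound(k):
    """G(k) = 2^(k-2) * (k-3) + 1, the AV bound for general reductions of a k-th order function."""
    return 2 ** (k - 2) * (k - 3) + 1


def dedekind(n):
    """Count monotone Boolean functions of n variables by brute force (practical only for n <= 4)."""
    points = list(product((0, 1), repeat=n))
    # Covering pairs (p, q): q is p with exactly one 0 flipped to 1. Monotonicity on all
    # covering pairs implies monotonicity on the whole Boolean lattice.
    covers = [(a, b) for a, p in enumerate(points) for b, q in enumerate(points)
              if sum(q) == sum(p) + 1 and all(pi <= qi for pi, qi in zip(p, q))]
    count = 0
    for values in product((0, 1), repeat=len(points)):  # every truth table on 2^n points
        if all(values[a] <= values[b] for a, b in covers):
            count += 1
    return count
```

For $k = 3, \ldots, 8$ the first function gives 1, 5, 17, 49, 129 and 321. For $n = 0, \ldots, 4$ the second gives 2, 3, 6, 20 and 168. The count includes the two constant functions.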
This paper belongs to the submodularity-preserving class of algorithms, in which higher order submodular functions are transformed to quadratic submodular functions using AVs. The general techniques are usually employed together with roof-duality approaches for minimizing non-submodular functions [4,3,5,43]. The general techniques also employ AVs, but these AVs need not be MBFs. The existing upper bound for general reduction techniques is $G(k) = 2^{k-2}(k-3) + 1$ AVs for a $k$th order function. Table 6 compares the number of AVs used in general techniques and in submodularity-preserving techniques. Note that the upper bound on the number of AVs required for a submodularity-preserving transformation is much higher than for general reduction techniques. We have improved the upper bound for submodular functions from $2^{2^k}$ to the Dedekind number $D(k)$. In the case of fourth order functions we have further improved the upper bound from 168 ($D(4)$) to 2.

Type | Method | k = 3 | k = 4 | k = 5 | k = 6 | k = 7 | k = 8
General | Ishikawa [21] | 1 | 5 | 17 | 49 | 129 | 321
Submodularity preserving | Dedekind [25] | 1 | 2 | 7581 | $\approx 7.8 \times 10^{6}$ | $\approx 2.4 \times 10^{12}$ | $\approx 5.6 \times 10^{23}$

Table 6. Comparison of the number of AVs used by general versus submodularity-preserving techniques, for functions of degree $k$. The Dedekind number $D(k)$ is unknown for $k > 8$.

Acknowledgments: Srikumar Ramalingam would like to thank Mitsubishi Electric Research Laboratories (MERL) for its support. L'ubor Ladický is funded by a Max Planck Center for Learning Systems Fellowship. Philip H.S. Torr would like to acknowledge the financial support provided by ERC grant ERC-2012-AdG 321162-HELIOS, EPSRC/MURI grant ref EP/N019474/1, EPSRC grant EP/M013774/, and EPSRC Programme Grant Seebibyte EP/M013774/1.

References

1. A. Billionnet and M. Minoux. Maximizing a supermodular pseudo-boolean function: a polynomial algorithm for supermodular cubic functions. Discrete Appl. Math., 12(1):1–11, 1985.
2. E. Boros and A. Gruber. On quadratization of pseudo-boolean functions. In ISAIM, 2012.
3. E. Boros and P. L. Hammer. Pseudo-boolean optimization. Discrete Appl. Math., 123(1-3):155–225, 2002.
4. E. Boros and P.L. Hammer. A max-flow approach to improved roof-duality in quadratic 0-1 minimization. Technical Report RRR 15-1989, RUTCOR, Rutgers University, 1989.
5. E. Boros, P.L. Hammer, R. Sun, and G. Tavares. A max-flow approach to improved lower bounds for quadratic unconstrained binary optimization (QUBO). Discrete Optimization, 5(2):501–529, 2008.
6. D.A. Cohen, M.C. Cooper, P. Creed, P.G. Jeavons, and S. Zivny. An algebraic theory of complexity for discrete optimization. SIAM Journal of Computing, 42(5):1915–193, 2013.
7. M.C. Cooper. Minimization of locally defined submodular functions by optimal soft arc consistency. Constraints, 13(4):437–458, 2008.
8. Y. Crama and P.L. Hammer. Boolean Functions: Theory, Algorithms and Applications. Cambridge University Press, 2011.
9. W.H. Cunningham. On submodular function minimization. Combinatorica, 5:185–192, 1985.
10. J. Edmonds. Submodular functions, matroids and certain polyhedra. Calgary International Conference on Combinatorial Structures and their Applications, pages 69–87, 1969.
11. M.L. Fisher, G.L. Nemhauser, and L.A. Wolsey. An analysis of approximation for maximizing submodular set functions-I. Mathematical Programming Studies, 8:73–87, 1978.
12. A. Fix, A. Gruber, E. Boros, and R. Zabih. A graph cut algorithm for higher-order Markov random fields. In ICCV, pages 1020–1027, 2011.
13. L. Fleischer and S. Iwata. A push-relabel framework for submodular function minimization and applications to parametric optimization. Discrete Applied Mathematics, 131(2):311–322, 2001.
14. D. Freedman and P. Drineas. Energy minimization via graph cuts: Settling what is possible. In CVPR, volume 2, pages 939–946, 2005.
15. A.C. Gallagher, D. Batra, and D. Parikh. Inference for order reduction in Markov random fields. In CVPR, pages 1857–1864, 2011.
16. G. Gallo and B. Simeone. On the supermodular knapsack problem. Mathematical Programming: Series A and B, 45(2):295–309, 1989.
17. M. Grötschel, L. Lovász, and A. Schrijver. The ellipsoid method and its consequences in combinatorial optimization. Combinatorica, 1(2):169–197, 1981.
18. P. L. Hammer. Some network flow problems solved with pseudo-boolean programming. Operations Research, 13:388–399, 1965.
19. G. Hansel. Sur le nombre des fonctions booleennes monotones de n variables. C.R. Acad. Sci. Paris, 1966.
20. H. Ishikawa. Exact optimization for Markov random fields with convex priors. PAMI, 25:1333–1336, 2003.
21. H. Ishikawa. Transformation of general binary MRF minimization to the first-order case. PAMI, 13(6):1234–1249, 2011.
22. S. Iwata. A fully combinatorial algorithm for submodular function minimization. Journal of Combinatorial Theory, Series B, 84(2):203–212, 2002.
23. S. Iwata. A faster scaling algorithm for minimizing submodular functions. SIAM J. Computing, 2337:1–8, 2003.
24. S. Iwata, L. Fleischer, and S. Fujishige. A combinatorial strongly polynomial algorithm for minimizing submodular functions. J. ACM, 48(4):761–777, 2001.
25. D. Kleitman. On Dedekind's problem: The number of monotone boolean functions. Proc. Amer. Math Soc., 1969.
26. P. Kohli, M. P. Kumar, and P. H. S. Torr. P³ & beyond: Solving energies with higher order cliques. In CVPR, pages 1–8, 2007.
27. V. Kolmogorov. Minimizing a sum of submodular functions. Discrete Applied Mathematics, 160(15):2246–2258, 2012.
28. V. Kolmogorov and R. Zabih. What energy functions can be minimized via graph cuts? PAMI, 26(2), 2004.
29. A.D. Korshunov. The number of monotone boolean functions. Problemy Kibernet 38:5-108, 1981.
30. L. Ladicky, C. Russell, P. Kohli, and P. Torr. Graph cut based inference with co-occurrence statistics. In European Conference on Computer Vision, pages 239–253. Springer, 2010.
31. X. Lan, S. Roth, D. P. Huttenlocher, and M. J. Black. Efficient belief propagation with learned higher-order Markov random fields. In ECCV, pages 269–282, 2006.
32. Y.T. Lee, A. Sidford, and S.C. Wong. A faster cutting plane method and its implications for combinatorial and convex optimization. In IEEE 56th Annual Symposium on Foundations of Computer Science, 2015.
33. L. Lovasz. Submodular functions and convexity. Mathematical Programming - The State of the Art, pages 235–257, 1983.
34. M. Narasimhan and J.A. Bilmes. A submodular-supermodular procedure with applications to discriminative structure learning. In Uncertainty in Artificial Intelligence, pages 404–412, 2005.
35. R. Nishihara, S. Jegelka, and M.I. Jordan. On the convergence rate of decomposable submodular function minimization. In NIPS, 2014.
36. J.B. Orlin. A faster strongly polynomial time algorithm for submodular function minimization. Mathematical Programming, 118(2):237–251, 2009.
37. S. Promislow and V. Young. Supermodular functions on finite lattices. Order 22(4), 2005.
38. M. Queyranne. A combinatorial algorithm for minimizing symmetric submodular functions. SODA, pages 98–101, 1995.
39. S. Ramalingam, P. Kohli, K. Alahari, and P.H.S. Torr. Exact inference in multi-label CRFs with higher order cliques. In CVPR, pages 1–8, 2008.
40. J.M.W. Rhys. A selection problem of shared fixed costs and network flows. Management Science, pages 200–207, 1970.
41. I. G. Rosenberg. Reduction of bivalent maximization to the quadratic case. Cahiers du Centre d'Etudes de Recherche Operationnelle, 17:71–74, 1975.
42. C. Rother, P. Kohli, W. Feng, and J. Jia. Minimizing sparse higher order energy functions of discrete variables. In CVPR, 2009.
43. C. Rother, V. Kolmogorov, V. Lempitsky, and M. Szummer. Optimizing binary MRFs via extended roof duality. In CVPR, 2007.
44. D. Schlesinger and B. Flach. Transforming an arbitrary minsum problem into a binary one. Technical Report TUD-FI06-01, Dresden University of Technology, 2006.
45. A. Schrijver. A combinatorial algorithm minimizing submodular functions in strongly polynomial time. Journal of Combinatorial Theory Series B, 80(2):346–355, 2000.
46. L. Trevisan, G.B. Sorkin, M. Sudan, and D.P. Williamson. Gadgets, approximation, and linear programming. SIAM Journal of Computing, 29(6):2074–2097, 2000.
47. A.L. Yuille and A. Rangarajan. The concave-convex procedure (CCCP). In Advances in Neural Information Processing Systems 14. MIT Press, 2002.
48. B. Zalesky. Efficient determination of Gibbs estimators with submodular energy functions. http://arxiv.org/abs/math/0304041v1, 2003.
49. S. Zivny, D.A. Cohen, and P.G. Jeavons. The expressive power of binary submodular functions. Discrete Applied Mathematics, 157(15):3347–3358, 2009.
50. S. Zivny and P.G. Jeavons. Classes of submodular constraints expressible by graph cuts. Proceedings of CP, pages 112–127, 2008.
51. S. Zivny and P.G. Jeavons. Which submodular functions are expressible using binary submodular functions? Oxford University Computing Laboratory Research Report CS-RR-08-08, 2008.
52. S. Zivny and P.G. Jeavons. Classes of submodular constraints expressible by graph cuts. Constraints, 15(3):430–452, 2010.
