Common Permutation Problem
In this paper we show that the following problem is NP-complete: Given an alphabet $\Sigma$ and two strings over $\Sigma$, the question is whether there exists a permutation of $\Sigma$ which is a subsequence of both of the given strings.
Authors: Marian Dvorsky
Marián Dvorský (marian.dvorsky@gmail.com)
October 29, 2018

1 Introduction

In computer science, efficient algorithms for various string problems are studied. One such problem is the well-known Longest Common Subsequence problem: for two given strings, the task is to find the longest string which is a subsequence of both strings. A survey of efficient algorithms for this problem can be found in [1].

Let us consider a modification of the Longest Common Subsequence problem. Instead of finding any longest common subsequence, we restrict ourselves to subsequences in which symbols do not repeat, i.e., every symbol occurs at most once. We call this problem Longest Restricted Common Subsequence.$^1$

Example. For the strings "bcaba" and "babcca", the longest common subsequence is "baba" while the longest restricted common subsequence is "bca".

Longest Restricted Common Subsequence is an optimization problem. In this paper we consider its special case, which is the following decision problem: Suppose that the two strings are formed over an alphabet $\Sigma$. The question is, do the two strings contain a restricted common subsequence of the maximal possible length, i.e., a string that contains every symbol of $\Sigma$ exactly once? Such a string is a permutation of $\Sigma$. Therefore, we call this problem the Common Permutation problem.

Common Permutation
Instance: An alphabet $\Sigma$ and two strings $a$, $b$ over $\Sigma$.
Question: Is there a permutation of $\Sigma$ which is a common subsequence of $a$ and $b$?

We will show that Common Permutation is NP-complete.
Moreover, we will show that Common Permutation is NP-complete even if the input strings contain every symbol of $\Sigma$ at most twice.

Common Permutation can be reduced to Longest Restricted Common Subsequence by asking whether the length of the longest restricted common subsequence of the two strings is equal to the size of the alphabet. Since Common Permutation will be shown to be NP-complete, it follows that Longest Restricted Common Subsequence is NP-hard.

In the next section we define the terms used in this paper. Section 3 introduces alignments as a way to visualize the Common Permutation problem. Finally, Section 4 presents the proof of NP-completeness by reducing 3SAT to Common Permutation.

$^1$ A more general version of this problem (with the same name) appeared in [3] together with its efficient solution. Unfortunately, that solution is incorrect. Our result in this paper shows that an efficient (polynomial) solution for this problem does not exist unless P = NP.

2 Preliminaries

An alphabet is a finite set of symbols. A string over an alphabet $\Sigma$ is a finite sequence $a = a_1 a_2 \ldots a_N$ where $N$ is the length of the string and $a_i \in \Sigma$ for all $i \in \{1, \ldots, N\}$. We say that $a_i$ is the symbol at position $i$ in the string $a$. For a given symbol $x \in \Sigma$, the occurrences of $x$ in $a$ are all positions $i$ such that $a_i = x$.

A subsequence of a string $a = a_1 a_2 \ldots a_N$ over $\Sigma$ is a string $b = a_{i_1} a_{i_2} \ldots a_{i_n}$ where $n \in \{0, 1, \ldots, N\}$ and $1 \le i_1 < i_2 < \cdots < i_n \le N$. A common subsequence of two strings $a$ and $b$ is a string which is a subsequence of both $a$ and $b$.

A permutation of a finite set $A = \{x_1, \ldots, x_n\}$ is a string $x_{i_1} x_{i_2} \ldots x_{i_n}$ (note that the length of the string is the same as the number of elements in $A$) where $i_j \in \{1, \ldots, n\}$ for $j \in \{1, \ldots, n\}$ and, for all $k, l \in \{1, \ldots, n\}$, if $k \ne l$ then $i_k \ne i_l$.
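To make the definitions of subsequence and permutation concrete, the following Python sketch decides Common Permutation by brute force: it enumerates every permutation of $\Sigma$ and tests it as a subsequence of both strings. The function names `is_subsequence` and `common_permutation` are illustrative choices, not part of the paper, and the running time is exponential in the size of $\Sigma$, so this is only a checker for small instances.

```python
from itertools import permutations


def is_subsequence(s, t):
    """Return True if the sequence s is a subsequence of the sequence t."""
    it = iter(t)
    # `ch in it` advances the iterator until ch is found, so the matches
    # are forced to occur at strictly increasing positions of t.
    return all(ch in it for ch in s)


def common_permutation(sigma, a, b):
    """Return a permutation of sigma that is a common subsequence of a and b.

    Returns None if no such permutation exists. Brute force over all
    |sigma|! permutations (an assumption made for clarity, not efficiency).
    """
    for perm in permutations(sorted(set(sigma))):
        if is_subsequence(perm, a) and is_subsequence(perm, b):
            return "".join(perm)
    return None


if __name__ == "__main__":
    # Example strings from Section 1, with sigma = {a, b, c}.
    print(common_permutation("abc", "bcaba", "babcca"))
```

For the strings "bcaba" and "babcca" from Section 1 the call returns "bca", which agrees with the longest restricted common subsequence given there.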
The above definitions give a formal basis for the statement of the problem from Section 1. For the proof of NP-completeness in Section 4 we use a reduction from 3-Satisfiability (3SAT for short). The following definitions are from [2].

2.1 3-Satisfiability

Let $U = \{u_1, u_2, \ldots, u_n\}$ be a set of Boolean variables. A truth assignment for $U$ is a function $t : U \to \{T, F\}$. If $t(u) = T$ we say that $u$ is true under $t$; if $t(u) = F$ we say that $u$ is false. If $u$ is a variable in $U$, then $u$ and $\bar{u}$ are literals over $U$. The literal $u$ is true under $t$ if and only if the variable $u$ is true under $t$; the literal $\bar{u}$ is true if and only if the variable $u$ is false.

A clause over $U$ is a set of literals over $U$, for example $\{u_1, \bar{u}_3, u_8\}$. It represents the disjunction of those literals and is satisfied by a truth assignment if and only if at least one of its members is true under that assignment. In other words, the clause is not satisfied if and only if all its literals are false. The clause above will be satisfied by $t$ unless $t(u_1) = F$, $t(u_3) = T$, $t(u_8) = F$.

A collection $C$ of clauses over $U$ is satisfiable if and only if there exists some truth assignment for $U$ that simultaneously satisfies all the clauses in $C$. Such a truth assignment is called a satisfying truth assignment for $C$.

3SAT
Instance: A set $U$ of variables and a collection $C$ of clauses over $U$ with exactly three literals per clause.
Question: Is there a satisfying truth assignment for $C$?

Theorem 2.1 3SAT is NP-complete.

See [2] for the definition of NP-completeness and for the proof of this theorem.

3 Alignments

In Section 4 we will use the notion of alignments. Imagine the two input strings of Common Permutation written in two rows, one string per row.
For every symbol of the alphabet $\Sigma$ we want to find exactly one occurrence of that symbol in each of the two strings, such that we can "align" those occurrences.

Example. For the two strings "bcaba" and "babcca", one of the possible alignments is depicted below (the aligned occurrences are bold):

$\mathbf{b}\ \mathbf{c}\ \mathbf{a}\ b\ a$
$\mathbf{b}\ a\ b\ \mathbf{c}\ c\ \mathbf{a}$

Formally, let $a$ and $b$ be strings over an alphabet $\Sigma$. Let $n$ be the number of symbols in $\Sigma$. An alignment (denoted $\mathcal{A}$) of $a$ and $b$ is a sequence of ordered pairs

$\mathcal{A} = \langle i_1, j_1 \rangle, \langle i_2, j_2 \rangle, \ldots, \langle i_n, j_n \rangle$

such that for all $k$, the value of $i_k$ is a position in the string $a$, $j_k$ is a position in the string $b$, and $a_{i_k} = b_{j_k}$. Moreover, $i_1 < \cdots < i_n$, $j_1 < \cdots < j_n$, and $a_{i_1} a_{i_2} \ldots a_{i_n}$ $(= b_{j_1} b_{j_2} \ldots b_{j_n})$ is a permutation of $\Sigma$.

For all $k$ we say that, in the alignment $\mathcal{A}$, the position $i_k$ in the string $a$ is aligned with the position $j_k$ in the string $b$. We also say that the symbol $a_{i_k}$ $(= b_{j_k})$ is aligned at the position $i_k$ in $a$ and at the position $j_k$ in $b$. Positions $i_k$ and $j_k$ are aligned occurrences of $a_{i_k}$.

Notice that once a position $i$ (in $a$) is aligned with a position $j$ (in $b$), positions less than $i$ (in $a$) cannot be aligned with positions greater than $j$ (in $b$) and vice versa. In other words, the aligned occurrences of different symbols cannot "cross".

Lemma 3.1 Let $a$ and $b$ be two strings over $\Sigma$. A permutation of $\Sigma$ which is a common subsequence of $a$ and $b$ exists if and only if there exists at least one alignment of $a$ and $b$.

Proof. An alignment corresponds to subsequences in $a$ and $b$ which comprise a (common) permutation of $\Sigma$.

According to this lemma, an alignment of two strings is an existence proof (of polynomial size with respect to the lengths of the strings) for an instance of Common Permutation. Therefore, Common Permutation is in NP. The proof of NP-completeness follows in the next section.
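The membership argument rests on the fact that a proposed alignment can be checked quickly. The following Python sketch of such a certificate check makes a single pass over the pairs. The function name `is_alignment` and the representation of an alignment as a list of 1-based `(i, j)` tuples are assumptions made for illustration.

```python
def is_alignment(sigma, a, b, pairs):
    """Check whether `pairs` is an alignment of strings a and b over sigma.

    `pairs` is a list of (i, j) tuples with 1-based positions, following
    the notation of the formal definition. Runs in O(len(pairs)) time.
    """
    sigma = set(sigma)
    if len(pairs) != len(sigma):
        return False
    prev_i = prev_j = 0
    seen = set()
    for i, j in pairs:
        # Positions must be valid and strictly increasing in both strings.
        if not (prev_i < i <= len(a) and prev_j < j <= len(b)):
            return False
        symbol = a[i - 1]
        # Aligned positions must carry the same symbol of sigma, and no
        # symbol may be aligned twice.
        if symbol != b[j - 1] or symbol not in sigma or symbol in seen:
            return False
        seen.add(symbol)
        prev_i, prev_j = i, j
    # len(pairs) == |sigma| with distinct symbols from sigma means the
    # aligned symbols form a permutation of sigma.
    return True


if __name__ == "__main__":
    # One valid alignment of the example strings: b, c, a in this order.
    print(is_alignment("abc", "bcaba", "babcca", [(1, 1), (2, 4), (3, 6)]))
```

The check touches each pair once, which is consistent with the certificate being verifiable in polynomial time.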
4 Reduction

In this section we will reduce 3SAT to Common Permutation.

Theorem 4.1 Common Permutation is NP-complete.

Proof. Let $U$ be a finite set of variables and $C = \{c_1, c_2, \ldots, c_n\}$ be a set of clauses over $U$. We have to construct an alphabet $\Sigma$ and two strings $a$, $b$ over $\Sigma$ such that there exists a permutation of $\Sigma$ which is a common subsequence of both $a$ and $b$ if and only if $C$ is satisfiable.

The proof consists of two parts. The first part presents the construction of $\Sigma$ and the strings. The second part proves that the construction is correct in the sense that it satisfies the property described above.

Construction

The alphabet $\Sigma$ consists of a pair of symbols $u_i$ and $\bar{u}_i$ for every variable $u \in U$ and every clause $c_i$ for which either $u \in c_i$ or $\bar{u} \in c_i$. Additionally, $\Sigma$ contains a special "boundary" symbol $\bullet$.

The strings $a$ and $b$ have two parts: a "truth-setting" part and a "satisfaction-testing" part. The parts are separated by the boundary symbol, which ensures that occurrences from one part cannot be aligned with occurrences from the other part of the strings.

The "truth-setting" part consists of a concatenation of blocks, one for each variable. Let $u$ be a variable from $U$ and let $\{i_1, \ldots, i_m\}$ be the indices of the clauses in which it appears. The strings contain the following block for the variable $u$:

$a = \ldots\ u_{i_1} u_{i_2} \ldots u_{i_m}\ \bar{u}_{i_1} \bar{u}_{i_2} \ldots \bar{u}_{i_m}\ \ldots$
$b = \ldots\ \bar{u}_{i_1} \bar{u}_{i_2} \ldots \bar{u}_{i_m}\ u_{i_1} u_{i_2} \ldots u_{i_m}\ \ldots$

This block is constructed in such a way that it is possible to simultaneously align all the symbols $\{u_{i_1}, u_{i_2}, \ldots, u_{i_m}\}$ inside this block, or all the symbols $\{\bar{u}_{i_1}, \bar{u}_{i_2}, \ldots, \bar{u}_{i_m}\}$. It is, however, not possible to simultaneously align both $u_i$ and $\bar{u}_j$ for some $i$ and $j$ inside this block.

The "satisfaction-testing" part consists of a concatenation of blocks, one for each clause.
For a clause $c_i \in C$, let $x$, $y$, and $z$ be the literals in the clause $c_i$, i.e., $c_i = \{x, y, z\}$. We use the following notation:
• if $x = u$ for $u \in U$, then $x_i = u_i$ and $\bar{x}_i = \bar{u}_i$
• if $x = \bar{u}$ for $u \in U$, then $x_i = \bar{u}_i$ and $\bar{x}_i = u_i$

The strings contain the following block for the clause $c_i$:

$a = \ldots\ \bar{x}_i \bar{y}_i \bar{z}_i\ x_i y_i z_i\ \ldots$
$b = \ldots\ \bar{x}_i \bar{y}_i \bar{z}_i\ y_i x_i z_i y_i\ \ldots$

The block has two parts. The left part is the same for both strings. The right part is constructed in such a way that the symbols $\{x_i, y_i, z_i\}$ cannot be simultaneously aligned in this block. Notice that these are the symbols corresponding to the truth assignment for which the clause is false.

The alphabet $\Sigma$ contains $6n + 1$ symbols. The length of the string $a$ is $6n + 1 + 6n = 12n + 1$; the length of the string $b$ is $6n + 1 + 7n = 13n + 1$. Therefore, the size of the constructed Common Permutation instance is polynomial with respect to the original 3SAT instance. The construction can be carried out in polynomial time.

Example. For a set of variables $\{w, x, y, z\}$ and clauses $\{\{w, \bar{x}, y\}, \{\bar{z}, x, \bar{y}\}\}$, which represent the logical function $(w \vee \bar{x} \vee y) \wedge (\bar{z} \vee x \vee \bar{y})$, we get the alphabet

$\Sigma = \{w_1, \bar{w}_1, x_1, \bar{x}_1, x_2, \bar{x}_2, y_1, \bar{y}_1, y_2, \bar{y}_2, z_2, \bar{z}_2, \bullet\}$

and the following strings:

$$a = w_1 \bar{w}_1\ \ x_1 x_2 \bar{x}_1 \bar{x}_2\ \ y_1 y_2 \bar{y}_1 \bar{y}_2\ \ z_2 \bar{z}_2\ \ \bullet\ \ \bar{w}_1 x_1 \bar{y}_1 w_1 \bar{x}_1 y_1\ \ z_2 \bar{x}_2 y_2 \bar{z}_2 x_2 \bar{y}_2$$

$$b = \underbrace{\bar{w}_1 w_1}_{\text{for } w}\ \underbrace{\bar{x}_1 \bar{x}_2 x_1 x_2}_{\text{for } x}\ \underbrace{\bar{y}_1 \bar{y}_2 y_1 y_2}_{\text{for } y}\ \underbrace{\bar{z}_2 z_2}_{\text{for } z}\ \bullet\ \underbrace{\bar{w}_1 x_1 \bar{y}_1 \bar{x}_1 w_1 y_1 \bar{x}_1}_{\text{for clause 1}}\ \underbrace{z_2 \bar{x}_2 y_2 x_2 \bar{z}_2 \bar{y}_2 x_2}_{\text{for clause 2}}$$

Correctness

Now we verify that the constructed strings $a$ and $b$ contain a common permutation of $\Sigma$ if and only if $C$ is satisfiable.

Let $t : U \to \{T, F\}$ be any satisfying truth assignment for $C$.
We will show that there exists a permutation of $\Sigma$ which is a common subsequence of both $a$ and $b$, i.e., that it is possible to align all symbols from $\Sigma$ in the strings. There is only one way to align the boundary symbol. For a variable $u \in U$, if $t(u) = T$, we align the symbols $u_i$ for all $i$ in the "truth-setting" part and the symbols $\bar{u}_i$ in the "satisfaction-testing" part. If $t(u) = F$, we conversely align the symbols $\bar{u}_i$ in the "truth-setting" part and the symbols $u_i$ in the "satisfaction-testing" part. As we noted during the construction, the desired alignment in the "truth-setting" part is always possible to find.

To show that we can align the remaining symbols in the "satisfaction-testing" part, we use the fact that $t$ satisfies $C$. Notice that the symbols which we have to align in the "satisfaction-testing" part correspond to the truth values of the variables. For example, if we have to align the symbol $u_i$ in this part, we know that $t(u) = F$. For every clause $c_i \in C$, $c_i = \{x, y, z\}$, we align the remaining symbols corresponding to the clause $c_i$ in the block for $c_i$. The symbols $\bar{x}_i$, $\bar{y}_i$, and $\bar{z}_i$ can be aligned in the first part of the block. It is easy to see that any pair of symbols from $\{x_i, y_i, z_i\}$ can be aligned in the second part. Therefore, for all seven possibilities how $c_i$ can be satisfied, we can align the corresponding symbols in the block for the clause $c_i$.

For the proof in the opposite direction, suppose now that $a$ and $b$ have a common permutation of the symbols in $\Sigma$. We will construct a satisfying truth assignment for $C$. For that we look at the "truth-setting" part of the strings.
For a variable $u \in U$,
• if the symbol $u_i$ for some $i$ is aligned in the "truth-setting" part of the strings, we set $t(u) = T$,
• if the symbol $\bar{u}_i$ for some $i$ is aligned in the "truth-setting" part of the strings, we set $t(u) = F$,
• if none of the symbols $\{u_i, \bar{u}_i\}$ are aligned in the "truth-setting" part, we set $t(u)$ arbitrarily, say $t(u) = T$.

Notice that, according to the construction of the "truth-setting" part, this is a valid definition of the assignment, i.e., it cannot happen that we would want to set $t(u)$ to both $T$ and $F$.

We now have to prove that $t$ is a satisfying truth assignment for $C$. For any clause $c_i \in C$, let $x$, $y$, $z$ be its literals, so $c_i = \{x, y, z\}$. We know that not all the symbols $\{x_i, y_i, z_i\}$ can be aligned in the "satisfaction-testing" part of the strings, so at least one of them must be aligned in the "truth-setting" part. Without loss of generality say it is $x_i$. Therefore, by the definition of $t$, we know that the literal $x$ is true, and therefore $c_i$ is true.

Our construction in the proof used every symbol at most twice in the string $a$, but used some of the symbols three times in the string $b$. The following corollary shows that we can use a slightly different construction which uses every symbol at most twice in both strings.

Corollary 4.1 Common Permutation is NP-complete even if every symbol occurs at most twice in the given strings.

Proof. We will use the same construction as in the proof of Theorem 4.1, except for the definition of $\Sigma$ and the blocks for clauses. For every clause $c_i \in C$ we add three additional symbols $\triangle_i$, $\heartsuit_i$, and $\square_i$ to the alphabet $\Sigma$. The strings contain the following block for the clause $c_i$:

$a = \ldots\ x_i\ \bar{x}_i\ \triangle_i\ \heartsuit_i\ y_i\ \bar{y}_i\ \triangle_i\ \square_i\ \heartsuit_i\ z_i\ \square_i\ \bar{z}_i\ \ldots$
$b = \ldots\ \bar{x}_i\ \triangle_i\ x_i\ y_i\ \heartsuit_i\ \bar{y}_i\ \square_i\ \triangle_i\ \heartsuit_i\ \square_i\ z_i\ \bar{z}_i\ \ldots$
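The behaviour of this clause block can be checked by exhaustive search. The Python sketch below tries, for each of the eight ways of picking one symbol from each pair $\{x_i, \bar{x}_i\}$, $\{y_i, \bar{y}_i\}$, $\{z_i, \bar{z}_i\}$, whether the picked symbols can be aligned inside the block together with the three auxiliary symbols. The ASCII names are assumptions made for illustration: `~x` stands for $\bar{x}_i$, and `T`, `H`, `S` stand for the three auxiliary symbols. The order of barred and unbarred symbols within each adjacent pair is likewise an assumption of this sketch.

```python
from itertools import permutations, product

# Clause block of the corollary in strings a and b (assumed encoding:
# "~x" is the barred symbol, "T", "H", "S" are the auxiliary symbols).
A_BLOCK = ["x", "~x", "T", "H", "y", "~y", "T", "S", "H", "z", "S", "~z"]
B_BLOCK = ["~x", "T", "x", "y", "H", "~y", "S", "T", "H", "S", "z", "~z"]


def is_subsequence(seq, block):
    """Return True if seq is a subsequence of block."""
    it = iter(block)
    return all(sym in it for sym in seq)


def can_align(required):
    """Can every symbol in `required` be aligned inside the block?

    True if some ordering of `required` is a common subsequence of both
    blocks (brute force over all orderings).
    """
    return any(
        is_subsequence(order, A_BLOCK) and is_subsequence(order, B_BLOCK)
        for order in permutations(required)
    )


def blocked_choices():
    """List the literal-symbol choices that cannot be aligned in the block."""
    blocked = []
    for choice in product(("x", "~x"), ("y", "~y"), ("z", "~z")):
        # The auxiliary symbols occur only in this block, so they must
        # always be aligned here.
        if not can_align(list(choice) + ["T", "H", "S"]):
            blocked.append(choice)
    return blocked


if __name__ == "__main__":
    print(blocked_choices())
```

Under these assumptions the only blocked choice is `("x", "y", "z")`, the combination in which all three literals of the clause are false.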
One can verify that inside this block we can align the symbols corresponding to any satisfying assignment for $c_i$, but we cannot simultaneously align $x_i$, $y_i$, and $z_i$. The constructed strings contain every symbol exactly twice, with the exception of $\bullet$, which they contain once.

5 Acknowledgements

The author would like to thank Martin Alter for suggesting Corollary 4.1, Ján Katrenič for discussions about the problem, and the authors of [4] for bringing the problem (again) to the author's attention.

References

[1] L. Bergroth, H. Hakonen, T. Raita: A survey of longest common subsequence algorithms. Proceedings of the Seventh International Symposium on String Processing and Information Retrieval (SPIRE 2000), pages 39–48.
[2] M. R. Garey, D. S. Johnson: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman 1979, ISBN 0716710455.
[3] G. Andrejková: The longest restricted common subsequence problem. Proceedings of the Prague Stringology Club Workshop 1998, pages 14–25.
[4] M. Crochemore, C. Hancart, T. Lecroq: Algorithms on Strings. Cambridge University Press 2007, ISBN 0521848997.