Inclusion of Unambiguous RE#s is NP-Hard


📝 Original Info

  • Title: Inclusion of Unambiguous RE#s is NP-Hard
  • ArXiv ID: 1111.0422
  • Date: 2011-11-03
  • Authors: Pekka Kilpeläinen

📝 Abstract

We show that testing inclusion between languages represented by regular expressions with numerical occurrence indicators (RE#s) is NP-hard, even if the expressions satisfy the requirement of "unambiguity", which is required for XML Schema content model expressions.


📄 Full Content

arXiv:1111.0422v1 [cs.CC] 2 Nov 2011

Inclusion of Unambiguous #REs is NP-Hard

Pekka Kilpeläinen
University of Kuopio, Department of Computer Science
Pekka.Kilpelainen@cs.uku.fi

May 27, 2004

Abstract

We show that testing inclusion between languages represented by regular expressions with numerical occurrence indicators (#REs) is NP-hard, even if the expressions satisfy the requirement of "unambiguity", which is required for XML Schema content model expressions.

1 Proof of the result

We have seen before [3] that testing for inclusion and overlap of languages represented by #REs is NP-hard. Testing for the overlap was seen hard also for expressions that satisfy the XML requirement of "unambiguity". On the other hand, the NP-hardness proof of #RE inclusion used ambiguous expressions. Here we show that unambiguity does not make the testing of inclusion essentially easier. The proof is based on a polynomial time Turing reduction [1, Chap. 5] from PARTITION, which is one of the best-known NP-complete problems [2, 1].

Theorem 1.1 The #RE inclusion problem is NP-hard, also for unambiguous #REs.

Proof. Let a set $A = \{a_1, \ldots, a_k\}$ and a positive integer weight $w(a)$ of each $a \in A$ form an instance of PARTITION. The problem is to decide whether $A$ can be split in two equal-weight subsets $A'$ and $A - A'$, that is, whether

$$\sum_{a \in A'} w(a) = \sum_{a \in A - A'} w(a) \tag{1}$$

holds for some $A' \subseteq A$. Notice that (1) can hold only if the total weight of the set $A$ is even. Therefore we can assume that $\sum_{a \in A} w(a) = 2n$ for some positive integer $n$, which means that (1) holds if and only if

$$\sum_{a \in A'} w(a) = n \tag{2}$$

for some $A' \subseteq A$. For shortness, denote the weight $w(a_i)$ of an item $a_i \in A$ by $w_i$. Now form the following two #REs over the alphabet $\Sigma = \{a_0, a_1, \ldots, a_k\}$:

$$E_1 = a_0^{\,n+1..n+1}\,(a_1^{\,w_1..w_1} \mid \epsilon)(a_2^{\,w_2..w_2} \mid \epsilon) \cdots (a_k^{\,w_k..w_k} \mid \epsilon)$$

$$E_2 = \bigl((a_0 \mid a_1 \mid \cdots \mid a_k)^{\,n+1..2n}\bigr)^{1..2}$$

Notice that both expressions are trivially unambiguous since each symbol of $\Sigma$ appears exactly once in both of them.
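To make the construction concrete, the following is a minimal Python sketch that writes out the two expressions for a given list of weights. The function names (`build_expressions`, `symbol_counts`) and the textual notation (`x{m..n}` for a numerical occurrence indicator, `()` for the empty word) are assumptions made for illustration; they are not taken from the paper or from any #RE tool.

```python
import re
from collections import Counter


def build_expressions(weights):
    """Return (E1, E2) as strings for a PARTITION instance.

    Hypothetical notation: x{m..n} is an occurrence indicator and
    () is the empty word. The total weight is assumed to be even.
    """
    total = sum(weights)
    if total % 2 != 0:
        raise ValueError("total weight must be even")
    n = total // 2
    k = len(weights)
    # E1 = a0^{n+1..n+1} (a1^{w1..w1} | eps) ... (ak^{wk..wk} | eps)
    e1 = f"a0{{{n + 1}..{n + 1}}}" + "".join(
        f"(a{i}{{{w}..{w}}}|())" for i, w in enumerate(weights, start=1)
    )
    # E2 = ((a0 | a1 | ... | ak)^{n+1..2n})^{1..2}
    alternation = "|".join(f"a{i}" for i in range(k + 1))
    e2 = f"(({alternation}){{{n + 1}..{2 * n}}}){{1..2}}"
    return e1, e2


def symbol_counts(expr):
    """Count how often each alphabet symbol a0, a1, ... occurs in expr."""
    return Counter(re.findall(r"a\d+", expr))


if __name__ == "__main__":
    e1, e2 = build_expressions([1, 2, 3])  # total weight 6, so n = 3
    print(e1)  # a0{4..4}(a1{1..1}|())(a2{2..2}|())(a3{3..3}|())
    print(e2)  # ((a0|a1|a2|a3){4..6}){1..2}
```

Both strings have size linear in k plus the binary length of the weights, and `symbol_counts` confirms that every symbol occurs exactly once in each expression, which is the single-occurrence property the proof relies on for unambiguity.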
Expression $E_1$ describes words of the form $a_0^{n+1} u$, where the length of the suffix $u$ equals the total weight of some subset of $A$. Therefore

$$L(E_1) \subseteq \{v \in \Sigma^* \mid n + 1 \le |v| \le 3n + 1\}.$$

Obviously $E_1$ accepts a word of length $2n + 1$ if and only if a partition that satisfies (2) exists. Expression $E_2$, on the other hand, rejects any words of length $2n + 1$:

$$L(E_2) = \bigcup_{i=n+1}^{2n} \Sigma^i \;\cup \bigcup_{i=2n+2}^{4n} \Sigma^i = \{v \in \Sigma^* \mid n + 1 \le |v| \le 4n,\ |v| \ne 2n + 1\}$$

Now $L(E_1) \subseteq L(E_2)$ holds iff $E_1$ does not accept any word of length $2n + 1$, which holds if and only if no partition that satisfies (1) exists. □

So, a polynomial-time algorithm for testing the inclusion of unambiguous #REs would imply P = NP, which is considered most unlikely.

References

[1] M.R. Garey and D.S. Johnson. Computers and Intractability. W.H. Freeman and Company, New York, 1979.

[2] R.M. Karp. Reducibility among combinatorial problems. In R.E. Miller and J.W. Thatcher, editors, Complexity of Computer Computations, pages 85–103. Plenum Press, New York, 1972.

[3] P. Kilpeläinen and R. Tuhkanen. Regular expressions with numerical occurrence indicators—preliminary results. In Proc. of the Eighth Symposium on Programming Languages and Software Tools, pages 163–173. University of Kuopio, Department of Computer Science, 2003.
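The length argument in the proof of Theorem 1.1 can be checked by brute force on small instances. The sketch below is an illustration only, with hypothetical function names, and it assumes the total weight is even. Since L(E2) contains every word over Σ of each admissible length, inclusion is decided here by comparing sets of word lengths rather than by running an actual #RE inclusion test.

```python
from itertools import combinations


def e1_lengths(weights):
    """Word lengths in L(E1): n + 1 plus every achievable subset sum."""
    n = sum(weights) // 2
    sums = {0}
    for w in weights:
        sums |= {s + w for s in sums}
    return {n + 1 + s for s in sums}


def e2_lengths(weights):
    """Word lengths in L(E2): one or two blocks of n+1..2n symbols."""
    n = sum(weights) // 2
    once = set(range(n + 1, 2 * n + 1))
    twice = {i + j for i in once for j in once}
    return once | twice


def inclusion_holds(weights):
    """L(E1) is a subset of L(E2) iff every E1 length is an E2 length."""
    return e1_lengths(weights) <= e2_lengths(weights)


def has_partition(weights):
    """Exponential-time reference check for PARTITION (equation (2))."""
    n = sum(weights) // 2
    return any(
        sum(subset) == n
        for r in range(len(weights) + 1)
        for subset in combinations(weights, r)
    )


if __name__ == "__main__":
    for weights in ([3, 1, 1, 2, 2, 1], [1, 1, 4]):
        print(weights, has_partition(weights), inclusion_holds(weights))
```

For `[3, 1, 1, 2, 2, 1]` a partition exists and inclusion fails, while for `[1, 1, 4]` no partition exists and inclusion holds. The answer to PARTITION is the negation of the inclusion test, which is why the proof speaks of a Turing reduction.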


