A filtration of a formal language L by a sequence s maps L to the set of words obtained by keeping, from each word of L, only the letters whose indices lie in s. We consider the languages resulting from filtering by all arithmetic progressions. If L is regular, it is easy to see that only finitely many distinct languages result. By contrast, there exist context-free languages that give infinitely many distinct languages as a result. We use our technique to show that the operation diag, which extracts the diagonal of words of square length arranged in a square array, preserves regularity but does not preserve context-freeness.
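As a concrete illustration of diag, the Python sketch below assumes that a word of length $n^2$ is written row by row into an $n \times n$ array (the precise definition is given in Section 4). Under that assumption the diagonal letters sit at indices $0, n+1, 2(n+1), \ldots$. The function `diag` is a hypothetical helper named after the operation, not code from the paper.

```python
import math


def diag(w: str) -> str:
    """Return the main diagonal of w, assuming w is written row by row in an n x n array.

    Only words of square length n*n are accepted; the diagonal entries
    are then at indices i*(n+1) for i = 0, ..., n-1.
    """
    n = math.isqrt(len(w))
    if n * n != len(w):
        raise ValueError("diag is only defined on words of square length")
    return "".join(w[i * (n + 1)] for i in range(n))


# a b c
# d e f   -> diagonal "aei"
# g h i
print(diag("abcdefghi"))  # aei
```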
Let $s = (s(i))_{i \ge 0}$ be an infinite strictly increasing sequence of non-negative integers. Berstel et al. [1] introduced the notion of filtering by s: given a finite word $w = a_0 a_1 \cdots a_n$, we write $w[s] = a_{s(0)} a_{s(1)} \cdots a_{s(k)}$, where k is the unique integer such that $s(k) \le n < s(k+1)$. (If there is no such integer, that is, if $n < s(0)$, then $w[s] = \varepsilon$.) Given a language L, we define $L[s] = \{w[s] : w \in L\}$.
Example 1. If w = theorem, and s = 0, 2, 4, 6, . . ., the sequence of even integers, then w[s] = term. If t = 1, 3, 5, . . ., the sequence of odd integers, then w[t] = hoe.
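The definition of w[s] can be sketched in a few lines of Python. This is a minimal illustration, not code from [1]. The helper names `filter_word` and `progression` are hypothetical. The filter is passed as any strictly increasing iterable of indices, so an infinite sequence such as an arithmetic progression works, because we stop at the first index beyond the end of w.

```python
from itertools import count, takewhile
from typing import Iterable, Iterator


def filter_word(w: str, s: Iterable[int]) -> str:
    """Compute w[s]: the letters of w at indices s(0) < s(1) < ... that are < len(w)."""
    # s is strictly increasing, so the first index past the end of w ends the output.
    return "".join(w[i] for i in takewhile(lambda i: i < len(w), s))


def progression(a: int, b: int) -> Iterator[int]:
    """The arithmetic progression s_i = a*i + b for i >= 0 (a >= 1, b >= 0)."""
    return count(b, a)


print(filter_word("theorem", progression(2, 0)))  # even indices: term
print(filter_word("theorem", progression(2, 1)))  # odd indices: hoe
```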
Berstel et al. [1] proved a number of theorems about filters, and characterized those sequences s that preserve regularity (i.e., L[s] is always regular if L is) and context-freeness.
In this note we revisit the concept of filtering from a slightly different point of view. Suppose we have an infinite set of filters $S = \{s_1, s_2, \ldots\}$. Given a language L, what can be said about the set of all filtered languages $\{L[s_i] : i \ge 1\}$? For example, is it finite?
In this note we are only concerned with filters s that represent arithmetic progressions: there exist integers $a \ge 1$, $b \ge 0$ such that $s_i = ai + b$ for $i \ge 0$. We consider four different types of filter sets: If L is regular, a simple argument (given below) shows that filtration by the strong arithmetic progressions produces only finitely many distinct languages (and hence the same is true for filtration by the weak and ordinary arithmetic progressions and by the shifts). By contrast, there exist context-free languages L such that filtering only by the weak arithmetic progressions, or only by the shifts, produces infinitely many distinct languages (and hence the same is true for the ordinary and strong arithmetic progressions).
In Section 4 we introduce a natural operation on formal languages that is related to the results of Berstel et al. [1], but seemingly cannot be analyzed using their framework. We show that this operation preserves regularity, but does not preserve context-freeness.
We adopt the following notation: if L is a language, and $s = (s_i)_{i \ge 0}$ is an arithmetic progression such that $s_i = ai + b$, then we write $L_{a,b}$ for $L[s]$ and, for a word w, $w_{a,b}$ for $w[s]$.
2 The regular case
Theorem 2. If L is regular, then filtering by the strong arithmetic progressions produces finitely many distinct languages.
Remark 3. It is easy to see that if L is regular and s is an arithmetic progression, then L[s] is regular. Indeed, this follows immediately from the theorem that the regular languages are closed under applying a transducer, since it is easy to construct a transducer that extracts the letters corresponding to indices in s. That is not the issue here; we need to see that among all the regular languages produced by filtering by a strong arithmetic progression, there are only finitely many distinct languages.
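Remark 3 relies on a transducer that keeps exactly the letters whose indices lie in s. One possible construction for $s_i = ai + b$ uses $b + a$ states. The first b states count the skipped prefix, and the remaining a states track the position modulo a, copying a letter only when it is read in state b. This choice of states is an illustrative assumption, and `make_transducer` is a hypothetical helper. For such s, the output coincides with Python's slice `w[b::a]`, which the test compares against.

```python
from typing import Callable, Tuple


def make_transducer(a: int, b: int) -> Tuple[int, Callable[[str], str]]:
    """Sketch of a deterministic finite-state transducer for s_i = a*i + b.

    States 0..b-1 count the first b letters; states b..b+a-1 track (pos - b) mod a.
    A letter is copied to the output exactly when it is read in state b.
    Returns (number of states, run function).
    """
    num_states = a + b

    def step(state: int) -> int:
        # Advance the position counter, wrapping around the cycle b..b+a-1.
        return state + 1 if state + 1 < num_states else b

    def run(w: str) -> str:
        state, out = 0, []
        for c in w:
            if state == b:  # current index equals some s_i
                out.append(c)
            state = step(state)
        return "".join(out)

    return num_states, run


states, run = make_transducer(2, 0)
print(states, run("theorem"))  # 2 term
```

The number of states depends on a and b. The finiteness argument that follows therefore has to come from elsewhere, namely from the fixed automaton for L.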
Proof. Let $A = (Q, \Sigma, \delta, q_0, F)$ be a DFA accepting L. Our proof is based on the boolean matrix interpretation of automata [3]. Let $M_c$ be the boolean incidence matrix of the underlying transition graph of the automaton corresponding to a transition on the symbol $c \in \Sigma$. That is, if $Q = \{q_0, q_1, \ldots, q_{n-1}\}$, then $(M_c)_{i,j} = 1$ if $\delta(q_i, c) = q_j$, and $(M_c)_{i,j} = 0$ otherwise.
We also write $M = \sum_{c \in \Sigma} M_c$, where the sum is taken in the boolean semiring (entrywise OR). By standard results about path algebra, the matrix $M^m$ has a 1 in row i and column j if and only if there is a path of length m from $q_i$ to $q_j$. Suppose L = L(A). We show how to create a DFA
where $x_0, x_1, \ldots, x_n$ are words such that
Here $Q' = \{q'_0\} \cup \{0,1\}^n$. Thus all states except $q'_0$ are boolean vectors. We let f be the boolean vector with 1's in the positions corresponding to the final states, that is, the states in F.
We define the transition function δ ′ as follows:
for all boolean vectors q and symbols c ∈ Σ. Also define
Finally, set
otherwise.
An easy induction on n now shows that if $\delta'(q'_0, c_0 c_1 \cdots c_{n-1}) = v$, then v has 1's in exactly the positions corresponding to the states of the form $\delta(q_0, x_0 c_0 \cdots x_{n-1} c_{n-1})$, where the words $x_i$ satisfy the inequalities mentioned previously. It follows that $L(A') = L_{a,b}$.
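Note that $A'$ has $2^n + 1$ states, and this quantity does not depend on a or b. Since there are only finitely many DFAs over $\Sigma$ with $2^n + 1$ states (up to renaming of states), only finitely many distinct languages $L_{a,b}$ can arise.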
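The construction can be sketched in Python for a progression $s_i = ai + b$, with boolean row vectors and the matrices $M_c$ and $M$ as in the proof. The formulas used here for $\delta'$ and for the accepting states are an assumption, rebuilt directly from the definition of w[s]:

* From $q'_0$, skip exactly b letters before reading the first kept letter.
* Before each later kept letter, skip exactly $a - 1$ letters.
* Accept a vector if a final state is reachable by at most $a - 1$ further letters.
* Accept $q'_0$ if L has a word of length at most b.

The three-state DFA A (words over {x, y} whose number of x's is divisible by 3) and all helper names are hypothetical. The script compares $A'$ with brute-force filtering on all short words and counts the reachable states of $A'$.

```python
from functools import reduce
from itertools import product

SIGMA = "xy"
N_STATES = 3
# Hypothetical example DFA A: words whose number of x's is divisible by 3.
DELTA = {(q, c): (q + 1) % 3 if c == "x" else q for q in range(N_STATES) for c in SIGMA}
Q0, FINAL = 0, {0}


def letter_matrix(c):
    """M_c: entry (i, j) is True iff delta(q_i, c) = q_j."""
    return [[DELTA[(i, c)] == j for j in range(N_STATES)] for i in range(N_STATES)]


def bool_or(X, Y):
    return [[x or y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


M_C = {c: letter_matrix(c) for c in SIGMA}
M = reduce(bool_or, M_C.values())  # one step on an arbitrary letter


def vec_mat(v, X):
    """Boolean row vector times boolean matrix."""
    return tuple(any(v[i] and X[i][j] for i in range(N_STATES)) for j in range(N_STATES))


def skip(v, k):
    """v M^k: states reachable from v by some word of length exactly k."""
    for _ in range(k):
        v = vec_mat(v, M)
    return v


def hits_final_within(v, k):
    """True iff a final state is reachable from v by a word of length <= k."""
    for _ in range(k + 1):
        if any(v[q] for q in FINAL):
            return True
        v = vec_mat(v, M)
    return False


START = "q0'"  # the extra start state q'_0; every other state is a boolean vector
E0 = tuple(q == Q0 for q in range(N_STATES))


def delta_prime(state, c, a, b):
    # From q'_0 skip exactly b letters, otherwise exactly a - 1 letters; then read c.
    v = skip(E0, b) if state == START else skip(state, a - 1)
    return vec_mat(v, M_C[c])


def accepts_prime(u, a, b):
    state = START
    for c in u:
        state = delta_prime(state, c, a, b)
    if state == START:  # empty output: L has a word of length <= b
        return hits_final_within(E0, b)
    return hits_final_within(state, a - 1)  # at most a - 1 trailing letters


def in_L(w):
    q = Q0
    for c in w:
        q = DELTA[(q, c)]
    return q in FINAL


def words(max_len):
    for k in range(max_len + 1):
        for t in product(SIGMA, repeat=k):
            yield "".join(t)


def check(a, b, max_len=8):
    """Compare A' with brute force; w[b::a] is exactly w[s] for s_i = a*i + b."""
    k = (max_len - b) // a  # every witness for an output of length <= k has length <= max_len
    brute = {w[b::a] for w in words(max_len) if in_L(w) and len(w[b::a]) <= k}
    return brute == {u for u in words(k) if accepts_prime(u, a, b)}


def reachable_states(a, b):
    seen, frontier = {START}, [START]
    while frontier:
        s = frontier.pop()
        for c in SIGMA:
            t = delta_prime(s, c, a, b)
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen


print(all(check(a, b) for a in (1, 2, 3) for b in (0, 1, 2, 3)))  # True
```

Representing states as tuples makes it easy to see the bound used in the proof. Whatever a and b are, every reachable state is either $q'_0$ or one of the $2^n$ boolean vectors.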
Theorem 4. There exists a context-free language L such that filtering by the weak arithmetic progressions produces infinitely many distinct languages.
Then it is easy to see that L is context-free, as it is generated by the context-free grammar
We claim that the languages $L_{a,0}$ for $a \ge 2$ are all distinct. To see this, it suffices to show that $L_{a,0} \cap 123^+ = \{123^{a-1}\}$.
Clearly $123^{a-1} = z_{a,0}$, where $z = 1 0^{a-1} 2 (0^{a-1} 3)^{a-1} \in L$. Now suppose $x \in L_{a,0} \cap 123^+$. Then $x = w_{a,0}$ for some $w \in L$. Since x begins with 12, the letter of w at index a is 2. Because each word in L starts with $1 0^n 2$ and contains no other 2's, we must have $n = a - 1$. It follows that $w \in 1 0^{a-1} 2 (0^+ 3)^{a-1}$. But then w contains only $a - 1$ 3's, so to get $a - 1$ 3's in x, each of them must be used. It follows that the exponent of 0 in each factor $0^+ 3$ is $a - 1$, and so $x = 123^{a-1}$.
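The first step of this argument, $z_{a,0} = 123^{a-1}$, is easy to check mechanically. The sketch below builds z for small a and filters it by the weak progression $s_i = ai$, which in Python is the slice `w[::a]`. The helper names `z` and `weak_filter` are hypothetical. The sketch does not test the second half of the argument, which quantifies over all of L.

```python
def z(a: int) -> str:
    """The witness z = 1 0^(a-1) 2 (0^(a-1) 3)^(a-1) from the proof; it has length a*a + 1."""
    return "1" + "0" * (a - 1) + "2" + ("0" * (a - 1) + "3") * (a - 1)


def weak_filter(w: str, a: int) -> str:
    """w_{a,0}: the letters of w at indices 0, a, 2a, ..."""
    return w[::a]


for a in range(2, 5):
    print(a, z(a), weak_filter(z(a), a))
# a = 2: 10203 -> 123
# a = 3: 1002003003 -> 1233
```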
This completes the proof.
Theorem 5. There exists a context-free language such that L filtered by the shifts re
This content is AI-processed based on open access ArXiv data.