On the Number of Subsequences in the Nonbinary Deletion Channel

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

In the deletion channel, an important problem is to determine the number of subsequences derived from a string $U$ of length $n$ when subjected to $t$ deletions. It is well-known that the number of subsequences in the setting exhibits a strong dependence on the number of runs in the string $U$, where a run is defined as a maximal substring of identical characters. In this paper we study the number of subsequences of a non-binary string in this scenario, and propose some improved bounds on the number of subsequences of $r$-run non-binary strings. Specifically, we characterize a family of $r$-run non-binary strings with the maximum number of subsequences under any $t$ deletions, and show that this number can be computed in polynomial time.


💡 Research Summary

The paper investigates the combinatorial problem of counting the distinct subsequences that can be obtained from a q‑ary string U of length n when exactly t symbols are deleted, the basic counting question for the deletion channel. The central observation, dating back to Levenshtein (1966), is that this count depends not only on n and t but also strongly on the number of runs r(U) in the string, where a run is a maximal block of identical symbols. While this dependence has been thoroughly studied for binary strings, the authors extend the analysis to non‑binary alphabets and provide tighter bounds that are of theoretical interest and relevant to code design and sequence reconstruction.
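To make the dependence on runs concrete, the following minimal sketch enumerates the set of strings reachable by exactly t deletions using brute force and counts runs. The function names `deletion_ball` and `run_count` are illustrative choices, not taken from the paper, and the enumeration is exponential, so it is only suitable for very short strings.

```python
from itertools import combinations, groupby


def deletion_ball(u, t):
    """Return the set of distinct strings obtained from u by deleting exactly t symbols."""
    keep = len(u) - t
    if keep < 0:
        return set()
    # Choose which positions survive; the set removes duplicate outcomes.
    return {"".join(u[i] for i in idx) for idx in combinations(range(len(u)), keep)}


def run_count(u):
    """Number of maximal blocks of identical symbols in u."""
    return sum(1 for _ in groupby(u))


# Four strings of the same length but with different run structure.
for u in ["0000", "0011", "0101", "0120"]:
    sizes = [len(deletion_ball(u, t)) for t in range(len(u) + 1)]
    print(u, "runs:", run_count(u), "ball sizes for t = 0..n:", sizes)
```

For a single deletion the count equals the number of runs, since deleting any symbol inside the same run gives the same result; this is the simplest instance of the run dependence described above.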

Key Contributions

  1. General Framework and Notation – The authors formalize the deletion ball D_t(X) as the set of all length‑(n‑t) subsequences of X. They adopt the notation S(x₁,…,x_r; a₁,…,a_r) to denote a string consisting of r runs, where the i‑th run has length x_i and symbol a_i. This representation makes the dependence on run lengths and symbols explicit.
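The run-based notation can be illustrated with a short sketch that builds S(x₁,…,x_r; a₁,…,a_r) from run lengths and symbols and then counts the deletion ball without enumerating it. The counting routine is the standard dynamic program for distinct subsequences of a fixed length, used here as an assumption for illustration; it is not claimed to be the algorithm from the paper. The names `build_string` and `count_deletion_ball` are hypothetical, and symbols are assumed to be single characters.

```python
def build_string(run_lengths, symbols):
    """Build S(x_1..x_r; a_1..a_r): x_i copies of symbol a_i, concatenated in order."""
    if len(run_lengths) != len(symbols):
        raise ValueError("need exactly one symbol per run")
    if any(x < 1 for x in run_lengths):
        raise ValueError("run lengths must be positive")
    if any(a == b for a, b in zip(symbols, symbols[1:])):
        raise ValueError("adjacent runs must use different symbols")
    return "".join(a * x for x, a in zip(run_lengths, symbols))


def count_deletion_ball(u, t):
    """Count distinct subsequences of u of length len(u) - t in O(n * (n - t)) time."""
    n = len(u)
    m = n - t
    if t < 0 or m < 0:
        return 0
    # dp[i][k] = number of distinct length-k subsequences of the prefix u[:i].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = 1  # the empty subsequence
    last = {}  # symbol -> 1-based index of its most recent occurrence
    for i in range(1, n + 1):
        c = u[i - 1]
        for k in range(1, m + 1):
            # Either skip u[i-1] or append it to a shorter subsequence ...
            dp[i][k] = dp[i - 1][k] + dp[i - 1][k - 1]
            # ... then remove the strings already produced by the previous c.
            if c in last:
                dp[i][k] -= dp[last[c] - 1][k - 1]
        last[c] = i
    return dp[n][m]


u = build_string([2, 1, 3], ["a", "b", "a"])
print(u, [count_deletion_ball(u, t) for t in range(len(u) + 1)])
```

Keeping run lengths and symbols as separate arguments mirrors the notation: changing only the symbols while holding the run lengths fixed is exactly the kind of comparison the bounds are about.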

  2. Four Fundamental String Operations

    • Insertion: Adding a symbol anywhere cannot decrease |D_t| (Lemma 3).
    • Deletion Chain Rule: If V ∈ D_t(U) then D_{t′}(V) ⊆ D_{t+t′}(U) (Lemma 4).
    • Permutation: Applying a global permutation of the alphabet leaves |D_t| unchanged (Lemma 5).
    • Reduction: Mapping each run to a binary symbol (0/1) yields a binary string U_d with |D_t(U_d)| ≤ |D_t(U)| (Lemma 6).

    These operations allow the authors to relate arbitrary q‑ary strings to binary strings while preserving or bounding the subsequence count.
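The four operations listed above can be sanity-checked on small strings by brute force. The sketch below is only an illustration under stated assumptions: the reduction is read as relabelling the runs alternately with 0 and 1 while keeping their lengths, the permutation is a cyclic shift of a three-symbol alphabet, and the chain rule is checked with t′ = 1. The helper names are hypothetical and do not come from the paper.

```python
from itertools import combinations, groupby


def deletion_ball(u, t):
    """Return the set of distinct strings obtained from u by deleting exactly t symbols."""
    keep = len(u) - t
    if keep < 0:
        return set()
    return {"".join(u[i] for i in idx) for idx in combinations(range(len(u)), keep)}


def reduce_to_binary(u):
    """Relabel runs alternately 0, 1, 0, ... keeping run lengths (assumed reading of the reduction)."""
    return "".join(str(i % 2) * len(list(g)) for i, (_, g) in enumerate(groupby(u)))


def check_operations(u, t, alphabet="012"):
    """Return True if all four properties hold for u and t (chain rule with t' = 1)."""
    size = len(deletion_ball(u, t))
    # Insertion: adding any symbol at any position never shrinks the ball.
    insertion_ok = all(
        len(deletion_ball(u[:pos] + c + u[pos:], t)) >= size
        for pos in range(len(u) + 1)
        for c in alphabet
    )
    # Chain rule: one more deletion from any V in D_t(U) stays inside D_{t+1}(U).
    bigger = deletion_ball(u, t + 1)
    chain_ok = all(deletion_ball(v, 1) <= bigger for v in deletion_ball(u, t))
    # Permutation: renaming symbols by a cyclic shift leaves the count unchanged.
    shift = str.maketrans(alphabet, alphabet[1:] + alphabet[0])
    permutation_ok = len(deletion_ball(u.translate(shift), t)) == size
    # Reduction: the alternating binary string has at most as many subsequences.
    reduction_ok = len(deletion_ball(reduce_to_binary(u), t)) <= size
    return insertion_ok and chain_ok and permutation_ok and reduction_ok


print(reduce_to_binary("0120"), check_operations("0120", 2))
```

A check of this kind does not prove the lemmas; it only confirms that the stated inequalities and inclusions are consistent on the examples tried.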

  3. Lower Bound via Reduction to Binary – By repeatedly applying the reduction operation, any q‑ary string U can be transformed into a binary string S₂(x₁,…,x_r) where runs alternate between 0 and 1. Using the tight binary lower bound from Liron and Langberg (Theorem 2 in

