Clique-Based Deletion-Correcting Codes via Penalty-Guided Clique Search

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We study the construction of $d$-deletion-correcting binary codes by formulating the problem as a Maximum Clique Problem (MCP). In this formulation, vertices represent candidate codewords and edges connect pairs whose longest common subsequence (LCS) distance guarantees correction of up to $d$ deletions. A valid codebook corresponds to a clique in the resulting graph, and finding the largest codebook is equivalent to identifying a maximum clique. While MCP-based formulations for deletion-correcting codes have previously been explored, we demonstrate that applying Penalty-Guided Clique Search (PGCS), a lightweight stochastic clique-search heuristic inspired by Dynamic Local Search (DLS), consistently yields larger codebooks than existing graph-based heuristics, including minimum-degree and coloring methods, for block lengths $n = 8,9,\dots,14$ and deletion parameters $d = 1,2,3$. In several finite-length regimes, the resulting codebooks match known optimal sizes and outperform classical constructions such as Helberg codes. For decoding under segmented reception, where codeword boundaries are known, we propose an optimized LCS-based decoder that exploits symbol-count filtering and early termination to substantially reduce the number of LCS evaluations while preserving exact decoding guarantees. These optimizations lead to significantly lower average-case decoding complexity than the baseline $O(|C| n^2)$ approach.


💡 Research Summary

The paper addresses the construction of binary deletion‑correcting codes and their decoding by casting the design problem as a Maximum Clique Problem (MCP). A binary sequence of length n is a vertex; an edge connects two vertices if their longest‑common‑subsequence (LCS) distance is at least d + 1, which is equivalent to the condition that the two codewords remain distinguishable after up to d deletions. Consequently, any clique corresponds to a valid d‑deletion‑correcting codebook, and a maximum clique yields the largest possible codebook for the given parameters.
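The graph construction above can be sketched directly: compute the LCS by standard dynamic programming and connect two length-n words whenever n − LCS(u, v) ≥ d + 1. This is an illustrative sketch, not the paper's implementation; the function names are ours.

```python
from itertools import product


def lcs_length(u: str, v: str) -> int:
    """Classic O(n^2) dynamic-programming LCS length."""
    m, n = len(u), len(v)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if u[i - 1] == v[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


def compatibility_graph(n: int, d: int):
    """Vertices: all binary strings of length n.
    Edge (u, v) iff n - LCS(u, v) >= d + 1, i.e. u and v remain
    distinguishable after up to d deletions."""
    words = ["".join(bits) for bits in product("01", repeat=n)]
    adj = {w: set() for w in words}
    for i, u in enumerate(words):
        for v in words[i + 1:]:
            if n - lcs_length(u, v) >= d + 1:
                adj[u].add(v)
                adj[v].add(u)
    return words, adj
```

Any clique in the returned adjacency structure is then a valid d-deletion-correcting codebook. Note that the graph has 2^n vertices, which is why exhaustive clique search becomes infeasible beyond small n.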

Exact maximum‑clique algorithms (e.g., Bron–Kerbosch) are feasible only for very small instances (n ≤ 8). To handle larger block lengths (n = 8…14) the authors introduce Penalty‑Guided Clique Search (PGCS), a lightweight stochastic heuristic inspired by Dynamic Local Search (DLS). PGCS builds a clique greedily by repeatedly adding a candidate vertex with the smallest penalty. When no further expansion is possible, it perturbs the current solution by either removing a high‑penalty vertex or restarting from a random vertex. Penalties are increased each time a vertex participates in the current clique and decay multiplicatively each iteration, discouraging repeated selections and promoting diversification. The algorithm runs for a user‑specified time budget or iteration limit, making its computational cost predictable.
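The expand/perturb/decay loop described above can be sketched as follows. This is only a minimal illustration of the PGCS idea; the penalty increment, decay factor, and restart probability used here are our assumptions, not the paper's tuned parameters.

```python
import random


def pgcs(words, adj, iterations=2000, decay=0.95, seed=0):
    """Penalty-Guided Clique Search (illustrative sketch).
    Greedily grows a clique, preferring low-penalty vertices;
    on a stall, penalizes the current clique and either removes
    its highest-penalty vertex or restarts at random. Penalties
    decay multiplicatively each iteration to aid diversification."""
    rng = random.Random(seed)
    penalty = {w: 0.0 for w in words}
    best = []
    clique = [rng.choice(words)]
    for _ in range(iterations):
        # Candidates adjacent to every vertex of the current clique.
        cand = set(adj[clique[0]])
        for w in clique[1:]:
            cand &= adj[w]
        if cand:
            # Expand with the smallest-penalty candidate.
            clique.append(min(cand, key=lambda w: penalty[w]))
        else:
            if len(clique) > len(best):
                best = clique[:]
            # Penalize the stalled clique, then perturb.
            for w in clique:
                penalty[w] += 1.0
            if rng.random() < 0.1 or len(clique) <= 1:
                clique = [rng.choice(words)]  # random restart
            else:
                clique.remove(max(clique, key=lambda w: penalty[w]))
        # Multiplicative penalty decay each iteration.
        for w in penalty:
            penalty[w] *= decay
    if len(clique) > len(best):
        best = clique[:]
    return best
```

Because the loop runs for a fixed iteration (or time) budget, the cost of a search is predictable regardless of graph size, which matches the paper's motivation for preferring PGCS over exact methods at larger n.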

For decoding, the authors adopt an LCS‑based approach: given a received subsequence y, they select the codeword u maximizing LCS(u, y). The naïve implementation requires O(|C| · n²) time, where |C| is the codebook size. Two optimizations dramatically reduce this cost. First, a symbol‑count filter eliminates any codeword whose number of 1’s or 0’s is smaller than that of y, because deletions can only reduce symbol counts. This pre‑filter runs in O(|C| · n) time and typically discards a large fraction of candidates. Second, if a codeword achieves LCS(u, y) = |y|, then y is a subsequence of u, guaranteeing uniqueness; the decoder can terminate immediately. The filtered decoder therefore evaluates LCS only on a small subset of the codebook, achieving average‑case speed‑ups of 2–3× while preserving exact recovery guarantees (proved via Lemma 1).
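A minimal sketch of the optimized decoder, combining both optimizations, might look like the following (the helper and function names are ours, not the paper's):

```python
def lcs_length(u: str, v: str) -> int:
    """Classic O(n^2) dynamic-programming LCS length."""
    m, n = len(u), len(v)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = (dp[i - 1][j - 1] + 1 if u[i - 1] == v[j - 1]
                        else max(dp[i - 1][j], dp[i][j - 1]))
    return dp[m][n]


def decode(codebook, y):
    """LCS decoding with symbol-count filtering and early exit.
    A codeword u can yield y by deletions only if u has at least
    as many 0s and as many 1s as y; and if LCS(u, y) == |y| then
    y is a subsequence of u, so (within a valid code) u is the
    unique answer and the scan can stop immediately."""
    ones_y = y.count("1")
    zeros_y = len(y) - ones_y
    best, best_lcs = None, -1
    for u in codebook:
        # Symbol-count pre-filter: deletions cannot increase counts.
        if u.count("1") < ones_y or len(u) - u.count("1") < zeros_y:
            continue
        score = lcs_length(u, y)
        if score == len(y):
            return u  # early termination: y is a subsequence of u
        if score > best_lcs:
            best, best_lcs = u, score
    return best
```

The pre-filter costs O(n) per codeword versus O(n²) for a full LCS evaluation, which is why discarding candidates early yields the reported average-case speed-ups.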

Experimental results compare PGCS against several baselines: the minimum‑degree heuristic, a graph‑coloring method, Helberg’s construction, and Swart’s heuristic. For double‑deletion correction (d = 2) and n = 8…14, PGCS consistently yields larger codebooks; for example, at n = 14 it finds 81 codewords versus 78 for Helberg and 66 for the minimum‑degree method. In several instances (e.g., n = 12, d = 2) the PGCS codebook matches known optimal sizes. For the other deletion levels (d = 1 and d = 3) similar superiority is observed, with PGCS beating VT codes (d = 1) and Helberg codes (d = 3) at most block lengths. Decoding experiments show that the symbol‑count filter reduces the average number of LCS evaluations to roughly 20–35 % of the full codebook, and early termination further cuts the average to under 10 % of the baseline, leading to 2–3× faster decoding.

The paper’s contributions are threefold: (1) the introduction of PGCS, a DLS‑inspired stochastic clique‑search that efficiently finds large cliques in deletion‑compatibility graphs; (2) an optimized LCS‑based decoder that combines lightweight statistical filtering with early exit, dramatically lowering practical decoding complexity; and (3) extensive empirical validation demonstrating that PGCS produces larger, often optimal, codebooks for moderate block lengths while keeping decoding tractable. The authors also provide implementation details and parameter settings to facilitate reproducibility.

Overall, the work demonstrates that graph‑theoretic formulations combined with modern stochastic local‑search techniques can bridge the gap between theoretical optimality and practical feasibility in deletion‑correcting code design, offering a promising toolset for applications such as high‑speed wireless links, DNA data storage, and low‑power sensor networks where deletions are a dominant error mode.

