Spectral-Aligned Pruning for Universal Error-Correcting Code Transformers
Recently, the Foundation Error Correction Code Transformer (FECCT) has emerged as a promising universal channel decoder, achieving competitive decoding performance across diverse code families by relying on a single shared model backbone, optionally followed by code-specific retraining. Despite this flexibility, the high computational complexity and large parameter footprint of transformer-based decoders present substantial obstacles to practical deployment. To address these challenges, we investigate structured pruning for FECCT and propose Spectral-Aligned Pruning (SAP), a structure-aware framework that enables cross-code reuse of structured pruning masks by leveraging the spectrum of each code's bipartite graph. After pruning, SAP performs per-code recovery via parameter-efficient low-rank adaptation (LoRA), enabling a shared pruned backbone while storing only small code-specific adapter parameters. Experiments across diverse codes show that SAP achieves decoding performance comparable to dedicated per-code pruning, while enabling substantial reductions in computational cost and model memory footprint through kernel-level structured pruning.
💡 Research Summary
The paper addresses the prohibitive computational cost and large memory footprint of transformer‑based universal error‑correcting code decoders, specifically the Foundation Error‑Correction Code Transformer (FECCT). While FECCT achieves impressive cross‑code performance by sharing a single backbone across many code families, its practical deployment is hampered by the need to retain all attention heads and feed‑forward network (FFN) channels. To overcome these limitations, the authors propose Spectral‑Aligned Pruning (SAP), a two‑stage framework that (1) dramatically reduces the backbone size through structured pruning and (2) restores any lost decoding performance with a lightweight, parameter‑efficient adaptation technique (LoRA).
Structured pruning for FECCT.
The authors first generate a code‑specific pruning mask for each code by applying a Fisher‑based importance estimator to the pretrained FECCT model. Structured pruning removes entire attention heads and FFN channels, which translates directly into kernel‑level speed‑ups on modern hardware. However, deriving a new mask for every new code is computationally expensive. SAP therefore introduces a mask‑reuse mechanism based on the spectral properties of the code’s bipartite graph.
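The paper does not spell out the exact form of the Fisher-based estimator, but a common approximation scores each structural unit by its diagonal empirical Fisher information (mean squared gradient) and keeps the highest-scoring heads. A minimal sketch under that assumption (function name and gradient shapes are illustrative):

```python
import numpy as np

def fisher_head_mask(grads, keep_ratio=0.6):
    """Sketch of a Fisher-based structured pruning mask (assumed form; the
    paper's exact estimator may differ). `grads` is a list of per-sample
    gradient tensors of shape (num_heads, head_dim) for one attention layer.
    The diagonal empirical Fisher is approximated by the mean squared
    gradient, summed over each head's parameters."""
    fisher = np.mean([g ** 2 for g in grads], axis=0)    # (num_heads, head_dim)
    head_importance = fisher.sum(axis=1)                 # one score per head
    num_keep = max(1, int(round(keep_ratio * len(head_importance))))
    keep = np.argsort(head_importance)[::-1][:num_keep]  # most important heads
    mask = np.zeros(len(head_importance), dtype=bool)
    mask[keep] = True
    return mask
```

Because whole heads (rather than scattered weights) are removed, the surviving computation stays dense, which is what yields the kernel-level speed-ups mentioned above.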
Spectral signature of a code.
Each linear block code is described by its parity-check matrix $H$. From $H$ the authors build a symmetric bipartite adjacency matrix $A(H)$ that captures connections between variable and check nodes. The top-$K$ eigenvalues of $A(H)$ constitute a fixed-dimensional spectral signature $\phi(H) \in \mathbb{R}^K$. The Euclidean distance $d$ between two signatures defines a spectral distance, which is transformed into a similarity score $\kappa = \exp(-\beta d)$. This similarity serves as the decision criterion for mask reuse.
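The signature construction above is straightforward to sketch. Assuming the standard bipartite adjacency $A(H) = \begin{pmatrix} 0 & H \\ H^\top & 0 \end{pmatrix}$ and zero-padding for codes with fewer than $K$ eigenvalues (a detail the summary does not specify):

```python
import numpy as np

def spectral_signature(H, K=8):
    """Fixed-dimensional spectral signature of a parity-check matrix H (m x n).
    Builds the symmetric bipartite adjacency A(H) and keeps the K largest
    eigenvalues; zero-padding for small codes is an assumption here."""
    m, n = H.shape
    A = np.zeros((m + n, m + n))
    A[:m, m:] = H          # check-to-variable edges
    A[m:, :m] = H.T        # variable-to-check edges (symmetric)
    eig = np.sort(np.linalg.eigvalsh(A))[::-1]   # descending eigenvalues
    sig = eig[:K]
    if len(sig) < K:
        sig = np.pad(sig, (0, K - len(sig)))
    return sig

def spectral_similarity(H1, H2, K=8, beta=1.0):
    """Similarity kappa = exp(-beta * d) from the Euclidean distance d."""
    d = np.linalg.norm(spectral_signature(H1, K) - spectral_signature(H2, K))
    return np.exp(-beta * d)
```

A code compared with itself gives $d = 0$ and hence $\kappa = 1$, the upper end of the similarity scale.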
Mask library and retrieval.
SAP maintains a library $\mathcal{D} = \{(\phi(H_i), M(H_i))\}_{i=1}^N$ of previously computed signatures and their associated structured masks. When a new target code $H_{\text{new}}$ arrives, its signature $\phi(H_{\text{new}})$ is computed, the nearest neighbor in the library is identified, and the similarity $\kappa_\star$ is evaluated. If $\kappa_\star$ exceeds a predefined threshold $\tau$, the neighbor's mask is reused; otherwise a fresh mask is derived and added to the library. Algorithm 1 formalizes this process, ensuring that the library grows only when necessary and that mask-search overhead is amortized across many codes.
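The retrieval loop can be sketched as follows. This is a minimal reading of Algorithm 1 as described above, not the paper's implementation; class and method names are illustrative:

```python
import numpy as np

class MaskLibrary:
    """Sketch of the signature/mask library and retrieval loop (Algorithm 1
    in the paper; names here are illustrative). A new code reuses the nearest
    stored mask when the similarity kappa exceeds the threshold tau."""

    def __init__(self, tau=0.9, beta=1.0):
        self.entries = []            # list of (signature, mask) pairs
        self.tau, self.beta = tau, beta

    def query(self, sig):
        """Return (best similarity, best mask) over the library."""
        best_kappa, best_mask = -1.0, None
        for stored_sig, mask in self.entries:
            kappa = np.exp(-self.beta * np.linalg.norm(sig - stored_sig))
            if kappa > best_kappa:
                best_kappa, best_mask = kappa, mask
        return best_kappa, best_mask

    def get_mask(self, sig, derive_mask):
        """Reuse the nearest mask if similar enough; otherwise call
        `derive_mask()` (e.g. Fisher-based pruning) and grow the library."""
        kappa, mask = self.query(sig)
        if mask is not None and kappa >= self.tau:
            return mask
        mask = derive_mask()
        self.entries.append((sig, mask))
        return mask
```

The expensive `derive_mask()` step runs only on a library miss, so mask-generation cost is amortized as more codes share signatures.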
Parameter‑efficient recovery with LoRA.
After pruning, decoding performance typically degrades. Rather than fine-tuning the entire backbone (as done in the original FECCT pipeline), the authors freeze the pruned backbone and attach low-rank adaptation (LoRA) modules to each attention and FFN layer. Each LoRA adapter learns two small rank-$r$ matrices $A$ and $B$ whose product is added to the original weight matrix, providing a lightweight correction. Because only the adapters are trained, the per-code storage cost drops from the full model size (hundreds of megabytes) to roughly 0.07–0.09 M parameters (≈ 7 % of the original), while still achieving near-identical bit-error-rate (BER) performance.
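A minimal numpy sketch of the LoRA update makes the storage argument concrete. Dimensions below are arbitrary placeholders (the paper uses rank 2, but the layer widths here are illustrative); per standard LoRA practice, $B$ is initialized to zero so the adapter starts as an identity correction:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 2                # rank 2, as in the paper's setup

W = rng.standard_normal((d_out, d_in))    # frozen pruned-backbone weight
A = rng.standard_normal((r, d_in))        # LoRA down-projection (trainable)
B = np.zeros((d_out, r))                  # LoRA up-projection, zero-initialized

def lora_forward(x):
    # Effective weight is W + B @ A; at init B = 0, so behavior matches W.
    return x @ (W + B @ A).T

x = rng.standard_normal((1, d_in))
assert np.allclose(lora_forward(x), x @ W.T)   # identity correction at init

# Per-layer storage: r * (d_in + d_out) adapter params vs d_in * d_out full.
adapter_params = r * (d_in + d_out)    # 256
full_params = d_in * d_out             # 4096
```

Only $A$ and $B$ are stored per code, which is why the per-code footprint shrinks from the full backbone to a few hundred parameters per adapted matrix.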
Experimental validation.
The authors evaluate SAP on a diverse set of codes: BCH (31,16), BCH (63,45), LDPC (96,64), LDPC (121,70), Polar (64,32), and Polar (128,64). Structured pruning is applied at a 40 % FLOPs reduction target, yielding 30 %–45 % FLOPs savings and a 7 % reduction in total parameters after physically removing pruned heads/channels. LoRA adapters are trained per code with a rank of 2, resulting in a negligible additional memory cost. BER curves versus $E_b/N_0$ show that SAP-pruned models with LoRA match or slightly surpass the performance of dedicated per-code pruning baselines. Moreover, the authors demonstrate a strong correlation between Jaccard similarity of masks and spectral distance, confirming that the spectral signature is an effective proxy for structural similarity.
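The mask-overlap metric used in that correlation study is the standard Jaccard similarity over kept units; a small helper (illustrative, treating masks as boolean keep-vectors) shows the computation:

```python
import numpy as np

def jaccard(mask_a, mask_b):
    """Jaccard similarity between two boolean structured-pruning masks:
    |kept in both| / |kept in either|. Two all-zero masks are treated as
    identical (similarity 1), a convention choice for the degenerate case."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0
    return np.logical_and(a, b).sum() / union
```

The paper's finding is that this overlap rises as spectral distance falls, which is exactly the behavior the mask-reuse threshold $\tau$ relies on.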
Key contributions.
- First systematic application of structured pruning to transformer‑based ECC decoders, achieving substantial computational and memory savings.
- Introduction of a spectral‑based similarity metric that enables cross‑code mask reuse, dramatically reducing the cost of mask generation.
- Novel use of LoRA for post‑pruning recovery, cutting per‑code parameter storage to under 10 % of the original model.
- Comprehensive experiments across multiple code families that validate SAP’s ability to retain decoding performance while delivering up to 40 % FLOPs reduction.
Implications and future work.
SAP makes universal transformer decoders viable for latency‑sensitive or resource‑constrained receivers such as IoT devices, satellite terminals, and edge routers. The spectral‑signature approach could be extended to other graph‑structured neural models (e.g., GNNs) where mask reuse across tasks is desirable. Future research may explore adaptive threshold selection, dynamic rank adjustment for LoRA, or integration with quantization techniques to push efficiency even further.