GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Recent studies have demonstrated that many layers are functionally redundant in large language models (LLMs), enabling model compression by removing these layers to reduce inference cost. While such approaches can improve efficiency, indiscriminate layer pruning often results in significant performance degradation. In this paper, we propose GRASP (Gradient-based Retention of Adaptive Singular Parameters), a novel compression framework that mitigates this issue by preserving sensitivity-aware singular values. Unlike direct layer pruning, GRASP leverages gradient-based attribution on a small calibration dataset to adaptively identify and retain critical singular components. By replacing redundant layers with only a minimal set of parameters, GRASP achieves efficient compression while maintaining strong performance with minimal overhead. Experiments across multiple LLMs show that GRASP consistently outperforms existing compression methods, achieving 90% of the original model’s performance under 20% compression ratio.

💡 Research Summary

The paper introduces GRASP (Gradient‑based Retention of Adaptive Singular Parameters), a novel compression framework for large language models (LLMs) that targets functionally redundant layers rather than indiscriminately pruning or replacing them with randomly initialized modules.
Key steps:

Redundant layer detection – For each transformer layer the cosine similarity between its input hidden state (H_i) and output hidden state (H_{i+1}) is computed on a small calibration set (e.g., 512 WikiText‑2 samples). Layers with high similarity are deemed “redundant” because they perform little transformation.
Adaptive singular‑value replacement – Each identified layer’s weight matrices (attention and MLP) are factorized by singular value decomposition (SVD): (W = U\Sigma V^\top). Instead of keeping the largest singular values by magnitude, GRASP evaluates the task‑relevant importance of each singular component (\Phi_k = {u_k, \sigma_k, v_k}) using a first‑order gradient‑based score:

GRASP: Replace Redundant Layers with Adaptive Singular Parameters for Efficient Model Compression

💡 Research Summary

Comments & Academic Discussion

Leave a Comment