MLSA4Rec: Mamba Combined with Low-Rank Decomposed Self-Attention for Sequential Recommendation

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, please refer to the original arXiv paper.

In applications such as e-commerce, online education, and streaming services, sequential recommendation systems play a critical role. Despite the excellent performance of self-attention-based sequential recommendation models in capturing dependencies between items in user interaction history, their quadratic complexity and lack of structural bias limit their applicability. Recently, some works have replaced the self-attention module in sequential recommenders with Mamba, which has linear complexity and structural bias. However, these works have not noted the complementarity between the two approaches. To address this issue, this paper proposes a new hybrid recommendation framework, Mamba combined with Low-Rank decomposed Self-Attention for Sequential Recommendation (MLSA4Rec), whose complexity is linear with respect to the length of the user’s historical interaction sequence. Specifically, MLSA4Rec designs an efficient Mamba-LSA interaction module. This module introduces a low-rank decomposed self-attention (LSA) module with linear complexity and injects structural bias into it through Mamba. The LSA module analyzes user preferences from a different perspective and dynamically guides Mamba to focus on important information in user historical interactions through a gated information transmission mechanism. Finally, MLSA4Rec combines user preference information refined by the Mamba and LSA modules to accurately predict the user’s next possible interaction. To our knowledge, this is the first study to combine Mamba and self-attention in sequential recommendation systems. Experimental results show that MLSA4Rec outperforms existing self-attention and Mamba-based sequential recommendation models in recommendation accuracy on three real-world datasets, demonstrating the great potential of Mamba and self-attention working together.


💡 Research Summary

The paper “MLSA4Rec: Mamba Combined with Low‑Rank Decomposed Self‑Attention for Sequential Recommendation” addresses two fundamental challenges in sequential recommendation: (1) the quadratic time and memory cost of vanilla self‑attention, which hampers scalability to long user interaction histories, and (2) the lack of structural bias in self‑attention models, which can lead to over‑fitting and poor generalization. Recent work has introduced Mamba, a selective state‑space model (SSM) that offers linear‑time complexity and an inherent structural bias, but Mamba alone struggles to capture fine‑grained local patterns. The authors argue that Mamba and self‑attention are complementary: Mamba provides global, sequential context, while self‑attention excels at modeling local, item‑level interactions.

To exploit this complementarity, the authors propose MLSA4Rec, a hybrid architecture that integrates Mamba with a Low‑Rank Decomposed Self‑Attention (LSA) module. The LSA reduces the classic O(L²) attention cost to O(P·L) by projecting the item embeddings onto P latent “interest” vectors (P ≪ L) and performing item‑to‑interest aggregation followed by interest‑to‑item interaction. This low‑rank factorization preserves the expressive power of attention while ensuring linear scaling with sequence length.
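The low-rank factorization described above can be sketched in numpy. This is an illustrative reconstruction, not the authors' implementation: the two-step attention (item-to-interest aggregation, then interest-to-item interaction) uses P learnable latent-interest queries, and every matrix product costs O(P·L·d) rather than O(L²·d). The function name, weight shapes, and softmax placement are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def low_rank_attention(X, W_interest):
    """Low-rank decomposed attention sketch (hypothetical shapes).

    X          : (L, d) item embeddings for a user's interaction sequence
    W_interest : (P, d) learnable latent-interest queries, with P << L
    """
    # Step 1 -- item-to-interest aggregation: each of the P interests
    # attends over all L items, producing P interest summaries.
    A1 = softmax(W_interest @ X.T, axis=-1)   # (P, L), cost O(P*L*d)
    interests = A1 @ X                        # (P, d)
    # Step 2 -- interest-to-item interaction: each of the L items
    # attends over the P interest summaries.
    A2 = softmax(X @ interests.T, axis=-1)    # (L, P), cost O(P*L*d)
    return A2 @ interests                     # (L, d)
```

Because both attention maps have one dimension bounded by P, total cost grows linearly in the sequence length L.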

The core of MLSA4Rec is the “Mamba‑LSA Interaction Layer.” First, a Mamba block processes the input embedding sequence, aggregating global and sequential information into a hidden representation H. After layer normalization, H is fed into the LSA module, which injects structural bias from Mamba into the low‑rank attention computation. Crucially, the two modules exchange information through a gated transmission mechanism: the LSA output ε (the latent‑interest representation) is element‑wise multiplied with H, passed through an MLP and GELU activation, and then fed back into the Mamba pathway. This gating allows LSA to highlight user‑specific interests and guide Mamba’s selective state updates, while Mamba supplies LSA with a coherent global context.
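The gated transmission step above can be expressed as a small numpy sketch. This is a schematic reading of the description (element-wise multiplication of H and the LSA output, followed by an MLP and GELU); the exact layer shapes and the single-layer MLP are assumptions, not the paper's verified architecture.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def gated_transmission(H, E, W, b):
    """Sketch of the gated information exchange (hypothetical shapes).

    H : (L, d) hidden states from the Mamba block
    E : (L, d) latent-interest representation from the LSA module
    W : (d, d) MLP weight, b : (d,) MLP bias
    """
    gated = H * E               # element-wise gate: LSA highlights positions in H
    return gelu(gated @ W + b)  # MLP + GELU, fed back into the Mamba pathway
```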

After the interaction, the outputs of LSA and Mamba are concatenated, linearly projected, and activated (GELU) to produce a refined user representation Ĥ. A stack of “Mamba Normalization Layers” (each consisting of a Mamba block followed by layer normalization) further refines this representation, ensuring that information from both sources is fully integrated. Finally, the prediction layer takes the representation of the most recent item, applies a linear transformation and softmax, and outputs probability scores over the entire item catalog.
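The fusion and prediction steps can be sketched as follows. The concatenate-project-GELU fusion and the softmax over the item catalog follow the description above; scoring the catalog by multiplying the fused vector against a prediction matrix is an assumed detail, as is every parameter name.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def predict_next_item(h_lsa, h_mamba, W_fuse, W_pred):
    """Fusion + prediction sketch (hypothetical shapes).

    h_lsa, h_mamba : (d,) last-position outputs of the LSA and Mamba branches
    W_fuse         : (2d, d) fusion projection
    W_pred         : (d, n_items) prediction layer
    """
    fused = gelu(np.concatenate([h_lsa, h_mamba]) @ W_fuse)  # refined representation
    return softmax(fused @ W_pred)                           # (n_items,) probabilities
```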

Complexity analysis confirms that each component—LSA, Mamba block, interaction layer, and normalization layer—operates in O(L) time, preserving linear scalability with respect to the sequence length. The authors benchmark MLSA4Rec on three real‑world datasets: MovieLens‑1M (average sequence length 165.4), Amazon‑Beauty, and Amazon‑Video‑Games. Baselines include GRU4Rec, NARM, SASRec, BERT4Rec, and Mamba4Rec. Across all datasets and evaluation metrics (HR@10, NDCG@10, MRR@10), MLSA4Rec consistently outperforms the baselines, with the most pronounced gains on the longest sequences, demonstrating the effectiveness of the hybrid design.
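For reference, the three evaluation metrics used above have standard definitions; given the ground-truth item's rank among all scored items, they can be computed as:

```python
import numpy as np

def rank_metrics(scores, target, k=10):
    """HR@k, NDCG@k, and MRR@k for a single test instance.

    scores : (n_items,) predicted scores over the item catalog
    target : index of the ground-truth next item
    """
    rank = int((scores > scores[target]).sum()) + 1       # 1-based rank of the target
    hr = 1.0 if rank <= k else 0.0                        # hit if ranked in the top k
    ndcg = 1.0 / np.log2(rank + 1) if rank <= k else 0.0  # position-discounted gain
    mrr = 1.0 / rank if rank <= k else 0.0                # reciprocal rank, cut at k
    return hr, ndcg, mrr
```

Averaging each value over all test users yields the dataset-level HR@10, NDCG@10, and MRR@10 reported in the paper.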

The paper’s contributions are threefold: (1) It introduces the first model that jointly leverages Mamba and self‑attention for sequential recommendation. (2) It proposes a novel interaction module that combines low‑rank attention with a selective SSM, mediated by a gated information exchange. (3) It provides extensive empirical evidence of superior accuracy and linear computational cost.

Limitations noted by the authors include the sensitivity to the hyper‑parameter P (the number of latent interests) and the relatively simple gating mechanism (element‑wise multiplication). Future work is suggested to explore dynamic P selection, more sophisticated gating (e.g., attention‑based gates), and cross‑domain or multimodal extensions.

In summary, MLSA4Rec demonstrates that integrating a structurally biased, linear‑time SSM with an efficient low‑rank self‑attention mechanism can capture both global sequential patterns and fine‑grained local preferences, achieving state‑of‑the‑art performance in sequential recommendation while maintaining scalability.

