Slate recommendation, where users are presented with a ranked list of items simultaneously, is widely adopted on online platforms. Recent advances in generative models have shown promise for slate recommendation by modeling sequences of discrete semantic IDs autoregressively. However, existing autoregressive approaches suffer from semantically entangled item tokenization and inefficient sequential decoding that lacks holistic slate planning. To address these limitations, we propose HiGR, an efficient generative slate recommendation framework that integrates hierarchical planning with listwise preference alignment. First, we propose an auto-encoder that combines residual quantization with contrastive constraints to tokenize items into semantically structured IDs for controllable generation. Second, HiGR decouples generation into a list-level planning stage and an item-level decoding stage.
Personalized recommendation systems now play a central role in large-scale online services, powering user experiences across news feeds, short-video platforms, and e-commerce applications [2,20,32,38]. Unlike traditional item-level recommendation [1] that presents items individually, slate recommendation, where a ranked list of items is displayed to users simultaneously, has emerged as the dominant paradigm. The slate, as a fundamental unit of user experience, determines not only which content users consume but also how they perceive the platform's relevance and diversity.
Traditional slate recommendation approaches typically follow a two-stage paradigm: first scoring candidate items independently using point-wise or pair-wise ranking models [17], then assembling the final slate through greedy selection or reranking heuristics. While computationally efficient, these methods suffer from critical limitations: 1) They optimize item-level objectives rather than list-level quality, failing to account for the synergistic relationships among items within a slate. For instance, in short-video recommendation, users often prefer diverse content that balances entertainment, information, and novelty, a holistic preference that cannot be captured by independent item scoring. 2) The greedy assembly process lacks the ability to look ahead or backtrack. Even if individual items are scored accurately, the final combination is often suboptimal because earlier selections constrain later choices without a global optimization perspective.
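The failure mode of the two-stage paradigm can be seen in a minimal sketch. The item names and scores below are purely illustrative, not from the paper; the point is that greedy top-k assembly over independent point-wise scores never sees list-level properties such as redundancy or diversity.

```python
# Hypothetical sketch of the conventional two-stage paradigm:
# score items independently, then assemble the slate greedily.

def greedy_slate(scores: dict[str, float], k: int) -> list[str]:
    """Pick the top-k items by independent point-wise score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

scores = {
    "funny_cat_video":  0.95,
    "funny_dog_video":  0.93,  # near-duplicate of the cat video in appeal
    "news_explainer":   0.80,
    "cooking_tutorial": 0.75,
}
slate = greedy_slate(scores, k=2)
# Greedy selection returns the two highest-scoring items even though they are
# redundant; the synergy among slate items is never modeled.
```

Here the assembled slate consists of two near-duplicate videos, even though a human curator would likely swap one out for the news explainer.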
Recent advances in generative models [8], particularly large language models and autoregressive architectures, have opened new possibilities for slate recommendation. By formulating recommendation as a sequence generation task, these approaches can model complex dependencies between items and generate coherent recommendation lists in an end-to-end manner. To adapt items for generation, recent works typically employ Semantic IDs (SIDs) based on Residual Quantized VAEs (RQ-VAE) [27] to tokenize items into discrete codes. Despite their theoretical appeal, existing generative methods face three fundamental challenges that hinder their practical deployment in large-scale systems. First, conventional semantic quantization often yields an entangled ID space where "different prefixes share similar semantics" or "the same prefix implies different semantics." This lack of clear semantic boundaries makes it difficult for the model to precisely control the generation process. Second, while SIDs reduce vocabulary size, they represent a single item as a sequence of multiple tokens (e.g., 3 tokens per item). Autoregressive models must generate tokens one by one, with each generation step depending on all previous outputs. A typical slate of 10 items thus requires 30 sequential decoding steps, which significantly slows inference and creates a bottleneck for systems requiring sub-100 ms latency. Third, autoregressive generation lacks holistic list planning. Although these models can theoretically condition each item on previous context, they still operate in a left-to-right generation paradigm without explicit mechanisms for planning the global slate structure. This often leads to locally coherent but globally suboptimal slates, where later items may contradict earlier selections or fail to achieve desired list-level properties such as diversity, coverage, or topical coherence.
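The tokenization step above can be sketched concretely. The snippet below shows residual quantization in the RQ-VAE style: an item embedding is greedily quantized level by level, yielding one discrete code per level. The codebooks here are random for illustration; real systems learn them jointly with the auto-encoder, and the dimensions are arbitrary choices.

```python
# Minimal sketch of residual quantization (RQ) as used by RQ-VAE-style
# tokenizers: each item embedding maps to a short sequence of discrete codes.
import numpy as np

rng = np.random.default_rng(0)
num_levels, codebook_size, dim = 3, 256, 8      # e.g., 3 tokens per item
codebooks = rng.normal(size=(num_levels, codebook_size, dim))

def tokenize(embedding: np.ndarray) -> list[int]:
    """Greedily quantize the residual at each level, one code ID per level."""
    residual, codes = embedding, []
    for level in range(num_levels):
        dists = np.linalg.norm(codebooks[level] - residual, axis=1)
        idx = int(np.argmin(dists))             # nearest codeword at this level
        codes.append(idx)
        residual = residual - codebooks[level][idx]  # quantize what remains
    return codes

item_embedding = rng.normal(size=dim)
sid = tokenize(item_embedding)                  # a 3-token Semantic ID
# A slate of 10 items therefore costs 10 * num_levels = 30 autoregressive steps.
```

Because each item expands into `num_levels` tokens, plain left-to-right decoding of a 10-item slate takes 30 dependent steps, which is exactly the latency bottleneck described above.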
To address these limitations, we introduce HiGR (Hierarchical Generative Slate Recommendation), a novel framework that reformulates slate recommendation as a hierarchical coarse-to-fine generation process with listwise preference alignment. Fundamentally, to solve the semantic entanglement issue, we first propose an auto-encoder with residual quantization and contrastive learning (CRQ-VAE). Unlike conventional quantization, CRQ-VAE injects prefix-level contrastive constraints, ensuring that high-level ID prefixes explicitly encode semantic similarity and separability. This creates a structured vocabulary where the prefix acts as a reliable semantic anchor. Building on these structured IDs, HiGR decouples slate generation into two levels: a list-level preference planning stage that captures the global structure and intent of the entire slate, followed by an item-level decoding stage that grounds the preference plans into specific item selections. This hierarchical design enables efficient inference while maintaining coherent global planning, addressing both efficiency and quality challenges simultaneously. This coarse-to-fine decomposition mirrors how humans approach slate curation: first deciding the overall composition and themes, then filling in specific items.
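One plausible way to realize the prefix-level contrastive constraint is an InfoNCE-style loss that pulls together embeddings of items whose SIDs share the same first-level code and pushes apart the rest, so the prefix becomes a semantic anchor. The sketch below is an illustration under that assumption, not the paper's exact loss; the function name, temperature, and toy embeddings are all hypothetical.

```python
# Illustrative prefix-level contrastive loss (InfoNCE over cosine similarity):
# positives are item pairs sharing the same first-level SID code.
import numpy as np

def prefix_contrastive_loss(embs: np.ndarray, prefix_ids: np.ndarray,
                            tau: float = 0.1) -> float:
    """Mean -log p(positive) where positives share a prefix id."""
    z = embs / np.linalg.norm(embs, axis=1, keepdims=True)   # unit-normalize
    sim = z @ z.T / tau                                      # scaled cosine sims
    np.fill_diagonal(sim, -np.inf)                           # exclude self-pairs
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    pos = (prefix_ids[:, None] == prefix_ids[None, :]) & ~np.eye(len(embs), dtype=bool)
    return float(-logp[pos].mean())

# Toy check: two tight clusters, prefixes matching the clusters.
embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
aligned_loss = prefix_contrastive_loss(embs, np.array([0, 0, 1, 1]))
```

Under this formulation, prefix assignments that agree with the embedding geometry yield a lower loss than mismatched ones, which is the "semantic similarity and separability" property the prefix is meant to carry.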
Beyond architectural innovation, we propose a listwise preference alignment objective that directly optimizes slate-level quality. Traditional recommendation models are trained with item-level cross-entropy loss, which does not reflect how users actually evaluate slates: holistically, as a complete list rather than as isolated items.
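One common way such a listwise alignment objective can be instantiated (a hedged sketch, not necessarily HiGR's exact formulation) is a DPO-style loss over whole slates: the model's log-likelihood of a user-preferred slate is pushed above that of a dispreferred slate, relative to a frozen reference model. All function and variable names below are illustrative.

```python
# DPO-style listwise preference loss over entire slates:
# -log sigmoid(beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose)))
import math

def slate_dpo_loss(logp_win: float, logp_lose: float,
                   ref_logp_win: float, ref_logp_lose: float,
                   beta: float = 0.1) -> float:
    """Preferred slate's likelihood margin over the dispreferred one, vs. a reference."""
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Example: the preferred slate is 2 nats more likely under the policy than
# under the reference, while the dispreferred slate is unchanged.
loss = slate_dpo_loss(logp_win=-10.0, logp_lose=-12.0,
                      ref_logp_win=-12.0, ref_logp_lose=-12.0)
```

Because both log-likelihoods are computed over the entire token sequence of a slate, the gradient signal is inherently list-level: it rewards whole-slate compositions users prefer rather than individually high-scoring items.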