RecGPT-V2 Technical Report
Large language models (LLMs) have demonstrated remarkable potential in transforming recommender systems from implicit behavioral pattern matching to explicit intent reasoning. While RecGPT-V1 successfully pioneered this paradigm by integrating LLM-based reasoning into user interest mining and item tag prediction, it suffers from four fundamental limitations: (1) computational inefficiency and cognitive redundancy across multiple reasoning routes; (2) insufficient explanation diversity in fixed-template generation; (3) limited generalization under supervised learning paradigms; and (4) simplistic outcome-focused evaluation that fails to match human standards. To address these challenges, we present RecGPT-V2 with four key innovations. First, a Hierarchical Multi-Agent System restructures intent reasoning through coordinated collaboration, eliminating cognitive duplication while enabling diverse intent coverage. Combined with Hybrid Representation Inference that compresses user-behavior contexts, our framework reduces GPU consumption by 60% and improves exclusive recall from 9.39% to 10.99%. Second, a Meta-Prompting framework dynamically generates contextually adaptive prompts, improving explanation diversity by +7.3%. Third, constrained reinforcement learning mitigates multi-reward conflicts, achieving +24.1% improvement in tag prediction and +13.0% in explanation acceptance. Fourth, an Agent-as-a-Judge framework decomposes assessment into multi-step reasoning, improving human preference alignment. Online A/B tests on Taobao demonstrate significant improvements: +2.98% CTR, +3.71% IPV, +2.19% TV, and +11.46% NER. RecGPT-V2 establishes both the technical feasibility and commercial viability of deploying LLM-powered intent reasoning at scale, bridging the gap between cognitive exploration and industrial utility.
💡 Research Summary
The RecGPT-V2 technical report describes the next iteration of an LLM-powered recommender system that moves from implicit behavioral pattern matching toward explicit intent reasoning. Its predecessor, RecGPT-V1, integrated LLM-based reasoning into user interest mining and item tag prediction, but the report identifies four limitations: computational inefficiency and cognitive redundancy across multiple reasoning routes, insufficient explanation diversity caused by fixed-template generation, limited generalization under supervised learning, and simplistic outcome-focused evaluation that fails to match human standards.
To address these limitations, RecGPT-V2 introduces four innovations. First, a Hierarchical Multi-Agent System restructures intent reasoning through coordinated collaboration, which eliminates cognitive duplication while still covering diverse intents. Combined with Hybrid Representation Inference, which compresses user-behavior contexts, the framework reduces GPU consumption by 60% and raises exclusive recall from 9.39% to 10.99%. Second, a Meta-Prompting framework dynamically generates contextually adaptive prompts, improving explanation diversity by 7.3%.
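The recall figures above are reported as absolute percentages, so it can help to separate the absolute gain (in percentage points) from the relative gain. The short calculation below uses only the two numbers quoted in the report; the function and variable names are illustrative.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative change of `after` over `before`, returned as a fraction."""
    return (after - before) / before


# Exclusive recall (%) before and after, as quoted in the report.
recall_before = 9.39
recall_after = 10.99

absolute_pp = recall_after - recall_before          # percentage points
relative = relative_gain(recall_before, recall_after)

print(f"absolute: +{absolute_pp:.2f} pp, relative: +{relative:.1%}")
# prints: absolute: +1.60 pp, relative: +17.0%
```

In other words, the 1.60-point increase corresponds to roughly a 17% relative improvement over the earlier exclusive recall.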
Third, constrained reinforcement learning mitigates conflicts between multiple rewards, yielding a 24.1% improvement in tag prediction and a 13.0% improvement in explanation acceptance. Fourth, an Agent-as-a-Judge framework decomposes assessment into multi-step reasoning rather than scoring only the final outcome, which improves alignment with human preferences.
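The summary does not say how the reward constraints are formulated. As a rough illustration only, the sketch below contrasts a plain weighted sum of rewards with one common constrained variant, in which the primary reward counts only when every secondary reward clears a threshold. All names, reward dimensions, weights, and thresholds here are hypothetical and are not taken from the report.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Rewards:
    primary: float                # hypothetical main objective, e.g. task accuracy
    secondary: Dict[str, float]   # hypothetical auxiliary rewards


def weighted_sum_reward(r: Rewards, weights: Dict[str, float]) -> float:
    """Baseline: add weighted secondary rewards to the primary reward."""
    return r.primary + sum(weights[k] * v for k, v in r.secondary.items())


def constrained_reward(r: Rewards, thresholds: Dict[str, float]) -> float:
    """Assumed constrained variant: the primary reward is granted only
    if every secondary reward meets its threshold; otherwise 0."""
    satisfied = all(r.secondary[k] >= t for k, t in thresholds.items())
    return r.primary if satisfied else 0.0


weights = {"diversity": 0.5, "length": 0.5}
thresholds = {"diversity": 0.5, "length": 0.5}

# Sample A scores high on the primary reward but violates the diversity constraint.
sample_a = Rewards(primary=0.9, secondary={"diversity": 0.2, "length": 0.8})
# Sample B has a lower primary reward but satisfies both constraints.
sample_b = Rewards(primary=0.7, secondary={"diversity": 0.6, "length": 0.6})

for name, s in [("A", sample_a), ("B", sample_b)]:
    print(name, weighted_sum_reward(s, weights), constrained_reward(s, thresholds))
```

Under the weighted sum, sample A scores higher even though it neglects one objective; under the gated version, only sample B earns reward. This is one way a constrained formulation can keep a policy from trading one objective against another, though the report's actual method may differ.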
RecGPT-V2 was evaluated in online A/B tests on Taobao, where the report states significant improvements: +2.98% in click-through rate (CTR), +3.71% in IPV, +2.19% in TV, and +11.46% in NER. The authors conclude that RecGPT-V2 establishes both the technical feasibility and the commercial viability of deploying LLM-powered intent reasoning at scale, bridging the gap between cognitive exploration and industrial utility.