Meta Lattice: Model Space Redesign for Cost-Effective Industry-Scale Ads Recommendations

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

The rapidly evolving landscape of products, surfaces, policies, and regulations poses significant challenges for deploying state-of-the-art recommendation models at industry scale, primarily due to data fragmentation across domains and escalating infrastructure costs that hinder sustained quality improvements. To address these challenges, we propose Lattice, a recommendation framework centered on model space redesign that extends Multi-Domain, Multi-Objective (MDMO) learning beyond models and learning objectives. Lattice combines cross-domain knowledge sharing, data consolidation, model unification, distillation, and system optimizations to achieve significant improvements in both quality and cost-efficiency. Our deployment of Lattice at Meta has resulted in a 10% gain in revenue-driving top-line metrics, an 11.5% improvement in user satisfaction, and a 6% boost in conversion rate, with 20% capacity savings.


💡 Research Summary

The paper “Meta Lattice: Model Space Redesign for Cost-Effective Industry-Scale Ads Recommendations” introduces Lattice, a comprehensive recommendation framework developed at Meta to address the critical challenges of deploying large-scale models in industrial settings. The primary obstacles are economic scalability (prohibitive cost of scaling thousands of independent models), data fragmentation (isolated datasets hindering knowledge sharing), and stringent deployment constraints (inference latency limits).

Lattice’s core innovation is a paradigm shift from a scattered “model space”—where each domain-objective pair (portfolio) has its own dedicated model—to a redesigned, consolidated model space. This is achieved through strategic portfolio consolidation guided by the Lattice Partitioner, which groups related domains and objectives based on user/item overlap, feedback characteristics, and compliance policies, significantly reducing the total number of models required.
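The paper does not publish the Partitioner's exact algorithm, but the grouping idea can be illustrated with a hypothetical sketch: greedily merging portfolios (domain-objective pairs) whose user sets overlap strongly. The Jaccard threshold and portfolio names below are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of portfolio consolidation by user overlap.
# A real partitioner would also weigh item overlap, feedback
# characteristics, and compliance policies.

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two user sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def partition_portfolios(portfolios: dict, threshold: float = 0.5) -> list:
    """Greedily merge portfolios whose user overlap exceeds `threshold`.

    `portfolios` maps a portfolio name (a domain-objective pair) to the
    set of users it observes; returns consolidated groups of names.
    """
    groups = []  # each group: (merged user set, [portfolio names])
    for name, users in portfolios.items():
        for group_users, members in groups:
            if jaccard(users, group_users) >= threshold:
                group_users |= users   # grow the group's audience
                members.append(name)
                break
        else:
            groups.append((set(users), [name]))
    return [members for _, members in groups]

portfolios = {
    "feed-clicks":   {1, 2, 3, 4},
    "feed-convs":    {1, 2, 3, 5},  # large overlap with feed-clicks
    "stories-convs": {8, 9},        # disjoint audience
}
print(partition_portfolios(portfolios))
# → [['feed-clicks', 'feed-convs'], ['stories-convs']]
```

With a 0.5 threshold, the two feed portfolios consolidate into one model while the disjoint stories portfolio keeps its own, which is the kind of reduction in model count the Partitioner is after.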

To support these consolidated portfolios, Lattice implements a cohesive data integration strategy. The Lattice Zipper tackles the delayed feedback problem inherent in ads recommendation. Instead of maintaining separate datasets for different attribution windows, it creates a unified dataset by randomly assigning each impression to one window. The model is equipped with separate prediction heads for each window during training, allowing it to learn from both fresh (short-window) and complete (long-window) signals simultaneously, while only the “oracle” head (longest window) is used for inference. The Lattice Filter addresses the feature selection challenge for multiple portfolios by employing a Pareto-optimal algorithm. It selects features that guarantee no single portfolio’s performance is unfairly degraded, ensuring balanced quality across all consolidated tasks.
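The Zipper's window-assignment and head-masking scheme can be sketched in a few lines. This is a minimal illustration assuming hypothetical 1-day and 7-day attribution windows and a per-impression deterministic random assignment; the actual window set and assignment mechanism are not specified at this level of detail in the summary.

```python
import random

WINDOWS = [1, 7]   # attribution windows in days; 7 is the "oracle" (longest)

def assign_window(impression_id: int, seed: int = 0) -> int:
    """Deterministically pick one window per impression, so the unified
    dataset contains each impression exactly once."""
    rng = random.Random(impression_id * 1000003 + seed)
    return rng.choice(WINDOWS)

def training_targets(impression: dict, window: int) -> dict:
    """Only the prediction head matching the assigned window gets a
    label during training; the other heads are masked (None)."""
    return {w: (impression[f"label_{w}d"] if w == window else None)
            for w in WINDOWS}

imp = {"id": 42, "label_1d": 0, "label_7d": 1}
w = assign_window(imp["id"])
print(w, training_targets(imp, w))
# At inference time, only the oracle head's (7-day) prediction is served.
```

Because assignment is random across impressions, the short-window head sees fresh signals and the long-window head sees complete ones, yet no impression is duplicated across datasets.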

Architecturally, the Lattice Networks are designed to handle the heterogeneous input formats (sparse, dense, sequential) resulting from data consolidation through interleaved learning. Crucially, to mitigate domain conflict within a unified model, it employs parameter untying, where domain-specific feedforward networks are separated from the shared backbone, stabilizing multi-task learning.
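Parameter untying can be shown with a minimal numpy sketch: one shared backbone whose weights are trained on all domains, plus a separate feedforward head per domain so that gradient updates for one domain's head never touch another's. The dimensions, two-domain setup, and single-layer heads below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_HID = 8, 16
DOMAINS = ["feed", "stories"]

# Shared backbone: trained on data from all domains.
shared_W = rng.normal(size=(D_IN, D_HID))
# Untied per-domain feedforward heads: one weight matrix per domain.
domain_W = {d: rng.normal(size=(D_HID, 1)) for d in DOMAINS}

def forward(x: np.ndarray, domain: str) -> np.ndarray:
    """Shared representation, then a domain-specific projection."""
    h = np.maximum(x @ shared_W, 0.0)  # shared ReLU features
    return h @ domain_W[domain]        # gradients here stay per-domain

x = rng.normal(size=(4, D_IN))
print(forward(x, "feed").shape, forward(x, "stories").shape)
# → (4, 1) (4, 1): same shared features, different domain projections
```

Keeping the backbone shared preserves cross-domain knowledge transfer, while the untied heads absorb domain-specific differences that would otherwise cause conflicting gradients in a fully shared network.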

For practical deployment, Lattice introduces a hierarchical knowledge transfer system via Lattice KTAP (Knowledge Transfer with Asynchronous Precompute). Large-scale foundational models act as teachers, generating knowledge offline. This knowledge is asynchronously distilled into smaller, highly-optimized student models that serve real-time traffic, thus bypassing latency barriers while preserving the benefits of large models. Furthermore, the reduction in the number of models enables focused system-wide optimizations. Lattice Sketch automates the search for optimal hyperparameters and parallelization strategies per model. The consolidated model space also allows for deep, per-model efficiency optimizations like custom GPU kernels and low-precision training/inference, which were previously impractical due to the scattered model landscape.
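The asynchronous-precompute idea behind KTAP can be sketched as a join: teacher predictions are computed offline and stored keyed by example id, so student training (and serving) never invokes the teacher synchronously. The blended-loss form, the `alpha` weighting, and all names here are illustrative assumptions, not the paper's published loss.

```python
import math

# Offline step (the teacher's job): example_id -> teacher probability.
teacher_cache = {101: 0.9, 102: 0.2}

def bce(p: float, y: float) -> float:
    """Binary cross-entropy against a (possibly soft) target y."""
    eps = 1e-7
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def student_loss(example_id: int, student_p: float, label: float,
                 alpha: float = 0.5) -> float:
    """Blend the hard-label loss with a distillation term against the
    precomputed teacher probability, when one is available."""
    loss = bce(student_p, label)
    teacher_p = teacher_cache.get(example_id)  # async lookup, no teacher call
    if teacher_p is not None:
        loss = (1 - alpha) * loss + alpha * bce(student_p, teacher_p)
    return loss

print(round(student_loss(101, 0.8, 1.0), 4))
```

Because the lookup degrades gracefully to the plain hard-label loss when no teacher output exists, the student's training loop stays decoupled from the teacher's precompute cadence.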

Extensive experiments show that Lattice outperforms ten state-of-the-art baselines, achieving up to a 1% improvement in prediction loss and delivering up to 1.3x hardware efficiency gains on 1024 GPUs. Most significantly, its production deployment across a representative slice of Meta’s ads recommendation systems yielded substantial real-world impact: a 10% gain in revenue-driving top-line metrics, an 11.5% improvement in user satisfaction, and a 6% boost in conversion rate, all while achieving 20% capacity savings. This demonstrates that Lattice successfully bridges the gap between the theoretical promise of scaling laws and practical, cost-effective industry-scale deployment.

