ML-DCN: Masked Low-Rank Deep Crossing Network Towards Scalable Ads Click-through Rate Prediction at Pinterest
Deep learning recommendation systems rely on feature interaction modules to model complex user-item relationships across sparse categorical and dense features. In large-scale ad ranking, increasing model capacity is a promising path to improving both predictive performance and business outcomes, yet production serving budgets impose strict constraints on latency and FLOPs. This creates a central tension: we want interaction modules that both scale effectively with additional compute and remain compute-efficient at serving time. In this work, we study how to scale feature interaction modules under a fixed serving budget. We find that naively scaling DCNv2 and MaskNet, despite their widespread adoption in industry, yields rapidly diminishing offline gains in the Pinterest ads ranking system. To overcome these limitations, we propose ML-DCN, an interaction module that integrates an instance-conditioned mask into a low-rank crossing layer, enabling per-example selection and amplification of salient interaction directions while maintaining efficient computation. This novel architecture combines the strengths of DCNv2 and MaskNet, scales efficiently with increased compute, and achieves state-of-the-art performance. Experiments on a large internal Pinterest ads dataset show that ML-DCN achieves higher AUC than DCNv2, MaskNet, and recent scaling-oriented alternatives at matched FLOPs, and it scales more favorably as compute increases, exhibiting a stronger AUC-FLOPs trade-off. Finally, online A/B tests demonstrate statistically significant improvements in key ads metrics (including CTR and click-quality measures), and ML-DCN has been deployed in the production system with neutral serving cost.
💡 Research Summary
The paper addresses a fundamental tension in large‑scale advertising click‑through‑rate (CTR) prediction: how to increase the capacity of the feature‑interaction module without exceeding strict production latency and FLOP budgets. Existing interaction architectures—Deep & Cross Network v2 (DCNv2) and MaskNet—are widely deployed but, when scaled up within a fixed compute budget, quickly exhibit diminishing returns on offline AUC. To overcome this limitation, the authors propose ML‑DCN (Masked Low‑rank Deep Crossing Network), a novel interaction unit that fuses a low‑rank DCNv2 backbone with an instance‑conditioned masking mechanism.
Core Architecture
- Low‑rank DCNv2 backbone – The standard DCNv2 cross layer computes X_{l+1}=X_0⊙(X_lW_l+b_l)+X_l. The low‑rank variant factorizes W_l into U_lV_l^T with U_l, V_l∈ℝ^{d×r} and r≪d, yielding X_{l+1}=X_0⊙((X_lV_l)U_l^T+b_l)+X_l. This reduces per-layer parameters and FLOPs from O(d²) to O(dr) while preserving the explicit crossing operation.
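A minimal numpy sketch of one low-rank cross layer may help make the factorization concrete. The parameter names (`U`, `V`, `b`) and the toy dimensions are illustrative, not from the paper; shapes follow the definitions above:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # feature dim d, low rank r (r << d)

# Illustrative layer parameters: W_l is factorized so that
# X_l W_l = (X_l V) U^T, with U, V of shape (d, r).
U = rng.normal(scale=0.1, size=(d, r))
V = rng.normal(scale=0.1, size=(d, r))
b = np.zeros(d)

def low_rank_cross(x0, xl):
    """One low-rank DCNv2 cross layer:
    X_{l+1} = X_0 * ((X_l V) U^T + b) + X_l  (elementwise product)."""
    return x0 * ((xl @ V) @ U.T + b) + xl

x0 = rng.normal(size=(4, d))  # batch of 4 input embeddings
x1 = low_rank_cross(x0, x0)   # first cross layer, where X_l = X_0
print(x1.shape)               # (4, 8)
```

Note the cost saving: computing `(xl @ V) @ U.T` takes 2·d·r multiply-adds per example instead of d² for a dense `W`, which is the source of the FLOPs reduction at serving time.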
- Instance‑guided mask – For each instance, a mask M is generated from the current layer input X_l via two linear projections with a ReLU in between: M=ReLU(X_lW_{d1}+b_{d1})W_{d2}+b_{d2}. The hidden mask dimension k is typically larger than d, giving a mask ratio t=k/d. The mask is then element‑wise multiplied with the low‑rank cross term (X_lV_l) before the final projection: X_{l+1}=LN(X_0⊙((M⊙(X_lV_l))U_l^T+b_l)+X_l), where LN denotes layer normalization. This lets the network amplify or suppress individual interaction directions per example.
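Putting the two pieces together, one full ML-DCN layer can be sketched in numpy as below. This is a hedged reconstruction from the equations above, not the paper's reference implementation: all parameter names and dimensions are illustrative, and the mask's output width is assumed to be r so it can be multiplied element-wise with the cross term X_lV_l, with k as the mask network's hidden width.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, k = 8, 2, 16  # feature dim d, low rank r, hidden mask dim k (t = k/d = 2)

# Illustrative parameters for one ML-DCN layer.
U = rng.normal(scale=0.1, size=(d, r))
V = rng.normal(scale=0.1, size=(d, r))
b = np.zeros(r if False else d)  # bias on the projected cross term, dim d
Wd1, bd1 = rng.normal(scale=0.1, size=(d, k)), np.zeros(k)
Wd2, bd2 = rng.normal(scale=0.1, size=(k, r)), np.zeros(r)

def layer_norm(x, eps=1e-5):
    """Per-example layer normalization over the feature axis."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def ml_dcn_layer(x0, xl):
    """X_{l+1} = LN(X_0 * ((M * (X_l V)) U^T + b) + X_l), with the
    instance-conditioned mask M = ReLU(X_l Wd1 + bd1) Wd2 + bd2."""
    m = np.maximum(xl @ Wd1 + bd1, 0.0) @ Wd2 + bd2  # mask, one per example
    cross = (m * (xl @ V)) @ U.T + b                 # masked low-rank cross
    return layer_norm(x0 * cross + xl)

x0 = rng.normal(size=(4, d))
out = ml_dcn_layer(x0, x0)
print(out.shape)  # (4, 8)
```

Because the mask is computed from X_l itself, different examples select different directions in the rank-r crossing subspace, which is the per-example amplification described in the abstract; with M fixed to all-ones the layer reduces to the plain low-rank cross layer plus LN.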