From Feature Interaction to Feature Generation: A Generative Paradigm of CTR Prediction Models
Click-Through Rate (CTR) prediction, a core task in recommendation systems, aims to estimate the probability of users clicking on items. Existing models predominantly follow a discriminative paradigm that relies heavily on explicit interactions between raw ID embeddings. This over-reliance on feature interactions over raw ID embeddings renders them inherently susceptible to two critical issues: embedding dimensional collapse and information redundancy. To address these limitations, we propose a novel Supervised Feature Generation (SFG) framework, shifting the paradigm from “discriminative feature interaction” to “generative feature generation.” Specifically, SFG comprises two key components: an Encoder that constructs hidden embeddings for each feature, and a Decoder tasked with regenerating the embeddings of all features from these hidden representations. Unlike existing generative approaches that adopt self-supervised losses, we introduce a supervised loss that exploits the natural supervised signal of the CTR task, i.e., click or not. The framework is highly general: it can be seamlessly integrated with most existing CTR models, reformulating them under the generative paradigm. Extensive experiments demonstrate that SFG consistently mitigates embedding collapse and reduces information redundancy while yielding substantial performance gains across various datasets and base models. The code is available at https://github.com/USTC-StarTeam/GE4Rec.
💡 Research Summary
This paper, “From Feature Interaction to Feature Generation: A Generative Paradigm of CTR Prediction Models,” proposes a fundamental paradigm shift for Click-Through Rate (CTR) prediction models in recommendation systems.
The authors identify critical limitations inherent in the dominant discriminative paradigm used by existing CTR models (e.g., FM, DeepFM, DCN-V2). These models rely heavily on explicit interactions (e.g., inner products) between raw ID feature embeddings. This reliance leads to two core issues: (1) embedding dimensional collapse, where embeddings occupy a low-dimensional space due to the “Interaction-Collapse Theory,” and (2) information redundancy, as the same raw embeddings are repeatedly used in interaction functions, learning correlated representations. Furthermore, discriminative models focus only on P(Y|X), ignoring the data distribution P(X) and its rich feature co-occurrence patterns.
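To make the discriminative paradigm concrete, here is a minimal NumPy sketch of an FM-style second-order interaction for a single sample. The field count, embedding dimension, and random initialization are illustrative, not taken from the paper; the point is that every pairwise inner product reuses the same raw ID embeddings, which is the reuse the authors link to dimensional collapse and redundancy.

```python
import numpy as np

rng = np.random.default_rng(0)
num_fields, dim = 3, 4  # illustrative sizes, not from the paper

# Raw ID embeddings for one sample: one looked-up vector per feature field.
E = rng.normal(size=(num_fields, dim))

# FM-style second-order term: sum of inner products over all field pairs.
# Each pair interacts the same raw embeddings directly.
logit = sum(E[i] @ E[j]
            for i in range(num_fields)
            for j in range(i + 1, num_fields))

# Discriminative output: model only P(Y=1|X), ignoring P(X).
p_click = 1.0 / (1.0 + np.exp(-logit))
```

Models such as FmFM or DCN-V2 refine the interaction function (e.g., inserting a learned matrix between `E[i]` and `E[j]`), but the raw embeddings still interact directly.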
To address these issues, the authors introduce the Supervised Feature Generation (SFG) framework, shifting the focus from “feature interaction” to “feature generation.” The key insight is to reinterpret the inherent “co-occurrence relationship” among categorical features in CTR data as a source-target structure for generation. The SFG framework consists of:
- An Encoder: A field-wise single-layer non-linear MLP that takes the concatenated embeddings of all input features and transforms them into new hidden representations for each feature.
- A Decoder: This component maps the encoder’s hidden representations back to the original embedding space to “regenerate” feature embeddings. Crucially, the decoder’s projection matrices can be designed to mirror the interaction functions of existing CTR models (e.g., using a field-pair specific full matrix for FmFM/DCN-V2), allowing SFG to reformulate most existing models under this generative paradigm.
- A Supervised Generative Loss: Instead of using self-supervised losses common in generative models (e.g., masked reconstruction), SFG directly leverages the natural supervised signal of the CTR task—click or not—using a binary cross-entropy loss. The final prediction is made by aggregating (e.g., summing) the relevance (e.g., dot product) between each regenerated embedding (from feature i to j) and the original target embedding of feature j across all feature pairs.
SFG operates in an “All-Predict-All” manner: the encoder processes all features, and each encoded feature representation is used to generate all target features simultaneously. This design avoids direct interaction between raw ID embeddings, theoretically mitigating dimensional collapse. The encoder also produces sample-specific hidden representations that are decorrelated from the original embeddings, reducing redundancy.
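The components above can be sketched end to end for one sample. This is a minimal NumPy illustration under assumed shapes: the per-field encoder weights `W_enc`, the field-pair decoder projections `W_dec`, and all dimensions are hypothetical placeholders for the paper's parameterization, chosen only to show the All-Predict-All data flow and the supervised generative loss.

```python
import numpy as np

rng = np.random.default_rng(0)
num_fields, dim, hidden = 3, 4, 8  # illustrative sizes

E = rng.normal(size=(num_fields, dim))   # original feature embeddings
x = E.reshape(-1)                        # concatenated encoder input

# Encoder: field-wise single-layer non-linear MLP — one weight matrix per
# field (hypothetical shapes) maps the concatenation to that field's hidden
# representation, which is sample-specific and decoupled from raw IDs.
W_enc = 0.1 * rng.normal(size=(num_fields, hidden, num_fields * dim))
H = np.tanh(W_enc @ x)                   # (num_fields, hidden)

# Decoder: field-pair specific full projections (mirroring an FmFM/DCN-V2
# style interaction) regenerate every target field j from every source i.
W_dec = 0.1 * rng.normal(size=(num_fields, num_fields, dim, hidden))
E_gen = np.einsum('ijdh,ih->ijd', W_dec, H)  # E_gen[i, j]: e_j generated from field i

# Supervised generative loss: aggregate (sum) the dot-product relevance of
# each regenerated embedding against the original target embedding over all
# field pairs, then apply binary cross-entropy against the click label.
logit = np.einsum('ijd,jd->', E_gen, E)      # All-Predict-All aggregation
p_click = 1.0 / (1.0 + np.exp(-logit))
y = 1.0                                      # click label
bce = -(y * np.log(p_click) + (1 - y) * np.log(1 - p_click))
```

Note that raw embeddings `E` appear only as generation targets, never on both sides of an interaction: each dot product pairs a regenerated embedding (built from the non-linear hidden representations) with an original one, which is how the design avoids direct raw-ID interaction.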
Extensive experiments on public datasets (Criteo, Avazu) and industrial datasets demonstrate that SFG consistently and significantly boosts the performance of various base CTR models. It achieved an average improvement of 0.272% in AUC and a 0.435% reduction in Logloss, with only a minor computational overhead (~3% more time, ~1.5% more memory). Empirical analysis confirmed that SFG effectively alleviates embedding collapse and reduces information redundancy in the learned representations. Ablation studies validated core design choices, such as the necessity of non-linearity in the encoder and the use of the supervised loss. A successful deployment on Tencent’s advertising platform resulted in a 2.68% Gross Merchandise Volume (GMV) lift in a major scenario, underscoring its practical utility and impact.