Field Matters: A Lightweight LLM-enhanced Method for CTR Prediction
Click-through rate (CTR) prediction is a fundamental task in modern recommender systems. In recent years, the integration of large language models (LLMs) has been shown to effectively enhance the performance of traditional CTR methods. However, existing LLM-enhanced methods often require extensive processing of detailed textual descriptions for large-scale instances or user/item entities, leading to substantial computational overhead. To address this challenge, this work introduces LLaCTR, a novel and lightweight LLM-enhanced CTR method that employs a field-level enhancement paradigm. Specifically, LLaCTR first utilizes LLMs to distill crucial and lightweight semantic knowledge from small-scale feature fields through self-supervised field-feature fine-tuning. Subsequently, it leverages this field-level semantic knowledge to enhance both feature representation and feature interactions. In our experiments, we integrate LLaCTR with six representative CTR models across four datasets, demonstrating its superior performance in terms of both effectiveness and efficiency compared to existing LLM-enhanced methods. Our code is available at https://github.com/istarryn/LLaCTR.
💡 Research Summary
The paper tackles the practical challenge of integrating large language models (LLMs) into click‑through rate (CTR) prediction without incurring prohibitive computational costs. Existing LLM‑enhanced CTR approaches either prompt LLMs to act as direct predictors or use them to enrich traditional CTR models with semantic knowledge. Both paradigms operate at the instance, user, or item level, requiring the processing of massive textual descriptions for millions of interactions. Empirical measurements on datasets such as Amazon Video Games and MovieLens‑1M show that these methods can be more than 290 times slower than baseline CTR models, making them unsuitable for real‑time serving.
To overcome this bottleneck, the authors propose LLaCTR, a lightweight framework that shifts the semantic extraction from the instance level to the field level. In CTR datasets, a “field” denotes a semantic group of features (e.g., user age, item price, average rating). The number of fields is typically orders of magnitude smaller than the number of instances—often a few hundred—so processing them with an LLM is far cheaper.
LLaCTR consists of two main components:
- Self‑Supervised Field‑Feature Fine‑Tuning (SSFT): The authors construct a self‑supervised task in which the LLM receives a prompt containing a feature description and a list of candidate field descriptions, and must output the correct field name. This yields prompt‑response pairs such as "The feature is 'Black Rose.' Which field does it belong to?" → "Title". The LLM is fine‑tuned on these pairs with a language‑generation loss and a contrastive loss that pulls the prompt embedding toward the embedding of the correct field while pushing it away from the embeddings of incorrect fields. Because only a small set of fields and a sampled subset of features are needed, the fine‑tuning cost is negligible compared with full instance‑level processing.
- Field Semantic‑Guided Enhancement (FRE & FIE): After SSFT, the LLM produces a dense embedding for each field. These embeddings are used in two ways:
  - Feature Representation Enhancement (FRE): An alignment loss pulls the traditional feature embeddings (learned from sparse IDs) toward a linear transformation of the corresponding field embedding. This injects semantic knowledge directly into the feature representations, helping the model capture meanings that pure ID embeddings miss.
  - Feature Interaction Enhancement (FIE): The field embeddings are passed through a small neural network to produce a field‑interaction matrix, which is then incorporated into the interaction layer of standard CTR architectures (e.g., FM, DeepFM, xDeepFM). This matrix explicitly strengthens interactions between semantically related fields such as "user income" and "item price".
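The SSFT objective described above can be sketched in a few lines. This is a minimal, hypothetical illustration rather than the authors' implementation: a deterministic toy encoder stands in for the LLM, and an InfoNCE-style softmax loss plays the role of the contrastive term that aligns each feature prompt with its correct field while pushing away the others. The field names, feature strings, and temperature value are illustrative assumptions.

```python
import numpy as np
import zlib

FIELDS = ["Title", "Brand", "Price"]            # toy field descriptions (assumed)
PAIRS = [("Black Rose", "Title"),               # (feature, correct field) pairs,
         ("Nintendo", "Brand"),                 # mirroring the prompt-response
         ("$59.99", "Price")]                   # example in the summary
DIM = 16

def embed(text: str) -> np.ndarray:
    """Stand-in for the LLM encoder: a deterministic pseudo-random unit vector."""
    seed = zlib.crc32(text.encode("utf-8"))
    v = np.random.default_rng(seed).standard_normal(DIM)
    return v / np.linalg.norm(v)

def contrastive_loss(prompt_vec, field_vecs, correct_idx, tau=0.1):
    """InfoNCE-style loss: align the prompt embedding with the correct field's
    embedding while pushing it away from the incorrect fields."""
    sims = field_vecs @ prompt_vec / tau        # similarity to every candidate field
    sims = sims - sims.max()                    # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    return -np.log(probs[correct_idx])

field_vecs = np.stack([embed(f) for f in FIELDS])
avg_loss = np.mean([contrastive_loss(embed(feat), field_vecs, FIELDS.index(fld))
                    for feat, fld in PAIRS])
print(f"average contrastive loss over {len(PAIRS)} pairs: {avg_loss:.4f}")
```

In the actual method this loss would be combined with the language-generation loss and backpropagated into the LLM; the sketch only shows why the objective is cheap: it touches a handful of field strings rather than millions of instances.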
The framework is plug‑and‑play: it can be attached to any existing CTR model without altering the model’s core architecture. The authors evaluate LLaCTR on four public datasets (Amazon Video Games, Amazon Books, MovieLens‑1M, Criteo) by integrating it with six representative CTR models: DeepFM, FM, FiBiNet, xDeepFM, DCN, and AutoInt. Compared with the original models, LLaCTR yields an average AUC improvement of 2.24 % and a reduction in LogLoss, while requiring 10‑ to 100‑fold less training time than prior LLM‑enhanced methods (KAR, LLM‑CF, CTRL, EASE). In latency‑sensitive inference, LLaCTR stays well within typical real‑time constraints (sub‑10 ms per request).
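The two enhancement paths (FRE and FIE) can be illustrated with a small numpy sketch. Everything here is an assumption for illustration, not the paper's code: random vectors stand in for the fine-tuned LLM's field embeddings, a plain MSE term stands in for the alignment loss, and a sigmoid over field-embedding similarities stands in for the small neural network that produces the field-interaction matrix.

```python
import numpy as np

rng = np.random.default_rng(42)
NUM_FIELDS, LLM_DIM, ID_DIM = 4, 32, 8

# Pretend outputs of the fine-tuned LLM: one dense embedding per field
field_emb = rng.standard_normal((NUM_FIELDS, LLM_DIM))
# Traditional ID-based feature embeddings (one active feature per field)
feat_emb = rng.standard_normal((NUM_FIELDS, ID_DIM))
# Learnable linear projection from LLM space into the ID-embedding space
W = rng.standard_normal((LLM_DIM, ID_DIM)) * 0.1

def alignment_loss(feat, field, W):
    """FRE sketch: MSE pulling ID embeddings toward projected field embeddings."""
    proj = field @ W
    return np.mean((feat - proj) ** 2)

def interaction_matrix(field):
    """FIE sketch: field-interaction weights from field-embedding similarities,
    squashed into (0, 1); a stand-in for the paper's small neural network."""
    sims = field @ field.T / np.sqrt(field.shape[1])
    return 1.0 / (1.0 + np.exp(-sims))

def weighted_fm_score(feat, A):
    """FM-style pairwise interactions, reweighted by the interaction matrix."""
    score = 0.0
    n = feat.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            score += A[i, j] * (feat[i] @ feat[j])
    return score

A = interaction_matrix(field_emb)
loss = alignment_loss(feat_emb, field_emb, W)
score = weighted_fm_score(feat_emb, A)
print(f"alignment loss: {loss:.4f}, weighted FM score: {score:.4f}")
```

The plug-and-play claim corresponds to the last two calls: the alignment loss is simply added to the host model's training objective, and the interaction matrix reweights pairwise terms the host model already computes, so the core architecture is untouched.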
Key insights from the study include:
- Efficiency through granularity reduction: By moving from millions of instance‑level texts to a few hundred field‑level texts, the computational burden drops dramatically.
- Domain adaptation via self‑supervision: The SSFT step tailors a generic LLM to the specific semantics of CTR fields, overcoming the mismatch between pre‑training corpora and recommendation domains.
- Dual enhancement of representation and interaction: Injecting field semantics both into the embedding space and the interaction module yields synergistic gains, as the model benefits from richer initial representations and more informed interaction patterns.
- Scalability and practicality: LLaCTR’s lightweight nature makes it suitable for production environments where GPU resources and inference latency are tightly constrained.
The authors acknowledge limitations: the approach relies on well‑crafted field descriptions; if these are noisy or missing, the benefit diminishes. Moreover, the current design treats field embeddings as static; adapting to temporal shifts in field semantics (e.g., trending topics) would require additional mechanisms. Finally, when the number of fields grows into the thousands, the computational advantage narrows, suggesting future work on hierarchical field grouping or dynamic field selection.
In conclusion, LLaCTR demonstrates that LLMs can meaningfully enhance CTR prediction without the prohibitive costs traditionally associated with large‑scale language model deployment. By extracting lightweight, high‑quality semantic knowledge at the field level and integrating it into existing CTR pipelines, the method achieves a compelling balance of accuracy, efficiency, and ease of integration, paving the way for more sustainable use of LLMs in large‑scale recommender systems.