LLMDistill4Ads: Using Cross-Encoders to Distill from LLM Signals for Advertiser Keyphrase Recommendations at eBay
E-commerce sellers are advised to bid on keyphrases to boost their advertising campaigns. These keyphrases must be relevant to prevent irrelevant items from cluttering Search systems and to maintain positive seller perception. It is vital that keyphrase suggestions align with seller, Search, and buyer judgments. Given the challenges in collecting negative feedback in these systems, LLMs have been used as a scalable proxy for human judgments. We present an empirical study on a major e-commerce platform of a distillation framework involving an LLM teacher, a cross-encoder assistant, and a bi-encoder Embedding Based Retrieval (EBR) student model, aimed at mitigating click-induced biases, providing more diverse keyphrase recommendations, and aligning advertising, Search, and buyer preferences.
💡 Research Summary
The paper “LLMDistill4Ads: Using Cross-Encoders to Distill from LLM Signals for Advertiser Keyphrase Recommendations at eBay” presents a novel framework designed to improve keyword (keyphrase) recommendation for advertisers on a major e-commerce platform, addressing critical biases inherent in standard approaches.
The core problem lies in using user click data as the primary training signal. Click logs are notoriously biased due to exposure (items ranked lower get fewer clicks), popularity, and, crucially, “middleman bias.” This bias arises because an item-keyphrase pair must first be deemed relevant by the platform’s Search relevance filter before it can enter an auction and potentially receive a click. Therefore, models trained only on click data never learn from keyphrases rejected by Search, even if Advertising systems proposed them, leading to a limited and skewed understanding of relevance.
To mitigate these biases, the authors propose a three-pronged strategy: hybrid supervision, a teacher-assistant-student distillation hierarchy, and optimized distillation objectives.
First, they augment the standard click-based positive/negative labels with two additional signals: 1) Search Relevance (SR) scores, which are judgments from an upstream relevance model applied during the auction process, and 2) relevance labels generated by a Large Language Model (Mixtral 8x7B Instruct). The LLM acts as a scalable proxy for human judgment, providing labels free from the exposure and middleman biases present in behavioral data.
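To make the hybrid supervision concrete, here is a minimal sketch of how the three signals might be combined into a single training label. The function name, priority ordering, and `sr_threshold` value are illustrative assumptions for exposition, not the paper's exact labeling scheme:

```python
# Hypothetical sketch of hybrid supervision: combining click, Search
# Relevance (SR), and LLM signals. The combination rule and threshold
# below are illustrative assumptions, not the paper's exact scheme.

def hybrid_label(clicked, sr_score, llm_relevant, sr_threshold=0.5):
    """Return a binary relevance label for an item-keyphrase pair.

    clicked:      True if the pair received a click.
    sr_score:     upstream SR score in [0, 1], or None if the pair was
                  rejected before the auction and thus never scored.
    llm_relevant: binary relevance judgment from the LLM teacher.
    """
    # A click is a strong positive, but clicks only exist for pairs that
    # already passed the Search filter, so absence of a click is not a
    # reliable negative (the "middleman bias" described above).
    if clicked:
        return 1
    # Pairs rejected upstream never appear in click logs; the LLM
    # judgment covers exactly this blind spot.
    if sr_score is None:
        return 1 if llm_relevant else 0
    # Pairs that were scored but not clicked: fall back to the SR score.
    return 1 if sr_score >= sr_threshold else 0
```

The key point is the middle branch: without the LLM signal, pairs filtered out by Search would contribute no training examples at all.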
Second, they introduce a hierarchical knowledge distillation framework. The LLM serves as the ultimate “teacher” of relevance. Its judgments are used to fine-tune a “cross-encoder assistant” model. Cross-encoders process the keyphrase and item (title + category) jointly, allowing for deep interaction and highly accurate relevance scoring, but are too computationally expensive for real-time retrieval over billions of items. This cross-encoder distills and calibrates the LLM’s knowledge for the specific task. Finally, a “bi-encoder student” model is trained to mimic the soft scores (e.g., probability of relevance) of the cross-encoder assistant. Bi-encoders encode keyphrases and items independently, enabling fast approximate nearest neighbor search via pre-computed embeddings, making them ideal for production deployment.
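The efficiency argument for the bi-encoder student can be sketched as follows. The toy bag-of-characters `encode` function below is purely a stand-in for a real learned text encoder, and the item titles are invented; the point is the computation pattern, where item embeddings are computed once offline and only the query keyphrase is encoded at serving time:

```python
# Sketch of the bi-encoder retrieval pattern. encode() is a toy
# deterministic bag-of-characters embedding, used only to keep the
# example self-contained; a real system uses a learned encoder and an
# approximate nearest neighbor (ANN) index instead of exact scoring.

import math

def encode(text, dim=32):
    """Toy encoder: L2-normalized bag-of-characters vector (illustrative only)."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a, b):
    # Both vectors are unit-norm, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Offline: embed the item corpus once and index it.
item_titles = ["apple iphone 13 case", "garden hose 50ft", "iphone screen protector"]
item_index = [(title, encode(title)) for title in item_titles]

# Online: embed only the query keyphrase, then score against the index.
query_vec = encode("iphone accessories")
ranked = sorted(item_index, key=lambda it: cosine(query_vec, it[1]), reverse=True)
```

A cross-encoder, by contrast, would have to run a full forward pass over every (keyphrase, item) pair at query time, which is why it serves as the assistant rather than the deployed retriever.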
Third, the paper conducts an extensive ablation study on the loss function used for distilling knowledge from the cross-encoder assistant to the bi-encoder student. The authors compare standard losses like Mean Squared Error (MSE) and contrastive loss with ranking-oriented losses like CoSENT and a Pearson-correlation-based rank imitation loss. Their experiments clearly show that the two-stage distillation (LLM -> CE -> BE) significantly outperforms direct distillation (LLM -> BE). Furthermore, among the distillation losses, the Pearson correlation loss achieved the best results, effectively transferring both the ranking order and the score calibration from the cross-encoder to the bi-encoder, leading to superior performance in F1 score, precision, recall, and score correlation.
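A common formulation of such a correlation-based distillation objective is `loss = 1 - r`, where `r` is the Pearson correlation between the student's and the teacher's scores over a batch; the sketch below assumes that formulation (the paper's exact parameterization may differ), written in plain Python for clarity rather than as a differentiable tensor op:

```python
# Minimal sketch of a Pearson-correlation rank-imitation loss, assuming
# the standard loss = 1 - r formulation. A production version would be
# implemented as a differentiable operation over batched tensors.

import math

def pearson_loss(student_scores, teacher_scores):
    n = len(student_scores)
    mean_s = sum(student_scores) / n
    mean_t = sum(teacher_scores) / n
    cov = sum((s - mean_s) * (t - mean_t)
              for s, t in zip(student_scores, teacher_scores))
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in student_scores))
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in teacher_scores))
    r = cov / (std_s * std_t + 1e-12)  # epsilon guards against zero variance
    # 0 when the student perfectly tracks the teacher's scores (up to a
    # positive affine transform); 2 when they are perfectly anti-correlated.
    return 1.0 - r
```

Because Pearson correlation is invariant to positive affine rescaling, this loss rewards the student for matching the teacher's relative ordering and linear score structure rather than exact score values, which is consistent with the paper's finding that it transfers both ranking and calibration well.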
In summary, this work provides a practical and effective blueprint for building a keyphrase recommendation system that balances accuracy with the efficiency demands of a large-scale platform. By leveraging hybrid signals (clicks, search relevance, LLM judgments) and a carefully designed hierarchical distillation process with an optimized loss function, the proposed LLMDistill4Ads framework successfully reduces click-data biases, aligns recommendations with multiple stakeholder perspectives (seller, advertising, search, buyer), and delivers a model efficient enough for real-world use.