Mirroring Users: Towards Building Preference-aligned User Simulator with User Feedback in Recommendation

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv paper.

User simulation is increasingly vital for developing and evaluating recommender systems (RSs). While Large Language Models (LLMs) offer promising avenues for simulating user behavior, they often struggle with the absence of the task-specific alignment required for RSs and with the efficiency demands of large-scale simulation. A vast yet underutilized resource for enhancing this alignment is the extensive user feedback inherent in RSs, but leveraging it is challenging due to its ambiguity, noise, and massive volume, which hinder efficient preference alignment. To overcome these hurdles, we introduce a novel data construction framework that combines user feedback in RSs with advanced LLM capabilities to generate high-quality simulation data. Our framework unfolds in two key phases: (1) using LLMs to generate decision-making processes as explanatory rationales for simulation samples, thereby reducing ambiguity; and (2) data distillation based on uncertainty estimation and behavior sampling to efficiently filter the most informative, denoised samples. We then fine-tune lightweight LLMs as user simulators on this high-quality dataset with its corresponding decision-making processes. Extensive experiments confirm that our framework significantly boosts the fine-tuned LLMs' alignment with human preferences and their in-domain reasoning capabilities, providing more insightful and interpretable signals for RS interaction. We believe our work, together with the publicly released framework, high-quality mixed-domain dataset, and fine-tuned LLM checkpoints, will advance the RS community and offer valuable insights for broader human-centric AI research.


💡 Research Summary

The paper introduces USER‑MIRRORER, a novel framework for building preference‑aligned user simulators for recommender systems (RS) by leveraging the massive, yet noisy, user feedback that RSs naturally generate. Traditional user simulators—rule‑based or reinforcement‑learning‑based—cannot capture external contextual knowledge or the reasoning behind real user decisions. Recent attempts to use large language models (LLMs) as simulators suffer from two main drawbacks: they rely on pre‑trained knowledge without fine‑tuning on domain‑specific feedback, and they require heavyweight models that are computationally prohibitive for large‑scale simulation.

USER‑MIRRORER addresses these issues in two stages. First, raw feedback (clicks, ratings, reviews) is transformed into a unified “simulation scene” consisting of a user’s memory (profile attributes and interaction history) and an exposure list (the set of items presented to the user). When the original dataset lacks explicit exposure, a hybrid exposure list is constructed by uniformly sampling items from three sources: random selection, collaborative filtering, and content‑based nearest‑neighbor retrieval. The scene is rendered as plain‑text prompts that LLMs can ingest. Crucially, the framework prompts a powerful LLM to generate an explicit decision‑making rationale for each simulated interaction, turning ambiguous click data into a richer, explanatory format.
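The hybrid exposure-list construction can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, pool representation, and list size `k` are assumptions; it simply mixes the observed (clicked) item with negatives drawn uniformly from the three candidate sources described above.

```python
import random

def build_exposure_list(pos_item, random_pool, cf_pool, content_pool, k=9, seed=0):
    """Hypothetical sketch: mix the clicked item with negatives sampled
    uniformly from three sources (random items, collaborative-filtering
    candidates, content-based nearest neighbours)."""
    rng = random.Random(seed)
    per_source = k // 3
    negatives = []
    for pool in (random_pool, cf_pool, content_pool):
        # Exclude the positive item and anything already chosen.
        candidates = [i for i in pool if i != pos_item and i not in negatives]
        negatives.extend(rng.sample(candidates, min(per_source, len(candidates))))
    exposure = negatives + [pos_item]
    rng.shuffle(exposure)  # the clicked item's position should not leak
    return exposure
```

The shuffle matters: if the ground-truth item always sat in a fixed slot, a fine-tuned simulator could learn the position rather than the preference.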

The second stage performs uncertainty‑based data distillation. By generating multiple rationales with Bayesian dropout, the method decomposes uncertainty into (1) model uncertainty (epistemic), reflecting the LLM’s knowledge gaps, and (2) data uncertainty (aleatoric), reflecting inherent ambiguity or noise in the feedback. Samples with low combined uncertainty—i.e., those where the rationale is clear and aligns well with the observed action—are retained, while noisy or ambiguous samples are discarded. This yields a compact, high‑quality training set that captures the most informative user behaviors.

Using this distilled dataset, the authors fine‑tune a lightweight LLM (Llama‑3.2‑3B‑Instruct) for only one epoch. Experiments span eight diverse domains (movies, books, news, etc.), each with 1,024 training and test scenes. Baselines include the same lightweight model without fine‑tuning, as well as much larger models (Qwen‑2.5‑32B‑Instruct, GPT‑5). Results show that after fine‑tuning with USER‑MIRRORER data, the lightweight model’s accuracy in predicting user actions rises from roughly 45‑62 % to 71‑84 %, closing the gap with the giant models while using a fraction of the computational resources. Moreover, the fine‑tuned model produces coherent rationales, improving interpretability for RS developers.
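The accuracy figures above imply a simple per-domain evaluation: the fraction of simulation scenes where the simulator picks the item the real user actually acted on. A toy sketch (the function name and record shape are illustrative, not from the paper):

```python
from collections import defaultdict

def per_domain_accuracy(records):
    """records: iterable of (domain, predicted_item, true_item) tuples.
    Returns the hit rate per domain: how often the simulator's chosen
    item matches the user's observed action."""
    hits, totals = defaultdict(int), defaultdict(int)
    for domain, pred, true in records:
        totals[domain] += 1
        hits[domain] += int(pred == true)
    return {d: hits[d] / totals[d] for d in totals}
```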

Key contributions are: (1) a prompt design that converts raw feedback into explicit decision rationales, (2) an uncertainty‑driven distillation pipeline that automatically selects high‑quality training samples, and (3) a cost‑effective method for aligning lightweight LLMs with real user preferences. Limitations include the synthetic construction of exposure lists (which may differ from real UI presentations) and the current focus on short‑term prediction accuracy rather than long‑term user engagement dynamics.

The authors release the full framework, mixed‑domain dataset, and fine‑tuned checkpoints, inviting the community to adopt and extend the approach. Future work is slated to incorporate multimodal memory (e.g., images, audio), conduct online A/B testing within live recommender systems, and explore long‑term simulation of user satisfaction and loyalty.

