LLM-Augmented Digital Twin for Policy Evaluation in Short-Video Platforms
Short-video platforms are closed-loop, human-in-the-loop ecosystems where platform policy, creator incentives, and user behavior co-evolve. This feedback structure makes counterfactual policy evaluation difficult in production, especially for long-horizon and distributional outcomes. The challenge is amplified as platforms deploy AI tools that change what content enters the system, how agents adapt, and how the platform operates. We propose a large language model (LLM)-augmented digital twin for short-video platforms, with a modular four-twin architecture (User, Content, Interaction, Platform) and an event-driven execution layer that supports reproducible experimentation. Platform policies are implemented as pluggable components within the Platform Twin, and LLMs are integrated as optional, schema-constrained decision services (e.g., persona generation, content captioning, campaign planning, trend prediction) that are routed through a unified optimizer. This design supports scalable simulations that preserve closed-loop dynamics while allowing selective LLM adoption, enabling the study of platform policies, including AI-enabled policies, under realistic feedback and constraints.
💡 Research Summary
The paper addresses the pressing challenge of evaluating policy changes on short‑video platforms such as TikTok, Instagram Reels, and Kuaishou. These platforms operate as closed‑loop, human‑in‑the‑loop systems: platform policies influence creator incentives and user behavior, which in turn reshape the data‑generating process that drives future policy decisions. Traditional evaluation methods—online A/B tests and offline counterfactual estimators—struggle to capture long‑horizon, distributional effects, especially when AI‑driven tools (e.g., automated captioning, trend prediction, creator assistants) are embedded in the decision pipeline, accelerating feedback loops and obscuring causal attribution.
To overcome these limitations, the authors propose an LLM‑augmented digital twin that faithfully reproduces the platform’s feedback dynamics while allowing safe, reproducible counterfactual experiments. The architecture consists of four modular “twins”—User, Content, Interaction, and Platform—coordinated by an event‑driven execution layer and a central optimizer service that mediates all calls to large language models.
User Twin models a population of autonomous agents. Each agent has static attributes (demographics, creator tier, calibrated propensity parameters) and a dynamic 50‑dimensional latent preference vector. Agents are initialized from structured personas, and their preferences evolve through a short‑term/long‑term memory system inspired by the Ebbinghaus forgetting curve, enabling realistic drift over time. The twin exposes a limited interface: given a session context (served impressions), it returns an action (consume, engage, create) and any auxiliary parameters.
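The agent interface described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: class and parameter names such as `UserAgent` and `decay_rate` are assumptions, and the Ebbinghaus-style forgetting is modeled here as simple exponential decay of a short-term memory trace that partially consolidates into the long-term preference vector.

```python
import math
import random
from dataclasses import dataclass, field

DIM = 50  # dimensionality of the latent preference vector (per the paper)

@dataclass
class UserAgent:
    """Hypothetical User Twin agent: static persona plus drifting preferences."""
    persona: dict                      # static attributes: demographics, creator tier, ...
    prefs: list = field(default_factory=lambda: [random.gauss(0, 1) for _ in range(DIM)])
    short_term: list = field(default_factory=lambda: [0.0] * DIM)
    decay_rate: float = 0.1            # assumed forgetting constant

    def observe(self, content_vec: list, engagement: float) -> None:
        # Short-term memory accumulates recent engagement signal.
        self.short_term = [s + engagement * c
                           for s, c in zip(self.short_term, content_vec)]

    def step(self, dt: float = 1.0, consolidation: float = 0.05) -> None:
        # Exponential (Ebbinghaus-style) forgetting of short-term traces;
        # a small fraction consolidates into long-term preferences,
        # producing gradual drift over simulated time.
        keep = math.exp(-self.decay_rate * dt)
        self.prefs = [p + consolidation * s
                      for p, s in zip(self.prefs, self.short_term)]
        self.short_term = [s * keep for s in self.short_term]

    def act(self, session: list) -> tuple:
        # Limited interface: given served impressions (dicts with "id" and
        # "vec"), return an action plus auxiliary parameters.
        score = lambda c: sum(p * v for p, v in zip(self.prefs, c["vec"]))
        best = max(session, key=score)
        action = "engage" if score(best) > 0 else "consume"
        return action, {"content_id": best["id"]}
```

The sketch omits the "create" branch and persona-driven initialization for brevity; in the paper, personas are structured and propensity parameters are calibrated rather than random.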
Content Twin abstracts each short video as a feature‑based profile rather than raw media, drastically reducing storage and computational cost. At creation time, optional LLM services generate captions, hashtags, and trend labels, enriching the semantic representation without incurring pixel‑level processing.
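A feature-based content profile with an optional LLM enrichment hook might look like the sketch below. The names (`ContentProfile`, `create_content`, `llm_caption`) are illustrative assumptions; the point is that with no LLM service attached, the profile stays purely feature-based and cheap.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class ContentProfile:
    """Hypothetical Content Twin record: features instead of raw media."""
    content_id: int
    creator_id: int
    topic_vec: list                    # low-dimensional semantic features
    caption: str = ""
    hashtags: list = field(default_factory=list)
    trend_label: Optional[str] = None

def create_content(content_id: int, creator_id: int, topic_vec: list,
                   llm_caption: Optional[Callable[[list], dict]] = None
                   ) -> ContentProfile:
    # At creation time, an optional LLM service (in the paper, routed through
    # the unified optimizer) enriches the semantic representation; otherwise
    # the record remains a bare feature profile, avoiding pixel-level cost.
    profile = ContentProfile(content_id, creator_id, topic_vec)
    if llm_caption is not None:
        enriched = llm_caption(topic_vec)
        profile.caption = enriched.get("caption", "")
        profile.hashtags = enriched.get("hashtags", [])
        profile.trend_label = enriched.get("trend_label")
    return profile
```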
Interaction Twin captures the full spectrum of user‑content events (exposure, watch, like, comment, share, duet, stitch, gifting). It records these as typed events, updates content popularity scores, and feeds back into user preference updates, thereby closing the loop between exposure and behavior.
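A typed-event record with popularity updates can be sketched as below. The event types are those named in the paper; the numeric weights and class names are illustrative assumptions.

```python
from dataclasses import dataclass
from collections import defaultdict

# Event types from the paper; the weights are illustrative, not the paper's.
EVENT_WEIGHTS = {
    "exposure": 0.0, "watch": 1.0, "like": 2.0, "comment": 3.0,
    "share": 4.0, "duet": 5.0, "stitch": 5.0, "gift": 6.0,
}

@dataclass(frozen=True)
class InteractionEvent:
    user_id: int
    content_id: int
    kind: str
    timestamp: float

class InteractionTwin:
    """Records typed user-content events and maintains popularity scores."""
    def __init__(self):
        self.log = []                          # append-only event record
        self.popularity = defaultdict(float)   # content_id -> score

    def record(self, event: InteractionEvent) -> float:
        # Reject untyped events, append to the log, update popularity.
        if event.kind not in EVENT_WEIGHTS:
            raise ValueError(f"unknown event type: {event.kind}")
        self.log.append(event)
        self.popularity[event.content_id] += EVENT_WEIGHTS[event.kind]
        return self.popularity[event.content_id]
```

In the full framework, each recorded event would also feed back into the User Twin's preference update, closing the exposure-behavior loop.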
Platform Twin encapsulates all platform levers as pluggable, parameterized components: recommendation algorithms, exposure‑stage logic, trend tracking, moderation, and promotion mechanisms. Policy evaluation is performed by swapping or re‑parameterizing these components while keeping the rest of the world state fixed, ensuring that observed outcome differences are attributable solely to the policy change.
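The swap-a-component evaluation pattern can be sketched with a minimal policy interface. The policy classes here (`DotProductPolicy`, `BoostedPolicy`) are invented examples of the pluggable-component idea, not the paper's algorithms.

```python
from typing import Protocol

class RecommenderPolicy(Protocol):
    """Pluggable policy interface: rank candidate content for a user."""
    def rank(self, user_prefs: list, candidates: list) -> list: ...

class DotProductPolicy:
    # Baseline: rank by preference-content similarity.
    def rank(self, user_prefs, candidates):
        score = lambda c: sum(p * v for p, v in zip(user_prefs, c["vec"]))
        return sorted(candidates, key=score, reverse=True)

class BoostedPolicy:
    # Counterfactual variant: same ranking plus a promotion boost for
    # flagged content, parameterized by `boost`.
    def __init__(self, boost: float = 1.0):
        self.boost = boost

    def rank(self, user_prefs, candidates):
        score = lambda c: (sum(p * v for p, v in zip(user_prefs, c["vec"]))
                           + (self.boost if c.get("promoted") else 0.0))
        return sorted(candidates, key=score, reverse=True)

def evaluate(policy: RecommenderPolicy, user_prefs: list, candidates: list):
    # Swapping the policy while holding user_prefs and candidates (the rest
    # of the world state) fixed attributes ranking differences to the policy.
    return [c["id"] for c in policy.rank(user_prefs, candidates)]
```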
All twins communicate through an event bus that defines 23 cross‑twin event types, making the causal pathway explicit: User → Interaction → Content → Platform and back. An orchestrator advances simulated time, dispatches event handlers, and logs every state transition in an append‑only fashion for full reproducibility.
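The event bus and orchestrator loop can be sketched as follows (a toy with a handful of event types rather than the paper's 23; names are assumptions). Handlers may publish follow-up events, which is how one twin's output becomes another's input.

```python
from collections import defaultdict, deque

class EventBus:
    """Minimal event bus: typed events dispatched to registered handlers,
    with an append-only log of every dispatched event for reproducibility."""
    def __init__(self):
        self.handlers = defaultdict(list)   # event type -> handler list
        self.queue = deque()
        self.log = []                       # append-only transition log
        self.clock = 0.0                    # simulated time

    def subscribe(self, event_type: str, handler) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        self.queue.append((event_type, payload))

    def run(self) -> None:
        # Orchestrator loop: advance simulated time, log the transition,
        # then dispatch; handlers may publish follow-up events, chaining
        # User -> Interaction -> Content -> Platform and back.
        while self.queue:
            event_type, payload = self.queue.popleft()
            self.clock += 1.0
            self.log.append((self.clock, event_type, payload))
            for handler in self.handlers[event_type]:
                handler(payload, self)
```

For example, a "watch" handler in the Interaction Twin could publish a "popularity_update" consumed by the Content Twin, making the causal pathway explicit in the log.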
LLM integration is realized via a Unified Optimizer Service housed in the Platform Twin. The optimizer exposes a schema‑constrained API (e.g., persona generation, caption generation, trend prediction, campaign planning). Any twin can invoke these services during event handling without embedding separate LLM clients. This design enables selective, cost‑governed LLM usage: the system can decide when the semantic fidelity provided by an LLM is essential and when simpler rule‑based logic suffices, thereby balancing realism against computational expense.
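The routing-with-fallback idea can be sketched as a small service that validates results against a per-task schema and falls back to rule-based logic when no LLM is attached or a call budget is exhausted. Everything here (`OptimizerService`, the schemas, the budget mechanism) is an illustrative assumption about how such cost governance might look, not the paper's API.

```python
import json

class OptimizerService:
    """Hypothetical unified optimizer: routes schema-constrained service
    calls to an LLM backend or a cheap rule-based fallback, under a
    simple per-run call budget."""
    SCHEMAS = {
        "caption": {"required": ["caption", "hashtags"]},
        "persona": {"required": ["age_band", "interests"]},
    }

    def __init__(self, llm_backend=None, budget: int = 100):
        self.llm_backend = llm_backend   # callable(task, payload) -> JSON str
        self.budget = budget             # remaining LLM calls allowed
        self.fallbacks = {               # rule-based logic for each task
            "caption": lambda p: {"caption": "", "hashtags": []},
            "persona": lambda p: {"age_band": "18-24", "interests": []},
        }

    def call(self, task: str, payload: dict) -> dict:
        schema = self.SCHEMAS[task]
        if self.llm_backend is not None and self.budget > 0:
            self.budget -= 1
            result = json.loads(self.llm_backend(task, payload))
        else:
            # No LLM attached or budget exhausted: rule-based logic suffices.
            result = self.fallbacks[task](payload)
        # Schema constraint: every required field must be present.
        missing = [k for k in schema["required"] if k not in result]
        if missing:
            raise ValueError(f"schema violation for {task}: missing {missing}")
        return result
```

Any twin holding a reference to this service can invoke it during event handling without embedding its own LLM client, which is the mediation role the paper assigns to the optimizer.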
The authors compare their framework to the recent OASIS simulator, which couples large‑scale agents with a recommender system but lacks explicit policy modules, fine‑grained event modeling, and LLM‑driven semantic richness. By contrast, the proposed digital twin offers (1) a clear modular policy interface, (2) LLM‑augmented semantic generation, and (3) an event‑driven closed‑loop that faithfully reproduces the platform’s dynamic equilibrium.
Limitations are acknowledged. Abstracting content as feature vectors omits visual and auditory cues that may affect user engagement. Scaling LLM calls to production‑level traffic will require sophisticated cost‑governance, caching, and batching strategies. The current counterfactual setup assumes that only policy components change while other twins remain static, which may under‑represent structural shifts induced by long‑term policy adoption.
Future work could extend the twin to multimodal content representations, integrate adaptive LLM serving layers, and develop meta‑simulation capabilities that allow policies to reshape the very structure of users and content over extended horizons.
In summary, the paper delivers a novel, scalable digital twin framework that captures the intricate human‑in‑the‑loop feedback of short‑video ecosystems and leverages LLMs as optional, schema‑constrained decision services. This enables rigorous, safe, and interpretable evaluation of both traditional and AI‑enabled platform policies, advancing the state of the art in policy simulation for large‑scale social media platforms.