From Preferences to Prejudice: The Role of Alignment Tuning in Shaping Social Bias in Video Diffusion Models

Recent advances in video diffusion models have significantly enhanced text-to-video generation, particularly through alignment tuning using reward models trained on human preferences. While these methods improve visual quality, they can unintentionally encode and amplify social biases. To systematically trace how such biases evolve throughout the alignment pipeline, we introduce VideoBiasEval, a comprehensive diagnostic framework for evaluating social representation in video generation. Grounded in established social bias taxonomies, VideoBiasEval employs an event-based prompting strategy to disentangle semantic content (actions and contexts) from actor attributes (gender and ethnicity). It further introduces multi-granular metrics to evaluate (1) overall ethnicity bias, (2) gender bias conditioned on ethnicity, (3) distributional shifts in social attributes across model variants, and (4) the temporal persistence of bias within videos. Using this framework, we conduct the first end-to-end analysis connecting biases in human preference datasets, their amplification in reward models, and their propagation through alignment-tuned video diffusion models. Our results reveal that alignment tuning not only strengthens representational biases but also makes them temporally stable, producing smoother yet more stereotyped portrayals. These findings highlight the need for bias-aware evaluation and mitigation throughout the alignment process to ensure fair and socially responsible video generation.


💡 Research Summary

The paper investigates how alignment tuning (fine‑tuning video diffusion models with reward models learned from human preference data) affects social bias in text‑to‑video generation. To expose bias at a fine‑grained level, the authors introduce VideoBiasEval, a diagnostic framework built on event‑centric prompts: each prompt specifies an actor (with gender and ethnicity attributes), a verb drawn from 42 socially charged actions, and a context, forming an ⟨actor, verb, context⟩ tuple. This separation lets the study disentangle who is depicted from what they are doing.
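To make the prompting scheme concrete, here is a minimal sketch of how such ⟨actor, verb, context⟩ prompts could be enumerated. The attribute lists, actions, and template wording below are illustrative placeholders, not the paper's exact vocabulary (which uses 42 actions):

```python
from itertools import product

# Illustrative vocabularies; the paper's actual word lists differ.
GENDERS = ["man", "woman"]
ETHNICITIES = ["White", "Black", "Asian", "Hispanic"]
ACTIONS = ["driving a car", "cooking dinner", "working in an office"]
CONTEXTS = ["on a city street", "at home", "in a corporate building"]

def build_prompts():
    """Enumerate <actor, verb, context> tuples as text-to-video prompts."""
    prompts = []
    for gender, ethnicity, action, context in product(
        GENDERS, ETHNICITIES, ACTIONS, CONTEXTS
    ):
        prompts.append(f"a {ethnicity} {gender} {action} {context}")
    # Assumed here: attribute-free variants to probe the model's default
    # demographics when no actor attributes are specified.
    for action, context in product(ACTIONS, CONTEXTS):
        prompts.append(f"a person {action} {context}")
    return prompts

print(build_prompts()[:3])
```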

Generated videos are analyzed with large vision‑language models (Qwen2‑VL 7B, Qwen2.5‑VL 7B, InternVL 8B) that automatically label gender and ethnicity per frame. Four multi‑granular metrics are then computed: (1) overall ethnicity bias, (2) gender bias conditioned on ethnicity, (3) distributional shifts in social attributes across model variants, and (4) Temporal Attribute Stability (TAS), which measures how consistently a biased attribute persists throughout a video.
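As a rough illustration of the first two metrics, the per-frame labels can be aggregated into distributions and compared against a reference. The divergence choice below (total variation against a uniform reference) is an assumption for the sketch; the paper's exact formulations may differ:

```python
import numpy as np
from collections import Counter

def distribution(labels, categories):
    """Empirical distribution of predicted labels over fixed categories."""
    counts = Counter(labels)
    total = max(sum(counts.values()), 1)
    return np.array([counts[c] / total for c in categories])

def ethnicity_bias(eth_labels, categories):
    """Total-variation distance from a uniform reference distribution.
    (An illustrative choice; the paper's metric may be defined differently.)"""
    p = distribution(eth_labels, categories)
    u = np.full(len(categories), 1.0 / len(categories))
    return 0.5 * np.abs(p - u).sum()

def gender_bias_given_ethnicity(frames, ethnicities):
    """Per-ethnicity gender imbalance, conditioned on the ethnicity label.
    `frames` is a list of (gender, ethnicity) labels, one per frame."""
    out = {}
    for eth in ethnicities:
        genders = [g for g, e in frames if e == eth]
        if genders:
            p_male = genders.count("male") / len(genders)
            out[eth] = abs(p_male - 0.5) * 2  # 0 = balanced, 1 = one-sided
    return out
```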

The authors first examine two human‑preference datasets, HPDv2 and Pick‑a‑Pic, and find pronounced male‑oriented and White‑oriented preferences. Reward models trained on these datasets (HPSv2.0, HPSv2.1, PickScore) inherit and amplify the same skew. When a VideoCrafter2‑based diffusion model is aligned with these reward models, visual quality and temporal coherence improve markedly, but bias worsens: male‑preferring rewards increase the proportion of men in authority‑related actions (driving, working), while female‑preferring rewards increase the proportion of women in caregiving actions (cooking, cleaning). Ethnicity bias remains heavily skewed toward White representations (≈60% of frames), with minority groups largely absent or relegated to background roles.

TAS analysis shows that alignment tuning makes biased representations more temporally stable, turning occasional stereotypical depictions into persistent, smooth portrayals that span the entire clip. In other words, “preference” signals act as a conduit for systemic bias, baking it into the generative pipeline.
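One plausible way to instantiate TAS, assuming it scores per-frame label agreement with the video-level majority label (the paper's actual definition may differ), is:

```python
from collections import Counter

def temporal_attribute_stability(frame_labels):
    """Fraction of frames whose attribute label matches the video-level
    majority label. 1.0 means the attribute (e.g., perceived gender)
    persists unchanged across the whole clip.

    This is an illustrative proxy, not the paper's exact TAS definition.
    """
    if not frame_labels:
        return 0.0
    _, count = Counter(frame_labels).most_common(1)[0]
    return count / len(frame_labels)

# A clip that flips attribute mid-way is less "stable" than a uniform one.
print(temporal_attribute_stability(["male"] * 8 + ["female"] * 8))  # 0.5
print(temporal_attribute_stability(["male"] * 16))                  # 1.0
```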

To explore mitigation, the authors construct curated reward datasets with deliberately balanced gender and ethnicity distributions. Aligning the video model with these balanced rewards successfully steers the generated videos toward the desired demographic mix, indicating that bias can be controlled at the reward‑model stage.
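A simple way to curate such a balanced preference set is stratified resampling over demographic cells. The sketch below is a hypothetical illustration of that idea, not the paper's actual curation procedure:

```python
import random
from collections import defaultdict

def balance_preference_pairs(pairs, per_cell, seed=0):
    """Stratified resampling so each (gender, ethnicity) cell of the
    preferred sample is equally represented in the reward dataset.

    `pairs` is a list of dicts with 'gender' and 'ethnicity' keys describing
    the preferred item; an illustrative sketch, not the paper's pipeline.
    """
    rng = random.Random(seed)
    buckets = defaultdict(list)
    for pair in pairs:
        buckets[(pair["gender"], pair["ethnicity"])].append(pair)
    balanced = []
    for items in buckets.values():
        if len(items) >= per_cell:
            balanced.extend(rng.sample(items, per_cell))
        else:  # oversample rare cells with replacement
            balanced.extend(rng.choices(items, k=per_cell))
    rng.shuffle(balanced)
    return balanced
```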

The paper contributes three main advances: (1) the VideoBiasEval framework for fine‑grained, event‑level bias evaluation in video generation; (2) the first end‑to‑end analysis linking human preference data, reward models, and aligned video diffusion models, revealing bias inheritance and amplification; (3) empirical evidence that controlled reward‑model training can mitigate bias, offering a practical pathway toward fairer video generation. The work underscores that while alignment tuning boosts perceptual quality, it simultaneously risks reinforcing societal stereotypes, making bias‑aware evaluation and mitigation essential throughout the generative pipeline.

