Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models
Vision-language models trained on large-scale multimodal datasets show strong demographic biases, but the role of training data in producing these biases remains unclear. A major barrier has been the lack of demographic annotations in web-scale datasets such as LAION-400M. We address this gap by creating person-centric annotations for the full dataset, including over 276 million bounding boxes, perceived gender and race/ethnicity labels, and automatically generated captions. These annotations are produced through validated automatic labeling pipelines combining object detection, multimodal captioning, and finetuned classifiers. Using them, we uncover demographic imbalances and harmful associations, such as the disproportionate linking of men and individuals perceived as Black or Middle Eastern with crime-related and negative content. We also show that a linear fit predicts 60-70% of gender bias in CLIP and Stable Diffusion from direct co-occurrences in the data. Our resources establish the first large-scale empirical link between dataset composition and downstream model bias. Code is available at https://github.com/ExplainableML/LAION-400M-Person-Centric-Annotations.
💡 Research Summary
The paper presents a comprehensive, automated annotation pipeline for the entire LAION‑400M dataset, focusing on person‑centric metadata: bounding boxes, perceived gender, perceived race/ethnicity, and detailed captions. By detecting 276 million person boxes with YOLOv11‑l (using a low confidence threshold to maximize recall) and filtering out boxes smaller than 30 pixels, the authors ensure that the visual cues needed for downstream classification are present. Gender labels are derived through a three‑model consensus (Phi‑3.5‑Vision, LLaVA‑1.6, InternVL3) on a curated 300k‑image sample, followed by fine‑tuning a SigLIP classifier that achieves 97.2% accuracy on a held‑out test set and comparable performance on external benchmarks (Phase, FACET). Race/ethnicity labels cover seven categories (Black, East Asian, Hispanic, Middle Eastern, South Asian, Southeast Asian, White) and are built by first retrieving images whose alt‑text contains related keywords, then applying the same three‑model consensus on 7 million samples, and finally training a second SigLIP classifier to label the full dataset. Captions for each detected person are generated with InternVL3‑2B, with faces blurred for privacy.
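The consensus step above can be sketched as a simple voting rule over the three VLM outputs. This is an illustrative reconstruction: the exact agreement rule the paper uses (strict unanimity vs. majority, and how disagreements are handled) is an assumption here.

```python
from collections import Counter

def consensus_label(predictions, min_agreement=3):
    """Keep a label only when enough of the three VLM annotators
    (e.g. Phi-3.5-Vision, LLaVA-1.6, InternVL3) agree; otherwise
    return None so the sample is dropped from the training pool.

    `min_agreement=3` enforces unanimity; this threshold is an
    assumption, not confirmed by the paper.
    """
    label, count = Counter(predictions).most_common(1)[0]
    return label if count >= min_agreement else None

# Unanimous agreement is kept; any disagreement discards the sample.
print(consensus_label(["female", "female", "female"]))  # female
print(consensus_label(["female", "female", "male"]))    # None
```

Filtering to high-agreement samples in this way trades coverage for label quality before the SigLIP classifier is fine-tuned on the result.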
Armed with these annotations, the authors conduct a large‑scale bias audit. Statistical analysis reveals that men and individuals perceived as Black or Middle Eastern co‑occur disproportionately with crime‑related terms (“criminal”, “weapon”, “gang”) and negative sentiment, while women are more often linked to fashion, beauty, and positive sentiment. A Structured Attribute Extraction (SAE) analysis further maps these co‑occurrences to visual contexts such as streets, prisons, or stages, highlighting how certain demographic groups are over‑represented in stereotypical scenes.
To assess bias transfer, the paper examines CLIP and Stable Diffusion models pretrained on LAION‑400M. By measuring gender bias scores (e.g., differences in similarity to “woman” vs. “man” prompts) and regressing them against the dataset’s gender co‑occurrence frequencies, a simple linear model explains 60‑70% of the variance (R² = 0.60‑0.70). This constitutes the first empirical evidence that the composition of a web‑scale multimodal dataset predicts downstream model bias to a substantial degree.
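The regression analysis amounts to fitting a one‑variable least‑squares line and reporting R². A minimal sketch with synthetic data (the real inputs would be per‑concept dataset skews and measured model bias scores; the numbers below are fabricated for illustration):

```python
import numpy as np

def r_squared(x, y):
    """Fit y ~ a*x + b by least squares and return the coefficient
    of determination R^2, i.e. the fraction of variance in the model
    bias scores explained by the dataset co-occurrence skew.
    """
    a, b = np.polyfit(x, y, 1)
    residuals = y - (a * x + b)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Synthetic stand-ins: x = gender co-occurrence skew per concept in
# the dataset, y = the model's measured gender bias for that concept.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 50)
y = 0.8 * x + rng.normal(0.0, 0.3, 50)  # linear signal plus noise
print(r_squared(x, y))
```

With noisy data like this the fit lands well below 1, mirroring the paper's finding that direct co‑occurrences account for a large but partial share (60‑70%) of the measured bias.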
The authors discuss ethical considerations: perceived demographic labels are inherently noisy and do not capture self‑identified identities; access to the full annotated dataset is gated and requires acknowledgment of these limitations. They also evaluate detection bias (showing no significant gender or race disparity) and report human validation of bounding boxes (82.5% correct). Limitations include reliance on visual cues alone, potential amplification of existing societal stereotypes, and the binary gender framing.
Future work is suggested in three directions: improving annotation fidelity (e.g., multi‑annotator consensus, finer‑grained identity categories), developing debiasing strategies that leverage the released annotations (such as re‑weighting or sub‑sampling), and extending the methodology to newer, larger multimodal corpora. Overall, the paper delivers a valuable resource and a rigorous methodology for quantifying how data‑level imbalances propagate into vision‑language models, offering a concrete foundation for future fairness research in large‑scale AI.