Evaluating Social Bias in RAG Systems: When External Context Helps and Reasoning Hurts

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Social biases inherent in large language models (LLMs) raise significant fairness concerns. Retrieval-Augmented Generation (RAG) architectures, which retrieve external knowledge sources to enhance the generative capabilities of LLMs, remain susceptible to the same bias-related challenges. This work evaluates and analyzes the social-bias implications of RAG. Through extensive experiments across various retrieval corpora, LLMs, and bias evaluation datasets covering more than 13 bias types, we surprisingly observe a reduction in bias when RAG is applied. This suggests that external context can help counteract stereotype-driven predictions, potentially improving fairness by diversifying the contextual grounding of the model’s outputs. To better understand this phenomenon, we then probe the model’s reasoning process by integrating Chain-of-Thought (CoT) prompting into RAG while assessing the faithfulness of the model’s CoT. Our experiments reveal that the model’s bias inclinations shift between stereotype and anti-stereotype responses as more contextual information is incorporated from the retrieved documents. Interestingly, while CoT enhances accuracy, it increases overall bias across datasets, reversing the bias reduction observed with RAG and highlighting the need for bias-aware reasoning frameworks that can mitigate this trade-off.


💡 Research Summary

The paper investigates how Retrieval‑Augmented Generation (RAG) and Chain‑of‑Thought (CoT) prompting affect social bias in large language models (LLMs). Using Meta‑Llama‑3‑8B‑Instruct and Mistral‑7B‑v0.1 as back‑ends, the authors construct a standard RAG pipeline that retrieves top‑5 documents (250‑word chunks) from two large corpora—WikiText‑103 and the C4 web crawl—via MPNet‑base‑v2 embeddings and cosine similarity, stored in a Chroma vector database. They evaluate bias before and after augmentation on three benchmark suites covering more than 13 bias dimensions: StereoSet‑CrowS‑Pairs‑WinoBias (combined as SCW), BOLD, and HolisticBias.
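The retrieval step described above can be sketched in a few lines. This is a simplified, self-contained toy (not the authors' code): random vectors stand in for MPNet-base-v2 embeddings, and a plain cosine-similarity search stands in for the Chroma vector database, so only the chunking and top-5 ranking logic is illustrated.

```python
import numpy as np

# Toy sketch of the paper's retrieval pipeline: split a corpus into
# ~250-word chunks, embed, and retrieve the top-5 chunks by cosine
# similarity. Random vectors are stand-ins for MPNet-base-v2 embeddings
# so the example stays self-contained.

def chunk(text, words_per_chunk=250):
    words = text.split()
    return [" ".join(words[i:i + words_per_chunk])
            for i in range(0, len(words), words_per_chunk)]

rng = np.random.default_rng(0)

def embed(texts):
    # Stand-in for SentenceTransformer embeddings (dim 768)
    return rng.normal(size=(len(texts), 768))

def top_k(query_vec, doc_vecs, k=5):
    # Cosine similarity: normalise both sides, then dot product
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q
    return np.argsort(sims)[::-1][:k]

corpus = " ".join(["word"] * 2000)        # placeholder corpus text
chunks = chunk(corpus)                    # 8 chunks of 250 words
doc_vecs = embed(chunks)
query_vec = embed(["example bias-probe query"])[0]
idx = top_k(query_vec, doc_vecs, k=5)     # indices of the 5 best chunks
```

In the actual pipeline, the retrieved chunks are then prepended to the prompt as context for Meta-Llama-3-8B-Instruct or Mistral-7B-v0.1.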

Results show a consistent reduction in bias scores across almost all bias types (gender, race, age, disability, religion, etc.) when RAG is applied. The reduction is observed for both retrieval sources, indicating that the diversity of external context, rather than the intrinsic quality of the source, drives the mitigation. Correlation analysis reveals that before RAG, gender‑related polarity strongly predicts bias (male polarity positively, female polarity negatively). After RAG these correlations weaken, suggesting that retrieved documents dilute systematic stereotypical patterns. The authors argue that external, balanced information—especially female‑oriented content—acts as a natural counter‑stereotype, neutralising the model’s internal biases.

In a second set of experiments, the same RAG pipeline is combined with CoT prompting, forcing the model to generate step‑by‑step reasoning before the final answer. While CoT improves task accuracy and factuality, it simultaneously raises bias scores across all datasets. The authors term this a “bias‑accuracy trade‑off” and hypothesise that CoT gives the model more freedom to reinterpret retrieved evidence, often amplifying latent stereotypes.
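A minimal sketch of how a RAG prompt can be extended with a CoT instruction is shown below. The template wording and structure are hypothetical, since the paper's exact prompt is not reproduced in this summary; only the pattern (retrieved context, question, step-by-step instruction) follows the description above.

```python
# Hypothetical prompt template combining retrieved context with an
# optional Chain-of-Thought instruction. The exact wording used in the
# paper may differ.

def build_prompt(question, retrieved_docs, use_cot=False):
    context = "\n\n".join(f"[Doc {i + 1}] {d}"
                          for i, d in enumerate(retrieved_docs))
    prompt = (f"Context:\n{context}\n\n"
              f"Question: {question}\n")
    if use_cot:
        # CoT variant: ask for explicit reasoning before the answer
        prompt += "Let's think step by step before giving the final answer.\n"
    prompt += "Answer:"
    return prompt

p = build_prompt("Who is more likely to be a nurse?",
                 ["Passage about nursing demographics."],
                 use_cot=True)
```

Comparing generations from the `use_cot=False` and `use_cot=True` variants on the same retrieved documents is what exposes the bias-accuracy trade-off described above.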

To assess the faithfulness of CoT, the authors employ an “Early Answering” technique: they truncate the CoT explanation at 1 sentence, 25 %, 50 %, and 70 % of its length, feed the truncated reasoning back to the model, and observe the final answer. The analysis shows that early fragments of reasoning tend to preserve the bias‑reduction effect of RAG, whereas later fragments increasingly re‑introduce bias. This suggests that the model initially follows the retrieved evidence but later layers of reasoning drift toward internal, ungrounded biases.
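The truncation at the heart of Early Answering can be sketched as follows. This is an assumed implementation (sentence splitting via a simple regex); the model call that consumes the truncated reasoning is omitted.

```python
import re

# Sketch of the "Early Answering" probe: cut a generated CoT at the
# first sentence or at a fixed fraction of its sentences (the paper uses
# 1 sentence, 25%, 50%, and 70%), then re-prompt the model with only the
# truncated reasoning to see whether the final answer changes.

def truncate_cot(cot_text, fraction=None, first_sentence=False):
    # Naive sentence split on terminal punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", cot_text.strip())
    if first_sentence:
        keep = 1
    else:
        keep = max(1, round(len(sentences) * fraction))
    return " ".join(sentences[:keep])

cot = ("The context mentions both candidates. The first candidate has more "
       "experience. Experience is the stated criterion. So the answer is A.")
half = truncate_cot(cot, fraction=0.5)   # keeps 2 of 4 sentences
```

If the answer after a truncated CoT already matches the final answer, the later reasoning steps had no causal effect, which is how the analysis localises where bias re-enters the chain.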

Pearson correlation between bias scores and auxiliary metrics (sentiment, toxicity, regard) further supports these findings. After RAG, correlations weaken, especially for toxicity and sentiment, indicating that contextual diversification mitigates explicit bias. After CoT, correlations strengthen again, showing that extended reasoning aligns bias with other quality dimensions.
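The correlation check itself is standard Pearson r between per-example bias scores and an auxiliary metric. The sketch below uses synthetic placeholder values, not the paper's data, to show the computation.

```python
import numpy as np

# Pearson correlation between per-example bias scores and an auxiliary
# metric (e.g. toxicity). Values below are synthetic placeholders.

def pearson_r(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    # r = centered dot product over the product of norms
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

bias = [0.8, 0.6, 0.7, 0.2, 0.1]
toxicity = [0.7, 0.5, 0.6, 0.3, 0.2]
r = pearson_r(bias, toxicity)  # strong positive correlation here
```

In the study's framing, a large |r| before RAG and a smaller |r| after RAG is the signal that contextual diversification has decoupled bias from the auxiliary dimension.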

Overall, the study demonstrates two opposing forces: external knowledge injection via RAG mitigates social bias, while chain‑of‑thought reasoning can exacerbate it despite improving accuracy. The authors conclude that future RAG‑based systems must incorporate bias‑aware CoT designs, balanced retrieval corpora, and multi‑objective optimisation to simultaneously achieve factual accuracy and fairness.

