BAID: A Benchmark for Bias Assessment of AI Detectors

Reading time: 5 minutes

📝 Original Info

  • Title: BAID: A Benchmark for Bias Assessment of AI Detectors
  • ArXiv ID: 2512.11505
  • Date: 2025-12-12
  • Authors: Priyam Basu, Yunfeng Zhang, Vipul Raheja

📝 Abstract

AI-generated text detectors have recently gained adoption in educational and professional contexts. Prior research has uncovered isolated cases of bias, particularly against English Language Learners (ELLs); however, there is a lack of systematic evaluation of such systems across broader sociolinguistic factors. In this work, we propose BAID, a comprehensive evaluation framework for AI detectors across various types of biases. As a part of the framework, we introduce over 200k samples spanning 7 major categories: demographics, age, educational grade level, dialect, formality, political leaning, and topic. We also generated synthetic versions of each sample with carefully crafted prompts to preserve the original content while reflecting subgroup-specific writing styles. Using this, we evaluate four open-source state-of-the-art AI text detectors and find consistent disparities in detection performance, particularly low recall rates for texts from underrepresented groups. Our contributions provide a scalable, transparent approach for auditing AI detectors and emphasize the need for bias-aware evaluation before these tools are deployed for public use.
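
The disparity the abstract highlights is, at its core, a per-subgroup recall comparison over AI-generated samples. The sketch below illustrates that kind of audit in Python; the data fields (`text`, `group`, `is_ai`), the `detector` callable, and the 0.5 threshold are hypothetical placeholders for illustration, not the authors' implementation.

```python
from collections import defaultdict


def subgroup_recall(samples, detector, threshold=0.5):
    """Detection recall per subgroup, computed over AI-generated samples.

    `samples`: iterable of dicts with hypothetical fields
        "text"  - the writing sample
        "group" - a subgroup label (e.g. dialect, grade level, topic)
        "is_ai" - True if the sample is machine-generated
    `detector`: callable mapping text -> probability the text is AI-generated
    """
    hits = defaultdict(int)    # AI samples correctly flagged, per group
    totals = defaultdict(int)  # AI samples seen, per group
    for s in samples:
        if not s["is_ai"]:
            continue  # recall is measured on AI-generated texts only
        totals[s["group"]] += 1
        if detector(s["text"]) >= threshold:
            hits[s["group"]] += 1
    return {g: hits[g] / totals[g] for g in totals}


def recall_gap(per_group_recall):
    """Disparity summary: gap between best- and worst-served subgroups."""
    values = list(per_group_recall.values())
    return max(values) - min(values) if values else 0.0
```

Running such an audit with any off-the-shelf detector would surface the kind of recall gaps the paper reports, for example a subgroup whose AI-generated rewrites are rarely flagged relative to other groups.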

💡 Deep Analysis

Figure 1

📄 Full Content

BAID: A Benchmark for Bias Assessment of AI Detectors

Priyam Basu, Superhuman (priyam.basu@grammarly.com); Yunfeng Zhang, Superhuman (yunfeng.zhang@grammarly.com); Vipul Raheja, Superhuman (vipul.raheja@grammarly.com)

Abstract

AI-generated text detectors have recently gained adoption in educational and professional contexts. Prior research has uncovered isolated cases of bias, particularly against English Language Learners (ELLs); however, there is a lack of systematic evaluation of such systems across broader sociolinguistic factors. In this work, we propose BAID, a comprehensive evaluation framework for AI detectors across various types of biases. As a part of the framework, we introduce over 200k samples spanning 7 major categories: demographics, age, educational grade level, dialect, formality, political leaning, and topic. We also generated synthetic versions of each sample with carefully crafted prompts to preserve the original content while reflecting subgroup-specific writing styles. Using this, we evaluate four open-source state-of-the-art AI text detectors and find consistent disparities in detection performance, particularly low recall rates for texts from underrepresented groups. Our contributions provide a scalable, transparent approach for auditing AI detectors and emphasize the need for bias-aware evaluation before these tools are deployed for public use.

Introduction

As large language models (LLMs) such as GPT-4 (OpenAI 2024) and LLaMA (Touvron et al. 2023) continue to improve, the line between machine-generated and human-written text is becoming increasingly difficult to draw. These models now produce writing that is not only grammatically correct but also stylistically sophisticated and contextually nuanced (Brown et al. 2020), while being indistinguishable to the amateur eye. Recent advancements have introduced new risks around the generation of deceptive content, raising serious concerns about their potential to mislead or manipulate public perception (Solaiman et al. 2019). These risks span a range of real-world applications, including the automated creation of fabricated news stories (Zellers et al. 2020), fake product reviews (Meng et al. 2025), inauthentic social media posts intended to influence public opinion (Loth, Kappes, and Pahl 2024), as well as phishing attacks (Thapa et al. 2025). In parallel, educators have expressed growing unease over the use of generative tools in academic settings (Currie 2023).

Recent works have proposed a variety of detection methods aimed at distinguishing machine-written text from human-written text. These efforts span a range of approaches, from leveraging statistical irregularities in generated outputs (Gehrmann, Strobelt, and Rush 2019) to training supervised classifiers on curated datasets (Mitchell et al. 2023). Most detectors operate under a binary assumption that a given input is either fully AI-generated or fully human-written. This implies they evaluate the input text at a paragraph or document level, while some work focuses on fine-grained detection, including phrase-level or even token-level classification (Teja et al. 2025).

Although significant progress has been made in developing and evaluating AI-generated text detectors, these models have not been tested for fairness and equity. In particular, research on bias in AI detectors remains sparse. Liang et al. (2023) systematically investigated this issue and found that widely-used detectors disproportionately classify texts written by non-native English speakers as AI-generated due to their lower linguistic perplexity. This discovery underscores a troubling consequence: detectors may inadvertently penalize individuals based on their language background, even when their writing is entirely original. Motivated by this insight, our work extends the investigation of bias in AI detectors by evaluating their behavior across a broader and more diverse set of dimensions. Specifically, we examine seven types of bias - demographics, age, educational grade level, dialect, formality, political leaning, and topic - to offer a more comprehensive assessment of how detection systems may fail across different groups. By doing so, we aim to highlight not only the technical limitations of current detectors but also the social implications of deploying them at scale without rigorous fairness evaluations.

Related Works

Various methods have been developed for the detection of AI-generated text. Early approaches like Gehrmann, Strobelt, and Rush (2019) used statistical cues and visualizations to exploit the fact that AI-generated text often relies on a narrower range of high-probability word patterns. Other methods like Bao et al. (2024) provide zero-shot solutions by analyzing outputs via perplexity or entropy differences. ZipPy (Thinkst Applied Research 2023) foregoes heavy n
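
The statistical and perplexity-based detectors surveyed above share a common core: score a text under a reference language model and treat unusually predictable (low-perplexity) text as likely machine-generated. The sketch below is a generic illustration of that idea using the Hugging Face transformers library with GPT-2 as the scorer; the model choice and the threshold are assumptions made here for illustration, not the specifics of any cited method.

```python
import math

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is used only as a convenient reference model; this is an assumption
# for illustration, not the scorer used by any detector cited above.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()


def perplexity(text: str) -> float:
    """Perplexity of `text` under the reference language model."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    # `out.loss` is the mean per-token negative log-likelihood.
    return math.exp(out.loss.item())


def looks_ai_generated(text: str, threshold: float = 40.0) -> bool:
    """Heuristic flag: very low perplexity (highly predictable text) is
    treated as evidence of machine generation. The threshold is illustrative."""
    return perplexity(text) < threshold
```

This also makes the bias mechanism reported by Liang et al. (2023) concrete: writing that happens to be highly predictable under the reference model, such as some non-native English prose, can fall below such a threshold and be misclassified as AI-generated even though it is entirely human-written.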


Reference

This content is AI-processed based on open access ArXiv data.
