📝 Original Info
Title: BAID: A Benchmark for Bias Assessment of AI Detectors
ArXiv ID: 2512.11505
Date: 2025-12-12
Authors: Priyam Basu, Yunfeng Zhang, Vipul Raheja
📄 Full Content BAID: A Benchmark for Bias Assessment of AI Detectors
Priyam Basu
Superhuman
priyam.basu@grammarly.com
Yunfeng Zhang
Superhuman
yunfeng.zhang@grammarly.com
Vipul Raheja
Superhuman
vipul.raheja@grammarly.com
Abstract
AI-generated text detectors have recently gained adoption in educational and professional contexts. Prior research has uncovered isolated cases of bias, particularly against English Language Learners (ELLs); however, there is a lack of systematic evaluation of such systems across broader sociolinguistic factors. In this work, we propose BAID, a comprehensive evaluation framework for AI detectors across various types of biases. As part of the framework, we introduce over 200k samples spanning 7 major categories: demographics, age, educational grade level, dialect, formality, political leaning, and topic. We also generated synthetic versions of each sample with carefully crafted prompts to preserve the original content while reflecting subgroup-specific writing styles. Using this benchmark, we evaluate four open-source state-of-the-art AI text detectors and find consistent disparities in detection performance, particularly low recall rates for texts from underrepresented groups. Our contributions provide a scalable, transparent approach for auditing AI detectors and emphasize the need for bias-aware evaluation before these tools are deployed for public use.
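The audit described above ultimately reduces to comparing detector recall on AI-generated text across subgroups. The paper excerpt does not include its scoring code, so the following is a minimal illustrative sketch under our own assumptions: a placeholder `detector_predict` function stands in for any off-the-shelf detector, and the file name and column names (`text`, `label`, `subgroup`) are hypothetical rather than BAID's actual format.

```python
# Minimal sketch of a subgroup-level bias audit for an AI-text detector.
# Assumes a CSV with columns: text, label (1 = AI-generated, 0 = human),
# and subgroup (e.g., dialect or grade level). detector_predict is a stub
# for any real detector; none of this is BAID's released implementation.
import pandas as pd

def detector_predict(texts):
    """Placeholder: return 1 (predicted AI) or 0 (predicted human) per text."""
    return [0 for _ in texts]  # replace with a real detector call

def subgroup_recall(df):
    """Recall on AI-generated texts, computed separately for each subgroup."""
    rows = []
    for group, part in df.groupby("subgroup"):
        ai_part = part[part["label"] == 1]
        if len(ai_part) == 0:
            continue
        preds = detector_predict(ai_part["text"].tolist())
        recall = sum(preds) / len(preds)  # fraction of AI texts correctly flagged
        rows.append({"subgroup": group, "recall": recall, "n": len(preds)})
    return pd.DataFrame(rows)

if __name__ == "__main__":
    data = pd.read_csv("baid_samples.csv")  # hypothetical file name
    report = subgroup_recall(data)
    print(report.sort_values("recall"))
    # A large max-min gap signals disparate detection performance across groups.
    print("recall gap:", report["recall"].max() - report["recall"].min())
```

The same loop can be repeated for false-positive rate on human-written texts, which is the failure mode most directly tied to unfairly penalizing writers.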
Introduction
As large language models (LLMs) such as GPT-4 (OpenAI 2024) and LLaMA (Touvron et al. 2023) continue to improve, the line between machine-generated and human-written text is becoming increasingly difficult to draw. These models now produce writing that is not only grammatically correct but also stylistically sophisticated and contextually nuanced (Brown et al. 2020), often indistinguishable to the untrained eye. Recent advancements have introduced new risks around the generation of deceptive content, raising serious concerns about their potential to mislead or manipulate public perception (Solaiman et al. 2019). These risks span a range of real-world applications, including the automated creation of fabricated news stories (Zellers et al. 2020), fake product reviews (Meng et al. 2025), inauthentic social media posts intended to influence public opinion (Loth, Kappes, and Pahl 2024), as well as phishing attacks (Thapa et al. 2025). In parallel, educators have expressed growing unease over the use of generative tools in academic settings (Currie 2023).
Recent works have proposed a variety of detection methods aimed at distinguishing machine-written text from human-written text. These efforts span a range of approaches, from leveraging statistical irregularities in generated outputs (Gehrmann, Strobelt, and Rush 2019) to training supervised classifiers on curated datasets (Mitchell et al. 2023). Most detectors operate under a binary assumption that a given input is either fully AI-generated or fully human-written. This implies they evaluate the input text at the paragraph or document level, while some work focuses on fine-grained detection, including phrase-level or even token-level classification (Teja et al. 2025).
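As an illustration of the supervised, document-level paradigm described above, the snippet below runs one publicly available RoBERTa-based detector from the Hugging Face Hub on a single passage. The model choice is our own and is not necessarily among the four detectors evaluated in the paper; read this as a sketch of the interface, not the authors' setup.

```python
# Sketch of document-level binary detection with a supervised classifier.
# "openai-community/roberta-base-openai-detector" is one public detector on
# the Hugging Face Hub, used here purely as an example.
from transformers import pipeline

detector = pipeline(
    "text-classification",
    model="openai-community/roberta-base-openai-detector",
)

passage = (
    "The mitochondria is the powerhouse of the cell, providing the energy "
    "required for cellular processes through the production of ATP."
)

result = detector(passage)[0]
# The pipeline emits a single label with a confidence score for the whole
# passage -- the binary, document-level assumption in action.
print(result["label"], round(result["score"], 3))
```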
Although significant progress has been made in developing and evaluating AI-generated text detectors, these models have not been tested for fairness and equity. In particular, research on bias in AI detectors remains sparse. Liang et al. (2023) systematically investigated this issue and found that widely used detectors disproportionately classify texts written by non-native English speakers as AI-generated due to their lower linguistic perplexity. This discovery underscores a troubling consequence: detectors may inadvertently penalize individuals based on their language background, even when their writing is entirely original. Motivated by this insight, our work extends the investigation of bias in AI detectors by evaluating their behavior across a broader and more diverse set of dimensions. Specifically, we examine seven types of bias (demographics, age, educational grade level, dialect, formality, political leaning, and topic) to offer a more comprehensive assessment of how detection systems may fail across different groups. By doing so, we aim to highlight not only the technical limitations of current detectors but also the social implications of deploying them at scale without rigorous fairness evaluations.
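The perplexity mechanism behind the ELL finding is easy to see in code. The sketch below scores two short texts with GPT-2 and applies a naive "low perplexity means AI" rule; the model and threshold are illustrative assumptions rather than the decision rule of any detector studied in the paper, but they show why simpler, more predictable human writing is more likely to be flagged.

```python
# Why low perplexity gets human text flagged: a toy perplexity "detector".
# GPT-2 and the threshold below are illustrative assumptions only.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under GPT-2 (lower = more predictable)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

THRESHOLD = 40.0  # arbitrary cut-off: below this, the toy rule says "AI"

samples = {
    # Simpler vocabulary and syntax, common in learner writing, tends to be
    # highly predictable and therefore low-perplexity.
    "plain": "I like to read books. Reading helps me learn new words.",
    # More idiosyncratic phrasing is harder for the model to predict.
    "idiomatic": "Dog-eared paperbacks colonize every flat surface I own.",
}

for name, text in samples.items():
    ppl = perplexity(text)
    verdict = "AI-generated" if ppl < THRESHOLD else "human-written"
    print(f"{name}: perplexity={ppl:.1f} -> flagged as {verdict}")
```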
Related Works
Various methods have been developed for the detection of AI-generated text. Early approaches (Gehrmann, Strobelt, and Rush 2019) used statistical cues and visualizations to exploit the fact that AI-generated text often relies on a narrower range of high-probability word patterns. Other methods (Bao et al. 2024) provide zero-shot solutions by analyzing outputs via perplexity or entropy differences. ZipPy (Thinkst Applied Research 2023) foregoes heavy n