In Bad Faith: Assessing Discussion Quality on Social Media
The quality of a user’s social media experience is determined both by the content they see and by the quality of the conversation and interaction around it. In this paper, we look at replies to tweets from mainstream media outlets and official government agencies and assess whether they are in good faith, engaging honestly and constructively with the original post, or in bad faith, attacking the author or derailing the conversation. We assess automated approaches that may help in making this determination and then show that, within our dataset, bad-faith interactions constitute 68.3% of all replies we studied, suggesting potential concerns about the quality of discourse in these conversational contexts. This is particularly true of replies from verified accounts, where 91.7% were bad faith. Given that verified accounts are algorithmically amplified, we discuss the implications of our work for understanding the user experience on social media.
💡 Research Summary
The paper “In Bad Faith: Assessing Discussion Quality on Social Media” investigates the nature of replies to high‑visibility tweets posted by mainstream U.S. media outlets and official government accounts on Twitter (now X). The authors define two categories of interaction: “good‑faith” – constructive, evidence‑based, respectful engagement with the original tweet – and “bad‑faith” – comments that dismiss evidence, generalize without support, derail the conversation, employ personal attacks, spread misinformation, or otherwise undermine meaningful discourse. Drawing on philosophical concepts from Sartre and Johansson, they operationalize these definitions into a coding schema that lists specific linguistic and rhetorical cues for each class.
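To make the schema concrete, here is a minimal sketch of how the cues listed above could be encoded as structured data and rendered into an annotation prompt. The class names, cue wording, and the helper function are illustrative only and are not taken from the paper.

```python
# Illustrative encoding of the good-faith / bad-faith coding schema described
# above, using the cues listed in this summary; the paper's actual rubric
# wording and structure will differ.
CODING_SCHEMA = {
    "good_faith": [
        "engages constructively with the original tweet",
        "supports claims with evidence",
        "remains respectful toward the author and other users",
    ],
    "bad_faith": [
        "dismisses evidence out of hand",
        "generalizes without support",
        "derails the conversation",
        "attacks the author personally",
        "spreads misinformation",
    ],
}

def rubric_as_prompt(schema=CODING_SCHEMA):
    """Render the schema as bullet lists for inclusion in an annotation prompt."""
    lines = []
    for label, cues in schema.items():
        lines.append(f"{label.replace('_', '-')} cues:")
        lines.extend(f"  - {cue}" for cue in cues)
    return "\n".join(lines)
```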
Data collection focused on tweets from 2024 that attracted at least 100 replies, yielding 601 source tweets (441 from media, 160 from government). From these they harvested 52,469 replies, of which 31,283 unique English‑language replies were retained for analysis (non‑English and duplicate content were excluded). Because Twitter’s Terms of Service prohibit public release of the raw data, the authors make it available on request.
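A minimal sketch of the filtering step described above follows, assuming each harvested reply is a dict with a "text" field and using the langdetect package as a stand-in for whatever language-identification method the authors actually used.

```python
# Illustrative sketch of the reply-filtering step (not the authors' code):
# keep unique, English-language reply texts and drop everything else.
from langdetect import detect, LangDetectException

def filter_replies(replies):
    """Return replies with unique, English-language text."""
    seen_texts = set()
    kept = []
    for reply in replies:
        text = reply["text"].strip()
        if not text or text in seen_texts:      # drop empty and duplicate texts
            continue
        try:
            if detect(text) != "en":            # drop non-English replies
                continue
        except LangDetectException:             # e.g. emoji-only or URL-only replies
            continue
        seen_texts.add(text)
        kept.append(reply)
    return kept
```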
For annotation, a random sample of 400 tweet‑reply pairs was independently coded by two human annotators, with a third adjudicator resolving disagreements, producing a “ground‑truth” set of 397 labeled replies (the remaining 3 were non‑English). Inter‑annotator agreement was substantial but imperfect (Cohen’s κ = 0.64), reflecting the inherent subjectivity of judging conversational intent. The same 400 pairs were then given to ChatGPT‑4 with a detailed prompt that reproduced the coding rubric. ChatGPT’s labels matched the human gold standard with 89.0% agreement (κ = 0.75), with precision of 84.43% and recall of 81.75% for good‑faith detection, and precision of 91.64% and recall of 92.98% for bad‑faith detection, indicating that large language models can closely replicate human judgments in this domain.
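As a rough illustration, the agreement figures quoted above could be recomputed from two parallel label lists with scikit-learn; the label strings and the evaluate helper are assumptions, not the authors' code.

```python
# Sketch of the agreement/precision/recall evaluation, assuming parallel lists
# of "good" / "bad" labels for the adjudicated human ground truth and for the
# ChatGPT output.
from sklearn.metrics import accuracy_score, cohen_kappa_score, precision_score, recall_score

def evaluate(human_labels, model_labels):
    agreement = accuracy_score(human_labels, model_labels)      # raw percent agreement
    kappa = cohen_kappa_score(human_labels, model_labels)       # chance-corrected agreement
    per_class = {
        label: {
            "precision": precision_score(human_labels, model_labels, pos_label=label),
            "recall": recall_score(human_labels, model_labels, pos_label=label),
        }
        for label in ("good", "bad")
    }
    return agreement, kappa, per_class
```

On the 397 adjudicated pairs, the numbers reported in the paper correspond to an agreement of 0.89 and κ of 0.75 from this kind of computation.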
Applying the prompt‑based ChatGPT classifier to the full set of 31,283 replies revealed that only 24.9% were classified as good‑faith, while 75.1% were bad‑faith. The pattern varied by account type: media‑origin tweets received 20.8% good‑faith replies versus 39.7% for government tweets. Verification status proved a strong predictor: among verified users (151 replies), only 18.7% were good‑faith, compared with 28.5% for unverified users (246 replies). Verified users also generated a higher proportion of bad‑faith comments (81.3% vs. 71.5%). Moreover, the authors observed that verified accounts dominate the top ranks of reply visibility (average rank 32.8 for verified vs. 59.8 for unverified), and the correlation between reply rank and the share of verified users was strongly negative (r = ‑0.85). However, rank showed little correlation with the proportion of good‑faith content (r = 0.18).
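These aggregate statistics could be reproduced from a per-reply table along the following lines; the DataFrame columns and the summarize helper are hypothetical, chosen only to mirror the quantities reported above.

```python
# Hypothetical recomputation of the aggregate statistics, assuming a pandas
# DataFrame with one row per labeled reply and columns "label" ("good"/"bad"),
# "source_type" ("media"/"government"), "verified" (bool), and "rank" (the
# reply's position in the conversation view).
import pandas as pd
from scipy.stats import pearsonr

def summarize(df: pd.DataFrame):
    good = df["label"].eq("good")
    share_by_source = good.groupby(df["source_type"]).mean()    # good-faith share per source type
    share_by_verified = good.groupby(df["verified"]).mean()     # good-faith share by verification status

    # Per-rank shares of verified accounts and of good-faith replies.
    per_rank = df.groupby("rank").agg(
        verified_share=("verified", "mean"),
        good_share=("label", lambda s: s.eq("good").mean()),
    )
    r_verified, _ = pearsonr(per_rank.index, per_rank["verified_share"])  # rank vs. verified share
    r_good, _ = pearsonr(per_rank.index, per_rank["good_share"])          # rank vs. good-faith share
    return share_by_source, share_by_verified, r_verified, r_good
```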
The discussion interprets these findings as evidence that high‑visibility conversational spaces around authoritative sources are saturated with hostile or unproductive discourse, especially when amplified by algorithmic promotion of verified accounts. The authors argue that this dynamic may reflect a “parasocial” incentive structure: verified users, often public figures or influencers, may prioritize attention‑grabbing tactics over genuine dialogue, thereby inflating the prevalence of bad‑faith interactions. They also highlight the practical implication that LLM‑based automated labeling can scale discourse‑quality monitoring, potentially informing platform interventions such as incorporating reply‑quality signals into ranking algorithms.
Limitations are acknowledged. The study’s scope is confined to direct replies on a specific subset of high‑engagement tweets, limiting generalizability to broader Twitter conversations or other platforms (e.g., Threads, BlueSky). The binary good/bad‑faith classification simplifies a spectrum of communicative behaviors, as evidenced by the modest human agreement score. Future work is suggested to develop multi‑dimensional labeling schemes (e.g., degrees of hostility, intent, emotional tone) and to test the approach across diverse conversational contexts and platforms.
In conclusion, the paper contributes a theoretically grounded coding framework for assessing discussion quality, demonstrates that ChatGPT‑4 can achieve human‑level labeling accuracy, and uncovers a concerning prevalence of bad‑faith replies (68.3% overall and 81.3% among verified users) to mainstream media and government tweets. These results raise questions about the health of public discourse on social media, especially given the algorithmic amplification of verified accounts that tend to exhibit higher rates of hostile engagement. The authors call for further research into scalable moderation tools and platform design choices that prioritize conversational health over raw engagement metrics.