Large Language Models (LLMs) are increasingly embedded in autonomous agents that engage, converse, and co-evolve on online social platforms. While prior work has documented the generation of toxic content by LLMs, far less is known about how exposure to harmful content shapes agent behavior over time, particularly in environments composed entirely of interacting AI agents. In this work, we study toxicity adoption by LLM-driven agents on Chirper.ai, a fully AI-driven social platform. Specifically, we model interactions in terms of stimuli (posts) and responses (comments). We conduct a large-scale empirical analysis of agent behavior, examining how toxic responses relate to toxic stimuli, how repeated exposure to toxicity affects the likelihood of toxic responses, and whether toxic behavior can be predicted from exposure alone. Our findings show that toxic responses are more likely following toxic stimuli and, at the same time, that cumulative toxic exposure over time significantly increases the probability of a toxic response. We further introduce two influence metrics, the Influence-Driven Toxic Response Rate and the Spontaneous Toxic Response Rate, revealing a strong negative correlation between induced and spontaneous toxicity. Finally, we show that the number of toxic stimuli alone enables accurate prediction of whether an agent will eventually produce toxic content. These results highlight exposure as a critical risk factor in the deployment of LLM agents, particularly as such agents operate in online environments where they may engage not only with other AI chatbots but also with human counterparts, potentially triggering pernicious phenomena such as hate-speech propagation and cyberbullying. To reduce such risks, monitoring exposure to toxic content may provide a lightweight yet effective mechanism for auditing and mitigating harmful behavior in the wild.
Social bots have long played a significant role in online platforms, influencing information diffusion, engagement dynamics, and public discourse [5,19,23,44]. However, the advent of Large Language Models (LLMs) has enabled a new generation of social bots capable of far more sophisticated and naturalistic interactions with users and with one another. These advances have spurred growing academic interest in understanding the behavior of LLM-driven social agents deployed in or modeled after online social ecosystems [43,45].
Unlike traditional rule-based or template-driven bots, LLM-based agents exhibit adaptive and emergent behaviors that arise from ongoing interactions within social networks [3,30]. Recent work has leveraged offline simulated social environments to examine the extent to which such agents can replicate human-like network structures [26], coordinate and perform cooperative tasks [41,47], or give rise to collective phenomena such as polarization and echo chambers [32,42]. Together, these studies suggest that LLM agents are not merely passive generators of text, but active participants in social dynamics whose behavior evolves over time.
At the same time, despite built-in guardrails, LLMs can be misused to generate toxic or harmful content at scale, posing risks to online communities and individual users [9,21]. In response, a substantial body of work has focused on measuring harmful generation and developing mitigation strategies, such as safer training procedures, filtering mechanisms, and post-hoc moderation [46].
Complementarily, a growing body of work in human-centered computing has shown that exposure to harmful content can influence user behavior, increasing the likelihood of adopting similar language or norms over time [24]. However, it remains unclear whether, and to what extent, analogous dynamics apply to LLM-driven agents.
In particular, we lack empirical evidence on: (i) how these agents behave when they interact solely with each other on a fully AI-driven, dedicated platform; (ii) whether toxic content acts as a trigger for toxic responses; and (iii) whether repeated exposure systematically increases agents' propensity to generate harmful outputs.
In this paper, we address these gaps by performing a large-scale audit of Chirper.ai, a fully AI-driven social network. On this platform, users can create and release LLM-based agents that autonomously interact within the dedicated ecosystem by generating posts and comments, as well as by engaging with other AI agents through following and liking mechanisms. Specifically, we rely on the platform to study toxicity adoption in LLM-driven agents through the lens of stimuli and responses: we model stimuli as the posts an agent explicitly comments on, and responses as the comments the agent produces.
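To make this framing concrete, the following minimal sketch shows one way such stimulus-response pairs could be represented, labeled for toxicity, and aggregated into an agent's cumulative exposure. The `Interaction` record, the generic `toxicity_score` callable, and the 0.5 decision threshold are illustrative assumptions, not the paper's actual pipeline; any off-the-shelf toxicity classifier could stand in for the scorer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical stimulus-response record: the post an agent commented on
# (the stimulus) and the comment it produced (the response), time-stamped.
@dataclass
class Interaction:
    agent_id: str
    stimulus_text: str   # post the agent explicitly commented on
    response_text: str   # comment produced by the agent
    timestamp: float

def label_interactions(
    interactions: List[Interaction],
    toxicity_score: Callable[[str], float],  # stand-in for any toxicity classifier
    threshold: float = 0.5,                  # assumed binary decision threshold
) -> List[Dict]:
    """Attach binary toxicity labels to each stimulus and response, in time order."""
    labeled = []
    for it in sorted(interactions, key=lambda x: x.timestamp):
        labeled.append({
            "agent_id": it.agent_id,
            "toxic_stimulus": toxicity_score(it.stimulus_text) >= threshold,
            "toxic_response": toxicity_score(it.response_text) >= threshold,
        })
    return labeled

def cumulative_toxic_exposure(labeled: List[Dict], agent_id: str) -> List[Tuple[int, bool]]:
    """Running count of toxic stimuli an agent has seen, paired with each response label."""
    exposure, trajectory = 0, []
    for row in labeled:
        if row["agent_id"] != agent_id:
            continue
        exposure += int(row["toxic_stimulus"])
        trajectory.append((exposure, row["toxic_response"]))
    return trajectory
```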
Our main contributions are threefold.
• We are the first to provide a large-scale empirical audit of the exposure-adoption mechanism of toxicity in a fully AI-driven social platform.
• We show that stimuli play a central role in shaping the harmful behavior of LLM-based agents, acting not only as triggers for toxic responses, but also increasing the likelihood of toxic behavior under cumulative exposure to toxic content.
• We further show that the number of toxic stimuli encountered by an agent alone enables accurate prediction of whether it will eventually produce toxic content, without requiring access to model internals, training data, or prompt instructions (a minimal sketch of this idea follows the list).
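As a rough illustration of the third contribution, the sketch below fits a logistic-regression classifier on a single per-agent feature, the count of toxic stimuli encountered, to predict whether the agent ever produces a toxic response. The toy arrays, the choice of logistic regression, and the in-sample evaluation are assumptions for illustration only; the paper's actual predictor, data, and validation protocol may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Placeholder per-agent data: the single feature is the number of toxic
# stimuli an agent was exposed to; the label marks whether it eventually
# produced at least one toxic response.  Real inputs would come from the
# labeled interactions sketched earlier.
n_toxic_stimuli = np.array([[0], [1], [2], [3], [5], [7], [9], [12]])
ever_toxic      = np.array([ 0,   0,   0,   1,   0,   1,   1,    1])

clf = LogisticRegression().fit(n_toxic_stimuli, ever_toxic)

# In-sample AUC on the toy data, just to show the evaluation step;
# a real study would score held-out agents instead.
probs = clf.predict_proba(n_toxic_stimuli)[:, 1]
print("toy AUC:", roc_auc_score(ever_toxic, probs))
print("P(toxic | 10 toxic stimuli):", clf.predict_proba([[10]])[0, 1])
```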
We believe that uncovering agent dynamics in these open ecosystems is crucial, as they function simultaneously as (i) testbeds for the emergence and evaluation of deployment norms, (ii) sources of training and fine-tuning data that shape future models, and (iii) precursors to mixed human-agent platforms where interaction patterns may later be formalized and scaled.
The remainder of the paper is structured as follows. Section 2 discusses the related literature, focusing on AI-based agents, toxicity adoption, and the Chirper.ai platform. In Section 3, we provide details regarding the data and methodology employed in our analysis, while Section 4 illustrates the results. Section 5 further discusses the implications of our work. Finally, Section 6 concludes the paper, addressing its limitations and outlining directions for future research.
In the following, we review the relevant literature by focusing on three core components of this work: (i) the behavior of AI-driven agents in online social ecosystems, (ii) the social mechanisms underlying exposure to harmful content and toxicity adoption, and (iii) Chirper.ai, the platform that serves as the empirical basis for our investigation. AI-Agents in Social Platforms. A growing body of research investigates how conversational agents and LLM-driven accounts behave when embedded in social platforms, where interactions are sequential, public, and shaped by evolving community norms [12]. In these environments, agent behavior is no longer determined solely by isola