Collaborating with AI Agents: Field Experiments on Teamwork, Productivity, and Performance

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

We examined the mechanisms underlying productivity and performance gains from AI agents using a large-scale experiment on Pairit, a platform we developed to study human-AI collaboration. We randomly assigned 2,234 participants to human-human and human-AI teams that produced 11,024 ads for a think tank. We evaluated the ads using independent human ratings and a field experiment on X, which garnered ~5M impressions. We found human-AI teams produced 50% more ads per worker and higher text quality, while human-human teams produced higher image quality, suggesting a jagged frontier of AI agent capability. Human-AI teams also produced more homogeneous, or self-similar, outputs. The field experiment revealed that higher text quality improved click-through rates and view-through duration, while higher image quality improved cost-per-click rates. We found three mechanisms explained these effects. First, human-AI collaboration was more task-oriented, with 25% more task-oriented messages and 18% fewer interpersonal messages. Second, human-AI collaboration displayed more delegation: participants delegated 17% more work to AI agents than to human partners and performed 62% fewer direct text edits when working with AI. Third, recognition that the collaborator was an AI moderated these effects: participants who correctly identified they were working with AI were more task-oriented and more likely to delegate work. These mechanisms, in turn, explained performance: task-oriented communication improved ad quality, specifically when working with AI, while interpersonal communication reduced ad quality; delegation improved text quality, had no effect on image quality, and was positively associated with diversity collapse, creating homogeneous outputs of higher average quality. The results suggest AI agents drive changes in productivity, performance, and output diversity by reshaping teamwork.


💡 Research Summary

This paper presents a large‑scale field experiment that investigates how collaborating with multimodal AI agents reshapes productivity, output quality, and diversity in a real‑world creative task. Using a newly built platform called Pairit, the authors recruited 2,234 U.S.‑representative participants and randomly assigned them to either human‑human (HH) dyads or human‑AI (HA) dyads. Each dyad worked together to produce advertising campaigns for a think tank’s annual report, generating a total of 11,024 ads that included ad copy, calls‑to‑action, and images. Pairit recorded every keystroke, chat message, edit, swipe, and API call, yielding a richly detailed dataset (over 182,000 messages, 1.9 million text edits, 62,000 image edits, and 10,000 AI‑generated images).

After the lab phase, the ads were evaluated by independent human raters and AI quality models, and then deployed in a live field experiment on the social platform X, where they accumulated roughly 5 million impressions. Performance metrics such as click‑through rate (CTR), cost‑per‑click (CPC), view‑through rate, and view‑through duration were collected.
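The four performance metrics named above follow standard advertising definitions. A minimal sketch of how they could be computed from raw counts (the field names and example numbers here are illustrative, not taken from the paper's data):

```python
def ad_metrics(impressions, clicks, spend, views, total_view_seconds):
    """Return the four standard ad metrics as a dict; None when undefined."""
    return {
        "ctr": clicks / impressions if impressions else None,              # click-through rate
        "cpc": spend / clicks if clicks else None,                         # cost per click
        "view_through_rate": views / impressions if impressions else None,
        "view_through_duration": total_view_seconds / views if views else None,
    }

m = ad_metrics(impressions=10_000, clicks=120, spend=60.0,
               views=3_000, total_view_seconds=9_000.0)
print(m["ctr"], m["cpc"])  # 0.012 0.5
```

Note that CTR and view-through rate are ratios over impressions, while CPC and view-through duration normalize by clicks and views respectively, which is why text and image quality can move them independently.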

Key findings:

  1. Productivity – HA teams produced 50% more ads per worker than HH teams. The AI partner generated draft copy, performed many routine edits, and called external image‑generation APIs, freeing human participants to focus on higher‑level decisions.

  2. Quality and Market Outcomes – Ads from HA teams scored higher on textual quality (average +0.23 on a 5‑point scale) and achieved a higher CTR (+12%) and longer view‑through duration (+8%). Conversely, HH teams created higher‑quality images, which translated into a lower CPC (−9%). Thus, AI excels at text‑centric improvements, while humans retain an edge in visual judgment.

  3. Diversity Collapse – Outputs from HA teams were more self‑similar; a self‑similarity metric rose by 18% relative to HH teams. While average quality increased, the variance across ads shrank, indicating a homogenization effect typical of AI‑driven generation.
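One common way to operationalize self-similarity of this kind is the mean pairwise cosine similarity over embedding vectors of the outputs. The paper does not specify its exact formula, so the sketch below is a hedged illustration with toy vectors:

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def mean_pairwise_similarity(embeddings):
    """Higher value => more homogeneous (less diverse) outputs."""
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)

# Toy embeddings: one tight cluster vs. one spread-out set.
homogeneous = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]
diverse = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
print(mean_pairwise_similarity(homogeneous) > mean_pairwise_similarity(diverse))  # True
```

Under this metric, "diversity collapse" appears as the score drifting upward even while per-ad quality ratings improve.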

The authors identify three mechanisms that mediate these effects:

  • Task‑oriented communication – HA dyads sent 25% more messages explicitly about task instructions, prioritization, and planning, and 18% fewer socially or emotionally oriented messages than HH dyads. Task‑oriented communication correlated positively with text quality and CTR, especially in the presence of an AI partner.

  • Delegation – In HA teams, participants performed 62% fewer direct text edits and delegated 17% more work to the AI. Delegation boosted text quality but did not affect image quality, and it was strongly linked to the observed diversity collapse.

  • AI recognition – Participants who correctly identified that their partner was an AI displayed higher task orientation and delegated more work. This suggests that awareness of the partner’s artificial nature prompts more efficient collaboration strategies.
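Classifying chat messages as task-oriented versus interpersonal, as in the first mechanism above, can be sketched with a simple lexicon-based tagger. The cue lists below are illustrative assumptions; the paper's actual coding procedure is not specified here:

```python
# Hypothetical cue lexicons; a real study would use a validated coding
# scheme or a trained classifier rather than keyword lists.
TASK_CUES = {"draft", "edit", "deadline", "generate", "revise", "headline", "image", "plan"}
SOCIAL_CUES = {"thanks", "haha", "nice", "sorry", "hello", "great", "lol"}

def tag_message(text):
    """Label a chat message by which cue set it overlaps more."""
    words = {w.strip(".,!?") for w in text.lower().split()}
    task = len(words & TASK_CUES)
    social = len(words & SOCIAL_CUES)
    if task > social:
        return "task-oriented"
    if social > task:
        return "interpersonal"
    return "other"

print(tag_message("Can you generate a new headline draft?"))  # task-oriented
print(tag_message("haha nice, thanks!"))                      # interpersonal
```

Aggregating these labels per dyad would yield the message-share comparisons reported above.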

The paper situates these findings within teamwork theory (distinguishing taskwork from interpersonal processes) and the human‑robot interaction literature, arguing that AI agents transform the balance between task‑focused and relational activities. The authors also introduce the concept of a “jagged frontier”: AI provides substantial gains on some dimensions (text generation) while lagging on others (image creation), leading to uneven performance improvements across tasks.

Practical implications include redesigning workflows to emphasize delegation and task‑oriented messaging when AI teammates are present, making the AI’s identity explicit to encourage efficient behavior, and implementing safeguards (e.g., varied prompts, multiple model ensembles) to mitigate diversity collapse.
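One of the suggested safeguards, varied prompts, could be implemented by rotating instruction templates across generations so repeated outputs do not all start from the same wording. This is a minimal sketch under that assumption, not a method from the paper:

```python
import itertools
import random

# Illustrative templates; a production system would draw from a larger,
# curated pool and possibly vary model parameters as well.
TEMPLATES = [
    "Write a punchy ad headline about {topic}.",
    "Draft an understated, factual ad line on {topic}.",
    "Compose a question-style ad hook for {topic}.",
]

def varied_prompts(topic, n, seed=0):
    """Shuffle the template pool once, then cycle through it for n prompts."""
    rng = random.Random(seed)
    order = TEMPLATES[:]
    rng.shuffle(order)
    return [t.format(topic=topic) for t, _ in zip(itertools.cycle(order), range(n))]

prompts = varied_prompts("climate policy", 5)
print(len(prompts), len(set(prompts)))  # 5 3
```

Cycling guarantees every template is used before any repeats, which spreads generations across distinct starting instructions.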

In sum, this study offers the first randomized, task‑level evidence that multimodal AI agents act as active collaborators rather than passive tools, fundamentally reshaping teamwork dynamics, boosting productivity, altering quality trade‑offs, and inducing homogenization of outputs. Future research should explore other domains, longer‑term team evolution, and methods to balance AI efficiency with creative diversity.

