The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Agentic AI systems are increasingly capable of performing professional and personal tasks with limited human involvement. However, tracking these developments is difficult because the AI agent ecosystem is complex, rapidly evolving, and inconsistently documented, posing obstacles to both researchers and policymakers. To address these challenges, this paper presents the 2025 AI Agent Index. The Index documents the origins, design, capabilities, ecosystem, and safety features of 30 state-of-the-art AI agents, based on publicly available information and email correspondence with developers. In addition to documenting individual agents, the Index illuminates broader trends in the development of agents, their capabilities, and the level of transparency of developers. Notably, we find considerable variation in transparency among agent developers and observe that most developers share little information about safety, evaluations, and societal impacts. The 2025 AI Agent Index is available online at https://aiagentindex.mit.edu.


💡 Research Summary

The paper presents the 2025 AI Agent Index, a systematic catalog of thirty highly agentic AI systems that are publicly deployed and have significant real‑world impact. Recognizing that the rapid proliferation of agentic AI has outpaced documentation and transparency, the authors develop a rigorous inclusion framework based on three dimensions: agency (autonomy, goal complexity, environmental interaction, and generality), impact (public interest, market significance, or developer prominence), and practicality (public availability, deployability, and general‑purpose capability). To be included, an agent must satisfy all of the agency criteria, at least one impact criterion, and all of the practicality criteria, as sketched below.
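
The composite inclusion rule reduces to a simple conjunction: every agency criterion, at least one impact criterion, and every practicality criterion. The Python sketch below illustrates that logic under hypothetical criterion names; it is not the authors' screening code.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    # Each dict maps a criterion name to whether the agent satisfies it.
    # Criterion names are illustrative, not the paper's exact wording.
    agency: dict        # e.g. {"autonomy": True, "goal_complexity": True, ...}
    impact: dict        # e.g. {"public_interest": False, "market_significance": True, ...}
    practicality: dict  # e.g. {"publicly_available": True, "deployable": True, ...}

def meets_inclusion_criteria(c: Candidate) -> bool:
    """Composite rule: all agency AND at least one impact AND all practicality."""
    return (
        all(c.agency.values())
        and any(c.impact.values())
        and all(c.practicality.values())
    )
```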

Candidate agents were identified through large‑language‑model‑driven searches, cross‑referencing with existing lists (the 2024 AI Agent Index, Princeton Holistic Agent Leaderboard, AIAgentList.com), and consultation with Chinese ecosystem experts to mitigate regional blind spots. After screening, the final set of agents was categorized into three interaction paradigms: chat‑based agents with tool access (12 systems), browser‑based agents that directly manipulate a computer or web environment (5 systems), and enterprise workflow agents that embed agentic actions within business process builders (13 systems).
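
For orientation, the three interaction paradigms and the number of indexed systems in each can be captured in a small data structure; the enum names below are hypothetical shorthand and simply mirror the counts reported above.

```python
from enum import Enum

class InteractionParadigm(Enum):
    CHAT_WITH_TOOLS = "chat-based agent with tool access"
    BROWSER_BASED = "browser/computer-environment agent"
    ENTERPRISE_WORKFLOW = "enterprise workflow agent"

# Indexed systems per paradigm, as reported in the summary (12 + 5 + 13 = 30).
SYSTEMS_PER_PARADIGM = {
    InteractionParadigm.CHAT_WITH_TOOLS: 12,
    InteractionParadigm.BROWSER_BASED: 5,
    InteractionParadigm.ENTERPRISE_WORKFLOW: 13,
}

assert sum(SYSTEMS_PER_PARADIGM.values()) == 30
```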

For each agent, the authors collected information across six major categories (product overview, company and accountability, technical capabilities, autonomy and control, ecosystem interaction, and safety and evaluation), resulting in 45 detailed fields per system. The data were drawn from publicly available documentation, websites, demos, published papers, and governance documents, supplemented by email correspondence with developers; no experimental testing was performed.
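
A minimal sketch of how such an agent card might be represented as data is given below. The six category keys follow the summary above, while the example field names are hypothetical placeholders; the paper itself enumerates the actual 45 fields.

```python
# Hypothetical agent-card schema: six documentation categories, each holding a
# handful of illustrative fields. The real Index defines 45 fields in total.
AGENT_CARD_SCHEMA = {
    "product_overview":           ["name", "developer", "release_date", "description"],
    "company_and_accountability": ["legal_entity", "headquarters", "point_of_contact"],
    "technical_capabilities":     ["base_model", "tool_access", "modalities"],
    "autonomy_and_control":       ["human_oversight", "permission_model"],
    "ecosystem_interaction":      ["integrations", "external_services"],
    "safety_and_evaluation":      ["guardrails", "evaluations", "third_party_audits"],
}

def count_fields(schema: dict) -> int:
    """Total number of documented fields across all categories."""
    return sum(len(field_names) for field_names in schema.values())
```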

Analysis of the compiled data reveals stark variability in transparency. Large technology firms tend to disclose product roadmaps, pricing, and API specifications, yet they rarely publish detailed safety guardrails, evaluation results, or societal impact assessments. Smaller startups and open‑source projects often share code and datasets but lack publicly reported market valuations or comprehensive safety documentation. The “safety & evaluation” dimension is the most under‑reported, with many agents lacking explicit emergency‑stop mechanisms, third‑party audits, or compliance statements.
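
One way to make such under‑reporting concrete is to compute, per documentation category, the fraction of (agent, field) pairs that are publicly documented. The sketch below assumes the index data is available as a list of per‑agent dictionaries in which None marks an undocumented field; that representation is an assumption for illustration, not the Index's published format.

```python
from collections import defaultdict

def completeness_by_category(agents: list[dict], schema: dict) -> dict[str, float]:
    """Fraction of documented (agent, field) pairs per category.

    Assumes each agent record maps field names to values, with None meaning
    "not publicly documented" -- an illustrative convention only.
    """
    filled, total = defaultdict(int), defaultdict(int)
    for agent in agents:
        for category, field_names in schema.items():
            for name in field_names:
                total[category] += 1
                if agent.get(name) is not None:
                    filled[category] += 1
    return {cat: (filled[cat] / total[cat] if total[cat] else 0.0) for cat in schema}
```

If the findings above hold, applying such a tally to the compiled data would be expected to show the lowest completeness for the safety and evaluation category.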

The paper also highlights risk differentials across interaction paradigms. Browser‑based agents carry the greatest potential for direct harm because they can execute actions in the background, perform automated transactions, and interact with external services with little user oversight. Enterprise workflow agents introduce complex organizational risks related to data flow, permission management, and integration with legacy systems. Chat‑based agents, while more constrained, still raise concerns about the inadvertent generation of harmful content or code.

Based on these findings, the authors make three primary contributions: (1) the creation of the 2025 AI Agent Index itself, providing a reusable, extensible dataset for researchers and policymakers; (2) identification of ecosystem‑wide trends, especially the pervasive lack of safety reporting and uneven transparency among developers; and (3) three case studies illustrating the distinct technical and governance challenges of each interaction paradigm.

The discussion acknowledges limitations: reliance on publicly disclosed information may miss proprietary or internal‑only agents, and the “general‑purpose” criterion involves subjective judgment. To address these gaps, the authors propose a continuous feedback loop (via https://aiagentindex.mit.edu/feedback) and future work incorporating automated crawling, text‑mining, and the development of quantitative safety metrics. They advocate for standardized safety reporting templates, minimum guardrail requirements, and independent third‑party evaluation frameworks to improve accountability and mitigate the unique risks posed by increasingly autonomous AI agents.

