AgentDyn: A Dynamic Open-Ended Benchmark for Evaluating Prompt Injection Attacks of Real-World Agent Security System
AI agents that autonomously interact with external tools and environments show great promise across real-world applications. However, the external data which agent consumes also leads to the risk of indirect prompt injection attacks, where malicious instructions embedded in third-party content hijack agent behavior. Guided by benchmarks, such as AgentDojo, there has been significant amount of progress in developing defense against the said attacks. As the technology continues to mature, and that agents are increasingly being relied upon for more complex tasks, there is increasing pressing need to also evolve the benchmark to reflect threat landscape faced by emerging agentic systems. In this work, we reveal three fundamental flaws in current benchmarks and push the frontier along these dimensions: (i) lack of dynamic open-ended tasks, (ii) lack of helpful instructions, and (iii) simplistic user tasks. To bridge this gap, we introduce AgentDyn, a manually designed benchmark featuring 60 challenging open-ended tasks and 560 injection test cases across Shopping, GitHub, and Daily Life. Unlike prior static benchmarks, AgentDyn requires dynamic planning and incorporates helpful third-party instructions. Our evaluation of ten state-of-the-art defenses suggests that almost all existing defenses are either not secure enough or suffer from significant over-defense, revealing that existing defenses are still far from real-world deployment. Our benchmark is available at https://github.com/leolee99/AgentDyn.
💡 Research Summary
AgentDyn is a newly introduced benchmark that addresses three critical shortcomings of existing agent‑security evaluation suites—static tasks, absence of benign instructions, and overly simplistic user goals. Built on top of the AgentDojo framework, AgentDyn comprises three domains (Shopping, GitHub, Daily Life) and contains 60 open‑ended user tasks together with 560 prompt‑injection test cases. Each task requires dynamic replanning: agents must adapt their action sequence in response to real‑time feedback from tools or environments, rather than executing a pre‑computed plan. Moreover, every task embeds at least one helpful third‑party instruction (e.g., “please log in first”) that is necessary for successful completion, forcing defenses to discriminate between benign and malicious directives. The benchmark also raises task complexity dramatically: the average trajectory length is 7.1 steps, the average number of distinct applications involved is 3.17, and the average visible tool count per task is 33.33—substantially higher than the figures reported for AgentDojo (3 steps, 1.38 apps, 19.87 tools).
To evaluate the state of the art, the authors tested ten recent defense mechanisms spanning four categories: prompting‑based (Prompt Sandwich, Spotlight, PromptGuard2), alignment‑based (StruQ, SecAlign, Meta SecAlign), filtering‑based (ProtectAI, PIGuard, PromptGuard2, PromptArmor), and system‑level (Tool Filter, CaMeL, Progent, DRIFT). All defenses were run on a GPT‑4o‑powered agent within the AgentDyn environment, and performance was measured in terms of Attack Success Rate (ASR) and utility (task completion effectiveness). The results reveal two dominant failure modes. First, many prompting‑based defenses that performed near‑perfectly on static benchmarks suffer high ASR (30‑70 %) when faced with dynamic replanning, indicating that they rely on the assumption that the agent’s plan never changes. Second, defenses that heavily depend on the initial plan (Tool Filter, CaMeL, DRIFT) experience severe utility drops (often >50 %) because any deviation from the pre‑planned tool sequence triggers over‑defense, effectively halting the agent. Filtering‑based approaches (ProtectAI, PIGuard) also struggle to distinguish helpful instructions from malicious ones, leading to near‑zero utility. Policy‑driven system defenses such as Progent show degraded performance as the toolset expands, highlighting scalability issues.
These findings lead the authors to conclude that current defenses are tuned to the static, low‑complexity environments of prior benchmarks and do not generalize to realistic, open‑ended scenarios. AgentDyn therefore serves both as a more faithful stress test for existing methods and as a catalyst for future research that must incorporate dynamic planning, context‑aware instruction filtering, and flexible policy enforcement. By releasing the benchmark as open‑source, the authors invite the community to extend the suite with new attack vectors and defense strategies, aiming to shift the field from “static evaluation” toward “real‑world, dynamic security assessment” of LLM‑driven agents.
Comments & Academic Discussion
Loading comments...
Leave a Comment