SkillProbe: Security Auditing for Emerging Agent Skill Marketplaces via Multi-Agent Collaboration

SkillProbe: Security Auditing for Emerging Agent Skill Marketplaces via Multi-Agent Collaboration
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

With the rapid evolution of Large Language Model (LLM) agent ecosystems, centralized skill marketplaces have emerged as pivotal infrastructure for augmenting agent capabilities. However, these marketplaces face unprecedented security challenges, primarily stemming from semantic-behavioral inconsistency and inter-skill combinatorial risks, where individually benign skills induce malicious behaviors during collaborative invocation. To address these vulnerabilities, we propose SkillProbe, a multi-stage security auditing framework driven by multi-agent collaboration. SkillProbe introduces a “Skills-for-Skills” design paradigm, encapsulating auditing processes into standardized skill modules to drive specialized agents through a rigorous pipeline, including admission filtering, semantic-behavioral alignment detection, and combinatorial risk simulation. We conducted a large-scale evaluation using 8 mainstream LLM series across 2,500 real-world skills from ClawHub. Our results reveal a striking popularity-security paradox, where download volume is not a reliable proxy for security quality, as over 90% of high-popularity skills failed to pass rigorous auditing. Crucially, we discovered that high-risk skills form a single giant connected component within the risk-link dimension, demonstrating that cascaded risks are systemic rather than isolated occurrences. We hope that SkillProbe will inspire researchers to provide a scalable governance infrastructure for constructing a trustworthy Agentic Web. SkillProbe is accessible for public experience at skillhub.holosai.io.


💡 Research Summary

SkillProbe is a multi‑stage security auditing framework designed for the emerging ecosystem of large‑language‑model (LLM) agent skill marketplaces. The authors observe that current defenses—runtime prompt filtering and static code scanning—are inadequate because they ignore two novel threat vectors: (1) semantic‑behavioral inconsistency, where a skill’s natural‑language description (the “safe” claim) diverges from its executable logic, and (2) inter‑skill combinatorial risk, where individually benign skills can be orchestrated into malicious execution chains. To address these gaps, SkillProbe introduces a “Skills‑for‑Skills” paradigm: each audit phase is itself packaged as a standardized skill that can be dynamically loaded and executed by specialized auditor agents.

The framework consists of five layers: (1) Input layer for ingesting skill bundles from various sources, (2) Orchestration layer that coordinates multiple auditor agents and maintains audit state, (3) Skill layer containing three core audit modules—Gatekeeper (admission filtering), Alignment Detector (semantic‑behavioral consistency checking), and Flow Simulator (combinatorial risk analysis), (4) Output layer that synthesizes findings into structured reports and a multi‑dimensional security scorecard, and (5) Infrastructure layer providing persistent storage, a tool registry, and a dedicated agent runtime environment.

In the admission filtering stage, SkillProbe checks metadata such as declared permissions, versioning, and required resources, rejecting any skill that requests excessive privileges or exhibits suspicious declarations. The semantic‑behavioral consistency stage extracts a four‑class alignment matrix that maps documented capabilities to observed code behaviors, projects these into a labeled graph, and quantifies mismatches. This captures cases where a skill advertises “restricted” usage yet contains hidden functions for unauthorized data access or command execution. The combinatorial risk stage constructs a risk‑link graph where nodes are skills and edges represent data or control flow between them. By applying risk‑link policies and graph traversal algorithms, SkillProbe automatically generates potential attack chains, revealing how a benign upstream skill can feed malicious inputs to a downstream skill, leading to privilege escalation, data exfiltration, or command injection.

The prototype is built on FastAPI and Vue 3 with an npm‑based REPL, allowing seamless integration of various LLM back‑ends (GPT‑4, Claude, Llama, Gemini, etc.) and third‑party security tools. The authors evaluated SkillProbe on 2,500 real‑world skills from the ClawHub marketplace, using eight mainstream LLM series. Key findings include: (1) a “popularity‑security paradox” where highly downloaded skills are far more likely to fail the audit, indicating download count is a poor proxy for safety; (2) over 90 % of the high‑popularity skills did not pass the rigorous multi‑phase audit; (3) high‑risk skills form a single giant connected component in the risk‑link graph, demonstrating systemic cascade risk rather than isolated vulnerabilities; and (4) SkillProbe uncovered several zero‑day vulnerabilities and complex combinatorial attacks that traditional atomic‑level scanners missed.

The paper acknowledges limitations: the current simulation does not fully emulate external APIs or cloud services, and risk‑link policies may need domain‑specific tuning. Future work includes extending dynamic runtime monitoring, developing real‑time mitigation mechanisms for cascading attacks, and establishing standardized security certifications for open skill marketplaces. In sum, SkillProbe offers a scalable, multi‑agent‑driven governance infrastructure that bridges the gap between static code analysis and runtime defenses, paving the way toward a trustworthy “Agentic Web.”


Comments & Academic Discussion

Loading comments...

Leave a Comment