pith. sign in

arxiv: 2606.11671 · v1 · pith:ZE63NPADnew · submitted 2026-06-10 · 💻 cs.CR · cs.AI

Runtime Skill Audit: Targeted Runtime Probing for Agent Skill Security

Pith reviewed 2026-06-27 09:27 UTC · model grok-4.3

classification 💻 cs.CR cs.AI
keywords LLM agent skillsruntime security auditdynamic analysismalicious behavior detectionself-evolving attacksagent skill securitytargeted probing
0
0 comments X

The pith

Runtime Skill Audit detects malicious LLM agent skills at 90% accuracy through targeted dynamic probing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that static vetting of LLM agent skills is insufficient because malicious behavior can remain hidden until the skill is invoked with particular user requests, local assets, or multi-step interactions. RSA addresses this by dynamically auditing skills through targeted runtime probing of risk-relevant interfaces under prepared execution contexts. This approach yields 90% accuracy and maintains detection effectiveness against evolving attacks, unlike static methods that degrade quickly. A reader would care because reusable skills are becoming central to agent systems, creating new vectors for security issues that static checks cannot reliably catch.

Core claim

RSA is a dynamic analysis method that audits skills by asking what the skill-mediated agent actually does under targeted runtime conditions. Instead of testing every skill with the same generic tasks, RSA profiles risk-relevant interfaces, prepares the execution context needed to exercise them, and assigns security labels from the resulting trace evidence. On 100 skills, RSA achieves 90.0% accuracy with an 88.0% true positive rate and an 8.0% false positive rate, improving accuracy by 13.0 percentage points over the best static baseline. Under self-evolving attacks, static detectors collapse after one or two rounds, while RSA continues to detect 19--20 out of 20 malicious skills across round

What carries the argument

Runtime Skill Audit (RSA), a dynamic analysis method that profiles risk-relevant interfaces, prepares execution contexts, and assigns security labels from trace evidence.

If this is right

  • RSA improves accuracy by 13 percentage points over the best static baseline on the tested skills.
  • RSA maintains detection of 19-20 malicious skills across multiple rounds of self-evolving attacks where static detectors fail after one or two rounds.
  • Dynamic analysis reveals harmful behavior that is not apparent from static code or documentation alone.
  • Targeted probing of risk-relevant interfaces under prepared contexts is more effective than generic testing for skill security.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the targeted probing approach scales, security practices for agent skills may shift toward runtime testing in addition to static review.
  • Other agent frameworks could benefit from adopting context-preparation techniques to audit reusable components.
  • The method implies that skill interfaces may need clearer documentation to support effective runtime auditing.

Load-bearing premise

The specific risk-relevant interfaces, prepared execution contexts, and trace-evidence rules used by RSA are sufficient to surface hidden malicious behavior across the full space of possible user requests, local assets, and multi-step interactions without missing important cases or over-labeling benign skills.

What would settle it

A collection of malicious skills that evade RSA's profiled interfaces and prepared contexts while still producing harm in actual use, or a high rate of false positives on benign skills under varied real-world conditions.

Figures

Figures reproduced from arXiv: 2606.11671 by Chaowei Xiao, Tu Lan.

Figure 1
Figure 1. Figure 1: Example of an environment-dependent ma￾licious skill that can evade static vetting. The artifact resembles a benign file-organizer skill, while the hidden resource rule only becomes security-relevant when exe￾cuted over local assets. attack surface of these agents. Malicious or com￾promised skills may hide unsafe behaviors behind seemingly benign instructions, reusable workflows, and tool-mediated interact… view at source ↗
Figure 2
Figure 2. Figure 2: Overview of Runtime Skill Audit (RSA). Given an agent skill, RSA profiles potential risks, generates [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Example of skill profiling. RSA combines [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Knowledge and memory design in RSA. Hu￾man security priors and run memory are organized into a knowledge base that guides profiling, task generation, and trace judgment. updated from previous executions. It stores com￾pact summaries of effective triggers, recurring false positives, missed malicious behaviors, and trace ev￾idence that supported prior judgments, similar in spirit to agent memory mechanisms t… view at source ↗
Figure 5
Figure 5. Figure 5: Case study of how RSA converts runtime trace evidence into a behavior-grounded verdict for [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Detection robustness under self-evolving skill [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
read the original abstract

Agent skills let LLM agents reuse instructions, resources, tools, and workflows, but they also create a new place for malicious behavior to hide. A skill may look benign in its documentation or code while becoming harmful only when it is invoked with particular user requests, local assets, persistent state, or multi-step tool interactions. This makes purely static vetting brittle. We present Runtime Skill Audit (RSA), a dynamic analysis method that audits skills by asking what the skill-mediated agent actually does under targeted runtime conditions. Instead of testing every skill with the same generic tasks, RSA profiles risk-relevant interfaces, prepares the execution context needed to exercise them, and assigns security labels from the resulting trace evidence. We instantiate RSA on OpenClaw and evaluate it on 100 skills against representative static baselines. RSA achieves 90.0\% accuracy with an 88.0\% true positive rate and an 8.0\% false positive rate, improving accuracy by 13.0 percentage points over the best static baseline. Under self-evolving attacks, static detectors collapse after one or two rounds, while RSA continues to detect 19--20 out of 20 malicious skills across rounds.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes Runtime Skill Audit (RSA), a dynamic analysis method for detecting malicious LLM agent skills that may appear benign statically but exhibit harm under specific runtime conditions (user requests, local assets, state, or multi-step interactions). RSA profiles risk-relevant interfaces, prepares targeted execution contexts, and derives security labels from trace evidence. On a 100-skill corpus against static baselines, it reports 90.0% accuracy (88.0% TPR, 8.0% FPR), a 13-point accuracy gain, and sustained detection (19-20/20 malicious skills) under self-evolving attacks where static detectors fail after 1-2 rounds.

Significance. If the evaluation methodology is sound and the interface coverage is adequate, RSA would address a genuine limitation of static skill vetting for context-dependent malice in agent systems, offering a practical runtime probing approach with demonstrated robustness to adaptive attacks. This could inform security practices for reusable agent skills in LLM deployments.

major comments (2)
  1. [§4 (Evaluation)] §4 (Evaluation): The manuscript supplies no description of the 100-skill dataset composition, how the targeted conditions were chosen, how labels were assigned, or controls for selection bias. Without these details the headline performance numbers (90% accuracy, 88% TPR) cannot be assessed for reliability or generalizability.
  2. [§3 (RSA Method)] §3 (RSA Method): The description of profiling risk-relevant interfaces, preparing execution contexts, and trace-evidence rules provides no argument or evidence that this finite set is sufficient to surface hidden malicious behavior across the full space of possible user requests, local assets, and multi-step interactions. This leaves the accuracy and round-by-round robustness claims tied to the authors' test distribution rather than a general property.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for greater transparency in the evaluation and method sections. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of our results.

read point-by-point responses
  1. Referee: [§4 (Evaluation)] The manuscript supplies no description of the 100-skill dataset composition, how the targeted conditions were chosen, how labels were assigned, or controls for selection bias. Without these details the headline performance numbers (90% accuracy, 88% TPR) cannot be assessed for reliability or generalizability.

    Authors: We agree that these methodological details are necessary to evaluate the reported performance. In the revised manuscript we will expand §4 with: (1) a breakdown of the 100-skill corpus by source (public repositories and controlled generation) and malicious/benign categories; (2) the process for deriving targeted execution conditions from the profiled risk-relevant interfaces; (3) the label-assignment protocol, performed by two independent security reviewers using explicit criteria for harm; and (4) bias-mitigation steps including stratified sampling across skill complexity and domain. These additions will allow readers to assess reliability and generalizability directly. revision: yes

  2. Referee: [§3 (RSA Method)] The description of profiling risk-relevant interfaces, preparing execution contexts, and trace-evidence rules provides no argument or evidence that this finite set is sufficient to surface hidden malicious behavior across the full space of possible user requests, local assets, and multi-step interactions. This leaves the accuracy and round-by-round robustness claims tied to the authors' test distribution rather than a general property.

    Authors: We acknowledge that exhaustive coverage of an infinite interaction space is impossible and that the paper does not claim universality. RSA deliberately restricts probing to a finite set of risk-relevant interfaces derived from documented attack patterns in LLM-agent literature. The self-evolving attack experiments provide evidence that this targeted set remains effective when adversaries adapt, which goes beyond a single fixed test distribution. In revision we will add an explicit limitations paragraph in §3 discussing the interface-selection rationale and coverage threats to validity, while retaining the practical robustness results. revision: partial

Circularity Check

0 steps flagged

No circularity: RSA presents an empirical runtime evaluation with no self-referential reductions.

full rationale

The manuscript describes a dynamic probing method that selects risk-relevant interfaces, prepares contexts, collects traces, and assigns labels, then reports accuracy on a fixed 100-skill corpus against static baselines. No equations, parameter-fitting steps presented as predictions, self-citations used as load-bearing uniqueness theorems, or ansatzes smuggled via prior work appear in the text. The reported 90% accuracy and round-by-round detection rates are framed as outcomes of the evaluation procedure rather than quantities defined in terms of themselves. The coverage assumption noted by the skeptic is a potential external-validity concern, not a circularity in the derivation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no concrete information on free parameters, axioms, or invented entities; the method description is too high-level to identify any.

pith-pipeline@v0.9.1-grok · 5729 in / 1185 out tokens · 23501 ms · 2026-06-27T09:27:15.576372+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 12 internal anchors

  1. [1]

    "Do Not Mention This to the User": Detecting and Understanding Malicious Agent Skills in the Wild

    Malicious Agent Skills in the Wild: A Large-Scale Security Empirical Study. arXiv e-prints , keywords =. doi:10.48550/arXiv.2602.06547 , archivePrefix =. 2602.06547 , primaryClass =

  2. [2]

    InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents

    InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents. arXiv e-prints , keywords =. doi:10.48550/arXiv.2403.02691 , archivePrefix =. 2403.02691 , primaryClass =

  3. [3]

    arXiv e-prints , keywords =

    BackdoorAgent: A Unified Framework for Backdoor Attacks on LLM-based Agents. arXiv e-prints , keywords =. doi:10.48550/arXiv.2601.04566 , archivePrefix =. 2601.04566 , primaryClass =

  4. [4]

    arXiv e-prints , keywords =

    Agent Skills: A Data-Driven Analysis of Claude Skills for Extending Large Language Model Functionality. arXiv e-prints , keywords =. doi:10.48550/arXiv.2602.08004 , archivePrefix =. 2602.08004 , primaryClass =

  5. [5]

    SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents

    SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents. arXiv e-prints , keywords =. doi:10.48550/arXiv.2605.05726 , archivePrefix =. 2605.05726 , primaryClass =

  6. [6]

    How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings

    How Well Do Agentic Skills Work in the Wild: Benchmarking LLM Skill Usage in Realistic Settings. arXiv e-prints , keywords =. doi:10.48550/arXiv.2604.04323 , archivePrefix =. 2604.04323 , primaryClass =

  7. [7]

    Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw

    Your Agent, Their Asset: A Real-World Safety Analysis of OpenClaw. arXiv e-prints , keywords =. doi:10.48550/arXiv.2604.04759 , archivePrefix =. 2604.04759 , primaryClass =

  8. [8]

    Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis

    Semia: Auditing Agent Skills via Constraint-Guided Representation Synthesis. arXiv e-prints , keywords =. doi:10.48550/arXiv.2605.00314 , archivePrefix =. 2605.00314 , primaryClass =

  9. [9]

    Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks

    Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks. arXiv e-prints , keywords =. doi:10.48550/arXiv.2602.20156 , archivePrefix =. 2602.20156 , primaryClass =

  10. [10]

    arXiv e-prints , keywords =

    Formal Analysis and Supply Chain Security for Agentic AI Skills. arXiv e-prints , keywords =. doi:10.48550/arXiv.2603.00195 , archivePrefix =. 2603.00195 , primaryClass =

  11. [11]

    Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale

    Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale. arXiv e-prints , keywords =. doi:10.48550/arXiv.2601.10338 , archivePrefix =. 2601.10338 , primaryClass =

  12. [12]

    arXiv e-prints , keywords =

    SkillProbe: Security Auditing for Emerging Agent Skill Marketplaces via Multi-Agent Collaboration. arXiv e-prints , keywords =. doi:10.48550/arXiv.2603.21019 , archivePrefix =. 2603.21019 , primaryClass =

  13. [13]

    arXiv e-prints , keywords =

    TraceAegis: Securing LLM-Based Agents via Hierarchical and Behavioral Anomaly Detection. arXiv e-prints , keywords =. doi:10.48550/arXiv.2510.11203 , archivePrefix =. 2510.11203 , primaryClass =

  14. [14]

    arXiv e-prints , keywords =

    MindGuard: Intrinsic Decision Inspection for Securing LLM Agents Against Metadata Poisoning. arXiv e-prints , keywords =. doi:10.48550/arXiv.2508.20412 , archivePrefix =. 2508.20412 , primaryClass =

  15. [15]

    GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

    GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts. arXiv e-prints , keywords =. doi:10.48550/arXiv.2309.10253 , archivePrefix =. 2309.10253 , primaryClass =

  16. [16]

    arXiv e-prints , keywords =

    MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety. arXiv e-prints , keywords =. doi:10.48550/arXiv.2602.01539 , archivePrefix =. 2602.01539 , primaryClass =

  17. [17]

    SkillAttack: Automated Red Teaming of Agent Skills through Attack Path Refinement

    SkillAttack: Automated Red Teaming of Agent Skills through Attack Path Refinement. arXiv e-prints , keywords =. doi:10.48550/arXiv.2604.04989 , archivePrefix =. 2604.04989 , primaryClass =

  18. [18]

    arXiv e-prints , keywords =

    AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs. arXiv e-prints , keywords =. doi:10.48550/arXiv.2410.05295 , archivePrefix =. 2410.05295 , primaryClass =

  19. [19]

    Reflexion: Language Agents with Verbal Reinforcement Learning

    Reflexion: Language Agents with Verbal Reinforcement Learning. arXiv e-prints , keywords =. doi:10.48550/arXiv.2303.11366 , archivePrefix =. 2303.11366 , primaryClass =

  20. [20]

    Generative Agents: Interactive Simulacra of Human Behavior

    Generative Agents: Interactive Simulacra of Human Behavior. arXiv e-prints , keywords =. doi:10.48550/arXiv.2304.03442 , archivePrefix =. 2304.03442 , primaryClass =