How Your Credentials Are Leaked by LLM Agent Skills: An Empirical Study
Pith reviewed 2026-05-13 19:48 UTC · model grok-4.3
The pith
Third-party LLM agent skills leak credentials in over 500 cases through debug logs and prompt injections.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that 520 skills leak credentials, distributed across 10 patterns where debug logging via print and console.log causes 73.5 percent of cases, 76.3 percent of leaks require combined analysis of code and natural language, 3.1 percent arise purely from prompt injection, and 89.6 percent of leaked credentials are exploitable without privileges while persisting in forks even after upstream fixes.
What carries the argument
A taxonomy of 10 leakage patterns (4 accidental and 6 adversarial) derived from static analysis, sandbox testing, and manual inspection of skills.
If this is right
- Debug logging statements cause 73.5 percent of leaks because their output reaches the LLM through stdout.
- 76.3 percent of leaks are cross-modal and cannot be found without examining both code and natural language together.
- 89.6 percent of leaked credentials can be used directly without any special privileges.
- Credentials remain present in forks even after the original skill is updated or fixed.
- Public disclosure resulted in removal of all malicious skills and fixes to 91.6 percent of hardcoded credential cases.
Where Pith is reading between the lines
- Skill platforms could integrate automated scanning for these 10 patterns before allowing public distribution.
- Developers should receive explicit guidance to avoid any logging that writes credentials to outputs accessible by the LLM.
- Persistence across forks indicates that public repositories for skills need stronger secret-masking practices.
- Applying the same taxonomy to skills on other agent platforms would test whether the patterns are platform-specific.
Load-bearing premise
The sampled skills accurately represent the full population on the platform and the chosen detection methods reliably identify every leakage pattern present.
What would settle it
Repeating the full analysis on the unsampled skills or applying runtime execution monitoring to check whether additional leaks appear beyond those found by static and sandbox methods.
Figures
read the original abstract
Large Language Model (LLM) agents increasingly rely on third-party skills that operate within privileged execution environments and routinely handle sensitive credentials, yet how these credentials are leaked remains largely unexplored. To fill this gap, we present the first large-scale empirical study on credential leakage in agent skills. From 170,226 artifacts on SkillsMP, the largest open-source skill marketplace, we sampled 17,022 skills via stratified random sampling and analyzed each through static secret extraction (regex and AST parsing), dynamic sandbox testing with mock credentials, and cross-referencing developer intent against runtime behavior. Our analysis identifies 520 affected skills containing 1,708 security issues, and yields a taxonomy of 10 leakage patterns. Three findings stand out. First, 76.3% of cases require jointly analyzing natural-language descriptions and programming logic, showing that credential exposure in skills is fundamentally cross-modal. Second, debug logging accounts for 73.5% of vulnerabilities because agent frameworks feed stdout into the LLM context window, turning routine debugging into a credential exposure vector. Third, 89.6% of leaked credentials are immediately exploitable -- 92.5% during routine execution without elevated privileges -- and the fork-based distribution model defeats remediation, as secrets removed from 107 upstream repositories persist across 50+ independent forks. Following responsible disclosure, all malicious skills have been removed and 91.6% of hardcoded cases remediated. We release our dataset, taxonomy, and detection pipeline to support future agent security research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the first large-scale empirical study of credential leakage in third-party LLM agent skills, sampling 17,022 skills from the 170,226 available on SkillsMP. Using a combination of static analysis, sandbox testing, and manual inspection, the authors identify 520 vulnerable skills containing 1,708 issues and derive a taxonomy of 10 leakage patterns (4 accidental, 6 adversarial). Key findings include that 76.3% of leaks require joint code and natural-language analysis, 73.5% stem from debug logging (print/console.log exposing stdout to LLMs), 89.6% of leaked credentials are exploitable without privileges, and leaks persist in forks. After responsible disclosure, all malicious skills were removed and 91.6% of hardcoded credentials were fixed. The authors release the dataset, taxonomy, and detection pipeline.
Significance. If the empirical pipeline and counts hold, the work establishes a concrete baseline for credential-leakage prevalence in LLM agents and supplies a reusable taxonomy plus open artifacts that directly support follow-on detection research and platform policy. The cross-modal emphasis and persistence-in-forks observation are particularly actionable for agent runtime design.
major comments (2)
- [Methodology] Methodology section: the sampling procedure used to obtain the 17,022-skill subset from the full 170,226 is described only at the aggregate level; without explicit randomness, stratification, or exclusion criteria, it is impossible to evaluate whether the reported 520 vulnerable skills and 3.05% rate are representative or subject to selection bias.
- [Results and Taxonomy] Results and Taxonomy sections: the classification rules that map the 1,708 raw issues into the 10-pattern taxonomy (including the 76.3% cross-modal and 3.1% pure-prompt-injection splits) are not stated with sufficient operational detail; this leaves the boundary between accidental and adversarial categories open to post-hoc adjustment.
minor comments (2)
- [Abstract] Abstract: the statement that 'all malicious skills were removed' would be more informative if the absolute count of malicious skills were supplied alongside the 520 total vulnerable skills.
- [Figures and Tables] Figure and table captions: several summary tables listing per-pattern counts or exploitability percentages lack explicit column definitions or confidence intervals, reducing immediate interpretability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive overall assessment. We address each major comment below and will revise the manuscript to provide greater operational detail on sampling and classification.
read point-by-point responses
-
Referee: [Methodology] Methodology section: the sampling procedure used to obtain the 17,022-skill subset from the full 170,226 is described only at the aggregate level; without explicit randomness, stratification, or exclusion criteria, it is impossible to evaluate whether the reported 520 vulnerable skills and 3.05% rate are representative or subject to selection bias.
Authors: We agree that the sampling description was insufficiently detailed. In the revised manuscript we will expand the Methodology section to state that the 17,022 skills were obtained by uniform random sampling from the full 170,226 skills on SkillsMP (using a fixed random seed for reproducibility), with exclusion criteria limited to skills that failed to parse due to encoding errors or exceeded a 100k-token size limit. No stratification was applied. We will also add the exact sampling parameters, a short discussion of representativeness, and a note on why random sampling was chosen over stratified approaches. revision: yes
-
Referee: [Results and Taxonomy] Results and Taxonomy sections: the classification rules that map the 1,708 raw issues into the 10-pattern taxonomy (including the 76.3% cross-modal and 3.1% pure-prompt-injection splits) are not stated with sufficient operational detail; this leaves the boundary between accidental and adversarial categories open to post-hoc adjustment.
Authors: We accept that the classification rules lacked sufficient operational specificity. In the revision we will insert a dedicated subsection in the Taxonomy section that defines the decision criteria explicitly: an issue is labeled cross-modal if it requires joint inspection of code and natural-language prompts; it is accidental if the leakage stems from standard debugging constructs (e.g., print/console.log) without any exfiltration-oriented prompt engineering; it is adversarial if the skill contains deliberate patterns such as prompt-injection templates or hidden exfiltration instructions. We will report inter-annotator agreement (Cohen’s kappa), provide a decision flowchart, and include one canonical example per pattern to make the mapping reproducible and the accidental/adversarial boundary unambiguous. revision: yes
Circularity Check
No significant circularity
full rationale
This is a purely empirical measurement study. The authors sample 17,022 skills from SkillsMP, apply static analysis plus sandbox testing plus manual review, count 520 vulnerable skills and 1,708 issues, and induce a 10-pattern taxonomy directly from the observed cases. No equations, fitted parameters, derivations, or self-referential definitions appear. The taxonomy is an inductive classification of detected patterns rather than a renaming or self-definition of the input data. No load-bearing self-citations or uniqueness theorems are invoked. The pipeline is externally falsifiable against the released dataset and detection code.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Static analysis combined with sandbox execution can detect credential leakage patterns in skill code and prompts.
Forward citations
Cited by 9 Pith papers
-
Cloak and Detonate: Scanner Evasion and Dynamic Detection of Agent Skill Malware
SkillCloak evades existing static scanners for agent skill malware at high rates, while SkillDetonate detects 97% of attacks at 2% false-positive rate using sandboxed runtime behavior analysis.
-
From Registry to Repository: How AI Agent Skills Are Written, Adapted, and Maintained
Empirical study of 41k+ AI agent skills finds reuse is mostly one-time verbatim copying with 53% never modified afterward and maintenance focused on additive local adaptations.
-
Trust Me, Import This: Dependency Steering Attacks via Malicious Agent Skills
Malicious Skills induce coding agents to hallucinate and import attacker-controlled packages at high rates while evading detection.
-
Safety Testing LLM Agents at Scale: From Risk Discovery to Evidence-Grounded Verification
Vera automates safety testing for LLM agents via literature-driven risk taxonomies, combinatorial case generation, and evidence-grounded verification in isolated environments, showing 93.9% average attack success on f...
-
SeClaw: Spec-Driven Security Task Synthesis for Evaluating Autonomous Agents
SeClaw provides spec-driven synthesis of security tasks and an execution-based docker testbed for evaluating unsafe behaviors in autonomous LLM agents.
-
When Safe Skills Collide: Measuring Compositional Risk in Agent Skill Ecosystems
About 18.2% of structurally flagged skill pairs represent genuine compositional safety risks in agent skill registries, with exploitation gated by host model behavior.
-
Harnessing Agent Skills: Architectural Patterns and a Reference Architecture for Skill-Mediated LLM Agents
Catalogs ten patterns and synthesizes a four-layer reference architecture for skill harnessing in LLM agents, evaluated via cross-instantiation on eight systems.
-
AgentTrap: Measuring Runtime Trust Failures in Third-Party Agent Skills
AgentTrap shows that current LLM agents typically complete user tasks while silently accepting unsafe side effects from malicious third-party skills rather than refusing them.
-
RouteGuard: Internal-Signal Detection of Skill Poisoning in LLM Agents
RouteGuard uses response-conditioned attention and hidden-state alignment to detect skill poisoning in LLM agents, achieving 0.8834 F1 on Skill-Inject benchmarks and recovering 90.51% of attacks missed by lexical screening.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.