hub Canonical reference

Skillprobe: Security auditing for emerging agent skill marketplaces via multi-agent collaboration

URLhttps://arxiv · 2026 · arXiv 2603.21019

Canonical reference. 80% of citing Pith papers cite this work as background.

14 Pith papers citing it

Background 80% of classified citations

read on arXiv browse 14 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4 baseline 1

citation-polarity summary

background 4 baseline 1

representative citing papers

HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents?

cs.CR · 2026-04-16 · unverdicted · novelty 8.0

Harmful skills in open agent ecosystems raise average harm scores from 0.27 to 0.76 across six LLMs by lowering refusal rates when tasks are presented via pre-installed skills.

Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

SkillTTA synthesizes temporary task-specific skills from retrieved training trajectories to boost LLM agent Pass@1 scores on SpreadsheetBench and BigCodeBench without parameter updates.

SMMBench: A Benchmark for Source-Distributed Multimodal Agent Memory

cs.CL · 2026-05-15 · unverdicted · novelty 7.0

SMMBench is a benchmark evaluating multimodal agents on cross-source reasoning, conflict resolution, preference reasoning, and action prediction, showing current systems struggle with evidence distributed across heterogeneous sources.

No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills

cs.CR · 2026-05-13 · unverdicted · novelty 7.0

Sefz discovers specification violations in 29.9% of 402 real-world agent skills by translating guardrails into reachability goals and guiding LLM mutations with a multi-armed bandit.

Proteus: A Self-Evolving Red Team for Agent Skill Ecosystems

cs.CR · 2026-05-12 · unverdicted · novelty 7.0

Proteus demonstrates that adaptive red-teaming achieves 40-90% attack success after five rounds and bypasses even strong auditors at up to 41% joint success, revealing that static skill vetting underestimates residual risk.

Trust Me, Import This: Dependency Steering Attacks via Malicious Agent Skills

cs.CR · 2026-05-10 · unverdicted · novelty 7.0

Malicious Skills induce coding agents to hallucinate and import attacker-controlled packages at high rates while evading detection.

Sealing the Audit-Runtime Gap for LLM Skills

cs.CR · 2026-05-06 · unverdicted · novelty 7.0

SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.

Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw

cs.CR · 2026-05-11 · unverdicted · novelty 6.0

DeepTrap automates discovery of contextual vulnerabilities in OpenClaw agents via trajectory optimization, showing that unsafe behavior can be induced while preserving task completion and that final-response checks are insufficient.

Position: Academic Conferences are Potentially Facing Denominator Gaming Caused by Fully Automated Scientific Agents

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

Malicious actors could use AI agents to submit large numbers of fake papers, inflating the submission count and thereby raising the acceptance odds for a small set of chosen legitimate papers under stable conference acceptance rates.

SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills

cs.CR · 2026-05-07 · unverdicted · novelty 6.0

SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task completion.

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

cs.SE · 2026-04-09 · accept · novelty 5.0

LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.

Responsible Agentic AI Requires Explicit Provenance

cs.AI · 2026-05-16 · unverdicted · novelty 4.0

Explicit provenance across the full agentic AI lifecycle is the necessary condition for making responsibility computable and actionable.

Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes

cs.CR · 2026-05-01 · unverdicted · novelty 4.0

Proposes a trust schema including verification levels and a biconditional correctness criterion to verify skills in human-in-the-loop agent runtimes, reducing the need for constant oversight.

SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills

cs.CR · 2026-04-08

citing papers explorer

Showing 14 of 14 citing papers.

HarmfulSkillBench: How Do Harmful Skills Weaponize Your Agents? cs.CR · 2026-04-16 · unverdicted · none · ref 24
Harmful skills in open agent ecosystems raise average harm scores from 0.27 to 0.76 across six LLMs by lowering refusal rates when tasks are presented via pre-installed skills.
Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents cs.CL · 2026-05-16 · unverdicted · none · ref 7
SkillTTA synthesizes temporary task-specific skills from retrieved training trajectories to boost LLM agent Pass@1 scores on SpreadsheetBench and BigCodeBench without parameter updates.
SMMBench: A Benchmark for Source-Distributed Multimodal Agent Memory cs.CL · 2026-05-15 · unverdicted · none · ref 4
SMMBench is a benchmark evaluating multimodal agents on cross-source reasoning, conflict resolution, preference reasoning, and action prediction, showing current systems struggle with evidence distributed across heterogeneous sources.
No Attack Required: Semantic Fuzzing for Specification Violations in Agent Skills cs.CR · 2026-05-13 · unverdicted · none · ref 15
Sefz discovers specification violations in 29.9% of 402 real-world agent skills by translating guardrails into reachability goals and guiding LLM mutations with a multi-armed bandit.
Proteus: A Self-Evolving Red Team for Agent Skill Ecosystems cs.CR · 2026-05-12 · unverdicted · none · ref 9
Proteus demonstrates that adaptive red-teaming achieves 40-90% attack success after five rounds and bypasses even strong auditors at up to 41% joint success, revealing that static skill vetting underestimates residual risk.
Trust Me, Import This: Dependency Steering Attacks via Malicious Agent Skills cs.CR · 2026-05-10 · unverdicted · none · ref 46
Malicious Skills induce coding agents to hallucinate and import attacker-controlled packages at high rates while evading detection.
Sealing the Audit-Runtime Gap for LLM Skills cs.CR · 2026-05-06 · unverdicted · none · ref 9
SIGIL cryptographically seals the audit-runtime gap for LLM skills via an on-chain registry with four publication types, DAO vetting, and a runtime verification loader that enforces integrity and permissions.
Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw cs.CR · 2026-05-11 · unverdicted · none · ref 5
DeepTrap automates discovery of contextual vulnerabilities in OpenClaw agents via trajectory optimization, showing that unsafe behavior can be induced while preserving task completion and that final-response checks are insufficient.
Position: Academic Conferences are Potentially Facing Denominator Gaming Caused by Fully Automated Scientific Agents cs.CL · 2026-05-11 · unverdicted · none · ref 6
Malicious actors could use AI agents to submit large numbers of fake papers, inflating the submission count and thereby raising the acceptance odds for a small set of chosen legitimate papers under stable conference acceptance rates.
SkillScope: Toward Fine-Grained Least-Privilege Enforcement for Agent Skills cs.CR · 2026-05-07 · unverdicted · none · ref 20
SkillScope detects over-privileged LLM agent skills with 94.53% F1 score via graph analysis and replay validation, finding 7,039 problematic skills in the wild and reducing violations by 88.56% while preserving task completion.
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering cs.SE · 2026-04-09 · accept · none · ref 52
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
Responsible Agentic AI Requires Explicit Provenance cs.AI · 2026-05-16 · unverdicted · none · ref 23
Explicit provenance across the full agentic AI lifecycle is the necessary condition for making responsibility computable and actionable.
Skills as Verifiable Artifacts: A Trust Schema and a Biconditional Correctness Criterion for Human-in-the-Loop Agent Runtimes cs.CR · 2026-05-01 · unverdicted · none · ref 11
Proposes a trust schema including verification levels and a biconditional correctness criterion to verify skills in human-in-the-loop agent runtimes, reducing the need for constant oversight.
SkillSieve: A Hierarchical Triage Framework for Detecting Malicious AI Agent Skills cs.CR · 2026-04-08 · unreviewed · ref 11

Skillprobe: Security auditing for emerging agent skill marketplaces via multi-agent collaboration

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer