{"paper":{"title":"SkillSafetyBench: Evaluating Agent Safety under Skill-Facing Attack Surfaces","license":"http://creativecommons.org/licenses/by/4.0/","headline":"SkillSafetyBench shows that attacks on reusable skills can induce unsafe actions in LLM agents even from benign user requests.","cross_cats":["cs.AI","cs.CL","cs.LG","cs.MA"],"primary_cat":"cs.CR","authors_text":"An Wang, Biaojie Zeng, Chang Jin, Chao Yang, Jingjing Qu, Kai Wang, Qiaosheng Zhang, Xia Hu, Xingcheng Xu, Zeming Wei","submitted_at":"2026-05-12T12:03:54Z","abstract_excerpt":"Reusable skills are becoming a common interface for extending large language model agents, packaging procedural guidance with access to files, tools, memory, and execution environments. However, this modularity introduces attack surfaces that are largely missed by existing safety evaluations: even when the user request is benign, unsafe influence may reside in skill guidance, local artifacts, or execution-environment files that steer the agent toward unsafe actions. We present SkillSafetyBench, a runnable benchmark for evaluating such skill-mediated safety failures. SkillSafetyBench includes 1"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Experiments with multiple CLI agents and model backends show that localized non-user attacks can consistently induce unsafe behavior, with distinct failure patterns across domains, attack methods, and scaffold-model pairings.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The 155 adversarial cases and rule-based verifiers accurately capture real-world skill-mediated safety failures without over- or under-counting due to case construction or verifier limitations.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"SkillSafetyBench shows that localized non-user attacks via skills and artifacts can consistently induce unsafe agent behavior across domains and model backends, independent of user intent.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"SkillSafetyBench shows that attacks on reusable skills can induce unsafe actions in LLM agents even from benign user requests.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"1c22731bcffacc0854443ac7afbf5e4e2529c54a06413912f51e96929838df85"},"source":{"id":"2605.12015","kind":"arxiv","version":2},"verdict":{"id":"6e4f449d-25cd-42c2-898d-01cb91e2fd4b","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-13T05:04:22.667508Z","strongest_claim":"Experiments with multiple CLI agents and model backends show that localized non-user attacks can consistently induce unsafe behavior, with distinct failure patterns across domains, attack methods, and scaffold-model pairings.","one_line_summary":"SkillSafetyBench shows that localized non-user attacks via skills and artifacts can consistently induce unsafe agent behavior across domains and model backends, independent of user intent.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The 155 adversarial cases and rule-based verifiers accurately capture real-world skill-mediated safety failures without over- or under-counting due to case construction or verifier limitations.","pith_extraction_headline":"SkillSafetyBench shows that attacks on reusable skills can induce unsafe actions in LLM agents even from benign user requests."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.12015/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"ai_meta_artifact","ran_at":"2026-05-26T15:44:35.339872Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_title_agreement","ran_at":"2026-05-20T17:31:26.539383Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-20T11:45:23.644820Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-20T03:22:00.082946Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"eef78b037229623e4ff7bf7ec3e9fe920a9ad32a81f648e41acefd0fa4d7ff48"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":1,"snapshot_sha256":"69fa46f10fd20b90b7d887dd24ffdd75816029faea1016a61d8551659411d692"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}