SKILLSCOPE detects undisclosed security behaviors in LLM skill implementations via security property graphs and taxonomy-based consistency checking, identifying confirmed inconsistencies in 9.4% of 4,556 evaluated skills with 84.8% precision and 96.5% recall against human review.
When skills lie: Hidden-comment injection in llm agents,
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2representative citing papers
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.
citing papers explorer
-
Do Skill Descriptions Tell the Truth? Detecting Undisclosed Security Behaviors in Code-Backed LLM Skills
SKILLSCOPE detects undisclosed security behaviors in LLM skill implementations via security property graphs and taxonomy-based consistency checking, identifying confirmed inconsistencies in 9.4% of 4,556 evaluated skills with 84.8% precision and 96.5% recall against human review.
-
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
LLM agent progress depends on externalizing cognitive functions into memory, skills, protocols, and harness engineering that coordinates them reliably.