26.1% of analyzed AI agent skills contain vulnerabilities across 14 patterns, with executable scripts raising risk 2.12x, based on static and LLM analysis of 31k skills.
Title resolution pending
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 5roles
background 2representative citing papers
A 114k compositional jailbreak dataset is created, generators are fine-tuned for on-the-fly synthesis, and OPTIMUS introduces a continuous evaluator that identifies stealth-optimal regimes missed by binary attack success rates.
Safety Context Injection prepends structured external risk reports via static or agentic analysis to lower attack success rates and toxicity in reasoning models on AdvBench and GPTFuzz benchmarks.
LLMs fail to detect hidden harmful intent, allowing systematic bypass of safety mechanisms through framing techniques, with reasoning modes often worsening the issue.
A framework detects LLM anomalies including hallucinations, jailbreaks, and backdoors by forensic inspection of layer-wise hidden state patterns, reporting over 95% accuracy with minimal computational overhead.
citing papers explorer
-
Agent Skills in the Wild: An Empirical Study of Security Vulnerabilities at Scale
26.1% of analyzed AI agent skills contain vulnerabilities across 14 patterns, with executable scripts raising risk 2.12x, based on static and LLM analysis of 31k skills.
-
The Art of the Jailbreak: Formulating Jailbreak Attacks for LLM Security Beyond Binary Scoring
A 114k compositional jailbreak dataset is created, generators are fine-tuned for on-the-fly synthesis, and OPTIMUS introduces a continuous evaluator that identifies stealth-optimal regimes missed by binary attack success rates.
-
Safety Context Injection: Inference-Time Safety Alignment via Static Filtering and Agentic Analysis
Safety Context Injection prepends structured external risk reports via static or agentic analysis to lower attack success rates and toxicity in reasoning models on AdvBench and GPTFuzz benchmarks.
-
Beyond Context: Large Language Models' Failure to Grasp Users' Intent
LLMs fail to detect hidden harmful intent, allowing systematic bypass of safety mechanisms through framing techniques, with reasoning modes often worsening the issue.
-
Exposing the Ghost in the Transformer: Abnormal Detection for Large Language Models via Hidden State Forensics
A framework detects LLM anomalies including hallucinations, jailbreaks, and backdoors by forensic inspection of layer-wise hidden state patterns, reporting over 95% accuracy with minimal computational overhead.