AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

2026 · cs.AI · arXiv 2605.29801

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Modern open-world agents such as OpenClaw exhibit powerful cross-environment execution capabilities yet introduce broad new sources of safety risk. Meanwhile, advanced frontier AI models drastically lower attack barriers, rendering current agent alignment frameworks inadequate for real-world deployment. To tackle these emerging threats, we propose a lightweight and scalable agent safety alignment framework. Specifically, we update the agent safety taxonomy to accommodate emergent risks from Codex and OpenClaw execution scenarios. We further build a taxonomy-guided data engine with influence-function purification to train lightweight AgentDoG 1.5 variants (0.8B, 2B, 4B, and 8B parameters) using only around 1k samples, achieving performance comparable to that of leading closed-source models (e.g., GPT-5.4). Based on AgentDoG 1.5, we construct a highly efficient agentic safety SFT and RL training environment, which reduces deployment overhead in Docker-level environments by two orders of magnitude. Finally, we deploy AgentDoG 1.5 as a training-free online guardrail for real-time safety moderation. Extensive experimental results indicate that AgentDoG 1.5 achieves state-of-the-art performance in diverse and complex interactive agentic scenarios. All models and datasets are openly released.

representative citing papers

Safety Testing LLM Agents at Scale: From Risk Discovery to Evidence-Grounded Verification

cs.AI · 2026-07-02 · conditional · novelty 6.0

Vera automates safety testing for LLM agents via literature-driven risk taxonomies, combinatorial case generation, and evidence-grounded verification in isolated environments, showing 93.9% average attack success on four frameworks.
