pith. sign in

hub

AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security

12 Pith papers cite this work. Polarity classification is still indexing.

12 Pith papers citing it
abstract

The rise of AI agents introduces complex safety and security challenges arising from autonomous tool use and environmental interactions. Current guardrail models lack agentic risk awareness and transparency in risk diagnosis. To introduce an agentic guardrail that covers complex and numerous risky behaviors, we first propose a unified three-dimensional taxonomy that orthogonally categorizes agentic risks by their source (where), failure mode (how), and consequence (what). Guided by this structured and hierarchical taxonomy, we introduce a new fine-grained agentic safety benchmark (ATBench) and a Diagnostic Guardrail framework for agent safety and security (AgentDoG). AgentDoG provides fine-grained and contextual monitoring across agent trajectories. More Crucially, AgentDoG can diagnose the root causes of unsafe actions and seemingly safe but unreasonable actions, offering provenance and transparency beyond binary labels to facilitate effective agent alignment. AgentDoG variants are available in three sizes (4B, 7B, and 8B parameters) across Qwen and Llama model families. Extensive experimental results demonstrate that AgentDoG achieves state-of-the-art performance in agentic safety moderation in diverse and complex interactive scenarios. All models and datasets are openly released.

hub tools

citation-role summary

background 3

citation-polarity summary

years

2026 12

roles

background 3

polarities

background 2 support 1

representative citing papers

HearthNet: Edge Multi-Agent Orchestration for Smart Homes

cs.DC · 2026-03-16 · unverdicted · novelty 6.0

HearthNet is an edge multi-agent orchestration system that runs role-specialized LLM agents locally to handle natural-language smart-home control, conflict resolution, and failure recovery through MQTT and shared state.

Security Considerations for Multi-agent Systems

cs.CR · 2026-03-09 · unverdicted · novelty 6.0

No existing AI security framework covers a majority of the 193 identified multi-agent system threats in any category, with OWASP Agentic Security Initiative achieving the highest overall coverage at 65.3%.

citing papers explorer

Showing 12 of 12 citing papers.