DRAFT: Task Decoupled Latent Reasoning for Agent Safety

· 2026 · cs.LG · arXiv 2604.03242

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

open full Pith review browse 1 citing papers arXiv PDF

abstract

The advent of tool-using LLM agents shifts safety monitoring from output moderation to auditing long, noisy interaction trajectories, where risk-critical evidence is sparse-making standard binary supervision poorly suited for credit assignment. To address this, we propose DRAFT (Task Decoupled Latent Reasoning for Agent Safety), a latent reasoning framework that decouples safety judgment into two trainable stages: an Extractor that distills the full trajectory into a compact continuous latent draft, and a Reasoner that jointly attends to the draft and the original trajectory to predict safety. DRAFT avoids lossy explicit summarize-then-judge pipelines by performing evidence aggregation in latent space, enabling end-to-end differentiable training.Across benchmarks including ASSEBench and R-Judge, DRAFT consistently outperforms strong baselines, improving accuracy from 63.27% (LoRA) to 91.18% averaged over benchmarks, and learns more separable representations. Ablations demonstrate a clear synergy between the Extractor and the Reasoner.Overall, DRAFT suggests that continuous latent reasoning prior to readout is a practical path to robust agent safety under long-context supervision with sparse evidence.

representative citing papers

LPG: Balancing Efficiency and Policy Reasoning in Latent Policy Guardrails

cs.CR · 2026-05-17 · conditional · novelty 6.0

LPG compresses policy deliberation into 10 latent tokens to reach 84.5% safety accuracy and 11x speedup over explicit reasoning baselines on guardrail benchmarks.

citing papers explorer

Showing 1 of 1 citing paper.

LPG: Balancing Efficiency and Policy Reasoning in Latent Policy Guardrails cs.CR · 2026-05-17 · conditional · none · ref 31 · internal anchor
LPG compresses policy deliberation into 10 latent tokens to reach 84.5% safety accuracy and 11x speedup over explicit reasoning baselines on guardrail benchmarks.

DRAFT: Task Decoupled Latent Reasoning for Agent Safety

fields

years

verdicts

representative citing papers

citing papers explorer