Learning Efficient Guardrails for Compliance

Muhao Chen; Peng Qi; Wenjie Jacky Mo; Xiaofei Wen; Yanan Xie

arxiv: 2510.03485 · v2 · pith:OKPXZKYJnew · submitted 2025-10-03 · 💻 cs.AI


Xiaofei Wen, Wenjie Jacky Mo, Yanan Xie, Peng Qi, Muhao Chen

classification 💻 cs.AI

keywords: compliance, detection, guardrails, high, model, tasks, ability, accuracy


Autonomous web agents are increasingly deployed for long-horizon tasks, yet their ability to adhere to real-world policies remains critically underexplored compared to standard safety objectives. To address this gap, we introduce PolicyGuardBench, a benchmark of 60k policy-trajectory pairs designed to evaluate compliance through both full-trajectory and novel prefix-based violation detection tasks. Using this dataset, we train PolicyGuard, a lightweight guardrail model that achieves strong detection accuracy while maintaining high inference efficiency. Notably, our model demonstrates robust generalization capabilities, preserving high performance even on unseen domains. These contributions establish a comprehensive framework for studying policy compliance, showing that accurate and generalizable guardrails are feasible at small scales.


Forward citations


LPG: Balancing Efficiency and Policy Reasoning in Latent Policy Guardrails
cs.CR 2026-05 conditional novelty 6.0

LPG compresses policy deliberation into 10 latent tokens to reach 84.5% safety accuracy and 11x speedup over explicit reasoning baselines on guardrail benchmarks.