StepShield: When, not whether to intervene on rogue agents
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: stat.CO (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
ProjGuard: Safety Monitoring for Computer-Use Agents via Low-Dimensional Projections
ProjGuard monitors agent trajectories with low-dimensional projections to cut unsafe actions from 16% to 3% and raise task completion from 59% to 65% on OS-Harm.