Defenses against prompt attacks learn surface heuristics,

2 Pith papers cite this work. Polarity classification is still being indexed.

fields: cs.AI (1), cs.CR (1)
years: 2026 (2)
verdicts: UNVERDICTED (2)

representative citing papers

Agent Safety Is Action Alignment

cs.AI · 2026-06-27 · unverdicted · novelty 6.0

Agent safety cannot be achieved via model refusal training and instead requires external least-privilege enforcement evaluated as action alignment.