arxiv: 2606.31644 · 2 revisions
Moral Safety in LLMs: Exposing Performative Compliance with Puzzled Cues