pith:TDD5C6FQ
Between a Rock and a Hard Place: The Tension Between Ethical Reasoning and Safety Alignment in LLMs
Ethical reasoning in LLMs opens a vulnerability: harmful requests framed as moral dilemmas can bypass safety alignment.
arxiv:2509.05367 v5 · 2025-09-04 · cs.CR · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{TDD5C6FQIQ4X3H4IE6PURMOGXH}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Record completeness
Claims
TRIAL achieves high attack success rates across most tested models by systematically exploiting the model's ethical reasoning capabilities to frame harmful actions as morally necessary compromises.
The ERR defense assumes that ethical reasoning responses can be reliably partitioned into instrumental (harm-enabling) and explanatory (analyzing without endorsing) categories in a way that preserves overall model utility and introduces no new failure modes.
The paper introduces TRIAL, a multi-turn red-teaming method that exploits ethical reasoning to achieve high attack success rates on LLMs, and ERR, a Layer-Stratified Harm-Gated LoRA defense that separates instrumental harmful responses from explanatory ethical analysis.
Receipt and verification
| First computed | 2026-06-02T01:03:33.490546Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
98c7d178b044397d9f88279f48b1c6b9d07178cd15c9eb28960f3c0d3a95af6a
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/TDD5C6FQIQ4X3H4IE6PURMOGXH \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 98c7d178b044397d9f88279f48b1c6b9d07178cd15c9eb28960f3c0d3a95af6a
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "bcb52421c65cfe9940fb14955c76d366fe164009c631c90d1c1eec156d1ec547",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2025-09-04T05:53:20Z",
    "title_canon_sha256": "971f2c64a9c37d9f7d08daf341bddcd43c5bc64cd862cd7ff99777ab4d00af26"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2509.05367",
    "kind": "arxiv",
    "version": 5
  }
}
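The verification one-liner above can be unpacked into a short Python sketch that works offline on the record printed on this page. It mirrors the same serialization settings (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding) and then applies SHA-256. The function name `canonical_sha256` and the constant `EXPECTED` are illustrative names, not part of any Pith API, and the sketch assumes the record shown here is exactly the `canonical_record` field the service returns.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same settings as the shell one-liner."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode()                   # str.encode() defaults to UTF-8
    return hashlib.sha256(canonical).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "bcb52421c65cfe9940fb14955c76d366fe164009c631c90d1c1eec156d1ec547",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CR",
        "submitted_at": "2025-09-04T05:53:20Z",
        "title_canon_sha256": "971f2c64a9c37d9f7d08daf341bddcd43c5bc64cd862cd7ff99777ab4d00af26",
    },
    "schema_version": "1.0",
    "source": {"id": "2509.05367", "kind": "arxiv", "version": 5},
}

# Hash listed under "Canonical hash" on this page.
EXPECTED = "98c7d178b044397d9f88279f48b1c6b9d07178cd15c9eb28960f3c0d3a95af6a"

digest = canonical_sha256(record)
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted before hashing, the order in which the fields appear in the copied record does not matter. Any change to a value, including whitespace inside a string, produces a different digest.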