Pith Number

pith:TDD5C6FQ

pith:2025:TDD5C6FQIQ4X3H4IE6PURMOGXH
Status: not attested · not anchored · not stored · refs pending

Between a Rock and a Hard Place: The Tension Between Ethical Reasoning and Safety Alignment in LLMs

Kai Jun Teh, Qibing Ren, Shei Pern Chua, Xiao Li, Xiaolin Hu, Zhen Leng Thai

Ethical reasoning in LLMs opens a vulnerability: harmful requests framed as moral dilemmas can bypass safety alignment.

arxiv:2509.05367 v5 · 2025-09-04 · cs.CR · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{TDD5C6FQIQ4X3H4IE6PURMOGXH}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · Strongest claim

TRIAL achieves high attack success rates across most tested models by systematically exploiting each target model's ethical reasoning capabilities to frame harmful actions as morally necessary compromises.

C2 · Weakest assumption

That ethical reasoning responses can be reliably partitioned into instrumental (enabling harm) versus explanatory (analyzing without endorsing) categories in a way that preserves overall model utility and does not introduce new failure modes.

C3 · One-line summary

Introduces TRIAL, a multi-turn red-teaming method exploiting ethical reasoning to achieve high attack success on LLMs, and ERR, a Layer-Stratified Harm-Gated LoRA defense that separates instrumental harmful responses from explanatory ethical analysis.

Formal links

1 machine-checked theorem link

Cited by

1 paper in Pith

Receipt and verification
First computed: 2026-06-02T01:03:33.490546Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

98c7d178b044397d9f88279f48b1c6b9d07178cd15c9eb28960f3c0d3a95af6a

Aliases

arxiv: 2509.05367 · arxiv_version: 2509.05367v5 · doi: 10.48550/arxiv.2509.05367 · pith_short_12: TDD5C6FQIQ4X · pith_short_16: TDD5C6FQIQ4X3H4I · pith_short_8: TDD5C6FQ
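The aliases above appear to be simple derivations of identifiers shown elsewhere on this page: each pith_short_N value matches the first N characters of the full Pith Number, arxiv_version is the arXiv ID plus "v" and the version number, and the DOI uses arXiv's 10.48550 DataCite prefix. The sketch below reconstructs them under those assumptions. The function name pith_aliases is hypothetical, and the prefix rule for the short forms is inferred from the values listed here, not taken from a published Pith specification.

```python
# Hypothetical helper that rebuilds the alias list for a Pith record.
# ASSUMPTION: pith_short_N is the first N characters of the full Pith Number;
# this is inferred from the values on this page, not from a documented rule.


def pith_aliases(pith_number: str, arxiv_id: str, version: int) -> dict[str, str]:
    aliases = {
        "arxiv": arxiv_id,
        # Versioned arXiv identifiers append "v<version>" to the base ID.
        "arxiv_version": f"{arxiv_id}v{version}",
        # arXiv registers DOIs under the 10.48550 prefix; DOIs are
        # case-insensitive, and this page writes the stem in lower case.
        "doi": f"10.48550/arxiv.{arxiv_id}",
    }
    # Short forms: truncated prefixes of the full Pith Number.
    for n in (8, 12, 16):
        aliases[f"pith_short_{n}"] = pith_number[:n]
    return aliases


if __name__ == "__main__":
    for key, value in pith_aliases("TDD5C6FQIQ4X3H4IE6PURMOGXH", "2509.05367", 5).items():
        print(f"{key}: {value}")
```

Run against this record, it reproduces every alias in the list, including the 8-character form used in the page header (pith:TDD5C6FQ).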
Agent API
Verify this Pith Number yourself
# Fetch the JSON-LD record, keep only canonical_record, re-serialize it with sorted keys,
# no whitespace and raw UTF-8, then print its SHA-256 digest.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/TDD5C6FQIQ4X3H4IE6PURMOGXH \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.load(sys.stdin), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode('utf-8'); print(hashlib.sha256(b).hexdigest())"
# expect: 98c7d178b044397d9f88279f48b1c6b9d07178cd15c9eb28960f3c0d3a95af6a
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "bcb52421c65cfe9940fb14955c76d366fe164009c631c90d1c1eec156d1ec547",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2025-09-04T05:53:20Z",
    "title_canon_sha256": "971f2c64a9c37d9f7d08daf341bddcd43c5bc64cd862cd7ff99777ab4d00af26"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2509.05367",
    "kind": "arxiv",
    "version": 5
  }
}
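For an offline check, the same canonicalization that the one-liner applies (sorted keys, compact separators, raw UTF-8, SHA-256) can be run against the record as printed above. This is a sketch that assumes the displayed JSON is the complete canonical_record and carries exactly the same values the Agent API serves. If the printed digest does not match the expected hash, treat the live record fetched with curl as authoritative. The names canonical_bytes and canonical_hash are illustrative, not part of any Pith API.

```python
import hashlib
import json

# The canonical record as displayed on this page.
# ASSUMPTION: this matches the served canonical_record value-for-value.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "bcb52421c65cfe9940fb14955c76d366fe164009c631c90d1c1eec156d1ec547",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CR",
        "submitted_at": "2025-09-04T05:53:20Z",
        "title_canon_sha256": "971f2c64a9c37d9f7d08daf341bddcd43c5bc64cd862cd7ff99777ab4d00af26",
    },
    "schema_version": "1.0",
    "source": {"id": "2509.05367", "kind": "arxiv", "version": 5},
}

EXPECTED = "98c7d178b044397d9f88279f48b1c6b9d07178cd15c9eb28960f3c0d3a95af6a"


def canonical_bytes(record: dict) -> bytes:
    """Serialize like the page's one-liner: sorted keys, no spaces, UTF-8."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_hash(record: dict) -> str:
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(CANONICAL_RECORD)
    print(digest)
    print("match" if digest == EXPECTED else "mismatch: fetch the live record")
```

Sorting keys and fixing the separators makes the byte string independent of key order and whitespace, so any mirror holding the same record derives the same digest.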