pith:2U7RVSTI
Moral Sensitivity in LLMs: A Tiered Evaluation of Contextual Bias via Behavioral Profiling and Mechanistic Interpretability
LLMs follow a U-curve in criminal bias: strong in small models, removed by instruction tuning, and restored by reasoning distillation at the same scale.
arxiv:2605.03217 v2 · 2026-05-04 · cs.LG · cs.CY
Add to your LaTeX paper
```latex
\usepackage{pith}
\pithnumber{2U7RVSTIX3OCS7G3XDRD66NW7A}
```
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Circuit-level analysis reveals a U-curve of bias: small language models (SLMs) exhibit strong criminal bias; scaling to instruction-tuned models eliminates it; and reasoning distillation reintroduces bias to SLM-like levels despite identical parameter counts. This suggests that distillation compresses reasoning traces in ways that reactivate shallow statistical associations.
The analysis assumes that the chosen criminal-bias scenarios and interpretability probes (logit lens, attention analysis, activation patching, semantic probing) isolate bias circuits without confounding from prompt wording, model scale, or other unmeasured factors.
LLMs exhibit context-sensitive moral bias with model-specific patterns; mechanistic analysis shows a U-curve in which instruction tuning removes bias but reasoning distillation reintroduces it despite unchanged size.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-06-05T01:14:39.972307Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
d53f1aca68bedc297cdbb8e23f79b6f83fe8f3ff51a5e6cfbeb572dcbe7c6078
Verify this Pith Number yourself
```shell
curl -sH 'Accept: application/ld+json' https://pith.science/pith/2U7RVSTIX3OCS7G3XDRD66NW7A \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d53f1aca68bedc297cdbb8e23f79b6f83fe8f3ff51a5e6cfbeb572dcbe7c6078
```
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "f551c6ae015815f3ec2a903e619a098b646a1c74c3f608a740f3bd91a991dc95",
    "cross_cats_sorted": [
      "cs.CY"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-04T23:12:32Z",
    "title_canon_sha256": "3052527ba04e593aad9f68352d09a30bfa650a0a3c9eeac963323c24ad3cc7ca"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.03217",
    "kind": "arxiv",
    "version": 2
  }
}
```
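The verification one-liner hashes the record after re-serializing it with sorted keys, compact separators, and raw UTF-8, so the digest does not depend on key order or whitespace in the server's response. The sketch below reproduces that canonicalization offline on the record shown above. It assumes that the `.canonical_record` returned by the endpoint is identical to the JSON printed on this page; the names `CANONICAL_RECORD`, `canonical_bytes`, and `canonical_sha256` are hypothetical and not part of any Pith API. It prints the digest and compares it with the published canonical hash rather than asserting a match.

```python
import hashlib
import json

# Canonical record as printed on this page. Assumption: this is the same
# object that the endpoint serves under `.canonical_record`.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "f551c6ae015815f3ec2a903e619a098b646a1c74c3f608a740f3bd91a991dc95",
        "cross_cats_sorted": ["cs.CY"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-04T23:12:32Z",
        "title_canon_sha256": "3052527ba04e593aad9f68352d09a30bfa650a0a3c9eeac963323c24ad3cc7ca",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.03217", "kind": "arxiv", "version": 2},
}

# Published canonical hash from the "Canonical hash" field.
EXPECTED = "d53f1aca68bedc297cdbb8e23f79b6f83fe8f3ff51a5e6cfbeb572dcbe7c6078"


def canonical_bytes(record: dict) -> bytes:
    """Serialize like the one-liner: sorted keys, no whitespace, raw UTF-8."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(CANONICAL_RECORD)
    print(digest)
    print("match" if digest == EXPECTED else "mismatch")
```

Because `sort_keys=True` fixes the key order, two records that differ only in key order produce the same digest, which is what lets the page publish a single hash for the record.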