Pith Number

pith:V5OHABJB

pith:2026:V5OHABJBMTHZ2LU4YCMN3YXRGC
not attested · not anchored · not stored · refs resolved

Defenses at Odds: Measuring and Explaining Defense Conflicts in Large Language Models

Chuanchao Zang, Jianing Wang, Li Wang, Shanqing Guo, Wenyu Chen, Xiangtao Meng, Xinyu Gao, Zheng Li

Sequential safety defenses on large language models undermine earlier protections in 38.9 percent of deployment orders.

arxiv:2605.14514 v1 · 2026-05-14 · cs.CR

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{V5OHABJBMTHZ2LU4YCMN3YXRGC}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

38.9% of 144 ordered sequences exhibit measurable risk exacerbation on the originally defended dimension. These interactions are highly asymmetric and order-dependent.

C2 weakest assumption

That the chosen risk dimensions, evaluation metrics, and sequential patching without retraining accurately model real-world multi-defense deployment and capture true safety properties.

C3 one-line summary

Sequential LLM defense deployment exacerbates risk in 38.9% of cases because of anti-aligned updates in shared critical layers; conflict-guided layer freezing addresses this.

References

57 extracted · 57 resolved · 14 Pith anchors

[1] Deep learning with differential privacy 2016
[2] Constitutional AI: Harmlessness from AI Feedback 2022 · arXiv:2212.08073
[3] Machine unlearning 2021
[4] Extracting training data from large language models 2021
[5] Safe RLHF: Safe Reinforcement Learning from Human Feedback 2023 · arXiv:2310.12773

Formal links

2 machine-checked theorem links

Receipt and verification
First computed 2026-05-17T23:39:06.154548Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

af5c70052164cf9d2e9cc098dde2f13094583a4dc8a6b9b317c5583b7ce3673b

Aliases

arxiv: 2605.14514 · arxiv_version: 2605.14514v1 · doi: 10.48550/arxiv.2605.14514 · pith_short_12: V5OHABJBMTHZ · pith_short_16: V5OHABJBMTHZ2LU4 · pith_short_8: V5OHABJB
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/V5OHABJBMTHZ2LU4YCMN3YXRGC \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: af5c70052164cf9d2e9cc098dde2f13094583a4dc8a6b9b317c5583b7ce3673b
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "14656f567a83f12c672bb60f699e005b12149747998e20e2304550abf590edd9",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2026-05-14T07:58:47Z",
    "title_canon_sha256": "6a874cabc0ac37f88231d9ef0228f568917a81b0d931f95e85919befbe2d9a69"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14514",
    "kind": "arxiv",
    "version": 1
  }
}
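The verification one-liner above can be unpacked into a small function. This is a minimal sketch, assuming the canonicalization is exactly what that command shows (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes). The name `canonical_hash` is hypothetical and not part of any Pith API. The demo only shows that key order does not affect the digest; it does not reproduce the canonical hash listed on this page.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """Return the SHA-256 hex digest of a record serialized as
    compact, key-sorted JSON (mirrors the python3 one-liner above)."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# The "source" object from the canonical record, in two key orders.
first_order = {"id": "2605.14514", "kind": "arxiv", "version": 1}
second_order = {"version": 1, "kind": "arxiv", "id": "2605.14514"}

# Both serialize to the same bytes, so the digests agree.
print(canonical_hash(first_order) == canonical_hash(second_order))  # prints True
```

To check a full record, pass the parsed `canonical_record` object from the API response to the function and compare the result with the canonical hash shown on the page.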