pith:AC7PUYKR
BackFlush: Knowledge-Free Backdoor Detection and Elimination with Watermark Preservation in Large Language Models
BackFlush detects unknown backdoors in LLMs by amplifying susceptibility and flushes them via embedding rotation while preserving watermarks and clean accuracy.
arxiv:2605.12529 v1 · 2026-04-15 · cs.CR
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{AC7PUYKRJCTNVBHX5SL4XDF2FY}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
BackFlush achieves approximately 1% Attack Success Rate (ASR) and approximately 99% clean accuracy (CACC) while preserving watermarking capabilities. No existing method provides all of these simultaneously while also keeping model utility comparable to clean baselines.
The Backdoor Flushing Phenomenon and Backdoor Susceptibility Amplification are assumed to hold generally for unknown backdoors across trigger types and architectures, and RoPE Unlearning is assumed to selectively remove backdoors without damaging watermarks.
BackFlush detects backdoors via susceptibility amplification and eliminates them with RoPE unlearning to reach 1% ASR and 99% clean accuracy while preserving watermarks.
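The two headline metrics in these claims can be illustrated with a minimal sketch. It uses the conventional definitions (ASR as the fraction of triggered inputs that yield the attacker's target output, CACC as plain accuracy on clean inputs). This record does not state the paper's exact evaluation protocol, so the function names, the exact-match comparison, and the toy data below are all assumptions for illustration only.

```python
from typing import Sequence


def attack_success_rate(outputs_on_triggered: Sequence[str], target: str) -> float:
    """Fraction of triggered inputs whose output equals the attacker's target.

    Assumption: success is judged by exact string match against one target.
    """
    if not outputs_on_triggered:
        raise ValueError("need at least one triggered sample")
    hits = sum(1 for out in outputs_on_triggered if out == target)
    return hits / len(outputs_on_triggered)


def clean_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of clean (trigger-free) inputs that are predicted correctly."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and aligned")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Toy data (hypothetical): one of four triggered inputs still hits the target.
triggered_outputs = ["positive", "negative", "negative", "negative"]
asr = attack_success_rate(triggered_outputs, target="positive")

# Toy data (hypothetical): three of four clean predictions are correct.
clean_preds = ["positive", "negative", "positive", "negative"]
clean_labels = ["positive", "negative", "negative", "negative"]
cacc = clean_accuracy(clean_preds, clean_labels)

print(f"ASR={asr:.2f} CACC={cacc:.2f}")
```

Under these definitions, a successful defense drives ASR toward 0 while leaving CACC close to that of a clean baseline model.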
Receipt and verification
| First computed | 2026-05-18T03:10:02.713155Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
00befa615148a6da84f7ec97cb8cba2e0255a8b3e7f5f062339ff38f845bfdd7
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/AC7PUYKRJCTNVBHX5SL4XDF2FY \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 00befa615148a6da84f7ec97cb8cba2e0255a8b3e7f5f062339ff38f845bfdd7
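The hashing step of the command above can also be written as a standalone script, which makes the canonicalization easier to read. This is a sketch that mirrors the one-liner's settings (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding); the function name `canonical_hash` is hypothetical, and the record is pasted from this page rather than fetched.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the record as compact, key-sorted JSON."""
    canonical = json.dumps(
        record,
        sort_keys=True,            # key order must not affect the hash
        separators=(",", ":"),     # no whitespace between tokens
        ensure_ascii=False,        # keep non-ASCII characters unescaped
    ).encode()                     # str.encode() defaults to UTF-8
    return hashlib.sha256(canonical).hexdigest()


# Canonical record copied from this page; keys are listed out of order on
# purpose to show that sorting makes the digest independent of input order.
record = {
    "source": {"version": 1, "kind": "arxiv", "id": "2605.12529"},
    "schema_version": "1.0",
    "metadata": {
        "title_canon_sha256": "d55db2b9b1390a1948db5b93f15473c1631d71cf44af6418b800d987fe39cdec",
        "submitted_at": "2026-04-15T10:56:08Z",
        "primary_cat": "cs.CR",
        "license": "http://creativecommons.org/publicdomain/zero/1.0/",
        "cross_cats_sorted": [],
        "abstract_canon_sha256": "4aeb2244978a6f5a93423de82798361812a71c78022bd3ce63e941bda41fd56d",
    },
}

digest = canonical_hash(record)
print(digest)
```

Per the `# expect` comment above, the printed digest should match the canonical hash listed on this page if the record is unmodified. Any whitespace or key-order difference in the input is absorbed by the serialization settings, so only a change in the actual values alters the digest.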
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "4aeb2244978a6f5a93423de82798361812a71c78022bd3ce63e941bda41fd56d",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/publicdomain/zero/1.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2026-04-15T10:56:08Z",
    "title_canon_sha256": "d55db2b9b1390a1948db5b93f15473c1631d71cf44af6418b800d987fe39cdec"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12529",
    "kind": "arxiv",
    "version": 1
  }
}