Pith Number

pith:AC7PUYKR

pith:2026:AC7PUYKRJCTNVBHX5SL4XDF2FY
not attested · not anchored · not stored · refs resolved

BackFlush: Knowledge-Free Backdoor Detection and Elimination with Watermark Preservation in Large Language Models

Amit Shukla, Jagadeesh Rachapudi, Praful Hambarde, Pranav Singh, Ritali Vatsi

BackFlush detects unknown backdoors in LLMs by amplifying susceptibility and flushes them via embedding rotation while preserving watermarks and clean accuracy.

arxiv:2605.12529 v1 · 2026-04-15 · cs.CR

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{AC7PUYKRJCTNVBHX5SL4XDF2FY}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

BackFlush achieves approximately 1% Attack Success Rate (ASR) and approximately 99% clean accuracy (CACC) while preserving watermarking capabilities. No existing method provides all of these simultaneously while keeping model utility comparable to clean baselines.

C2 weakest assumption

The Backdoor Flushing Phenomenon and Backdoor Susceptibility Amplification are assumed to hold generally for unknown backdoors across trigger types and architectures, and RoPE Unlearning is assumed to selectively remove backdoors without damaging watermarks.

C3 one-line summary

BackFlush detects backdoors via susceptibility amplification and eliminates them with RoPE unlearning, reaching approximately 1% ASR and 99% clean accuracy while preserving watermarks.

References

42 extracted · 42 resolved · 8 Pith anchors

[1] Large language models (LLMs): survey, technical frameworks, and future challenges, 2024.
[2] Look before you leap: An exploratory study of uncertainty measurement for large language models. arXiv preprint arXiv:2307.10236, 2023.
[3] Putting people in LLMs’ shoes: Generating better answers via question rewriter, 2025.
[4] Can multiple-choice questions really be useful in detecting the abilities of LLMs?, 2024.
[5] Bid-lora: A parameter-efficient framework for continual learning and unlearning.
Receipt and verification
First computed 2026-05-18T03:10:02.713155Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

00befa615148a6da84f7ec97cb8cba2e0255a8b3e7f5f062339ff38f845bfdd7

Aliases

arxiv: 2605.12529 · arxiv_version: 2605.12529v1 · doi: 10.48550/arxiv.2605.12529 · pith_short_12: AC7PUYKRJCTN · pith_short_16: AC7PUYKRJCTNVBHX · pith_short_8: AC7PUYKR
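The identifiers on this page appear to be related. Base32-encoding the first 16 bytes of the canonical hash, with padding removed, reproduces the Pith Number, and the `pith_short_*` aliases are prefixes of it. This is an observation from the values shown here, not a documented rule of the `pith-number/v1.0` schema. The helper name `pith_number_from_hash` is hypothetical.

```python
import base64


def pith_number_from_hash(canonical_hash_hex: str) -> str:
    """Base32-encode the first 16 bytes of a SHA-256 hex digest, unpadded.

    Assumption: this mirrors how the identifier on this page relates to its
    canonical hash; the schema itself is not quoted in the record.
    """
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest[:16]).decode("ascii").rstrip("=")


# Canonical hash as printed under "Canonical hash" on this page.
canonical_hash = "00befa615148a6da84f7ec97cb8cba2e0255a8b3e7f5f062339ff38f845bfdd7"

pith_number = pith_number_from_hash(canonical_hash)

# The short aliases are the first 8, 12 and 16 characters of the full number.
short_aliases = {n: pith_number[:n] for n in (8, 12, 16)}

print(pith_number)
print(short_aliases)
```

If the relationship holds for other records, a short alias can be checked locally against a canonical hash without contacting the service.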
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/AC7PUYKRJCTNVBHX5SL4XDF2FY \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 00befa615148a6da84f7ec97cb8cba2e0255a8b3e7f5f062339ff38f845bfdd7
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "4aeb2244978a6f5a93423de82798361812a71c78022bd3ce63e941bda41fd56d",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/publicdomain/zero/1.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2026-04-15T10:56:08Z",
    "title_canon_sha256": "d55db2b9b1390a1948db5b93f15473c1631d71cf44af6418b800d987fe39cdec"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12529",
    "kind": "arxiv",
    "version": 1
  }
}
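The verification command above canonicalizes the record as JSON with sorted keys, compact separators and non-escaped Unicode, then hashes the UTF-8 bytes with SHA-256. The sketch below applies the same steps offline to the record printed on this page. The function name `canonical_sha256` is hypothetical. Whether the resulting digest equals the canonical hash depends on the displayed JSON being exactly the record the API returns, which is assumed here and not confirmed.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same canonicalization as the curl one-liner."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "4aeb2244978a6f5a93423de82798361812a71c78022bd3ce63e941bda41fd56d",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/publicdomain/zero/1.0/",
        "primary_cat": "cs.CR",
        "submitted_at": "2026-04-15T10:56:08Z",
        "title_canon_sha256": "d55db2b9b1390a1948db5b93f15473c1631d71cf44af6418b800d987fe39cdec",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.12529", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare by eye with the canonical hash shown on this page
```

Because keys are sorted before hashing, two mirrors that store the same record with different key order still compute the same digest.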