Pith Number

pith:AC7PUYKR

pith:2026:AC7PUYKRJCTNVBHX5SL4XDF2FY

not attested · not anchored · not stored · refs resolved

BackFlush: Knowledge-Free Backdoor Detection and Elimination with Watermark Preservation in Large Language Models

Amit Shukla, Jagadeesh Rachapudi, Praful Hambarde, Pranav Singh, Ritali Vatsi

BackFlush detects unknown backdoors in LLMs by amplifying susceptibility and flushes them via embedding rotation while preserving watermarks and clean accuracy.

arxiv:2605.12529 v1 · 2026-04-15 · cs.CR


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{AC7PUYKRJCTNVBHX5SL4XDF2FY}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

BackFlush achieves approximately 1% Attack Success Rate (ASR) and approximately 99% clean accuracy (CACC) while preserving watermarking capabilities. No existing method simultaneously provides all three while keeping model utility comparable to clean baselines.

C2 · weakest assumption

The Backdoor Flushing Phenomenon and Backdoor Susceptibility Amplification are assumed to hold generally for unknown backdoors across trigger types and architectures, and RoPE Unlearning is assumed to selectively remove backdoors without damaging watermarks.

C3 · one line summary

BackFlush detects backdoors via susceptibility amplification and eliminates them with RoPE unlearning, reaching approximately 1% ASR and approximately 99% clean accuracy while preserving watermarks.

References

42 extracted · 42 resolved · 8 Pith anchors

[1] Large language models (LLMs): survey, technical frameworks, and future challenges, 2024

[2] Look before you leap: An exploratory study of uncertainty measurement for large language models. arXiv preprint arXiv:2307.10236, 2023

[3] Putting people in LLMs' shoes: Generating better answers via question rewriter, 2025

[4] Can multiple-choice questions really be useful in detecting the abilities of LLMs?, 2024

[5] Bid-lora: A parameter-efficient framework for continual learning and unlearning

Receipt and verification

First computed	2026-05-18T03:10:02.713155Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

00befa615148a6da84f7ec97cb8cba2e0255a8b3e7f5f062339ff38f845bfdd7

Aliases

arxiv: 2605.12529 · arxiv_version: 2605.12529v1 · doi: 10.48550/arxiv.2605.12529 · pith_short_12: AC7PUYKRJCTN · pith_short_16: AC7PUYKRJCTNVBHX · pith_short_8: AC7PUYKR
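
The identifiers on this page appear to be related to the canonical hash in a simple way: Base32-encoding the SHA-256 digest bytes and keeping the first 26 characters reproduces the full Pith Number, and the `pith_short_*` aliases are prefixes of it. This relationship is inferred from the values shown here, not from a published specification, so the sketch below is an assumption to check against the schema. The function names `pith_id_from_hash` and `short_aliases` are hypothetical.

```python
import base64

# Canonical hash as printed on this page.
CANONICAL_HASH = "00befa615148a6da84f7ec97cb8cba2e0255a8b3e7f5f062339ff38f845bfdd7"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: this truncation rule is inferred from the page, not documented.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_id: str, lengths=(8, 12, 16)) -> dict:
    """Build the short aliases as plain prefixes of the full identifier."""
    return {f"pith_short_{n}": pith_id[:n] for n in lengths}


pith_id = pith_id_from_hash(CANONICAL_HASH)
print(pith_id)  # AC7PUYKRJCTNVBHX5SL4XDF2FY
print(short_aliases(pith_id))
```

If the rule holds for other records, a client could check an identifier offline against a hash it has already verified, without a second lookup.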


Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/AC7PUYKRJCTNVBHX5SL4XDF2FY \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 00befa615148a6da84f7ec97cb8cba2e0255a8b3e7f5f062339ff38f845bfdd7

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "4aeb2244978a6f5a93423de82798361812a71c78022bd3ce63e941bda41fd56d",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/publicdomain/zero/1.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2026-04-15T10:56:08Z",
    "title_canon_sha256": "d55db2b9b1390a1948db5b93f15473c1631d71cf44af6418b800d987fe39cdec"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12529",
    "kind": "arxiv",
    "version": 1
  }
}
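
The shell pipeline under "Verify this Pith Number yourself" canonicalizes the record by re-serializing it with sorted keys and compact separators, then hashes the UTF-8 bytes with SHA-256. A minimal offline sketch of the same step in Python follows, using a local copy of the record shown above. Whether the printed digest equals the published canonical hash depends on the local copy matching the served record exactly, so no expected value is asserted here. The helper name `canonical_sha256` is hypothetical.

```python
import hashlib
import json

# Local copy of the canonical record as displayed on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "4aeb2244978a6f5a93423de82798361812a71c78022bd3ce63e941bda41fd56d",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/publicdomain/zero/1.0/",
    "primary_cat": "cs.CR",
    "submitted_at": "2026-04-15T10:56:08Z",
    "title_canon_sha256": "d55db2b9b1390a1948db5b93f15473c1631d71cf44af6418b800d987fe39cdec"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12529",
    "kind": "arxiv",
    "version": 1
  }
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with the same options as the shell one-liner, then hash."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)  # compare by eye with the "Canonical hash" value on this page
```

Because the keys are sorted before hashing, two mirrors that store the record with different key order or indentation still compute the same digest.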