Pith Number

pith:C6EUDN3U

pith:2026:C6EUDN3U25TMIQ5TUX2KTOBO64
not attested · not anchored · not stored · refs pending

Mechanistic Anomaly Detection via Functional Attribution

Christopher Leckie, Hugo Lyons Keenan, Sarah Erfani

A neural network's output can be checked for anomalous internal mechanisms by measuring how much it depends on a small trusted reference set.

arxiv:2604.18970 v2 · 2026-04-21 · cs.LG · cs.CR

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{C6EUDN3U25TMIQ5TUX2KTOBO64}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim: open
4 Citations: open
5 Replications: open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

We reframe MAD as a functional attribution problem: asking to what extent samples from a trusted set can explain the model's output, where attribution failure signals anomalous behavior. We operationalize this using influence functions... For backdoors in vision models, our method achieves state-of-the-art detection on BackdoorBench, with an average Defense Effectiveness Rating (DER) of 0.93 across seven attacks and four datasets (next best 0.83).

C2 · weakest assumption

That failure of influence-function-based attribution to a trusted reference set reliably indicates anomalous internal mechanisms rather than other causes such as high model uncertainty or distribution shift.

C3 · one-line summary

Functional attribution with influence functions detects anomalous mechanisms in neural networks, achieving SOTA backdoor detection (average DER 0.93) on vision benchmarks and improvements on LLMs.

Receipt and verification
First computed: 2026-05-26T02:04:11.071869Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

178941b774d766c443b3a5f4a9b82ef73dbb0038b0af31a1b3d564e10ed3230f

Aliases

arxiv: 2604.18970 · arxiv_version: 2604.18970v2 · doi: 10.48550/arxiv.2604.18970 · pith_short_12: C6EUDN3U25TM · pith_short_16: C6EUDN3U25TMIQ5T · pith_short_8: C6EUDN3U
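The identifiers on this page appear to be related in a simple way: Base32-encoding the first 16 bytes of the canonical hash (RFC 4648 alphabet, padding stripped) reproduces the 26-character Pith Number, and each pith_short_N alias is its first N characters. The page does not state this derivation, so the sketch below is an observation checked against this one record only. The function names are hypothetical and not part of any Pith API.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "178941b774d766c443b3a5f4a9b82ef73dbb0038b0af31a1b3d564e10ed3230f"


def pith_number_from_hash(hex_digest: str) -> str:
    """Assumed derivation: Base32 of the first 16 digest bytes, '=' padding removed."""
    prefix = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(prefix).decode("ascii").rstrip("=")


def short_aliases(pith_number: str, lengths=(8, 12, 16)) -> dict:
    """Assumed rule: each short alias is a plain prefix of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}


number = pith_number_from_hash(CANONICAL_HASH)
aliases = short_aliases(number)
print(number)
print(aliases)
```

If the assumption holds for other records, a short alias can be checked against a full Pith Number with a prefix comparison, without contacting the server.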
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/C6EUDN3U25TMIQ5TUX2KTOBO64 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 178941b774d766c443b3a5f4a9b82ef73dbb0038b0af31a1b3d564e10ed3230f
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "6b94b944d469a720af3d789c59b1219d65cef2bbb7b5f2631fe28224c7ac6106",
    "cross_cats_sorted": [
      "cs.CR"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-04-21T01:39:57Z",
    "title_canon_sha256": "56b7e575bff4e3ee2ccf1990fe2e81f631958aa86b9fc592e8c9c4aefe84045c"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.18970",
    "kind": "arxiv",
    "version": 2
  }
}
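
The shell pipeline above can also be reproduced offline from the record as printed here. The following is a minimal Python sketch, assuming the canonical form is exactly what the one-liner uses (sorted keys, compact separators, UTF-8 encoding, no ASCII escaping). The helper name canonical_sha256 is hypothetical; compare the printed digest with the canonical hash listed on this page.

```python
import hashlib
import json

# The canonical record as shown on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "6b94b944d469a720af3d789c59b1219d65cef2bbb7b5f2631fe28224c7ac6106",
    "cross_cats_sorted": ["cs.CR"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-04-21T01:39:57Z",
    "title_canon_sha256": "56b7e575bff4e3ee2ccf1990fe2e81f631958aa86b9fc592e8c9c4aefe84045c"
  },
  "schema_version": "1.0",
  "source": {"id": "2604.18970", "kind": "arxiv", "version": 2}
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    canon = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)
```

Because keys are sorted and whitespace is dropped before hashing, the digest does not depend on how the input JSON was indented or ordered, which is what lets a mirror recompute the same value from its own copy.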