pith:C6EUDN3U
Mechanistic Anomaly Detection via Functional Attribution
A neural network's output can be checked for anomalous internal mechanisms by measuring how much it depends on a small trusted reference set.
arxiv:2604.18970 v2 · 2026-04-21 · cs.LG · cs.CR
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{C6EUDN3U25TMIQ5TUX2KTOBO64}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We reframe MAD as a functional attribution problem: asking to what extent samples from a trusted set can explain the model's output, where attribution failure signals anomalous behavior. We operationalize this using influence functions... For backdoors in vision models, our method achieves state-of-the-art detection on BackdoorBench, with an average Defense Effectiveness Rating (DER) of 0.93 across seven attacks and four datasets (next best 0.83).
The approach rests on the assumption that failure of influence-function-based attribution to a trusted reference set reliably indicates anomalous internal mechanisms, rather than other causes such as high model uncertainty or distribution shift.
Functional attribution with influence functions detects anomalous mechanisms in neural networks, achieving SOTA backdoor detection (average DER 0.93) on vision benchmarks and improvements on LLMs.
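As a rough illustration of the attribution idea in the claims above, the sketch below computes the classic influence-function quantity, -g_test^T H^{-1} g_i, between a test sample and each sample in a trusted set. It is not the paper's implementation: the diagonal curvature approximation, the damping term, the function names, and the toy gradients are all assumptions made for illustration.

```python
from typing import List, Sequence


def influence(
    grad_test: Sequence[float],
    grad_train: Sequence[float],
    curvature_diag: Sequence[float],
    damping: float = 1e-3,
) -> float:
    """Classic influence -g_test^T H^{-1} g_train.

    Assumption: H is replaced by a damped diagonal approximation,
    so the inverse reduces to an element-wise division.
    """
    return -sum(
        gt * gi / (h + damping)
        for gt, gi, h in zip(grad_test, grad_train, curvature_diag)
    )


def trusted_influences(
    grad_test: Sequence[float],
    trusted_grads: Sequence[Sequence[float]],
    curvature_diag: Sequence[float],
    damping: float = 1e-3,
) -> List[float]:
    """Influence of every trusted sample on one test sample."""
    return [
        influence(grad_test, g, curvature_diag, damping) for g in trusted_grads
    ]


# Toy loss gradients (made up for illustration, not taken from any model).
trusted = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
curvature = [1.0, 1.0, 1.0]

# A test gradient that overlaps with the trusted gradients.
explained = trusted_influences([1.0, 1.0, 0.0], trusted, curvature, damping=0.0)

# A test gradient in a direction no trusted sample touches.
unexplained = trusted_influences([0.0, 0.0, 1.0], trusted, curvature, damping=0.0)
```

In this toy setup the second test gradient is orthogonal to every trusted gradient, so all of its influences are zero. That is the kind of attribution failure the claims describe as a signal of anomalous behavior, though how the paper aggregates influences into a detection score is not stated in this record.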
Receipt and verification
| First computed | 2026-05-26T02:04:11.071869Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
178941b774d766c443b3a5f4a9b82ef73dbb0038b0af31a1b3d564e10ed3230f
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/C6EUDN3U25TMIQ5TUX2KTOBO64 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 178941b774d766c443b3a5f4a9b82ef73dbb0038b0af31a1b3d564e10ed3230f
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "6b94b944d469a720af3d789c59b1219d65cef2bbb7b5f2631fe28224c7ac6106",
"cross_cats_sorted": [
"cs.CR"
],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "cs.LG",
"submitted_at": "2026-04-21T01:39:57Z",
"title_canon_sha256": "56b7e575bff4e3ee2ccf1990fe2e81f631958aa86b9fc592e8c9c4aefe84045c"
},
"schema_version": "1.0",
"source": {
"id": "2604.18970",
"kind": "arxiv",
"version": 2
}
}
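The shell one-liner above can also be written as a short Python script, which makes the canonicalization step explicit: keys sorted, compact separators, non-ASCII characters left unescaped, UTF-8 encoding, then SHA-256. The sketch below mirrors those settings. The function name `canonical_sha256` and the inlined copy of the record are illustrative, and whether the inlined copy reproduces the published hash depends on it matching the served record exactly.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Hash a record using the same JSON settings as the shell one-liner."""
    payload = json.dumps(
        record,
        sort_keys=True,          # deterministic key order
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Copy of the canonical record shown on this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "6b94b944d469a720af3d789c59b1219d65cef2bbb7b5f2631fe28224c7ac6106",
        "cross_cats_sorted": ["cs.CR"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-04-21T01:39:57Z",
        "title_canon_sha256": "56b7e575bff4e3ee2ccf1990fe2e81f631958aa86b9fc592e8c9c4aefe84045c",
    },
    "schema_version": "1.0",
    "source": {"id": "2604.18970", "kind": "arxiv", "version": 2},
}

# Canonical hash published on this page.
PUBLISHED = "178941b774d766c443b3a5f4a9b82ef73dbb0038b0af31a1b3d564e10ed3230f"

if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest)
    print("matches published hash:", digest == PUBLISHED)
```

Because the keys are sorted before hashing, the order in which fields appear in the fetched JSON does not affect the digest; only the values and the serialization settings do.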