Pith Number

pith:MCKA6H4H

pith:2026:MCKA6H4H2AIAJPSHZLQFCJRSP2

not attested · not anchored · not stored · refs resolved

Deep Delta Learning

Mengdi Wang, Quanquan Gu, Yifan Zhang, Yifeng Liu

Deep Delta Learning lets Transformer layers selectively rewrite residual content instead of only adding to it.

arxiv:2601.00417 v3 · 2026-01-01 · cs.LG · cs.AI · cs.CL · cs.CV

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{MCKA6H4H2AIAJPSHZLQFCJRSP2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Controlled pretraining and downstream evaluations show that residual rewrite operations improve language modeling quality relative to the purely additive residual accumulation introduced by ResNet.

C2 · weakest assumption

That the learned directions, target values, and gates can reliably identify and correct obsolete or conflicting residual content without introducing training instability or degrading the identity path.

C3 · one-line summary

Deep Delta Learning replaces additive residual updates with a gated delta rule that selectively overwrites residual content along learned directions, improving language modeling quality over standard ResNet-style accumulation.
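To make the contrast in the summary concrete, here is a minimal sketch of a gated delta-rule update next to a plain additive update. It assumes a single residual vector, one direction `k`, a scalar target value `v`, and a scalar gate `beta`; these names and the scalar simplification are illustrative assumptions, and the paper's actual parameterization may differ.

```python
import math

def normalize(k):
    """Scale k to unit length so the update acts on exactly one direction."""
    norm = math.sqrt(sum(c * c for c in k))
    return [c / norm for c in k]

def additive_update(x, f):
    """ResNet-style residual update: content is only ever added."""
    return [xi + fi for xi, fi in zip(x, f)]

def delta_update(x, k, v, beta):
    """Gated delta-rule update (illustrative, not the paper's exact form).

    Moves the component of x along direction k toward the target v.
    beta = 0 leaves x unchanged; beta = 1 overwrites that component with v.
    """
    k = normalize(k)
    current = sum(ki * xi for ki, xi in zip(k, x))  # what x holds along k
    return [xi + beta * (v - current) * ki for xi, ki in zip(x, k)]

x = [3.0, 4.0]
k = [1.0, 0.0]
print(additive_update(x, [1.0, 0.0]))        # [4.0, 4.0]
print(delta_update(x, k, v=-1.0, beta=0.0))  # [3.0, 4.0]
print(delta_update(x, k, v=-1.0, beta=1.0))  # [-1.0, 4.0]
```

With the gate closed the identity path is preserved, and with the gate fully open only the content along `k` is rewritten while the orthogonal component stays untouched.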

References

20 extracted · 20 resolved · 7 Pith anchors

[1] Hoft: Householder orthogonal fine-tuning. arXiv preprint arXiv:2505.16531

[2] N-ode transformer: A depth-adaptive variant of the transformer using neural ordinary differential equations. arXiv preprint arXiv:2010.11358, 2020

[3] BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions. arXiv preprint arXiv:1905.10044, 2019

[4] arXiv preprint arXiv:2201.12133

[5] Chaos meets attention: Transformers for large-scale dynamical prediction. arXiv preprint arXiv:2504.20858

Cited by

2 papers in Pith

Attention Residuals

Delta Attention Residuals

Receipt and verification

First computed	2026-05-18T03:10:11.456646Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

60940f1f87d01004be47cae05126327ea06b8e6308ba65963df8d7e917432728

Aliases

arxiv: 2601.00417 · arxiv_version: 2601.00417v3 · doi: 10.48550/arxiv.2601.00417 · pith_short_12: MCKA6H4H2AIA · pith_short_16: MCKA6H4H2AIAJPSH · pith_short_8: MCKA6H4H
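The Pith aliases appear to derive from the canonical hash: Base32-encoding the SHA-256 digest (RFC 4648 alphabet) and keeping the first 26 characters reproduces the Pith Number on this page, and each short form is a prefix of it. This is an observation about this one record, not a documented rule; the function name and the 26-character length are assumptions.

```python
import base64

CANONICAL_HASH = "60940f1f87d01004be47cae05126327ea06b8e6308ba65963df8d7e917432728"

def pith_from_hash(hex_digest, length=26):
    """Assumed derivation: Base32 of the raw digest bytes, truncated."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]

pith = pith_from_hash(CANONICAL_HASH)
# Short aliases are plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}

print(pith)     # MCKA6H4H2AIAJPSHZLQFCJRSP2
print(aliases)
```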

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/MCKA6H4H2AIAJPSHZLQFCJRSP2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 60940f1f87d01004be47cae05126327ea06b8e6308ba65963df8d7e917432728

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "875e5e10e3508eb1b9efb7e96a35e54a6142a36079e6a9bfac28b6f18446df04",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL",
      "cs.CV"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-01-01T18:11:38Z",
    "title_canon_sha256": "7b8a6309ac3992381c549c47e19a9b881f02eec4e1a89f46cfda0566c3dba6ab"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2601.00417",
    "kind": "arxiv",
    "version": 3
  }
}
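The same canonicalization used in the shell pipeline under "Verify this Pith Number yourself" can be run offline against the record printed above. The sketch below embeds that record and applies the identical `json.dumps` settings (sorted keys, compact separators, UTF-8); the function name `canonical_hash` is an assumption. Compare the printed digest with the canonical hash listed on this page; it should match only if the resolver returns exactly this record.

```python
import hashlib
import json

record = {
    "metadata": {
        "abstract_canon_sha256": "875e5e10e3508eb1b9efb7e96a35e54a6142a36079e6a9bfac28b6f18446df04",
        "cross_cats_sorted": ["cs.AI", "cs.CL", "cs.CV"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-01-01T18:11:38Z",
        "title_canon_sha256": "7b8a6309ac3992381c549c47e19a9b881f02eec4e1a89f46cfda0566c3dba6ab",
    },
    "schema_version": "1.0",
    "source": {"id": "2601.00417", "kind": "arxiv", "version": 3},
}

def canonical_hash(obj):
    """SHA-256 over key-sorted, whitespace-free, UTF-8 encoded JSON."""
    payload = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

print(canonical_hash(record))
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror happens to serialize the record.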