Pith Number

pith:MCKA6H4H

pith:2026:MCKA6H4H2AIAJPSHZLQFCJRSP2
not attested · not anchored · not stored · refs resolved

Deep Delta Learning

Mengdi Wang, Quanquan Gu, Yifan Zhang, Yifeng Liu

Deep Delta Learning lets Transformer layers selectively rewrite residual content instead of only adding to it.

arxiv:2601.00417 v3 · 2026-01-01 · cs.LG · cs.AI · cs.CL · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{MCKA6H4H2AIAJPSHZLQFCJRSP2}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
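The property claimed here is that any mirror holding the same events reaches the same state. Below is a minimal sketch of one way to get that property. It is not Pith's actual merge algorithm. The event fields (`ts`, `field`, `value`), the SHA-256 event identity, and the "sort, then last write wins" rule are all hypothetical, and signature checks are left out. The idea is that deduplicating events and then folding them in a total order every mirror agrees on makes the result independent of the order in which events arrived.

```python
import hashlib
import json

def event_id(event: dict) -> str:
    # Hypothetical event identity: SHA-256 of the event's canonical JSON.
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

def merge(*mirrors: list) -> dict:
    """Deterministically merge event lists received from several mirrors.

    1. Union and dedupe by event_id, so copies held by several mirrors collapse.
    2. Sort by (timestamp, event_id), a total order every mirror agrees on.
       ISO-8601 strings in one fixed format sort correctly as plain strings.
    3. Fold in that order; a later event overwrites an earlier one per field.
    Signature verification is omitted in this sketch.
    """
    unique = {event_id(e): e for events in mirrors for e in events}
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], event_id(e)))
    state: dict = {}
    for e in ordered:
        state[e["field"]] = e["value"]
    return state

# Hypothetical events; the field names and values are illustrative only.
e1 = {"ts": "2026-05-18T03:10:11Z", "field": "author_claim", "value": "open"}
e2 = {"ts": "2026-05-19T00:00:00Z", "field": "citations", "value": "open"}
e3 = {"ts": "2026-05-20T00:00:00Z", "field": "author_claim", "value": "claimed"}

mirror_a = [e1, e2]
mirror_b = [e3, e2, e1]
print(merge(mirror_a, mirror_b) == merge(mirror_b, mirror_a))  # True
print(merge(mirror_a, mirror_b))  # {'author_claim': 'claimed', 'citations': 'open'}
```

The sort key breaks timestamp ties with the event hash. Without that tie-break, two events sharing a timestamp could be applied in a different order on different mirrors.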

Claims

C1 · strongest claim

Controlled pretraining and downstream evaluations show that residual rewrite operations improve language modeling quality relative to the pure additive accumulation introduced in ResNet.

C2 · weakest assumption

That the learned directions, target values, and gates can reliably identify and correct obsolete or conflicting residual content without introducing training instability or degrading the identity path.

C3 · one-line summary

Deep Delta Learning replaces additive residual updates with a gated delta rule that selectively overwrites residual content along learned directions, improving language modeling quality over standard ResNet-style accumulation.
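The following small sketch makes the difference between adding and overwriting concrete. It assumes the generic delta-rule form h' = h + β · k · (v − k^T h), where k is a unit-norm learned direction, v is a scalar target value, and β is a gate. The paper's exact parameterization may differ, for example in how k, v, and β are computed from the layer input or whether the residual state is a vector or a matrix. The function names are hypothetical, and this is not the authors' code.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(k):
    # Scale the direction to unit length so k^T h reads off the content along k.
    n = math.sqrt(dot(k, k))
    return [x / n for x in k]

def additive_update(h, f_h):
    """ResNet-style accumulation h + f(h): existing content is only added to."""
    return [x + y for x, y in zip(h, f_h)]

def delta_update(h, k, v, beta):
    """Gated delta rule (assumed form): h + beta * k * (v - k^T h).

    beta = 0 leaves h unchanged (identity path); beta = 1 replaces the
    component of h along k with the target v; directions orthogonal to k
    are untouched for any beta.
    """
    k = normalize(k)
    error = v - dot(k, h)  # gap between the target and the current content along k
    return [x + beta * error * ki for x, ki in zip(h, k)]

h = [3.0, 1.0]
print(additive_update(h, [1.0, 0.0]))          # [4.0, 1.0]
print(delta_update(h, [1.0, 0.0], -2.0, 1.0))  # [-2.0, 1.0]
print(delta_update(h, [1.0, 0.0], -2.0, 0.5))  # [0.5, 1.0]
```

With β = 1 the stale value 3.0 along the first axis is replaced by −2.0. No choice of f(h) in the additive form can do this without first knowing and cancelling the current content. Intermediate gates interpolate between keeping and overwriting.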

References

20 extracted · 20 resolved · 7 Pith anchors

[1] Hoft: Householder orthogonal fine-tuning. arXiv preprint arXiv:2505.16531, 2025.
[2] N-ODE transformer: A depth-adaptive variant of the transformer using neural ordinary differential equations. arXiv preprint arXiv:2010.11358, 2020.
[3] BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions. arXiv preprint arXiv:1905.10044, 2019.
[4] arXiv preprint arXiv:2201.12133, 2022.
[5] Chaos meets attention: Transformers for large-scale dynamical prediction. arXiv preprint arXiv:2504.20858, 2025.

Cited by

2 papers in Pith

Receipt and verification
First computed 2026-05-18T03:10:11.456646Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05)
Schema pith-number/v1.0

Canonical hash

60940f1f87d01004be47cae05126327ea06b8e6308ba65963df8d7e917432728

Aliases

arxiv: 2601.00417 · arxiv_version: 2601.00417v3 · doi: 10.48550/arxiv.2601.00417 · pith_short_12: MCKA6H4H2AIA · pith_short_16: MCKA6H4H2AIAJPSH · pith_short_8: MCKA6H4H
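The identifiers in this record appear to be linked as follows. The full Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash. The `pith_short_*` aliases are its 8-, 12-, and 16-character prefixes. This relationship is inferred from this one record, not from a published specification, and `pith_from_hash` is a hypothetical helper. The check uses only Python's standard `base64` module.

```python
import base64

CANONICAL_HASH = "60940f1f87d01004be47cae05126327ea06b8e6308ba65963df8d7e917432728"
PITH_NUMBER = "MCKA6H4H2AIAJPSHZLQFCJRSP2"

def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical rule: first `length` chars of base32(raw SHA-256 digest)."""
    raw = bytes.fromhex(hex_digest)                     # 32 raw bytes
    return base64.b32encode(raw).decode("ascii")[:length]

full = pith_from_hash(CANONICAL_HASH)
print(full == PITH_NUMBER)  # True
for n in (8, 12, 16):
    # Matches the pith_short_8 / _12 / _16 aliases listed above.
    print(f"pith_short_{n}:", pith_from_hash(CANONICAL_HASH, n))
```

Twenty-six base32 characters carry 130 bits, so under this reading the Pith Number commits to slightly more than the first 16 bytes of the canonical hash.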
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/MCKA6H4H2AIAJPSHZLQFCJRSP2 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 60940f1f87d01004be47cae05126327ea06b8e6308ba65963df8d7e917432728
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "875e5e10e3508eb1b9efb7e96a35e54a6142a36079e6a9bfac28b6f18446df04",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL",
      "cs.CV"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-01-01T18:11:38Z",
    "title_canon_sha256": "7b8a6309ac3992381c549c47e19a9b881f02eec4e1a89f46cfda0566c3dba6ab"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2601.00417",
    "kind": "arxiv",
    "version": 3
  }
}
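The canonicalization step of the verification pipeline can also be run offline on the record shown above. This sketch applies the same `json.dumps` settings as the `curl | jq | python3` command (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) and shows that key order and whitespace in the input do not change the canonical bytes. It assumes the displayed JSON is exactly the `canonical_record` the API serves. If the page omits or reformats any field, the printed digest will not match the canonical hash, and this sketch has not confirmed that it does.

```python
import hashlib
import json

# Canonical record JSON as displayed above (pretty-printed).
PRETTY = """
{
  "metadata": {
    "abstract_canon_sha256": "875e5e10e3508eb1b9efb7e96a35e54a6142a36079e6a9bfac28b6f18446df04",
    "cross_cats_sorted": ["cs.AI", "cs.CL", "cs.CV"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-01-01T18:11:38Z",
    "title_canon_sha256": "7b8a6309ac3992381c549c47e19a9b881f02eec4e1a89f46cfda0566c3dba6ab"
  },
  "schema_version": "1.0",
  "source": {"id": "2601.00417", "kind": "arxiv", "version": 3}
}
"""

def canonical_bytes(record: dict) -> bytes:
    # Same settings as the pipeline: sorted keys, no insignificant
    # whitespace, non-ASCII characters kept as UTF-8 rather than escaped.
    return json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")

record = json.loads(PRETTY)
# Reversing the top-level key order must not change the canonical form.
reordered = {key: record[key] for key in reversed(list(record))}
assert canonical_bytes(reordered) == canonical_bytes(record)

print(hashlib.sha256(canonical_bytes(record)).hexdigest())
```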