Pith Number

pith:RCPETT5H

pith:2024:RCPETT5HFXXTYZCIDGIVXYAWDV

not attested · not anchored · not stored · refs resolved

Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Arjun Vikram, Carlos Guestrin, Genghan Zhang, Jiarui Xu, Karan Dalal, Sanmi Koyejo, Tatsunori Hashimoto, Xiaolong Wang, Xinhao Li, Xinlei Chen, Yann Dubois, Yu Sun

RNNs can match Transformer long-context performance by updating a learnable hidden-state model with self-supervised gradient steps at test time.

arxiv:2407.04620 v4 · 2024-07-05 · cs.LG · cs.AI · cs.CL

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{RCPETT5HFXXTYZCIDGIVXYAWDV}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

TTT-Linear and TTT-MLP can keep reducing perplexity by conditioning on more tokens, while Mamba cannot after 16k context.

C2 · weakest assumption

That performing gradient-based self-supervised updates on the hidden-state model at test time remains stable, computationally tractable, and beneficial without overfitting or excessive overhead at scale.

C3 · one-line summary

TTT layers treat the hidden state as a trainable model that is updated at test time, allowing linear-complexity sequence models to keep reducing perplexity as context length grows, unlike Mamba.
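
The idea in the summary can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it assumes a linear hidden-state model `W`, a plain reconstruction loss, and one gradient step per token, and it leaves out the learned projections and other components of the paper's TTT layers. The names `matvec`, `ttt_linear_sketch`, and `lr` are hypothetical.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def ttt_linear_sketch(tokens, lr=0.1):
    """Toy test-time-training loop over a token sequence.

    The hidden state is the weight matrix W of a linear model.
    Assumed self-supervised task: reconstruct each token from itself,
    with loss ||W x - x||^2 (a simplification for illustration only).
    """
    d = len(tokens[0])
    W = [[0.0] * d for _ in range(d)]  # hidden state starts at zero
    outputs, losses = [], []
    for x in tokens:
        pred = matvec(W, x)
        err = [p - t for p, t in zip(pred, x)]
        losses.append(sum(e * e for e in err))  # loss before the update
        # Gradient of ||W x - x||^2 with respect to W is 2 * err * x^T.
        for i in range(d):
            for j in range(d):
                W[i][j] -= lr * 2.0 * err[i] * x[j]
        # The layer output uses the freshly updated hidden state.
        outputs.append(matvec(W, x))
    return outputs, losses


outputs, losses = ttt_linear_sketch([[1.0, 0.0]] * 5)
print(losses)
```

In this toy setting the per-token loss shrinks as the same token is seen again, which is the sense in which the hidden state "learns" from the context at test time.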

References

85 extracted · 85 resolved · 15 Pith anchors

Formal links

2 machine-checked theorem links

Cited by

38 papers in Pith

On Efficient Variants of Segment Anything Model: A Survey

LIFT: A Novel Framework for Enhancing Long-Context Understanding of LLMs via Long Input Fine-Tuning

WriteSAE: Sparse Autoencoders for Recurrent State

Attention Residuals

Federated Nested Learning: Collaborative Training of Self-Referential Memories for Test-Time Adaptation

Receipt and verification

First computed	2026-05-17T23:38:53.408085Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

889e49cfa72def3c644819915be0161d71812901998c79e2d764dfbfa76e92d6

Aliases

arxiv: 2407.04620 · arxiv_version: 2407.04620v4 · doi: 10.48550/arxiv.2407.04620 · pith_short_12: RCPETT5HFXXT · pith_short_16: RCPETT5HFXXTYZCI · pith_short_8: RCPETT5H
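
On this record, the identifier matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and the short aliases are prefixes of that identifier. The page does not state this rule, so the sketch below should be read as an observation about the values shown here rather than as the specification. The function name `pith_id_from_hash` and the `length` parameter are hypothetical.

```python
import base64

# Canonical hash as printed on this record.
CANONICAL_HASH = "889e49cfa72def3c644819915be0161d71812901998c79e2d764dfbfa76e92d6"


def pith_id_from_hash(hex_digest, length=26):
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: the identifier is a truncated Base32 form of the hash.
    This holds for the values on this page but is not documented here.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


pith_id = pith_id_from_hash(CANONICAL_HASH)
# Short aliases are taken as prefixes of the full identifier.
aliases = {f"pith_short_{n}": pith_id[:n] for n in (8, 12, 16)}

print(pith_id)
print(aliases)
```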

Agent API

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/RCPETT5HFXXTYZCIDGIVXYAWDV \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 889e49cfa72def3c644819915be0161d71812901998c79e2d764dfbfa76e92d6

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "05b4d8152342b055af443082b4000e3e33ae32d46334dbf1752401c3572d0c9e",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-07-05T16:23:20Z",
    "title_canon_sha256": "28bf260612ef235043aafc2f64009b40780baf577faf3678da216f6c9231734f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2407.04620",
    "kind": "arxiv",
    "version": 4
  }
}
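
The same check can be run offline against a saved copy of the record. This is a minimal sketch that mirrors the canonicalization used in the shell pipeline on this page (sorted keys, compact separators, UTF-8, no ASCII escaping). The function name `canonical_sha256` is hypothetical, and whether the record as displayed here reproduces the published hash has not been confirmed in this sketch; compare the printed digest with the canonical hash on this page.

```python
import hashlib
import json


def canonical_sha256(record):
    """Hash a record after serializing it to a canonical JSON form."""
    canonical = json.dumps(
        record,
        sort_keys=True,            # key order must not affect the hash
        separators=(",", ":"),     # no insignificant whitespace
        ensure_ascii=False,        # keep non-ASCII characters as UTF-8
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "05b4d8152342b055af443082b4000e3e33ae32d46334dbf1752401c3572d0c9e",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2024-07-05T16:23:20Z",
        "title_canon_sha256": "28bf260612ef235043aafc2f64009b40780baf577faf3678da216f6c9231734f",
    },
    "schema_version": "1.0",
    "source": {"id": "2407.04620", "kind": "arxiv", "version": 4},
}

digest = canonical_sha256(record)
print(digest)
```

Because keys are sorted before hashing, two mirrors that store the same record with different key order should arrive at the same digest.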