Pith Number

pith:WCHBCUAJ

pith:2024:WCHBCUAJDPJA3BI2DVHH5U5GUG
not attested · not anchored · not stored · refs resolved

Titans: Learning to Memorize at Test Time

Ali Behrouz, Peilin Zhong, Vahab Mirrokni

Titans combine attention with a learnable neural long-term memory to handle contexts over two million tokens more effectively than Transformers or linear recurrent models.

arxiv:2501.00663 v1 · 2024-12-31 · cs.LG · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{WCHBCUAJDPJA3BI2DVHH5U5GUG}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

Our experimental results on language modeling, common-sense reasoning, genomics, and time series tasks show that Titans are more effective than Transformers and recent modern linear recurrent models. They further can effectively scale to larger than 2M context window size with higher accuracy in needle-in-haystack tasks compared to baselines.

C2 weakest assumption

That the neural memory module can reliably learn to store and retrieve relevant historical information without catastrophic forgetting or introducing new failure modes that offset the claimed gains, especially when the training objective does not explicitly supervise the memory contents.

C3 one-line summary

Titans combine attention for current context with a learnable neural memory for long-term history, achieving better performance and scaling to over 2M-token contexts on language, reasoning, genomics, and time-series tasks.

References

139 extracted · 139 resolved · 24 Pith anchors

[1] GPT-4 Technical Report 2023 · arXiv:2303.08774
[2] Linear Transformers with Learnable Kernel Functions are Better In-Context Models 2024
[3] Learning to learn by gradient descent by gradient descent 2016
[4] Exploring length generalization in large language models 2022
[5] Simple linear attention language models balance the recall-throughput tradeoff 2024

Formal links

2 machine-checked theorem links

Cited by

42 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:39:21.525493Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

b08e1150091bd20d851a1d4e7ed3a6a1b85728467986a54b1264c50b5ba05ea7

Aliases

arxiv: 2501.00663 · arxiv_version: 2501.00663v1 · doi: 10.48550/arxiv.2501.00663 · pith_short_12: WCHBCUAJDPJA · pith_short_16: WCHBCUAJDPJA3BI2 · pith_short_8: WCHBCUAJ
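On this record, the Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and the pith_short_* aliases are its first 8, 12, and 16 characters. This is a pattern observed on this one record, not a documented derivation rule. The Python sketch below reproduces it; the function name and the 26-character default are assumptions made for illustration.

```python
import base64

# Canonical hash as listed on this record.
CANONICAL_HASH = "b08e1150091bd20d851a1d4e7ed3a6a1b85728467986a54b1264c50b5ba05ea7"


def pith_number_from_hash(canonical_hash_hex: str, length: int = 26) -> str:
    """Base32-encode the raw hash bytes and keep the first `length` characters.

    Assumption: this mirrors how the identifier on this record relates to its
    hash; the real builder may apply rules not visible here.
    """
    digest = bytes.fromhex(canonical_hash_hex)
    encoded = base64.b32encode(digest).decode("ascii")
    return encoded[:length]


pith_number = pith_number_from_hash(CANONICAL_HASH)

# The short aliases on this record are plain prefixes of the full number.
aliases = {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}

print(pith_number)
print(aliases)
```

For this record the sketch yields WCHBCUAJDPJA3BI2DVHH5U5GUG, which agrees with the identifier shown at the top of the record.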
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/WCHBCUAJDPJA3BI2DVHH5U5GUG \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b08e1150091bd20d851a1d4e7ed3a6a1b85728467986a54b1264c50b5ba05ea7
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "2e206822891bb75ad3edfac5f675ae2117dd8dc18a8e770fe3e7037c8bcd6d5b",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2024-12-31T22:32:03Z",
    "title_canon_sha256": "68ab678edefb0c80939e9ec6ad62f8f70af0a8957580f19f811a11a8a0a22891"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2501.00663",
    "kind": "arxiv",
    "version": 1
  }
}
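
The inline Python in the shell command under "Verify this Pith Number yourself" can be unpacked into a small offline function. The sketch below applies the same serialization settings as that one-liner (sorted keys, compact separators, ensure_ascii=False, UTF-8 bytes) to the canonical record shown above. The function name canonical_sha256 is hypothetical, and the sketch assumes the record is hashed exactly as the one-liner does.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "2e206822891bb75ad3edfac5f675ae2117dd8dc18a8e770fe3e7037c8bcd6d5b",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2024-12-31T22:32:03Z",
        "title_canon_sha256": "68ab678edefb0c80939e9ec6ad62f8f70af0a8957580f19f811a11a8a0a22891",
    },
    "schema_version": "1.0",
    "source": {"id": "2501.00663", "kind": "arxiv", "version": 1},
}

print(canonical_sha256(record))
```

Because keys are sorted before hashing, the key order of the input does not affect the digest. Per the "# expect" comment in the shell command, the printed value should equal the canonical hash listed on this record if the record is unchanged.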