Pith Number

pith:7P6I7KSP

pith:2026:7P6I7KSPTNWS6QGTEZKFVHANMX
not attested · not anchored · not stored · refs resolved

M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling

Ion Stoica, Joseph Gonzalez, Mayank Mishra, Shawn Tan, Tri Dao

Non-linear RNNs with matrix-valued states achieve perfect unseen-length state tracking and outperform equivalent attention hybrids by 0.4-0.5 perplexity points while using three times smaller recurrent states.

arxiv:2603.14360 v2 · 2026-03-15 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{7P6I7KSPTNWS6QGTEZKFVHANMX}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

M²RNN achieves perfect state tracking generalization at sequence lengths not seen during training. Hybrid M²RNN outperforms equivalent Gated DeltaNet hybrids by 0.4-0.5 perplexity points on a 7B MoE model while using 3× smaller state sizes for the recurrent layers.

C2 weakest assumption

That the non-linear matrix-valued state transitions and state size expansion mechanism provide the claimed expressive power and efficiency gains without introducing training instability or hidden computational costs at scale.

C3 one-line summary

M²RNN achieves perfect state tracking at unseen sequence lengths, and hybrid M²RNN outperforms Gated DeltaNet hybrids by 0.4-0.5 perplexity points on a 7B MoE model with 3× smaller recurrent states.

References

46 extracted · 46 resolved · 26 Pith anchors

[1] PyTorch 2: Faster machine learning through dynamic Python bytecode transformation and graph compilation · doi:10.1145/3620665.3640366
[2] Learning long-term dependencies with gradient descent is difficult · doi:10.1109/72.279181
[3] Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901
[4] On the properties of neural machine translation: Encoder-decoder approaches · arXiv:1409.1259
[5] Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling · arXiv:1412.3555

Cited by

2 papers in Pith

Receipt and verification
First computed 2026-05-17T23:39:15.774513Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

fbfc8faa4f9b6d2f40d326545a9c0d65e2103a79d75ec61db5d8d3cc40a2c23a

Aliases

arxiv: 2603.14360 · arxiv_version: 2603.14360v2 · doi: 10.48550/arxiv.2603.14360 · pith_short_12: 7P6I7KSPTNWS · pith_short_16: 7P6I7KSPTNWS6QGT · pith_short_8: 7P6I7KSP
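The identifiers on this page appear to be related: the 26-character Pith Number matches the leading characters of the standard base32 encoding of the canonical hash, and each pith_short_N alias is an N-character prefix of that number. This is an observation from the values shown here, not a documented specification, and the function names below are hypothetical. A minimal Python sketch:

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "fbfc8faa4f9b6d2f40d326545a9c0d65e2103a79d75ec61db5d8d3cc40a2c23a"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the raw digest bytes and keep the first `length` characters.

    Assumption: this mirrors how the identifier is derived; it is inferred
    from the values on this page, not taken from a published spec.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


def short_aliases(pith_number: str, sizes=(8, 12, 16)) -> dict:
    """Build the pith_short_N aliases as simple prefixes of the full number."""
    return {f"pith_short_{n}": pith_number[:n] for n in sizes}


number = pith_number_from_hash(CANONICAL_HASH)
print(number)  # 7P6I7KSPTNWS6QGTEZKFVHANMX
for name, value in short_aliases(number).items():
    print(name, value)
```

If the assumption holds for other records too, a short alias can be checked against a canonical hash without any network request.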
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/7P6I7KSPTNWS6QGTEZKFVHANMX \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: fbfc8faa4f9b6d2f40d326545a9c0d65e2103a79d75ec61db5d8d3cc40a2c23a
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "4551899e3bd2851564898ffb9d453f1d0b8e2c75228f165e25b7b279195de842",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-15T12:53:09Z",
    "title_canon_sha256": "f8af1814d7a5dab1af988a2966084482dfe224fc54ad81de41a2a3ab0e8250c4"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2603.14360",
    "kind": "arxiv",
    "version": 2
  }
}
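The verification pipeline can also be reproduced offline. The sketch below assumes the JSON shown above is the complete canonical record; it re-serializes the record with sorted keys and compact separators (the same json.dumps options as the one-liner) and hashes the UTF-8 bytes with SHA-256. The names RECORD_JSON and canonical_sha256 are illustrative, not part of any Pith API.

```python
import hashlib
import json

# Assumption: this is the full canonical record, copied from this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "4551899e3bd2851564898ffb9d453f1d0b8e2c75228f165e25b7b279195de842",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-03-15T12:53:09Z",
    "title_canon_sha256": "f8af1814d7a5dab1af988a2966084482dfe224fc54ad81de41a2a3ab0e8250c4"
  },
  "schema_version": "1.0",
  "source": {"id": "2603.14360", "kind": "arxiv", "version": 2}
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and no extra whitespace, then hash."""
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


digest = canonical_sha256(json.loads(RECORD_JSON))
print(digest)  # compare with the canonical hash listed on this page
```

Because the keys are sorted before hashing, the digest does not depend on the key order or whitespace of the input JSON. A mismatch with the listed hash would suggest the displayed record differs from what the API returns.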