Schmidhuber

1992 · DOI 10.1162/neco.1992.4.1.131

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

WriteSAE: Sparse Autoencoders for Recurrent State

cs.LG · 2026-05-12 · unverdicted · novelty 8.0

WriteSAE introduces sparse autoencoders with rank-1 matrix atoms for recurrent state updates, allowing replacement tests that outperform deletion on 92.4% of positions and a formula predicting logit changes with R²=0.98.

Learning, Fast and Slow: Towards LLMs That Adapt Continually

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Fast-Slow Training uses context optimization as fast weights alongside parameter updates as slow weights to achieve up to 3x better sample efficiency, higher performance, and less catastrophic forgetting than standard RL in continual LLM learning.

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

cs.CL · 2024-04-08 · unverdicted · novelty 6.0

Eagle and Finch enhance RWKV with matrix-valued states and dynamic recurrence, trained on a 1.12-trillion-token multilingual corpus, and report competitive performance on standard benchmarks.

Kaczmarz Linear Attention

cs.LG · 2026-05-09 · unverdicted · novelty 5.0

Kaczmarz Linear Attention replaces the empirical coefficient in Gated DeltaNet with a key-norm-normalized step size derived from the online regression objective, yielding lower perplexity and better needle-in-haystack performance.

citing papers explorer

WriteSAE: Sparse Autoencoders for Recurrent State · cs.LG · 2026-05-12 · unverdicted · none · ref 38
Learning, Fast and Slow: Towards LLMs That Adapt Continually · cs.LG · 2026-05-12 · unverdicted · none · ref 49 · 2 links
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence · cs.CL · 2024-04-08 · unverdicted · none · ref 3
Kaczmarz Linear Attention · cs.LG · 2026-05-09 · unverdicted · none · ref 30
