Memformer: A memory-augmented transformer for sequence modeling. arXiv preprint arXiv:2010.06891, 2020

Qingyang Wu, Zhenzhong Lan, Kun Qian, Jing Gu, Alborz Geramifard, Zhou Yu · 2020 · arXiv 2010.06891

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning

cs.LG · 2026-04-23 · conditional · novelty 6.0

Memory tokens are required for non-trivial performance in adaptive Universal Transformers on Sudoku-Extreme, with 8-32 tokens yielding stable 57% exact-match accuracy while trading off against ponder depth.

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

cs.CL · 2025-07-03 · unverdicted · novelty 6.0

MemAgent uses multi-conversation RL to train a memory agent that reads text in segments and overwrites memory, extrapolating from 8K training to 3.5M token QA with under 5% loss and 95%+ on 512K RULER.

Titans: Learning to Memorize at Test Time

cs.LG · 2024-12-31 · unverdicted · novelty 6.0

Titans combine attention for current context with a learnable neural memory for long-term history, achieving better performance and scaling to over 2M-token contexts on language, reasoning, genomics, and time-series tasks.

citing papers explorer

Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning
cs.LG · 2026-04-23 · conditional · none · ref 22

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent
cs.CL · 2025-07-03 · unverdicted · none · ref 44

Titans: Learning to Memorize at Test Time
cs.LG · 2024-12-31 · unverdicted · none · ref 121
