We use a peak learning rate of 2×10⁻⁴ with cosine decay and a warmup ratio of 0.1
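As a rough illustration of what such a schedule looks like, the sketch below computes the learning rate per step. Only the peak value (2×10⁻⁴), the cosine decay, and the warmup ratio (0.1) come from the text. The linear warmup shape, the decay floor of zero, the function name `lr_at_step`, and the step count in the usage note are assumptions made for illustration, not details confirmed by the source.

```python
import math


def lr_at_step(step, total_steps, peak_lr=2e-4, warmup_ratio=0.1, min_lr=0.0):
    """Learning rate at a given optimizer step (hypothetical helper).

    Assumes linear warmup over the first `warmup_ratio` fraction of training,
    then cosine decay from `peak_lr` down to `min_lr` (assumed to be 0).
    """
    warmup_steps = int(total_steps * warmup_ratio)

    # Linear warmup: ramp from peak_lr / warmup_steps up to peak_lr.
    if warmup_steps > 0 and step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps

    # Cosine decay over the remaining steps.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine
```

With a hypothetical run of 1,000 steps, the first 100 steps ramp up linearly, step 100 sits at the peak, and the rate then follows a half cosine down to the assumed floor at the final step.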

Training is conducted on 8×A800 GPUs with bfloat16 precision, DeepSpeed ZeRO-2 (Rasley et al., 2020

1 Pith paper cites this work.


representative citing papers

$\delta$-mem: Efficient Online Memory for Large Language Models

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

δ-mem augments frozen LLMs with an 8x8 online memory state updated by delta-rule learning to generate low-rank attention corrections, delivering 1.10x average gains over the backbone and larger improvements on memory-heavy tasks.

