KVM is a new block-recurrent compressed KV attention that turns transformers into O(N) chunked RNNs or growable sublinear-memory models while remaining implementable with standard operations.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
ProxyCoT transfers CoT reasoning from proxy short contexts to full long contexts through RL/distillation followed by SFT, outperforming baselines with lower overhead and generalizing out-of-domain.
citing papers explorer
-
Key-Value Means: Transformers with Expandable Block-Recurrent Compressed Memory
KVM is a new block-recurrent compressed KV attention that turns transformers into O(N) chunked RNNs or growable sublinear-memory models while remaining implementable with standard operations.
-
Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning
ProxyCoT transfers CoT reasoning from proxy short contexts to full long contexts through RL/distillation followed by SFT, outperforming baselines with lower overhead and generalizing out-of-domain.