Large-chunk online updates during inference let test-time training scale state capacity to 40% of model size and handle contexts up to 1M tokens without custom kernels.
Leave no context behind: Efficient infinite context transformers with infini-attention, 2024
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2025 2representative citing papers
citing papers explorer
-
Test-Time Training Done Right
Large-chunk online updates during inference let test-time training scale state capacity to 40% of model size and handle contexts up to 1M tokens without custom kernels.
- VRAG: Learning World Models for Interactive Video Generation