In Proceedings of the 38th Conference on Learning Theory (COLT), volume 291 of Proceedings of Machine Learning Research, pages 2757–2785

Compression barriers in autoregressive transformers · arXiv 2502.15955

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

How Much Cache Does Reasoning Need? Depth-Cache Tradeoffs in KV-Compressed Transformers

cs.LG · 2026-04-20 · unverdicted · novelty 7.0

For k-hop pointer chasing under cache size s, the depth a transformer needs scales as the product of ceil(k/s) and log n. The lower bound is conjectured, the upper bound is proved via windowed pointer doubling, and the result comes with an adaptive-oblivious error separation.

citing papers explorer

Showing 1 of 1 citing paper.

How Much Cache Does Reasoning Need? Depth-Cache Tradeoffs in KV-Compressed Transformers

cs.LG · 2026-04-20 · unverdicted · none · ref 6
