TSA adds end-to-end differentiable per-token halting gates to transformers, enabling learned adaptive depth that saves 14-23% token-layer operations with under 0.5% quality loss on language modeling.
International Conference on Learning Representations (ICLR) , year =
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.
citing papers explorer
-
Adaptive Computation Depth via Learned Token Routing in Transformers
TSA adds end-to-end differentiable per-token halting gates to transformers, enabling learned adaptive depth that saves 14-23% token-layer operations with under 0.5% quality loss on language modeling.
-
Measuring AI Reasoning: A Guide for Researchers
Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.