Training-Free Looped Transformers

2026 · cs.LG · arXiv 2605.23872

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

We introduce training-free looped transformers, in which a lightweight inference-time wrapper loops a contiguous mid-stack block of layers of a frozen checkpoint without additional fine-tuning, continued training, or architectural changes. Unlike prior looped transformer methods that train with the looped structure end-to-end, we retrofit recurrence onto pretrained models at test time. We show that naive block reapplication usually degrades performance, highlighting the importance of the loop application strategy. Motivated by viewing a pre-norm transformer block as a forward Euler step on an ODE, we instead treat looping as a refinement of the same approximation, replacing one large update with smaller damped sub-steps. Across seven dense, sparse MoE, and MLA+MoE model families, our method improves Qwen3-4B-Instruct by +2.64 pp on MMLU-Pro, Qwen3-30B-A3B-Instruct by +1.14 pp on CommonsenseQA, and Moonlight-16B-A3B-Instruct by +1.20 pp on OpenBookQA.
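The idea of replacing one large residual update with smaller damped sub-steps can be illustrated with a toy sketch. The abstract does not state the exact damping schedule, so the uniform 1/K step size below is an assumption, and `residual_step`, `run_stack`, and `run_looped` are hypothetical names rather than the paper's API. Each "layer" is a plain function standing in for the residual branch of a pre-norm block.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Stand-in for the residual branch f(norm(x)) of one pre-norm block.
Residual = Callable[[Vector], Vector]


def residual_step(x: Vector, f: Residual, step: float = 1.0) -> Vector:
    """Forward Euler update x + step * f(x); step = 1.0 is an ordinary residual layer."""
    return [xi + step * fi for xi, fi in zip(x, f(x))]


def run_stack(x: Vector, layers: Sequence[Residual]) -> Vector:
    """Baseline: apply every layer once with a full step."""
    for f in layers:
        x = residual_step(x, f)
    return x


def run_looped(x: Vector, layers: Sequence[Residual],
               start: int, end: int, n_loops: int) -> Vector:
    """Loop the contiguous block layers[start:end] n_loops times.

    Assumption: each pass is damped by 1 / n_loops, so the looped block
    covers the same total "time" as a single full-step pass.
    """
    if n_loops < 1:
        raise ValueError("n_loops must be at least 1")
    if not 0 <= start <= end <= len(layers):
        raise ValueError("block bounds must satisfy 0 <= start <= end <= len(layers)")
    for f in layers[:start]:          # layers before the block: unchanged
        x = residual_step(x, f)
    step = 1.0 / n_loops
    for _ in range(n_loops):          # damped sub-steps through the block
        for f in layers[start:end]:
            x = residual_step(x, f, step)
    for f in layers[end:]:            # layers after the block: unchanged
        x = residual_step(x, f)
    return x


decay = lambda x: [-v for v in x]
print(run_stack([1.0], [decay]))            # [0.0]
print(run_looped([1.0], [decay], 0, 1, 2))  # [0.25]
```

With `n_loops=1` the wrapper reduces to the unmodified stack. In this toy case, two half-size steps give 0.25 where one full step gives 0.0, which is closer to the exact ODE solution e^{-1} ≈ 0.37. Naive reapplication without damping would correspond to `step = 1.0` on every pass.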

representative citing papers

AGI Maze as a Benchmark Framework for World-Modeling Agents

cs.AI · 2026-07-01 · unverdicted · novelty 6.0

AGI Maze supplies a family of grid maze environments with a clean API to benchmark agents on learning and using world state representations rather than local pattern matching, with preliminary tests showing vanilla LLMs fail even on small instances.

citing papers explorer

AGI Maze as a Benchmark Framework for World-Modeling Agents

cs.AI · 2026-07-01 · unverdicted · none · ref 2 · internal anchor