Training-Free Looped Transformers

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

We introduce training-free looped transformers, in which a lightweight inference-time wrapper loops a contiguous mid-stack block of layers of a frozen checkpoint without additional fine-tuning, continued training, or architectural changes. Unlike prior looped transformer methods that train with the looped structure end-to-end, we retrofit recurrence onto pretrained models at test time. We show that naive block reapplication usually degrades performance, highlighting the importance of the loop application strategy. Motivated by viewing a pre-norm transformer block as a forward Euler step on an ODE, we instead treat looping as a refinement of the same approximation, replacing one large update with smaller damped sub-steps. Across seven dense, sparse MoE, and MLA+MoE model families, our method improves Qwen3-4B-Instruct by +2.64 pp on MMLU-Pro, Qwen3-30B-A3B-Instruct by +1.14 pp on CommonsenseQA, and Moonlight-16B-A3B-Instruct by +1.20 pp on OpenBookQA.
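The forward Euler reading in the abstract can be illustrated with a small sketch. A residual update x + f(x) is one Euler step of size 1 on dx/dt = f(x); naive looping takes several full-size steps, while the damped variant splits the same unit step into smaller sub-steps. The uniform step size 1/n_loops and the `toy_branch` function below are assumptions made for illustration only; the abstract does not specify the damping schedule, and this is not the authors' implementation.

```python
import math

def single_step(x, f):
    """One residual update x + f(x): a forward Euler step of size 1."""
    return [xi + fi for xi, fi in zip(x, f(x))]

def naive_loop(x, f, n_loops):
    """Reapply the full-size update n_loops times (total step size n_loops)."""
    for _ in range(n_loops):
        x = single_step(x, f)
    return x

def damped_loop(x, f, n_loops):
    """Take n_loops sub-steps of size 1/n_loops (total step size 1).

    The uniform 1/n_loops damping is an assumption, not the paper's schedule.
    """
    h = 1.0 / n_loops
    for _ in range(n_loops):
        x = [xi + h * fi for xi, fi in zip(x, f(x))]
    return x

def toy_branch(x):
    """Hypothetical stand-in for a block's residual branch, not a real layer."""
    rms = math.sqrt(sum(xi * xi for xi in x) / len(x) + 1e-6)
    return [math.tanh(-xi / rms) for xi in x]

x0 = [1.0, -2.0, 0.5]
baseline = single_step(x0, toy_branch)
looped_naive = naive_loop(x0, toy_branch, 4)
looped_damped = damped_loop(x0, toy_branch, 4)
```

With `n_loops=1` both loop variants reduce to the ordinary single update, so in this sketch the damped form keeps the total step size of the frozen model while re-evaluating the branch at intermediate states.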

representative citing papers

AGI Maze as a Benchmark Framework for World-Modeling Agents

cs.AI · 2026-07-01 · unverdicted · novelty 6.0

AGI Maze supplies a family of grid maze environments with a clean API to benchmark agents on learning and using world state representations rather than local pattern matching, with preliminary tests showing vanilla LLMs fail even on small instances.

citing papers explorer

  • AGI Maze as a Benchmark Framework for World-Modeling Agents cs.AI · 2026-07-01 · unverdicted · none · ref 2 · internal anchor
