Pretraining language models to ponder in continuous space

Boyi Zeng, Shixiang Song, Siyuan Huang, Yixuan Wang, He Li, Ziwei He, Xinbing Wang, Zhiyu Li, Zhouhan Lin · 2025 · arXiv 2505.20674

4 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Training-Free Looped Transformers

cs.LG · 2026-05-22 · unverdicted · novelty 7.0

Training-free looped transformers retrofit recurrence to frozen models via damped ODE sub-steps on mid-stack blocks, yielding gains such as +2.64 pp on MMLU-Pro for Qwen3-4B.

LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

LoopUS converts pretrained LLMs into looped latent refinement models via block decomposition, selective gating, random deep supervision, and confidence-based early exiting to improve reasoning performance.

Scaling Latent Reasoning via Looped Language Models

cs.CL · 2025-10-29 · unverdicted · novelty 7.0

Looped language models with latent iterative computation and entropy-regularized depth allocation achieve performance matching up to 12B standard LLMs through superior knowledge manipulation.

The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning

eess.AS · 2026-03-18 · unverdicted · novelty 4.0 · 2 refs

FLAIR enables simultaneous latent reasoning during speech input in full-duplex dialogue models via recursive latent embeddings and an ELBO-based training objective without added latency.

citing papers explorer


Training-Free Looped Transformers

cs.LG · 2026-05-22 · unverdicted · none · ref 101

LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models

cs.LG · 2026-05-10 · unverdicted · none · ref 27

Scaling Latent Reasoning via Looped Language Models

cs.CL · 2025-10-29 · unverdicted · none · ref 18

The Silent Thought: Modeling Internal Cognition in Full-Duplex Spoken Dialogue Models via Latent Reasoning

eess.AS · 2026-03-18 · unverdicted · none · ref 41 · 2 links
