hub

Pondernet: Learning to ponder

Andrea Banino, Jan Balaguer, Charles Blundell · 2021 · arXiv 2107.05407

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

read on arXiv browse 10 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Stability and Generalization in Looped Transformers

cs.LG · 2026-04-16 · unverdicted · novelty 8.0

Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.

Training-Free Looped Transformers

cs.LG · 2026-05-22 · unverdicted · novelty 7.0

Training-free looped transformers retrofit recurrence to frozen models via damped ODE sub-steps on mid-stack blocks, yielding gains such as +2.64 pp on MMLU-Pro for Qwen3-4B.

A Mechanistic Analysis of Looped Reasoning Language Models

cs.LG · 2026-04-13 · unverdicted · novelty 7.0

Looped LLMs converge to distinct cyclic fixed points per layer, repeating feedforward-style inference stages across recurrences.

Scaling Latent Reasoning via Looped Language Models

cs.CL · 2025-10-29 · unverdicted · novelty 7.0

Looped language models with latent iterative computation and entropy-regularized depth allocation achieve performance matching up to 12B standard LLMs through superior knowledge manipulation.

The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents

cs.CV · 2026-04-28 · unverdicted · novelty 6.0

A recursive sparse MoE framework integrated into diffusion models iteratively refines visual tokens via gated module selection to improve structured reasoning and image generation performance.

Do Not Imitate, Reinforce: Iterative Classification via Belief Refinement

cs.LG · 2026-04-23 · unverdicted · novelty 6.0

RIC replaces single-pass label imitation with RL-driven iterative belief refinement, recovering cross-entropy optima while enabling adaptive halting via a value function.

Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning

cs.LG · 2026-04-23 · conditional · novelty 6.0

Memory tokens are required for non-trivial performance in adaptive Universal Transformers on Sudoku-Extreme, with 8-32 tokens yielding stable 57% exact-match accuracy while trading off against ponder depth.

When to Think Fast and Slow? AMOR: Adaptive Entropy Gate for Hybrid Models

cs.AI · 2026-01-22 · unverdicted · novelty 6.0

AMOR uses output entropy to gate attention in recurrent hybrids, matching full attention performance at roughly 22% attention invocations across 180M-1.5B models.

Hierarchical Reasoning Model

cs.AI · 2025-06-26 · unverdicted · novelty 5.0

HRM is a recurrent architecture with high-level planning and low-level execution modules that reaches near-perfect accuracy on complex Sudoku, maze navigation, and ARC benchmarks using 27M parameters and 1000 samples without pre-training or CoT supervision.

Galactica: A Large Language Model for Science

cs.CL · 2022-11-16 · unverdicted · novelty 5.0

Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.

citing papers explorer

Showing 10 of 10 citing papers.

Stability and Generalization in Looped Transformers cs.LG · 2026-04-16 · unverdicted · none · ref 3
Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.
Training-Free Looped Transformers cs.LG · 2026-05-22 · unverdicted · none · ref 8
Training-free looped transformers retrofit recurrence to frozen models via damped ODE sub-steps on mid-stack blocks, yielding gains such as +2.64 pp on MMLU-Pro for Qwen3-4B.
A Mechanistic Analysis of Looped Reasoning Language Models cs.LG · 2026-04-13 · unverdicted · none · ref 4
Looped LLMs converge to distinct cyclic fixed points per layer, repeating feedforward-style inference stages across recurrences.
Scaling Latent Reasoning via Looped Language Models cs.CL · 2025-10-29 · unverdicted · none · ref 32
Looped language models with latent iterative computation and entropy-regularized depth allocation achieve performance matching up to 12B standard LLMs through superior knowledge manipulation.
The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents cs.CV · 2026-04-28 · unverdicted · none · ref 23
A recursive sparse MoE framework integrated into diffusion models iteratively refines visual tokens via gated module selection to improve structured reasoning and image generation performance.
Do Not Imitate, Reinforce: Iterative Classification via Belief Refinement cs.LG · 2026-04-23 · unverdicted · none · ref 1
RIC replaces single-pass label imitation with RL-driven iterative belief refinement, recovering cross-entropy optima while enabling adaptive halting via a value function.
Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning cs.LG · 2026-04-23 · conditional · none · ref 1
Memory tokens are required for non-trivial performance in adaptive Universal Transformers on Sudoku-Extreme, with 8-32 tokens yielding stable 57% exact-match accuracy while trading off against ponder depth.
When to Think Fast and Slow? AMOR: Adaptive Entropy Gate for Hybrid Models cs.AI · 2026-01-22 · unverdicted · none · ref 1
AMOR uses output entropy to gate attention in recurrent hybrids, matching full attention performance at roughly 22% attention invocations across 180M-1.5B models.
Hierarchical Reasoning Model cs.AI · 2025-06-26 · unverdicted · none · ref 92
HRM is a recurrent architecture with high-level planning and low-level execution modules that reaches near-perfect accuracy on complex Sudoku, maze navigation, and ARC benchmarks using 27M parameters and 1000 samples without pre-training or CoT supervision.
Galactica: A Large Language Model for Science cs.CL · 2022-11-16 · unverdicted · none · ref 143
Galactica, a science-specialized LLM, reports higher scores than GPT-3, Chinchilla, and PaLM on LaTeX knowledge, mathematical reasoning, and medical QA benchmarks while outperforming general models on BIG-bench.

Pondernet: Learning to ponder

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer