Mixed citations

Less is More: Recursive Reasoning with Tiny Networks

Alexia Jolicoeur-Martineau · 2025 · cs.LG · arXiv 2510.04871

Mixed citation behavior. Most common role is background (67%).

29 Pith papers citing it

Background 67% of classified citations

abstract

Hierarchical Reasoning Model (HRM) is a novel approach that uses two small neural networks recursing at different frequencies. This biologically inspired method beats Large Language Models (LLMs) on hard puzzle tasks such as Sudoku, Maze, and ARC-AGI, despite being trained with small models (27M parameters) on small data (around 1,000 examples). HRM holds great promise for solving hard problems with small networks, but it is not yet well understood and may be suboptimal. We propose the Tiny Recursive Model (TRM), a much simpler recursive reasoning approach that achieves significantly higher generalization than HRM while using a single tiny network with only 2 layers. With only 7M parameters, TRM obtains 45% test accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, higher than most LLMs (e.g., DeepSeek R1, o3-mini, Gemini 2.5 Pro) with less than 0.01% of their parameters.
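
The abstract states only that TRM recurses with a single tiny 2-layer network; it does not give the update rule. The sketch below is a hypothetical illustration of that idea, not the paper's implementation. It assumes a latent state z that is refined n times from (x, y, z) before the answer y is updated from (y, z), repeated for T outer steps. The names make_tiny_net and recursive_reasoning, the values n=6 and T=3, summing the inputs, the ReLU/tanh activations, and the untrained random weights are all illustrative assumptions. It shows only the control flow (one set of weights reused at every step); training, deep supervision, and halting are omitted.

```python
import math
import random


def make_tiny_net(dim, hidden, seed=0):
    """Hypothetical 2-layer MLP with fixed (untrained) random weights."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1 / math.sqrt(dim)) for _ in range(dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0, 1 / math.sqrt(hidden)) for _ in range(hidden)] for _ in range(dim)]

    def net(*inputs):
        # Assumed combination rule: sum the input vectors elementwise.
        s = [sum(vals) for vals in zip(*inputs)]
        # Layer 1: linear + ReLU.
        h = [max(0.0, sum(w * v for w, v in zip(row, s))) for row in w1]
        # Layer 2: linear + tanh, which keeps the state bounded in (-1, 1).
        return [math.tanh(sum(w * v for w, v in zip(row, h))) for row in w2]

    return net


def recursive_reasoning(net, x, n=6, T=3):
    """Refine answer y via latent z, reusing the same network at every step.

    Returns (y, z, calls), where calls counts network applications.
    """
    dim = len(x)
    y = [0.0] * dim  # current answer embedding
    z = [0.0] * dim  # latent reasoning state
    calls = 0
    for _ in range(T):          # outer improvement steps
        for _ in range(n):      # inner latent updates conditioned on x, y, z
            z = net(x, y, z)
            calls += 1
        y = net(y, z)           # update the answer from the refined latent
        calls += 1
    return y, z, calls


if __name__ == "__main__":
    net = make_tiny_net(dim=4, hidden=8)
    y, z, calls = recursive_reasoning(net, [0.5, -1.0, 0.25, 2.0])
    print(calls)  # 21 network calls
    print(calls * 2, "layers of effective depth from one 2-layer network")
```

With the assumed n=6 and T=3, the same 2-layer network is applied T·(n+1) = 21 times, giving 42 layers of effective depth from a single set of weights. This is the trade of depth-through-recursion for parameter count that the abstract describes.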

citation-role summary

background: 5 · method: 1

citation-polarity summary

background: 4 · unclear: 1 · use method: 1

representative citing papers

Stability and Generalization in Looped Transformers

cs.LG · 2026-04-16 · unverdicted · novelty 8.0

Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.

CanViT: Toward Active-Vision Foundation Models

cs.CV · 2026-03-23 · conditional · novelty 8.0

CanViT is the first task- and policy-agnostic active-vision foundation model (AVFM), pretrained via passive-to-active dense latent distillation on 13.2M scenes and 1B random glimpses; it achieves 38.5% ADE20K mIoU in one glimpse and 84.5% ImageNet-1k top-1 accuracy after fine-tuning.

Interaction Locality in Hierarchical Recursive Reasoning

cs.AI · 2026-05-20 · unverdicted · novelty 7.0

Interaction locality is introduced as a task-geometry-aware measurement framework showing that high-level states in recursive models write locally while recursive updates build broader structures on maze, Sudoku, ARC-AGI, and 3D grounding tasks.

LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models

cs.LG · 2026-05-10 · unverdicted · novelty 7.0

LoopUS converts pretrained LLMs into looped latent refinement models via block decomposition, selective gating, random deep supervision, and confidence-based early exiting to improve reasoning performance.

A Mechanistic Analysis of Looped Reasoning Language Models

cs.LG · 2026-04-13 · unverdicted · novelty 7.0

Looped LLMs converge to distinct cyclic fixed points per layer, repeating feedforward-style inference stages across recurrences.

FastTab: A Fast Table Recognizer with a Tiny Recursive Module and 1D Transformers

cs.CV · 2026-05-21 · unverdicted · novelty 6.0

FastTab combines a Tiny Recursive Module and axial 1D Transformer encoders to predict table grids, headers, and cell spans directly, achieving competitive accuracy on four benchmarks with low-latency inference.

Winfree Oscillatory Neural Network

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

WONN is a new oscillatory neural network based on generalized Winfree dynamics that scales competitively to ImageNet-1K and reaches 80.1% accuracy on Maze-hard with 1% of the parameters of prior models.

HRM-Text: Efficient Pretraining Beyond Scaling

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

A 1B-parameter hierarchical recurrent model pretrained on 40B instruction-response tokens achieves 60.7% on MMLU and strong results on ARC-C, DROP, GSM8K, and MATH while using 100-900x fewer tokens than standard baselines.

Generative Recursive Reasoning

cs.AI · 2026-05-19 · unverdicted · novelty 6.0 · 2 refs

GRAM is a latent-variable generative model that performs recursive reasoning via stochastic trajectories, trained with amortized variational inference to support multi-hypothesis reasoning and unconditional generation.

One Model, Two Roles: Emergent Specialization in a Shared Recurrent Transformer

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

In a minimal two-state recurrent Transformer, asymmetric input injection induces stable specialization where one state becomes a committed proposal and the other retains shifting uncertainty.

One Pass Is Not Enough: Recursive Latent Refinement for Generative Models

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

RTM uses iterative refinement of latent codes in generative models to improve both precision and recall alongside competitive FID scores on CIFAR-10, CelebA-HQ, and few-shot datasets.

Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models

cs.CL · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

MELT decouples reasoning depth from memory in looped language models by sharing a single gated KV cache per layer and training it via chunk-wise distillation from Ouro starting models.

The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents

cs.CV · 2026-04-28 · unverdicted · novelty 6.0

A recursive sparse MoE framework integrated into diffusion models iteratively refines visual tokens via gated module selection to improve structured reasoning and image generation performance.

Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning

cs.LG · 2026-04-23 · conditional · novelty 6.0

Memory tokens are required for non-trivial performance in adaptive Universal Transformers on Sudoku-Extreme, with 8-32 tokens yielding stable 57% exact-match accuracy while trading off against ponder depth.

One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursive Model (TRM) on ARC-AGI.

LASER: Low-Rank Activation SVD for Efficient Recursion

cs.LG · 2026-04-19 · unverdicted · novelty 6.0

LASER tracks low-rank activation subspaces in recursive models via matrix-free SVD updates and fidelity resets, saving 60% of memory without accuracy loss.

Parcae: Scaling Laws For Stable Looped Language Models

cs.LG · 2026-04-14 · unverdicted · novelty 6.0

Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth baselines under fixed parameter budgets.

Querying Structured Data Through Natural Language Using Language Models

cs.CL · 2026-04-03 · conditional · novelty 6.0

Fine-tuning an 8B LLM with synthetic data enables accurate natural language querying of structured datasets, such as data on accessibility services in Spain, and generalizes to new locations.

Thinking While Listening: Fast-Slow Recurrence for Long-Horizon Sequential Modeling

cs.LG · 2026-04-02 · unverdicted · novelty 6.0

Fast-slow recurrence interleaves quick latent updates with slow observation processing to maintain coherent clustered representations over long horizons, improving out-of-distribution generalization versus LSTM, state space, and Transformer baselines.

Probabilistic Tiny Recursive Model

cs.AI · 2026-05-19 · conditional · novelty 5.0

PTRM adds stochastic Gaussian noise to Tiny Recursive Model recursion for parallel trajectory exploration and Q-head selection, raising Sudoku-Extreme accuracy from 87.4% to 98.75% and Pencil Puzzle Bench from 62.6% to 91.2% without retraining.

bViT: Investigating Single-Block Recurrence in Vision Transformers for Image Recognition

cs.CV · 2026-05-11 · unverdicted · novelty 5.0

A 12-step single-block recurrent ViT-B reaches accuracy comparable to a standard ViT-B on ImageNet-1K while using an order of magnitude fewer parameters.

Mela: Test-Time Memory Consolidation based on Transformation Hypothesis

cs.CL · 2026-05-11 · unverdicted · novelty 5.0

Mela is a Transformer variant with a dual-frequency Hierarchical Memory Module and MemStack that performs test-time memory consolidation, outperforming baselines on long contexts.

State Representation and Termination for Recursive Reasoning Systems

cs.AI · 2026-05-02 · unverdicted · novelty 5.0

Recursive reasoning systems can represent their state via an epistemic state graph and terminate when the linearized order-gap is non-degenerate near the fixed point, providing a local condition for when the stopping rule is informative.

Kuramoto Oscillatory Phase Encoding: Neuro-inspired Synchronization for Improved Learning Efficiency

cs.LG · 2026-04-09 · unverdicted · novelty 5.0

KoPE adds Kuramoto-based oscillatory phase states and synchronization to Vision Transformers, improving training, parameter, and data efficiency on structured vision tasks.

citing papers explorer

Showing 29 of 29 citing papers.

Stability and Generalization in Looped Transformers cs.LG · 2026-04-16 · unverdicted · none · ref 12 · internal anchor
Looped transformers with recall and outer normalization produce reachable, input-dependent fixed points with stable gradients, enabling generalization, while those without recall cannot; a new internal recall variant performs competitively or better.
CanViT: Toward Active-Vision Foundation Models cs.CV · 2026-03-23 · conditional · none · ref 42 · internal anchor
CanViT is the first task- and policy-agnostic active-vision foundation model (AVFM), pretrained via passive-to-active dense latent distillation on 13.2M scenes and 1B random glimpses; it achieves 38.5% ADE20K mIoU in one glimpse and 84.5% ImageNet-1k top-1 accuracy after fine-tuning.
Interaction Locality in Hierarchical Recursive Reasoning cs.AI · 2026-05-20 · unverdicted · none · ref 4 · internal anchor
Interaction locality is introduced as a task-geometry-aware measurement framework showing that high-level states in recursive models write locally while recursive updates build broader structures on maze, Sudoku, ARC-AGI, and 3D grounding tasks.
LoopUS: Recasting Pretrained LLMs into Looped Latent Refinement Models cs.LG · 2026-05-10 · unverdicted · none · ref 43 · internal anchor
LoopUS converts pretrained LLMs into looped latent refinement models via block decomposition, selective gating, random deep supervision, and confidence-based early exiting to improve reasoning performance.
A Mechanistic Analysis of Looped Reasoning Language Models cs.LG · 2026-04-13 · unverdicted · none · ref 16 · internal anchor
Looped LLMs converge to distinct cyclic fixed points per layer, repeating feedforward-style inference stages across recurrences.
FastTab: A Fast Table Recognizer with a Tiny Recursive Module and 1D Transformers cs.CV · 2026-05-21 · unverdicted · none · ref 26 · internal anchor
FastTab combines a Tiny Recursive Module and axial 1D Transformer encoders to predict table grids, headers, and cell spans directly, achieving competitive accuracy on four benchmarks with low-latency inference.
Winfree Oscillatory Neural Network cs.LG · 2026-05-20 · unverdicted · none · ref 16 · internal anchor
WONN is a new oscillatory neural network based on generalized Winfree dynamics that scales competitively to ImageNet-1K and reaches 80.1% accuracy on Maze-hard with 1% of the parameters of prior models.
HRM-Text: Efficient Pretraining Beyond Scaling cs.CL · 2026-05-20 · unverdicted · none · ref 36 · internal anchor
A 1B-parameter hierarchical recurrent model pretrained on 40B instruction-response tokens achieves 60.7% on MMLU and strong results on ARC-C, DROP, GSM8K, and MATH while using 100-900x fewer tokens than standard baselines.
Generative Recursive Reasoning cs.AI · 2026-05-19 · unverdicted · none · ref 9 · 2 links · internal anchor
GRAM is a latent-variable generative model that performs recursive reasoning via stochastic trajectories, trained with amortized variational inference to support multi-hypothesis reasoning and unconditional generation.
One Model, Two Roles: Emergent Specialization in a Shared Recurrent Transformer cs.LG · 2026-05-18 · unverdicted · none · ref 15 · internal anchor
In a minimal two-state recurrent Transformer, asymmetric input injection induces stable specialization where one state becomes a committed proposal and the other retains shifting uncertainty.
One Pass Is Not Enough: Recursive Latent Refinement for Generative Models cs.CV · 2026-05-14 · unverdicted · none · ref 4 · internal anchor
RTM uses iterative refinement of latent codes in generative models to improve both precision and recall alongside competitive FID scores on CIFAR-10, CelebA-HQ, and few-shot datasets.
Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models cs.CL · 2026-05-08 · unverdicted · none · ref 5 · 2 links · internal anchor
MELT decouples reasoning depth from memory in looped language models by sharing a single gated KV cache per layer and training it via chunk-wise distillation from Ouro starting models.
The Thinking Pixel: Recursive Sparse Reasoning in Multimodal Diffusion Latents cs.CV · 2026-04-28 · unverdicted · none · ref 24 · internal anchor
A recursive sparse MoE framework integrated into diffusion models iteratively refines visual tokens via gated module selection to improve structured reasoning and image generation performance.
Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning cs.LG · 2026-04-23 · conditional · none · ref 8 · internal anchor
Memory tokens are required for non-trivial performance in adaptive Universal Transformers on Sudoku-Extreme, with 8-32 tokens yielding stable 57% exact-match accuracy while trading off against ponder depth.
One Step Forward and K Steps Back: Better Reasoning with Denoising Recursion Models cs.LG · 2026-04-20 · unverdicted · none · ref 175 · internal anchor
Denoising Recursion Models train multi-step noise reversal in looped transformers and outperform the prior Tiny Recursive Model (TRM) on ARC-AGI.
LASER: Low-Rank Activation SVD for Efficient Recursion cs.LG · 2026-04-19 · unverdicted · none · ref 5 · internal anchor
LASER tracks low-rank activation subspaces in recursive models via matrix-free SVD updates and fidelity resets, saving 60% of memory without accuracy loss.
Parcae: Scaling Laws For Stable Looped Language Models cs.LG · 2026-04-14 · unverdicted · none · ref 40 · internal anchor
Parcae stabilizes looped LLMs via spectral norm constraints on injection parameters, enabling power-law scaling for training FLOPs and saturating exponential scaling at test time that improves quality over fixed-depth baselines under fixed parameter budgets.
Querying Structured Data Through Natural Language Using Language Models cs.CL · 2026-04-03 · conditional · none · ref 10 · internal anchor
Fine-tuning an 8B LLM with synthetic data enables accurate natural language querying of structured datasets, such as data on accessibility services in Spain, and generalizes to new locations.
Thinking While Listening: Fast-Slow Recurrence for Long-Horizon Sequential Modeling cs.LG · 2026-04-02 · unverdicted · none · ref 3 · internal anchor
Fast-slow recurrence interleaves quick latent updates with slow observation processing to maintain coherent clustered representations over long horizons, improving out-of-distribution generalization versus LSTM, state space, and Transformer baselines.
Probabilistic Tiny Recursive Model cs.AI · 2026-05-19 · conditional · none · ref 1 · internal anchor
PTRM adds stochastic Gaussian noise to Tiny Recursive Model recursion for parallel trajectory exploration and Q-head selection, raising Sudoku-Extreme accuracy from 87.4% to 98.75% and Pencil Puzzle Bench from 62.6% to 91.2% without retraining.
bViT: Investigating Single-Block Recurrence in Vision Transformers for Image Recognition cs.CV · 2026-05-11 · unverdicted · none · ref 16 · internal anchor
A 12-step single-block recurrent ViT-B reaches accuracy comparable to a standard ViT-B on ImageNet-1K while using an order of magnitude fewer parameters.
Mela: Test-Time Memory Consolidation based on Transformation Hypothesis cs.CL · 2026-05-11 · unverdicted · none · ref 10 · internal anchor
Mela is a Transformer variant with a dual-frequency Hierarchical Memory Module and MemStack that performs test-time memory consolidation, outperforming baselines on long contexts.
State Representation and Termination for Recursive Reasoning Systems cs.AI · 2026-05-02 · unverdicted · none · ref 7 · internal anchor
Recursive reasoning systems can represent their state via an epistemic state graph and terminate when the linearized order-gap is non-degenerate near the fixed point, providing a local condition for when the stopping rule is informative.
Kuramoto Oscillatory Phase Encoding: Neuro-inspired Synchronization for Improved Learning Efficiency cs.LG · 2026-04-09 · unverdicted · none · ref 7 · internal anchor
KoPE adds Kuramoto-based oscillatory phase states and synchronization to Vision Transformers, improving training, parameter, and data efficiency on structured vision tasks.
Consolidation-Expansion Operator Mechanics: A Unified Framework for Adaptive Learning cs.LG · 2026-05-11 · unverdicted · none · ref 19 · 2 links · internal anchor
OpMech defines the order-gap as a computable non-commutativity measure between consolidation and expansion operators to provide real-time convergence signals and stopping rules in adaptive learning.
Measuring AI Reasoning: A Guide for Researchers cs.AI · 2026-05-04 · unverdicted · none · ref 20 · internal anchor
Reasoning in language models should be measured by the faithfulness and validity of their multi-step search processes and intermediate traces, not final-answer accuracy.
Dual-Track CoT: Budget-Aware Stepwise Guidance for Small LMs cs.CL · 2026-04-27 · unverdicted · none · ref 1 · internal anchor
Dual-Track CoT lets small language models perform reliable multi-step reasoning with the same or fewer tokens via budget tracking and rejection of redundant steps.
LIFE -- an energy efficient advanced continual learning agentic AI framework for frontier systems cs.AI · 2026-04-14 · unverdicted · none · ref 17 · internal anchor
LIFE is a proposed agentic framework that combines four components to enable incremental, flexible, and energy-efficient continual learning for HPC operations such as latency spike mitigation.
S-AI-Recursive: A Bio-Inspired and Temporal Sparse AI Architecture for Iterative, Introspective, and Energy-Frugal Reasoning cs.NE · 2026-05-05 · unverdicted · none · ref 12 · internal anchor
S-AI-Recursive operationalizes reasoning as a closed-loop hormonal iteration with Clarifine and Confusionin to reach stable equilibrium, achieving competitive benchmark performance with under 10 million parameters via temporal depth instead of width.
