Recursive self-aggregation unlocks deep thinking in large language models

Venkatraman, S · 2025 · arXiv 2509.26626

6 Pith papers cite this work.

representative citing papers

On Test-Time Scaling for Vision-Language Models

cs.CV · 2026-06-27 · unverdicted · novelty 7.0

Small, well-performing LVLMs gain the most from test-time scaling, with improvements of up to 30% that can match or exceed larger models, while visual information is used mainly early in reasoning chains.

CAPS: Cascaded Adaptive Pairwise Selection for Efficient Parallel Reasoning

cs.AI · 2026-05-15 · unverdicted · novelty 7.0

CAPS is a four-stage inference-only cascade that adapts how much of each solution the verifier sees and how comparisons are distributed, halving per-candidate verifier tokens while outperforming uniform pairwise verification on most benchmarks.

Test-Time Learning with an Evolving Library

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

EvoLib enables LLMs to accumulate, reuse, and evolve knowledge abstractions from inference trajectories at test time, yielding substantial gains on math reasoning, code generation, and agentic benchmarks without parameter updates or supervision.

ZAYA1-8B Technical Report

cs.AI · 2026-05-06 · unverdicted · novelty 6.0

ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.

QED-Nano: Teaching a Tiny Model to Prove Hard Theorems

cs.AI · 2026-04-06 · unverdicted · novelty 6.0

A 4B model post-trained with SFT, RL, and a reasoning cache surpasses larger open models and approaches proprietary ones on Olympiad proof generation.

Understanding Performance Gap Between Parallel and Sequential Sampling in Large Reasoning Models

cs.CL · 2026-04-07 · unverdicted · novelty 5.0

Lack of exploration, caused by conditioning on prior answers, is the primary reason parallel sampling outperforms sequential sampling in large reasoning models.
