hub Canonical reference

Dynamic early exit in reasoning models

Dynamic early exit in reasoning models · 2025 · arXiv 2504.15895

Canonical reference. 100% of citing Pith papers cite this work as background.

17 Pith papers citing it

Background 100% of classified citations

read on arXiv browse 17 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 6

citation-polarity summary

background 6

representative citing papers

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling

cs.CL · 2026-05-08 · conditional · novelty 8.0 · 2 refs

AutoTTS discovers width-depth test-time scaling controllers through agentic search in a pre-collected trajectory environment, yielding better accuracy-cost tradeoffs than hand-designed baselines on math reasoning tasks at low cost.

Near-Future Policy Optimization

cs.LG · 2026-04-22 · unverdicted · novelty 7.0

NPO uses a policy's own near-future checkpoint as auxiliary trajectories to maximize effective learning signal S = Q/V, improving performance from 57.88 to 63.15 on Qwen3-VL-8B-Instruct with GRPO while accelerating convergence.

Self-Distilled RLVR

cs.LG · 2026-04-03 · unverdicted · novelty 7.0

RLSD mixes self-distillation for token-level policy difference magnitudes with RLVR for reliable update directions from response correctness to reach higher convergence and better training stability.

Two-dimensional early exit optimisation of LLM inference

cs.CL · 2026-03-27 · unverdicted · novelty 7.0

Coordinating layer-wise and sentence-wise early exits in LLMs produces multiplicative speedups of 1.4-2.3x over single-dimension early exit on sentiment classification tasks.

Know When To Fold 'Em: Token-Efficient LLM Synthetic Data Generation via Multi-Stage In-Flight Rejection

cs.AI · 2026-05-13 · unverdicted · novelty 6.0

MSIFR stops faulty LLM generations early via staged rule-based checks, reducing token consumption 11-78% with no accuracy loss.

Learn to Think: Improving Multimodal Reasoning through Vision-Aware Self-Improvement Training

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

VISTA uses prefix resampling and a vision-aware attention score to address data imbalance and language prior bias in self-improvement training of MLLMs, yielding up to 13.66% gains on reasoning tasks.

Nice Fold or Hero Call: Learning Budget-Efficient Thinking for Adaptive Reasoning

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

BET reduces reasoning tokens by about 55% on average while improving performance across benchmarks by learning to short-solve easy queries, fold early on unsolvable ones, and preserve budget for hard solvable queries.

Behavior Cue Reasoning: Monitorable Reasoning Improves Efficiency and Safety through Oversight

cs.AI · 2026-05-07 · conditional · novelty 6.0 · 2 refs

Behavior Cue Reasoning trains LLMs to emit special tokens before behaviors, enabling monitors to cut up to 50% wasted reasoning tokens and recover safe actions from 80% of unsafe traces, more than doubling success rates with no performance cost.

Conformal Thinking: Risk Control for Reasoning on a Compute Budget

cs.AI · 2026-02-03 · unverdicted · novelty 6.0

Conformal risk control with upper and lower thresholds lets LLMs adaptively stop reasoning while guaranteeing a maximum error rate and minimizing token use.

Can Aha Moments Be Fake? Identifying True and Decorative Thinking Steps in Chain-of-Thought

cs.LG · 2025-10-28 · unverdicted · novelty 6.0

LLMs interleave true causal reasoning steps with decorative ones in CoT, with only ~2.3% of steps having high causal impact on AIME for Qwen-2.5, and a steering direction can force internal use of specific steps.

Entropy After </Think> for reasoning model early exiting

cs.LG · 2025-09-30 · unverdicted · novelty 6.0

Entropy After </Think> (EAT) enables early exiting in reasoning LLMs by tracking entropy stabilization after a </think> token, cutting token use 12-22% on MATH500 and AIME2025 with no accuracy loss.

Efficient Test-Time Scaling via Temporal Reasoning Aggregation

cs.AI · 2026-04-19 · unverdicted · novelty 5.0

TRACE aggregates answer consistency and confidence trajectory over multiple reasoning steps to decide when to halt inference, reducing token usage by 25-30% while keeping accuracy within 1-2% of full reasoning.

SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking

cs.AI · 2026-04-09 · unverdicted · novelty 5.0

SAT reduces reasoning tokens by up to 40% across multiple large reasoning models and benchmarks by adaptively pruning steps based on difficulty while maintaining or improving accuracy.

Investigating Thinking Behaviours of Reasoning-Based Language Models for Social Bias Mitigation

cs.CL · 2025-10-20 · unverdicted · novelty 5.0

Reasoning LLMs aggregate social biases through stereotype repetition and irrelevant information injection in their thinking processes, and a self-review prompt mitigates this on BBQ, StereoSet, and BOLD benchmarks.

DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models

cs.CL · 2025-05-20 · unverdicted · novelty 5.0

DRP combines teacher-guided pruning of chain-of-thought steps with distillation to cut token usage in reasoning models on GSM8K and AIME while maintaining or improving accuracy.

EasyVideoR1: Easier RL for Video Understanding

cs.CV · 2026-04-18 · unverdicted · novelty 4.0

EasyVideoR1 delivers an optimized RL pipeline for video understanding in large vision-language models, achieving 1.47x throughput gains and aligned results on 22 benchmarks.

River-LLM: Large Language Model Seamless Exit Based on KV Share

cs.CL · 2026-04-20

citing papers explorer

Showing 17 of 17 citing papers.

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling cs.CL · 2026-05-08 · conditional · none · ref 37 · 2 links
AutoTTS discovers width-depth test-time scaling controllers through agentic search in a pre-collected trajectory environment, yielding better accuracy-cost tradeoffs than hand-designed baselines on math reasoning tasks at low cost.
Near-Future Policy Optimization cs.LG · 2026-04-22 · unverdicted · none · ref 28
NPO uses a policy's own near-future checkpoint as auxiliary trajectories to maximize effective learning signal S = Q/V, improving performance from 57.88 to 63.15 on Qwen3-VL-8B-Instruct with GRPO while accelerating convergence.
Self-Distilled RLVR cs.LG · 2026-04-03 · unverdicted · none · ref 28
RLSD mixes self-distillation for token-level policy difference magnitudes with RLVR for reliable update directions from response correctness to reach higher convergence and better training stability.
Two-dimensional early exit optimisation of LLM inference cs.CL · 2026-03-27 · unverdicted · none · ref 26
Coordinating layer-wise and sentence-wise early exits in LLMs produces multiplicative speedups of 1.4-2.3x over single-dimension early exit on sentiment classification tasks.
Know When To Fold 'Em: Token-Efficient LLM Synthetic Data Generation via Multi-Stage In-Flight Rejection cs.AI · 2026-05-13 · unverdicted · none · ref 42
MSIFR stops faulty LLM generations early via staged rule-based checks, reducing token consumption 11-78% with no accuracy loss.
Learn to Think: Improving Multimodal Reasoning through Vision-Aware Self-Improvement Training cs.CV · 2026-05-12 · unverdicted · none · ref 20
VISTA uses prefix resampling and a vision-aware attention score to address data imbalance and language prior bias in self-improvement training of MLLMs, yielding up to 13.66% gains on reasoning tasks.
Nice Fold or Hero Call: Learning Budget-Efficient Thinking for Adaptive Reasoning cs.AI · 2026-05-12 · unverdicted · none · ref 50
BET reduces reasoning tokens by about 55% on average while improving performance across benchmarks by learning to short-solve easy queries, fold early on unsolvable ones, and preserve budget for hard solvable queries.
Behavior Cue Reasoning: Monitorable Reasoning Improves Efficiency and Safety through Oversight cs.AI · 2026-05-07 · conditional · none · ref 43 · 2 links
Behavior Cue Reasoning trains LLMs to emit special tokens before behaviors, enabling monitors to cut up to 50% wasted reasoning tokens and recover safe actions from 80% of unsafe traces, more than doubling success rates with no performance cost.
Conformal Thinking: Risk Control for Reasoning on a Compute Budget cs.AI · 2026-02-03 · unverdicted · none · ref 10
Conformal risk control with upper and lower thresholds lets LLMs adaptively stop reasoning while guaranteeing a maximum error rate and minimizing token use.
Can Aha Moments Be Fake? Identifying True and Decorative Thinking Steps in Chain-of-Thought cs.LG · 2025-10-28 · unverdicted · none · ref 32
LLMs interleave true causal reasoning steps with decorative ones in CoT, with only ~2.3% of steps having high causal impact on AIME for Qwen-2.5, and a steering direction can force internal use of specific steps.
Entropy After </Think> for reasoning model early exiting cs.LG · 2025-09-30 · unverdicted · none · ref 21
Entropy After </Think> (EAT) enables early exiting in reasoning LLMs by tracking entropy stabilization after a </think> token, cutting token use 12-22% on MATH500 and AIME2025 with no accuracy loss.
Efficient Test-Time Scaling via Temporal Reasoning Aggregation cs.AI · 2026-04-19 · unverdicted · none · ref 76
TRACE aggregates answer consistency and confidence trajectory over multiple reasoning steps to decide when to halt inference, reducing token usage by 25-30% while keeping accuracy within 1-2% of full reasoning.
SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking cs.AI · 2026-04-09 · unverdicted · none · ref 40
SAT reduces reasoning tokens by up to 40% across multiple large reasoning models and benchmarks by adaptively pruning steps based on difficulty while maintaining or improving accuracy.
Investigating Thinking Behaviours of Reasoning-Based Language Models for Social Bias Mitigation cs.CL · 2025-10-20 · unverdicted · none · ref 19
Reasoning LLMs aggregate social biases through stereotype repetition and irrelevant information injection in their thinking processes, and a self-review prompt mitigates this on BBQ, StereoSet, and BOLD benchmarks.
DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models cs.CL · 2025-05-20 · unverdicted · none · ref 4
DRP combines teacher-guided pruning of chain-of-thought steps with distillation to cut token usage in reasoning models on GSM8K and AIME while maintaining or improving accuracy.
EasyVideoR1: Easier RL for Video Understanding cs.CV · 2026-04-18 · unverdicted · none · ref 43
EasyVideoR1 delivers an optimized RL pipeline for video understanding in large vision-language models, achieving 1.47x throughput gains and aligned results on 22 benchmarks.
River-LLM: Large Language Model Seamless Exit Based on KV Share cs.CL · 2026-04-20 · unreviewed · ref 25

Dynamic early exit in reasoning models

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer