o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.
NeurIPS
4 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED: 4
citing papers explorer
- Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
  o1-like models overthink easy tasks; self-training reduces compute use without accuracy loss on GSM8K, MATH500, GPQA, and AIME.
- Temporal Functional Circuits: From Spline Plots to Faithful Explanations in KAN Forecasting
  A gated residual KAN framework called Temporal Functional Circuits maps edge functions to input lags, ranks them by activation, and validates faithfulness via interventions showing that learned B-splines add predictive value beyond base activations.
- The Recurrent Transformer: Greater Effective Depth and Efficient Decoding
  Recurrent Transformers add per-layer recurrent memory via self-attention on own activations plus a tiling algorithm that reduces training memory traffic, yielding better C4 pretraining cross-entropy than parameter-matched standard transformers with fewer layers.
- Safe Continual Reinforcement Learning under Nonstationarity via Adaptive Safety Constraints
  LILAC+ combines context-based, adaptation-speed, and budget-to-state safety constraints to reduce violations in continual RL under nonstationary conditions, demonstrated in simulated driving tasks.