NExt accelerates RLVR training for LLMs by nonlinearly extrapolating low-rank parameter trajectories extracted from LoRA runs.
Prox- ythinker: Test-time guidance through small visual reasoners.arXiv preprint arXiv:2505.24872
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3representative citing papers
GlimpRouter uses the entropy of the first token in each reasoning step to decide whether to invoke a large model, yielding 10.7% higher accuracy and 25.9% lower latency than a standalone large model on AIME25.
OneThinker unifies image and video reasoning in one model across 10 tasks via a 600k corpus, CoT-annotated SFT, and EMA-GRPO reinforcement learning, reporting strong results on 31 benchmarks plus some cross-task transfer.
citing papers explorer
-
Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration
NExt accelerates RLVR training for LLMs by nonlinearly extrapolating low-rank parameter trajectories extracted from LoRA runs.
-
GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts
GlimpRouter uses the entropy of the first token in each reasoning step to decide whether to invoke a large model, yielding 10.7% higher accuracy and 25.9% lower latency than a standalone large model on AIME25.
-
OneThinker: All-in-one Reasoning Model for Image and Video
OneThinker unifies image and video reasoning in one model across 10 tasks via a 600k corpus, CoT-annotated SFT, and EMA-GRPO reinforcement learning, reporting strong results on 31 benchmarks plus some cross-task transfer.