pith. sign in

R2r: Efficiently navigating divergent reasoning paths with small-large model token routing.arXiv preprint arXiv:2505.21600

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

years

2026 5

representative citing papers

Compute Where it Counts: Self Optimizing Language Models

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

SOL trains a policy to dynamically control multiple efficiency mechanisms per token via group-relative policy optimization on teacher-forced episodes, yielding better quality at matched average budget than static or random allocation.

Token-Level LLM Collaboration via FusionRoute

cs.AI · 2026-01-08 · unverdicted · novelty 6.0

FusionRoute augments token-level expert routing with a trainable complementary logit generator to expand the policy class and recover optimal decoding under mild conditions, outperforming prior collaboration and merging methods on reasoning and generation benchmarks.

citing papers explorer

Showing 5 of 5 citing papers.

  • NI Sampling: Accelerating Discrete Diffusion Sampling by Token Order Optimization cs.LG · 2026-04-20 · unverdicted · none · ref 34

    NI Sampling accelerates discrete diffusion language models up to 14.3 times by training a neural indicator to select which tokens to sample at each step using a trajectory-preserving objective.

  • Compute Where it Counts: Self Optimizing Language Models cs.LG · 2026-05-11 · unverdicted · none · ref 11

    SOL trains a policy to dynamically control multiple efficiency mechanisms per token via group-relative policy optimization on teacher-forced episodes, yielding better quality at matched average budget than static or random allocation.

  • Select to Think: Unlocking SLM Potential with Local Sufficiency cs.CL · 2026-04-29 · conditional · none · ref 6

    Small language models can achieve near large-model reasoning performance by learning to re-rank their own top-K token predictions after distilling selection from the large model.

  • GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts cs.AI · 2026-01-08 · unverdicted · none · ref 5

    GlimpRouter uses the entropy of the first token in each reasoning step to decide whether to invoke a large model, yielding 10.7% higher accuracy and 25.9% lower latency than a standalone large model on AIME25.

  • Token-Level LLM Collaboration via FusionRoute cs.AI · 2026-01-08 · unverdicted · none · ref 8

    FusionRoute augments token-level expert routing with a trainable complementary logit generator to expand the policy class and recover optimal decoding under mild conditions, outperforming prior collaboration and merging methods on reasoning and generation benchmarks.