The maximum prompt length is set to 2,560 tokens and the maximum response length to 20,480 tokens

All RL and distillation baselines, including SOD, are trained from the SFT checkpoint · 2024
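As a rough illustration of the length limits quoted above, the two budgets can be held in a small config object and checked against a sample's token counts. This is a minimal sketch, not the citing paper's code: the class `LengthLimits`, its field names, and the helper `fits` are hypothetical names introduced here, and only the two numeric limits come from the quoted sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LengthLimits:
    """Hypothetical container for the two token budgets quoted above."""

    max_prompt_length: int = 2560     # tokens, from the quoted sentence
    max_response_length: int = 20480  # tokens, from the quoted sentence

    @property
    def max_sequence_length(self) -> int:
        # Worst-case tokens per sample: full prompt plus full response.
        return self.max_prompt_length + self.max_response_length


def fits(limits: LengthLimits, prompt_tokens: int, response_tokens: int) -> bool:
    """Return True when both counts stay within their respective budgets."""
    return (
        prompt_tokens <= limits.max_prompt_length
        and response_tokens <= limits.max_response_length
    )


limits = LengthLimits()
print(limits.max_sequence_length)
print(fits(limits, prompt_tokens=1200, response_tokens=18000))
```

Under these limits, a single sample can occupy at most 2,560 + 20,480 = 23,040 tokens, which is the figure a training setup would need to budget for per sequence.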

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

SOD: Step-wise On-policy Distillation for Small Language Model Agents

cs.CL · 2026-05-08 · unverdicted · novelty 6.0 · ref 77

SOD reweights on-policy distillation strength step-by-step using divergence to stabilize tool use in small language model agents, yielding up to 20.86% gains and 26.13% on AIME 2025 for a 0.6B model.
