Title resolution pending

For a 3B student, training under either a strong 30B teacher or a 7B RL teacher leads to similar outcomes: the KL divergence decreases steadily and the success rate improves at comparable rates, indicating that increasing teacher strength be · 2000

1 Pith paper cites this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents

cs.LG · 2026-04-27 · unverdicted · novelty 7.0

TCOD stabilizes on-policy distillation for multi-turn agents via temporal curriculum on trajectory depth, improving performance up to 18 points over vanilla OPD and sometimes surpassing the teacher.

citing papers explorer

Showing 1 of 1 citing paper.

TCOD: Exploring Temporal Curriculum in On-Policy Distillation for Multi-turn Autonomous Agents

cs.LG · 2026-04-27 · unverdicted · none · ref 21
