Convex methods for constrained linear bandits
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.RO (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
On-Policy Distillation of Language Models for Autonomous Vehicle Motion Planning
On-policy GKD trains 5x smaller student LLMs to nearly match large teacher performance in AV motion planning on nuScenes while beating a dense-feedback RL baseline.