Scheduled sampling for sequence prediction with recurrent neural networks

Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam Shazeer · 2015

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation

cs.LG · 2026-05-16 · unverdicted · novelty 7.0

Decoupling prefix source from token-level KL direction in autoregressive sequence KL yields four objectives unifying SFT, DAgger, offline RL and OPD, with KL mixing and entropy-gated curriculum improving math reasoning accuracy and shortening responses.

Toward Privileged Foundation Models:LUPI for Accelerated and Improved Learning

cs.LG · 2026-05-08 · unverdicted · novelty 7.0 · 2 refs

PIQL integrates privileged information to accelerate convergence, lower loss, and improve generalization in tabular foundation models.

On-Policy Distillation with Best-of-N Teacher Rollout Selection

cs.CV · 2026-05-10 · unverdicted · novelty 5.0 · 2 refs

BRTS improves on-policy distillation by sampling multiple teacher rollouts and selecting the best one via a correctness-first then alignment priority rule, yielding gains on AIME and AMC math benchmarks.

citing papers explorer

Showing 3 of 3 citing papers.

Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation cs.LG · 2026-05-16 · unverdicted · none · ref 3
Decoupling prefix source from token-level KL direction in autoregressive sequence KL yields four objectives unifying SFT, DAgger, offline RL and OPD, with KL mixing and entropy-gated curriculum improving math reasoning accuracy and shortening responses.
Toward Privileged Foundation Models:LUPI for Accelerated and Improved Learning cs.LG · 2026-05-08 · unverdicted · none · ref 3 · 2 links
PIQL integrates privileged information to accelerate convergence, lower loss, and improve generalization in tabular foundation models.
On-Policy Distillation with Best-of-N Teacher Rollout Selection cs.CV · 2026-05-10 · unverdicted · none · ref 3 · 2 links
BRTS improves on-policy distillation by sampling multiple teacher rollouts and selecting the best one via a correctness-first then alignment priority rule, yielding gains on AIME and AMC math benchmarks.

Scheduled sampling for sequence prediction with recurrent neural networks

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer