Transformers: State-of-the-art natural language processing

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu · 2020

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

When Are Teacher Tokens Reliable? Position-Weighted On-Policy Self-Distillation for Reasoning

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

Position-Weighted On-Policy Self-Distillation (PW-OPSD) weights later tokens more heavily after a diagnostic shows position predicts teacher reliability better than entropy, yielding +1.0 and +1.1 Avg@12 gains on AIME 2024/2025.

citing papers explorer

Showing 1 of 1 citing paper.

When Are Teacher Tokens Reliable? Position-Weighted On-Policy Self-Distillation for Reasoning cs.LG · 2026-05-20 · unverdicted · none · ref 34
Position-Weighted On-Policy Self-Distillation (PW-OPSD) weights later tokens more heavily after a diagnostic shows position predicts teacher reliability better than entropy, yielding +1.0 and +1.1 Avg@12 gains on AIME 2024/2025.

Transformers: State-of-the-art natural language processing

fields

years

verdicts

representative citing papers

citing papers explorer