arXiv preprint arXiv:2503.13162, 2025

Espinosa-Dice, N · 2025 · arXiv 2503.13162

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

When Does Online Imitation Learning Help in LLM Post-Training? The Role of (Non-)Realizability Beyond Horizon

cs.LG · 2026-06-29 · unverdicted · novelty 7.0

Online IL overcomes an information-theoretic bottleneck that offline IL faces in non-realizable settings even at horizon 1, under a new structural characterization of reward-relative misspecification.

Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift

cs.LG · 2026-05-09 · unverdicted · novelty 7.0 · 2 refs

SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.

citing papers explorer

When Does Online Imitation Learning Help in LLM Post-Training? The Role of (Non-)Realizability Beyond Horizon
cs.LG · 2026-06-29 · unverdicted · none · ref 10
Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift
cs.LG · 2026-05-09 · unverdicted · none · ref 28 · 2 links
