arXiv preprint arXiv:2502.12465

Computational-statistical tradeoffs at the next-token prediction barrier: Autoregressive, imitation learning under misspecification · 2025 · arXiv 2502.12465

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

When Does Online Imitation Learning Help in LLM Post-Training? The Role of (Non-)Realizability Beyond Horizon

cs.LG · 2026-06-29 · unverdicted · novelty 7.0

Online IL overcomes an information-theoretic bottleneck that offline IL faces in non-realizable settings even at horizon 1, under a new structural characterization of reward-relative misspecification.

Double Preconditioning (DoPr): Optimization for Test-Time Performance, not Validation Loss

cs.LG · 2026-06-04 · unverdicted · novelty 6.0

Double preconditioning (DoPr) improves downstream task performance in test-time feedback settings without consistent gains in validation loss.

citing papers explorer

When Does Online Imitation Learning Help in LLM Post-Training? The Role of (Non-)Realizability Beyond Horizon

cs.LG · 2026-06-29 · unverdicted · none · ref 45

Double Preconditioning (DoPr): Optimization for Test-Time Performance, not Validation Loss

cs.LG · 2026-06-04 · unverdicted · none · ref 116
