Recoverable Identifier

arXiv:2605.15417 · detector doi_compliance · incontrovertible · 2026-05-19 16:04:40.928097+00:00

advisory doi_compliance recoverable_identifier

The DOI in the printed bibliography is fragmented by whitespace or line breaks. A longer candidate (10.64434/tml.20251026.https://thinkingmachines.ai/blog/on-policy-distillation.N.Malkin) was visible in the surrounding text, but it could not be confirmed against doi.org as printed.
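To illustrate how a whitespace-fragmented DOI might yield a "longer candidate", here is a minimal sketch in Python. It is not the detector's actual code: the function name `doi_candidates`, the `max_joins` parameter, and the example DOI under the `10.1000` prefix are all hypothetical. The sketch finds the first DOI-like prefix and rejoins the following whitespace-separated tokens one at a time, which also shows how an over-long join can swallow neighbouring text such as a URL or an author name.

```python
import re

# Hypothetical pattern: a DOI prefix ("10." plus 4-9 digits), a slash, then non-space text.
DOI_START = re.compile(r"10\.\d{4,9}/\S+")


def doi_candidates(text: str, max_joins: int = 3) -> list[str]:
    """Return DOI candidates, shortest first, by rejoining tokens split by whitespace.

    Illustrative only: each extra join is a guess, so every candidate
    still has to be confirmed against a resolver before it is trusted.
    """
    match = DOI_START.search(text)
    if match is None:
        return []
    # Tokens from the start of the DOI-like string onward.
    tokens = text[match.start():].split()
    limit = min(max_joins + 1, len(tokens))
    # Candidate n is the first n tokens glued together with no whitespace.
    return ["".join(tokens[:n]) for n in range(1, limit + 1)]


if __name__ == "__main__":
    # Made-up excerpt: the DOI was broken after "abc" by a line wrap.
    for candidate in doi_candidates("doi: 10.1000/abc def. Next entry", max_joins=1):
        print(candidate)
```

The shortest candidate is the DOI as printed, and each longer one absorbs one more token. None of them is verified by this sketch, which matches the finding above: a candidate is only recoverable once it resolves.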

Evidence text

PMLR, 2020. D. Go, T. Korbak, G. Kruszewski, J. Rozen, N. Ryu, and M. Dymetman. Aligning language models with prefer- ences through f-divergence minimization.arXiv preprint arXiv:2302.08215, 2023. J. Han, M. Jiang, Y . Song, S. Ermon, and M. Xu. f-po: Generalizing preference optimization with f-divergence minimization.arXiv preprint arXiv:2410.21662, 2024. D. Hendrycks, C. Burns, S. Kadavath, A. Arora, S. Basart, E. Tang, D. Song, and J. Steinhardt. Measuring mathemat- ical problem solving with the math dataset. InThirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2). A. Huang, W. Zhan, T. Xie, J. D. Lee, W. Sun, A. Kr- ishnamurthy, and D. J. Foster. Correcting the mythos of kl-regularization: Direct alignment without overopti- mization via chi-squared preference optimization. InThe Thirteenth International Conference on Learning Repre- sentations. P. Intellect. Prime-rl, 2025. URL https://github. com/PrimeIntellect-ai/prime-rl. Q. Jin, B. Dhingra, Z. Liu, W. Cohen, and X. Lu. Pubmedqa: A dataset for biomedical research question answering. In Proceedings of the 2019 Conference on Empirical Meth- ods in Natural Language Processing and the 9th Interna- tional Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2567–2577, 2019. L. Ke, S. Choudhury, M. Barnes, W. Sun, G. Lee, and S. Srinivasa. Imitation learning as f-divergence mini- mization. InInternational workshop on the algorithmic foundations of robotics,

Evidence payload

{
  "printed_excerpt": "PMLR, 2020. D. Go, T. Korbak, G. Kruszewski, J. Rozen, N. Ryu, and M. Dymetman. Aligning language models with prefer- ences through f-divergence minimization.arXiv preprint arXiv:2302.08215, 2023. J. Han, M. Jiang, Y . Song, S. Ermon, and M",
  "reconstructed_doi": "10.64434/tml.20251026.https://thinkingmachines.ai/blog/on-policy-distillation.N.Malkin",
  "ref_index": 1,
  "resolved_title": null,
  "verdict_class": "incontrovertible"
}
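
A consumer of this payload may want to sanity-check `reconstructed_doi` before trying to resolve it. The sketch below is an assumption about how that could be done, not part of the detector: the helper `looks_overextended` is hypothetical, and its single heuristic (a URL scheme embedded in the DOI suffix) is one plausible sign that the reconstruction absorbed neighbouring bibliography text. The payload is trimmed to the fields the check needs.

```python
import json

# Trimmed copy of the evidence payload above; only fields used here are kept.
PAYLOAD = json.loads("""
{
  "reconstructed_doi": "10.64434/tml.20251026.https://thinkingmachines.ai/blog/on-policy-distillation.N.Malkin",
  "ref_index": 1,
  "resolved_title": null,
  "verdict_class": "incontrovertible"
}
""")


def looks_overextended(doi: str) -> bool:
    """Heuristic (an assumption): a URL scheme inside the DOI suffix suggests
    that the reconstruction swallowed adjacent text."""
    _prefix, _slash, suffix = doi.partition("/")
    return "http://" in suffix or "https://" in suffix


if __name__ == "__main__":
    doi = PAYLOAD["reconstructed_doi"]
    print(doi, "-> overextended:", looks_overextended(doi))
    # A null resolved_title means the candidate was not confirmed by a resolver.
    print("resolved:", PAYLOAD["resolved_title"] is not None)
```

A `True` result here does not prove the DOI is wrong. It only marks the candidate for manual review, which is consistent with `resolved_title` being `null` in the payload.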