Recoverable Identifier

arXiv:2604.27263 · detector doi_compliance · incontrovertible · 2026-05-19 19:25:52.347849+00:00

advisory doi_compliance recoverable_identifier

The DOI in the printed bibliography is fragmented by whitespace or line breaks. A longer candidate, "10.18653/v1/2020.findings-emnlp.414.URL:https://aclanthology.org/2020.findings-emnlp.414/(visited", was visible in the surrounding text, but it runs on into the entry's URL field and could not be confirmed against doi.org as printed.

Evidence text

Kaj Bostrom and Greg Durrett. “Byte Pair Encoding is Suboptimal for Language Model Pretraining”. In:Findings of the Association for Computational Linguistics: EMNLP 2020. Findings 2020. Ed. by Trevor Cohn, Yulan He, and Yang Liu. Online: Association for Com- putational Linguistics, Nov. 2020, pp. 4617–4624.DOI:10.18653/v1/2020.findings- emnlp.414.URL:https://aclanthology.org/2020.findings-emnlp.414/(visited on 01/22/2026)

Evidence payload

{
  "printed_excerpt": "Kaj Bostrom and Greg Durrett. \u201cByte Pair Encoding is Suboptimal for Language Model Pretraining\u201d. In:Findings of the Association for Computational Linguistics: EMNLP 2020. Findings 2020. Ed. by Trevor Cohn, Yulan He, and Yang Liu. Online: As",
  "reconstructed_doi": "10.18653/v1/2020.findings-emnlp.414.URL:https://aclanthology.org/2020.findings-emnlp.414/(visited",
  "ref_index": 2,
  "resolved_title": null,
  "verdict_class": "incontrovertible"
}
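
The reconstructed candidate above continues past the DOI into the URL field. As a minimal sketch of how a shorter identifier might be recovered from the printed text, the following assumes that the entry labels its fields with "DOI:" and "URL:" and that whitespace inside the DOI comes only from line breaks. The helper `recover_doi` and its regular expressions are hypothetical illustrations, not the detector's actual logic, and the result would still need to be confirmed against doi.org.

```python
import re
from typing import Optional

# Hypothetical helper: illustrates one way to trim a fragmented DOI.
# It is not the detector's implementation.
DOI_LABEL = re.compile(r"DOI\s*:\s*")
NEXT_FIELD = re.compile(r"\.?\s*URL\s*:")        # assumed field separator
DOI_SHAPE = re.compile(r"10\.\d{4,9}/\S+")       # loose shape check only


def recover_doi(printed: str) -> Optional[str]:
    """Return a DOI candidate from a printed reference, or None."""
    label = DOI_LABEL.search(printed)
    if label is None:
        return None
    tail = printed[label.end():]

    # Stop at the next field label so the URL is not absorbed into the DOI.
    stop = NEXT_FIELD.search(tail)
    if stop is not None:
        tail = tail[:stop.start()]

    # Assumption: any whitespace left inside the DOI is a line-break artifact.
    # A hyphen before the break is kept, since DOIs may contain hyphens.
    candidate = re.sub(r"\s+", "", tail).rstrip(".")
    return candidate if DOI_SHAPE.fullmatch(candidate) else None


excerpt = (
    "pp. 4617-4624.DOI:10.18653/v1/2020.findings- emnlp.414."
    "URL:https://aclanthology.org/2020.findings-emnlp.414/(visited on 01/22/2026)"
)
print(recover_doi(excerpt))  # 10.18653/v1/2020.findings-emnlp.414
```

Keeping the hyphen at the line break is a judgment call: in this entry it matches the identifier in the printed URL, but in other references a trailing hyphen may be a typesetting artifact, so the candidate is best treated as unverified until it resolves.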