{"paper":{"title":"Nougat: Neural Optical Understanding for Academic Documents","license":"http://creativecommons.org/licenses/by-sa/4.0/","headline":"A visual transformer model converts images of scientific document pages into accurate semantic markup.","cross_cats":["cs.CV"],"primary_cat":"cs.LG","authors_text":"Guillem Cucurull, Lukas Blecher, Robert Stojnic, Thomas Scialom","submitted_at":"2023-08-25T15:03:36Z","abstract_excerpt":"Scientific knowledge is predominantly stored in books and scientific journals, often in the form of PDFs. However, the PDF format leads to a loss of semantic information, particularly for mathematical expressions. We propose Nougat (Neural Optical Understanding for Academic Documents), a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language, and demonstrate the effectiveness of our model on a new dataset of scientific documents. The proposed approach offers a promising solution to enhance the accessibility "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"We propose Nougat, a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language, and demonstrate the effectiveness of our model on a new dataset of scientific documents.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That visual processing of page images is sufficient to recover accurate semantic markup for complex layouts and nested mathematical expressions without systematic errors on unseen document styles.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Nougat applies a visual transformer to convert academic PDFs into markup language while accurately handling mathematical content on a new scientific document dataset.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A visual transformer model converts images of scientific document pages into accurate semantic markup.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"57fe66c69169cdb73da041858598292b058a352521aa8409e9e253dbadb3d916"},"source":{"id":"2308.13418","kind":"arxiv","version":1},"verdict":{"id":"05bc4f07-bec7-4c4f-8101-b742e39fa7c3","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T09:38:45.893191Z","strongest_claim":"We propose Nougat, a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language, and demonstrate the effectiveness of our model on a new dataset of scientific documents.","one_line_summary":"Nougat applies a visual transformer to convert academic PDFs into markup language while accurately handling mathematical content on a new scientific document dataset.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That visual processing of page images is sufficient to recover accurate semantic markup for complex layouts and nested mathematical expressions without systematic errors on unseen document styles.","pith_extraction_headline":"A visual transformer model converts images of scientific document pages into accurate semantic markup."},"references":{"count":54,"sample":[{"doi":"","year":2012,"title":"Statistics of the Common Crawl Corpus 2012, June 2013","work_id":"d24e1885-6ad7-4ecb-8a32-433284462996","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.1109/icdar.2007.4376991","year":2007,"title":"An Overview of the Tesseract OCR Engine","work_id":"e618c86a-60fd-49c4-a85e-049b10895fce","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.18653/v1/2020.acl-main","year":2020,"title":"S2ORC: The Semantic Scholar Open Research Corpus","work_id":"34435fff-cccc-450e-a8d7-38e0e710cc8f","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"URL https://aclanthology.org/2020.acl-main.447","work_id":"d6bd2d43-1d4e-49dd-b2d7-648ed5de02e9","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Patrice Lopez. GROBID, February 2023. URL https://github.com/kermitt2/grobid. original-date: 2012-09- 13T15:48:54Z","work_id":"67e9cc1b-5668-4f1a-bc28-8efbc6c9b418","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":54,"snapshot_sha256":"ea4ef99beca03ffe3514dc317e0e388f95e46e4da509fbc4fe664e9ee22ac16b","internal_anchors":12},"formal_canon":{"evidence_count":3,"snapshot_sha256":"ab147f9ae1e1464118b42eed6cfe35f133a23a48f52ec91bee682421cc8baf4e"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}