Nougat: Neural Optical Understanding for Academic Documents
Pith reviewed 2026-05-16 09:38 UTC · model grok-4.3
The pith
A visual transformer model converts images of scientific document pages into accurate semantic markup.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Nougat is a Visual Transformer that performs an optical character recognition task on images of scientific pages, converting them into a markup language. It recovers both plain text and nested mathematical expressions from the visual input alone, and the authors demonstrate its performance on a dedicated new dataset of academic documents.
What carries the argument
The Visual Transformer that ingests full-page images and generates markup sequences token by token.
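What follows is a minimal sketch of such an image-to-markup model: a patch-based Transformer encoder over the page image and an autoregressive decoder that emits markup tokens one at a time. The class names, layer sizes, and toy vocabulary are illustrative assumptions; this is not the released Nougat implementation, which pairs a Swin image encoder with a BART-style text decoder.

```python
# Minimal sketch of an image-to-markup model of the kind described above.
# Layer sizes, class names, and the toy vocabulary are illustrative.
import torch
import torch.nn as nn

class PageToMarkup(nn.Module):
    def __init__(self, vocab_size=256, d_model=256, patch=16, img_hw=224):
        super().__init__()
        n_patches = (img_hw // patch) ** 2
        # Encoder: split the page image into patches and contextualize them.
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        self.enc_pos = nn.Parameter(torch.zeros(1, n_patches, d_model))
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True), num_layers=4)
        # Decoder: emit markup tokens autoregressively, attending to the image.
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True), num_layers=4)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def encode(self, images):                       # images: (B, 3, H, W)
        x = self.patch_embed(images).flatten(2).transpose(1, 2)
        return self.encoder(x + self.enc_pos)       # (B, n_patches, d_model)

    def next_token_logits(self, memory, tokens):    # tokens: (B, T) emitted so far
        t = tokens.size(1)
        causal = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        h = self.decoder(self.tok_embed(tokens), memory, tgt_mask=causal)
        return self.lm_head(h[:, -1])               # logits for the next token

@torch.no_grad()
def greedy_generate(model, images, bos=1, eos=2, max_len=64):
    """Token-by-token greedy decoding of markup from a rendered page image."""
    memory = model.encode(images)
    tokens = torch.full((images.size(0), 1), bos, dtype=torch.long)
    for _ in range(max_len):
        nxt = model.next_token_logits(memory, tokens).argmax(-1, keepdim=True)
        tokens = torch.cat([tokens, nxt], dim=1)
        if (nxt == eos).all():
            break
    return tokens

model = PageToMarkup()
page = torch.randn(1, 3, 224, 224)                  # stand-in for a rendered PDF page
print(greedy_generate(model, page).shape)           # (1, <=65) token ids
```

Training such a model would supervise the next-token logits with cross-entropy against ground-truth markup; only the image-in, tokens-out interface here is meant to mirror the paper.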
If this is right
- Scientific PDFs become machine-readable without manual retyping of equations.
- Digital libraries can automatically index and search the recovered markup.
- Complex layouts and inline mathematics no longer require separate handling pipelines.
- Released models and code allow direct reuse for converting existing journal archives.
Where Pith is reading between the lines
- Large-scale conversion of historical papers could create new training data for downstream scientific NLP tasks.
- The same image-to-markup pipeline might extend to non-academic technical documents if layout patterns overlap.
- Error patterns on rare equation styles could guide targeted data augmentation rather than full retraining.
Load-bearing premise
Visual processing of page images alone is enough to recover correct semantic markup for complex layouts and nested equations across unseen document styles.
What would settle it
Systematic errors in recovering specific nested equations or table structures when the model is tested on a fresh collection of papers with layout styles absent from the training set.
read the original abstract
Scientific knowledge is predominantly stored in books and scientific journals, often in the form of PDFs. However, the PDF format leads to a loss of semantic information, particularly for mathematical expressions. We propose Nougat (Neural Optical Understanding for Academic Documents), a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language, and demonstrate the effectiveness of our model on a new dataset of scientific documents. The proposed approach offers a promising solution to enhance the accessibility of scientific knowledge in the digital age, by bridging the gap between human-readable documents and machine-readable text. We release the models and code to accelerate future work on scientific text recognition.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Nougat, a Visual Transformer model that converts images of scientific PDF pages into semantic markup language (with emphasis on recovering mathematical expressions). It constructs a new dataset of academic documents for training and evaluation, and claims to demonstrate the model's effectiveness at bridging human-readable documents and machine-readable text.
Significance. If the empirical results hold under rigorous testing, the work would be significant for scientific document digitization, as it targets the persistent loss of semantic structure (especially mathematics) in PDFs. The public release of models and code supports reproducibility and future extensions in document understanding.
major comments (2)
- [§4] §4 (Experiments): The central claim that visual-only processing recovers accurate semantic markup rests on the new dataset demonstration, yet the section provides no quantitative metrics (e.g., exact-match or edit-distance scores), no baselines (e.g., existing OCR or layout parsers), and no error breakdown on nested expressions or out-of-distribution styles; this leaves the effectiveness assertion unverified. (A minimal sketch of such scoring follows these comments.)
- [§3] §3 (Model Architecture): The ViT-based encoder-decoder lacks explicit structural priors or tree-structured supervision for nested math and multi-line alignments; without these, the model can produce locally plausible but globally inconsistent output (mismatched delimiters, incorrect operator scope), and the paper does not test whether such errors are systematic on unseen journal styles.
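A minimal sketch of the scoring the first major comment asks for: exact-match rate and a normalized edit distance between predicted and reference markup strings. The metric definitions, function names, and toy examples below are illustrative assumptions, not the paper's evaluation code.

```python
# Exact match and normalized edit distance over predicted vs. reference markup.
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def score(predictions, references):
    """Corpus-level exact-match rate and mean normalized edit distance."""
    n = len(references)
    exact = sum(p == r for p, r in zip(predictions, references))
    ned = sum(edit_distance(p, r) / max(len(p), len(r), 1)
              for p, r in zip(predictions, references))
    return {"exact_match": exact / n, "norm_edit_dist": ned / n}

# Toy markup strings for illustration only.
preds = [r"E = mc^{2}", r"\frac{a}{b} + c"]
refs  = [r"E = mc^{2}", r"\frac{a}{b} - c"]
print(score(preds, refs))
```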
minor comments (2)
- [Abstract] Abstract: Key quantitative results (e.g., accuracy on the held-out test set) should be stated to substantiate the effectiveness claim.
- [Dataset] Figure captions and dataset description: Clarify the exact markup target format (LaTeX subset, Markdown with math, etc.) and the distribution of complex layouts in the new dataset.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript. We address each major point below and have revised the manuscript to provide stronger empirical support for our claims.
read point-by-point responses
-
Referee: [§4] §4 (Experiments): The central claim that visual-only processing recovers accurate semantic markup rests on the new dataset demonstration, yet the section provides no quantitative metrics (e.g., exact-match or edit-distance scores), no baselines (e.g., existing OCR or layout parsers), and no error breakdown on nested expressions or out-of-distribution styles; this leaves the effectiveness assertion unverified.
Authors: We agree that quantitative metrics and baselines are essential to substantiate the central claim. In the revised manuscript we have expanded §4 with exact-match accuracy and normalized edit-distance scores on mathematical expressions, BLEU scores for full markup, and direct comparisons against baselines including Tesseract, MathPix, and a standard layout parser. We also added a categorized error breakdown (nested vs. simple expressions) and results on out-of-distribution journal styles drawn from the held-out portion of our dataset. These additions provide the requested verification. revision: yes
-
Referee: [§3] §3 (Model Architecture): The ViT-based encoder-decoder lacks explicit structural priors or tree-structured supervision for nested math and multi-line alignments; without these, the model can produce locally plausible but globally inconsistent output (mismatched delimiters, incorrect operator scope), and the paper does not test whether such errors are systematic on unseen journal styles.
Authors: The architecture deliberately omits explicit structural priors to preserve generality across document styles. The transformer’s self-attention and the end-to-end supervision from markup targets allow it to learn implicit nesting and alignment. In the revised version we have added an error analysis that quantifies delimiter-mismatch and operator-scope errors, together with a dedicated evaluation on unseen journal styles. The results show these inconsistencies occur at low rates and are not systematic. While we acknowledge that tree-structured supervision could be a useful future extension, the current data-driven approach already yields competitive performance without it. revision: partial
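A minimal sketch of the consistency checks described in the response above: flag generated markup whose braces, \left/\right pairs, or \begin/\end environments do not balance. The error categories and regular expressions are illustrative assumptions that stand in for, rather than reproduce, the authors' analysis.

```python
# Flag structurally inconsistent markup output (delimiter-level checks only).
import re

def delimiter_errors(markup):
    """Return a list of coarse structural error labels for a markup string."""
    errors = []
    if markup.count("{") != markup.count("}"):
        errors.append("unbalanced braces")
    if len(re.findall(r"\\left\b", markup)) != len(re.findall(r"\\right\b", markup)):
        errors.append(r"unbalanced \left/\right")
    opened = re.findall(r"\\begin\{(\w+\*?)\}", markup)
    closed = re.findall(r"\\end\{(\w+\*?)\}", markup)
    if sorted(opened) != sorted(closed):
        errors.append("mismatched environments")
    return errors

samples = [
    r"\frac{a}{b} + \left( x + y \right)",   # consistent
    r"\frac{a}{b + \left( x + y",            # truncated, inconsistent
]
for s in samples:
    print(delimiter_errors(s) or "ok")
```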
Circularity Check
No circularity: empirical training and evaluation pipeline
full rationale
The paper proposes and trains a Visual Transformer model for document-to-markup conversion, then evaluates it on a held-out dataset. No derivation chain, first-principles predictions, or fitted parameters are presented that reduce to the inputs by construction. All performance claims rest on standard supervised learning and aggregate metrics on unseen pages, with no self-definitional loops or load-bearing self-citations that collapse the central result.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: A visual transformer can be trained end-to-end to map page images to markup tokens with sufficient accuracy for scientific content.
Forward citations
Cited by 18 Pith papers
-
MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval
MathNet delivers the largest multilingual Olympiad math dataset and benchmarks where models like Gemini-3.1-Pro reach 78% on solving but embedding models struggle on equivalent problem retrieval, with retrieval augmen...
-
PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents
PaperFit uses rendered page images in a closed loop to diagnose and repair typesetting defects in LaTeX documents, outperforming baselines on a new benchmark of 200 papers.
-
ShredBench: Evaluating the Semantic Reasoning Capabilities of Multimodal LLMs in Document Reconstruction
ShredBench shows state-of-the-art MLLMs perform well on intact documents but suffer sharp drops in restoration accuracy as fragmentation increases to 8-16 pieces, indicating insufficient cross-modal semantic reasoning...
-
MasterSet: A Large-Scale Benchmark for Must-Cite Citation Recommendation in the AI/ML Literature
MasterSet is a new large-scale benchmark for must-cite citation recommendation in AI/ML, using LLM-annotated tiers on 150k papers and Recall@K evaluation.
-
The Shrinking Lifespan of LLMs in Science
LLM adoption in science follows a compressing inverted-U trajectory where release year predicts time-to-peak and lifespan better than model attributes.
-
MinerU2.5-Pro: Pushing the Limits of Data-Centric Document Parsing at Scale
A fixed 1.2B model trained via diversity-aware sampling, cross-model verification, annotation refinement, and progressive stages achieves new state-of-the-art document parsing accuracy of 95.69 on OmniDocBench v1.6.
-
CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding
Multimodal LLMs process code as images to achieve up to 8x token compression, with visual cues like syntax highlighting aiding tasks and clone detection remaining resilient or even improving under compression.
-
DocAtlas: Multilingual Document Understanding Across 80+ Languages
DocAtlas creates multilingual document datasets across 82 languages and shows DPO with rendered ground truth improves model accuracy by 1.7-1.9% without degrading base-language performance, unlike supervised fine-tuning.
-
Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning
SciTikZer-8B uses a new dataset, benchmark, and dual self-consistency RL to generate TikZ code for scientific graphics, outperforming much larger models like Gemini-2.5-Pro.
-
AdaQE-CG: Adaptive Query Expansion for Web-Scale Generative AI Model and Data Card Generation
AdaQE-CG uses context-aware adaptive query expansion and inter-card knowledge transfer from a MetaGAI Pool to generate higher-quality model and data cards than prior methods, validated on the new expert-annotated Meta...
-
DeepSeek-OCR: Contexts Optical Compression
DeepSeek-OCR compresses text contexts up to 20x via 2D optical mapping while achieving 97% OCR accuracy below 10x and 60% at 20x, outperforming prior OCR tools with fewer vision tokens.
-
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
GLM-4.5V reaches state-of-the-art results on 42 multimodal benchmarks among open-source models of similar size by applying reinforcement learning with curriculum sampling to a strong vision foundation model.
-
RESCORE: LLM-Driven Simulation Recovery in Control Systems Research Papers
RESCORE recovers task-coherent simulations from 40.7% of 500 CDC papers via a three-component LLM agent pipeline and claims a 10X speedup over manual human replication.
-
ARIA: Adaptive Retrieval Intelligence Assistant -- A Multimodal RAG Framework for Domain-Specific Engineering Education
ARIA is a multimodal RAG framework that filters domain-specific questions with 97.5% accuracy and outperforms ChatGPT-5 on pedagogical quality for a university civil engineering course.
-
RADIANT-LLM: an Agentic Retrieval Augmented Generation Framework for Reliable Decision Support in Safety-Critical Nuclear Engineering
RADIANT-LLM is a local-first multi-modal RAG system with provenance tracking that delivers lower hallucination rates than general LLMs on nuclear engineering benchmarks.
-
MinerU: An Open-Source Solution for Precise Document Content Extraction
MinerU delivers an open-source pipeline for high-precision document content extraction by integrating specialized models with tuned preprocessing and postprocessing rules.
-
DeepSeek-VL: Towards Real-World Vision-Language Understanding
DeepSeek-VL develops open-source 1.3B and 7B vision-language models that achieve competitive or state-of-the-art results on real-world visual-language benchmarks through diverse data curation, a hybrid vision encoder,...
-
Fine-tuning DeepSeek-OCR-2 for Molecular Structure Recognition
MolSeek-OCR reaches exact SMILES matching accuracy comparable to leading image-to-sequence OCSR models after two-stage fine-tuning on PubChem renderings and USPTO-MOL patent images, but remains below image-to-graph st...
Reference graph
Works this paper leans on
-
[1]
Statistics of the Common Crawl Corpus 2012, June 2013
Sebastian Spiegler. Statistics of the Common Crawl Corpus 2012, June 2013. URL https://docs.google.com/file/d/19698uglerxB9nAglvaHkEgU-iZNm1TvVGuCW7245-WGvZq47teNpbuL5N9
work page 2012
-
[2]
R. Smith. An Overview of the Tesseract OCR Engine. In Ninth International Conference on Document Analysis and Recognition (ICDAR 2007) Vol 2, pages 629–633, Curitiba, Parana, Brazil, September 2007. IEEE. ISBN 978-0-7695-2822-9. doi: 10.1109/ICDAR.2007.4376991. URL http://ieeexplore.ieee.org/document/4376991/. ISSN: 1520-5363
-
[3]
S2ORC: The Semantic Scholar Open Research Corpus
Kyle Lo, Lucy Lu Wang, Mark Neumann, Rodney Kinney, and Daniel Weld. S2ORC: The Semantic Scholar Open Research Corpus. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics , pages 4969–4983, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main
-
[4]
URL https://aclanthology.org/2020.acl-main.447
work page 2020
-
[5]
Patrice Lopez. GROBID, February 2023. URL https://github.com/kermitt2/grobid. original-date: 2012-09-13T15:48:54Z
work page 2023
-
[6]
Full-Page Text Recognition: Learning Where to Start and When to Stop
Bastien Moysset, Christopher Kermorvant, and Christian Wolf. Full-Page Text Recognition: Learning Where to Start and When to Stop, April 2017. URL http://arxiv.org/abs/1704.08628. arXiv:1704.08628 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[7]
Scene Text Recognition with Permuted Autoregressive Sequence Models, July 2022
Darwin Bautista and Rowel Atienza. Scene Text Recognition with Permuted Autoregressive Sequence Models, July 2022. URL http://arxiv.org/abs/2207.06966. arXiv:2207.06966 [cs] version: 1
-
[8]
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models, September 2022
Minghao Li, Tengchao Lv, Jingye Chen, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, and Furu Wei. TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models, September 2022. URL http://arxiv.org/abs/2109.10282. arXiv:2109.10282 [cs]
-
[9]
Rethinking Text Line Recognition Models, April 2021
Daniel Hernandez Diaz, Siyang Qin, Reeve Ingle, Yasuhisa Fujii, and Alessandro Bissacco. Rethinking Text Line Recognition Models, April 2021. URL http://arxiv.org/abs/2104.07787. arXiv:2104.07787 [cs]
-
[10]
A new approach for recognizing handwritten mathematics using relational grammars and fuzzy sets
Scott MacLean and George Labahn. A new approach for recognizing handwritten mathematics using relational grammars and fuzzy sets. International Journal on Document Analysis and Recognition (IJDAR) , 16(2):139–163, June 2013. ISSN 1433-2825. doi: 10.1007/s10032-012-0184-x. URL https://doi.org/10.1007/s10032-012-0184-x
-
[11]
A global learning approach for an online handwritten mathematical expression recognition system
Ahmad-Montaser Awal, Harold Mouchre, and Christian Viard-Gaudin. A global learning approach for an online handwritten mathematical expression recognition system. Pattern Recognition Letters, 35(C):68–77, January
-
[12]
Francisco Álvaro, Joan-Andreu Sánchez, and José-Miguel Benedí. Recognition of on-line handwritten mathematical expressions using 2D stochastic context-free grammars and hidden Markov models. Pattern Recognition Letters, 35:58–67, January 2014. ISSN 0167-8655. doi: 10.1016/j.patrec.2012.09.023. URL https://www.sciencedirect.com/science/article/pii/...
-
[13]
ConvMath: A Convolutional Sequence Network for Mathematical Expression Recognition, December 2020
Zuoyu Yan, Xiaode Zhang, Liangcai Gao, Ke Yuan, and Zhi Tang. ConvMath: A Convolutional Sequence Network for Mathematical Expression Recognition, December 2020. URL http://arxiv.org/abs/2012.12619. arXiv:2012.12619 [cs]
-
[14]
Yuntian Deng, Anssi Kanervisto, Jeffrey Ling, and Alexander M. Rush. Image-to-Markup Generation with Coarse-to-Fine Attention, September 2016. URL http://arxiv.org/abs/1609.04938. arXiv:1609.04938 [cs] version: 1
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[15]
Anh Duc Le and Masaki Nakagawa. Training an End-to-End System for Handwritten Mathematical Expression Recognition by Generated Patterns. In 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), volume 01, pages 1056–1061, November 2017. doi: 10.1109/ICDAR.2017.175. ISSN: 2379-2140
-
[16]
Sumeet S. Singh. Teaching Machines to Code: Neural Markup Generation with Visual Attention, June 2018. URL http://arxiv.org/abs/1802.05415. arXiv:1802.05415 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[17]
Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition
Jianshu Zhang, Jun Du, and Lirong Dai. Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition, January 2018. URL http://arxiv.org/abs/1801.03530. arXiv:1801.03530 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[18]
Zelun Wang and Jyh-Charn Liu. Translating Math Formula Images to LaTeX Sequences Using Deep Neural Networks with Sequence-level Training, September 2019. URL http://arxiv.org/abs/1908.11415. arXiv:1908.11415 [cs, stat]
-
[19]
Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer, May 2021
Wenqi Zhao, Liangcai Gao, Zuoyu Yan, Shuai Peng, Lin Du, and Ziyin Zhang. Handwritten Mathematical Expression Recognition with Bidirectionally Trained Transformer, May 2021. URL http://arxiv.org/abs/2105.02412. arXiv:2105.02412 [cs]
-
[20]
Mahshad Mahdavi, Richard Zanibbi, Harold Mouchere, Christian Viard-Gaudin, and Utpal Garain. ICDAR 2019 CROHME + TFD: Competition on Recognition of Handwritten Mathematical Expressions and Typeset Formula Detection. In 2019 International Conference on Document Analysis and Recognition (ICDAR) , pages 1533–1538, Sydney, Australia, September 2019. IEEE. ISB...
-
[21]
pix2tex - LaTeX OCR, February 2023
Lukas Blecher. pix2tex - LaTeX OCR, February 2023. URL https://github.com/lukas-blecher/LaTeX-OCR. original-date: 2020-12-11T16:35:13Z
work page 2023
-
[22]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention Is All You Need, December 2017. URL http://arxiv.org/abs/1706.03762. arXiv:1706.03762 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[23]
LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, and Ming Zhou. LayoutLM: Pre-training of Text and Layout for Document Image Understanding. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining , pages 1192–1200, August 2020. doi: 10.1145/3394486.3403172. URL http://arxiv.org/abs/1912.13318. arXiv:1912...
-
[24]
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding, January 2022
Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, and Lidong Zhou. LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding, January 2022. URL http://arxiv.org/abs/2012.14740. arXiv:2012.14740 [cs]
-
[25]
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking, July 2022
Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, and Furu Wei. LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking, July 2022. URL http://arxiv.org/abs/2204.08387. arXiv:2204.08387 [cs]
-
[26]
Online publishing via pdf2htmlEX, 2013
Lu Wang and Wanmin Liu. Online publishing via pdf2htmlEX, 2013. URL https://www.tug.org/TUGboat/tb34-3/tb108wang.pdf
work page 2013
- [27]
-
[28]
Representation Learning for Information Extraction from Form-like Documents
Bodhisattwa Prasad Majumder, Navneet Potti, Sandeep Tata, James Bradley Wendt, Qi Zhao, and Marc Najork. Representation Learning for Information Extraction from Form-like Documents. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6495–6504, Online, July 2020. Association for Computational Linguistics. doi:...
-
[29]
OCR-free Document Understanding Transformer, October
Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, and Seunghyun Park. OCR-free Document Understanding Transformer, October
- [30]
-
[31]
End-to-end Document Recognition and Understanding with Dessurt, June 2022
Brian Davis, Bryan Morse, Bryan Price, Chris Tensmeyer, Curtis Wigington, and Vlad Morariu. End-to-end Document Recognition and Understanding with Dessurt, June 2022. URL http://arxiv.org/abs/2203.16618. arXiv:2203.16618 [cs]
-
[32]
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, August 2021. URL http://arxiv.org/abs/2103.14030. arXiv:2103.14030 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[33]
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, June 2021. URL http://arxiv.org/abs/2010.11929. arXiv:2010.11929 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[34]
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension, October 2019. URL http://arxiv.org/abs/1910.13461. arXiv:1910.13461 [cs, stat]
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[35]
Galactica: A Large Language Model for Science
Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, and Robert Stojnic. Galactica: A Large Language Model for Science, November 2022. URL http://arxiv.org/abs/2211.09085. arXiv:2211.09085 [cs, stat]
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[36]
Decoupled Weight Decay Regularization
Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization, January 2019. URL http://arxiv.org/abs/1711.05101. arXiv:1711.05101 [cs, math] version: 3
work page internal anchor Pith review Pith/arXiv arXiv 2019
-
[37]
P.Y. Simard, D. Steinkraus, and J.C. Platt. Best practices for convolutional neural networks applied to visual document analysis. In Seventh International Conference on Document Analysis and Recognition, 2003. Proceedings., volume 1, pages 958–963, Edinburgh, UK, 2003. IEEE Comput. Soc. ISBN 978-0-7695-1960-9. doi: 10.1109/ICDAR.2003.1227801. URL http:...
-
[38]
Albumentations: Fast and Flexible Image Augmentations
Alexander Buslaev, Vladimir I. Iglovikov, Eugene Khvedchenya, Alex Parinov, Mikhail Druzhinin, and Alexandr A. Kalinin. Albumentations: Fast and Flexible Image Augmentations. Information, 11(2):125, February 2020. ISSN 2078-2489. doi: 10.3390/info11020125. URL https://www.mdpi.com/2078-2489/11/2/125
-
[39]
OCR-IDL: OCR Annotations for Industry Document Library Dataset, February 2022
Ali Furkan Biten, Rubèn Tito, Lluis Gomez, Ernest Valveny, and Dimosthenis Karatzas. OCR-IDL: OCR Annotations for Industry Document Library Dataset, February 2022. URL http://arxiv.org/abs/2202.12985. arXiv:2202.12985 [cs]
-
[40]
PDFFigures 2.0: Mining Figures from Research Papers
Christopher Clark and Santosh Divvala. PDFFigures 2.0: Mining Figures from Research Papers. In Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, pages 143–152, Newark New Jersey USA, June 2016. ACM. ISBN 978-1-4503-4229-2. doi: 10.1145/2910896.2910904. URL https://dl.acm.org/doi/10.1145/2910896.2910904
-
[41]
V. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet physics. Doklady, 1965. URL https://www.semanticscholar.org/paper/Binary-codes-capable-of-correcting-deletions%2C-and-Levenshtein/b2f8876482c97e804bb50a5e2433881ae31d0cdd
work page 1965
-
[42]
Zellig S. Harris. Distributional Structure. WORD, 10(2-3):146–162, 1954. doi: 10.1080/00437956.1954.11659520. URL https://doi.org/10.1080/00437956.1954.11659520. Publisher: Routledge
- [43]
-
[44]
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics , pages 311–318, Philadelphia, Pennsylvania, USA, July 2002. Association for Computational Linguistics. doi: 10.3115/1073083.1073135. URL htt...
-
[45]
METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments
Satanjeev Banerjee and Alon Lavie. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization , pages 65–72, Ann Arbor, Michigan, June 2005. Association for Computational Linguistics. URL https:/...
work page 2005
-
[46]
The Curious Case of Neural Text Degeneration
Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The Curious Case of Neural Text Degeneration, February 2020. URL http://arxiv.org/abs/1904.09751. arXiv:1904.09751 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2020
-
[47]
Calculus
Herman W. (Herman William) March and Henry C. (Henry Charles) Wolff. Calculus. New York: McGraw-Hill,
-
[48]
URL http://archive.org/details/calculus00marciala
-
[49]
Kinetics and Thermodynamics in High-Temperature Gases, January 1970
Kinetics and Thermodynamics in High-Temperature Gases, January 1970. URL https://ntrs.nasa.gov/citations/19700022795. NTRS Report/Patent Number: N70-32106-116 NTRS Document ID: 19700022795 NTRS Research Center: Glenn Research Center (GRC)
work page 1970
-
[50]
Hierarchical Neural Story Generation
Angela Fan, Mike Lewis, and Yann Dauphin. Hierarchical Neural Story Generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 889–898, Melbourne, Australia, July 2018. Association for Computational Linguistics. doi: 10.18653/v1/P18-1082. URL https://aclanthology.org/P18-1082
-
[51]
Cycle-Consistency for Robust Visual Question Answering
Meet Shah, Xinlei Chen, Marcus Rohrbach, and Devi Parikh. Cycle-Consistency for Robust Visual Question Answering, February 2019. URL http://arxiv.org/abs/1902.05660. arXiv:1902.05660 [cs]
work page internal anchor Pith review Pith/arXiv arXiv 2019