LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Xu Yiheng et al. · 2020 · arXiv 4486.340317

4 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Structured Layout Priors for Robust Out-of-Distribution Visual Document Understanding

cs.CV · 2026-05-19 · conditional · novelty 7.0

Injecting pre-computed layout priors from RT-DETR into VLM prompts raises markdown F1 from 0.37 to 0.92 on a 10k-page OOD benchmark and cuts infinite-loop failures across domains.

CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding

cs.CL · 2026-02-02 · unverdicted · novelty 7.0

Multimodal LLMs process code as images to achieve up to 8x token compression, with visual cues like syntax highlighting aiding tasks and clone detection remaining resilient or even improving under compression.

Nougat: Neural Optical Understanding for Academic Documents

cs.LG · 2023-08-25 · conditional · novelty 6.0

Nougat applies a visual transformer to convert academic PDFs into markup language while accurately handling mathematical content on a new scientific document dataset.

Bad Seeing or Bad Thinking? Rewarding Perception for Multimodal Reasoning

cs.AI · 2026-05-13

citing papers explorer

Showing 4 of 4 citing papers.

Structured Layout Priors for Robust Out-of-Distribution Visual Document Understanding
cs.CV · 2026-05-19 · conditional · none · ref 22
Injecting pre-computed layout priors from RT-DETR into VLM prompts raises markdown F1 from 0.37 to 0.92 on a 10k-page OOD benchmark and cuts infinite-loop failures across domains.

CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding
cs.CL · 2026-02-02 · unverdicted · none · ref 101
Multimodal LLMs process code as images to achieve up to 8x token compression, with visual cues like syntax highlighting aiding tasks and clone detection remaining resilient or even improving under compression.

Nougat: Neural Optical Understanding for Academic Documents
cs.LG · 2023-08-25 · conditional · none · ref 23
Nougat applies a visual transformer to convert academic PDFs into markup language while accurately handling mathematical content on a new scientific document dataset.

Bad Seeing or Bad Thinking? Rewarding Perception for Multimodal Reasoning
cs.AI · 2026-05-13 · unreviewed · ref 89
