arXiv preprint arXiv:2012.14740

Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, et al. · 2020 · arXiv 2012.14740

6 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

other 1

citation-polarity summary

unclear 1

representative citing papers

Structured Layout Priors for Robust Out-of-Distribution Visual Document Understanding

cs.CV · 2026-05-19 · conditional · novelty 7.0

Injecting pre-computed layout priors from RT-DETR into VLM prompts raises markdown F1 from 0.37 to 0.92 on a 10k-page OOD benchmark and cuts infinite-loop failures across domains.

DocAtlas: Multilingual Document Understanding Across 80+ Languages

cs.CL · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

DocAtlas introduces model-free rendering pipelines to create DocTag-annotated datasets across 82 languages and shows DPO adaptation improves multilingual performance without base-language degradation.

DocVAL: Validated Chain-of-Thought Distillation for Grounded Document VQA

cs.CV · 2025-11-27 · unverdicted · novelty 6.0

DocVAL transfers spatial reasoning via validated CoT distillation from large teachers to compact student VLMs, delivering up to 6-7 ANLS gains and strong mAP localization on document VQA benchmarks.

OutSafe-Bench: A Benchmark for Multimodal Offensive Content Detection in Large Language Models

cs.LG · 2025-11-13 · unverdicted · novelty 6.0

OutSafe-Bench supplies the first large-scale four-modality safety dataset and evaluation framework that exposes persistent unsafe outputs in nine leading multimodal LLMs.

Nougat: Neural Optical Understanding for Academic Documents

cs.LG · 2023-08-25 · conditional · novelty 6.0

Nougat applies a visual transformer to convert academic PDFs into markup language while accurately handling mathematical content on a new scientific document dataset.

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

cs.MM · 2024-10-28 · unverdicted · novelty 3.0

Survey proposing a taxonomy for document parsing into pipeline-based systems and VLM-driven unified models, reviewing components, metrics, benchmarks, and challenges.

citing papers explorer

Showing 6 of 6 citing papers.

Structured Layout Priors for Robust Out-of-Distribution Visual Document Understanding
cs.CV · 2026-05-19 · conditional · none · ref 23
Injecting pre-computed layout priors from RT-DETR into VLM prompts raises markdown F1 from 0.37 to 0.92 on a 10k-page OOD benchmark and cuts infinite-loop failures across domains.

DocAtlas: Multilingual Document Understanding Across 80+ Languages
cs.CL · 2026-05-12 · unverdicted · none · ref 34 · 2 links
DocAtlas introduces model-free rendering pipelines to create DocTag-annotated datasets across 82 languages and shows DPO adaptation improves multilingual performance without base-language degradation.

DocVAL: Validated Chain-of-Thought Distillation for Grounded Document VQA
cs.CV · 2025-11-27 · unverdicted · none · ref 32
DocVAL transfers spatial reasoning via validated CoT distillation from large teachers to compact student VLMs, delivering up to 6-7 ANLS gains and strong mAP localization on document VQA benchmarks.

OutSafe-Bench: A Benchmark for Multimodal Offensive Content Detection in Large Language Models
cs.LG · 2025-11-13 · unverdicted · none · ref 62
OutSafe-Bench supplies the first large-scale four-modality safety dataset and evaluation framework that exposes persistent unsafe outputs in nine leading multimodal LLMs.

Nougat: Neural Optical Understanding for Academic Documents
cs.LG · 2023-08-25 · conditional · none · ref 24
Nougat applies a visual transformer to convert academic PDFs into markup language while accurately handling mathematical content on a new scientific document dataset.

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
cs.MM · 2024-10-28 · unverdicted · none · ref 271
Survey proposing a taxonomy for document parsing into pipeline-based systems and VLM-driven unified models, reviewing components, metrics, benchmarks, and challenges.
