Injecting pre-computed layout priors from RT-DETR into VLM prompts raises markdown F1 from 0.37 to 0.92 on a 10k-page OOD benchmark and cuts infinite-loop failures across domains.
arXiv preprint arXiv:2012.14740 , year=
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
other 1polarities
unclear 1representative citing papers
DocAtlas introduces model-free rendering pipelines to create DocTag-annotated datasets across 82 languages and shows DPO adaptation improves multilingual performance without base-language degradation.
DocVAL transfers spatial reasoning via validated CoT distillation from large teachers to compact student VLMs, delivering up to 6-7 ANLS gains and strong mAP localization on document VQA benchmarks.
OutSafe-Bench supplies the first large-scale four-modality safety dataset and evaluation framework that exposes persistent unsafe outputs in nine leading multimodal LLMs.
Nougat applies a visual transformer to convert academic PDFs into markup language while accurately handling mathematical content on a new scientific document dataset.
Survey proposing a taxonomy for document parsing into pipeline-based systems and VLM-driven unified models, reviewing components, metrics, benchmarks, and challenges.
citing papers explorer
-
Structured Layout Priors for Robust Out-of-Distribution Visual Document Understanding
Injecting pre-computed layout priors from RT-DETR into VLM prompts raises markdown F1 from 0.37 to 0.92 on a 10k-page OOD benchmark and cuts infinite-loop failures across domains.
-
DocAtlas: Multilingual Document Understanding Across 80+ Languages
DocAtlas introduces model-free rendering pipelines to create DocTag-annotated datasets across 82 languages and shows DPO adaptation improves multilingual performance without base-language degradation.
-
DocVAL: Validated Chain-of-Thought Distillation for Grounded Document VQA
DocVAL transfers spatial reasoning via validated CoT distillation from large teachers to compact student VLMs, delivering up to 6-7 ANLS gains and strong mAP localization on document VQA benchmarks.
-
OutSafe-Bench: A Benchmark for Multimodal Offensive Content Detection in Large Language Models
OutSafe-Bench supplies the first large-scale four-modality safety dataset and evaluation framework that exposes persistent unsafe outputs in nine leading multimodal LLMs.
-
Nougat: Neural Optical Understanding for Academic Documents
Nougat applies a visual transformer to convert academic PDFs into markup language while accurately handling mathematical content on a new scientific document dataset.
-
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
Survey proposing a taxonomy for document parsing into pipeline-based systems and VLM-driven unified models, reviewing components, metrics, benchmarks, and challenges.