MPDocBench-Parse provides 433 annotated multi-page documents and an evaluation protocol covering text/table/formula extraction, merging, figure extraction, reading order, and heading hierarchy for realistic document parsing.
Image-to-markup generation with coarse-to-fine attention
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
SciTikZer-8B uses a new dataset, benchmark, and dual self-consistency RL to generate TikZ code for scientific graphics, outperforming much larger models like Gemini-2.5-Pro.
citing papers explorer
-
MPDocBench-Parse: Benchmarking Practical Multi-page Document Parsing
MPDocBench-Parse provides 433 annotated multi-page documents and an evaluation protocol covering text/table/formula extraction, merging, figure extraction, reading order, and heading hierarchy for realistic document parsing.
-
Scientific Graphics Program Synthesis via Dual Self-Consistency Reinforcement Learning
SciTikZer-8B uses a new dataset, benchmark, and dual self-consistency RL to generate TikZ code for scientific graphics, outperforming much larger models like Gemini-2.5-Pro.