The Character Error Vector is a decomposable bag-of-characters evaluator for page-level OCR that remains defined under parsing errors and bridges parsing metrics with local CER.
Pp-doclayout: A unified document layout detection model to accelerate large-scale data construction.arXiv preprint arXiv:2503.17213, 2025
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 6representative citing papers
A parser-oriented refinement stage performs set-level reasoning on detector hypotheses to jointly decide instance retention, refine boxes, and set parser input order, cutting reading order errors to 0.024 on OmniDocBench.
PaddleOCR-VL uses a Valid Region Focus Module to select key visual tokens and a 0.9B model for guided recognition, delivering SOTA document parsing with far fewer tokens and parameters.
DeepSeek-OCR compresses text contexts up to 20x via 2D optical mapping while achieving 97% OCR accuracy below 10x and 60% at 20x, outperforming prior OCR tools with fewer vision tokens.
PaddleOCR-VL-1.5 is a 0.9B VLM achieving 94.5% SOTA accuracy on OmniDocBench v1.5, with added robustness to physical distortions and support for seal recognition plus text spotting.
PaddleOCR 3.0 releases compact open-source models for OCR, document structure parsing, and information extraction that rival billion-parameter VLMs.
citing papers explorer
-
The Character Error Vector: Decomposable errors for page-level OCR evaluation
The Character Error Vector is a decomposable bag-of-characters evaluator for page-level OCR that remains defined under parsing errors and bridges parsing metrics with local CER.
-
Parser-Oriented Structural Refinement for a Stable Layout Interface in Document Parsing
A parser-oriented refinement stage performs set-level reasoning on detector hypotheses to jointly decide instance retention, refine boxes, and set parser input order, cutting reading order errors to 0.024 on OmniDocBench.
-
Boosting Document Parsing Efficiency and Performance with Coarse-to-Fine Visual Processing
PaddleOCR-VL uses a Valid Region Focus Module to select key visual tokens and a 0.9B model for guided recognition, delivering SOTA document parsing with far fewer tokens and parameters.
-
DeepSeek-OCR: Contexts Optical Compression
DeepSeek-OCR compresses text contexts up to 20x via 2D optical mapping while achieving 97% OCR accuracy below 10x and 60% at 20x, outperforming prior OCR tools with fewer vision tokens.
-
PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing
PaddleOCR-VL-1.5 is a 0.9B VLM achieving 94.5% SOTA accuracy on OmniDocBench v1.5, with added robustness to physical distortions and support for seal recognition plus text spotting.
-
PaddleOCR 3.0 Technical Report
PaddleOCR 3.0 releases compact open-source models for OCR, document structure parsing, and information extraction that rival billion-parameter VLMs.