MinerU2.5 uses a two-stage decoupled vision-language architecture to achieve state-of-the-art document parsing accuracy with lower computational overhead than existing general and domain-specific models.
Yolov10: Real-time end-to-end object detection.Advances in Neural Information Processing Systems, 37:107984–108011, 2024
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
MinerU2.5 uses a two-stage decoupled vision-language architecture to achieve state-of-the-art document parsing accuracy with lower computational overhead than existing general and domain-specific models.