OmniParser V2 introduces Structured-Points-of-Thought (SPOT) prompting to build a single model for four visual text parsing tasks, reports competitive results on eight datasets, and shows that the same prompting also works within multimodal LLMs.
Visual understanding of complex table structures from document images,
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)

Representative citing papers:
- OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models