Visual perception in text strings

2025 · arXiv 2410.01733

2 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary: background (1)

citation-polarity summary: background (1)

representative citing papers

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

cs.CV · 2024-12-31 · accept · novelty 7.0 · ref 134

OCRBench v2 is a new benchmark with four times more tasks than prior versions that reveals most large multimodal models score below 50 out of 100 on visual text tasks and share five specific weaknesses.

ASCII Art Turns LLMs into VLA Controllers

cs.RO · 2026-06-19 · unverdicted · novelty 6.0

ASCII rendering of visual states enables fine-tuned text-only LLMs to serve as VLA controllers that identify objects and generate feasible action sequences in 2D manipulation benchmarks in simulation and on hardware.
