What is wrong with scene text recognition model comparisons? Dataset and model analysis
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
citing papers explorer
- CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding
  Multimodal LLMs process code as images to achieve up to 8x token compression; visual cues such as syntax highlighting aid downstream tasks, and clone detection remains resilient or even improves under compression.
- ICDAR2019 Robust Reading Challenge on Multi-lingual Scene Text Detection and Recognition -- RRC-MLT-2019
  The RRC-MLT-2019 report describes an expanded multi-lingual scene text challenge with new tasks, a 20k-image real dataset, synthetic data, and competition outcomes from 60 submissions.