An OCR-aware multilingual framework combining synthetic data generation, LoRA SFT, and visual CoT prompting improves text extraction and translation robustness in multimodal LLMs on degraded images.
However, these approaches are often computationally expensive and difficult to deploy efficiently at scale
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Multilingual OCR-Aware Fine-Tuning and Prompt-Guided Chain-of-Thought Reasoning for Multimodal Large Language Models
An OCR-aware multilingual framework combining synthetic data generation, LoRA SFT, and visual CoT prompting improves text extraction and translation robustness in multimodal LLMs on degraded images.