Towards context-robust LLMs: A gated representation fine-tuning approach. arXiv preprint arXiv:2502.14100, 2025.
1 Pith paper cites this work.
Citing papers by field: cs.CV (1)
Citing papers by year: 2026 (1)
Citing papers by verdict: unverdicted (1)
Representative citing papers:
Magnifying What Matters: Attention-Guided Adaptive Rendering for Visual Text Comprehension
AGAR uses middle-to-late layer attention in VLMs to identify and enlarge important word spans in rendered text images, improving performance on visual text comprehension benchmarks.