Framing visual text compression as measure transport decomposes encoding loss into precision and coverage costs, enabling a label-free routing rule that matches oracle performance on 17 of 24 NLP datasets while using 10% fewer tokens.
Vision-centric token compression in large language model.arXiv preprint arXiv:2502.00791, 2025
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 4verdicts
UNVERDICTED 4roles
background 1polarities
background 1representative citing papers
LoMo is a lightweight data curation technique that locally substitutes text with images in prompts to enforce cross-modal invariance, yielding 2.67-2.82 point gains over standard SFT on two VLMs across 13 benchmarks.
LensVLM trains VLMs to scan compressed rendered text images and selectively expand task-relevant regions, achieving 4.3x compression with near full-text accuracy and outperforming baselines up to 10.1x on text QA benchmarks.
MemOCR renders structured memory as images with adaptive visual density to improve long-horizon reasoning under tight context budgets.
citing papers explorer
-
MemOCR: Layout-Aware Visual Memory for Efficient Long-Horizon Reasoning
MemOCR renders structured memory as images with adaptive visual density to improve long-horizon reasoning under tight context budgets.