Learning transferable visual models from natural language supervision
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CV (2)
Years: 2026 (2)
citing papers explorer
- FastOCR: Dynamic Visual Fixation via KV Cache Pruning for Efficient Document Parsing
  FastOCR dynamically selects a small subset of visual tokens per decoding step using focal-guided pruning and cross-step reuse, retaining 98% accuracy on Qwen2.5-VL while attending to only 5% of tokens and cutting attention latency by 3x.
- AttnRouter: Per-Category Attention Routing for Training-Free Image Editing on MMDiT
  AttnRouter routes edits to optimal attention operations per category on MMDiT, raising the CLIP-T+DINO-I composite score 6.4% above baseline while an automatic classifier recovers 98% of the gain.