An image is worth 1/2 tokens after layer 2: Plug-and-play inference acceleration for large vision-language models

Liang Chen, Haozhe Zhao, Tianyu Liu, Shuai Bai, Junyang Lin, Chang Zhou, Baobao Chang · 2024

2 Pith papers cite this work. Polarity classification is still being indexed.


citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

OTT-Vid: Optimal Transport Temporal Token Compression for Video Large Language Models

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

OTT-Vid uses optimal transport with non-uniform token mass and locality-aware costs to dynamically allocate compression budgets across video frames, retaining 95.8% VQA and 73.9% VTG performance at 10% token retention.

FastOCR: Dynamic Visual Fixation via KV Cache Pruning for Efficient Document Parsing

cs.CV · 2026-05-17 · unverdicted · novelty 5.0

FastOCR dynamically selects a small subset of visual tokens per decoding step using focal-guided pruning and cross-step reuse, retaining 98% accuracy on Qwen2.5-VL while attending to only 5% of tokens and cutting attention latency by 3x.

citing papers explorer


OTT-Vid: Optimal Transport Temporal Token Compression for Video Large Language Models · cs.CV · 2026-05-12 · unverdicted · none · ref 5
FastOCR: Dynamic Visual Fixation via KV Cache Pruning for Efficient Document Parsing · cs.CV · 2026-05-17 · unverdicted · none · ref 6
