Nüwa: Mending the spatial integrity torn by VLM token pruning. arXiv preprint arXiv:2602.02951
Two Pith papers cite this work. Polarity classification is still being indexed.
Citation summary:
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Roles: background (1)
Polarities: background (1)
Citing papers:
- OTT-Vid: Optimal Transport Temporal Token Compression for Video Large Language Models
  OTT-Vid uses optimal transport with non-uniform token mass and locality-aware costs to allocate compression budgets dynamically across video frames, retaining 95.8% of VQA and 73.9% of VTG performance while keeping only 10% of tokens.
- RTPrune: Reading-Twice Inspired Token Pruning for Efficient DeepSeek-OCR Inference
  RTPrune introduces a two-stage token pruning technique for DeepSeek-OCR, inspired by reading a page twice; it retains 84.25% of tokens while delivering 99.47% accuracy and 1.23× faster prefill on OmniDocBench.
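The OTT-Vid summary names three ingredients: optimal transport between frames, non-uniform token mass, and a locality-aware cost that drives per-frame budgets. The following is only a minimal, hypothetical sketch of how those pieces can fit together, not OTT-Vid's actual algorithm. Every concrete choice is an assumption made for illustration: entropic OT solved with Sinkhorn iterations, token mass taken from per-token saliency scores, a cost of cosine feature distance plus `lam` times squared spatial displacement, and budgets set in proportion to each frame's transport cost from its predecessor (frames the previous frame "explains" poorly get more tokens). The helper names `normalize`, `sinkhorn`, `locality_cost`, and `allocate_budgets` are hypothetical.

```python
import math

def normalize(weights):
    """Turn non-negative token saliencies into a mass vector summing to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def sinkhorn(a, b, cost, eps=0.5, iters=500):
    """Entropic OT plan between mass vectors a and b (each summing to 1)."""
    n, m = len(a), len(b)
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def locality_cost(feats_a, pos_a, feats_b, pos_b, lam=0.5):
    """Hypothetical locality-aware cost: cosine distance between token
    features plus lam times the squared distance between token positions."""
    def cos_dist(x, y):
        dot = sum(p * q for p, q in zip(x, y))
        norm = math.sqrt(sum(p * p for p in x) * sum(q * q for q in y))
        return 1.0 - dot / norm  # assumes non-zero feature vectors
    return [[cos_dist(fa, fb) + lam * ((pa[0] - pb[0]) ** 2 + (pa[1] - pb[1]) ** 2)
             for fb, pb in zip(feats_b, pos_b)]
            for fa, pa in zip(feats_a, pos_a)]

def allocate_budgets(frames, total_tokens, eps=0.5):
    """frames: list of (features, positions, saliencies), one tuple per frame.
    Returns integer per-frame token budgets that sum to total_tokens."""
    novelty = []
    for prev, cur in zip(frames, frames[1:]):
        C = locality_cost(prev[0], prev[1], cur[0], cur[1])
        P = sinkhorn(normalize(prev[2]), normalize(cur[2]), C, eps)
        # Transport cost: how poorly the previous frame "explains" this one.
        novelty.append(sum(p * c for prow, crow in zip(P, C) for p, c in zip(prow, crow)))
    # Hypothetical choice: frame 0 has no predecessor, so give it the maximum novelty.
    novelty.insert(0, max(novelty) if novelty else 1.0)
    if sum(novelty) <= 0:
        novelty = [1.0] * len(novelty)
    shares = [total_tokens * x / sum(novelty) for x in novelty]
    budgets = [math.floor(s) for s in shares]
    # Largest-remainder rounding so the budgets sum exactly to total_tokens.
    order = sorted(range(len(shares)), key=lambda t: shares[t] - budgets[t], reverse=True)
    for t in order[: total_tokens - sum(budgets)]:
        budgets[t] += 1
    return budgets
```

For example, given two identical two-token frames followed by a frame whose token features are negated, `allocate_budgets(frames, 10)` returns `[5, 0, 5]`: the redundant middle frame receives nothing and the budget is split between the first frame and the changed last frame. A practical scheme would likely also enforce a per-frame minimum budget, which this sketch omits.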