Unified Spatiotemporal Token Compression for Video-LLMs at Ultra-Low Retention

Junhao Du, Jialong Xue, Anqi Li, Jincheng Dai, Guo Lu · 2026 · arXiv 2603.21957

1 Pith paper cites this work. Polarity classification is still being indexed.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

OTT-Vid: Optimal Transport Temporal Token Compression for Video Large Language Models

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

OTT-Vid uses optimal transport with non-uniform token mass and locality-aware costs to dynamically allocate compression budgets across video frames, retaining 95.8% VQA and 73.9% VTG performance at 10% token retention.

citing papers explorer

Showing 1 of 1 citing paper.

OTT-Vid: Optimal Transport Temporal Token Compression for Video Large Language Models cs.CV · 2026-05-12 · unverdicted · none · ref 10
