Token merging: Your ViT but faster

Daniel Bolya, Cheng-Yang Fu, Xiaoliang Dai, Peizhao Zhang, Christoph Feichtenhofer, Judy Hoffman · 2023

3 Pith papers cite this work.

representative citing papers

DORA: Dynamic Online Reinforcement Agent for Token Merging in Vision Transformers

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

DORA uses an online RL agent to adaptively merge tokens in Vision Transformers, reporting better accuracy-efficiency trade-offs than static baselines on ImageNet and out-of-distribution (OOD) evaluation sets.

LRCP: Low-Rank Compressibility Guided Visual Token Pruning for Efficient LVLMs

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

LRCP prunes visual tokens in LVLMs by scoring projection residuals onto a PCA-estimated low-rank subspace, achieving 88.9% image token reduction with 94.7% performance retention and 87.5% video token reduction with 97.8% accuracy retention.

ST-Prune: Training-Free Spatio-Temporal Token Pruning for Vision-Language Models in Autonomous Driving

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

ST-Prune is a training-free spatio-temporal token pruning framework for VLMs in autonomous driving that achieves near-lossless results at 90% token reduction by exploiting motion volatility, temporal recency, and multi-view geometry.
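The LRCP summary describes scoring visual tokens by how much of each token falls outside a low-rank subspace. The following is a minimal NumPy sketch of that scoring idea under stated assumptions, not LRCP's actual implementation. It estimates the subspace by PCA (an SVD of the mean-centered token matrix) over one image's tokens and treats the rank as a free parameter. It also assumes the tokens with the largest residual norms are kept because they are the least compressible. The function names `residual_scores` and `prune_tokens` are hypothetical.

```python
import numpy as np


def residual_scores(tokens: np.ndarray, rank: int) -> np.ndarray:
    """Hypothetical helper: score each token by the norm of its residual after
    projection onto a rank-`rank` PCA subspace estimated from the tokens.

    tokens: (num_tokens, dim) array of visual token embeddings.
    """
    # Center the tokens so the right singular vectors are principal directions.
    centered = tokens - tokens.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal principal directions, largest variance first.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:rank]  # (rank, dim)
    # Project onto the low-rank subspace and measure what is left over.
    projected = centered @ basis.T @ basis
    return np.linalg.norm(centered - projected, axis=1)


def prune_tokens(tokens: np.ndarray, keep_ratio: float, rank: int) -> np.ndarray:
    """Hypothetical helper: return sorted indices of the tokens to keep.

    Assumption: high-residual tokens are kept, since the low-rank subspace
    explains them poorly and they are therefore least redundant.
    """
    num_keep = max(1, int(round(keep_ratio * tokens.shape[0])))
    scores = residual_scores(tokens, rank)
    keep = np.argsort(-scores)[:num_keep]  # highest residuals first
    return np.sort(keep)  # restore the original token order
```

For example, `prune_tokens(tokens, keep_ratio=0.1, rank=8)` on a `(num_tokens, dim)` array returns the indices of roughly 10% of the tokens. Returning sorted indices keeps the surviving tokens in their original order, which matters if positional information is attached to them downstream.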
