Learning to merge tokens in vision transformers

Cedric Renggli, André Susano Pinto, Neil Houlsby, Basil Mustafa, Joan Puigcerver, Carlos Riquelme · 2022 · arXiv 2202.12015

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

FastVGGT: Training-Free Acceleration of Visual Geometry Transformer

cs.CV · 2025-09-02 · conditional · novelty 7.0

FastVGGT achieves 4x speedup on VGGT for 1000-image inputs using training-free token merging tailored to 3D architectures while reducing error accumulation.

Revisiting Token Compression for Accelerating ViT-based Sparse Multi-View 3D Object Detectors

cs.CV · 2026-04-16 · conditional · novelty 5.0

SEPatch3D makes ViT-based 3D object detectors up to 57% faster than StreamPETR through dynamic patch sizing and cross-granularity enhancement, while keeping comparable accuracy on nuScenes and Argoverse 2.

citing papers explorer

Showing 2 of 2 citing papers.

FastVGGT: Training-Free Acceleration of Visual Geometry Transformer
cs.CV · 2025-09-02 · conditional · none · ref 17
FastVGGT achieves 4x speedup on VGGT for 1000-image inputs using training-free token merging tailored to 3D architectures while reducing error accumulation.
Revisiting Token Compression for Accelerating ViT-based Sparse Multi-View 3D Object Detectors
cs.CV · 2026-04-16 · conditional · none · ref 37
SEPatch3D makes ViT-based 3D object detectors up to 57% faster than StreamPETR through dynamic patch sizing and cross-granularity enhancement, while keeping comparable accuracy on nuScenes and Argoverse 2.
