Learning to merge tokens in vision transformers
2 Pith papers cite this work.
Fields: cs.CV (2)
Verdicts: CONDITIONAL (2)
Citing papers
- FastVGGT: Training-Free Acceleration of Visual Geometry Transformer
  FastVGGT achieves a 4x speedup on VGGT for 1000-image inputs using training-free token merging tailored to 3D architectures, while also reducing error accumulation.
- Revisiting Token Compression for Accelerating ViT-based Sparse Multi-View 3D Object Detectors
  SEPatch3D makes ViT-based 3D object detectors up to 57% faster than StreamPETR via dynamic patch sizing and cross-granularity enhancement, while keeping comparable accuracy on nuScenes and Argoverse 2.