APT adaptively varies patch sizes within a single image to reduce ViT token count, delivering 40-50% throughput gains on large models with no downstream performance loss.
Since implementations and experiments for ViT-L and ViT-H were not provided, we extended the code to include these two model configurations.
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.CV (1)
years: 2025 (1)
verdicts: CONDITIONAL (1)
representative citing papers
Accelerating Vision Transformers with Adaptive Patch Sizes