APT adaptively varies patch sizes within a single image to reduce ViT token count, delivering 40-50% throughput gains on large models with no downstream performance loss.
arXiv preprint arXiv:2110.09408
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (2)
representative citing papers
DPL-ReID adds dual prompt learning, real-world occlusion augmentation, and weighted gated fusion to CLIP for state-of-the-art occluded person re-identification on benchmark datasets.
citing papers explorer
- Accelerating Vision Transformers with Adaptive Patch Sizes
APT adaptively varies patch sizes within a single image to reduce ViT token count, delivering 40-50% throughput gains on large models with no downstream performance loss.
- Dual-Prompt CLIP with Hybrid Visual Encoders for Occluded Person Re-Identification
DPL-ReID adds dual prompt learning, real-world occlusion augmentation, and weighted gated fusion to CLIP for state-of-the-art occluded person re-identification on benchmark datasets.