Visual attention network
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
EVT improves Vision Transformers by using Euclidean distance decay for spatial priors and simpler grouping, achieving 86.6% top-1 accuracy on ImageNet-1k.
citing papers explorer
- A Large-Scale Study on the Accuracy vs Cost Trade-offs of Training and Evaluation Settings in Fine-Grained Image Recognition
Large-scale experiments demonstrate that data-aware augmentations applied only during training allow fine-grained image models to reach high accuracy without using discriminative crops at inference, lowering costs.
- Advancing Vision Transformer with Enhanced Spatial Priors
EVT improves Vision Transformers by using Euclidean distance decay for spatial priors and simpler grouping, achieving 86.6% top-1 accuracy on ImageNet-1k.