TextTeacher uses frozen text embeddings from captions as semantic anchors to guide vision model training, improving ImageNet accuracy by up to 2.7 p.p. and transfer performance by 1.0 p.p. on average.
In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1
polarities
background 1
representative citing papers
NTGA is the first clean-label generalization attack under black-box settings but is vulnerable to adversarial training and image transformations, with newer attacks outperforming it.
citing papers explorer
- TextTeacher: What Can Language Teach About Images?
  TextTeacher uses frozen text embeddings from captions as semantic anchors to guide vision model training, improving ImageNet accuracy by up to 2.7 p.p. and transfer performance by 1.0 p.p. on average.
- VISTA: Variance-Gated Inter-Sequence Test-Time Adaptation for Multi-Sequence MRI Segmentation
  VISTA is a test-time adaptation framework for multi-sequence MRI that uses inter-sequence intervention probes and cross-view disagreement variance to gate self-training, yielding Dice gains of +1.89% on low-field African data and +2.82% on pediatric data over the source model.
- Inducing Spatial Locality in Vision Transformers through the Training Protocol
  CutMix augmentation during training induces spatial locality in early layers of Vision Transformers trained from scratch, as measured by reduced Mean Attention Distance.
- Controllable Histopathology Image Synthesis with Training-free Structural Initialization and Textural Modulation
  CHIS steers pretrained diffusion models to generate histopathology images aligned with input structural masks via frequency-domain structural initialization and wavelet-based textural modulation without any training on annotated data.
- SignNet-1M: Large-Scale Multilingual Sign Language Video Dataset with Downstream Benchmarks
  The paper releases SignNet-1M, a 1M-scale augmented dataset for ASL, CSL and DGS with 3DGS and diffusion-based variations, plus benchmarks showing improved cross-shift generalization.
- Weak-to-Strong Knowledge Distillation Accelerates Visual Learning
  Weak-to-strong knowledge distillation applied early and then turned off accelerates convergence to target performance in visual learning tasks by factors of 1.7-4.8x.
- Layer-Guided UAV Tracking: Enhancing Efficiency and Occlusion Robustness
  LGTrack achieves 258.7 FPS real-time UAV tracking with 82.8% precision on UAVDT by combining dynamic layer selection, Global-Grouped Coordinate Attention, and Similarity-Guided Layer Adaptation.
- Nonlinear Transformations Against Unlearnable Datasets
  Nonlinear transformations enable DNNs to achieve substantial test accuracy gains (0.34% to 249.59%) on unlearnable CIFAR10 datasets from twelve protection methods, outperforming a recent linear baseline.