arXiv preprint arXiv:1610.02242
8 Pith papers cite this work. Polarity classification is still indexing.
abstract
In this paper, we present a simple and efficient method for training deep neural networks in a semi-supervised setting where only a small portion of training data is labeled. We introduce self-ensembling, where we form a consensus prediction of the unknown labels using the outputs of the network-in-training on different epochs, and most importantly, under different regularization and input augmentation conditions. This ensemble prediction can be expected to be a better predictor for the unknown labels than the output of the network at the most recent training epoch, and can thus be used as a target for training. Using our method, we set new records for two standard semi-supervised learning benchmarks, reducing the (non-augmented) classification error rate from 18.44% to 7.05% in SVHN with 500 labels and from 18.63% to 16.55% in CIFAR-10 with 4000 labels, and further to 5.12% and 12.16% by enabling the standard augmentations. We additionally obtain a clear improvement in CIFAR-100 classification accuracy by using random images from the Tiny Images dataset as unlabeled extra inputs during training. Finally, we demonstrate good tolerance to incorrect labels.
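The self-ensembling idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the consensus prediction is accumulated, so the exponential moving average with startup bias correction used here is an assumption, as are the names `EnsembleTargets` and `consistency_loss`, the momentum value `alpha=0.6`, and the squared-error penalty. It is not presented as the authors' implementation.

```python
import numpy as np


class EnsembleTargets:
    """Running consensus of per-sample predictions across epochs.

    Assumption: the consensus is an exponential moving average (EMA);
    the abstract only says predictions from different epochs are combined.
    """

    def __init__(self, num_samples, num_classes, alpha=0.6):
        self.alpha = alpha  # hypothetical momentum value
        self.accum = np.zeros((num_samples, num_classes))
        self.epochs_seen = 0

    def update(self, epoch_predictions):
        """Fold one epoch of network outputs (samples x classes) into the ensemble."""
        self.accum = self.alpha * self.accum + (1.0 - self.alpha) * epoch_predictions
        self.epochs_seen += 1

    def targets(self):
        """Return bias-corrected consensus predictions to use as training targets."""
        if self.epochs_seen == 0:
            return self.accum.copy()
        # Divide out the startup bias caused by initializing the accumulator to zero.
        return self.accum / (1.0 - self.alpha ** self.epochs_seen)


def consistency_loss(current_predictions, ensemble_targets):
    """Mean squared difference between current outputs and the ensemble targets."""
    return float(np.mean((current_predictions - ensemble_targets) ** 2))
```

In a training loop, `update` would be called once per epoch with the network's outputs for every training sample, labeled or not, and `consistency_loss` would be added to the supervised loss computed on the labeled subset.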
citation-role summary
citation-polarity summary
roles
method 1
polarities
use method 1
representative citing papers
citing papers explorer
- Revisiting Shadow Detection from a Vision-Language Perspective
SVL uses language embeddings aligned with global image representations via shadow ratio regression and global-to-local coupling to improve shadow detection robustness in ambiguous cases.
- Supervised Learning Has a Necessary Geometric Blind Spot: Theory, Consequences, and Minimal Repair
Any supervised encoder must retain sensitivity along label-correlated directions, unifying non-robust features, texture bias, corruption fragility, and the robustness-accuracy tradeoff, and this is measurable and partially repairable via a new diagnostic and training term.
- Revisiting Feature Prediction for Learning Visual Representations from Video
V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.
- Text-Video Retrieval With Global-Local Contrastive Consistency Learning
GLCCL uses a Global-Local Interaction Module and Contrastive Score Consistency loss to align text and video semantics more efficiently than attention-based methods on MSR-VTT, DiDeMo, and VATEX.
- Information theoretic underpinning of self-supervised learning by clustering
SSL clustering is derived as KL-divergence optimization where a teacher-distribution constraint normalizes via inverse cluster priors and simplifies to batch centering by Jensen's inequality.
- ZScribbleSeg: A comprehensive segmentation framework with modeling of efficient annotation and maximization of scribble supervision
ZScribbleSeg maximizes scribble supervision with efficient annotation forms, spatial regularization, and EM-estimated class ratios to deliver competitive performance on six medical segmentation tasks without full labels.
- PEPL: Precision-Enhanced Pseudo-Labeling for Fine-Grained Image Classification in Semi-Supervised Learning
PEPL refines pseudo-labels via CAM-based semantic estimation in two phases to reach state-of-the-art accuracy in semi-supervised fine-grained image classification.
- Distill-2MD-MTL: Data Distillation based on Multi-Dataset Multi-Domain Multi-Task Frame Work to Solve Face Related Tasks, Multi Task Learning, Semi-Supervised Learning
Proposes Distill-2MD-MTL, an MTL-based data distillation framework for semi-supervised multi-domain face analysis tasks that claims better performance than single-task baselines.