
Unsupervised Representation Learning by Predicting Image Rotations

Gidaris, S · 2018 · cs.CV · arXiv 1803.07728

17 Pith papers cite this work.



abstract

Over the last few years, deep convolutional neural networks (ConvNets) have transformed the field of computer vision thanks to their unparalleled capacity to learn high-level semantic image features. However, in order to successfully learn those features, they usually require massive amounts of manually labeled data, which is both expensive and impractical to scale. Therefore, unsupervised semantic feature learning, i.e., learning without requiring manual annotation effort, is of crucial importance in order to successfully harvest the vast amount of visual data that are available today. In our work we propose to learn image features by training ConvNets to recognize the 2D rotation applied to the image they receive as input. We demonstrate both qualitatively and quantitatively that this apparently simple task actually provides a very powerful supervisory signal for semantic feature learning. We exhaustively evaluate our method on various unsupervised feature learning benchmarks and achieve state-of-the-art performance on all of them. Specifically, our results on those benchmarks demonstrate dramatic improvements over prior state-of-the-art approaches in unsupervised representation learning and thus significantly close the gap with supervised feature learning. For instance, on the PASCAL VOC 2007 detection task our unsupervised pre-trained AlexNet model achieves the state-of-the-art (among unsupervised methods) mAP of 54.4%, only 2.4 points lower than the supervised case. We get similarly striking results when we transfer our unsupervised features to various other tasks, such as ImageNet classification, PASCAL classification, PASCAL segmentation, and CIFAR-10 classification. The code and models of our paper will be published at: https://github.com/gidariss/FeatureLearningRotNet .
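The rotation-recognition pretext task described in the abstract can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' released code: it assumes the four rotations used in the paper (0°, 90°, 180°, 270°, labeled 0 to 3), represents an image as a hypothetical list-of-rows grayscale array, and leaves the ConvNet itself out. The helper names `rot90_ccw`, `make_rotation_batch`, and `rotation_loss` are hypothetical; the loss is the cross-entropy of a 4-way rotation classifier averaged over the four rotated copies of one image.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical image representation: H x W grayscale, stored as a list of rows.
Image = List[List[float]]


def rot90_ccw(img: Image) -> Image:
    """Rotate a 2D image by 90 degrees counterclockwise (same result as numpy.rot90 with k=1)."""
    return [list(row) for row in zip(*img)][::-1]


def make_rotation_batch(img: Image) -> List[Tuple[Image, int]]:
    """Build the self-supervised batch: the image rotated by y * 90 degrees, paired with label y."""
    batch = []
    current = [list(row) for row in img]  # copy so the caller's image is untouched
    for label in range(4):
        batch.append((current, label))
        current = rot90_ccw(current)
    return batch


def rotation_loss(logits_per_rotation: Sequence[Sequence[float]]) -> float:
    """Average cross-entropy over the four rotated copies.

    Row y holds the 4 logits a (hypothetical) classifier produced for the copy
    rotated by y * 90 degrees; the target class for that row is y.
    """
    total = 0.0
    for y, logits in enumerate(logits_per_rotation):
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[y]  # equals -log softmax(logits)[y]
    return total / len(logits_per_rotation)


if __name__ == "__main__":
    for rotated, label in make_rotation_batch([[1, 2], [3, 4]]):
        print(label, rotated)
    # A classifier that cannot tell rotations apart outputs equal logits: loss = log(4).
    print(rotation_loss([[0.0] * 4 for _ in range(4)]))
```

Because the labels are generated by the rotation itself, no manual annotation is required; in practice the logits would come from a ConvNet fed each rotated copy, and the network's intermediate features are what get transferred to downstream tasks.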


citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

REMAP: Regularized Matching and Partial Alignment of Video Embeddings

cs.CV · 2025-09-29 · unverdicted · novelty 7.0

REMAP applies regularized fused partial Gromov-Wasserstein optimal transport to align video embeddings for unsupervised procedure learning on noisy instructional videos.

A Simple Framework for Contrastive Learning of Visual Representations

cs.LG · 2020-02-13 · accept · novelty 7.0

SimCLR learns visual representations by contrasting augmented views of the same image and reaches 76.5% ImageNet top-1 accuracy with a linear classifier, matching a supervised ResNet-50.

Multi-hop Relational Contrastive Learning: Extending Spatial Contrastive Pre-training Beyond Pairwise Relations

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

MRCL extends pairwise spatial contrastive pre-training to multi-hop paths in scene graphs, yielding NDCG@5 = 0.748 on GQA graph retrieval and gains on spatial recognition and QA tasks.

MU-SHOT-Fi: Self-Supervised Multi-User Wi-Fi Sensing with Source-free Unsupervised Domain Adaptation

eess.SP · 2026-05-02 · unverdicted · novelty 6.0 · 2 refs

MU-SHOT-Fi is a source-free UDA framework for multi-user Wi-Fi human activity recognition (HAR) that uses permutation-invariant set prediction, occupancy-weighted information maximization, and binary rotation prediction to handle domain shifts.

gen2seg: Generative Models Enable Generalizable Instance Segmentation

cs.CV · 2025-05-21 · unverdicted · novelty 6.0

Fine-tuning generative models on limited instance segmentation data produces zero-shot generalization to unseen object categories and styles, matching or exceeding supervised baselines like SAM on ambiguous boundaries.

Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection

cs.CV · 2024-11-23 · unverdicted · novelty 6.0

Orthogonal subspace decomposition via SVD on vision foundation model features preserves high-rank pre-trained knowledge by freezing principal components and adapting residuals, reducing overfitting for better generalization in AI-generated image detection.

Vector-quantized Image Modeling with Improved VQGAN

cs.CV · 2021-10-09 · accept · novelty 6.0

Improved ViT-VQGAN enables autoregressive Transformer pretraining on ImageNet tokens to reach IS 175.1 and FID 4.17 for generation plus 73.2% linear-probe accuracy, beating prior iGPT models.

Multi-task Self-Supervised Learning for Human Activity Detection

cs.LG · 2019-07-27 · unverdicted · novelty 6.0

A multi-task self-supervised approach trains a temporal CNN to detect transformations on sensory data, yielding features that match or exceed fully supervised performance in semi-supervised and transfer settings for smartphone-based HAR.

Self-supervised pretraining for an iterative image size agnostic vision transformer

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

A sequential-to-global SSL method based on DINO pretrains iterative foveal-inspired vision transformers to achieve competitive ImageNet-1K performance with constant compute regardless of input resolution.

Probing Intrinsic Medical Task Relationships: A Contrastive Learning Perspective

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

TaCo contrastively embeds semantic, generative, and transformation tasks from medical imaging into a joint space to reveal which tasks cluster, blend, or remain distinct.

A Dialogue between Causal and Traditional Representation Learning: Toward Mutual Benefits in a Unified Formulation

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

The paper introduces a unified formulation for representation learning with task and constraint components, arguing for mutual benefits between causal and traditional approaches and showing via experiments that causal constraint effectiveness depends on paired tasks.

Temporal Aware Pruning for Efficient Diffusion-based Video Generation

cs.CV · 2026-05-18 · unverdicted · novelty 5.0 · 2 refs

TAPE applies temporal-aware token pruning with smoothing, reselection, and timestep scheduling to speed up video diffusion models while preserving visual fidelity and coherence.

On the Power of Foundation Models

cs.AI · 2022-11-29 · unverdicted · novelty 5.0

Using category theory, the paper proves that prompt-based learning on perfect foundation models works only for representable tasks, that fine-tuning solves tasks in the pretext category, and that models can represent unseen target-category objects using source-category structure.

Information theoretic underpinning of self-supervised learning by clustering

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

Self-supervised clustering is derived as a KL-divergence optimization in which a teacher-distribution constraint normalizes via inverse cluster priors and, by Jensen's inequality, simplifies to batch centering.

MAE-SAM2: Mask Autoencoder-Enhanced SAM2 for Clinical Retinal Vascular Leakage Segmentation

q-bio.TO · 2025-09-09 · unverdicted · novelty 4.0

MAE-SAM2 integrates MAE self-supervised learning with SAM2 to achieve superior segmentation of retinal vascular leakage on fluorescein angiography images, with the highest Dice/IoU scores and a 5% improvement over the original SAM2.

Accurate and Robust Pulmonary Nodule Detection by 3D Feature Pyramid Network with Self-supervised Feature Learning

eess.IV · 2019-07-25 · unverdicted · novelty 4.0

A 3DFPN with self-supervised pretraining and HS2 false-positive reduction using location history images reaches 90.6% sensitivity at 0.125 FP/scan on LUNA16, which the authors claim is 15.8% above prior results.

From pre-training to downstream performance: Does domain-specific pre-training make sense?

cs.CV · 2026-05-09 · unverdicted · novelty 4.0

Pre-training on modality-matched data significantly improves downstream performance in medical imaging models while self-supervised learning benefits depend on context.

