arXiv: 2105.04906 · v3 · submitted 2021-05-11 · cs.CV · cs.AI · cs.LG

VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

Adrien Bardes, Jean Ponce, Yann LeCun

Pith reviewed 2026-05-15 15:25 UTC · model grok-4.3

classification cs.CV · cs.AI · cs.LG

keywords self-supervised learning · representation learning · variance regularization · covariance regularization · embedding collapse · image representations · VICReg

The pith

VICReg prevents collapse to constant embeddings in self-supervised learning by adding an explicit variance term per dimension plus covariance regularization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces VICReg to address the collapse problem where self-supervised encoders output constant vectors with no useful information. It does so through a loss that forces each embedding dimension to maintain variance above a threshold, while an invariance term aligns representations of different views of the same image and a covariance term reduces redundancy across dimensions. A reader would care because this supplies a simple, explicit alternative to the hidden architectural biases used in prior methods. The approach matches state-of-the-art accuracy on downstream image tasks and improves stability when added to other self-supervised techniques.

Core claim

VICReg combines a variance regularization term that keeps the standard deviation of each embedding dimension above a fixed threshold, an invariance term that makes embeddings from different augmented views similar, and a covariance term that decorrelates dimensions, thereby avoiding the trivial constant solution and achieving results on par with the state of the art on several downstream tasks.

What carries the argument

The variance regularization term, which penalizes any embedding dimension whose standard deviation falls below a chosen threshold, combined with covariance regularization for decorrelation.
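A minimal sketch of such a hinge penalty, assuming NumPy and a batch of embeddings stored as an (n, d) array. The function name `variance_term`, the stability constant `eps`, and the averaging over dimensions are illustrative assumptions rather than details stated on this page.

```python
import numpy as np

def variance_term(z, gamma=1.0, eps=1e-4):
    """Hinge penalty on the per-dimension standard deviation of a batch.

    z:     array of shape (n, d), one embedding per row.
    gamma: threshold that each dimension's standard deviation should reach.
    eps:   small constant added under the square root (assumed value).
    """
    std = np.sqrt(z.var(axis=0, ddof=1) + eps)      # shape (d,)
    # Dimensions with std >= gamma contribute nothing to the penalty.
    return np.mean(np.maximum(0.0, gamma - std))

rng = np.random.default_rng(0)
collapsed = np.ones((256, 8))                  # every row identical
spread = 2.0 * rng.standard_normal((256, 8))   # std near 2 in each dimension

print(variance_term(collapsed))   # close to gamma: maximal penalty
print(variance_term(spread))      # zero: every dimension clears the threshold
```

The penalty is largest for a fully collapsed batch and vanishes once every dimension clears the threshold, which is the behavior the argument relies on.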

If this is right

The method matches state-of-the-art performance on multiple downstream image tasks.
Adding the variance term to other self-supervised methods stabilizes their training dynamics.
The same term produces measurable accuracy gains when inserted into existing approaches.
Collapse is avoided through explicit loss terms rather than implicit architectural constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The explicit variance threshold may simplify hyperparameter search when moving to new embedding dimensionalities.
The approach opens a path to apply similar per-dimension constraints in self-supervised settings beyond images.
Because the mechanism is loss-based, it could be combined with architectural changes to further reduce reliance on negative samples.

Load-bearing premise

Enforcing per-dimension variance above a fixed threshold together with covariance regularization is enough to stop collapse for many architectures and datasets without creating new instabilities.

What would settle it

Training an encoder with the VICReg loss and observing that multiple embedding dimensions still have near-zero variance or that downstream accuracy falls below the reported baselines.
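The first half of that check could be run along the following lines. This is a sketch only: `collapsed_dimensions` and the tolerance `tol` are hypothetical names and values, and the embeddings here are synthetic stand-ins for the outputs of a trained encoder.

```python
import numpy as np

def collapsed_dimensions(z, tol=1e-3):
    """Return indices of embedding dimensions whose batch std is below tol."""
    std = z.std(axis=0, ddof=1)
    return np.flatnonzero(std < tol)

rng = np.random.default_rng(0)
z = rng.standard_normal((512, 6))   # stand-in for encoder outputs
z[:, 2] = 0.5                       # simulate a constant dimension
z[:, 5] = -1.0                      # simulate another constant dimension

print(collapsed_dimensions(z))      # indices of the two constant dimensions
```

An empty result on held-out data would be consistent with the claim; a non-empty one would point to the dimensions that collapsed.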

Original abstract

Recent self-supervised methods for image representation learning are based on maximizing the agreement between embedding vectors from different views of the same image. A trivial solution is obtained when the encoder outputs constant vectors. This collapse problem is often avoided through implicit biases in the learning architecture, that often lack a clear justification or interpretation. In this paper, we introduce VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually. VICReg combines the variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization, and achieves results on par with the state of the art on several downstream tasks. In addition, we show that incorporating our new variance term into other methods helps stabilize the training and leads to performance improvements.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 4 minor

Summary. The paper introduces VICReg, a self-supervised learning method for visual representations that explicitly prevents collapse via three regularization terms: a variance term that penalizes per-dimension standard deviations below a fixed threshold, an invariance term based on MSE between embeddings of different views of the same image, and a covariance term that reduces redundancy by penalizing off-diagonal covariances. The approach is evaluated on ImageNet pretraining with linear evaluation and transfer tasks using standard ResNet backbones, achieving results competitive with prior state-of-the-art methods; ablations further show that the variance term can be added to existing methods (e.g., BYOL, SimSiam) to stabilize training and improve performance.

Significance. If the empirical results hold, the work is significant for providing an explicit, architecture-agnostic regularization mechanism to avoid collapse in SSL, moving beyond reliance on implicit biases such as stop-gradients or asymmetric predictors. The ability to retrofit the variance term into other frameworks for measurable gains adds practical value, and the consistent performance across standard benchmarks with reproducible components strengthens its contribution to the field.

minor comments (4)

[Abstract] The claim of results 'on par with the state of the art' would be strengthened by briefly naming the closest baselines (e.g., Barlow Twins, SimCLR) and the specific metric gaps on ImageNet linear evaluation.
[§3.2] Eq. (3): the variance loss uses a fixed threshold of 1; while treated as a hyperparameter, a short sensitivity plot or statement on robustness across datasets would clarify whether this choice is broadly stable.
[Table 1] Table 1 and §4.3: reporting standard deviations over multiple runs (or at least noting the number of seeds) for the main results would allow readers to assess the reliability of the reported improvements when the variance term is added to other methods.
[§4.1] The description of the covariance regularization could explicitly state whether the normalization matches the standard sample covariance or a modified form, to aid reproducibility.
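To make the normalization question in the last comment concrete, here is a sketch of a covariance penalty with the divisor exposed as a parameter. The function name `covariance_term`, the `ddof` switch, and the division by the dimension d are assumptions made for illustration; the cross-check uses NumPy's `np.cov`, which divides by n - 1 by default.

```python
import numpy as np

def covariance_term(z, ddof=1):
    """Sum of squared off-diagonal covariances, divided by the dimension d.

    ddof=1 uses the unbiased sample covariance (divide by n - 1);
    ddof=0 uses the biased form (divide by n).
    """
    n, d = z.shape
    zc = z - z.mean(axis=0)                     # center each dimension
    cov = zc.T @ zc / (n - ddof)                # shape (d, d)
    off_diag = cov - np.diag(np.diag(cov))      # zero out the diagonal
    return np.sum(off_diag ** 2) / d

rng = np.random.default_rng(0)
z = rng.standard_normal((128, 4))
z[:, 1] = z[:, 0]          # make two dimensions perfectly redundant

# Cross-check the ddof=1 form against NumPy's sample covariance.
ref = np.cov(z, rowvar=False)
ref_off = ref - np.diag(np.diag(ref))
print(np.isclose(covariance_term(z), np.sum(ref_off ** 2) / 4))

# The two normalizations differ by a factor of ((n - 1) / n) ** 2.
print(covariance_term(z, ddof=0) / covariance_term(z, ddof=1))
```

The two forms differ only by a constant factor that approaches 1 as the batch grows, so the choice mainly affects how the covariance weight should be scaled at small batch sizes.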

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our work, as well as the recommendation for minor revision. We appreciate the recognition that VICReg provides an explicit, architecture-agnostic mechanism to avoid collapse and that the variance term can be usefully retrofitted to other methods. No specific major comments were raised in the report, so we will focus on minor revisions for clarity and presentation.

Circularity Check

0 steps flagged

No significant circularity: explicit statistical regularizers evaluated on external tasks

The VICReg loss is constructed directly from three explicit terms: invariance (MSE between view embeddings), variance (a hinge penalty on any per-dimension standard deviation that falls below a fixed threshold γ), and covariance (sum of squared off-diagonal covariance entries). None of these terms is obtained by fitting a parameter to a subset of the target data and then relabeling the fit as a prediction. Downstream linear evaluation and transfer results are measured on standard ImageNet splits and other datasets using fixed ResNet backbones; they constitute independent empirical evidence rather than a tautological consequence of the training objective. No self-citation chain is invoked to justify uniqueness or to forbid alternatives; the method is presented as a straightforward combination of known statistical penalties. The derivation therefore remains self-contained against external benchmarks.
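For reference, the three terms can be written out as below. This is a sketch under stated assumptions: Z and Z' are batches of n embeddings of dimension d from the two views, z^j is the j-th dimension across the batch, and the 1/n, 1/d, and 1/(n-1) normalizations and the stability constant ε are not specified on this page and should be checked against the paper.

```latex
\begin{aligned}
s(Z, Z') &= \frac{1}{n} \sum_{i=1}^{n} \lVert z_i - z'_i \rVert_2^2
  && \text{(invariance)} \\
v(Z) &= \frac{1}{d} \sum_{j=1}^{d} \max\Bigl(0,\; \gamma - \sqrt{\operatorname{Var}(z^{j}) + \epsilon}\Bigr)
  && \text{(variance)} \\
c(Z) &= \frac{1}{d} \sum_{k \neq l} \bigl[C(Z)\bigr]_{k,l}^{2},
  \qquad C(Z) = \frac{1}{n-1} \sum_{i=1}^{n} (z_i - \bar{z})(z_i - \bar{z})^{\top}
  && \text{(covariance)}
\end{aligned}
```

Each term is a statistic of the embeddings in one training batch, and none of them involves downstream labels.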

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The method rests on the standard SSL invariance assumption plus the new variance non-collapse condition; hyperparameters for the three loss weights are fitted on validation data.

free parameters (1)

lambda, mu, nu
Weights balancing the invariance (lambda), variance (mu), and covariance (nu) terms in the total loss; chosen via validation sweeps.
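How the three weights enter the total loss can be sketched as follows, assuming NumPy and two (n, d) batches of embeddings, one per view. The function name `vicreg_loss`, the default weight values, `gamma`, `eps`, and the exact normalization of each term are illustrative assumptions and should be checked against the paper before reuse.

```python
import numpy as np

def vicreg_loss(za, zb, lam=25.0, mu=25.0, nu=1.0, gamma=1.0, eps=1e-4):
    """Weighted sum of invariance, variance, and covariance terms.

    za, zb: arrays of shape (n, d), embeddings of two views of the same batch.
    lam, mu, nu: weights of the invariance, variance, and covariance terms
                 (default values are assumptions, not taken from this page).
    """
    n, d = za.shape

    # Invariance: mean squared Euclidean distance between paired embeddings.
    inv = np.mean(np.sum((za - zb) ** 2, axis=1))

    def var(z):
        # Hinge on the per-dimension standard deviation.
        std = np.sqrt(z.var(axis=0, ddof=1) + eps)
        return np.mean(np.maximum(0.0, gamma - std))

    def cov(z):
        # Squared off-diagonal entries of the sample covariance, divided by d.
        zc = z - z.mean(axis=0)
        c = zc.T @ zc / (n - 1)
        return (np.sum(c ** 2) - np.sum(np.diag(c) ** 2)) / d

    return lam * inv + mu * (var(za) + var(zb)) + nu * (cov(za) + cov(zb))

collapsed = np.zeros((64, 8))
print(vicreg_loss(collapsed, collapsed))
```

For the collapsed batch in the usage line, the invariance and covariance terms are both zero, so the loss comes entirely from the variance hinge scaled by mu. That is the sense in which the constant solution is penalized explicitly.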

axioms (2)

domain assumption Different views of the same image should produce similar embeddings (invariance).
Core premise of view-based self-supervised learning invoked in the loss definition.
domain assumption Non-zero variance per embedding dimension prevents collapse to constant vectors.
Introduced in the paper to justify the variance term.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith.Cost.JcostCore Jcost_pos_of_ne_one echoes

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

a term that maintains the variance of each embedding dimension above a threshold... a term that decorrelates each pair of variables
IndisputableMonolith.Foundation.DAlembert.Inevitability bilinear_family_forced echoes

VICReg combines the variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization
IndisputableMonolith.Foundation.LogicAsFunctionalEquation RCL_is_unique_functional_form_of_logic refines

refines
Relation between the paper passage and the cited Recognition theorem.

the composite value is determined by a finite pairwise polynomial algebra

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Normalizing Trajectory Models
cs.CV 2026-05 unverdicted novelty 7.0

NTM uses per-step conditional normalizing flows plus a trajectory-wide predictor to achieve exact-likelihood 4-step sampling that matches or exceeds baselines on text-to-image tasks.
Normalizing Trajectory Models
cs.CV 2026-05 unverdicted novelty 7.0

NTM models each generative reverse step as a conditional normalizing flow with a hybrid shallow-deep architecture, enabling exact-likelihood training and strong four-step sampling performance on text-to-image tasks.
PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization
cs.LG 2026-05 unverdicted novelty 7.0

PairAlign learns compact audio token sequences via self-alignment of paired content views using an autoregressive decoder, achieving strong cross-view consistency and edit-distance preservation while reducing token co...
Beyond Patient Invariance: Learning Cardiac Dynamics via Action-Conditioned JEPAs
cs.LG 2026-04 unverdicted novelty 7.0

Action-conditioned JEPA models treat pathology as a transition vector on latent states to simulate cardiac dynamics, outperforming supervised learning by over 0.05 AUROC in low-resource regimes on MIMIC-IV-ECG.
Coevolving Representations in Joint Image-Feature Diffusion
cs.CV 2026-04 unverdicted novelty 7.0

CoReDi coevolves semantic representations with the diffusion model via a jointly learned linear projection stabilized by stop-gradient, normalization, and regularization, yielding faster convergence and higher sample ...
Predictive but Not Plannable: RC-aux for Latent World Models
cs.LG 2026-05 unverdicted novelty 6.0

RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.
Understanding Self-Supervised Learning via Latent Distribution Matching
cs.LG 2026-05 unverdicted novelty 6.0

Self-supervised learning is cast as latent distribution matching that aligns representations to a model while enforcing uniformity, unifying multiple SSL families and proving identifiability for predictive variants ev...
Understanding DNNs in Feature Interaction Models: A Dimensional Collapse Perspective
cs.LG 2026-04 unverdicted novelty 6.0

DNNs mitigate dimensional collapse of embeddings in feature interaction models, shown via parallel and stacked experiments plus gradient analysis.
Monitoring Neural Training with Topology: A Footprint-Predictable Collapse Index
cs.LG 2026-04 unverdicted novelty 6.0

A composite Collapse Index based on incremental discrete Morse homology provides low-latency early warning of representational collapse during neural network training.
Self-Supervised Representation Learning via Hyperspherical Density Shaping
cs.CV 2026-04 unverdicted novelty 6.0

HyDeS introduces hyperspherical density shaping with a von Mises-Fisher estimator to create theoretically grounded self-supervised representations that focus on foreground features.
RankUp: Towards High-rank Representations for Large Scale Advertising Recommender Systems
cs.IR 2026-04 unverdicted novelty 6.0

RankUp raises effective rank of representations in deep MetaFormer recommenders via randomized splitting and multi-embeddings, delivering 2-5% GMV gains in production deployments at Weixin.
HSG: Hyperbolic Scene Graph
cs.CV 2026-04 unverdicted novelty 6.0

Hyperbolic Scene Graph (HSG) learns embeddings in hyperbolic space for better hierarchical structure in scene graphs, achieving graph IoU of 33.51 versus 25.37 for the best Euclidean baseline.
Hierarchical Planning with Latent World Models
cs.LG 2026-04 unverdicted novelty 6.0

Hierarchical planning over multi-scale latent world models enables 70% success on real robotic pick-and-place with goal-only input where flat models achieve 0%, while cutting planning compute up to 4x in simulations.
Rapidly deploying on-device eye tracking by distilling visual foundation models
cs.CV 2026-04 unverdicted novelty 6.0

DistillGaze reduces median gaze error by 58.62% on a 2000+ participant dataset by distilling foundation models into a 256K-parameter on-device model using synthetic labeled data and unlabeled real data.
Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction
cs.LG 2026-03 unverdicted novelty 6.0

Dreamer-CDP achieves reconstruction-free world modeling via a JEPA-style predictor on continuous deterministic representations and matches Dreamer's performance on Crafter.
Revisiting Feature Prediction for Learning Visual Representations from Video
cs.CV 2024-02 conditional novelty 6.0

V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.
Vision Transformers Need Registers
cs.CV 2023-09 unverdicted novelty 6.0

Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
MER-DG: Modality-Entropy Regularization for Multimodal Domain Generalization
cs.LG 2026-05 unverdicted novelty 5.0

MER-DG applies modality-entropy regularization to reduce fusion overfitting in multimodal domain generalization, reporting average gains of 5% over standard fusion and 2% over prior methods on EPIC-Kitchens and HAC be...
Learning Invariant Modality Representation for Robust Multimodal Learning from a Causal Inference Perspective
cs.LG 2026-04 unverdicted novelty 5.0

CmIR uses causal inference to separate invariant causal representations from spurious ones in multimodal data, improving generalization under distribution shifts and noise via invariance, mutual information, and recon...
RankUp: Towards High-rank Representations for Large Scale Advertising Recommender Systems
cs.IR 2026-04 unverdicted novelty 5.0

RankUp enhances representation capacity in deep MetaFormer recommenders via permutation splitting and multi-embeddings, achieving GMV improvements of 2-5% in Weixin production systems.
The Global Neural World Model: Spatially Grounded Discrete Topologies for Action-Conditioned Planning
cs.LG 2026-04 unverdicted novelty 4.0

GNWM maps environments to a discrete 2D grid with snapping to stabilize autoregressive planning and learns generalized dynamics from maximum-entropy random walks.
Comparative Evaluation of Machine Learning Models for Predicting Donor Kidney Discard
stat.AP 2026-02 unverdicted novelty 4.0

On 4080 German deceased donors, an ensemble ML model reached MCC 0.76 for kidney discard prediction, with standardized preprocessing and feature selection proving more important than the specific algorithm chosen.
