Pith: machine review for the scientific record.

arxiv: 2105.04906 · v3 · submitted 2021-05-11 · cs.CV · cs.AI · cs.LG

Recognition: 3 theorem links

· Lean Theorem

VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning


Pith reviewed 2026-05-15 15:25 UTC · model grok-4.3

classification: cs.CV · cs.AI · cs.LG
keywords: self-supervised learning, representation learning, variance regularization, covariance regularization, embedding collapse, image representations, VICReg

The pith

VICReg prevents collapse to constant embeddings in self-supervised learning by adding an explicit variance term per dimension plus covariance regularization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces VICReg to address the collapse problem, in which self-supervised encoders output constant vectors that carry no useful information. It does so through a loss that keeps the standard deviation of each embedding dimension above a threshold, while an invariance term aligns the representations of different views of the same image and a covariance term reduces redundancy across dimensions. A reader would care because this supplies a simple, explicit alternative to the implicit architectural biases used in prior methods. The approach performs on par with the state of the art on several downstream image tasks and improves training stability when its variance term is added to other self-supervised methods.

Core claim

VICReg combines a variance regularization term that keeps the standard deviation of each embedding dimension above a fixed threshold, an invariance term that makes embeddings from different augmented views similar, and a covariance term that decorrelates dimensions, thereby avoiding the trivial constant solution and achieving results on par with the state of the art on several downstream tasks.
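Written out, the three terms and their weighted sum look as follows. This transcription assumes the paper's notation: Z and Z′ are batches of n embeddings of dimension d from the two augmented views, z_i is the i-th embedding, z^j is the vector of values taken by dimension j across the batch, λ, μ, ν weight the invariance, variance, and covariance terms, and γ (set to 1) and a small ε are the variance hinge's threshold and numerical stabilizer.

```latex
\begin{aligned}
s(Z, Z') &= \frac{1}{n}\sum_{i=1}^{n} \lVert z_i - z'_i \rVert_2^2 \\
S(x, \epsilon) &= \sqrt{\operatorname{Var}(x) + \epsilon}, \qquad
v(Z) = \frac{1}{d}\sum_{j=1}^{d} \max\bigl(0,\ \gamma - S(z^{j}, \epsilon)\bigr) \\
C(Z) &= \frac{1}{n-1}\sum_{i=1}^{n} (z_i - \bar{z})(z_i - \bar{z})^{\top}, \qquad
c(Z) = \frac{1}{d}\sum_{i \neq j} [C(Z)]_{i,j}^{2} \\
\ell(Z, Z') &= \lambda\, s(Z, Z') + \mu\,\bigl[v(Z) + v(Z')\bigr] + \nu\,\bigl[c(Z) + c(Z')\bigr]
\end{aligned}
```

The constant solution makes s vanish but drives v(Z) to roughly γ per dimension, which is why the invariance term alone cannot reach its minimum by collapsing.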

What carries the argument

The variance regularization term, which penalizes any embedding dimension whose standard deviation falls below a chosen threshold, combined with covariance regularization for decorrelation.
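To make the hinge concrete, here is a minimal plain-Python sketch of the variance term as we read the paper's definition: for each dimension, compute the batch standard deviation with a small ε inside the square root, then penalize max(0, γ − std) and average over dimensions. The function name `variance_term`, the list-of-lists input, and the default ε = 10⁻⁴ are illustrative assumptions; a real implementation would use a tensor library with autograd.

```python
import math


def variance_term(z, gamma=1.0, eps=1e-4):
    """Hypothetical sketch of the VICReg variance hinge for one branch.

    z: batch of embeddings as a list of n rows (n >= 2), each a list of d floats.
    Returns the mean over dimensions of max(0, gamma - sqrt(Var + eps)).
    """
    n, d = len(z), len(z[0])
    total = 0.0
    for j in range(d):
        column = [row[j] for row in z]  # values of dimension j across the batch
        mean = sum(column) / n
        var = sum((x - mean) ** 2 for x in column) / (n - 1)  # unbiased sample variance
        std = math.sqrt(var + eps)  # eps keeps the square root well-behaved near zero
        total += max(0.0, gamma - std)  # penalize only when std falls below gamma
    return total / d


collapsed = [[0.5, -1.0] for _ in range(4)]  # every row identical
print(variance_term(collapsed))  # close to gamma: each dimension scores 1 - sqrt(eps)
print(variance_term([[-2.0, 2.0], [2.0, -2.0]]))  # spread beyond gamma: no penalty
```

A fully collapsed batch scores close to γ, while a batch whose dimensions already spread beyond γ scores zero. Hinging on the standard deviation rather than the raw variance matters near collapse: the derivative of the variance with respect to an embedding scales with that embedding's distance from the batch mean, so it fades exactly when the batch is collapsing.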

If this is right

  • The method matches state-of-the-art performance on multiple downstream image tasks.
  • Adding the variance term to other self-supervised methods stabilizes their training dynamics.
  • The same term produces measurable accuracy gains when inserted into existing approaches.
  • Collapse is avoided through explicit loss terms rather than implicit architectural constraints.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The explicit variance threshold may simplify hyperparameter search when moving to new embedding dimensionalities.
  • The approach opens a path to apply similar per-dimension constraints in self-supervised settings beyond images.
  • Because the mechanism is loss-based, it could be combined with architectural changes to further reduce reliance on negative samples.

Load-bearing premise

Enforcing per-dimension variance above a fixed threshold together with covariance regularization is enough to stop collapse for many architectures and datasets without creating new instabilities.

What would settle it

Training an encoder with the VICReg loss and observing that multiple embedding dimensions still have near-zero variance or that downstream accuracy falls below the reported baselines.
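One way to run the first check is to measure the per-dimension standard deviation of embeddings on a held-out batch and flag dimensions that are nearly constant. A minimal sketch in plain Python; the function name `near_constant_dims` and the tolerance `tol` are hypothetical choices for illustration, not values from the paper.

```python
import math


def near_constant_dims(z, tol=1e-3):
    """Return indices of embedding dimensions whose batch std is below tol.

    z: list of n embeddings (n >= 2), each a list of d floats.
    tol: hypothetical cutoff for "near-zero" spread, chosen for illustration.
    """
    n, d = len(z), len(z[0])
    flagged = []
    for j in range(d):
        column = [row[j] for row in z]  # dimension j across the batch
        mean = sum(column) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
        if std < tol:
            flagged.append(j)
    return flagged


batch = [[0.1, 3.0, 1.0], [0.1, -1.0, 1.0], [0.1, 2.0, 1.0]]
print(near_constant_dims(batch))  # dimensions 0 and 2 never change across the batch
```

Tracking the length of this list over training gives a direct read on whether the variance term is holding: a count that grows toward d would be the kind of evidence the test above describes.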

Original abstract

Recent self-supervised methods for image representation learning are based on maximizing the agreement between embedding vectors from different views of the same image. A trivial solution is obtained when the encoder outputs constant vectors. This collapse problem is often avoided through implicit biases in the learning architecture, that often lack a clear justification or interpretation. In this paper, we introduce VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually. VICReg combines the variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization, and achieves results on par with the state of the art on several downstream tasks. In addition, we show that incorporating our new variance term into other methods helps stabilize the training and leads to performance improvements.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 4 minor

Summary. The paper introduces VICReg, a self-supervised learning method for visual representations that explicitly prevents collapse via three regularization terms: a variance term that penalizes per-dimension standard deviations below a fixed threshold, an invariance term based on the mean squared error between embeddings of different views of the same image, and a covariance term that reduces redundancy by penalizing off-diagonal covariances. The approach is evaluated with ImageNet pretraining, linear evaluation, and transfer tasks using standard ResNet backbones, and achieves results competitive with prior state-of-the-art methods. Further experiments show that the variance term can be added to existing methods (e.g., BYOL, SimSiam) to stabilize training and improve performance.

Significance. If the empirical results hold, the work is significant for providing an explicit, architecture-agnostic regularization mechanism to avoid collapse in SSL, moving beyond reliance on implicit biases such as stop-gradients or asymmetric predictors. The ability to retrofit the variance term into other frameworks for measurable gains adds practical value, and the consistent performance across standard benchmarks with reproducible components strengthens its contribution to the field.

minor comments (4)
  1. [Abstract] The claim of results 'on par with the state of the art' would be strengthened by briefly naming the closest baselines (e.g., Barlow Twins, SimCLR) and the specific metric gaps on ImageNet linear evaluation.
  2. [§3.2, Eq. (3)] The variance loss uses a fixed threshold of 1; although it is treated as a hyperparameter, a short sensitivity plot or a statement on robustness across datasets would clarify whether this choice is broadly stable.
  3. [Table 1, §4.3] Reporting standard deviations over multiple runs (or at least the number of seeds) for the main results would let readers assess the reliability of the improvements reported when the variance term is added to other methods.
  4. [§4.1] The description of the covariance regularization could state explicitly whether the normalization matches the standard sample covariance or a modified form, to aid reproducibility.
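The fourth comment turns on how the covariance matrix is normalized. As a sketch, assuming the paper's definition (unbiased sample covariance with a 1/(n − 1) factor, then the squared off-diagonal entries summed and divided by d), the term for one branch can be computed as below. These are covariances of a single branch, not the standard-deviation-normalized cross-correlations between branches that Barlow Twins uses. The function name `covariance_term` is hypothetical.

```python
def covariance_term(z):
    """Hypothetical sketch of the VICReg covariance penalty c(Z) for one branch.

    z: list of n embeddings (n >= 2), each a list of d floats.
    Uses the unbiased sample covariance (divide by n - 1), sums the squared
    off-diagonal entries, and scales the result by 1/d.
    """
    n, d = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in z]
    total = 0.0
    for a in range(d):
        for b in range(d):
            if a == b:
                continue  # diagonal entries are variances, handled by the variance term
            cov_ab = sum(r[a] * r[b] for r in centered) / (n - 1)
            total += cov_ab ** 2
    return total / d


print(covariance_term([[1.0, 1.0], [-1.0, -1.0]]))  # two perfectly redundant dimensions
print(covariance_term([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]))  # decorrelated
```

Swapping the 1/(n − 1) factor for 1/n, or normalizing to correlations, changes the scale of this term relative to the others, which is why the normalization matters for reproducing the reported loss weights.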

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and accurate summary of our work, as well as the recommendation for minor revision. We appreciate the recognition that VICReg provides an explicit, architecture-agnostic mechanism to avoid collapse and that the variance term can be usefully retrofitted to other methods. No specific major comments were raised in the report, so we will focus on minor revisions for clarity and presentation.

Circularity Check

0 steps flagged

No significant circularity: explicit statistical regularizers evaluated on external tasks


The VICReg loss is constructed directly from three explicit terms: invariance (mean squared distance between the embeddings of the two views), variance (a hinge that penalizes any per-dimension standard deviation below a fixed threshold γ), and covariance (the sum of squared off-diagonal entries of the embedding covariance matrix, scaled by 1/d). None of these terms is obtained by fitting a parameter to a subset of the target data and then relabeling the fit as a prediction. Downstream linear evaluation and transfer results are measured on standard ImageNet splits and other datasets using standard ResNet backbones; they constitute independent empirical evidence rather than a tautological consequence of the training objective. No self-citation chain is invoked to justify uniqueness or to rule out alternatives; the method is presented as a straightforward combination of known statistical penalties. The argument therefore rests on external benchmarks rather than on the training objective itself.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The method rests on the standard SSL invariance assumption plus the new variance non-collapse condition; hyperparameters for the three loss weights are fitted on validation data.

free parameters (1)
  • lambda, mu, nu
    Weights on the invariance (λ), variance (μ), and covariance (ν) terms in the total loss; chosen via validation sweeps.
axioms (2)
  • domain assumption: Different views of the same image should produce similar embeddings (invariance).
    Core premise of view-based self-supervised learning invoked in the loss definition.
  • domain assumption: Non-zero variance per embedding dimension prevents collapse to constant vectors.
    Introduced in the paper to justify the variance term.
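The ledger's three weights enter the loss as ℓ = λ·s(Z, Z′) + μ·[v(Z) + v(Z′)] + ν·[c(Z) + c(Z′)]. A compact, self-contained sketch in plain Python, assuming that form and the defaults we understand the paper to report (λ = μ = 25, ν = 1, γ = 1, ε = 10⁻⁴); the function name `vicreg_loss` is hypothetical, and a real training loop would use a tensor library with autograd rather than nested lists.

```python
import math


def vicreg_loss(z_a, z_b, lam=25.0, mu=25.0, nu=1.0, gamma=1.0, eps=1e-4):
    """Hypothetical plain-Python sketch of the weighted VICReg objective.

    z_a, z_b: paired batches (lists of n rows of d floats, n >= 2); row i of each
    comes from two augmented views of the same image.
    lam, mu, nu weight the invariance, variance, and covariance terms.
    """
    n, d = len(z_a), len(z_a[0])

    # Invariance s(Z, Z'): mean over the batch of squared distances between paired rows.
    inv = sum(
        sum((x - y) ** 2 for x, y in zip(row_a, row_b))
        for row_a, row_b in zip(z_a, z_b)
    ) / n

    def var_and_cov(z):
        # Column-wise view: cent[j] holds dimension j across the batch, mean-centered.
        cols = [list(col) for col in zip(*z)]
        means = [sum(col) / n for col in cols]
        cent = [[x - m for x in col] for col, m in zip(cols, means)]
        # Variance v(Z): hinge on each dimension's std, sqrt(var + eps), below gamma.
        v = sum(
            max(0.0, gamma - math.sqrt(sum(x * x for x in col) / (n - 1) + eps))
            for col in cent
        ) / d
        # Covariance c(Z): squared off-diagonal sample covariances, scaled by 1/d.
        c = sum(
            (sum(p * q for p, q in zip(cent[a], cent[b])) / (n - 1)) ** 2
            for a in range(d)
            for b in range(d)
            if a != b
        ) / d
        return v, c

    v_a, c_a = var_and_cov(z_a)
    v_b, c_b = var_and_cov(z_b)
    return lam * inv + mu * (v_a + v_b) + nu * (c_a + c_b)


spread = [[2.0, 2.0], [2.0, -2.0], [-2.0, 2.0], [-2.0, -2.0]]
collapsed = [[1.0, 1.0] for _ in range(4)]
print(vicreg_loss(spread, spread))        # aligned, spread, decorrelated views: no loss
print(vicreg_loss(collapsed, collapsed))  # invariance is perfect, but the variance term fires
```

The collapsed example shows the point of the second axiom: two identical constant batches satisfy invariance exactly, yet the variance hinge alone contributes about 25 × (0.99 + 0.99), so collapse is no longer a minimum.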

pith-pipeline@v0.9.0 · 5443 in / 1358 out tokens · 35796 ms · 2026-05-15T15:25:02.646360+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Tag meanings:
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Normalizing Trajectory Models

    cs.CV 2026-05 unverdicted novelty 7.0

    NTM uses per-step conditional normalizing flows plus a trajectory-wide predictor to achieve exact-likelihood 4-step sampling that matches or exceeds baselines on text-to-image tasks.

  2. Normalizing Trajectory Models

    cs.CV 2026-05 unverdicted novelty 7.0

    NTM models each generative reverse step as a conditional normalizing flow with a hybrid shallow-deep architecture, enabling exact-likelihood training and strong four-step sampling performance on text-to-image tasks.

  3. PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization

    cs.LG 2026-05 unverdicted novelty 7.0

    PairAlign learns compact audio token sequences via self-alignment of paired content views using an autoregressive decoder, achieving strong cross-view consistency and edit-distance preservation while reducing token co...

  4. Beyond Patient Invariance: Learning Cardiac Dynamics via Action-Conditioned JEPAs

    cs.LG 2026-04 unverdicted novelty 7.0

    Action-conditioned JEPA models treat pathology as a transition vector on latent states to simulate cardiac dynamics, outperforming supervised learning by over 0.05 AUROC in low-resource regimes on MIMIC-IV-ECG.

  5. Coevolving Representations in Joint Image-Feature Diffusion

    cs.CV 2026-04 unverdicted novelty 7.0

    CoReDi coevolves semantic representations with the diffusion model via a jointly learned linear projection stabilized by stop-gradient, normalization, and regularization, yielding faster convergence and higher sample ...

  6. Predictive but Not Plannable: RC-aux for Latent World Models

    cs.LG 2026-05 unverdicted novelty 6.0

    RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.

  7. Understanding Self-Supervised Learning via Latent Distribution Matching

    cs.LG 2026-05 unverdicted novelty 6.0

    Self-supervised learning is cast as latent distribution matching that aligns representations to a model while enforcing uniformity, unifying multiple SSL families and proving identifiability for predictive variants ev...

  8. Understanding DNNs in Feature Interaction Models: A Dimensional Collapse Perspective

    cs.LG 2026-04 unverdicted novelty 6.0

    DNNs mitigate dimensional collapse of embeddings in feature interaction models, shown via parallel and stacked experiments plus gradient analysis.

  9. Monitoring Neural Training with Topology: A Footprint-Predictable Collapse Index

    cs.LG 2026-04 unverdicted novelty 6.0

    A composite Collapse Index based on incremental discrete Morse homology provides low-latency early warning of representational collapse during neural network training.

  10. Self-Supervised Representation Learning via Hyperspherical Density Shaping

    cs.CV 2026-04 unverdicted novelty 6.0

    HyDeS introduces hyperspherical density shaping with a von Mises-Fisher estimator to create theoretically grounded self-supervised representations that focus on foreground features.

  11. RankUp: Towards High-rank Representations for Large Scale Advertising Recommender Systems

    cs.IR 2026-04 unverdicted novelty 6.0

    RankUp raises effective rank of representations in deep MetaFormer recommenders via randomized splitting and multi-embeddings, delivering 2-5% GMV gains in production deployments at Weixin.

  12. HSG: Hyperbolic Scene Graph

    cs.CV 2026-04 unverdicted novelty 6.0

    Hyperbolic Scene Graph (HSG) learns embeddings in hyperbolic space for better hierarchical structure in scene graphs, achieving graph IoU of 33.51 versus 25.37 for the best Euclidean baseline.

  13. Hierarchical Planning with Latent World Models

    cs.LG 2026-04 unverdicted novelty 6.0

    Hierarchical planning over multi-scale latent world models enables 70% success on real robotic pick-and-place with goal-only input where flat models achieve 0%, while cutting planning compute up to 4x in simulations.

  14. Rapidly deploying on-device eye tracking by distilling visual foundation models

    cs.CV 2026-04 unverdicted novelty 6.0

    DistillGaze reduces median gaze error by 58.62% on a 2000+ participant dataset by distilling foundation models into a 256K-parameter on-device model using synthetic labeled data and unlabeled real data.

  15. Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction

    cs.LG 2026-03 unverdicted novelty 6.0

    Dreamer-CDP achieves reconstruction-free world modeling via a JEPA-style predictor on continuous deterministic representations and matches Dreamer's performance on Crafter.

  16. Revisiting Feature Prediction for Learning Visual Representations from Video

    cs.CV 2024-02 conditional novelty 6.0

    V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.

  17. Vision Transformers Need Registers

    cs.CV 2023-09 unverdicted novelty 6.0

    Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.

  18. MER-DG: Modality-Entropy Regularization for Multimodal Domain Generalization

    cs.LG 2026-05 unverdicted novelty 5.0

    MER-DG applies modality-entropy regularization to reduce fusion overfitting in multimodal domain generalization, reporting average gains of 5% over standard fusion and 2% over prior methods on EPIC-Kitchens and HAC be...

  19. Learning Invariant Modality Representation for Robust Multimodal Learning from a Causal Inference Perspective

    cs.LG 2026-04 unverdicted novelty 5.0

    CmIR uses causal inference to separate invariant causal representations from spurious ones in multimodal data, improving generalization under distribution shifts and noise via invariance, mutual information, and recon...

  20. RankUp: Towards High-rank Representations for Large Scale Advertising Recommender Systems

    cs.IR 2026-04 unverdicted novelty 5.0

    RankUp enhances representation capacity in deep MetaFormer recommenders via permutation splitting and multi-embeddings, achieving GMV improvements of 2-5% in Weixin production systems.

  21. The Global Neural World Model: Spatially Grounded Discrete Topologies for Action-Conditioned Planning

    cs.LG 2026-04 unverdicted novelty 4.0

    GNWM maps environments to a discrete 2D grid with snapping to stabilize autoregressive planning and learns generalized dynamics from maximum-entropy random walks.

  22. Comparative Evaluation of Machine Learning Models for Predicting Donor Kidney Discard

    stat.AP 2026-02 unverdicted novelty 4.0

    On 4080 German deceased donors, an ensemble ML model reached MCC 0.76 for kidney discard prediction, with standardized preprocessing and feature selection proving more important than the specific algorithm chosen.

Reference graph

Works this paper leans on

119 extracted references · 119 canonical work pages · cited by 20 Pith papers · 5 internal anchors
