Recognition: 3 theorem links
VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning
Pith reviewed 2026-05-15 15:25 UTC · model grok-4.3
The pith
VICReg prevents collapse to constant embeddings in self-supervised learning by adding an explicit variance term per dimension plus covariance regularization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
VICReg combines a variance regularization term that keeps the standard deviation of each embedding dimension above a fixed threshold, an invariance term that makes embeddings from different augmented views similar, and a covariance term that decorrelates dimensions, thereby avoiding the trivial constant solution and achieving results on par with the state of the art on several downstream tasks.
What carries the argument
The variance regularization term, which penalizes any embedding dimension whose standard deviation falls below a chosen threshold, combined with covariance regularization for decorrelation.
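As a concrete sketch of that machinery, the three terms can be written in a few lines of NumPy. This is a minimal reimplementation from the loss description in this review, not the authors' released code; the threshold of 1 and the small epsilon inside the square root follow the paper's stated defaults, while the batch sizes and random data are illustrative.

```python
import numpy as np

def vicreg_terms(z_a, z_b, gamma=1.0, eps=1e-4):
    """Compute (invariance, variance, covariance) terms for two (n, d) view batches."""
    n, d = z_a.shape
    # Invariance: mean-squared error between embeddings of the two views.
    inv = np.mean((z_a - z_b) ** 2)
    var = 0.0
    cov = 0.0
    for z in (z_a, z_b):
        # Variance: hinge keeping each dimension's std above the threshold gamma.
        std = np.sqrt(z.var(axis=0) + eps)
        var += np.mean(np.maximum(0.0, gamma - std))
        # Covariance: sum of squared off-diagonal sample covariances, scaled by 1/d.
        zc = z - z.mean(axis=0)
        c = (zc.T @ zc) / (n - 1)
        cov += (np.sum(c ** 2) - np.sum(np.diag(c) ** 2)) / d
    return inv, var, cov

rng = np.random.default_rng(0)
z_a = rng.normal(size=(256, 32))
z_b = z_a + 0.1 * rng.normal(size=(256, 32))
inv, var, cov = vicreg_terms(z_a, z_b)

# A collapsed (constant) batch is maximally penalized by the variance term:
zeros = np.zeros((256, 32))
_, var_collapsed, _ = vicreg_terms(zeros, zeros)
```

On well-spread Gaussian embeddings the variance hinge is near zero, while a constant batch drives it to its maximum of roughly 2(1 − √ε) across the two views, which is the mechanism the review credits with preventing collapse.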
If this is right
- The method matches state-of-the-art performance on multiple downstream image tasks.
- Adding the variance term to other self-supervised methods stabilizes their training dynamics.
- The same term produces measurable accuracy gains when inserted into existing approaches.
- Collapse is avoided through explicit loss terms rather than implicit architectural constraints.
Where Pith is reading between the lines
- The explicit variance threshold may simplify hyperparameter search when moving to new embedding dimensionalities.
- The approach opens a path to apply similar per-dimension constraints in self-supervised settings beyond images.
- Because the mechanism is loss-based, it could be combined with architectural changes to further reduce reliance on negative samples.
Load-bearing premise
Enforcing per-dimension variance above a fixed threshold together with covariance regularization is enough to stop collapse for many architectures and datasets without creating new instabilities.
What would settle it
Training an encoder with the VICReg loss and observing that multiple embedding dimensions still have near-zero variance or that downstream accuracy falls below the reported baselines.
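The falsification test above is directly checkable after training: compute the per-dimension standard deviation over a batch of embeddings and count dimensions that are effectively constant. A minimal sketch, where the 0.01 cutoff and the batch shapes are illustrative choices rather than values from the paper:

```python
import numpy as np

def collapsed_dims(embeddings, tol=0.01):
    """Count embedding dimensions whose standard deviation over the batch is below tol."""
    return int(np.sum(embeddings.std(axis=0) < tol))

rng = np.random.default_rng(1)
healthy = rng.normal(size=(512, 64))   # every dimension varies across the batch
partial = healthy.copy()
partial[:, :16] = 0.0                  # simulate 16 collapsed dimensions
n_healthy = collapsed_dims(healthy)
n_partial = collapsed_dims(partial)
```

A nonzero count on a VICReg-trained encoder would be exactly the "near-zero variance" observation that the review says would settle the claim.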
read the original abstract
Recent self-supervised methods for image representation learning are based on maximizing the agreement between embedding vectors from different views of the same image. A trivial solution is obtained when the encoder outputs constant vectors. This collapse problem is often avoided through implicit biases in the learning architecture, that often lack a clear justification or interpretation. In this paper, we introduce VICReg (Variance-Invariance-Covariance Regularization), a method that explicitly avoids the collapse problem with a simple regularization term on the variance of the embeddings along each dimension individually. VICReg combines the variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization, and achieves results on par with the state of the art on several downstream tasks. In addition, we show that incorporating our new variance term into other methods helps stabilize the training and leads to performance improvements.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces VICReg, a self-supervised learning method for visual representations that explicitly prevents collapse via three regularization terms: a variance term that penalizes per-dimension standard deviations below a fixed threshold, an invariance term based on MSE between embeddings of different views of the same image, and a covariance term that reduces redundancy by penalizing off-diagonal covariances. The approach is evaluated on ImageNet pretraining with linear evaluation and transfer tasks using standard ResNet backbones, achieving results competitive with prior state-of-the-art methods; ablations further show that the variance term can be added to existing methods (e.g., SimCLR, Barlow Twins) to stabilize training and improve performance.
Significance. If the empirical results hold, the work is significant for providing an explicit, architecture-agnostic regularization mechanism to avoid collapse in SSL, moving beyond reliance on implicit biases such as stop-gradients or asymmetric predictors. The ability to retrofit the variance term into other frameworks for measurable gains adds practical value, and the consistent performance across standard benchmarks with reproducible components strengthens its contribution to the field.
minor comments (4)
- [Abstract] The claim of results 'on par with the state of the art' would be strengthened by briefly naming the closest baselines (e.g., Barlow Twins, SimCLR) and the specific metric gaps on ImageNet linear evaluation.
- [§3.2] Eq. (3): the variance loss uses a fixed threshold of 1; while treated as a hyperparameter, a short sensitivity plot or statement on robustness across datasets would clarify whether this choice is broadly stable.
- [Table 1] Table 1 and §4.3: reporting standard deviations over multiple runs (or at least noting the number of seeds) for the main results would let readers assess the reliability of the reported improvements when the variance term is added to other methods.
- [§4.1] The description of the covariance regularization could state explicitly whether the normalization matches the standard sample covariance or a modified form, to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the positive and accurate summary of our work, as well as the recommendation for minor revision. We appreciate the recognition that VICReg provides an explicit, architecture-agnostic mechanism to avoid collapse and that the variance term can be usefully retrofitted to other methods. No specific major comments were raised in the report, so we will focus on minor revisions for clarity and presentation.
Circularity Check
No significant circularity: explicit statistical regularizers evaluated on external tasks
full rationale
The VICReg loss is constructed directly from three explicit terms: invariance (MSE between view embeddings), variance (hinge on per-dimension std above fixed threshold γ), and covariance (sum of squared off-diagonal correlations). None of these terms is obtained by fitting a parameter to a subset of the target data and then relabeling the fit as a prediction. Downstream linear evaluation and transfer results are measured on standard ImageNet splits and other datasets using fixed ResNet backbones; they constitute independent empirical evidence rather than a tautological consequence of the training objective. No self-citation chain is invoked to justify uniqueness or to forbid alternatives; the method is presented as a straightforward combination of known statistical penalties. The derivation therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (3)
- lambda: weight on the invariance term
- mu: weight on the variance term
- nu: weight on the covariance term
axioms (2)
- domain assumption Different views of the same image should produce similar embeddings (invariance).
- domain assumption Non-zero variance per embedding dimension prevents collapse to constant vectors.
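For reference, the ledger's free parameters and axioms combine in a single objective. In the paper's notation (γ is the variance threshold, fixed at 1 in practice; ε is a small constant inside the square root), the loss on two view batches Z and Z' is:

```latex
\ell(Z, Z') \;=\; \lambda\, s(Z, Z')
  \;+\; \mu\,\bigl[v(Z) + v(Z')\bigr]
  \;+\; \nu\,\bigl[c(Z) + c(Z')\bigr],
\qquad
v(Z) \;=\; \frac{1}{d} \sum_{j=1}^{d}
  \max\!\Bigl(0,\; \gamma - \sqrt{\operatorname{Var}(z^{j}) + \epsilon}\Bigr),
\qquad
c(Z) \;=\; \frac{1}{d} \sum_{i \neq j} \bigl[C(Z)\bigr]_{i,j}^{2}
```

where s is the mean-squared-error invariance term and C(Z) is the sample covariance matrix of the batch of embeddings.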
Lean theorems connected to this paper
-
IndisputableMonolith.Cost.JcostCore.Jcost_pos_of_ne_one · echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
a term that maintains the variance of each embedding dimension above a threshold... a term that decorrelates each pair of variables
-
IndisputableMonolith.Foundation.DAlembert.Inevitability.bilinear_family_forced · echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
VICReg combines the variance term with a decorrelation mechanism based on redundancy reduction and covariance regularization
-
IndisputableMonolith.Foundation.LogicAsFunctionalEquation.RCL_is_unique_functional_form_of_logic · refines
REFINES: relation between the paper passage and the cited Recognition theorem.
the composite value is determined by a finite pairwise polynomial algebra
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 21 Pith papers
-
Normalizing Trajectory Models
NTM models each generative reverse step as a conditional normalizing flow with a hybrid shallow-deep architecture, enabling exact-likelihood training and strong four-step sampling performance on text-to-image tasks.
-
Normalizing Trajectory Models
NTM uses per-step conditional normalizing flows plus a trajectory-wide predictor to achieve exact-likelihood 4-step sampling that matches or exceeds baselines on text-to-image tasks.
-
PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization
PairAlign learns compact audio token sequences via self-alignment of paired content views using an autoregressive decoder, achieving strong cross-view consistency and edit-distance preservation while reducing token co...
-
Beyond Patient Invariance: Learning Cardiac Dynamics via Action-Conditioned JEPAs
Action-conditioned JEPA models treat pathology as a transition vector on latent states to simulate cardiac dynamics, outperforming supervised learning by over 0.05 AUROC in low-resource regimes on MIMIC-IV-ECG.
-
Coevolving Representations in Joint Image-Feature Diffusion
CoReDi coevolves semantic representations with the diffusion model via a jointly learned linear projection stabilized by stop-gradient, normalization, and regularization, yielding faster convergence and higher sample ...
-
Predictive but Not Plannable: RC-aux for Latent World Models
RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.
-
Understanding Self-Supervised Learning via Latent Distribution Matching
Self-supervised learning is cast as latent distribution matching that aligns representations to a model while enforcing uniformity, unifying multiple SSL families and proving identifiability for predictive variants ev...
-
Understanding DNNs in Feature Interaction Models: A Dimensional Collapse Perspective
DNNs mitigate dimensional collapse of embeddings in feature interaction models, shown via parallel and stacked experiments plus gradient analysis.
-
Monitoring Neural Training with Topology: A Footprint-Predictable Collapse Index
A composite Collapse Index based on incremental discrete Morse homology provides low-latency early warning of representational collapse during neural network training.
-
Self-Supervised Representation Learning via Hyperspherical Density Shaping
HyDeS introduces hyperspherical density shaping with a von Mises-Fisher estimator to create theoretically grounded self-supervised representations that focus on foreground features.
-
RankUp: Towards High-rank Representations for Large Scale Advertising Recommender Systems
RankUp raises effective rank of representations in deep MetaFormer recommenders via randomized splitting and multi-embeddings, delivering 2-5% GMV gains in production deployments at Weixin.
-
HSG: Hyperbolic Scene Graph
Hyperbolic Scene Graph (HSG) learns embeddings in hyperbolic space for better hierarchical structure in scene graphs, achieving graph IoU of 33.51 versus 25.37 for the best Euclidean baseline.
-
Hierarchical Planning with Latent World Models
Hierarchical planning over multi-scale latent world models enables 70% success on real robotic pick-and-place with goal-only input where flat models achieve 0%, while cutting planning compute up to 4x in simulations.
-
Rapidly deploying on-device eye tracking by distilling visual foundation models
DistillGaze reduces median gaze error by 58.62% on a 2000+ participant dataset by distilling foundation models into a 256K-parameter on-device model using synthetic labeled data and unlabeled real data.
-
Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction
Dreamer-CDP achieves reconstruction-free world modeling via a JEPA-style predictor on continuous deterministic representations and matches Dreamer's performance on Crafter.
-
Revisiting Feature Prediction for Learning Visual Representations from Video
V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.
-
Vision Transformers Need Registers
Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
-
MER-DG: Modality-Entropy Regularization for Multimodal Domain Generalization
MER-DG applies modality-entropy regularization to reduce fusion overfitting in multimodal domain generalization, reporting average gains of 5% over standard fusion and 2% over prior methods on EPIC-Kitchens and HAC be...
-
Learning Invariant Modality Representation for Robust Multimodal Learning from a Causal Inference Perspective
CmIR uses causal inference to separate invariant causal representations from spurious ones in multimodal data, improving generalization under distribution shifts and noise via invariance, mutual information, and recon...
-
RankUp: Towards High-rank Representations for Large Scale Advertising Recommender Systems
RankUp enhances representation capacity in deep MetaFormer recommenders via permutation splitting and multi-embeddings, achieving GMV improvements of 2-5% in Weixin production systems.
-
The Global Neural World Model: Spatially Grounded Discrete Topologies for Action-Conditioned Planning
GNWM maps environments to a discrete 2D grid with snapping to stabilize autoregressive planning and learns generalized dynamics from maximum-entropy random walks.
Reference graph
Works this paper leans on
-
[1]
Self-labelling via simultaneous clustering and representation learning
Yuki Markus Asano, Christian Rupprecht, and Andrea Vedaldi. Self-labelling via simultaneous clustering and representation learning. In ICLR, 2020
work page 2020
-
[2]
Learning representations by maximizing mutual information across views
Philip Bachman, R Devon Hjelm, and William Buchwalter. Learning representations by maximizing mutual information across views. In NeurIPS, 2019
work page 2019
-
[3]
Cliquecnn: Deep unsupervised exemplar learning
Miguel A. Bautista, Artsiom Sanakoyeu, Ekaterina Sutter, and Björn Ommer. Cliquecnn: Deep unsupervised exemplar learning. In NeurIPS, 2016
work page 2016
-
[4]
Signature verification using a “siamese” time delay neural network
Jane Bromley, Isabelle Guyon, Yann LeCun, Eduard Sackinger, and Roopak Shah. Signature verification using a “siamese” time delay neural network. In NeurIPS, 1994
work page 1994
-
[5]
Deep clustering for unsupervised learning
Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning. In ECCV, 2018
work page 2018
-
[6]
Unsupervised pre-training of image features on non-curated data
Mathilde Caron, Piotr Bojanowski, Julien Mairal, and Armand Joulin. Unsupervised pre-training of image features on non-curated data. In ICCV, 2019
work page 2019
-
[7]
Unsupervised learning of visual features by contrasting cluster assignments
Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. Unsupervised learning of visual features by contrasting cluster assignments. In NeurIPS, 2020
work page 2020
-
[8]
A simple framework for contrastive learning of visual representations
Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey E. Hinton. A simple framework for contrastive learning of visual representations. In ICML, 2020a
work page 2020
-
[9]
Big self-supervised models are strong semi-supervised learners
Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, and Geoffrey Hinton. Big self-supervised models are strong semi-supervised learners. In NeurIPS, 2020b
work page 2020
-
[10]
Exploring simple siamese representation learning
Xinlei Chen and Kaiming He. Exploring simple siamese representation learning. In CVPR, 2020
work page 2020
-
[12]
Learning a similarity metric discriminatively, with application to face verification
Sumit Chopra, Raia Hadsell, and Yann LeCun. Learning a similarity metric discriminatively, with application to face verification. In CVPR, 2005
work page 2005
-
[13]
Sinkhorn distances: Lightspeed computation of optimal transport
Marco Cuturi. Sinkhorn distances: Lightspeed computation of optimal transport. In NeurIPS, 2013
work page 2013
-
[14]
Imagenet: A large-scale hierarchical image database
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, 2009
work page 2009
-
[15]
An image is worth 16x16 words: Transformers for image recognition at scale
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In ICLR, 2021
work page 2021
-
[16]
Whitening for self-supervised representation learning, 2021
Aleksandr Ermolov, Aliaksandr Siarohin, Enver Sangineto, and Nicu Sebe. Whitening for self-supervised representation learning, 2021
work page 2021
-
[17]
The pascal visual object classes (voc) challenge
Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. IJCV, 2010
work page 2010
-
[18]
Vse++: Improving visual-semantic embeddings with hard negatives
Fartash Faghri, David J. Fleet, Jamie Ryan Kiros, and Sanja Fidler. Vse++: Improving visual-semantic embeddings with hard negatives. In BMVC, 2018
work page 2018
-
[19]
Liblinear: A library for large linear classification
Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, Xiang-Rui Wang, and Chih-Jen Lin. Liblinear: A library for large linear classification. JMLR, 2008
work page 2008
-
[20]
Learning representations by predicting bags of visual words
Spyros Gidaris, Andrei Bursuc, Nikos Komodakis, Patrick Pérez, and Matthieu Cord. Learning representations by predicting bags of visual words. In CVPR, 2020
work page 2020
-
[21]
Online bag-of-visual-words generation for unsupervised representation learning
Spyros Gidaris, Andrei Bursuc, Gilles Puy, Nikos Komodakis, Matthieu Cord, and Patrick Pérez. Online bag-of-visual-words generation for unsupervised representation learning. In CVPR, 2021
work page 2021
-
[23]
Vissl
Priya Goyal, Quentin Duval, Jeremy Reizenstein, Matthew Leavitt, Min Xu, Benjamin Lefaudeux, Mannat Singh, Vinicius Reis, Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Ishan Misra. Vissl. https://github.com/facebookresearch/vissl, 2021
work page 2021
-
[24]
Bootstrap your own latent: A new approach to self-supervised learning
Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, and Michal Valko. Bootstrap your own latent: A new approach to self-supervised learning. In NeurIPS, 2020
work page 2020
-
[25]
Dimensionality reduction by learning an invariant mapping
Raia Hadsell, Sumit Chopra, and Yann LeCun. Dimensionality reduction by learning an invariant mapping. In CVPR, 2006
work page 2006
-
[26]
Deep residual learning for image recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016
work page 2016
-
[27]
Mask r-cnn
Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. In ICCV, 2017
work page 2017
-
[28]
Momentum contrast for unsupervised visual representation learning
Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In CVPR, 2020
work page 2020
-
[29]
Distilling the knowledge in a neural network
Geoffrey Hinton, Oriol Vinyals, and Jeffrey Dean. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop, 2015
work page 2015
-
[30]
Learning deep representations by mutual information estimation and maximization
R Devon Hjelm, Alex Fedorov, Samuel Lavoie-Marchildon, Karan Grewal, Adam Trischler, and Yoshua Bengio. Learning deep representations by mutual information estimation and maximization. In ICLR, 2019
work page 2019
-
[31]
The inaturalist species classification and detection dataset
Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The inaturalist species classification and detection dataset. In CVPR, 2018
work page 2018
-
[32]
Unsupervised deep learning by neighbourhood discovery
Jiabo Huang, Qi Dong, Shaogang Gong, and Xiatian Zhu. Unsupervised deep learning by neighbourhood discovery. In ICML, 2019
work page 2019
-
[33]
Data-efficient image recognition with contrastive predictive coding
Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, and Aäron van den Oord. Data-efficient image recognition with contrastive predictive coding. In ICML, 2019
work page 2019
-
[34]
Batch normalization: Accelerating deep network training by reducing internal covariate shift
Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015
work page 2015
-
[35]
Prototypical contrastive learning of unsupervised representations
Junnan Li, Pan Zhou, Caiming Xiong, and Steven C.H. Hoi. Prototypical contrastive learning of unsupervised representations. In ICLR, 2021
work page 2021
-
[36]
Microsoft coco: Common objects in context
Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. Microsoft coco: Common objects in context. In ECCV, 2014
work page 2014
-
[37]
Sgdr: stochastic gradient descent with warm restarts
Ilya Loshchilov and Frank Hutter. Sgdr: stochastic gradient descent with warm restarts. In ICLR, 2017
work page 2017
-
[38]
Self-supervised learning of pretext-invariant representations
Ishan Misra and Laurens van der Maaten. Self-supervised learning of pretext-invariant representations. In CVPR, 2020
work page 2020
-
[39]
ESC: Dataset for Environmental Sound Classification
Karol J. Piczak. ESC: Dataset for Environmental Sound Classification. In Proceedings of the 23rd Annual ACM Conference on Multimedia, 2015
work page 2015
-
[40]
Faster r-cnn: Towards real-time object detection with region proposal networks
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In NeurIPS, 2015
work page 2015
-
[43]
What makes for good views for contrastive learning
Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, and Phillip Isola. What makes for good views for contrastive learning. In NeurIPS, 2020
work page 2020
-
[46]
Detectron2
Yuxin Wu, Alexander Kirillov, Francisco Massa, Wan-Yen Lo, and Ross Girshick. Detectron2. https://github.com/facebookresearch/detectron2, 2019
work page 2019
-
[47]
Unsupervised feature learning via non-parametric instance discrimination
Zhirong Wu, Yuanjun Xiong, Stella Yu, and Dahua Lin. Unsupervised feature learning via non-parametric instance discrimination. In CVPR, 2018
work page 2018
-
[48]
Unsupervised deep embedding for clustering analysis
Junyuan Xie, Ross Girshick, and Ali Farhadi. Unsupervised deep embedding for clustering analysis. In ICML, 2016
work page 2016
-
[49]
Aggregated residual transformations for deep neural networks
Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In CVPR, 2017
work page 2017
-
[50]
Clusterfit: Improving generalization of visual representations
Xueting Yan, Ishan Misra, Abhinav Gupta, Deepti Ghadiyaram, and Dhruv Mahajan. Clusterfit: Improving generalization of visual representations. In CVPR, 2020
work page 2020
-
[51]
Joint unsupervised learning of deep representations and image clusters
Jianwei Yang, Devi Parikh, and Dhruv Batra. Joint unsupervised learning of deep representations and image clusters. In CVPR, 2016
work page 2016
-
[52]
Unsupervised embedding learning via invariant and spreading instance feature
Mang Ye, Xu Zhang, Pong C Yuen, and Shih-Fu Chang. Unsupervised embedding learning via invariant and spreading instance feature. In CVPR, 2019
work page 2019
-
[56]
Learning deep features for scene recognition using places database
Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. Learning deep features for scene recognition using places database. In NeurIPS, 2014
work page 2014
-
[57]
Local aggregation for unsupervised learning of visual embeddings
Chengxu Zhuang, Alex Lin Zhai, and Daniel Yamins. Local aggregation for unsupervised learning of visual embeddings. In ICCV, 2019
work page 2019
-
[58]
Deep residual learning for image recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016
work page 2016
-
[59]
Wide residual networks
Sergey Zagoruyko and Nikos Komodakis. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016
work page
-
[60]
Distilling the knowledge in a neural network
Geoffrey Hinton, Oriol Vinyals, and Jeffrey Dean. Distilling the knowledge in a neural network. In NIPS Deep Learning and Representation Learning Workshop, 2015
work page 2015
-
[61]
Aggregated residual transformations for deep neural networks
Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. Aggregated residual transformations for deep neural networks. In CVPR, 2017
-
[62]
Imagenet: A large-scale hierarchical image database
Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In CVPR, 2009
-
[64]
A simple framework for contrastive learning of visual representations
Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey E. Hinton. A simple framework for contrastive learning of visual representations. In ICML, 2020
-
[66]
What makes for good views for contrastive learning
Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, and Phillip Isola. What makes for good views for contrastive learning. In NeurIPS, 2020
-
[67]
Bootstrap your own latent: A new approach to self-supervised learning
Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, and Michal Valko. Bootstrap your own latent: A new approach to self-supervised learning. In NeurIPS, 2020
-
[68]
Unsupervised learning of visual features by contrasting cluster assignments
Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. Unsupervised learning of visual features by contrasting cluster assignments. In NeurIPS, 2020
-
[70]
Momentum contrast for unsupervised visual representation learning
Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, and Ross Girshick. Momentum contrast for unsupervised visual representation learning. In CVPR, 2020
-
[71]
Learning representations by maximizing mutual information across views
Philip Bachman, R Devon Hjelm, and William Buchwalter. Learning representations by maximizing mutual information across views. In NeurIPS, 2019
-
[72]
Signature verification using a "siamese" time delay neural network
Jane Bromley, Isabelle Guyon, Yann LeCun, Eduard Sackinger, and Roopak Shah. Signature verification using a "siamese" time delay neural network. In NeurIPS, 1994
-
[73]
Deep clustering for unsupervised learning
Mathilde Caron, Piotr Bojanowski, Armand Joulin, and Matthijs Douze. Deep clustering for unsupervised learning. In ECCV, 2018
-
[74]
Learning to see by moving
Pulkit Agrawal, Joao Carreira, and Jitendra Malik. Learning to see by moving. In ICCV, 2015
-
[75]
Unsupervised visual representation learning by context prediction
Carl Doersch, Abhinav Gupta, and Alexei A. Efros. Unsupervised visual representation learning by context prediction. In ICCV, 2015
-
[77]
Learning image representations by completing damaged jigsaw puzzles
Dahun Kim, Donghyeon Cho, Donggeun Yoo, and In So Kweon. Learning image representations by completing damaged jigsaw puzzles. In WACV, 2018
-
[78]
Learning representations for automatic colorization
Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. Learning representations for automatic colorization. In ECCV, 2016
-
[79]
Shuffle and learn: Unsupervised learning using temporal order verification
Ishan Misra, C. Lawrence Zitnick, and Martial Hebert. Shuffle and learn: Unsupervised learning using temporal order verification. In ECCV, 2016
-
[80]
Learning features by watching objects move
Deepak Pathak, Ross Girshick, Piotr Dollár, Trevor Darrell, and Bharath Hariharan. Learning features by watching objects move. In CVPR, 2017
-
[81]
Context encoders: Feature learning by inpainting
Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell, and Alexei A. Efros. Context encoders: Feature learning by inpainting. In CVPR, 2016
-
[84]
Split-brain autoencoders: Unsupervised learning by cross-channel prediction
Richard Zhang, Phillip Isola, and Alexei A. Efros. Split-brain autoencoders: Unsupervised learning by cross-channel prediction. In CVPR, 2017
-
[85]
Colorful image colorization
Richard Zhang, Phillip Isola, and Alexei A. Efros. Colorful image colorization. In ECCV, 2016
-
[86]
Contrastive multiview coding
Yonglong Tian, Dilip Krishnan, and Phillip Isola. Contrastive multiview coding. arXiv preprint arXiv:1906.05849v4, 2019
-
[87]
Learning deep features for scene recognition using places database
Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. Learning deep features for scene recognition using places database. In NeurIPS, 2014
-
[88]
The pascal visual object classes (voc) challenge
Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. IJCV, 2010
-
[89]
The inaturalist species classification and detection dataset
Grant Van Horn, Oisin Mac Aodha, Yang Song, Yin Cui, Chen Sun, Alex Shepard, Hartwig Adam, Pietro Perona, and Serge Belongie. The inaturalist species classification and detection dataset. In CVPR, 2018