hub Mixed citations

Improved Baselines with Momentum Contrastive Learning

Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He · 2020 · cs.CV · arXiv 2003.04297

Mixed citation behavior. Most common role is background (50%).

40 Pith papers citing it

abstract

Contrastive unsupervised learning has recently shown encouraging progress, e.g., in Momentum Contrast (MoCo) and SimCLR. In this note, we verify the effectiveness of two of SimCLR's design improvements by implementing them in the MoCo framework. With simple modifications to MoCo, namely using an MLP projection head and more data augmentation, we establish stronger baselines that outperform SimCLR and do not require large training batches. We hope this will make state-of-the-art unsupervised learning research more accessible. Code will be made public.
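As a rough illustration of the ingredients the abstract names, the sketch below shows a two-layer MLP projection head, a momentum (moving-average) update of the key encoder, and a contrastive loss that scores one positive key against a queue of negatives. It is a minimal pure-Python sketch, not the authors' implementation: the function names are hypothetical, and the momentum value 0.999 and temperature 0.2 are assumed defaults that this page does not state. The extra data augmentation is not shown.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def mlp_head(x, w1, b1, w2, b2):
    """Hypothetical projection head: linear -> ReLU -> linear."""
    hidden = [max(0.0, dot(row, x) + b) for row, b in zip(w1, b1)]
    return [dot(row, hidden) + b for row, b in zip(w2, b2)]


def momentum_update(key_params, query_params, m=0.999):
    """Moving-average update: key <- m * key + (1 - m) * query.

    m=0.999 is an assumed default, not a value given on this page.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def info_nce(q, k_pos, queue, temperature=0.2):
    """Contrastive loss for one query: positive key vs. queued negatives.

    temperature=0.2 is an assumed default, not a value given on this page.
    """
    q = l2_normalize(q)
    logits = [dot(q, l2_normalize(k_pos)) / temperature]
    logits += [dot(q, l2_normalize(n)) / temperature for n in queue]
    # Cross-entropy with the positive at index 0, using a stable log-sum-exp.
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(l - mx) for l in logits))
    return log_sum - logits[0]


if __name__ == "__main__":
    # Toy 2-d example: the query matches its positive and is orthogonal to the negative.
    z = mlp_head([1.0, -2.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0], [[1.0, 1.0]], [0.0])
    print("projected feature:", z)
    print("loss:", info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))
    print("updated key params:", momentum_update([1.0, 2.0], [0.0, 0.0], m=0.9))
```

Because the negatives come from a queue of earlier keys rather than from the current batch, the number of negatives is decoupled from the batch size, which is consistent with the abstract's point that large training batches are not required.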

citation-role summary

background: 5 · method: 2 · baseline: 1

citation-polarity summary

background: 4 · use method: 2 · baseline: 1 · unclear: 1

representative citing papers

Learning to (Learn at Test Time): RNNs with Expressive Hidden States

cs.LG · 2024-07-05 · conditional · novelty 8.0

TTT layers treat the hidden state as a trainable model updated at test time, allowing linear-complexity sequence models to scale perplexity reduction with context length unlike Mamba.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

Emerging Properties in Self-Supervised Vision Transformers

cs.CV · 2021-04-29 · conditional · novelty 8.0

Self-supervised ViTs show emergent semantic segmentation and 78.3% k-NN accuracy on ImageNet; DINO reaches 80.1% linear evaluation with ViT-Base.

Targeted Downstream-Agnostic Attack

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

Introduces Targeted Downstream-Agnostic Attack (TDAA) that uses a threat image as feature anchor and example-specific perturbations to achieve targeted attacks on unknown downstream tasks from pre-trained encoders.

SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

SeBA is a joint-embedding framework that separates tabular data into two complementary views and aligns one view's representations to the nearest-neighbor structure of the other, improving feature-label relationships and achieving SOTA results in most benchmarks without relying on augmentations.

Attention Transfer Is Not Universally Effective for Vision Transformers

cs.CV · 2026-05-08 · accept · novelty 7.0

Attention transfer from ViT teachers succeeds for only 7 of 11 families and fails for the rest because of architectural mismatch between teacher and student.

TinySSL: Distilled Self-Supervised Pretraining for Sub-Megabyte MCU Models

cs.CV · 2026-05-07 · conditional · novelty 7.0

CA-DSSL enables effective self-supervised pretraining for 396K-parameter MCU backbones, reaching 62.7% linear-probe accuracy on CIFAR-100 and 94% of supervised performance while fitting in 378 KB INT8.

Generative Texture Filtering

cs.CV · 2026-04-21 · unverdicted · novelty 7.0

A two-stage fine-tuning strategy on pre-trained generative models enables effective texture filtering that outperforms prior methods on challenging cases.

CBEN -- A Multimodal Machine Learning Dataset for Cloud Robust Remote Sensing Image Understanding

cs.CV · 2026-02-13 · accept · novelty 7.0

CBEN provides paired optical-radar images with cloud occlusion, revealing 23-33 point AP drops in clear-sky trained models and 17-29 point relative gains when models are trained on cloudy data.

Joint Embedding Variational Bayes

cs.LG · 2026-02-05 · unverdicted · novelty 7.0

VJE is a new variational non-contrastive SSL method that models target embeddings with a directional-radial Student-t distribution to enable structured uncertainty estimation directly in the learned representation space.

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

cs.LG · 2022-08-15 · conditional · novelty 7.0

LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.

BEiT: BERT Pre-Training of Image Transformers

cs.CV · 2021-06-15 · conditional · novelty 7.0

BEiT pre-trains vision transformers via masked image modeling on visual tokens and reaches 83.2% ImageNet top-1 accuracy for the base model and 86.3% for the large model using only ImageNet-1K data.

VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning

cs.CV · 2021-05-11 · accept · novelty 7.0

VICReg prevents collapse in self-supervised image embeddings via explicit variance, invariance, and covariance regularization and matches state-of-the-art downstream performance.

Vision Foundation Models as Generalist Tokenizers for Image Generation

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

VFMTok builds a generalist image tokenizer on frozen VFMs using adaptive quantization and semantic alignment, delivering gFID 1.36 for autoregressive and 1.25 for continuous generation on ImageNet with 3x faster convergence.

ArmSSL: Adversarial Robust Black-Box Watermarking for Self-Supervised Learning Pre-trained Encoders

cs.CR · 2026-04-24 · unverdicted · novelty 6.0

ArmSSL is a black-box verifiable and adversarially robust watermarking framework for SSL pre-trained encoders using paired discrepancy enlargement, latent entanglement, distribution alignment, and reference-guided tuning.

Beyond Binary Contrast: Modeling Continuous Skeleton Action Spaces with Transitional Anchors

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

TranCLR models continuous skeleton action spaces with transitional anchors and multi-level manifold calibration, yielding smoother and more accurate representations than binary contrastive methods.

Shape: A Self-Supervised 3D Geometry Foundation Model for Industrial CAD Analysis

cs.CV · 2026-04-19 · unverdicted · novelty 6.0

A 10.9M-parameter self-supervised model pretrained on 61k CAD meshes achieves R²=0.729 reconstruction and 98.1% top-1 retrieval on held-out data via masked normalized geometry reconstruction and multi-resolution contrastive learning.

Boosting Visual Instruction Tuning with Self-Supervised Guidance

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

Mixing 3-10% of visually grounded self-supervised instructions into visual instruction tuning consistently boosts MLLM performance on vision-centric benchmarks.

Probing Intrinsic Medical Task Relationships: A Contrastive Learning Perspective

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

TaCo contrastively embeds semantic, generative, and transformation tasks from medical imaging into a joint space to reveal which tasks cluster, blend, or remain distinct.

Text-Phase Synergy Network with Dual Priors for Unsupervised Cross-Domain Image Retrieval

cs.CV · 2026-03-13 · unverdicted · novelty 6.0

TPSNet combines CLIP text prompts and phase features as dual priors to deliver better semantic supervision and domain alignment than pseudo-label clustering in unsupervised cross-domain image retrieval.

Vision Transformers Need More Than Registers

cs.CV · 2026-02-25 · unverdicted · novelty 6.0

ViTs exhibit lazy aggregation by relying on irrelevant background patches for global semantics, and selectively integrating patch features into the CLS token reduces this effect and improves results across label-, text-, and self-supervision.

LandSegmenter: Towards a Flexible Foundation Model for Land Use and Land Cover Mapping

cs.CV · 2025-11-11 · unverdicted · novelty 6.0

LandSegmenter creates a task-specific foundation model for LULC mapping using weak labels from existing products, an RS adapter, text encoder, and confidence-guided fusion to achieve competitive zero-shot performance across modalities and taxonomies.

CoUn: Empowering Machine Unlearning via Contrastive Learning

cs.LG · 2025-09-19 · unverdicted · novelty 6.0

CoUn emulates retrained-model behavior on forget data by using contrastive learning on retain data to adjust semantic representations while preserving retain clusters via supervised learning, outperforming prior MU methods in experiments.

Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning

cs.CV · 2025-07-18 · conditional · novelty 6.0

Franca introduces nested Matryoshka clustering and positional disentanglement in a transparent SSL pipeline to deliver open-source vision models competitive with closed proprietary systems.

citing papers explorer

Showing 2 of 2 citing papers after filters.

BrainDINO: A Brain MRI Foundation Model for Generalizable Clinical Representation Learning

cs.LG · 2026-04-30 · unreviewed · ref 19 · internal anchor

Image Generators are Generalist Vision Learners

cs.CV · 2026-04-22 · unreviewed · ref 9 · 2 links · internal anchor
