hub Canonical reference

Momentum contrast for unsupervised visual representation learning

Kaiming He, Haoqi Fan, Yuxin Wu, Saining Xie, Ross Girshick · 2020

Canonical reference. 70% of citing Pith papers cite this work as background.

22 Pith papers citing it

Background 70% of classified citations

browse 22 citing papers

hub tools

JSON dossier citing papers JSON

citation-role summary

background 7 method 3

citation-polarity summary

background 7 use method 3

representative citing papers

MaxSketch: Robust Distinct Counting in Streams via Random Projections

stat.ML · 2026-05-15 · unverdicted · novelty 7.0

MaxSketch achieves O~(log n / ε²) memory for (1+ε)-approximate distinct counting in streams with geometric structure via max-linear random projections.

Disentangled Sparse Representations for Concept-Separated Diffusion Unlearning

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

SAEParate disentangles sparse representations in diffusion models via contrastive clustering and nonlinear encoding to enable more precise concept unlearning with reduced side effects.

CoLVR: Enhancing Exploratory Latent Visual Reasoning via Contrastive Optimization

cs.CV · 2026-05-09 · conditional · novelty 7.0 · 2 refs

CoLVR uses latent contrastive objectives with angle-based perturbation and RL trajectory rewards to increase exploratory visual reasoning in MLLMs, delivering 5-8% gains on VSP, Jigsaw, and MMStar benchmarks.

Masks Can Talk: Extracting Structured Text Information from Single-Modal Images for Remote Sensing Change Detection

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

S2M extracts structured text quadruples from change masks to provide noise-free multimodal supervision, achieving 17.80% Sek and 66.14% F_scd on the new Gaza-Change-v2 dataset and outperforming LLM-based multimodal methods.

TRAJGANR: Trajectory-Centric Urban Multimodal Learning via Geospatially Aligned Neural Representations

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

TrajGANR learns continuous neural representations of trajectories to enable fine-grained alignment with street-view images and locations in a joint multimodal self-supervised objective, outperforming prior geospatial MSSL methods on urban mobility and road tasks.

The Indra Representation Hypothesis for Multimodal Alignment

cs.CV · 2026-04-06 · unverdicted · novelty 7.0

Unimodal model representations converge to a relational structure captured by the Indra representation via V-enriched Yoneda embedding, which is unique and structure-preserving and improves cross-model and cross-modal robustness when instantiated with angular distance.

FuTCR: Future-Targeted Contrast and Repulsion for Continual Panoptic Segmentation

cs.CV · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

FuTCR improves new-class panoptic quality by up to 28% in continual panoptic segmentation by discovering future-like regions in background areas and applying targeted contrast and repulsion to restructure representations.

DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

DART is a cross-modal foundation model that delivers rope damage classification, severity regression, and few-shot recognition from a single frozen representation trained on 4270 images across 14 damage classes.

Continuous Limits of Coupled Flows in Representation Learning

cs.LG · 2026-04-18 · unverdicted · novelty 6.0

Discrete decentralized learning dynamics on manifolds converge uniformly to an overdamped Langevin SDE whose stationary states produce orthogonally disentangled, linearly separable features.

Image-to-Image Translation Framework Embedded with Rotation Symmetry Priors

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

Rotation-equivariant convolutions and adaptive TL-Conv layers are added to I2I networks to preserve rotation symmetry and improve translation quality across domains.

Rapidly deploying on-device eye tracking by distilling visual foundation models

cs.CV · 2026-04-02 · unverdicted · novelty 6.0

DistillGaze reduces median gaze error by 58.62% on a 2000+ participant dataset by distilling foundation models into a 256K-parameter on-device model using synthetic labeled data and unlabeled real data.

Benchmarking transferability of SSL pretraining to same and different modality segmentation tasks

cs.CV · 2026-05-18 · unverdicted · novelty 5.0

SMIT, which combines masked image modeling with self-distillation, delivers the highest segmentation accuracy, fastest convergence, and best few-shot performance across nine CT and MRI tasks compared to contrastive and rotation-based SSL methods.

HYVINT: Intensity-Driven Hypergraph Generation with Variational Representations

stat.ML · 2026-05-16 · unverdicted · novelty 5.0

HYVINT introduces an intensity-driven incidence mechanism and tractable variational estimator for hypergraph generation, with error bounds and empirical gains in fidelity, novelty, and diversity.

ShellfishNet: A Domain-Specific Benchmark for Visual Recognition of Marine Molluscs

cs.CV · 2026-05-08 · unverdicted · novelty 5.0

ShellfishNet is a new benchmark of 8,691 images across 32 mollusc taxa for evaluating vision models on real-world underwater ecological monitoring tasks including robustness to degradation.

Pan-FM: A Pan-Organ Foundation Model with Saliency-Guided Masking for Missing Robustness

cs.CV · 2026-05-08 · unverdicted · novelty 5.0

Pan-FM learns balanced representations across seven organs by adaptively masking dominant organs during pre-training, yielding stronger disease prediction and missing-organ robustness than single-organ or naive multimodal baselines on UK Biobank.

ConvFormer3D-TAP: Phase/Uncertainty-Aware Front-End Fusion for Cine CMR View Classification Pipelines

cs.CV · 2026-04-13 · unverdicted · novelty 5.0

ConvFormer3D-TAP classifies six cine CMR views at 96% accuracy using 3D conv tokenization, multiscale attention, and uncertainty-aware multi-clip fusion on 150k sequences.

Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision

cs.CL · 2026-04-10 · unverdicted · novelty 5.0

A supervision construction procedure generates explicit support and controlled non-support examples (counterfactual and topic-related negatives) without manual annotation, producing verifiers that demonstrate genuine evidence dependence in radiology tasks.

Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks

cs.CV · 2026-04-02 · unverdicted · novelty 5.0

Random label bridge training aligns LLM parameters with vision tasks, and partial training of certain layers often suffices due to their foundational properties.

Momentum-Anchored Multi-Scale Fusion Model for Long-Tailed Chest X-Ray Classification

cs.CV · 2026-05-04 · unverdicted · novelty 4.0

A new neural network stabilizes features for rare chest X-ray diseases via momentum anchoring and multi-scale fusion on EfficientNet, achieving 0.8682 AUC on ChestX-ray14.

Representation learning from OCT images

cs.CV · 2026-05-04 · unverdicted · novelty 3.0

A structured survey of representation learning methods for retinal OCT image analysis, covering supervised, self-supervised, generative, multimodal, and foundation model approaches along with datasets and open problems.

FML-bench: A Controlled Study of AI Research Agent Strategies from the Perspective of Search Dynamics

cs.LG · 2026-05-17

Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective

cs.LG · 2026-05-13 · 2 refs

citing papers explorer

Showing 22 of 22 citing papers.

MaxSketch: Robust Distinct Counting in Streams via Random Projections stat.ML · 2026-05-15 · unverdicted · none · ref 11
MaxSketch achieves O~(log n / ε²) memory for (1+ε)-approximate distinct counting in streams with geometric structure via max-linear random projections.
Disentangled Sparse Representations for Concept-Separated Diffusion Unlearning cs.LG · 2026-05-12 · unverdicted · none · ref 16
SAEParate disentangles sparse representations in diffusion models via contrastive clustering and nonlinear encoding to enable more precise concept unlearning with reduced side effects.
CoLVR: Enhancing Exploratory Latent Visual Reasoning via Contrastive Optimization cs.CV · 2026-05-09 · conditional · none · ref 8 · 2 links
CoLVR uses latent contrastive objectives with angle-based perturbation and RL trajectory rewards to increase exploratory visual reasoning in MLLMs, delivering 5-8% gains on VSP, Jigsaw, and MMStar benchmarks.
Masks Can Talk: Extracting Structured Text Information from Single-Modal Images for Remote Sensing Change Detection cs.CV · 2026-05-08 · unverdicted · none · ref 12
S2M extracts structured text quadruples from change masks to provide noise-free multimodal supervision, achieving 17.80% Sek and 66.14% F_scd on the new Gaza-Change-v2 dataset and outperforming LLM-based multimodal methods.
TRAJGANR: Trajectory-Centric Urban Multimodal Learning via Geospatially Aligned Neural Representations cs.CV · 2026-05-07 · unverdicted · none · ref 19
TrajGANR learns continuous neural representations of trajectories to enable fine-grained alignment with street-view images and locations in a joint multimodal self-supervised objective, outperforming prior geospatial MSSL methods on urban mobility and road tasks.
The Indra Representation Hypothesis for Multimodal Alignment cs.CV · 2026-04-06 · unverdicted · none · ref 23
Unimodal model representations converge to a relational structure captured by the Indra representation via V-enriched Yoneda embedding, which is unique and structure-preserving and improves cross-model and cross-modal robustness when instantiated with angular distance.
FuTCR: Future-Targeted Contrast and Repulsion for Continual Panoptic Segmentation cs.CV · 2026-05-12 · unverdicted · none · ref 34 · 2 links
FuTCR improves new-class panoptic quality by up to 28% in continual panoptic segmentation by discovering future-like regions in background areas and applying targeted contrast and repulsion to restructure representations.
DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring cs.CV · 2026-05-06 · unverdicted · none · ref 20
DART is a cross-modal foundation model that delivers rope damage classification, severity regression, and few-shot recognition from a single frozen representation trained on 4270 images across 14 damage classes.
Continuous Limits of Coupled Flows in Representation Learning cs.LG · 2026-04-18 · unverdicted · none · ref 47
Discrete decentralized learning dynamics on manifolds converge uniformly to an overdamped Langevin SDE whose stationary states produce orthogonally disentangled, linearly separable features.
Image-to-Image Translation Framework Embedded with Rotation Symmetry Priors cs.CV · 2026-04-14 · unverdicted · none · ref 35
Rotation-equivariant convolutions and adaptive TL-Conv layers are added to I2I networks to preserve rotation symmetry and improve translation quality across domains.
Rapidly deploying on-device eye tracking by distilling visual foundation models cs.CV · 2026-04-02 · unverdicted · none · ref 32
DistillGaze reduces median gaze error by 58.62% on a 2000+ participant dataset by distilling foundation models into a 256K-parameter on-device model using synthetic labeled data and unlabeled real data.
Benchmarking transferability of SSL pretraining to same and different modality segmentation tasks cs.CV · 2026-05-18 · unverdicted · none · ref 26
SMIT, which combines masked image modeling with self-distillation, delivers the highest segmentation accuracy, fastest convergence, and best few-shot performance across nine CT and MRI tasks compared to contrastive and rotation-based SSL methods.
HYVINT: Intensity-Driven Hypergraph Generation with Variational Representations stat.ML · 2026-05-16 · unverdicted · none · ref 48
HYVINT introduces an intensity-driven incidence mechanism and tractable variational estimator for hypergraph generation, with error bounds and empirical gains in fidelity, novelty, and diversity.
ShellfishNet: A Domain-Specific Benchmark for Visual Recognition of Marine Molluscs cs.CV · 2026-05-08 · unverdicted · none · ref 38
ShellfishNet is a new benchmark of 8,691 images across 32 mollusc taxa for evaluating vision models on real-world underwater ecological monitoring tasks including robustness to degradation.
Pan-FM: A Pan-Organ Foundation Model with Saliency-Guided Masking for Missing Robustness cs.CV · 2026-05-08 · unverdicted · none · ref 24
Pan-FM learns balanced representations across seven organs by adaptively masking dominant organs during pre-training, yielding stronger disease prediction and missing-organ robustness than single-organ or naive multimodal baselines on UK Biobank.
ConvFormer3D-TAP: Phase/Uncertainty-Aware Front-End Fusion for Cine CMR View Classification Pipelines cs.CV · 2026-04-13 · unverdicted · none · ref 40
ConvFormer3D-TAP classifies six cine CMR views at 96% accuracy using 3D conv tokenization, multiscale attention, and uncertainty-aware multi-clip fusion on 150k sequences.
Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision cs.CL · 2026-04-10 · unverdicted · none · ref 26
A supervision construction procedure generates explicit support and controlled non-support examples (counterfactual and topic-related negatives) without manual annotation, producing verifiers that demonstrate genuine evidence dependence in radiology tasks.
Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks cs.CV · 2026-04-02 · unverdicted · none · ref 19
Random label bridge training aligns LLM parameters with vision tasks, and partial training of certain layers often suffices due to their foundational properties.
Momentum-Anchored Multi-Scale Fusion Model for Long-Tailed Chest X-Ray Classification cs.CV · 2026-05-04 · unverdicted · none · ref 15
A new neural network stabilizes features for rare chest X-ray diseases via momentum anchoring and multi-scale fusion on EfficientNet, achieving 0.8682 AUC on ChestX-ray14.
Representation learning from OCT images cs.CV · 2026-05-04 · unverdicted · none · ref 49
A structured survey of representation learning methods for retinal OCT image analysis, covering supervised, self-supervised, generative, multimodal, and foundation model approaches along with datasets and open problems.
FML-bench: A Controlled Study of AI Research Agent Strategies from the Perspective of Search Dynamics cs.LG · 2026-05-17 · unreviewed · ref 18
Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective cs.LG · 2026-05-13 · unreviewed · ref 31 · 2 links

Momentum contrast for unsupervised visual representation learning

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer