Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Momentum contrast for unsupervised visual representation learning

20 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1 · method 1

citation-polarity summary

background 1 · use method 1

representative citing papers

A Unified Geometric Framework for Weighted Contrastive Learning

cs.LG · 2026-05-13 · unverdicted · novelty 8.0

Weighted InfoNCE objectives realize specific target geometries in embedding space, with SupCon producing size-dependent inter-class similarities under imbalance while Soft SupCon and certain continuous variants preserve regular simplex or unique optima.

ConRetroBert: EMA Stabilized Dual Encoders for Template-Based Single-Step Retrosynthesis

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

ConRetroBert achieves 62.4% top-1 accuracy on USPTO-50k by combining contrastive pretraining, hard-negative listwise ranking, and EMA-stabilized dual encoders for template retrieval in retrosynthesis.

TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching

cs.CL · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

TBPO posits a token-level Bradley-Terry model and derives a Bregman-divergence density-ratio matching loss that generalizes DPO while preserving token-level optimality.

Optimal Representations for Generalized Contrastive Learning with Imbalanced Datasets

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

In generalized contrastive learning with imbalanced classes, optimal representations collapse to class means whose angular geometry is determined by class proportions via convex optimization, and extreme imbalance causes all minority classes to collapse to one vector.

Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients

cs.CL · 2026-05-07 · unverdicted · novelty 7.0

POPO uses bounded importance sampling on positive rollouts and a siamese policy network to achieve implicit negative gradients and stable optimization, matching or exceeding GRPO on math benchmarks such as 36.67% on AIME 2025.

Label-Efficient Dataset Pruning via Semi-Supervised Pseudo-Labeling

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

SemiPrune uses a small labeled subset and semi-supervised pseudo-labeling to enable supervised dataset pruning methods, achieving state-of-the-art results on domain-specific, image-corrupted, and long-tailed datasets.

Divide and Contrast: Learning Robust Temporal Features without Augmentation

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

Di-COT is an unsupervised contrastive method that stochastically partitions time-series windows into overlapping sub-blocks to learn representations without augmentation, reporting SOTA results on classification and transfer tasks across multiple benchmarks while cutting training time.

Neural Collapse by Design: Learning Class Prototypes on the Hypersphere

cs.LG · 2026-05-19 · unverdicted · novelty 6.0 · 2 refs

Supervised classification reaches neural collapse by design via normalized prototype losses on the hypersphere, outperforming CE and SCL on ImageNet-1K and other benchmarks with faster convergence and better transfer.

GCE-MIL: Faithful and Recoverable Evidence for Multiple Instance Learning in Whole-Slide Imaging

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

GCE-MIL is a backbone-agnostic wrapper that directly optimizes MIL evidence for sufficiency, necessity, and recoverability, yielding modest gains in Macro-F1 and C-index plus more faithful patch selection across many backbones and datasets.

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

cs.LG · 2025-11-11 · conditional · novelty 6.0

LeJEPA derives an optimal isotropic Gaussian target for embeddings and enforces it via sketched regularization to deliver scalable, heuristics-free self-supervised pretraining with 79% ImageNet linear accuracy on ViT-H/14.

PACD-Net: Pseudo-Augmented Contrastive Distillation for Glycemic Control Estimation from SMBG

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

PACD-Net uses pseudo-augmented contrastive distillation with a hybrid Swin Transformer-CNN backbone to estimate TAR, TIR, and TBR from sparse SMBG data and outperforms prior methods in accuracy and stability under sparse conditions.

TriForces: Augmenting Atomistic GNNs for Transferable Representations

cs.LG · 2026-05-20 · unverdicted · novelty 5.0

TriForces adds a model-agnostic three-stream architecture plus self-supervised objectives to atomistic GNNs, improving transfer performance on MatBench, QM9, and limited-data OMat24 without DFT labels.

Temporal Aware Pruning for Efficient Diffusion-based Video Generation

cs.CV · 2026-05-18 · unverdicted · novelty 5.0 · 2 refs

TAPE applies temporal-aware token pruning with smoothing, reselection, and timestep scheduling to speed up video diffusion models while preserving visual fidelity and coherence.

Beyond Instance-Level Self-Supervision in 3D Multi-Modal Medical Imaging

cs.CV · 2026-05-14 · unverdicted · novelty 5.0

A self-supervised approach uses consistent spatial relationships of anatomical structures across patients to improve 3D multi-modal medical image representations, yielding modest gains on segmentation and classification tasks.

Information theoretic underpinning of self-supervised learning by clustering

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

SSL clustering is derived as KL-divergence optimization where a teacher-distribution constraint normalizes via inverse cluster priors and simplifies to batch centering by Jensen's inequality.

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

cs.CL · 2024-12-18 · unverdicted · novelty 5.0

ModernBERT is a new bidirectional encoder model achieving SOTA performance on diverse classification and retrieval benchmarks while offering superior speed and memory efficiency for long-context inference.

The Platonic Representation Hypothesis

cs.LG · 2024-05-13 · unverdicted · novelty 5.0

Representations learned by large AI models are converging toward a shared statistical model of reality.

Robustness Analysis of USmorph: II. Optimizing Feature Extraction, Dimensionality Reduction, and Clustering for Unsupervised Galaxy Morphology Classification

astro-ph.GA · 2026-05-20 · unverdicted · novelty 3.0

Optimizes ImageNet-pretrained AlexNet, UMAP, and a bagging multi-cluster voting scheme with K-means, Birch and Agg for unsupervised galaxy morphology classification, reporting improved stability and consistency with galaxy evolution expectations.

LIVEditor-14B: Lightning Unified Video Editing via In-Context Sparse Attention

cs.CV · 2026-05-06

Statistical Consistency and Generalization of Contrastive Representation Learning

cs.LG · 2026-05-04 · 2 refs

citing papers explorer

Showing 20 of 20 citing papers.

A Unified Geometric Framework for Weighted Contrastive Learning cs.LG · 2026-05-13 · unverdicted · none · ref 34
ConRetroBert: EMA Stabilized Dual Encoders for Template-Based Single-Step Retrosynthesis cs.LG · 2026-05-12 · unverdicted · none · ref 26
TokenRatio: Principled Token-Level Preference Optimization via Ratio Matching cs.CL · 2026-05-12 · unverdicted · none · ref 64 · 2 links
Optimal Representations for Generalized Contrastive Learning with Imbalanced Datasets cs.LG · 2026-05-11 · unverdicted · none · ref 97
Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients cs.CL · 2026-05-07 · unverdicted · none · ref 22
Label-Efficient Dataset Pruning via Semi-Supervised Pseudo-Labeling cs.LG · 2026-05-22 · unverdicted · none · ref 49
Divide and Contrast: Learning Robust Temporal Features without Augmentation cs.LG · 2026-05-20 · unverdicted · none · ref 13
Neural Collapse by Design: Learning Class Prototypes on the Hypersphere cs.LG · 2026-05-19 · unverdicted · none · ref 49 · 2 links
GCE-MIL: Faithful and Recoverable Evidence for Multiple Instance Learning in Whole-Slide Imaging cs.CV · 2026-05-17 · unverdicted · none · ref 17
LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics cs.LG · 2025-11-11 · conditional · none · ref 10
PACD-Net: Pseudo-Augmented Contrastive Distillation for Glycemic Control Estimation from SMBG cs.LG · 2026-05-20 · unverdicted · none · ref 66
TriForces: Augmenting Atomistic GNNs for Transferable Representations cs.LG · 2026-05-20 · unverdicted · none · ref 66
Temporal Aware Pruning for Efficient Diffusion-based Video Generation cs.CV · 2026-05-18 · unverdicted · none · ref 15 · 2 links
Beyond Instance-Level Self-Supervision in 3D Multi-Modal Medical Imaging cs.CV · 2026-05-14 · unverdicted · none · ref 6
Information theoretic underpinning of self-supervised learning by clustering cs.LG · 2026-05-12 · unverdicted · none · ref 50
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference cs.CL · 2024-12-18 · unverdicted · none · ref 91
The Platonic Representation Hypothesis cs.LG · 2024-05-13 · unverdicted · none · ref 131
Robustness Analysis of USmorph: II. Optimizing Feature Extraction, Dimensionality Reduction, and Clustering for Unsupervised Galaxy Morphology Classification astro-ph.GA · 2026-05-20 · unverdicted · none · ref 171
LIVEditor-14B: Lightning Unified Video Editing via In-Context Sparse Attention cs.CV · 2026-05-06 · unreviewed · ref 147
Statistical Consistency and Generalization of Contrastive Representation Learning cs.LG · 2026-05-04 · unreviewed · ref 13 · 2 links
