Momentum contrast for unsupervised visual representation learning
Canonical reference. 70% of citing Pith papers cite this work as background.
Years: 2026 (22 citing papers)
citing papers explorer
- MaxSketch: Robust Distinct Counting in Streams via Random Projections
  MaxSketch achieves O~(log n / ε²) memory for (1+ε)-approximate distinct counting in streams with geometric structure via max-linear random projections.
- Disentangled Sparse Representations for Concept-Separated Diffusion Unlearning
  SAEParate disentangles sparse representations in diffusion models via contrastive clustering and nonlinear encoding to enable more precise concept unlearning with reduced side effects.
- CoLVR: Enhancing Exploratory Latent Visual Reasoning via Contrastive Optimization
  CoLVR uses latent contrastive objectives with angle-based perturbation and RL trajectory rewards to increase exploratory visual reasoning in MLLMs, delivering 5-8% gains on VSP, Jigsaw, and MMStar benchmarks.
- Masks Can Talk: Extracting Structured Text Information from Single-Modal Images for Remote Sensing Change Detection
  S2M extracts structured text quadruples from change masks to provide noise-free multimodal supervision, achieving 17.80% Sek and 66.14% F_scd on the new Gaza-Change-v2 dataset and outperforming LLM-based multimodal methods.
- TRAJGANR: Trajectory-Centric Urban Multimodal Learning via Geospatially Aligned Neural Representations
  TrajGANR learns continuous neural representations of trajectories to enable fine-grained alignment with street-view images and locations in a joint multimodal self-supervised objective, outperforming prior geospatial MSSL methods on urban mobility and road tasks.
- The Indra Representation Hypothesis for Multimodal Alignment
  Unimodal model representations converge to a relational structure captured by the Indra representation via V-enriched Yoneda embedding, which is unique and structure-preserving and improves cross-model and cross-modal robustness when instantiated with angular distance.
- FuTCR: Future-Targeted Contrast and Repulsion for Continual Panoptic Segmentation
  FuTCR improves new-class panoptic quality by up to 28% in continual panoptic segmentation by discovering future-like regions in background areas and applying targeted contrast and repulsion to restructure representations.
- DART: A Vision-Language Foundation Model for Comprehensive Rope Condition Monitoring
  DART is a cross-modal foundation model that delivers rope damage classification, severity regression, and few-shot recognition from a single frozen representation trained on 4270 images across 14 damage classes.
- Continuous Limits of Coupled Flows in Representation Learning
  Discrete decentralized learning dynamics on manifolds converge uniformly to an overdamped Langevin SDE whose stationary states produce orthogonally disentangled, linearly separable features.
- Image-to-Image Translation Framework Embedded with Rotation Symmetry Priors
  Rotation-equivariant convolutions and adaptive TL-Conv layers are added to I2I networks to preserve rotation symmetry and improve translation quality across domains.
- Rapidly deploying on-device eye tracking by distilling visual foundation models
  DistillGaze reduces median gaze error by 58.62% on a 2000+ participant dataset by distilling foundation models into a 256K-parameter on-device model using synthetic labeled data and unlabeled real data.
- Benchmarking transferability of SSL pretraining to same and different modality segmentation tasks
  SMIT, which combines masked image modeling with self-distillation, delivers the highest segmentation accuracy, fastest convergence, and best few-shot performance across nine CT and MRI tasks compared to contrastive and rotation-based SSL methods.
- HYVINT: Intensity-Driven Hypergraph Generation with Variational Representations
  HYVINT introduces an intensity-driven incidence mechanism and tractable variational estimator for hypergraph generation, with error bounds and empirical gains in fidelity, novelty, and diversity.
- ShellfishNet: A Domain-Specific Benchmark for Visual Recognition of Marine Molluscs
  ShellfishNet is a new benchmark of 8,691 images across 32 mollusc taxa for evaluating vision models on real-world underwater ecological monitoring tasks including robustness to degradation.
- Pan-FM: A Pan-Organ Foundation Model with Saliency-Guided Masking for Missing Robustness
  Pan-FM learns balanced representations across seven organs by adaptively masking dominant organs during pre-training, yielding stronger disease prediction and missing-organ robustness than single-organ or naive multimodal baselines on UK Biobank.
- ConvFormer3D-TAP: Phase/Uncertainty-Aware Front-End Fusion for Cine CMR View Classification Pipelines
  ConvFormer3D-TAP classifies six cine CMR views at 96% accuracy using 3D conv tokenization, multiscale attention, and uncertainty-aware multi-clip fusion on 150k sequences.
- Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision
  A supervision construction procedure generates explicit support and controlled non-support examples (counterfactual and topic-related negatives) without manual annotation, producing verifiers that demonstrate genuine evidence dependence in radiology tasks.
- Language-Pretraining-Induced Bias: A Strong Foundation for General Vision Tasks
  Random label bridge training aligns LLM parameters with vision tasks, and partial training of certain layers often suffices due to their foundational properties.
- Momentum-Anchored Multi-Scale Fusion Model for Long-Tailed Chest X-Ray Classification
  A new neural network stabilizes features for rare chest X-ray diseases via momentum anchoring and multi-scale fusion on EfficientNet, achieving 0.8682 AUC on ChestX-ray14.
- Representation learning from OCT images
  A structured survey of representation learning methods for retinal OCT image analysis, covering supervised, self-supervised, generative, multimodal, and foundation model approaches along with datasets and open problems.
- FML-bench: A Controlled Study of AI Research Agent Strategies from the Perspective of Search Dynamics
- Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective