Di-COT is an unsupervised contrastive method that stochastically partitions time-series windows into overlapping sub-blocks to learn representations without augmentation, reporting SOTA results on classification and transfer tasks across multiple benchmarks while cutting training time.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages=
6 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
LeJEPA derives an optimal isotropic Gaussian target for embeddings and enforces it via sketched regularization to deliver scalable, heuristics-free self-supervised pretraining with 79% ImageNet linear accuracy on ViT-H/14.
DINO-WM builds world models on pre-trained DINOv2 features to enable zero-shot planning from offline data without rewards or demonstrations.
V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.
SSL clustering is derived as KL-divergence optimization where a teacher-distribution constraint normalizes via inverse cluster priors and simplifies to batch centering by Jensen's inequality.
A generative framework using geometric diffusion for brain networks and tabular diffusion for other organs integrates ICD-coded SDoH proxies to improve disease reasoning on UK Biobank data.
citing papers explorer
-
Divide and Contrast: Learning Robust Temporal Features without Augmentation
Di-COT is an unsupervised contrastive method that stochastically partitions time-series windows into overlapping sub-blocks to learn representations without augmentation, reporting SOTA results on classification and transfer tasks across multiple benchmarks while cutting training time.
-
LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
LeJEPA derives an optimal isotropic Gaussian target for embeddings and enforces it via sketched regularization to deliver scalable, heuristics-free self-supervised pretraining with 79% ImageNet linear accuracy on ViT-H/14.
-
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
DINO-WM builds world models on pre-trained DINOv2 features to enable zero-shot planning from offline data without rewards or demonstrations.
-
Revisiting Feature Prediction for Learning Visual Representations from Video
V-JEPA models trained only on feature prediction from 2 million public videos achieve 81.9% on Kinetics-400, 72.2% on Something-Something-v2, and 77.9% on ImageNet-1K using frozen ViT-H/16 backbones.
-
Information theoretic underpinning of self-supervised learning by clustering
SSL clustering is derived as KL-divergence optimization where a teacher-distribution constraint normalizes via inverse cluster priors and simplifies to batch centering by Jensen's inequality.
-
Marrying Generative Model of Healthcare Events with Digital Twin of Social Determinants of Health for Disease Reasoning
A generative framework using geometric diffusion for brain networks and tabular diffusion for other organs integrates ICD-coded SDoH proxies to improve disease reasoning on UK Biobank data.