DINOv2: Learning Robust Visual Features without Supervision

Huy Vo, Marc Szafraniec, Maxime Oquab, Vasil Khalidov · 2023 · cs.CV · arXiv 2304.07193

Mixed citation behavior. Most common role is background (44%).

728 Pith papers citing it

Background 44% of classified citations

abstract

The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any system by producing all-purpose visual features, i.e., features that work across image distributions and tasks without finetuning. This work shows that existing pretraining methods, especially self-supervised methods, can produce such features if trained on enough curated data from diverse sources. We revisit existing approaches and combine different techniques to scale our pretraining in terms of data and model size. Most of the technical contributions aim at accelerating and stabilizing the training at scale. In terms of data, we propose an automatic pipeline to build a dedicated, diverse, and curated image dataset instead of uncurated data, as typically done in the self-supervised literature. In terms of models, we train a ViT model (Dosovitskiy et al., 2020) with 1B parameters and distill it into a series of smaller models that surpass the best available all-purpose features, OpenCLIP (Ilharco et al., 2021) on most of the benchmarks at image and pixel levels.

citation-role summary

method 59 · background 57 · baseline 9 · dataset 3 · other 1

citation-polarity summary

background 57 · use method 57 · baseline 9 · unclear 4 · use dataset 2

authors

Huy Vo, Marc Szafraniec, Maxime Oquab, Théo Moutakanni, Timothée Darcet, Vasil Khalidov

representative citing papers

Every9D-21M: Large-Scale Real-World 9D Canonicalization of Everyday Objects

cs.CV · 2026-05-27 · conditional · novelty 8.0

Every9D-21M supplies 21.8M real-world 9D pose annotations for 700 everyday categories by propagating manual canonical poses through cross-instance alignment in object-centric videos and verifying them multiview.

CalibAnyView: Beyond Single-View Camera Calibration in the Wild

cs.CV · 2026-05-14 · conditional · novelty 8.0

A multi-view transformer predicts dense perspective fields that feed a geometric optimizer to estimate camera intrinsics and gravity from arbitrary numbers of real-world views.

Rigel3D: Rig-aware Latents for Animation-Ready 3D Asset Generation

cs.GR · 2026-05-13 · unverdicted · novelty 8.0

Rigel3D jointly generates rigged 3D meshes with geometry, skeleton topology, joint positions, and skinning weights using coupled surface and skeleton latent representations for image-conditioned animation-ready asset synthesis.

On the Generation and Mitigation of Harmful Geometry in Image-to-3D Models

cs.CR · 2026-05-10 · conditional · novelty 8.0

Image-to-3D models successfully generate harmful geometries in most cases with under 0.3% caught by commercial filters; existing safeguards are weak but a stacked defense cuts harmful outputs to under 1% at 11% false-positive cost.

neuralCAD-Edit: An Expert Benchmark for Multimodal-Instructed 3D CAD Model Editing

cs.CV · 2026-04-17 · unverdicted · novelty 8.0

neuralCAD-Edit benchmark shows even the best foundation model (GPT 5.2) scores 53% lower than human CAD experts in acceptance trials for multimodal-instructed 3D model edits.

Towards Realistic 3D Emission Materials: Dataset, Baseline, and Evaluation for Emission Texture Generation

cs.CV · 2026-04-13 · unverdicted · novelty 8.0

The work creates the first dataset and baseline for generating emission textures on 3D objects to reproduce glowing materials from input images.

Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

cs.CV · 2024-09-25 · accept · novelty 8.0

Molmo VLMs trained on newly collected PixMo open datasets achieve state-of-the-art performance among open-weight models and surpass multiple proprietary VLMs including Claude 3.5 Sonnet and Gemini 1.5 Pro.

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

cs.CL · 2024-09-04 · accept · novelty 8.0

MMMU-Pro is a stricter multimodal benchmark that removes text-only solvable questions, augments options, and requires reading text from images, yielding substantially lower model scores of 16.8-26.9%.

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

Prototype Memory-Guided Training-Free Anomaly Classification and Localization in Prenatal Ultrasound

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

A training-free prototype memory-guided framework for multi-class prenatal ultrasound anomaly classification and localization using few reference images per class, validated on a 9-category multi-center dataset.

EPO: Boosting 3D Foundation Models with Edge-based Pose Optimization

cs.CV · 2026-07-01 · unverdicted · novelty 7.0

EPO is a trackless, edge-map-alignment framework that refines pose estimates from 3D foundation models and matches or exceeds bundle-adjustment performance with substantially lower runtime and memory use.

GEAR: Guided End-to-End AutoRegression for Image Synthesis

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

GEAR jointly trains VQ tokenizer and AR generator end-to-end via dual hard/soft read-out and representation alignment, achieving up to 10x faster ImageNet gFID convergence than LlamaGen-REPA while generalizing across quantizers and to text-to-image.

Language-Assisted Super-Resolution from Real-World Low-Resolution Patches

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

LA-SR redefines unpaired super-resolution in language space by projecting images into a semantically rich representation and applying vision-language model guided losses to handle real-world degradations extracted from depth variations.

WarpHammer: Densifying Scene Warps with 3D Object Priors for Extreme View Synthesis

cs.CV · 2026-06-30 · unverdicted · novelty 7.0

WarpHammer densifies scene warps with 3D object priors from generative models and fuses pose-unknown auxiliary views via multi-view geometry to enable stable extreme novel view synthesis.

Beyond 2D Matching: A Unified Single-Stage Framework for Geometry-Aware Cross-View Object Geo-Localization

cs.CV · 2026-06-29 · unverdicted · novelty 7.0

A new dataset of 220k+ cross-view pairs and a single-stage geometry-aware model GAGeo based on the π³ 3D foundation model outperforms prior methods on object geo-localization with strong generalization and zero-shot ground-to-drone capability.

Complete virtual unwrapping and reading of a rolled Herculaneum papyrus

eess.IV · 2026-06-27 · unverdicted · novelty 7.0

First complete digital unwrapping and reading of a Herculaneum papyrus scroll (PHerc. 1667) via synchrotron X-ray CT, virtual unrolling, and machine learning.

Unleashing Infinite Motion: Scaling Expressive Quadrupedal Motion via Generative Video Priors

cs.RO · 2026-06-26 · conditional · novelty 7.0

Uni-Mo generates 7,488 language-annotated quadruped motions via LLM prompts and video diffusion, lifts them to 3D trajectories, and trains policies achieving 96.7% real-robot success on 392 sampled motions.

A Unified Framework for Vision Transformers Equivariant to Discrete Subgroups of $\mathrm{O}(2)$

cs.CV · 2026-06-26 · unverdicted · novelty 7.0 · 2 refs

Constructs G-equivariant ViTs for arbitrary discrete G ≤ O(2), proves H ≤ G implies G-models embed into H-models and single-head equivariant attention realizes all ordinary G-equivariant maps, introduces D6 hexagonal model, and reports preliminary accuracy gains on PatternNet in low-data regimes.

Learning 1-Bit LiDAR-based Localization with Auxiliary Objective

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

BiLoc is the first binary neural network framework for 6-DoF LiDAR pose estimation that uses an auxiliary objective to adaptively regulate information retention and achieve SOTA among BNNs on large outdoor datasets.

Scene and Human in One World: Reconstruction in a Feedforward Pass

cs.CV · 2026-06-26 · unverdicted · novelty 7.0

SHOW is a mask-promptable framework coupling feed-forward scene reconstruction with human mesh recovery in a unified metric space to resolve scale ambiguity and improve human-scene alignment from monocular video.

MIRAGE: Protecting against Malicious Image Editing via False Moderation

cs.CR · 2026-06-24 · unverdicted · novelty 7.0

MIRAGE immunizes images by crafting perturbations that align them with policy-violating concepts in open-source moderation models, triggering refusals in closed-source commercial image editors at over 88% success rate.

Rethinking Prototype-based Similarity Learning for Few-Shot Object Detection

cs.CV · 2026-06-22 · unverdicted · novelty 7.0

Introduces TSMa using text-visual channel interaction and SHARe using ViT layer-aligned autoregressive regression to improve prototype-based few-shot object detection, reporting +10.1 nAP on COCO.

Polarisation and Faraday rotation measure imaging at metre wavelengths with sub-arcsecond resolution: a foundational calibration strategy

astro-ph.IM · 2026-06-16 · unverdicted · novelty 7.0

A calibration strategy using full-Jones corrections with an in-field unpolarised calibrator and visibility-based multi-epoch alignment enables sub-arcsecond polarimetric imaging with LOFAR at metre wavelengths.

Pano3D: Unified 3D Reconstruction and Panoptic Segmentation

cs.CV · 2026-06-12 · unverdicted · novelty 7.0

Pano3D augments 3D feedforward reconstruction backbones with a set-based mask decoder and joint geometric-semantic training to achieve SOTA 3D panoptic segmentation on ScanNet, ScanNet200, and ScanNet++.

citing papers explorer

Showing 50 of 51 citing papers after filters.

Balancing Image Compression and Generation with Bootstrapped Tokenization

cs.LG · 2026-06-04 · unverdicted · none · ref 54 · internal anchor

SelfBootTok decomposes image tokens into global and local groups via self-bootstrapped learning, enabling generators to use only global tokens for ~40% less computation and a new SOTA gFID of 1.56 with 64 tokens.

UR-JEPA: Uniform Rectifiability as a Regularizer for Joint-Embedding Predictive Architectures

cs.LG · 2026-05-31 · unverdicted · none · ref 12 · internal anchor

UR-JEPA applies uniform rectifiability regularization via a smoothed Carleson square function to JEPA training, producing embeddings with 4-5 order PCA spectral drop at dimension 20-25 and lower seed variance than Gaussian regularization on Inet10, Galaxy10, and EuroSAT.

How Neural Losses Shape VAE Latents

cs.LG · 2026-05-30 · unverdicted · none · ref 29 · internal anchor

Neural reconstruction losses in VAEs reduce latent information content and produce more isotropic latent geometries with even uncertainty distribution.

Probabilistic Recurrent Intention Switching Model

cs.LG · 2026-05-26 · unverdicted · none · ref 4 · internal anchor

PRISM replaces Markov or fixed-window intention models in multi-intention IRL with a recurrent network, proving an exact EM decomposition into closed-form per-intention reward problems and reporting highest held-out likelihood on gridworld, mouse, and robotic tasks.

PEIRA: Learning Predictive Encoders through Inter-View Regressor Alignment

cs.LG · 2026-05-17 · unverdicted · none · ref 51 · internal anchor

PEIRA learns predictive encoders by optimizing the trace of the optimal inter-view linear regressor, with only nontrivial global minimizers as stable equilibria that recover leading nonlinear canonical correlation subspaces.

Seeking the Unfamiliar but Memorable: Conceptual Creativity as Meta-Learning

cs.LG · 2026-05-15 · unverdicted · none · ref 31 · internal anchor

Creativity is defined as meta-learning where a frozen diffusion creator optimizes candidates for rapid improvement by an adapting appraiser such as an autoencoder or CLIP adapter.

SurF: A Generative Model for Multivariate Irregular Time Series Forecasting

cs.LG · 2026-05-13 · unverdicted · none · ref 13 · internal anchor

SurF applies the Time Rescaling Theorem as a learnable bijection to create a single generative model for forecasting irregular multivariate event streams that outperforms or matches baselines on six benchmarks.

SMA: Submodular Modality Aligner For Data Efficient Multimodal Learning

cs.LG · 2026-05-13 · unverdicted · none · ref 43 · internal anchor

SMA uses a submodular mutual information objective on data sets to deliver competitive zero-shot classification and retrieval performance on CLIP benchmarks with only tens of thousands of samples, orders of magnitude fewer than standard approaches.

Runtime Monitoring of Perception-Based Autonomous Systems via Embedding Temporal Logic

cs.LG · 2026-05-12 · unverdicted · none · ref 133 · 2 links · internal anchor

Embedding Temporal Logic (ETL) performs runtime monitoring directly in learned embedding spaces using distance-based predicates composed with temporal operators, supported by conformal calibration for reliable predicate evaluation.

Unlearning with Asymmetric Sources: Improved Unlearning-Utility Trade-off with Public Data

cs.LG · 2026-05-11 · unverdicted · none · ref 3 · internal anchor

ALU uses public data to suppress unlearning cost quadratically while characterizing distribution mismatch effects, enabling mass unlearning with maintained utility.

What Cohort INRs Encode and Where to Freeze Them

cs.LG · 2026-05-08 · unverdicted · none · ref 45 · internal anchor

Optimal INR freeze depth matches highest weight stable rank layer; SAEs reveal SIREN atoms are localized while FFMLP atoms trace cohort contours with causal impact on PSNR.

Frequency-Forcing: From Scaling-as-Time to Soft Frequency Guidance

cs.LG · 2026-04-21 · unverdicted · none · ref 16 · internal anchor

Frequency-Forcing guides pixel flow-matching with a data-derived low-frequency auxiliary stream to softly enforce scale-ordered generation, improving FID on ImageNet-256 over baselines.

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

cs.LG · 2026-04-11 · unverdicted · none · ref 183 · internal anchor

The first survey on Attention Sink in Transformers structures the literature around fundamental utilization, mechanistic interpretation, and strategic mitigation.

Conformal Margin Risk Minimization: An Envelope Framework for Robust Learning under Label Noise

cs.LG · 2026-04-07 · unverdicted · none · ref 45 · internal anchor

CMRM adds a conformal quantile regularization on prediction margins to any loss, improving noisy-label classification accuracy up to 3.39% across methods and benchmarks while preserving performance at zero noise.

Beauty in the Eye of AI: Aligning LLMs and Vision Models with Human Aesthetics in Network Visualization

cs.LG · 2026-04-03 · conditional · none · ref 3 · internal anchor

LLMs and vision models achieve human-human alignment levels in judging network visualization aesthetics through prompt engineering on a new dataset of human preferences from 27 participants.

How Out-of-Equilibrium Phase Transitions can Seed Pattern Formation in Trained Diffusion Models

cs.LG · 2026-03-20 · unverdicted · none · ref 21 · internal anchor

Pattern formation in trained diffusion models emerges from out-of-equilibrium phase transitions driven by instabilities in low-frequency denoising modes linked to data symmetries and architectural constraints.

SetFlow: Generating Structured Sets of Representations for Multiple Instance Learning

cs.LG · 2026-03-20 · unverdicted · none · ref 25 · internal anchor

SetFlow is a flow-matching generative model for permutation-invariant MIL bags in representation space that produces synthetic data improving classification performance and enabling training on synthetic data alone.

QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models

cs.LG · 2026-02-23 · unverdicted · none · ref 27 · internal anchor

QuantVLA is the first post-training quantization framework for VLA models that quantizes the diffusion transformer action head and reports higher task success rates than full-precision baselines with roughly 70% memory savings on the quantized components.

Mixture of Predefined Experts: Maximizing Data Usage on Vertical Federated Learning

cs.LG · 2026-02-13 · unverdicted · none · ref 33 · internal anchor

Split-MoPE integrates split learning with predefined-expert routing to maximize usable data in vertical federated learning under sample misalignment, delivering state-of-the-art accuracy in one communication round plus built-in robustness and per-sample contribution scores.

From Navigation to Refinement: Revealing the Two-Stage Nature of Flow-based Diffusion Models through Oracle Velocity

cs.LG · 2025-12-02 · conditional · none · ref 33 · internal anchor

Flow matching models follow a two-stage process of navigation across data modes then refinement to nearest samples, revealed by exact computation of the oracle marginal velocity field.

KODA: Contrastive Representation Comparison and Alignment for Vision-Language Foundation Models

cs.LG · 2026-06-02 · unverdicted · none · ref 4 · internal anchor

KODA uses modality-wise kernel composition and constrained optimization to discover interpretable discrepancy structures between vision-language representations.

Spatial Transcriptomics-Guided Alignment Enhances Molecular Profiling in Pathology Foundation Model

cs.LG · 2026-05-29 · unverdicted · none · ref 13 · internal anchor

STAMP uses a curated 1.8M-pair spatial transcriptomics atlas and pathway-informed alignment to augment pathology foundation models for molecular phenotype inference from H&E WSIs.

Label-Efficient Dataset Pruning via Semi-Supervised Pseudo-Labeling

cs.LG · 2026-05-22 · unverdicted · none · ref 60 · internal anchor

SemiPrune uses a small labeled subset and semi-supervised pseudo-labeling to enable supervised dataset pruning methods, achieving state-of-the-art results on domain-specific, image-corrupted, and long-tailed datasets.

Uncovering the Latent Potential of Deep Intermediate Representations

cs.LG · 2026-05-21 · unverdicted · none · ref 61 · internal anchor

Introduces LOES, a constructive spectral method to select task-discriminative subspaces from intermediate layer embeddings, and GeoReg for enforcing simplicial class geometry during fine-tuning, with reported gains increasing with model depth across modalities.

Divide and Contrast: Learning Robust Temporal Features without Augmentation

cs.LG · 2026-05-20 · unverdicted · none · ref 42 · internal anchor

Di-COT is an unsupervised contrastive method that stochastically partitions time-series windows into overlapping sub-blocks to learn representations without augmentation, reporting SOTA results on classification and transfer tasks across multiple benchmarks while cutting training time.

AURORA: Contextual Orthogonalization for Geometric Representation Learning in Healthcare Foundation Models

cs.LG · 2026-05-18 · unverdicted · none · ref 24 · internal anchor

AURORA is a representation learning framework that uses contextual orthogonalization and relational alignment to create disentangled, geometrically interpretable latent spaces in healthcare foundation models.

20/20 Vision Language Models: A Prescription for Better VLMs through Data Curation Alone

cs.LG · 2026-05-12 · conditional · none · ref 28 · 2 links · internal anchor

Data curation alone raises VLM accuracy by more than 11 points on average across many benchmarks while cutting required training compute by up to 87 times.

DiffATS: Diffusion in Aligned Tensor Space

cs.LG · 2026-05-10 · unverdicted · none · ref 40 · internal anchor

DiffATS trains diffusion models directly on aligned Tucker tensor primitives that are proven to be homeomorphisms, delivering efficient unconditional and conditional generation across images, videos, and PDE data with high compression.

Event Fields: Learning Latent Event Structure for Waveform Foundation Models

cs.LG · 2026-05-09 · unverdicted · none · ref 24 · internal anchor

Event-centric waveform foundation models are learned via self-supervised consistency on latent event structures and interactions, yielding improved performance and label efficiency over sequence-based baselines on physiological tasks.

Predictive but Not Plannable: RC-aux for Latent World Models

cs.LG · 2026-05-08 · unverdicted · none · ref 32 · internal anchor

RC-aux corrects spatiotemporal mismatch in reconstruction-free latent world models by adding multi-horizon prediction and reachability supervision, improving planning performance on goal-conditioned pixel-control tasks.

Continuous Adversarial Flow Models

cs.LG · 2026-04-13 · unverdicted · none · ref 55 · internal anchor

Continuous adversarial flow models replace MSE in flow matching with adversarial training via a discriminator, improving guidance-free FID on ImageNet from 8.26 to 3.63 for SiT and similar gains for JiT and text-to-image benchmarks.

Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding

cs.LG · 2026-04-09 · unverdicted · none · ref 79 · internal anchor

A meta-optimized in-context learning approach enables training-free cross-subject semantic visual decoding from fMRI by inferring individual neural encoding patterns via hierarchical inference on a few examples.

GIRL: Generative Imagination Reinforcement Learning via Information-Theoretic Hallucination Control

cs.LG · 2026-04-08 · unverdicted · none · ref 6 · internal anchor

GIRL reduces latent rollout drift by 38-61% versus DreamerV3 in MBRL by grounding transitions with DINOv2 embeddings and using an information-theoretic adaptive bottleneck, yielding better long-horizon returns on control benchmarks.

Data Warmup: Complexity-Aware Curricula for Efficient Diffusion Training

cs.LG · 2026-04-08 · conditional · none · ref 24 · internal anchor

Data Warmup accelerates diffusion training on ImageNet by scheduling images from low to high complexity via a foreground-based metric and temperature-controlled sampler, improving FID and IS scores faster than uniform sampling.

Uncertainty-Aware Foundation Models for Clinical Data

cs.LG · 2026-04-05 · unverdicted · none · ref 66 · internal anchor

The work introduces uncertainty-aware foundation models for clinical data by learning set-valued patient representations that enforce consistency across partial observations and integrate multimodal self-supervised objectives.

Diffusion Models Memorize in Training -- and Generalize in Inference

cs.LG · 2026-03-12 · unverdicted · none · ref 46 · internal anchor

Diffusion models overfit denoising loss at intermediate noise but generalize in inference as model error smooths the flow field and sampling paths avoid memorized noisy training data.

Drift Localization using Conformal Predictions

cs.LG · 2026-02-23 · unverdicted · none · ref 13 · internal anchor

Conformal predictions enable drift localization by identifying affected samples, outperforming local testing on image datasets.

Image Diffusion Preview with Consistency Solver

cs.LG · 2025-12-15 · unverdicted · none · ref 28 · internal anchor

ConsistencySolver enables high-quality low-step diffusion previews by adapting general linear multistep methods into a lightweight RL-optimized solver, matching multistep DPM-Solver FID with 47% fewer steps and cutting user interaction time by nearly 50%.

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics

cs.LG · 2025-11-11 · conditional · none · ref 51 · internal anchor

LeJEPA derives an optimal isotropic Gaussian target for embeddings and enforces it via sketched regularization to deliver scalable, heuristics-free self-supervised pretraining with 79% ImageNet linear accuracy on ViT-H/14.

Interactive Post-Training for Vision-Language-Action Models

cs.LG · 2025-05-22 · unverdicted · none · ref 23 · internal anchor

RIPT-VLA applies RL with dynamic rollout sampling and leave-one-out advantage estimation to fine-tune VLA models, achieving up to 97.5% success rates and recovering from 4% to 97% success with one demonstration in 15 iterations.

Muon in Vision Transformers: Optimizer-Recipe Interactions and Gradient Spectra

cs.LG · 2026-05-23 · conditional · none · ref 27 · internal anchor

Muon optimizer outperforms AdamW in ViT training on two image datasets, with gains that depend on data augmentation strength and are linked to wider singular-value spread in QKV gradients and prevention of late-training mode collapse in MLP blocks.

stable-worldmodel: A Platform for Reproducible World Modeling Research and Evaluation

cs.LG · 2026-05-20 · unverdicted · none · ref 36 · internal anchor

The paper presents stable-worldmodel (swm), a platform with high-performance data layer, modern world model baselines, planning solvers, and extended environments for reproducible research and generalization evaluation.

WISTERIA: Learning Clinical Representations from Noisy Supervision via Multi-View Consistency in Electronic Health Records

cs.LG · 2026-05-10 · unverdicted · none · ref 25 · internal anchor

WISTERIA learns robust clinical representations from noisy EHR labels by enforcing consistency across multiple weak supervision views plus ontology regularization.

Stable Multimodal Graph Unlearning via Feature-Dimension Aware Quantile Selection

cs.LG · 2026-05-05 · unverdicted · none · ref 44 · internal anchor

FDQ improves stability in multimodal graph unlearning by using feature-dimension aware quantile selection to protect sensitive high-dimensional layers while preserving utility and enabling effective forgetting.

BrainDINO: A Brain MRI Foundation Model for Generalizable Clinical Representation Learning

cs.LG · 2026-04-30 · unverdicted · none · ref 25 · internal anchor

BrainDINO, trained via self-distillation on millions of unlabeled axial brain MRI slices, yields a unified representation that equals or exceeds baselines across diverse neuroimaging tasks when used with a frozen encoder and lightweight heads.

Concrete Jungle: Towards Concreteness Paved Contrastive Negative Mining for Compositional Understanding

cs.LG · 2026-04-14 · unverdicted · none · ref 48 · internal anchor

Using lexical concreteness to guide contrastive negative mining and a new margin-based Cement loss, the Slipform framework reaches state-of-the-art on compositional benchmarks for vision-language models.

TOAST: Transformer Optimization using Adaptive and Simple Transformations

cs.LG · 2024-10-07 · unverdicted · none · ref 25 · internal anchor

TOAST approximates full transformer blocks in pretrained models via lightweight closed-form mappings to cut parameters and FLOPs without retraining or finetuning.

The Platonic Representation Hypothesis

cs.LG · 2024-05-13 · unverdicted · none · ref 197 · internal anchor

Representations learned by large AI models are converging toward a shared statistical model of reality.

Principles and Practice of Deep Representation Learning: or a Mathematical Theory of Memory

cs.LG · 2026-06-04 · unverdicted · none · ref 73 · internal anchor

The book presents principles from optimization and information theory to explain deep network architectures and enable new interpretable models.

Data-Centric Foundation Models in Computational Healthcare: A Survey

cs.LG · 2024-01-04 · unverdicted · none · ref 215 · internal anchor

The paper surveys data-centric strategies for foundation models in computational healthcare and supplies a curated list of related models and datasets.
