Sam-clip: Merging vision foundation models to- wards semantic and spatial understanding

Haoxiang Wang, Pavan Kumar Anasosalu Vasu, Fartash Faghri, Raviteja Vemulapalli, Mehrdad Farajtabar, Sachin Mehta, Mohammad Rastegari, Oncel Tuzel, Hadi Pouransari · 2024

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

SigLino: Efficient Multi-Teacher Distillation for Agglomerative Vision Foundation Models

cs.CV · 2025-12-23 · conditional · novelty 6.0

SigLino distills SigLIP2 and DINOv3 into efficient vision models via asymmetric relation-knowledge distillation, token-balanced batching, and hierarchical data sampling on a new 200M-image corpus, yielding better transfer to grounding VLMs than training from scratch.

citing papers explorer

Showing 1 of 1 citing paper.

SigLino: Efficient Multi-Teacher Distillation for Agglomerative Vision Foundation Models cs.CV · 2025-12-23 · conditional · none · ref 34
SigLino distills SigLIP2 and DINOv3 into efficient vision models via asymmetric relation-knowledge distillation, token-balanced batching, and hierarchical data sampling on a new 200M-image corpus, yielding better transfer to grounding VLMs than training from scratch.

Sam-clip: Merging vision foundation models to- wards semantic and spatial understanding

fields

years

verdicts

representative citing papers

citing papers explorer