DINOv2 meets text: A unified framework for image- and pixel-level vision-language alignment

Jose, C · 2025

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

EgoBabyVLM: Benchmarking Cross-Modal Learning from Naturalistic Egocentric Video Data

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

Current VLMs depend on tightly aligned curated data and cannot exploit the weakly-aligned egocentric video signals that dominate naturalistic infant input.

citing papers explorer

Showing 1 of 1 citing paper.

EgoBabyVLM: Benchmarking Cross-Modal Learning from Naturalistic Egocentric Video Data

cs.LG · 2026-05-18 · unverdicted · none · ref 25

Current VLMs depend on tightly aligned curated data and cannot exploit the weakly-aligned egocentric video signals that dominate naturalistic infant input.
