SVL uses vision-language alignment via scene-level shadow ratio regression and global-to-local coupling on a frozen DINOv3 encoder to disambiguate shadows from dark surfaces in dense prediction.
Vision-language models for vision tasks: A survey
3 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background 2
Citation-polarity summary: background 2
Years: 2026 (3)
Representative citing papers:
MFVLR uses multi-domain vision-language reconstruction with a fine-grained language transformer, multi-domain vision encoder, and vision injection module to achieve generalizable detection and localization of diffusion-synthesized face forgeries.