SVL uses language embeddings aligned with global image representations via shadow ratio regression and global-to-local coupling to improve shadow detection robustness in ambiguous cases.
Vision-language models for vision tasks: A survey
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3roles
background 2polarities
background 2representative citing papers
MFVLR uses multi-domain vision-language reconstruction with a fine-grained language transformer, multi-domain vision encoder, and vision injection module to achieve generalizable detection and localization of diffusion-synthesized face forgeries.
citing papers explorer
-
Revisiting Shadow Detection from a Vision-Language Perspective
SVL uses language embeddings aligned with global image representations via shadow ratio regression and global-to-local coupling to improve shadow detection robustness in ambiguous cases.
-
MFVLR: Multi-domain Fine-grained Vision-Language Reconstruction for Generalizable Diffusion Face Forgery Detection and Localization
MFVLR uses multi-domain vision-language reconstruction with a fine-grained language transformer, multi-domain vision encoder, and vision injection module to achieve generalizable detection and localization of diffusion-synthesized face forgeries.
- Cognitive Pivot Points and Visual Anchoring: Unveiling and Rectifying Hallucinations in Multimodal Reasoning Models