BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

2022

3 Pith papers cite this work.

representative citing papers

Awakening the Hydra: Stabilizing Multi-Concept Backdoor Injection in Text-to-Image Diffusion Models

cs.CR · 2026-05-19 · unverdicted · novelty 6.0

Hydra stabilizes multi-concept backdoor attacks in diffusion models via evolutionary trigger search in text encoder space and trigger-clean regularization during multi-task fine-tuning, achieving high attack success while preserving clean image quality.

ZSG-IAD: A Multimodal Framework for Zero-Shot Grounded Industrial Anomaly Detection

cs.CV · 2026-04-20 · unverdicted · novelty 5.0

ZSG-IAD is a zero-shot multimodal system that uses language-guided two-hop grounding and rule-based reinforcement learning to produce anomaly masks and explainable reports from industrial sensor data.

DiffMagicFace: Identity Consistent Facial Editing of Real Videos

cs.CV · 2026-04-15 · unverdicted · novelty 5.0

DiffMagicFace uses concurrent fine-tuned text and image diffusion models plus a rendered multi-view dataset to achieve identity-consistent text-conditioned editing of real facial videos.

