BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models

Li, J · 2023

2 Pith papers cite this work.

representative citing papers

ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety

cs.CR · 2026-04-21 · unverdicted · novelty 7.0 · ref 143

ProjLens shows that backdoor parameters in MLLMs are encoded in low-rank subspaces of the projector and that embeddings shift toward the target direction with magnitude linear in input norm, activating only on poisoned samples.

Aligned Vector Quantization for Edge-Cloud Collabrative Vision-Language Models

cs.CV · 2024-11-08 · unverdicted · novelty 6.0 · ref 19

Presents LLaVA-AlignedVQ, an edge-cloud VQA system built on AlignedVQ that delivers 1365x feature compression, 96.8% lower transmission than JPEG90, a 2-15x speedup, and accuracy within -2.23% to +1.6% of the baseline across eight datasets.
