Flamingo: a visual language model for few-shot learning

Alayrac, J · 2022

3 Pith papers cite this work.

representative citing papers

ProjLens: Unveiling the Role of Projectors in Multimodal Model Safety

cs.CR · 2026-04-21 · unverdicted · novelty 7.0 · ref 121

ProjLens shows that backdoor parameters in MLLMs are encoded in low-rank subspaces of the projector, and that embeddings shift toward the target direction with a magnitude linear in the input norm, activating only on poisoned samples.

Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations

cs.CV · 2024-12-19 · unverdicted · novelty 6.0 · ref 77

Video Prediction Policy conditions robot action learning on future-frame predictions inside fine-tuned video diffusion models, yielding an 18.6% relative gain on Calvin ABC-D and a 31.6% higher real-world success rate.

Aligned Vector Quantization for Edge-Cloud Collabrative Vision-Language Models

cs.CV · 2024-11-08 · unverdicted · novelty 6.0 · ref 3

Presents LLaVA-AlignedVQ, an edge-cloud VQA system built on AlignedVQ that delivers 1365x feature compression, 96.8% lower transmission than JPEG90, a 2-15x speedup, and accuracy within -2.23% to +1.6% of the baseline across eight datasets.