Instructblip: Towards general-purpose vision- language models with instruction tuning,

· 2023

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

RedDiffuser: Auditing Multimodal Safety Failures in Vision-Language Models via Reinforced Diffusion

cs.CV · 2025-03-08 · unverdicted · novelty 6.0

RedDiffuser is a reinforced diffusion framework that generates adversarial visual contexts to audit and expose widespread multimodal safety failures in VLMs, increasing unsafe response rates by up to 10.69% on LLaVA with transfer to other models.

UAV-VL-R1: Generalizing Vision-Language Models via Supervised Fine-Tuning and Multi-Stage GRPO for UAV Visual Reasoning

cs.CV · 2025-08-15 · unverdicted · novelty 5.0

UAV-VL-R1 combines SFT and multi-stage GRPO reinforcement learning on a new 50,019-sample HRVQA-VL dataset to deliver substantially higher zero-shot accuracy on UAV visual reasoning tasks than both its 2B baseline and a 72B-scale model.

Recent Advances in Multimodal Affective Computing: An NLP Perspective

cs.CL · 2024-09-11 · unverdicted · novelty 3.0

Survey organizing multimodal affective computing research around four NLP tasks, method paradigms, datasets, evaluation protocols, and future directions while releasing a resource repository.

citing papers explorer

Showing 3 of 3 citing papers.

RedDiffuser: Auditing Multimodal Safety Failures in Vision-Language Models via Reinforced Diffusion cs.CV · 2025-03-08 · unverdicted · none · ref 19
RedDiffuser is a reinforced diffusion framework that generates adversarial visual contexts to audit and expose widespread multimodal safety failures in VLMs, increasing unsafe response rates by up to 10.69% on LLaVA with transfer to other models.
UAV-VL-R1: Generalizing Vision-Language Models via Supervised Fine-Tuning and Multi-Stage GRPO for UAV Visual Reasoning cs.CV · 2025-08-15 · unverdicted · none · ref 50
UAV-VL-R1 combines SFT and multi-stage GRPO reinforcement learning on a new 50,019-sample HRVQA-VL dataset to deliver substantially higher zero-shot accuracy on UAV visual reasoning tasks than both its 2B baseline and a 72B-scale model.
Recent Advances in Multimodal Affective Computing: An NLP Perspective cs.CL · 2024-09-11 · unverdicted · none · ref 74
Survey organizing multimodal affective computing research around four NLP tasks, method paradigms, datasets, evaluation protocols, and future directions while releasing a resource repository.

Instructblip: Towards general-purpose vision- language models with instruction tuning,

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer