BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models

Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi · 2023

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

VGR: Visual Grounded Reasoning

cs.CV · 2025-06-13 · unverdicted · novelty 7.0

VGR introduces a visual-grounded reasoning MLLM that detects and replays image regions during inference, achieving gains on visual benchmarks with 30% fewer image tokens than the LLaVA-NeXT-7B baseline.

Test-Time Distillation for Continual Model Adaptation

cs.CV · 2025-06-03 · conditional · novelty 7.0

CoDiRe blends VLM and target model predictions via MSP-based weighting and Optimal Transport rectification to enable stable continual test-time adaptation, outperforming CoTTA by 10.55% on ImageNet-C at 48% of the compute cost.

GAF: Gaussian Action Field as a 4D Representation for Dynamic World Modeling in Robotic Manipulation

cs.RO · 2025-06-17 · unverdicted · novelty 6.0

GAF creates 4D dynamic scene models by adding motion to 3D Gaussians, enabling better reconstruction and 7.3% higher success in robotic tasks.

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

cs.CV · 2026-05-12 · unverdicted · novelty 5.0

SenseNova-U1 presents native unified multimodal models that match top understanding VLMs while delivering strong performance in image generation, infographics, and interleaved tasks via the NEO-unify architecture.

PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

cs.CV · 2024-04-25 · conditional · novelty 5.0

A temporal pooling layer added to LLaVA smooths video feature distributions and sets new state-of-the-art results on dense video captioning and video QA without adding parameters.

GuidedVLA: Specifying Task-Relevant Factors via Plug-and-Play Action Attention Specialization

cs.RO · 2026-05-12

citing papers explorer


VGR: Visual Grounded Reasoning cs.CV · 2025-06-13 · unverdicted · none · ref 20
Test-Time Distillation for Continual Model Adaptation cs.CV · 2025-06-03 · conditional · none · ref 25
GAF: Gaussian Action Field as a 4D Representation for Dynamic World Modeling in Robotic Manipulation cs.RO · 2025-06-17 · unverdicted · none · ref 36
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture cs.CV · 2026-05-12 · unverdicted · none · ref 67
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning cs.CV · 2024-04-25 · conditional · none · ref 18
GuidedVLA: Specifying Task-Relevant Factors via Plug-and-Play Action Attention Specialization cs.RO · 2026-05-12 · unreviewed · ref 51
