BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models

Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi · 2023

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

LLMind: Bio-inspired Training-free Adaptive Visual Representations for Vision-Language Models

cs.CV · 2026-03-16 · unverdicted · novelty 7.0

LLMind uses bio-inspired non-uniform sampling via a Mobius module and closed-loop semantic feedback to retain 82-97% of full-resolution VLM performance with only 1-5% of pixels on VQA benchmarks.

More than the Sum: Panorama-Language Models for Adverse Omni-Scenes

cs.CV · 2026-03-10 · unverdicted · novelty 7.0

Panorama-Language Models with a sparse attention module and PanoVQA dataset deliver superior holistic reasoning on 360° adverse omni-scenes compared to stitched pinhole views.

Thinking Diffusion: Penalize and Guide Visual-Grounded Reasoning in Diffusion Multimodal Language Models

cs.AI · 2026-04-07 · unverdicted · novelty 6.0

Position and step penalty plus visual reasoning guidance fix premature answering and weak visual grounding in diffusion MLLMs, delivering up to 7.5% accuracy gains and over 3x speedup.

fMRI-LM: Towards a Universal Foundation Model for Language-Aligned fMRI Understanding

cs.CL · 2025-11-24 · unverdicted · novelty 6.0

fMRI-LM builds a foundation model that aligns fMRI signals with language through tokenization, LLM adaptation, and instruction tuning to enable semantic understanding of brain activity.

EvoComp: Learning Visual Token Compression for Multimodal Large Language Models via Semantic-Guided Evolutionary Labeling

cs.CV · 2026-04-18 · unverdicted · novelty 5.0

EvoComp compresses visual tokens in MLLMs by 3x while retaining 99.3% accuracy via an evolutionary labeling strategy that searches for low-loss, semantically diverse token subsets.

citing papers explorer

Showing 5 of 5 citing papers.

LLMind: Bio-inspired Training-free Adaptive Visual Representations for Vision-Language Models · cs.CV · 2026-03-16 · unverdicted · none · ref 35
More than the Sum: Panorama-Language Models for Adverse Omni-Scenes · cs.CV · 2026-03-10 · unverdicted · none · ref 24
Thinking Diffusion: Penalize and Guide Visual-Grounded Reasoning in Diffusion Multimodal Language Models · cs.AI · 2026-04-07 · unverdicted · none · ref 15
fMRI-LM: Towards a Universal Foundation Model for Language-Aligned fMRI Understanding · cs.CL · 2025-11-24 · unverdicted · none · ref 22
EvoComp: Learning Visual Token Compression for Multimodal Large Language Models via Semantic-Guided Evolutionary Labeling · cs.CV · 2026-04-18 · unverdicted · none · ref 24
