LLMind uses bio-inspired non-uniform sampling via a Mobius module and closed-loop semantic feedback to retain 82-97% of full-resolution VLM performance with only 1-5% of pixels on VQA benchmarks.
Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 5roles
background 1polarities
background 1representative citing papers
Panorama-Language Models with a sparse attention module and PanoVQA dataset deliver superior holistic reasoning on 360° adverse omni-scenes compared to stitched pinhole views.
Position and step penalty plus visual reasoning guidance fix premature answering and weak visual grounding in diffusion MLLMs, delivering up to 7.5% accuracy gains and over 3x speedup.
fMRI-LM builds a foundation model that aligns fMRI signals with language through tokenization, LLM adaptation, and instruction tuning to enable semantic understanding of brain activity.
EvoComp compresses visual tokens in MLLMs by 3x while retaining 99.3% accuracy via an evolutionary labeling strategy that searches for low-loss, semantically diverse token subsets.
citing papers explorer
-
LLMind: Bio-inspired Training-free Adaptive Visual Representations for Vision-Language Models
LLMind uses bio-inspired non-uniform sampling via a Mobius module and closed-loop semantic feedback to retain 82-97% of full-resolution VLM performance with only 1-5% of pixels on VQA benchmarks.
-
More than the Sum: Panorama-Language Models for Adverse Omni-Scenes
Panorama-Language Models with a sparse attention module and PanoVQA dataset deliver superior holistic reasoning on 360° adverse omni-scenes compared to stitched pinhole views.
-
Thinking Diffusion: Penalize and Guide Visual-Grounded Reasoning in Diffusion Multimodal Language Models
Position and step penalty plus visual reasoning guidance fix premature answering and weak visual grounding in diffusion MLLMs, delivering up to 7.5% accuracy gains and over 3x speedup.
-
fMRI-LM: Towards a Universal Foundation Model for Language-Aligned fMRI Understanding
fMRI-LM builds a foundation model that aligns fMRI signals with language through tokenization, LLM adaptation, and instruction tuning to enable semantic understanding of brain activity.
-
EvoComp: Learning Visual Token Compression for Multimodal Large Language Models via Semantic-Guided Evolutionary Labeling
EvoComp compresses visual tokens in MLLMs by 3x while retaining 99.3% accuracy via an evolutionary labeling strategy that searches for low-loss, semantically diverse token subsets.