VLM-UnBench demonstrates that prompt-based training-free unlearning in VLMs leaves forget accuracy near the no-instruction baseline except under oracle conditions that reveal the target concept.
Can clip count stars? an empirical study on quantity bias in clip
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 3years
2026 3representative citing papers
VLMs fail at counting because visual evidence degrades in later language layers, and a lightweight Modality Attention Share intervention can encourage better use of image information during answer generation.
EvoComp compresses visual tokens in MLLMs by 3x while retaining 99.3% accuracy via an evolutionary labeling strategy that searches for low-loss, semantically diverse token subsets.
citing papers explorer
-
Can VLMs Truly Forget? Benchmarking Training-Free Visual Concept Unlearning
VLM-UnBench demonstrates that prompt-based training-free unlearning in VLMs leaves forget accuracy near the no-instruction baseline except under oracle conditions that reveal the target concept.
-
Counting to Four is still a Chore for VLMs
VLMs fail at counting because visual evidence degrades in later language layers, and a lightweight Modality Attention Share intervention can encourage better use of image information during answer generation.
-
EvoComp: Learning Visual Token Compression for Multimodal Large Language Models via Semantic-Guided Evolutionary Labeling
EvoComp compresses visual tokens in MLLMs by 3x while retaining 99.3% accuracy via an evolutionary labeling strategy that searches for low-loss, semantically diverse token subsets.