Current VLMs excel at individual manga panel interpretation but systematically fail at temporal causality and cross-panel cohesion in long-form narratives.
(Pea)nuts and bolts of visual narrative: Structure and meaning in sequential image comprehension. Cognitive Science, 36(6):1084–1112, 2012
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1). Years: 2025 (1). Verdicts: UNVERDICTED (1).
Representative citing papers:
Re:Verse -- Can Your VLM Read a Manga?