Introduces a culture-aware humorous captioning task and a staged alignment framework that improves contextual fit and balances image relevance with humor in multimodal LLMs.
Humor in pixels: Benchmarking large multimodal models understanding of online comics
2 Pith papers cite this work. Polarity classification is still indexing.