The Incongruity-Resolution Supervision (IRS) framework improves multimodal models on New Yorker cartoon caption matching and ranking by explicitly training incongruity detection, reinterpretation, and human preference alignment; the largest model approaches expert-level ranking performance.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Learning to Think Like a Cartoon Captionist: Incongruity-Resolution Supervision for Multimodal Humor Understanding