Introduces YesBut benchmark showing state-of-the-art multimodal models lag humans on interpreting humorous contradictions in comics.
G- eval: NLG evaluation using gpt-4 with better human alignment
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
representative citing papers
SiRA uses LLM world models for simulative reasoning to achieve up to 124% higher task completion and 32.2% navigation success versus reactive baselines in web environments.
citing papers explorer
-
Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions
Introduces YesBut benchmark showing state-of-the-art multimodal models lag humans on interpreting humorous contradictions in comics.
-
General Agentic Planning Through Simulative Reasoning with World Models
SiRA uses LLM world models for simulative reasoning to achieve up to 124% higher task completion and 32.2% navigation success versus reactive baselines in web environments.