Do-Undo Bench is a new evaluation task and dataset that forces models to simulate forward action effects and then undo them to measure genuine action understanding in image generation.
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 4roles
background 1polarities
background 1representative citing papers
Proposes CSR task and HalluSegBench using visual counterfactuals to diagnose segmentation hallucinations in VLMs, plus RobustSeg via counterfactual fine-tuning that reduces hallucinations by 30% on FP-RefCOCO.
Multimodal LLMs significantly underperform humans at spotting objects that break 3D consistency in multi-view image pairs.
Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.
citing papers explorer
-
Do-Undo Bench: Reversibility for Action Understanding in Image Generation
Do-Undo Bench is a new evaluation task and dataset that forces models to simulate forward action effects and then undo them to measure genuine action understanding in image generation.
-
Counterfactual Segmentation Reasoning: Diagnosing and Mitigating Pixel-Grounding Hallucination
Proposes CSR task and HalluSegBench using visual counterfactuals to diagnose segmentation hallucinations in VLMs, plus RobustSeg via counterfactual fine-tuning that reduces hallucinations by 30% on FP-RefCOCO.
-
Multimodal Language Models Cannot Spot Spatial Inconsistencies
Multimodal LLMs significantly underperform humans at spotting objects that break 3D consistency in multi-view image pairs.
-
Video models are zero-shot learners and reasoners
Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.