Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting

Anand Bhattad, Konpat Preechakul, Alexei A Efros · 2025 · arXiv 2503.21770

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Do-Undo Bench: Reversibility for Action Understanding in Image Generation

cs.CV · 2025-12-15 · unverdicted · novelty 7.0

Do-Undo Bench is a new evaluation task and dataset that forces models to simulate forward action effects and then undo them to measure genuine action understanding in image generation.

Counterfactual Segmentation Reasoning: Diagnosing and Mitigating Pixel-Grounding Hallucination

cs.CV · 2025-06-26 · unverdicted · novelty 7.0

Proposes the Counterfactual Segmentation Reasoning (CSR) task and HalluSegBench, which use visual counterfactuals to diagnose segmentation hallucinations in VLMs, plus RobustSeg, obtained via counterfactual fine-tuning, which reduces hallucinations by 30% on FP-RefCOCO.

Multimodal Language Models Cannot Spot Spatial Inconsistencies

cs.CV · 2026-04-01 · unverdicted · novelty 6.0

Multimodal LLMs significantly underperform humans at spotting objects that break 3D consistency in multi-view image pairs.

Video models are zero-shot learners and reasoners

cs.LG · 2025-09-24 · unverdicted · novelty 6.0

Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.

citing papers explorer

Showing 4 of 4 citing papers.

Do-Undo Bench: Reversibility for Action Understanding in Image Generation · cs.CV · 2025-12-15 · unverdicted · none · ref 4
Counterfactual Segmentation Reasoning: Diagnosing and Mitigating Pixel-Grounding Hallucination · cs.CV · 2025-06-26 · unverdicted · none · ref 4
Multimodal Language Models Cannot Spot Spatial Inconsistencies · cs.CV · 2026-04-01 · unverdicted · none · ref 8
Video models are zero-shot learners and reasoners · cs.LG · 2025-09-24 · unverdicted · none · ref 52
