GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
MetaPoint represents 2D coordinates as special tokens in visual generative models to enable precise spatial control using existing positional encodings without architectural modifications.
citing papers explorer
- Can Image Models Imagine Time? ImageTime: A Novel Benchmark for Probing Visual World Modeling Through Spatiotemporal Consistency
ImageTime is a benchmark that probes image generation models' visual world modeling by requiring coherent four-state sequences in single images, scored via VLM judge.
- MetaPoint: Unlocking Precise Spatial Control in Agentic Visual Generation
MetaPoint represents 2D coordinates as special tokens in visual generative models to enable precise spatial control using existing positional encodings without architectural modifications.