V-gamegym: Visual game generation for code large language models. arXiv preprint arXiv:2509.20136, 2025.
Cited by 1 Pith paper (field: cs.AI; year: 2026; verdict: UNVERDICTED). Polarity classification is still indexing.

Representative citing paper:
WorldCoder-Bench: Benchmarking Physically Grounded 3D World Synthesis
Introduces WorldCoder-Bench and StateProbe for evaluating LLM-generated physically grounded 3D browser worlds, with frontier models reaching at most 27.8% verification coverage.