Vision2Code is a multi-domain benchmark that evaluates image-to-code generation via rendered outputs scored by a VLM rater with dataset-specific rubrics, revealing domain-dependent model performance and enabling improvement without paired reference code.
Unisvg: A unified dataset for vector graphic understanding and generation with multimodal large language models
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 5roles
background 1polarities
background 1representative citing papers
Proposes a levels x laws taxonomy for world models in AI agents, defining L1-L3 capabilities across physical, digital, social, and scientific regimes while reviewing over 400 works to outline a roadmap for advanced agentic modeling.
ShapeCodeBench introduces a renewable benchmark for perception-to-program reconstruction of synthetic shapes, with evaluations showing low exact-match performance from current models and heuristics.
G2F-RAG converts retrieved knowledge subgraphs into a single visual reasoning frame appended to videos, enabling training-free and interpretable improvements for LMM-based video reasoning on knowledge-intensive tasks.
VisPhyWorld evaluates MLLMs' physical reasoning via executable code generation for video reconstruction, with VisPhyBench showing strong semantics but weak parameter inference and dynamics simulation.
citing papers explorer
-
Vision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generation
Vision2Code is a multi-domain benchmark that evaluates image-to-code generation via rendered outputs scored by a VLM rater with dataset-specific rubrics, revealing domain-dependent model performance and enabling improvement without paired reference code.
-
Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond
Proposes a levels x laws taxonomy for world models in AI agents, defining L1-L3 capabilities across physical, digital, social, and scientific regimes while reviewing over 400 works to outline a roadmap for advanced agentic modeling.
-
ShapeCodeBench: A Renewable Benchmark for Perception-to-Program Reconstruction of Synthetic Shape Scenes
ShapeCodeBench introduces a renewable benchmark for perception-to-program reconstruction of synthetic shapes, with evaluations showing low exact-match performance from current models and heuristics.
-
Graph-to-Frame RAG: Visual-Space Knowledge Fusion for Training-Free and Auditable Video Reasoning
G2F-RAG converts retrieved knowledge subgraphs into a single visual reasoning frame appended to videos, enabling training-free and interpretable improvements for LMM-based video reasoning on knowledge-intensive tasks.
-
VisPhyWorld: Probing Physical Reasoning via Code-Driven Video Reconstruction
VisPhyWorld evaluates MLLMs' physical reasoning via executable code generation for video reconstruction, with VisPhyBench showing strong semantics but weak parameter inference and dynamics simulation.