Vision2Code is a multi-domain benchmark that evaluates image-to-code generation via rendered outputs scored by a VLM rater with dataset-specific rubrics, revealing domain-dependent model performance and enabling improvement without paired reference code.
Omnisvg: A unified scalable vector graphics generation model.arXiv preprint arXiv:2504.06263, 2025b
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
mEOL creates aligned embeddings for text, images, and SVGs using instruction-guided MLLM one-word summaries and semantic SVG rewriting, outperforming baselines on a new text-to-SVG retrieval benchmark.
Element-level leave-one-out analysis yields per-element quality scores and four structural metrics (purity, coverage, compactness, locality) that quantify SVG modularity and enable artifact detection.
HiVG introduces hierarchical SVG tokenization with atomic and segment tokens plus HMN initialization to enable more efficient and stable autoregressive generation of vector graphics programs.
Visual-ERM is a new multimodal reward model that supplies fine-grained visual feedback for training vision-language models on chart-to-code, table, and SVG tasks, yielding measurable gains over prior rewards.
VIGA introduces a training-free interleaved multimodal reasoning loop that improves vision-as-inverse-graphics accuracy over one-shot baselines on BlenderGym, SlideBench, and new BlenderBench.
AnchorFlow predicts image-conditioned sparse anchor point fields to resolve into ordered Bezier paths, then uses rendering feedback to correct errors, yielding SVGs with reduced editable complexity and competitive fidelity.
DataSway supports creation of semantically aligned animations for metaphoric data visualizations by generating clips via VLMs and coordinating timelines based on entity order, attributes, layout, or randomness.
citing papers explorer
-
Vision2Code: A Multi-Domain Benchmark for Evaluating Image-to-Code Generation
Vision2Code is a multi-domain benchmark that evaluates image-to-code generation via rendered outputs scored by a VLM rater with dataset-specific rubrics, revealing domain-dependent model performance and enabling improvement without paired reference code.
-
mEOL: Training-Free Instruction-Guided Multimodal Embedder for Vector Graphics and Image Retrieval
mEOL creates aligned embeddings for text, images, and SVGs using instruction-guided MLLM one-word summaries and semantic SVG rewriting, outperforming baselines on a new text-to-SVG retrieval benchmark.
-
Structural Evaluation Metrics for SVG Generation via Leave-One-Out Analysis
Element-level leave-one-out analysis yields per-element quality scores and four structural metrics (purity, coverage, compactness, locality) that quantify SVG modularity and enable artifact detection.
-
Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling
HiVG introduces hierarchical SVG tokenization with atomic and segment tokens plus HMN initialization to enable more efficient and stable autoregressive generation of vector graphics programs.
-
Visual-ERM: Reward Modeling for Visual Equivalence
Visual-ERM is a new multimodal reward model that supplies fine-grained visual feedback for training vision-language models on chart-to-code, table, and SVG tasks, yielding measurable gains over prior rewards.
-
Vision-as-Inverse-Graphics Agent via Interleaved Multimodal Reasoning
VIGA introduces a training-free interleaved multimodal reasoning loop that improves vision-as-inverse-graphics accuracy over one-shot baselines on BlenderGym, SlideBench, and new BlenderBench.
-
AnchorFlow: Editable SVG Reconstruction via Sparse Anchor Point Fields
AnchorFlow predicts image-conditioned sparse anchor point fields to resolve into ordered Bezier paths, then uses rendering feedback to correct errors, yielding SVGs with reduced editable complexity and competitive fidelity.
-
DataSway: Vivifying Metaphoric Visualization with Animation Clip Generation and Coordination
DataSway supports creation of semantically aligned animations for metaphoric data visualizations by generating clips via VLMs and coordinating timelines based on entity order, attributes, layout, or randomness.