pith. machine review for the scientific record. sign in

arxiv: 2412.10437 · v3 · submitted 2024-12-11 · 💻 cs.CV · cs.GR· cs.LG

Recognition: unknown

SVGFusion: A VAE-Diffusion Transformer for Vector Graphic Generation

Authors on Pith no claims yet
classification 💻 cs.CV cs.GRcs.LG
keywords svgssvgfusionvectorarchitecturecodegeneratinghigh-qualitysequence
0
0 comments X
read the original abstract

Generating high-quality Scalable Vector Graphics (SVGs) from text remains a significant challenge. Existing LLM-based models that generate SVG code as a flat token sequence struggle with poor structural understanding and error accumulation, while optimization-based methods are slow and yield uneditable outputs. To address these limitations, we introduce SVGFusion, a unified framework that adapts the VAE-diffusion architecture to bridge the dual code-visual nature of SVGs. Our model features two core components: a Vector-Pixel Fusion Variational Autoencoder (VP-VAE) that learns a perceptually rich latent space by jointly encoding SVG code and its rendered image, and a Vector Space Diffusion Transformer (VS-DiT) that achieves globally coherent compositions through iterative refinement. Furthermore, this architecture is enhanced by a Rendering Sequence Modeling strategy, which ensures accurate object layering and occlusion. Evaluated on our novel SVGX-Dataset comprising 240k human-designed SVGs, SVGFusion establishes a new state-of-the-art, generating high-quality, editable SVGs that are strictly semantically aligned with the input text.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. VAnim: Rendering-Aware Sparse State Modeling for Structure-Preserving Vector Animation

    cs.CV 2026-05 unverdicted novelty 7.0

    VAnim creates open-domain text-to-SVG animations via sparse state updates on a persistent DOM tree, identification-first planning, and rendering-aware RL with a new 134k-example benchmark.

  2. Render-in-the-Loop: Vector Graphics Generation via Visual Self-Feedback

    cs.CV 2026-04 unverdicted novelty 7.0

    Render-in-the-Loop reformulates SVG generation as a step-wise visual-context-aware process using self-feedback from rendered intermediate states, VSF training, and RaV inference to outperform baselines on MMSVGBench f...

  3. mEOL: Training-Free Instruction-Guided Multimodal Embedder for Vector Graphics and Image Retrieval

    cs.CV 2026-04 unverdicted novelty 7.0

    mEOL creates aligned embeddings for text, images, and SVGs using instruction-guided MLLM one-word summaries and semantic SVG rewriting, outperforming baselines on a new text-to-SVG retrieval benchmark.

  4. LottieGPT: Tokenizing Vector Animation for Autoregressive Generation

    cs.CV 2026-04 unverdicted novelty 7.0

    LottieGPT tokenizes Lottie animations into compact sequences and fine-tunes Qwen-VL to autoregressively generate coherent vector animations from natural language or visual prompts, outperforming prior SVG models.

  5. Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling

    cs.LG 2026-04 unverdicted novelty 7.0

    HiVG introduces hierarchical SVG tokenization with atomic and segment tokens plus HMN initialization to enable more efficient and stable autoregressive generation of vector graphics programs.