SVGDreamer: Text Guided SVG Generation with Diffusion Model

Chuang Wang; Dong Xu; Haitao Zhou; Jing Zhang; Qian Yu; Ximing Xing

arxiv: 2312.16476 · v7 · submitted 2023-12-27 · 💻 cs.CV · cs.AI


Pith reviewed 2026-05-24 05:26 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords · text-guided SVG · diffusion models · vector graphics · score distillation · semantic vectorization · particle-based optimization · editability


The pith

SVGDreamer uses semantic decomposition and particle distillation to generate editable and diverse text-guided SVGs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to improve text-to-SVG synthesis by addressing poor editability, visual quality, and sample diversity in prior work. It establishes a framework that first vectorizes images semantically into foreground and background components using attention controls, then optimizes the vectors as particle distributions with score distillation and aesthetic rewards. If this holds, it would make text-prompted vector graphics practical for design applications where easy modification is essential.

Core claim

SVGDreamer shows that its SIVE process with attention-based primitive control and attention-mask loss, together with VPSD that models SVGs over control points and colors with reward reweighting, leads to vector outputs that outperform baselines in editability, quality, and diversity.

What carries the argument

Semantic-driven image vectorization (SIVE) that separates foreground objects and background with attention mechanisms, combined with Vectorized Particle-based Score Distillation (VPSD) for distributional optimization of vector parameters.

If this is right

Vector elements can be edited independently due to the attention-mask loss and primitive control.
Shapes avoid over-smoothing and colors avoid over-saturation through particle-based modeling.
Generation converges more quickly when particles are reweighted by a reward model.
Diversity of outputs increases because SVGs are treated as distributions rather than single optimizations.
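The reward-based reweighting of particles listed above can be pictured with a minimal sketch. It assumes that each of the k particles gets a scalar reward score, and that a softmax over those scores (with a hypothetical temperature parameter) scales each particle's loss. The paper's actual reward-feedback (ReFL) scheme may differ, so read this as an illustration of reweighting in general rather than as VPSD itself.

```python
import numpy as np


def reward_weights(rewards, temperature: float = 1.0) -> np.ndarray:
    """Softmax over per-particle reward scores (higher reward -> larger weight).

    The softmax form and the temperature are illustrative assumptions.
    """
    r = np.asarray(rewards, dtype=np.float64) / temperature
    r = r - r.max()  # subtract the max for numerical stability
    w = np.exp(r)
    return w / w.sum()


def weighted_particle_loss(per_particle_losses, rewards, temperature: float = 1.0) -> float:
    """Combine the k particle losses into one scalar using reward-derived weights."""
    w = reward_weights(rewards, temperature)
    return float(np.dot(w, np.asarray(per_particle_losses, dtype=np.float64)))
```

With equal rewards the weighted loss reduces to the plain mean over particles. A lower temperature concentrates the update on the highest-scoring particles, which is one way a reward model could speed up convergence toward aesthetically preferred samples.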

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the semantic split works well, the method could extend to generating complex multi-object scenes with consistent style.
Design tools might incorporate this to allow prompt-based starting points for vector editing sessions.
Testing on prompts involving fine details like text in icons could reveal limits of the current attention control.

Load-bearing premise

Attention-based primitive control combined with an attention-mask loss enables fine-grained independent manipulation of individual vector elements without artifacts or loss of global coherence.

What would settle it

A direct comparison experiment where SVGDreamer SVGs do not score higher on editability measures or diversity metrics than the baselines would falsify the central performance claim.

Figures

Figures reproduced from arXiv: 2312.16476 by Chuang Wang, Dong Xu, Haitao Zhou, Jing Zhang, Qian Yu, Ximing Xing.

**Figure 1.** Given a text prompt, SVGDreamer can generate a variety of vector graphics. SVGDreamer is a versatile tool that can work with …

**Figure 2.** Overview of SVGDreamer. The method consists of two parts: semantic-driven image vectorization (SIVE, Sec. 3.1) and SVG synthesis through VPSD optimization (Sec. 3.2). The result obtained from SIVE can be used as input to VPSD for further refinement. Accompanying text from Sec. 3.1.2 (Semantic-aware Optimization): In this stage, we utilize an attention-based mask loss to separately optimize the objects in the foreground and background. …
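The foreground/background split driven by an attention-based mask loss can be sketched as follows. This is a minimal illustration under stated assumptions: a 2-D cross-attention map for the foreground token is min-max normalized and thresholded into a binary mask (the 0.5 threshold is hypothetical), the background mask is its complement, and each part is fitted with a mean squared error restricted to its mask. The function names are hypothetical, and the paper's exact loss may be formulated differently.

```python
import numpy as np


def attention_to_mask(attn: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Min-max normalize a 2-D cross-attention map and binarize it.

    The 0.5 threshold is a hypothetical default, not a value from the paper.
    """
    a = attn.astype(np.float64) - attn.min()
    a = a / (a.max() + 1e-8)
    return (a >= threshold).astype(np.float64)


def masked_mse(rendered: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error over the pixels (and channels) where mask == 1.

    rendered, target: (H, W, C) images; mask: (H, W) binary mask.
    """
    sq_err = (rendered - target) ** 2 * mask[..., None]
    n = mask.sum() * rendered.shape[-1]
    return float(sq_err.sum() / max(n, 1.0))


# Hypothetical usage: one mask for the foreground token, its complement for the background.
rng = np.random.default_rng(0)
fg_mask = attention_to_mask(rng.random((4, 4)))
bg_mask = 1.0 - fg_mask
target = rng.random((4, 4, 3))
render = rng.random((4, 4, 3))
loss = masked_mse(render, target, fg_mask) + masked_mse(render, target, bg_mask)
```

Keeping the two masked terms separate means that gradients from background pixels do not reach paths assigned to the foreground, and vice versa. This decoupling is the property that the editability claim depends on.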

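**Figure 3.** The process of Vectorized Particle-based Score Distillation. VPSD takes $k$ SVGs as input and simultaneously optimizes $k$ sets of SVG parameters. Accompanying text: the SDS gradient is estimated by

$$\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\phi, x = \mathcal{R}(\theta)) \triangleq \mathbb{E}_{t,\epsilon,a}\left[ w(t)\left(\epsilon_\phi(z_t; y, t) - \epsilon\right) \frac{\partial z}{\partial x_a} \frac{\partial x_a}{\partial \theta} \right] \qquad (3)$$

where $w(t)$ is the weighting function and the noised input is $z_t = \alpha_t x_a + \sigma_t \epsilon$. Unfortunately, SDS-based methods often suffer from issues such as shape over-smoothing, color over-s…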
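Eq. (3) can be checked numerically on a toy problem. The sketch below makes several hypothetical simplifications: the encoder is the identity ($z = x_a$, so $\partial z / \partial x_a = I$), the renderer is linear ($x_a = J\theta$), there is no augmentation, and the noise predictor is any user-supplied callable. As in SDS, the predictor's own Jacobian is not backpropagated. The sketch only illustrates the Monte Carlo structure of the estimator and does not reproduce the paper's implementation.

```python
import numpy as np


def sds_grad(theta, jac, denoiser, alphas, sigmas, weights, rng, n_samples=16):
    """Monte Carlo estimate of Eq. (3) for a toy setup.

    Assumptions (hypothetical, for illustration): x_a = jac @ theta and z = x_a,
    so the chain (dz/dx_a)(dx_a/dtheta) reduces to jac.
    """
    x_a = jac @ theta
    grad = np.zeros_like(theta, dtype=np.float64)
    for _ in range(n_samples):
        t = int(rng.integers(len(alphas)))       # sample a timestep index
        eps = rng.standard_normal(x_a.shape)     # sample Gaussian noise
        z_t = alphas[t] * x_a + sigmas[t] * eps  # z_t = alpha_t x_a + sigma_t eps
        residual = weights[t] * (denoiser(z_t, t) - eps)
        grad += jac.T @ residual                 # chain rule through the renderer
    return grad / n_samples
```

With an "oracle" predictor for a single target point, $\epsilon_\phi(z_t) = (z_t - \alpha_t x^\star)/\sigma_t$, the noise cancels and the estimate becomes $w(t)\,(\alpha_t/\sigma_t)\,J^\top (x_a - x^\star)$. Descending this gradient therefore pulls the rendering toward $x^\star$.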

**Figure 4.** Qualitative comparison of different methods. Note that DiffSketcher was originally designed for vector sketch generation; therefore, we re-implemented it to generate RGB vector graphics. Accompanying text: This style allows for a wide range of compositions while maintaining a minimalistic expression. We utilize closed form Bézier curves with trainable control points and fill colors. 2) Sketch is a way to convey informati…
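The closed Bézier paths with trainable control points mentioned in the Figure 4 text can be sketched as follows. The sketch assumes a diffvg-style layout [14]: each cubic segment stores three control points, its end point is shared with the next segment, and the last segment wraps back to point 0 to close the outline. The function names are hypothetical. The fill color, which would be a separate trainable RGBA parameter, is not modeled here.

```python
import numpy as np


def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bézier segment at parameters t (1-D array); returns (len(t), 2)."""
    t = np.asarray(t, dtype=np.float64)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)


def closed_path_points(control_points, samples_per_segment=8):
    """Sample a closed path made of n cubic segments.

    control_points: (3n, 2) array. Segment i uses points 3i .. 3i+3, and the
    last segment wraps around to point 0, so the outline is closed.
    """
    cp = np.asarray(control_points, dtype=np.float64)
    if len(cp) % 3 != 0:
        raise ValueError("a closed cubic path needs 3 control points per segment")
    t = np.linspace(0.0, 1.0, samples_per_segment, endpoint=False)
    segments = []
    for i in range(len(cp) // 3):
        p0, p1, p2 = cp[3 * i], cp[3 * i + 1], cp[3 * i + 2]
        p3 = cp[(3 * i + 3) % len(cp)]
        segments.append(cubic_bezier(p0, p1, p2, p3, t))
    return np.concatenate(segments, axis=0)
```

In an optimization setting, the rows of `control_points` would be the trainable parameters. A differentiable rasterizer would then turn the closed outline and its fill color into pixels for the loss.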

**Figure 5.** Examples of vector assets created by SVGDreamer. We specify foreground content as an SVG asset through a text prompt. To create assets that fit the SVG style, such as flat polygon vector, we constrain the vector representation by using a different prompt modifier to encourage the appropriate style: "... on a white background, full body action pose, complete body, concept art, flat 2d vector icon." LIVE (G…

**Figure 6.** Comparison of LIVE vectorization with SIVE. In the first row, "Foreground 1" and "Foreground 2" refer to Astronaut and Plants, respectively. Glyphs have been added manually and were not produced by our method. In the LIVE setup, we follow the protocol outlined in VectorFusion [12], which represents a vector image with 128 paths distributed across four layers, with 32 paths in each layer. … hierarchies acros…

**Figure 7.** Examples showcasing the editability of the results generated by our SVGDreamer.

**Figure 8.** More results generated by our SVGDreamer. The style is governed by vector primitives.

**Figure 9.** Comparison of synthetic posters generated by different methods. The input text prompts and glyphs to be added to the posters are displayed on the left side. Prompts shown include "Bold logo icon in blue, black, white colors for a simplified version of Great Wave of Kanagawa", "An astronaut, the logo, vector art.", "The logo of the Japanese mystery temple,, game art, cartoon, 3d animation style" (glyph: "Temple"), and "A Starbucks coffee c…".

**Figure 10.** Examples of synthetic icons. Note that the glyphs are manually added. Panel prompts include "A man in an astronaut suit walking …" and "A beautiful photo of the Eiffel Tower".

**Figure 11.** Visualizations of the LDM cross-attention maps. Accompanying text from the CFG ablation: … optimization stability. Note that SDS-based methods [12, 48] do not work well with such small CFG weights. Instead, our VPSD provides a trade-off between CFG weight and diversity, and it can generate more diverse results by simply setting a smaller CFG weight. From Appendix E.2 (Ablation on ReFL): In [45], only selected particles update the LoRA network in each iteration. Howev…

**Figure 12.** Ablation on how the classifier-free guidance (CFG) [7] weight affects randomness. A smaller CFG weight provides more diversity, but too small a weight reduces optimization stability. The prompt is "A photograph of an astronaut riding a horse". Accompanying text: … and analyze how this variation affects the outcomes. The CFG weight of VPSD is set to 7.5. As shown in …
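The CFG weight varied in Figure 12 enters through the standard classifier-free guidance combination of Ho and Salimans [7]. This combination extrapolates from the unconditional noise prediction toward the text-conditional one. The sketch below shows only that formula. The arrays in the tests are toy values rather than model outputs, and how VPSD feeds the guided prediction into its gradient is not shown.

```python
import numpy as np


def cfg_noise(eps_uncond, eps_cond, guidance_scale: float) -> np.ndarray:
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond).

    guidance_scale = 0 returns the unconditional prediction, 1 returns the
    conditional one, and larger values (e.g. the 7.5 used for VPSD) push
    further along the text direction.
    """
    eps_uncond = np.asarray(eps_uncond, dtype=np.float64)
    eps_cond = np.asarray(eps_cond, dtype=np.float64)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

Because the text-driven difference is scaled by the weight, a large weight narrows the set of outputs that satisfy the prompt. This is consistent with the ablation's observation that smaller weights give more diverse results at some cost in stability.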

**Figure 13.** Effect of Reward Feedback Learning (ReFL). When employing ReFL, the visual quality of the generated results is significantly enhanced.

**Figure 14.** Ablation on the number of particles. The diversity of the generated results increases slightly with the number of particles, while their quality is not significantly affected. The prompt is "A photograph of an astronaut riding a horse". Accompanying text: … Considering the high computation overhead associa…

**Figure 15.** Effect of the number of paths. Additional vector paths can be synthesized to enhance SVG detail.

**Figure 16.** 2D image synthesis. Comparison of the results from using VPSD and VSD for 2D image synthesis. Accompanying text: … like water reflections. Additionally, VPSD better aligns with text prompts. From Appendix F (VPSD for 2D Image Synthesis): In this work, VPSD is specifically designed for text-to-SVG generation; however, it can also be adapted for 2D image synthesis. As illustrated in …

Original abstract

Text-guided scalable vector graphics (SVG) synthesis has broad applications in icon and sketch generation. However, existing text-to-SVG methods often suffer from limited editability, suboptimal visual quality, and low sample diversity. To address these challenges, we propose SVGDreamer, a novel framework for text-guided vector graphics synthesis. Our method introduces a semantic-driven image vectorization (SIVE) process, which decomposes the generation procedure into foreground objects and background elements, thereby improving structural controllability and editability. In particular, SIVE incorporates attention-based primitive control and an attention-mask loss to facilitate fine-grained manipulation of individual vector elements. To further improve generation quality and diversity, we propose Vectorized Particle-based Score Distillation (VPSD), which models SVGs as distributions over control points and colors. Compared with existing text-to-SVG optimization methods, VPSD alleviates over-smoothed shapes, over-saturated colors, limited diversity, and slow convergence. Moreover, VPSD leverages a reward model to reweight vector particles, leading to better visual aesthetics and faster convergence. Extensive experiments demonstrate that SVGDreamer consistently outperforms existing baselines in editability, visual quality, and diversity. Project page: https://ximinng.github.io/SVGDreamer-project/

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

SVGDreamer adds SIVE for semantic decomposition with attention masks and VPSD for particle reweighting in score distillation, but the outperformance claims rest on unshown experiments.


The main things to know are that the paper names two concrete components aimed at documented weaknesses in prior text-to-SVG work, and that the abstract offers no numbers to back the superiority claims on editability or diversity. SIVE splits the process into foreground and background objects, then applies attention-based primitive control plus an attention-mask loss to support independent edits. VPSD treats SVG control points and colors as particles in a distribution and adds a reward model to reweight them during optimization, which the authors say reduces over-smoothing, over-saturation, and slow convergence compared with standard score distillation. These are direct responses to issues such as limited sample variety and shape collapse that show up in earlier optimization-based methods. The construction itself is straightforward and builds on existing diffusion models without circular fitting.

The soft spot is the missing evidence. The abstract states consistent gains over baselines but supplies no quantitative metrics, baseline list, or ablation tables, so it is impossible to check whether the mask loss actually limits cross-talk between primitives or whether the reweighting produces measurable diversity improvements. The stress-test concern that diffusion attention maps are spatially diffuse is reasonable given the description; nothing in the abstract demonstrates that the mask term is strong enough to enforce clean decoupling.

This paper is aimed at researchers working on text-conditioned vector graphics or diffusion methods for design tools. A reader who wants to adapt the particle reweighting idea or test the attention control in their own pipeline could extract value from the method description. It deserves a serious referee report because the components address real, cited failure modes with a reproducible-looking recipe, even though the results section will need close examination.

Referee Report

2 major / 1 minor

Summary. The manuscript presents SVGDreamer, a framework for text-guided SVG generation. It introduces a semantic-driven image vectorization (SIVE) process that decomposes generation into foreground objects and background elements, incorporating attention-based primitive control and an attention-mask loss to improve structural controllability and editability. It further proposes Vectorized Particle-based Score Distillation (VPSD), which models SVGs as distributions over control points and colors and uses a reward model to reweight particles for improved quality, diversity, and convergence. The central claim is that extensive experiments demonstrate consistent outperformance over existing baselines in editability, visual quality, and diversity.

Significance. If the results hold, the work would advance text-to-SVG synthesis by improving fine-grained editability and sample diversity, with applications in icon and sketch generation. The combination of semantic decomposition via SIVE and particle-based optimization in VPSD is a novel direction that builds directly on external diffusion models without self-referential parameter fitting.

major comments (2)

[SIVE process description] The headline claim of superior editability rests on the SIVE process's attention-based primitive control and attention-mask loss enabling independent manipulation of individual vector elements. The manuscript provides no analysis or evidence that this loss is strong enough to overcome the typically soft, spatially extended nature of diffusion attention maps and prevent cross-talk between primitives while preserving global coherence.
[Experimental results] The abstract asserts outperformance on editability, quality, and diversity, yet the provided text contains no quantitative metrics, baseline comparisons, or ablation results to support these claims; without such data the central experimental superiority cannot be verified.

minor comments (1)

[VPSD optimization] The description of VPSD as modeling SVGs as distributions over control points and colors would benefit from an explicit equation or pseudocode definition to clarify the particle reweighting step.
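As a reference point for the requested explicit form, here is one plausible way to write the per-particle update. It assumes that VPSD inherits the variational score distillation (VSD) update of ProlificDreamer [45] and reuses the notation of Eq. (3). Under that assumption, the sampled noise $\epsilon$ is replaced by the prediction $\epsilon_{\mathrm{est}}$ of a LoRA [10]-adapted copy of the diffusion model that is fine-tuned on the current particles. The symbol $\epsilon_{\mathrm{est}}$ and the indexing over particles $\theta^{(i)}$ are our own notation, not the paper's, and the reward reweighting step is not included.

```latex
\nabla_{\theta^{(i)}} \mathcal{L}_{\mathrm{VPSD}} \approx
\mathbb{E}_{t,\epsilon,a}\left[ w(t)\left(\epsilon_\phi(z_t; y, t) - \epsilon_{\mathrm{est}}(z_t; y, t)\right)
\frac{\partial z}{\partial x_a} \frac{\partial x_a}{\partial \theta^{(i)}} \right],
\qquad z_t = \alpha_t x_a + \sigma_t \epsilon, \quad i = 1, \dots, k
```

Compared with Eq. (3), only the subtracted term changes. The gradient pushes each particle toward the pretrained model's score and away from the score of the current particle distribution, rather than toward a single mode.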

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on SVGDreamer. We address each major comment below with targeted responses and planned revisions to strengthen the manuscript.


Referee: [SIVE process description] The headline claim of superior editability rests on the SIVE process's attention-based primitive control and attention-mask loss enabling independent manipulation of individual vector elements. The manuscript provides no analysis or evidence that this loss is strong enough to overcome the typically soft, spatially extended nature of diffusion attention maps and prevent cross-talk between primitives while preserving global coherence.

Authors: We acknowledge the absence of a dedicated quantitative analysis of cross-talk in the current manuscript. The attention-mask loss is explicitly designed to align rendered primitive masks with diffusion attention maps, and the semantic decomposition in SIVE further localizes control. Qualitative editing results demonstrate independent manipulation with minimal visible interference. In revision we will add a new analysis subsection with metrics (e.g., mask overlap ratios before/after editing) and discussion of how the loss interacts with soft attention maps while maintaining coherence. revision: yes
Referee: [Experimental results] The abstract asserts outperformance on editability, quality, and diversity, yet the provided text contains no quantitative metrics, baseline comparisons, or ablation results to support these claims; without such data the central experimental superiority cannot be verified.

Authors: The experiments section (Section 4) of the full manuscript contains quantitative evaluations, including user-study scores for editability, diversity measured via feature variance, and visual-quality comparisons against baselines such as VectorFusion and DiffSketcher, plus ablations on SIVE and VPSD. If these elements were not apparent in the reviewed copy, we will expand the section with additional tables, statistical significance tests, and clearer baseline descriptions to make the supporting data unambiguous. revision: yes

Circularity Check

0 steps flagged

No circularity detected in derivation or claims

full rationale

The paper introduces SVGDreamer via two new components (SIVE with attention-based primitive control and attention-mask loss; VPSD with particle-based score distillation and reward reweighting) that are described as novel constructions building on external diffusion models. No equations, fitted parameters, or self-citations are presented that reduce the claimed editability/quality/diversity gains to quantities defined by the authors' own inputs or prior work. The experimental comparisons to baselines are external and falsifiable, leaving the central claims self-contained rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

The central claims rest on the untested effectiveness of the newly introduced attention-mask loss and particle reweighting; no external benchmarks or prior proofs are cited for these mechanisms.

invented entities (2)

SIVE process no independent evidence
purpose: Decompose SVG generation into foreground objects and background elements for structural controllability
Newly proposed decomposition step with no independent prior evidence supplied in the abstract.
VPSD optimization no independent evidence
purpose: Model SVGs as distributions over control points and colors to reduce over-smoothing and improve diversity
New particle-based score distillation variant introduced without external validation in the abstract.

pith-pipeline@v0.9.0 · 5773 in / 1185 out tokens · 21597 ms · 2026-05-24T05:26:53.342007+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Voxify3D: Pixel Art Meets Volumetric Rendering
cs.CV 2025-12 unverdicted novelty 7.0

Voxify3D generates voxel art from 3D meshes via orthographic pixel supervision, patch-based CLIP alignment, and palette-constrained Gumbel-Softmax quantization, achieving 37.12 CLIP-IQA and 77.90% user preference.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · cited by 1 Pith paper · 2 internal anchors

[1]

Deepsvg: A hierarchical generative network for vector graphics animation

Alexandre Carlier, Martin Danelljan, Alexandre Alahi, and Radu Timofte. Deepsvg: A hierarchical generative network for vector graphics animation. Advances in Neural Information Processing Systems (NIPS), 33:16351–16361, 2020. 2
[2]

Textdiffuser: Diffusion models as text painters

Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, and Furu Wei. Textdiffuser: Diffusion models as text painters. arXiv preprint arXiv:2305.10855, 2023. 1
[3]

Taming transformers for high-resolution image synthesis

Patrick Esser, Robin Rombach, and Björn Ommer. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12873–12883, 2021. 5
[4]

CLIPDraw: Exploring text-to-drawing synthesis through language-image encoders

Kevin Frans, Lisa Soros, and Olaf Witkowski. CLIPDraw: Exploring text-to-drawing synthesis through language-image encoders. In Advances in Neural Information Processing Systems (NIPS), 2022. 1, 2, 7, 8
[5]

A neural representation of sketch drawings

David Ha and Douglas Eck. A neural representation of sketch drawings. In International Conference on Learning Representations (ICLR), 2018. 2
[6]

GANs trained by a two time-scale update rule converge to a local Nash equilibrium

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems (NIPS), 30, 2017. 7, 8
[7]

Classifier-Free Diffusion Guidance

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022. 2, 5
[8]

Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems (NIPS), pages 6840–6851, 2020. 2
[9]

Image quality metrics: PSNR vs. SSIM

Alain Horé and Djemel Ziou. Image quality metrics: PSNR vs. SSIM. In 2010 20th International Conference on Pattern Recognition, pages 2366–2369, 2010. 7, 8
[10]

LoRA: Low-rank adaptation of large language models

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models. In International Conference on Learning Representations (ICLR),
[11]

Word-as-image for semantic typography

Shir Iluz, Yael Vinker, Amir Hertz, Daniel Berio, Daniel Cohen-Or, and Ariel Shamir. Word-as-image for semantic typography. ACM Transactions on Graphics (TOG), 42(4),
[12]

Vectorfusion: Text-to-svg by abstracting pixel-based diffusion models

Ajay Jain, Amber Xie, and Pieter Abbeel. Vectorfusion: Text-to-svg by abstracting pixel-based diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 1, 2, 4, 5, 6, 7, 8
[13]

Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation

Junnan Li, Dongxu Li, Caiming Xiong, and Steven Hoi. Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In International Conference on Machine Learning (ICML), pages 12888–12900. PMLR, 2022. 7, 8
[14]

Differentiable vector graphics rasterization for editing and learning

Tzu-Mao Li, Michal Lukáč, Michaël Gharbi, and Jonathan Ragan-Kelley. Differentiable vector graphics rasterization for editing and learning. ACM Transactions on Graphics (TOG), 39(6):193:1–193:15, 2020. 1, 2, 4
[15]

Magic3d: High-resolution text-to-3d content creation

Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3d: High-resolution text-to-3d content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 300–309, 2023. 4
[16]

A learned representation for scalable vector graphics

Raphael Gontijo Lopes, David Ha, Douglas Eck, and Jonathon Shlens. A learned representation for scalable vector graphics. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019. 2
[17]

Towards layer-wise image vectorization

Xu Ma, Yuqian Zhou, Xingqian Xu, Bin Sun, Valerii Filev, Nikita Orlov, Yun Fu, and Humphrey Shi. Towards layer-wise image vectorization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16314–16323, 2022. 2, 4, 7
[18]

Nerf: Representing scenes as neural radiance fields for view synthesis

Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. Communications of the ACM, 65(1):99–106, 2021. 4
[19]

Clip-clop: Clip-guided collage and photomontage

Piotr Mirowski, Dylan Banarse, Mateusz Malinowski, Simon Osindero, and Chrisantha Fernando. Clip-clop: Clip-guided collage and photomontage. arXiv preprint arXiv:2205.03146, 2022. 1, 2
[20]

GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models

Alexander Quinn Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob Mcgrew, Ilya Sutskever, and Mark Chen. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. In Proceedings of the 39th International Conference on Machine Learning (ICML), pages 16784–16804, 2022. 1, 2
[21]

Do 2D GANs know 3D shape? Unsupervised 3D shape reconstruction from 2D image GANs

Xingang Pan, Bo Dai, Ziwei Liu, Chen Change Loy, and Ping Luo. Do 2D GANs know 3D shape? Unsupervised 3D shape reconstruction from 2D image GANs. In International Conference on Learning Representations (ICLR), 2021. 4
[22]

Dreamfusion: Text-to-3d using 2d diffusion

Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. In The Eleventh International Conference on Learning Representations (ICLR), 2023. 2, 4, 5, 6, 8
[23]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning (ICML), pages 8748–8763. PMLR, 2021. 1, 2, 7, 8
[24]

Hierarchical Text-Conditional Image Generation with CLIP Latents

Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with clip latents. arXiv preprint arXiv:2204.06125,
[25]

Im2vec: Synthesizing vector graphics without vector supervision

Pradyumna Reddy, Michael Gharbi, Michal Lukac, and Niloy J Mitra. Im2vec: Synthesizing vector graphics without vector supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 7342–7351, 2021. 2
[26]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10684–10695, 2022. 1, 2, 4, 6
[27]

Photorealistic text-to-image diffusion models with deep language understanding

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. In Advances in Neural Information Processing Systems (NIPS), pages 36479–36494, 2022. 1, 2, 4
[28]

Styleclipdraw: Coupling content and style in text-to-drawing synthesis

Peter Schaldenbrand, Zhixuan Liu, and Jean Oh. Styleclipdraw: Coupling content and style in text-to-drawing synthesis. arXiv preprint arXiv:2111.03133, 2022. 1, 2
[29]

Improved aesthetic predictor

Christoph Schuhmann. Improved aesthetic predictor. https://github.com/christophschuhmann/improved-aesthetic-predictor, 2022. 7, 8
[30]

Clipgen: A deep generative model for clipart vectorization and synthesis

I-Chao Shen and Bing-Yu Chen. Clipgen: A deep generative model for clipart vectorization and synthesis. IEEE Transactions on Visualization and Computer Graphics, 28(12):4211–4224, 2022. 2
[31]

Deep unsupervised learning using nonequilibrium thermodynamics

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the International Conference on Machine Learning (ICML), pages 2256–2265, 2015. 2
[32]

Denoising diffusion implicit models

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations (ICLR), 2021. 6
[33]

Generative modeling by estimating gradients of the data distribution

Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems (NIPS), 2019. 2
[34]

Clipfont: Text guided vector wordart generation

Yiren Song and Yuxuan Zhang. Clipfont: Text guided vector wordart generation. In 33rd British Machine Vision Conference 2022, BMVC 2022, London, UK, November 21-24, 2022, 2022. 1
[35]

Score-based generative modeling through stochastic differential equations

Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. In International Conference on Learning Representations (ICLR), 2021. 2
[36]

Clipvg: Text-guided image manipulation using differentiable vector graphics

Yiren Song, Xuning Shao, Kang Chen, Weidong Zhang, Zhongliang Jing, and Minzhe Li. Clipvg: Text-guided image manipulation using differentiable vector graphics. In Proceedings of the Conference on Artificial Intelligence (AAAI),
[37]

IF by DeepFloyd Lab at StabilityAI

StabilityAI. IF by DeepFloyd Lab at StabilityAI. https://github.com/deep-floyd/IF, 2023. 1, 2
[38]

Marvel: Raster gray-level manga vectorization via primitive-wise deep reinforcement learning

Hao Su, Xuefeng Liu, Jianwei Niu, Jiahe Cui, Ji Wan, Xinghao Wu, and Nana Wang. Marvel: Raster gray-level manga vectorization via primitive-wise deep reinforcement learning. IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), 2023. 2
[39]

Modern evolution strategies for creativity: Fitting concrete images and abstract concepts

Yingtao Tian and David Ha. Modern evolution strategies for creativity: Fitting concrete images and abstract concepts. In Artificial Intelligence in Music, Sound, Art and Design, pages 275–291. Springer, 2022. 2
[40]

Clipasso: Semantically-aware object sketching

Yael Vinker, Ehsan Pajouheshgar, Jessica Y Bo, Roman Christian Bachmann, Amit Haim Bermano, Daniel Cohen-Or, Amir Zamir, and Ariel Shamir. Clipasso: Semantically-aware object sketching. ACM Transactions on Graphics (TOG), 41(4):1–11, 2022. 1, 2
[41]

Clipascene: Scene sketching with different types and levels of abstraction

Yael Vinker, Yuval Alaluf, Daniel Cohen-Or, and Ariel Shamir. Clipascene: Scene sketching with different types and levels of abstraction. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4146–4156, 2023. 1
[42] Haochen Wang, Xiaodan Du, Jiahao Li, Raymond A. Yeh, and Greg Shakhnarovich. Score Jacobian chaining: Lifting pretrained 2D diffusion models for 3D generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12619–12629, 2023.

[43] Yizhi Wang and Zhouhui Lian. DeepVecFont: Synthesizing high-quality vector fonts via dual-modality learning. ACM Transactions on Graphics (TOG), 40(6), 2021.

[44] Yizhi Wang, Gu Pu, Wenhan Luo, Pengfei Wang, Yexin ans Xiong, Hongwen Kang, Zhonghao Wang, and Zhouhui Lian. Aesthetic text logo synthesis via content-aware layout inferring. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

[45] Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. ProlificDreamer: High-fidelity and diverse text-to-3D generation with variational score distillation. arXiv preprint arXiv:2305.16213, 2023.

[46] Ronghuan Wu, Wanchao Su, Kede Ma, and Jing Liao. IconShop: Text-based vector icon synthesis with autoregressive transformers. arXiv preprint arXiv:2304.14400, 2023.

[47] Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, and Hongsheng Li. Human preference score: Better aligning text-to-image models with human preference. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 2096–2105, 2023.

[48] Ximing Xing, Chuang Wang, Haitao Zhou, Jing Zhang, Qian Yu, and Dong Xu. DiffSketcher: Text guided vector sketch synthesis through latent diffusion models. In Advances in Neural Information Processing Systems (NIPS), 2023.

[49] Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. ImageReward: Learning and evaluating human preferences for text-to-image generation, 2023.

[50] Yukang Yang, Dongnan Gui, Yuhui Yuan, Haisong Ding, Han Hu, and Kai Chen. GlyphControl: Glyph conditional control for visual text generation. 2023.

SVGDreamer: Text Guided SVG Generation with Diffusion Model Supplementary Material

Overview

This supplementary material is organized into several sections that provide additional details and analysis re...
