AnchorFlow: Editable SVG Reconstruction via Sparse Anchor Point Fields

Antonio Haas; Christian Franke; Grace Li Zhang; Mengnan Jiang; Michele Franco Adesso

arxiv: 2605.19551 · v1 · pith:SU26B4W4new · submitted 2026-05-19 · 💻 cs.GR · cs.CV

AnchorFlow: Editable SVG Reconstruction via Sparse Anchor Point Fields

Mengnan Jiang , Christian Franke , Michele Franco Adesso , Antonio Haas , Grace Li Zhang This is my paper

Pith reviewed 2026-05-20 02:01 UTC · model grok-4.3

classification 💻 cs.GR cs.CV

keywords image-to-svgvector graphicsbezier curvesanchor pointseditable reconstructionsparse fieldsrendering feedback

0 comments

The pith

AnchorFlow reconstructs SVGs from raster images by predicting sparse anchor point fields that resolve into simple Bezier paths with feedback correction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to resolve the long-standing tension in image-to-vector conversion: methods that stay close to the input raster tend to produce paths with too many anchors or too many separate curves, which makes manual editing cumbersome. AnchorFlow tackles this by shifting focus to how anchors are placed along each path. It first extracts path-like foreground pieces from the raster, then predicts a sparse field of anchor locations conditioned on the image. Each field is turned into an ordered Bezier curve, after which a rendering step supplies corrective signals that prune or adjust local mistakes before the final SVG is assembled. A reader would care because the resulting files keep visual accuracy while offering far fewer control points for artists to move or delete.

Core claim

AnchorFlow models path-level anchor placement with image-conditioned sparse anchor point fields. Given foreground components extracted from a raster image, the method predicts a sparse field for each component, resolves the field into an ordered Bezier path, applies rendering-guided feedback to fix local structural errors, and reassembles the corrected paths into an optimized SVG. Experiments on isolated paths and complete images demonstrate that this yields a favorable fidelity-editability trade-off, with substantially lower editable complexity at competitive raster fidelity.

What carries the argument

Image-conditioned sparse anchor point fields that are resolved into ordered Bezier paths and refined by rendering-guided feedback.

If this is right

Output SVGs contain markedly fewer anchor points per path than dense alternatives while matching raster appearance.
The same pipeline works for both single isolated paths and full multi-component images.
Rendering feedback step removes many of the redundant anchors that boundary-following methods introduce on imperfect raster evidence.
Final assembled SVGs are directly usable in standard vector editors with reduced manual cleanup.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If foreground extraction becomes more robust, the same sparse-field idea could extend to natural photographs that contain overlapping objects.
Fewer anchors per curve may also reduce file size and speed up rendering or animation of the resulting graphics.
A direct user study measuring editing time or number of operations needed would quantify the practical editability gain beyond anchor counts.

Load-bearing premise

The method assumes that path-like foreground components can be reliably extracted from the input raster and that the predicted sparse fields can be turned into Bezier paths whose remaining local errors are fixable by rendering feedback.

What would settle it

Run the pipeline on a set of raster images whose boundaries contain clear noise or gaps; if the output SVGs require more anchors or show larger pixel error than dense baseline methods, the claimed trade-off does not hold.

Figures

Figures reproduced from arXiv: 2605.19551 by Antonio Haas, Christian Franke, Grace Li Zhang, Mengnan Jiang, Michele Franco Adesso.

**Figure 1.** Figure 1: Fidelity and editability trade-off. This leads to a central challenge in editable SVG reconstruction: how to preserve raster fidelity while keeping the vector structure sparse and editable. Different method families naturally emphasize different parts of this space, including tracing [1, 22, 23], optimization-based vectorization [10, 14, 17], adaptive reconstruction [19, 31, 32], and SVG code generation… view at source ↗

**Figure 2.** Figure 2: Overview of AnchorFlow. The input raster is decomposed into path-like foreground [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: Rendering-guided field refinement. Incomplete anchor evidence may lead the initial field to [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Qualitative comparison on full-image SVG reconstruction. AnchorFlow closely preserves [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 5.** Figure 5: Qualitative comparison on controlled single-path reconstruction. Red points indicate [PITH_FULL_IMAGE:figures/full_fig_p008_5.png] view at source ↗

**Figure 6.** Figure 6: Robustness to boundary perturbations in single-path reconstruction. The top row uses a [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: Human evaluation on visual similarity and editability in anonymized pairwise trials. [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗

**Figure 8.** Figure 8: Representative failure cases. (a) Component extraction failure: missed or merged [PITH_FULL_IMAGE:figures/full_fig_p013_8.png] view at source ↗

**Figure 9.** Figure 9: Additional full-image reconstruction examples on the ColorSVG-100K [4] dataset. [PITH_FULL_IMAGE:figures/full_fig_p014_9.png] view at source ↗

**Figure 10.** Figure 10: Additional full-image reconstruction examples on the Noto Emoji [ [PITH_FULL_IMAGE:figures/full_fig_p015_10.png] view at source ↗

**Figure 11.** Figure 11: Additional full-image reconstruction examples on the Noto Emoji [ [PITH_FULL_IMAGE:figures/full_fig_p016_11.png] view at source ↗

**Figure 12.** Figure 12: Additional robustness examples under boundary-perturbed single-path inputs. [PITH_FULL_IMAGE:figures/full_fig_p017_12.png] view at source ↗

**Figure 13.** Figure 13: Additional assembly-stage examples in full-image reconstruction. From left to right: [PITH_FULL_IMAGE:figures/full_fig_p018_13.png] view at source ↗

read the original abstract

Image-to-SVG reconstruction aims to produce vector graphics that are faithful to raster inputs and easy to edit. Existing methods face a structural trade-off in how vector structure is parameterized, including how many paths represent an image and how many anchor points define each path. High-fidelity methods often rely on many paths or densely parameterized curves, whereas overly compact SVG generation may deviate from the input geometry. This issue becomes more pronounced when local raster evidence is imperfect, where boundary-following reconstruction can introduce redundant anchors and fragmented structures. We argue that this trade-off should be addressed at the level of anchor placement, since anchors on Bezier curves define local path structure and strongly affect both accuracy and editability. We propose AnchorFlow, an editable SVG reconstruction framework that models path-level anchor placement with sparse anchor point fields. Given path-like foreground components extracted from a raster image, AnchorFlow predicts an image-conditioned sparse anchor field for each component and resolves it into an ordered Bezier path. Rendering-guided feedback then corrects local structural errors before re-resolution. The recovered paths are then assembled and optimized into the final SVG. Experiments on isolated paths and full images show that AnchorFlow achieves a favorable fidelity-editability trade-off, substantially reducing editable complexity while preserving competitive raster fidelity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

AnchorFlow tries to fix the editability-fidelity trade-off in image-to-SVG by predicting sparse anchor fields per component and using rendering feedback to clean up the paths, but the resolution step looks like the weakest link.

read the letter

Hi, the one thing to take away is that this paper introduces sparse image-conditioned anchor point fields as the main lever for placing anchors on Bezier curves when vectorizing rasters, then resolves those fields into ordered paths and applies rendering-guided feedback to fix local mistakes before final assembly. That is the actual new piece compared with the high-fidelity dense or overly compact baselines they cite. The pipeline description is clear enough on the high level and the motivation about redundant anchors under imperfect boundary evidence is reasonable. They also claim experiments on both isolated paths and full images that support a better trade-off, which is the part that could be useful to people building design tools. The soft spot is exactly the one the stress-test note flags: turning the predicted sparse field into a correctly ordered, non-self-intersecting Bezier path is not trivial when the input raster has noise or ambiguous edges, and the abstract gives almost no detail on how the resolution or the feedback loop actually recovers from those cases. If the full paper only shows qualitative before-afters without ablations on ordering failures or quantitative measures of structural errors, the central claim stays hard to evaluate. The method itself stays within standard graphics primitives and does not appear circular. This is the kind of incremental but practical work that graphics and HCI people who care about editable vector output would want to read. It is grounded enough and specific enough to deserve real referee time rather than a desk reject, even if the results section turns out to need tightening. I would send it out for review.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces AnchorFlow, a framework for image-to-SVG reconstruction that extracts path-like foreground components from a raster input, predicts an image-conditioned sparse anchor point field for each component, resolves the field into an ordered Bezier path, applies rendering-guided feedback to correct local structural errors, and assembles/optimizes the paths into a final SVG. The central claim is that modeling anchor placement via sparse fields yields a favorable fidelity-editability trade-off, substantially reducing editable complexity while preserving competitive raster fidelity, particularly when local raster evidence is imperfect.

Significance. If the quantitative results and robustness claims hold, the work would meaningfully advance vector graphics reconstruction by shifting the parameterization focus to sparse anchor placement and a prediction-resolution-feedback pipeline, offering a concrete alternative to dense curves or boundary-following methods that produce redundant or fragmented structures.

major comments (2)

[Method (resolution and feedback pipeline)] The resolution step from predicted sparse anchor fields to ordered Bezier paths is load-bearing for the fidelity-editability claim (abstract and method overview). The manuscript must demonstrate, via targeted ablations or failure-case analysis, that this step plus the rendering-guided feedback reliably avoids ambiguous ordering, self-intersections, or missing closures when raster evidence is imperfect; otherwise the claimed advantage over boundary-following approaches cannot be verified.
[Experiments] Experiments section: the abstract states that experiments on isolated paths and full images show a favorable trade-off, but the manuscript must report concrete metrics (e.g., raster fidelity error, number of anchors/paths, editability measures) against explicit baselines, including error bars or statistical significance, to substantiate the central claim.

minor comments (2)

[Method] Clarify the exact definition and conditioning of the sparse anchor point field (e.g., how image features are encoded and how sparsity is enforced) to improve reproducibility.
[Figures and tables] Figure captions and text should explicitly link visual results to the quantitative tables so readers can directly assess the fidelity-editability trade-off.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important aspects of the resolution pipeline and experimental reporting that we will address to strengthen the work. We respond to each major comment below.

read point-by-point responses

Referee: [Method (resolution and feedback pipeline)] The resolution step from predicted sparse anchor fields to ordered Bezier paths is load-bearing for the fidelity-editability claim (abstract and method overview). The manuscript must demonstrate, via targeted ablations or failure-case analysis, that this step plus the rendering-guided feedback reliably avoids ambiguous ordering, self-intersections, or missing closures when raster evidence is imperfect; otherwise the claimed advantage over boundary-following approaches cannot be verified.

Authors: We agree that the resolution and feedback components are central to our claims. The manuscript describes the conversion from sparse anchor fields to ordered Bezier paths and the use of rendering feedback for local corrections in the method section. To better substantiate robustness under imperfect raster evidence, we will add targeted ablations (with and without the feedback loop) and failure-case visualizations showing handling of ordering ambiguities, self-intersections, and closures. These additions will directly compare against boundary-following baselines to verify the claimed advantage. revision: yes
Referee: [Experiments] Experiments section: the abstract states that experiments on isolated paths and full images show a favorable trade-off, but the manuscript must report concrete metrics (e.g., raster fidelity error, number of anchors/paths, editability measures) against explicit baselines, including error bars or statistical significance, to substantiate the central claim.

Authors: We acknowledge that more granular quantitative details are needed to fully support the trade-off claim. The experiments section currently evaluates on isolated paths and full images, but we will expand it to report explicit metrics including raster fidelity errors, anchor and path counts, and editability measures. Direct comparisons to baselines will be included, along with error bars from multiple runs and statistical significance tests. revision: yes

Circularity Check

0 steps flagged

No circularity in AnchorFlow's proposed modeling framework

full rationale

The paper describes AnchorFlow as a proposed framework that extracts path-like foreground components from a raster image, predicts an image-conditioned sparse anchor field, resolves it into an ordered Bezier path, applies rendering-guided feedback to correct local errors, and assembles the paths into a final SVG. No equations, derivations, or self-referential definitions are present in the provided text; the sparse anchor field prediction and resolution steps are presented as independent architectural choices rather than quantities defined in terms of their own outputs or fitted parameters. The central claim of a favorable fidelity-editability trade-off rests on empirical experiments rather than any reduction to self-citations or ansatzes smuggled via prior work. This makes the approach self-contained with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the central claim rests on unstated assumptions about component extraction and feedback correction whose details are not supplied.

pith-pipeline@v0.9.0 · 5757 in / 1131 out tokens · 33299 ms · 2026-05-20T02:01:17.658459+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

predicts an image-conditioned sparse anchor field for each component and resolves it into an ordered Bézier path. Rendering-guided feedback then corrects local structural errors
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

sparse anchor fields for editable SVG reconstruction

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 1 internal anchor

[1]

AutoTrace: Bitmap to vector graphics converter.https://autotrace.sourceforge

AutoTrace Project. AutoTrace: Bitmap to vector graphics converter.https://autotrace.sourceforge. net/, 2024. Open source software

work page 2024
[2]

DeepSVG: A hierarchical generative network for vector graphics animation

Alexandre Carlier, Martin Danelljan, Alexandre Alahi, and Radu Timofte. DeepSVG: A hierarchical generative network for vector graphics animation. InAdvances in Neural Information Processing Systems, volume 33, pages 16351–16361, 2020. URLhttps://proceedings.neurips.cc/paper/2020/hash/ bcf9d6bd14a2095866ce8c950b702341-Abstract.html

work page 2020
[3]

Image vectorization via gradient reconstruction.Computer Graphics Forum, 44(2):e70055, 2025

Souymodip Chakraborty, Vineet Batra, Ankit Phogat, Vishwas Jain, Jaswant Singh Ranawat, Sumit Dhingra, Kevin Wampler, and Michal Lukáˇc. Image vectorization via gradient reconstruction.Computer Graphics Forum, 44(2):e70055, 2025. doi: 10.1111/cgf.70055. URL https://onlinelibrary.wiley. com/doi/10.1111/cgf.70055

work page doi:10.1111/cgf.70055 2025
[4]

Svgbuilder: Component-based colored svg generation with text-guided autoregressive transformers

Zehao Chen and Rong Pan. Svgbuilder: Component-based colored svg generation with text-guided autoregressive transformers. InProceedings of the AAAI Conference on Artificial Intelligence, 2025

work page 2025
[5]

Cloud2curve: Generation and vectorization of parametric sketches

Ayan Das, Yongxin Yang, Timothy Hospedales, Tao Xiang, and Yi-Zhe Song. Cloud2curve: Generation and vectorization of parametric sketches. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7088–7097, 2021

work page 2021
[6]

Image vectorization: A review.arXiv preprint arXiv:2306.06441, 2023

Maria Dziuba, Ivan Jarsky, Valeria Efimova, and Andrey Filchenkov. Image vectorization: A review.arXiv preprint arXiv:2306.06441, 2023

work page arXiv 2023
[7]

Deep vectorization of technical drawings

Vage Egiazarian, Oleg V oynov, Alexey Artemov, Denis V olkhonskiy, Aleksandr Safin, Maria Taktasheva, Denis Zorin, and Evgeny Burnaev. Deep vectorization of technical drawings. InComputer Vision – ECCV 2020, pages 582–598, 2020. doi: 10.1007/978-3-030-58601-0_35. URL https://www.ecva.net/ papers/eccv_2020/papers_ECCV/html/123580579.php

work page doi:10.1007/978-3-030-58601-0_35 2020
[8]

Noto Emoji

Google Fonts. Noto Emoji. https://github.com/googlefonts/noto-emoji, 2026. Open source emoji library with vector SVG and PNG assets

work page 2026
[9]

Weld, and Ranjay Krishna

Qijia He, Xunmei Liu, Hammaad Memon, Ziang Li, Zixian Ma, Jaemin Cho, Jason Ren, Daniel S. Weld, and Ranjay Krishna. Vfig: Vectorizing complex figures in svg with vision-language models.arXiv preprint arXiv:2603.24575, 2026

work page arXiv 2026
[10]

Optimize & reduce: A top-down approach for image vectorization

Or Hirschorn, Amir Jevnisek, and Shai Avidan. Optimize & reduce: A top-down approach for image vectorization. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 2148–2156, 2024. doi: 10.1609/aaai.v38i3.27987. URL https://ojs.aaai.org/index.php/AAAI/ article/view/27987

work page doi:10.1609/aaai.v38i3.27987 2024
[11]

AmodalSVG: Amodal Image Vectorization via Semantic Layer Peeling

Juncheng Hu, Ziteng Xue, Guotao Liang, Anran Qi, Buyu Li, Sheng Wang, Dong Xu, and Qian Yu. Amodalsvg: Amodal image vectorization via semantic layer peeling.arXiv preprint arXiv:2604.10940, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026
[12]

Rosin, and Yu-Kun Lai

Teng Hu, Ran Yi, Baihong Qian, Jiangning Zhang, Paul L. Rosin, and Yu-Kun Lai. Supersvg: Superpixel- based scalable vector graphics synthesis.arXiv preprint arXiv:2406.09794, 2024

work page arXiv 2024
[13]

Vectorfusion: Text-to-svg by abstracting pixel-based diffusion models

Ajay Jain, Amber Xie, and Pieter Abbeel. Vectorfusion: Text-to-svg by abstracting pixel-based diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1911–1920, 2023

work page 1911
[14]

Differentiable vector graphics rasterization for editing and learning.ACM Transactions on Graphics, 39(6):193:1–193:15, 2020

Tzu-Mao Li, Michal Lukáˇc, Michaël Gharbi, and Jonathan Ragan-Kelley. Differentiable vector graphics rasterization for editing and learning.ACM Transactions on Graphics, 39(6):193:1–193:15, 2020. doi: 10.1145/3414685.3417871. URLhttps://people.csail.mit.edu/tzumao/diffvg/

work page doi:10.1145/3414685.3417871 2020
[15]

End-to-end line drawing vectorization

Hanyuan Liu, Chengze Li, Xueting Liu, and Tien-Tsin Wong. End-to-end line drawing vectorization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 4559–4566, 2022. doi: 10.1609/aaai.v36i4.20379. URLhttps://ojs.aaai.org/index.php/AAAI/article/view/20379

work page doi:10.1609/aaai.v36i4.20379 2022
[16]

A learned representation for scalable vector graphics

Raphael Gontijo Lopes, David Ha, Douglas Eck, and Jonathon Shlens. A learned representation for scalable vector graphics. InProceedings of the IEEE/CVF International Conference on Computer Vision, 2019

work page 2019
[17]

In: CVPR

Xu Ma, Yuqian Zhou, Xingqian Xu, Bin Sun, Valerii Filev, Nikita Orlov, Yun Fu, and Humphrey Shi. Towards layer-wise image vectorization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16314–16323, 2022. doi: 10.1109/CVPR52688. 2022.01583. URL https://openaccess.thecvf.com/content/CVPR2022/html/Ma_Towards_ Layer-...

work page doi:10.1109/cvpr52688 2022
[18]

Fluent Emoji

Microsoft. Fluent Emoji. https://github.com/microsoft/fluentui-emoji, 2026. Open source emoji collection, MIT license

work page 2026
[19]

Pradyumna Reddy, Michaël Gharbi, Michal Luká ˇc, and Niloy J. Mitra. Im2Vec: Synthesizing vector graphics without vector supervision. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 734–743, 2021. URLhttps://arxiv.org/abs/2102.02798

work page arXiv 2021
[20]

Rocket-1: Mastering open-world interaction with visual-temporal context prompting

Juan A. Rodriguez, Abhay Puri, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, Sai Rajeswar, David Vazquez, Christopher Pal, and Marco Pedersoli. Starvector: Generating scalable vector graphics code from images and text. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16175–16186, 2025. doi: 10.1109/CVPR52734.2...

work page doi:10.1109/cvpr52734.2025.01508 2025
[21]

Juan A. Rodriguez, Haotian Zhang, Abhay Puri, Rishav Pramanik, Aarash Feizi, Pascal Wichmann, Arnab Kumar Mondal, Mohammad Reza Samsami, Rabiul Awal, Perouz Taslakian, Spandana Gella, Sai Rajeswar, David Vazquez, Christopher Pal, and Marco Pedersoli. Rendering-aware reinforcement learning for vector graphics generation. InThe Thirty-Ninth Annual Conferenc...

work page 2025
[22]

Potrace: A polygon-based tracing algorithm, September 2003

Peter Selinger. Potrace: A polygon-based tracing algorithm, September 2003. URL https://www. mathstat.dal.ca/~selinger/potrace/potrace.pdf. Technical report

work page 2003
[23]

VTracer: Raster to vector graphics converter

Vision Cortex. VTracer: Raster to vector graphics converter. https://github.com/visioncortex/ vtracer, 2024. Open source software

work page 2024
[24]

Layered image vectorization via semantic simplification

Zhenyu Wang, Jianxi Huang, Zhida Sun, Yuanhao Gong, Daniel Cohen-Or, and Min Lu. Layered image vectorization via semantic simplification. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7728–7738, 2025. URL https://openaccess.thecvf.com/content/CVPR2025/html/Wang_Layered_Image_ Vectorization_via_Semantic_Simplifi...

work page 2025
[25]

Bovik, Hamid R

Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image quality assessment: From error visibility to structural similarity.IEEE Transactions on Image Processing, 13(4):600–612, 2004

work page 2004
[26]

Iconshop: Text-guided vector icon synthesis with autoregressive transformers.ACM Transactions on Graphics, 42(6):230:1–230:14, 2023

Ronghuan Wu, Wanchao Su, Kede Ma, and Jing Liao. Iconshop: Text-guided vector icon synthesis with autoregressive transformers.ACM Transactions on Graphics, 42(6):230:1–230:14, 2023

work page 2023
[27]

Chat2svg: Vector graphics generation with large language models and image diffusion models

Ronghuan Wu, Wanchao Su, and Jing Liao. Chat2svg: Vector graphics generation with large language models and image diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

work page 2025
[28]

Svgdreamer: Text guided svg generation with diffusion model

Ximing Xing, Haitao Zhou, Chuang Wang, Jing Zhang, Dong Xu, and Qian Yu. Svgdreamer: Text guided svg generation with diffusion model. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

work page 2024
[29]

Omnisvg: A unified scalable vector graphics generation model.arXiv preprint arXiv:2504.06263, 2025

Yiying Yang, Wei Cheng, Sijin Chen, Xianfang Zeng, Jiaxu Zhang, Liao Wang, Gang Yu, Xingjun Ma, and Yu-Gang Jiang. Omnisvg: A unified scalable vector graphics generation model.arXiv preprint arXiv:2504.06263, 2025

work page arXiv 2025
[30]

Efros, Eli Shechtman, and Oliver Wang

Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018

work page 2018
[31]

Rocket-1: Mastering open-world interaction with visual-temporal context prompting

Kaibo Zhao, Liang Bao, Yufei Li, Xu Su, Ke Zhang, and Xiaotian Qiao. Less is more: Efficient image vectorization with adaptive parameterization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18166–18175, 2025. doi: 10.1109/CVPR52734.2025.01693. URL https://openaccess.thecvf.com/content/CVPR2025/html/Zhao_Less_i...

work page doi:10.1109/cvpr52734.2025.01693 2025
[32]

Segmentation-guided layer-wise image vectorization with gradient fills.arXiv preprint arXiv:2408.15741, 2024

Hengyu Zhou, Hui Zhang, and Bin Wang. Segmentation-guided layer-wise image vectorization with gradient fills.arXiv preprint arXiv:2408.15741, 2024

work page arXiv 2024
[33]

Win” and “Lose

Haokun Zhu, Juang Ian Chong, Teng Hu, Ran Yi, Yu-Kun Lai, and Paul L. Rosin. Samvg: A multi-stage image vectorization model with the segment-anything model.arXiv preprint arXiv:2311.05276, 2023. 12 A Supplementary experimental results A.1 Failure cases Figure 8 shows two representative failure modes of AnchorFlow. First, the AdaVec-style decomposi- tion f...

work page arXiv 2023
[34]

Parse the SVG path geometry and obtain the editable anchor positions from the path structure

work page
[35]

Rasterize the local path or component into a grayscale crop and normalize it to the training resolution

work page
[36]

Sample the target contour and construct the sparse field target using anchor peaks and weaker contour support, as defined in Eq. 12

work page
[37]

This construction ensures that the predictor is trained on structural evidence for anchor placement rather than on object occupancy masks or final SVG code

Store the raster crop, the field target, and metadata needed for reproducible splitting and validation. This construction ensures that the predictor is trained on structural evidence for anchor placement rather than on object occupancy masks or final SVG code. D.2 Training corpus composition We train the anchor field predictor on a lightweight mixed SVG-d...

work page
[38]

Decompose X into path-like foreground components {Xm, Tm}M m=1 =D(X) , where Xm is a normal- ized crop andT m stores the crop-to-canvas transform

work page
[39]

, M: 2.1 Encode and decode the crop to obtain an initial sparse anchor field: zm,0 =E ϕ(Xm) and Fm,0 = Gθ(zm,0)

For each componentm= 1, . . . , M: 2.1 Encode and decode the crop to obtain an initial sparse anchor field: zm,0 =E ϕ(Xm) and Fm,0 = Gθ(zm,0). 2.2 Hard resolve the field into anchors, ordering, connectivity, tangents, and cubic Bézier controls: (C 0 m, s0 m) =H(F m,0, Xm). 2.3 Apply SDF-guided control point pulling under the fixed anchor structure. Only i...

work page
[40]

Transform all accepted component paths back to the original canvas usingT −1 m

work page
[41]

Assemble the transformed paths according to the part order, assign appearance from the raster components, and optionally fit simple gradients for non-uniform parts

work page
[42]

Table 11: Default inference settings used in the implementation

Optionally apply global pydiffvg polishing to the assembled SVG without adding new paths. Table 11: Default inference settings used in the implementation. Setting Value Maximum field refinement rounds 2 Latent update variable∆zonly Acceptance score tolerance-based strokeF δ Control point refinement SDF-guided control point pulling Final repair optional mi...

work page

[1] [1]

AutoTrace: Bitmap to vector graphics converter.https://autotrace.sourceforge

AutoTrace Project. AutoTrace: Bitmap to vector graphics converter.https://autotrace.sourceforge. net/, 2024. Open source software

work page 2024

[2] [2]

DeepSVG: A hierarchical generative network for vector graphics animation

Alexandre Carlier, Martin Danelljan, Alexandre Alahi, and Radu Timofte. DeepSVG: A hierarchical generative network for vector graphics animation. InAdvances in Neural Information Processing Systems, volume 33, pages 16351–16361, 2020. URLhttps://proceedings.neurips.cc/paper/2020/hash/ bcf9d6bd14a2095866ce8c950b702341-Abstract.html

work page 2020

[3] [3]

Image vectorization via gradient reconstruction.Computer Graphics Forum, 44(2):e70055, 2025

Souymodip Chakraborty, Vineet Batra, Ankit Phogat, Vishwas Jain, Jaswant Singh Ranawat, Sumit Dhingra, Kevin Wampler, and Michal Lukáˇc. Image vectorization via gradient reconstruction.Computer Graphics Forum, 44(2):e70055, 2025. doi: 10.1111/cgf.70055. URL https://onlinelibrary.wiley. com/doi/10.1111/cgf.70055

work page doi:10.1111/cgf.70055 2025

[4] [4]

Svgbuilder: Component-based colored svg generation with text-guided autoregressive transformers

Zehao Chen and Rong Pan. Svgbuilder: Component-based colored svg generation with text-guided autoregressive transformers. InProceedings of the AAAI Conference on Artificial Intelligence, 2025

work page 2025

[5] [5]

Cloud2curve: Generation and vectorization of parametric sketches

Ayan Das, Yongxin Yang, Timothy Hospedales, Tao Xiang, and Yi-Zhe Song. Cloud2curve: Generation and vectorization of parametric sketches. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7088–7097, 2021

work page 2021

[6] [6]

Image vectorization: A review.arXiv preprint arXiv:2306.06441, 2023

Maria Dziuba, Ivan Jarsky, Valeria Efimova, and Andrey Filchenkov. Image vectorization: A review.arXiv preprint arXiv:2306.06441, 2023

work page arXiv 2023

[7] [7]

Deep vectorization of technical drawings

Vage Egiazarian, Oleg V oynov, Alexey Artemov, Denis V olkhonskiy, Aleksandr Safin, Maria Taktasheva, Denis Zorin, and Evgeny Burnaev. Deep vectorization of technical drawings. InComputer Vision – ECCV 2020, pages 582–598, 2020. doi: 10.1007/978-3-030-58601-0_35. URL https://www.ecva.net/ papers/eccv_2020/papers_ECCV/html/123580579.php

work page doi:10.1007/978-3-030-58601-0_35 2020

[8] [8]

Noto Emoji

Google Fonts. Noto Emoji. https://github.com/googlefonts/noto-emoji, 2026. Open source emoji library with vector SVG and PNG assets

work page 2026

[9] [9]

Weld, and Ranjay Krishna

Qijia He, Xunmei Liu, Hammaad Memon, Ziang Li, Zixian Ma, Jaemin Cho, Jason Ren, Daniel S. Weld, and Ranjay Krishna. Vfig: Vectorizing complex figures in svg with vision-language models.arXiv preprint arXiv:2603.24575, 2026

work page arXiv 2026

[10] [10]

Optimize & reduce: A top-down approach for image vectorization

Or Hirschorn, Amir Jevnisek, and Shai Avidan. Optimize & reduce: A top-down approach for image vectorization. InProceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 2148–2156, 2024. doi: 10.1609/aaai.v38i3.27987. URL https://ojs.aaai.org/index.php/AAAI/ article/view/27987

work page doi:10.1609/aaai.v38i3.27987 2024

[11] [11]

AmodalSVG: Amodal Image Vectorization via Semantic Layer Peeling

Juncheng Hu, Ziteng Xue, Guotao Liang, Anran Qi, Buyu Li, Sheng Wang, Dong Xu, and Qian Yu. Amodalsvg: Amodal image vectorization via semantic layer peeling.arXiv preprint arXiv:2604.10940, 2026

work page internal anchor Pith review Pith/arXiv arXiv 2026

[12] [12]

Rosin, and Yu-Kun Lai

Teng Hu, Ran Yi, Baihong Qian, Jiangning Zhang, Paul L. Rosin, and Yu-Kun Lai. Supersvg: Superpixel- based scalable vector graphics synthesis.arXiv preprint arXiv:2406.09794, 2024

work page arXiv 2024

[13] [13]

Vectorfusion: Text-to-svg by abstracting pixel-based diffusion models

Ajay Jain, Amber Xie, and Pieter Abbeel. Vectorfusion: Text-to-svg by abstracting pixel-based diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1911–1920, 2023

work page 1911

[14] [14]

Differentiable vector graphics rasterization for editing and learning.ACM Transactions on Graphics, 39(6):193:1–193:15, 2020

Tzu-Mao Li, Michal Lukáˇc, Michaël Gharbi, and Jonathan Ragan-Kelley. Differentiable vector graphics rasterization for editing and learning.ACM Transactions on Graphics, 39(6):193:1–193:15, 2020. doi: 10.1145/3414685.3417871. URLhttps://people.csail.mit.edu/tzumao/diffvg/

work page doi:10.1145/3414685.3417871 2020

[15] [15]

End-to-end line drawing vectorization

Hanyuan Liu, Chengze Li, Xueting Liu, and Tien-Tsin Wong. End-to-end line drawing vectorization. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 4559–4566, 2022. doi: 10.1609/aaai.v36i4.20379. URLhttps://ojs.aaai.org/index.php/AAAI/article/view/20379

work page doi:10.1609/aaai.v36i4.20379 2022

[16] [16]

A learned representation for scalable vector graphics

Raphael Gontijo Lopes, David Ha, Douglas Eck, and Jonathon Shlens. A learned representation for scalable vector graphics. InProceedings of the IEEE/CVF International Conference on Computer Vision, 2019

work page 2019

[17] [17]

In: CVPR

Xu Ma, Yuqian Zhou, Xingqian Xu, Bin Sun, Valerii Filev, Nikita Orlov, Yun Fu, and Humphrey Shi. Towards layer-wise image vectorization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16314–16323, 2022. doi: 10.1109/CVPR52688. 2022.01583. URL https://openaccess.thecvf.com/content/CVPR2022/html/Ma_Towards_ Layer-...

work page doi:10.1109/cvpr52688 2022

[18] [18]

Fluent Emoji

Microsoft. Fluent Emoji. https://github.com/microsoft/fluentui-emoji, 2026. Open source emoji collection, MIT license

work page 2026

[19] [19]

Pradyumna Reddy, Michaël Gharbi, Michal Luká ˇc, and Niloy J. Mitra. Im2Vec: Synthesizing vector graphics without vector supervision. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 734–743, 2021. URLhttps://arxiv.org/abs/2102.02798

work page arXiv 2021

[20] [20]

Rocket-1: Mastering open-world interaction with visual-temporal context prompting

Juan A. Rodriguez, Abhay Puri, Shubham Agarwal, Issam H. Laradji, Pau Rodriguez, Sai Rajeswar, David Vazquez, Christopher Pal, and Marco Pedersoli. Starvector: Generating scalable vector graphics code from images and text. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16175–16186, 2025. doi: 10.1109/CVPR52734.2...

work page doi:10.1109/cvpr52734.2025.01508 2025

[21] [21]

Juan A. Rodriguez, Haotian Zhang, Abhay Puri, Rishav Pramanik, Aarash Feizi, Pascal Wichmann, Arnab Kumar Mondal, Mohammad Reza Samsami, Rabiul Awal, Perouz Taslakian, Spandana Gella, Sai Rajeswar, David Vazquez, Christopher Pal, and Marco Pedersoli. Rendering-aware reinforcement learning for vector graphics generation. InThe Thirty-Ninth Annual Conferenc...

work page 2025

[22] [22]

Potrace: A polygon-based tracing algorithm, September 2003

Peter Selinger. Potrace: A polygon-based tracing algorithm, September 2003. URL https://www. mathstat.dal.ca/~selinger/potrace/potrace.pdf. Technical report

work page 2003

[23] [23]

VTracer: Raster to vector graphics converter

Vision Cortex. VTracer: Raster to vector graphics converter. https://github.com/visioncortex/ vtracer, 2024. Open source software

work page 2024

[24] [24]

Layered image vectorization via semantic simplification

Zhenyu Wang, Jianxi Huang, Zhida Sun, Yuanhao Gong, Daniel Cohen-Or, and Min Lu. Layered image vectorization via semantic simplification. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7728–7738, 2025. URL https://openaccess.thecvf.com/content/CVPR2025/html/Wang_Layered_Image_ Vectorization_via_Semantic_Simplifi...

work page 2025

[25] [25]

Bovik, Hamid R

Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli. Image quality assessment: From error visibility to structural similarity.IEEE Transactions on Image Processing, 13(4):600–612, 2004

work page 2004

[26] [26]

Iconshop: Text-guided vector icon synthesis with autoregressive transformers.ACM Transactions on Graphics, 42(6):230:1–230:14, 2023

Ronghuan Wu, Wanchao Su, Kede Ma, and Jing Liao. Iconshop: Text-guided vector icon synthesis with autoregressive transformers.ACM Transactions on Graphics, 42(6):230:1–230:14, 2023

work page 2023

[27] [27]

Chat2svg: Vector graphics generation with large language models and image diffusion models

Ronghuan Wu, Wanchao Su, and Jing Liao. Chat2svg: Vector graphics generation with large language models and image diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2025

work page 2025

[28] [28]

Svgdreamer: Text guided svg generation with diffusion model

Ximing Xing, Haitao Zhou, Chuang Wang, Jing Zhang, Dong Xu, and Qian Yu. Svgdreamer: Text guided svg generation with diffusion model. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

work page 2024

[29] [29]

Omnisvg: A unified scalable vector graphics generation model.arXiv preprint arXiv:2504.06263, 2025

Yiying Yang, Wei Cheng, Sijin Chen, Xianfang Zeng, Jiaxu Zhang, Liao Wang, Gang Yu, Xingjun Ma, and Yu-Gang Jiang. Omnisvg: A unified scalable vector graphics generation model.arXiv preprint arXiv:2504.06263, 2025

work page arXiv 2025

[30] [30]

Efros, Eli Shechtman, and Oliver Wang

Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018

work page 2018

[31] [31]

Rocket-1: Mastering open-world interaction with visual-temporal context prompting

Kaibo Zhao, Liang Bao, Yufei Li, Xu Su, Ke Zhang, and Xiaotian Qiao. Less is more: Efficient image vectorization with adaptive parameterization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18166–18175, 2025. doi: 10.1109/CVPR52734.2025.01693. URL https://openaccess.thecvf.com/content/CVPR2025/html/Zhao_Less_i...

work page doi:10.1109/cvpr52734.2025.01693 2025

[32] [32]

Segmentation-guided layer-wise image vectorization with gradient fills.arXiv preprint arXiv:2408.15741, 2024

Hengyu Zhou, Hui Zhang, and Bin Wang. Segmentation-guided layer-wise image vectorization with gradient fills.arXiv preprint arXiv:2408.15741, 2024

work page arXiv 2024

[33] [33]

Win” and “Lose

Haokun Zhu, Juang Ian Chong, Teng Hu, Ran Yi, Yu-Kun Lai, and Paul L. Rosin. Samvg: A multi-stage image vectorization model with the segment-anything model.arXiv preprint arXiv:2311.05276, 2023. 12 A Supplementary experimental results A.1 Failure cases Figure 8 shows two representative failure modes of AnchorFlow. First, the AdaVec-style decomposi- tion f...

work page arXiv 2023

[34] [34]

Parse the SVG path geometry and obtain the editable anchor positions from the path structure

work page

[35] [35]

Rasterize the local path or component into a grayscale crop and normalize it to the training resolution

work page

[36] [36]

Sample the target contour and construct the sparse field target using anchor peaks and weaker contour support, as defined in Eq. 12

work page

[37] [37]

This construction ensures that the predictor is trained on structural evidence for anchor placement rather than on object occupancy masks or final SVG code

Store the raster crop, the field target, and metadata needed for reproducible splitting and validation. This construction ensures that the predictor is trained on structural evidence for anchor placement rather than on object occupancy masks or final SVG code. D.2 Training corpus composition We train the anchor field predictor on a lightweight mixed SVG-d...

work page

[38] [38]

Decompose X into path-like foreground components {Xm, Tm}M m=1 =D(X) , where Xm is a normal- ized crop andT m stores the crop-to-canvas transform

work page

[39] [39]

, M: 2.1 Encode and decode the crop to obtain an initial sparse anchor field: zm,0 =E ϕ(Xm) and Fm,0 = Gθ(zm,0)

For each componentm= 1, . . . , M: 2.1 Encode and decode the crop to obtain an initial sparse anchor field: zm,0 =E ϕ(Xm) and Fm,0 = Gθ(zm,0). 2.2 Hard resolve the field into anchors, ordering, connectivity, tangents, and cubic Bézier controls: (C 0 m, s0 m) =H(F m,0, Xm). 2.3 Apply SDF-guided control point pulling under the fixed anchor structure. Only i...

work page

[40] [40]

Transform all accepted component paths back to the original canvas usingT −1 m

work page

[41] [41]

Assemble the transformed paths according to the part order, assign appearance from the raster components, and optionally fit simple gradients for non-uniform parts

work page

[42] [42]

Table 11: Default inference settings used in the implementation

Optionally apply global pydiffvg polishing to the assembled SVG without adding new paths. Table 11: Default inference settings used in the implementation. Setting Value Maximum field refinement rounds 2 Latent update variable∆zonly Acceptance score tolerance-based strokeF δ Control point refinement SDF-guided control point pulling Final repair optional mi...

work page