Vector Scaffolding: Inter-Scale Orchestration for Differentiable Image Vectorization

Jaerin Lee; Kanggeon Lee; Kyoung Mu Lee

arxiv: 2605.11913 · v2 · pith:5LP3HL3Pnew · submitted 2026-05-12 · 💻 cs.CV

Pith reviewed 2026-06-30 22:25 UTC · model grok-4.3

classification 💻 cs.CV

keywords: differentiable vectorization · image vectorization · topology collapse · gradient aggregation · hierarchical optimization · vector graphics · curve primitives

The pith

Vector Scaffolding organizes curve optimization into hierarchical stages to prevent topology collapse during differentiable image vectorization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Flat optimization lets hundreds of random curves compete directly on pixel error, so local high-frequency noise warps larger structures into dense, uneditable polygon collections. The paper traces the collapse to a persistent imbalance between area and boundary gradient magnitudes and introduces Interior Gradient Aggregation to rebalance the loss landscape for multi-scale curve mixtures. Progressive Stratification and Rapid Inflation Scheduling then let primitives be added at extremely high learning rates while preserving macroscopic topology. The resulting vectors are produced faster and match the input image more closely than prior flat methods.

Core claim

By replacing flat pixel-matching with a staged topological construction that first stabilizes learning through Interior Gradient Aggregation and then densifies primitives via Progressive Stratification and Rapid Inflation Scheduling, the optimization converges to editable vector graphics without the redundant structures that previously limited practical use.

What carries the argument

Interior Gradient Aggregation, which corrects the scale imbalance between area and boundary gradient contributions inside the differentiable rendering loss.
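Figure 2's caption attributes the aggregation to the Reynolds transport theorem. For orientation, the standard two-dimensional statement is reproduced here. The notation is an assumption, since the paper's own symbols are not visible on this page: a region $\Omega(\theta)$ depends on a shape parameter $\theta$, $f$ is the integrand (for example a per-pixel loss), $v = \partial x / \partial \theta$ is the velocity of boundary points, and $n$ is the outward unit normal.

```latex
\frac{d}{d\theta} \int_{\Omega(\theta)} f(x, \theta)\, dA
  = \underbrace{\int_{\Omega(\theta)} \frac{\partial f}{\partial \theta}\, dA}_{\text{interior (area) term}}
  + \underbrace{\oint_{\partial \Omega(\theta)} f(x, \theta)\, (v \cdot n)\, ds}_{\text{boundary term}}
```

The first term integrates over the area of the region and the second over its perimeter, so the two need not have comparable magnitudes for primitives of very different sizes. How the paper combines them is not specified in the text available here.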

If this is right

Optimization finishes roughly 2.5 times faster in wall-clock time than the prior state of the art.
Final rasterized images reach up to 1.4 dB higher PSNR on standard test sets.
The output vector files contain fewer redundant curves and preserve larger-scale structures, making manual editing more feasible.
Learning rates fifty times larger than usual remain stable once the gradient aggregation is applied.
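
To put the 1.4 dB figure in context, the sketch below converts a PSNR gain into the corresponding reduction in mean squared error, using the standard definition of PSNR. The function names and the peak value of 1.0 are illustrative assumptions, not taken from the paper.

```python
import math


def psnr(mse: float, peak: float = 1.0) -> float:
    """PSNR in dB for a mean squared error, given the peak signal value."""
    if mse <= 0:
        raise ValueError("mse must be positive")
    return 10.0 * math.log10(peak * peak / mse)


def mse_ratio_for_gain(gain_db: float) -> float:
    """Factor by which MSE is multiplied when PSNR rises by gain_db."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    ratio = mse_ratio_for_gain(1.4)
    # A 1.4 dB gain multiplies the MSE by about 0.72,
    # that is, roughly 28% less squared error.
    print(f"MSE ratio for +1.4 dB: {ratio:.4f}")
```

Because PSNR is logarithmic, the same 1.4 dB gain corresponds to the same relative error reduction regardless of the starting PSNR.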

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same rebalancing step may reduce collapse in other differentiable rendering tasks that mix primitives at widely different scales.
Adding one more stratification level could support still larger primitive budgets before collapse reappears.
The stabilized high-rate schedule could be paired with user-specified topology constraints to produce vectors that match both image content and intended editability.

Load-bearing premise

The mathematical imbalance between area and boundary gradients is the main driver of topology collapse, and Interior Gradient Aggregation plus the two scheduling techniques fix it without creating new instabilities.
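
The premise can be made concrete with a toy model. This is an illustration only, not the paper's formulation. Assume a single disk of radius `r` and colour `c` on a zero background, fitted to a uniform target `t` with a squared-error loss. The colour gradient integrates over the disk's area, while the radius gradient integrates over its perimeter, so their ratio grows linearly with `r`. All names and values below are hypothetical.

```python
import math


def loss(c: float, r: float, t: float, canvas_area: float = 1.0) -> float:
    """Squared error of a disk (colour c, radius r) on a zero background
    against a uniform target t, integrated over the canvas."""
    disk_area = math.pi * r * r
    return disk_area * (c - t) ** 2 + (canvas_area - disk_area) * t ** 2


def grad_colour(c: float, r: float, t: float) -> float:
    """dL/dc: an area integral, proportional to pi * r^2."""
    return 2.0 * math.pi * r * r * (c - t)


def grad_radius(c: float, r: float, t: float) -> float:
    """dL/dr: a boundary integral, proportional to 2 * pi * r."""
    return 2.0 * math.pi * r * ((c - t) ** 2 - t ** 2)


def ratio(c: float, r: float, t: float) -> float:
    """Magnitude of the area gradient relative to the boundary gradient."""
    return abs(grad_colour(c, r, t)) / abs(grad_radius(c, r, t))


if __name__ == "__main__":
    for r in (0.01, 0.05, 0.2):
        print(f"r={r:<5} area/boundary gradient ratio = {ratio(0.5, r, 0.8):.4f}")
```

In this toy setting a disk twenty times larger has an area-to-boundary gradient ratio twenty times larger, which is the kind of scale dependence a single learning rate cannot serve for a mixture of large and small primitives. Whether this matches the imbalance derived in the paper cannot be confirmed from the text available here.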

What would settle it

Reproducing the reported benchmarks with the proposed method and finding neither faster convergence nor higher PSNR, or observing the same degree of topology collapse, would falsify the claim.

Figures

Figures reproduced from arXiv: 2605.11913 by Jaerin Lee, Kanggeon Lee, Kyoung Mu Lee.

**Figure 1.** We introduce a hierarchical optimization framework for fast and stable differentiable image vectorization. By accelerating the learning dynamics of multi-scale curve mixtures, we achieve higher rendering fidelity in a fraction of the optimization time required by existing methods. The slow speed of these early works is due to the sequential reconstruction of vectors, curve by curve. Bézier Splatting [14] …

**Figure 2.** Overview of Vector Scaffolding. (a) Interior Gradient Aggregation: Optimization is stabilized by aggregating internal area gradients alongside boundary gradients via the Reynolds transport theorem. (b) Rapid Inflation Scheduling: Progressive Stratification aligns vector representation with the natural power law of image frequency, enabling extremely high learning rates without instability. The vector re…

**Figure 3.** Qualitative Comparison. Compared with the state-of-the-art differentiable vectorization method [14], our method preserves fine structural details and coherent object boundaries under the same curve budget (N = 512). … optimization time by 2.5× compared to the fastest baseline, Bézier Splatting [14], while achieving the best PSNR scores. We emphasize that this 2.5× figure is measured in wall-clock time; the …

**Figure 4.** LoD Control Demonstration. We fit our Vector Scaffolding to a super high-resolution image of the Earth (8000 × 8000) [19]. The first row shows the training dynamics at different curve counts, while the second row shows the level-of-detail (LoD) separation after fitting 1024 curves.

**Figure 5.** Effect of Interior Gradients. (a) Ground truth. (b) Without interior gradients, the base curves lose their internal anchors, causing optimization drift and poor convergence. (c) With interior gradients, our method maintains structural integrity while capturing photometric information. Panel labels: (a) Ground truth, Kodim 07; (b) without interior gradients, 23.4553 dB; (c) with interior gradients, 28.0802 dB. … vector representation can be densified sequentially from base structures to finest details. Therefore, our Vector Scaffold…

**Figure 6.** Hierarchical Scaffolding vs. Flat Optimization.

**Figure 7.** Layered Primitive Visualization. The deterministic temporal z-ordering induced by Progressive Stratification naturally aligns the optimization-induced layer index with the underlying scale hierarchy, so newer fine-scale curves sit on top of coarser base curves without dynamic re-sorting.

**Figure 8.** Editability Demonstration. Output of our framework imported into a vector-editing demo built upon our pipeline. The hierarchical scaffold yields path primitives organized by level-of-detail, enabling straightforward selection and local edits at the vector level. We claim improved local editability rather than a full semantic editability solution.

**Figure 9.** Optimization trajectory on Kodak kodim01. Top: ours; bottom: Bézier Splatting. Columns are matched iterations (∼100, 600, 1600, 4000, 9980). Our method anchors smooth roof/wall regions early, whereas the baseline scatters narrow strokes that never coalesce.

**Figure 10.** Optimization trajectory on DIV2K 0294. Top: ours; bottom: Bézier Splatting. Background foliage and fur texture form coherently in our run, while the baseline keeps redistributing strokes near the subject without locking the surrounding context. … the baseline algorithm for 10 k iterations. This speedup is visualized in Figure 1b in the main text. To this end, Figures 9–10 present intermediate frames extract…

**Figure 11.** Optimization trajectory on Kodak kodim19 (portrait). Top: ours; bottom: Bézier Splatting at matched iterations. Our hierarchical refinement quickly converges to clean silhouettes, while the baseline keeps scattered fragments around the boundaries throughout training.

**Figure 12.** Optimization trajectory on DIV2K 0112. Top: ours; bottom: Bézier Splatting. The portrait scene benefits the most from progressive stratification: skin tones and fabric shading are recovered smoothly in our method, while the baseline distributes high-frequency noise across the face throughout training.

read the original abstract

Differentiable vector graphics have enabled powerful gradient-based optimization of vector primitives directly from raster images. However, existing frameworks formulate this as a flat optimization problem, forcing hundreds to thousands of randomly initialized curves to blindly compete for pixel-level error reduction. This disordered optimization leads to topology collapse, where macroscopic structures are distorted by internal high-frequency noise, resulting in a redundant and uneditable "polygon soup" that limits practical editability. To address this limitation, we propose Vector Scaffolding, a novel hierarchical optimization framework that shifts from flat pixel-matching to structured topological construction tailored for vector graphics. By identifying a key cause of topology collapse as the mathematical imbalance between area and boundary gradients, we introduce Interior Gradient Aggregation to stabilize the learning dynamics of multi-scale curve mixtures. Upon this stabilized landscape, we employ Progressive Stratification and Rapid Inflation Scheduling to progressively densify vector primitives with extremely high learning rates ($\times 50$). Experiments demonstrate that our approach accelerates optimization by $2.5\times$ while simultaneously improving PSNR by up to 1.4 dB over the previous state of the art.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Vector Scaffolding gives a hierarchical way to stabilize vector curve optimization by targeting gradient imbalance, with reported speed and quality gains, but the mechanism needs direct ablations to hold up.

Hi,

The main thing here is a shift from flat pixel-level optimization of hundreds of curves to a structured hierarchical process that tries to build topology more deliberately. The paper names three pieces: Interior Gradient Aggregation to counter the area-boundary gradient imbalance, Progressive Stratification, and Rapid Inflation Scheduling that lets learning rates go up by 50x.

What is actually new is the combination framed as scaffolding rather than just another loss tweak. It does a reasonable job laying out why disordered competition produces redundant curves and uneditable results, and the reported 2.5x faster optimization plus 1.4 dB PSNR lift over prior work would matter for anyone turning raster images into editable vectors.

The soft spot is the missing link between the claimed cause and the fix. The stress-test note is on target: there is no sign in the abstract of separate gradient magnitude measurements before and after aggregation, nor a test of whether simply re-weighting the baseline loss would cut collapse without the full scaffolding. If those controls are absent from the full paper, the attribution stays uncertain and the gains could trace to the scheduling or initialization instead.

This is for graphics and vision researchers who work on differentiable vectorization or need cleaner outputs for editing pipelines. A reader who wants concrete optimization tricks for this sub-problem can extract value even while testing the assumptions themselves.

It deserves peer review. The framing is clear and the practical target is real, so referees can check the experiments and ask for the necessary ablations.

Referee Report

2 major / 2 minor

Summary. The paper claims that topology collapse in flat differentiable vector graphics optimization stems from the mathematical imbalance between area and boundary gradients; it introduces Vector Scaffolding with Interior Gradient Aggregation to stabilize multi-scale curve mixtures, plus Progressive Stratification and Rapid Inflation Scheduling to enable ×50 learning rates, yielding 2.5× faster optimization and up to 1.4 dB PSNR gains over prior state-of-the-art methods.

Significance. If the reported gains hold under rigorous validation, the work would advance practical differentiable vectorization by producing more topologically coherent and editable outputs, addressing a recognized limitation in converting raster images to vector primitives for graphics applications.

major comments (2)

[Abstract] Abstract and §1: the central attribution of topology collapse to area-boundary gradient imbalance, and the claim that Interior Gradient Aggregation directly corrects it, is load-bearing for the 2.5× speed-up and 1.4 dB PSNR results, yet no gradient-magnitude measurements, separate ablations of re-weighting the baseline loss, or tests isolating initialization/curvature effects are described; without these the mechanism remains unsubstantiated.
[Experiments] Experiments section: the abstract states clear quantitative improvements but supplies no baselines, datasets, error bars, or statistical significance tests, preventing assessment of whether the gains generalize or are attributable to the proposed scaffolding rather than hyper-parameter tuning.

minor comments (2)

[Abstract] The abstract refers to 'extremely high learning rates (×50)' without stating the reference learning rate or the precise form of Rapid Inflation Scheduling.
Notation for the three proposed components (Interior Gradient Aggregation, Progressive Stratification, Rapid Inflation Scheduling) is introduced without an accompanying diagram or pseudocode in the provided text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript accordingly to strengthen the claims with additional evidence.

Referee: [Abstract] Abstract and §1: the central attribution of topology collapse to area-boundary gradient imbalance, and the claim that Interior Gradient Aggregation directly corrects it, is load-bearing for the 2.5× speed-up and 1.4 dB PSNR results, yet no gradient-magnitude measurements, separate ablations of re-weighting the baseline loss, or tests isolating initialization/curvature effects are described; without these the mechanism remains unsubstantiated.

Authors: Section 3.1 derives the area-boundary gradient imbalance from first principles as the source of topology collapse in flat optimization. Interior Gradient Aggregation is introduced precisely to rebalance these gradients during multi-scale curve optimization. We agree that direct empirical validation would strengthen the argument and will add gradient-magnitude measurements before/after aggregation, ablations isolating re-weighting from the baseline loss, and controlled experiments holding initialization and curvature fixed. revision: yes

Referee: [Experiments] Experiments section: the abstract states clear quantitative improvements but supplies no baselines, datasets, error bars, or statistical significance tests, preventing assessment of whether the gains generalize or are attributable to the proposed scaffolding rather than hyper-parameter tuning.

Authors: The current experiments section compares against prior state-of-the-art differentiable vectorization methods and reports the stated speed-up and PSNR gains. To enable rigorous assessment, the revision will explicitly enumerate all baselines, name the datasets, report standard deviations from multiple independent runs, and include statistical significance tests (e.g., paired t-tests) confirming that the improvements arise from the scaffolding components rather than hyper-parameter differences alone. revision: yes
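
The paired t-test mentioned in the second response can be sketched as follows: a minimal version of the test statistic on per-image score differences, using only the standard library. The function name is hypothetical, and the numbers in the usage lines are arbitrary placeholders rather than results from the paper.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(ours: list[float], baseline: list[float]) -> tuple[float, int]:
    """Return (t, degrees_of_freedom) for paired samples.

    Each index must refer to the same test image under both methods.
    """
    if len(ours) != len(baseline) or len(ours) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(ours, baseline)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


if __name__ == "__main__":
    # Placeholder values only, chosen to make the arithmetic easy to follow.
    t, dof = paired_t_statistic([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])
    print(f"t = {t:.4f} with {dof} degrees of freedom")
```

Pairing by image removes between-image variance, which matters here because PSNR differs far more across images than across methods on the same image.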

Circularity Check

0 steps flagged

No significant circularity in derivation or claims

The provided abstract and context contain no equations, fitted parameters, or self-citations that reduce any claimed prediction, cause identification, or performance gain to the inputs by construction. The core claims rest on an empirical identification of gradient imbalance as a cause of topology collapse, followed by proposed stabilization techniques whose efficacy is asserted via experimental speed and PSNR improvements. These are external benchmarks rather than self-referential derivations. No load-bearing step matches any of the enumerated circularity patterns, as there are no quoted reductions of the form 'X is defined in terms of Y' or 'fitted input renamed as prediction.' The reader's preliminary score of 2.0 is consistent with the absence of any detectable circularity.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Review performed on abstract only; full manuscript unavailable so ledger entries are limited to claims visible in the provided text.

free parameters (1)

Rapid Inflation Scheduling multiplier
Extremely high learning rate of ×50 introduced to densify primitives; value appears chosen rather than derived.

axioms (1)

domain assumption: the mathematical imbalance between area and boundary gradients is a key cause of topology collapse
The abstract identifies this imbalance as a key cause of topology collapse.

pith-pipeline@v0.9.1-grok · 5723 in / 1164 out tokens · 30188 ms · 2026-06-30T22:25:21.363739+00:00 · methodology

Reference graph

Works this paper leans on

33 extracted references · 23 canonical work pages

[1] Agustsson, E., Timofte, R.: NTIRE 2017 challenge on single image super-resolution: Dataset and study. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). pp. 1122–1131 (2017). https://doi.org/10.1109/CVPRW.2017.150

[2] Cao, D., Wang, Z., Echevarria, J., Liu, Y.: SVGformer: Representation learning for continuous vector graphics using transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 10093–10102 (2023)

[3] Chen, Y., Ni, B., Liu, J., Huang, X., Chen, X.: Towards high-fidelity artistic image vectorization via texture-encapsulated shape parameterization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 15877–15886 (2024). https://doi.org/10.1109/CVPR52733.2024.01503

[4] Du, Z.J., Kang, L.F., Tan, J., Gingold, Y., Xu, K.: Image vectorization and editing via linear gradient layer decomposition. ACM Transactions on Graphics 42(4), 1–13 (2023). https://doi.org/10.1145/3592128

[5] Eastman Kodak Company: Kodak lossless true color image suite. Dataset (1999), https://r0k.us/graphics/kodak/, accessed: 2026-05-12

[6] Guédon, A., Lepetit, V.: SuGaR: Surface-aligned gaussian splatting for efficient 3D mesh reconstruction and high-quality mesh rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 5354–5363 (2024). https://doi.org/10.1109/CVPR52733.2024.00512

[7] Guo, M., Wang, B., He, K., Matusik, W.: TetSphere splatting: Representing high-quality geometry with lagrangian volumetric meshes. In: International Conference on Learning Representations (ICLR) (2025)

[8] Hirschorn, O., Jevnisek, A., Avidan, S.: Optimize & reduce: A top-down approach for image vectorization. Proceedings of the AAAI Conference on Artificial Intelligence 38(3), 2148–2156 (2024). https://doi.org/10.1609/aaai.v38i3.27987

[9] Ho, J., Jain, A.N., Abbeel, P.: Denoising diffusion probabilistic models. In: Advances in Neural Information Processing Systems. vol. 33, pp. 6840–6851 (2020)

[10] Huang, B., Yu, Z., Chen, A., Geiger, A., Gao, S.: 2D gaussian splatting for geometrically accurate radiance fields. In: ACM SIGGRAPH 2024 Conference Papers. pp. 1–11. Association for Computing Machinery (2024). https://doi.org/10.1145/3641519.3657428

[11] Jain, A., Xie, A., Abbeel, P.: VectorFusion: Text-to-SVG by abstracting pixel-based diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 1911–1920 (2023). https://doi.org/10.1109/CVPR52729.2023.00190

[12] Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3D gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics 42(4), 1–14 (2023). https://doi.org/10.1145/3592433

[13] Li, T.M., Lukáč, M., Gharbi, M., Ragan-Kelley, J.: Differentiable vector graphics rasterization for editing and learning. ACM Transactions on Graphics 39(6), 1–15 (2020). https://doi.org/10.1145/3414685.3417871

[14] Liu, X., Zhou, C., Zhao, N., Huang, S.: Bézier splatting for fast and differentiable vector graphics rendering. In: Advances in Neural Information Processing Systems (2025)

[15] Lopes, R.G., Ha, D., Eck, D., Shlens, J.: A learned representation for scalable vector graphics. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 7930–7939 (2019). https://doi.org/10.1109/ICCV.2019.00802

[16] Luiten, J.T., Kopanas, G., Leibe, B., Ramanan, D.: Dynamic 3D gaussians: Tracking by persistent dynamic view synthesis. In: 2024 International Conference on 3D Vision (3DV). pp. 800–809 (2024). https://doi.org/10.1109/3DV62453.2024.00044

[17] Ma, X., Zhou, Y., Xu, X., Sun, B., Filev, V., Orlov, N., Fu, Y., Shi, H.: Towards layer-wise image vectorization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 16314–16323 (2022). https://doi.org/10.1109/CVPR52688.2022.01583

[18] Müller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics 41(4), 1–15 (2022). https://doi.org/10.1145/3528223.3530127

[19] NASA Goddard Photo and Video: Most amazing high definition image of earth – blue marble 2012. Flickr image, NASA Goddard Space Flight Center (2012), https://www.flickr.com/photos/gsfc/6760135001, public domain (NASA media usage guidelines). Accessed: 2026-05-12

[20] Reddy, P., Gharbi, M., Lukáč, M., Mitra, N.J.: Im2Vec: Synthesizing vector graphics without vector supervision. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 7342–7351 (2021). https://doi.org/10.1109/CVPR46437.2021.00726

[21] Sitzmann, V., Martel, J.N.P., Bergman, A.W., Lindell, D.B., Wetzstein, G.: Implicit neural representations with periodic activation functions. In: Advances in Neural Information Processing Systems. vol. 33, pp. 7462–7473 (2020)

[22] Tang, J., Ren, J., Zhou, H., Liu, Z., Zeng, G.: DreamGaussian: Generative gaussian splatting for efficient 3D content creation. In: International Conference on Learning Representations (ICLR) (2024)

[23] Wang, Z., Huang, J., Sun, Z., Gong, Y., Cohen-Or, D., Lu, M.: Layered image vectorization via semantic simplification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 7728–7738 (2025)

[24] Wu, G., Yi, T., Fang, J., Xie, L., Zhang, X., Wei, W., Liu, W., Tian, Q., Wang, X.: 4D gaussian splatting for real-time dynamic scene rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 20310–20320 (2024). https://doi.org/10.1109/CVPR52733.2024.01920

[25] Xie, X., Zhou, P., Li, H., Lin, Z., Yan, S.: Adan: Adaptive nesterov momentum algorithm for faster optimizing deep models. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(12), 9508–9520 (2024). https://doi.org/10.1109/TPAMI.2024.3423382

[26] Xing, X., Wang, C., Zhou, H., Zhang, J., Yu, Q., Xu, D.: DiffSketcher: Text guided vector sketch synthesis through latent diffusion models. In: Advances in Neural Information Processing Systems. vol. 36, pp. 15869–15889 (2023)

[27] Xing, X., Zhou, H., Wang, C., Zhang, J., Xu, D., Yu, Q.: SVGDreamer: Text guided SVG generation with diffusion model. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 4546–4555 (2024). https://doi.org/10.1109/CVPR52733.2024.00435

[28] Yi, T., Fang, J., Wang, J., Wu, G., Xie, L., Zhang, X., Liu, W., Tian, Q., Wang, X.: GaussianDreamer: Fast generation from text to 3D gaussians by bridging 2D and 3D diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 6796–6807 (2024). https://doi.org/10.1109/CVPR52733.2024.00649

[29] Zhang, P., Zhao, N., Liao, J.: Text-to-vector generation with neural path representation. ACM Transactions on Graphics 43(4), 1–13 (2024). https://doi.org/10.1145/3658204

[30] Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 586–595 (2018). https://doi.org/10.1109/CVPR.2018.00068

[31] Zhang, X., Ge, X., Xu, T., He, D., Wang, Y., Qin, H., Lu, G., Geng, J., Zhang, J.: GaussianImage: 1000 FPS image representation and compression by 2D gaussian splatting. In: Computer Vision – ECCV 2024. Lecture Notes in Computer Science, vol. 15067, pp. 327–345. Springer (2024). https://doi.org/10.1007/978-3-031-72673-6_18

[32] Zhang, Y., Li, B., Kuznetsov, A., Jindal, A., Diolatzis, S., Chen, K., Sochenov, A., Kaplanyan, A., Sun, Q.: Image-GS: Content-adaptive image representation via 2D gaussians. In: ACM SIGGRAPH 2025 Conference Papers. pp. 1–11. Association for Computing Machinery (2025). https://doi.org/10.1145/3721238.3730596

[33] Zwicker, M., Pfister, H., van Baar, J., Gross, M.: EWA volume splatting. In: Proceedings Visualization, 2001. VIS ’01. pp. 29–36. IEEE Computer Society (2001). https://doi.org/10.5555/601671.601674

[1] [1]

In: Proceedings of the IEEE Conference on Com- puter Vision and Pattern Recognition Workshops (CVPRW)

Agustsson, E., Timofte, R.: NTIRE 2017 challenge on single image super- resolution: Dataset and study. In: Proceedings of the IEEE Conference on Com- puter Vision and Pattern Recognition Workshops (CVPRW). pp. 1122–1131 (2017).https://doi.org/10.1109/CVPRW.2017.150

work page doi:10.1109/cvprw.2017.150 2017

[2] [2]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Cao, D., Wang, Z., Echevarria, J., Liu, Y.: SVGformer: Representation learning for continuous vector graphics using transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 10093– 10102 (2023)

2023

[3] [3]

Capsfusion: Rethinking image-text data at scale

Chen, Y., Ni, B., Liu, J., Huang, X., Chen, X.: Towards high-fidelity artistic image vectorization via texture-encapsulated shape parameterization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 15877–15886 (2024).https://doi.org/10.1109/CVPR52733.2024.01503

work page doi:10.1109/cvpr52733.2024.01503 2024

[4] [4]

ACM Transactions on Graphics42(4), 1– 13 (2023).https://doi.org/10.1145/3592128

Du, Z.J., Kang, L.F., Tan, J., Gingold, Y., Xu, K.: Image vectorization and editing via linear gradient layer decomposition. ACM Transactions on Graphics42(4), 1– 13 (2023).https://doi.org/10.1145/3592128

work page doi:10.1145/3592128 2023

[5] [5]

Dataset (1999), https://r0k.us/graphics/kodak/, accessed: 2026-05-12

Eastman Kodak Company: Kodak lossless true color image suite. Dataset (1999), https://r0k.us/graphics/kodak/, accessed: 2026-05-12

1999

[6] [6]

Capsfusion: Rethinking image-text data at scale

Guédon, A., Lepetit, V.: SuGaR: Surface-aligned gaussian splatting for efficient 3D mesh reconstruction and high-quality mesh rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 5354–5363 (2024).https://doi.org/10.1109/CVPR52733.2024.00512

work page doi:10.1109/cvpr52733.2024.00512 2024

[7] [7]

In: International Conference on Learning Representations (ICLR) (2025)

Guo, M., Wang, B., He, K., Matusik, W.: TetSphere splatting: Representing high- quality geometry with lagrangian volumetric meshes. In: International Conference on Learning Representations (ICLR) (2025)

2025

[8] [8]

Proceedings of the AAAI Conference on Artificial Intelli- gence38(3), 2148–2156 (2024).https://doi.org/10.1609/aaai.v38i3.27987

Hirschorn, O., Jevnisek, A., Avidan, S.: Optimize & reduce: A top-down approach for image vectorization. Proceedings of the AAAI Conference on Artificial Intelli- gence38(3), 2148–2156 (2024).https://doi.org/10.1609/aaai.v38i3.27987

work page doi:10.1609/aaai.v38i3.27987 2024

[9] Ho, J., Jain, A.N., Abbeel, P.: Denoising diffusion probabilistic models. In: Advances in Neural Information Processing Systems. vol. 33, pp. 6840–6851 (2020)

[10] Huang, B., Yu, Z., Chen, A., Geiger, A., Gao, S.: 2D gaussian splatting for geometrically accurate radiance fields. In: ACM SIGGRAPH 2024 Conference Papers. pp. 1–11. Association for Computing Machinery (2024). https://doi.org/10.1145/3641519.3657428

[11] Jain, A., Xie, A., Abbeel, P.: VectorFusion: Text-to-SVG by abstracting pixel-based diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 1911–1920 (2023). https://doi.org/10.1109/CVPR52729.2023.00190

[12] Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3D gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics 42(4), 1–14 (2023). https://doi.org/10.1145/3592433

[13] Li, T.M., Lukáč, M., Gharbi, M., Ragan-Kelley, J.: Differentiable vector graphics rasterization for editing and learning. ACM Transactions on Graphics 39(6), 1–15 (2020). https://doi.org/10.1145/3414685.3417871

[14] Liu, X., Zhou, C., Zhao, N., Huang, S.: Bézier splatting for fast and differentiable vector graphics rendering. In: Advances in Neural Information Processing Systems (2025)

[15] Lopes, R.G., Ha, D., Eck, D., Shlens, J.: A learned representation for scalable vector graphics. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 7930–7939 (2019). https://doi.org/10.1109/ICCV.2019.00802

[16] Luiten, J.T., Kopanas, G., Leibe, B., Ramanan, D.: Dynamic 3D gaussians: Tracking by persistent dynamic view synthesis. In: 2024 International Conference on 3D Vision (3DV). pp. 800–809 (2024). https://doi.org/10.1109/3DV62453.2024.00044

[17] Ma, X., Zhou, Y., Xu, X., Sun, B., Filev, V., Orlov, N., Fu, Y., Shi, H.: Towards layer-wise image vectorization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 16314–16323 (2022). https://doi.org/10.1109/CVPR52688.2022.01583

[18] Müller, T., Evans, A., Schied, C., Keller, A.: Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics 41(4), 1–15 (2022). https://doi.org/10.1145/3528223.3530127

[19] NASA Goddard Photo and Video: Most amazing high definition image of earth – blue marble 2012. Flickr image, NASA Goddard Space Flight Center (2012), https://www.flickr.com/photos/gsfc/6760135001, public domain (NASA media usage guidelines). Accessed: 2026-05-12

[20] Reddy, P., Gharbi, M., Lukáč, M., Mitra, N.J.: Im2Vec: Synthesizing vector graphics without vector supervision. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 7342–7351 (2021). https://doi.org/10.1109/CVPR46437.2021.00726

[21] Sitzmann, V., Martel, J.N.P., Bergman, A.W., Lindell, D.B., Wetzstein, G.: Implicit neural representations with periodic activation functions. In: Advances in Neural Information Processing Systems. vol. 33, pp. 7462–7473 (2020)

[22] Tang, J., Ren, J., Zhou, H., Liu, Z., Zeng, G.: DreamGaussian: Generative gaussian splatting for efficient 3D content creation. In: International Conference on Learning Representations (ICLR) (2024)

[23] Wang, Z., Huang, J., Sun, Z., Gong, Y., Cohen-Or, D., Lu, M.: Layered image vectorization via semantic simplification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 7728–7738 (2025)

[24] Wu, G., Yi, T., Fang, J., Xie, L., Zhang, X., Wei, W., Liu, W., Tian, Q., Wang, X.: 4D gaussian splatting for real-time dynamic scene rendering. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 20310–20320 (2024). https://doi.org/10.1109/CVPR52733.2024.01920

[25] Xie, X., Zhou, P., Li, H., Lin, Z., Yan, S.: Adan: Adaptive nesterov momentum algorithm for faster optimizing deep models. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(12), 9508–9520 (2024). https://doi.org/10.1109/TPAMI.2024.3423382

[26] Xing, X., Wang, C., Zhou, H., Zhang, J., Yu, Q., Xu, D.: DiffSketcher: Text guided vector sketch synthesis through latent diffusion models. In: Advances in Neural Information Processing Systems. vol. 36, pp. 15869–15889 (2023)

[27] Xing, X., Zhou, H., Wang, C., Zhang, J., Xu, D., Yu, Q.: SVGDreamer: Text guided SVG generation with diffusion model. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 4546–4555 (2024). https://doi.org/10.1109/CVPR52733.2024.00435

[28] Yi, T., Fang, J., Wang, J., Wu, G., Xie, L., Zhang, X., Liu, W., Tian, Q., Wang, X.: GaussianDreamer: Fast generation from text to 3D gaussians by bridging 2D and 3D diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 6796–6807 (2024). https://doi.org/10.1109/CVPR52733.2024.00649

[29] Zhang, P., Zhao, N., Liao, J.: Text-to-vector generation with neural path representation. ACM Transactions on Graphics 43(4), 1–13 (2024). https://doi.org/10.1145/3658204

[30] Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). pp. 586–595 (2018). https://doi.org/10.1109/CVPR.2018.00068

[31] Zhang, X., Ge, X., Xu, T., He, D., Wang, Y., Qin, H., Lu, G., Geng, J., Zhang, J.: GaussianImage: 1000 FPS image representation and compression by 2D gaussian splatting. In: Computer Vision – ECCV 2024. Lecture Notes in Computer Science, vol. 15067, pp. 327–345. Springer (2024). https://doi.org/10.1007/978-3-031-72673-6_18

[32] Zhang, Y., Li, B., Kuznetsov, A., Jindal, A., Diolatzis, S., Chen, K., Sochenov, A., Kaplanyan, A., Sun, Q.: Image-GS: Content-adaptive image representation via 2D gaussians. In: ACM SIGGRAPH 2025 Conference Papers. pp. 1–11. Association for Computing Machinery (2025). https://doi.org/10.1145/3721238.3730596

[33] Zwicker, M., Pfister, H., van Baar, J., Gross, M.: EWA volume splatting. In: Proceedings Visualization, 2001. VIS ’01. pp. 29–36. IEEE Computer Society (2001). https://doi.org/10.5555/601671.601674