pith.

arxiv: 2602.12280 · v2 · pith:5PMIFUQH · new · submitted 2026-02-12 · 💻 cs.CV

Stroke of Surprise: Progressive Semantic Illusions in Vector Sketching

Pith reviewed 2026-05-21 12:40 UTC · model grok-4.3

classification 💻 cs.CV
keywords: strokes, semantic, illusions, spatial, structural, vector, framework, initial

The pith

Stroke of Surprise is a framework that generates vector sketches undergoing semantic transformation from one concept to another by adding strokes, using dual-branch SDS and overlay loss for optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The work focuses on making drawings that trick the eye in a new way. Normally, illusions use static images that look different from different angles or when overlapped. Here, the trick happens over time as the drawing is created stroke by stroke. The computer finds a sequence of lines where the first few look like one thing, say a duck, but when more lines are added, the drawing becomes a sheep. This is done by optimizing the strokes so they serve both purposes at different stages. To achieve this, the system uses an optimization technique called dual-branch Score Distillation Sampling (SDS): one branch scores the partial sketch against the first concept and the other scores the full sketch against the second, so a single optimization pursues both targets at once. The authors also add an Overlay Loss so that new strokes complement the old ones instead of simply covering them. The result is a sketch with two valid interpretations, depending on how many strokes have been drawn so far. The authors claim their method creates stronger and more recognizable illusions than previous approaches. This moves visual anagrams, which are usually spatial, into the temporal domain of drawing sequences.
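For reference, the standard Score Distillation Sampling gradient (introduced in DreamFusion by Poole et al.) and one plausible way to write the combined objective are sketched below. Here R is the differentiable rasterizer, y_1 and y_2 are the two prompts, w(t) is a timestep weighting, and ε̂_φ is the frozen diffusion model's noise prediction. The plain sum of terms and the weight λ on the overlay term are assumptions; this page does not state the paper's exact weighting.

```latex
% Standard SDS gradient for a rendered image x = R(\theta)
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(x; y)
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\big)\,
    \frac{\partial x}{\partial \theta} \right],
  \qquad x_t = \alpha_t x + \sigma_t \epsilon

% Assumed dual-branch objective (\lambda is a hypothetical weight)
\mathcal{L} = \mathcal{L}_{\mathrm{SDS}}\big(R(S_{\text{prefix}}); y_1\big)
  + \mathcal{L}_{\mathrm{SDS}}\big(R(S_{\text{prefix}} \cup S_{\text{delta}}); y_2\big)
  + \lambda\, \mathcal{L}_{\text{overlay}}(S_{\text{prefix}}, S_{\text{delta}})
```

Because S_prefix appears in both SDS terms, its gradient is the sum of both branches' gradients, while S_delta is updated only by the second branch (and by the overlay term).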

Core claim

Extensive experiments demonstrate that our method significantly outperforms state-of-the-art baselines in recognizability and illusion strength, successfully expanding visual anagrams from the spatial to the temporal dimension.

Load-bearing premise

The dual constraint: the initial prefix strokes must form a coherent object (e.g., a duck) while also serving as the structural foundation for a second concept (e.g., a sheep) once the delta strokes are added. This presumes that a common structural subspace exists and can be discovered.
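The premise can be illustrated with a deliberately tiny toy. In this sketch each "stroke" is a single number, and two quadratic losses stand in for the two SDS branches (an assumption: real SDS losses come from a frozen diffusion model and are not quadratic). The hypothetical targets make concept B want the prefix strokes somewhere slightly different from concept A, so freezing the prefix after Phase 1 (the sequential strategy the abstract contrasts against) leaves a residual conflict, while optimizing both branches jointly lets the prefix settle at a compromise.

```python
# Toy comparison of joint vs. sequential optimization of a prefix/delta stroke split.
# Each "stroke" is one float; quadratic losses are hypothetical stand-ins for SDS.

TARGET_A = [0.0, 1.0]          # where concept A wants the prefix strokes
TARGET_B_PREFIX = [1.0, 1.0]   # where concept B wants the same prefix strokes
TARGET_B_DELTA = [2.0]         # where concept B wants the delta strokes


def loss_phase1(prefix):
    """Branch 1: only the prefix is 'rendered' and scored against concept A."""
    return sum((p - a) ** 2 for p, a in zip(prefix, TARGET_A))


def loss_phase2(prefix, delta):
    """Branch 2: the full sketch (prefix + delta) is scored against concept B."""
    return (sum((p - b) ** 2 for p, b in zip(prefix, TARGET_B_PREFIX))
            + sum((d - b) ** 2 for d, b in zip(delta, TARGET_B_DELTA)))


def descend(params, grad, steps=500, lr=0.1):
    """Plain gradient descent on a list of floats."""
    params = list(params)
    for _ in range(steps):
        params = [x - lr * g for x, g in zip(params, grad(params))]
    return params


def joint(prefix0, delta0):
    """Both branch losses are minimized together; the prefix gets gradients from both."""
    n = len(prefix0)

    def grad(params):
        p, d = params[:n], params[n:]
        gp = [2 * (x - a) + 2 * (x - b) for x, a, b in zip(p, TARGET_A, TARGET_B_PREFIX)]
        gd = [2 * (x - b) for x, b in zip(d, TARGET_B_DELTA)]
        return gp + gd

    params = descend(list(prefix0) + list(delta0), grad)
    return params[:n], params[n:]


def sequential(prefix0, delta0):
    """Phase 1 fits the prefix alone, then freezes it while Phase 2 fits the delta."""
    prefix = descend(prefix0, lambda p: [2 * (x - a) for x, a in zip(p, TARGET_A)])
    delta = descend(delta0, lambda d: [2 * (x - b) for x, b in zip(d, TARGET_B_DELTA)])
    return prefix, delta


def total(prefix, delta):
    """Combined loss of both branches for a given stroke configuration."""
    return loss_phase1(prefix) + loss_phase2(prefix, delta)


if __name__ == "__main__":
    start_prefix, start_delta = [0.5, 0.5], [0.0]
    print("sequential:", round(total(*sequential(start_prefix, start_delta)), 4))
    print("joint:     ", round(total(*joint(start_prefix, start_delta)), 4))
```

With these toy targets the sequential run ends with a combined loss of about 1.0 and the joint run with about 0.5, because the jointly optimized prefix lands halfway between the two targets. The toy only shows why joint optimization helps when a shared configuration exists; it says nothing about whether such a subspace exists for real concept pairs, which is the premise in question.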

Figures

Figures reproduced from arXiv: 2602.12280 by Huai-Hsun Cheng, Siang-Ling Zhang, Yu-Lun Liu.

Figure 1: Progressive semantic illusions from text. Given a pair of text prompts (a), our method generates a vector sketch that evolves over time. The initial generated sketch (b) depicts the first concept (e.g., “pig”). By adding further generated strokes (c), the drawing is transformed into a totally different object (e.g., “angel”). This creates a Stroke of Surprise: the process subverts the viewer’s expectation …
Figure 2: Challenges in progressive illusion sketching. (a) Raster-based methods (e.g., Nano Banana Pro) rely on destructive editing, modifying the initial structure to fit the final target and thus violating the progressive constraint. (b) Vector-based baselines (e.g., SketchDreamer [93] or SketchAgent [110]) employ a greedy strategy, where specific Phase 1 details become semantic noise or clutter in Phase 2. (c)…
Figure 3: Pipeline overview. Our method optimizes a set of learnable stroke parameters, which are divided into prefix strokes S_prefix and delta strokes S_delta. The optimization process involves two parallel branches. In the top branch, only the prefix strokes are rendered by a differentiable rasterizer to create a partial sketch (e.g., a rabbit). This sketch is then guided by a pre-trained, frozen text-to-image diff…
Figure 5: VLM-based evaluation and ranking pipeline. We employ GPT-4o to assess the quality of illusion sketches. (a) For Phase 1, the model evaluates the recognizability of the prefix sketch (S_prefix). (b) For Phase 2, the model evaluates the full sketch (S_full) while simultaneously comparing it against the delta strokes (S_delta). This comparison ensures that the prefix strokes provide essential structural scaffol…
Figure 6: Multi-phase pipeline. We scale to K phases (e.g., Apple→Sheep→Einstein) using cumulative stroke subsets (S_1, …, S_K). Parallel branches optimize each cumulative sketch I_{1:i} against prompt p_i. Joint optimization ensures early strokes receive gradients from all subsequent losses (Σ_i L^i_SDS), creating a structure primed for the entire evolutionary sequence.
Figure 7: Qualitative comparisons. We compare against SketchDreamer [93], SketchAgent [110], and Nano Banana Pro. (a) SketchDreamer produces noisy strokes, causing severe visual clutter. (b) SketchAgent yields overly abstract results with low recognizability. (c) Nano Banana Pro relies on destructive editing (e.g., overwriting the pig structure to draw an angel), failing the progressive constraint despite high image…
Figure 8: Phase 2 extension with fixed prefix (ours). We evaluate how methods extend a fixed Phase 1 sketch generated by our method. Interestingly, baselines produce better Phase 2 results here than in …
Figure 10: Ablation on optimization strategy. (a) Sequential generation yields a rigid Phase 1, creating structural conflicts (e.g., the duck’s beak) that fail Phase 2 repurposing. (b) Joint optimization (Ours) identifies a common structural subspace, yielding a versatile Phase 1 where features serve both interpretations (e.g., the beak doubles as the cow’s ear).
Figure 9: User study. (Left) Preference: Participants overwhelmingly favor our method (green) over baselines across both ranking strategies. (Right) Reliability: A high success rate (>97%) confirms that our pipeline consistently yields valid illusions, ensuring robustness against the inherent stochasticity of the generation process. … selected our method in 67.7% of GPT-ranking and 87.1% of Metric-ranking cases …
Figure 12: Ablation of overlay loss (L_overlay). (a) Without L_overlay, the model generates redundant strokes atop existing ones to satisfy the semantic target, resulting in visual clutter (red circle) and high intersection artifacts. (b) With L_overlay, the generated strokes (S_delta) become spatially complementary to the prefix (S_prefix), avoiding collisions to produce a clean, coherent line drawing.
Figure 13: Analysis of stroke count. (Top) Simple concepts (horse) form recognizable silhouettes with minimal strokes (8→16). (Bottom) Complex concepts (Einstein) require a larger budget (32→64) to capture essential details. Fewer strokes result in abstraction. Our default (16→32) balances structural simplicity and semantic fidelity.
Figure 14: Additional 2-phase progressive illusion results produced by our method.
Figure 15: Additional 3-phase progressive illusion results produced by our method.
Figure 16: Additional qualitative comparisons. Panel annotations report per-sample CLIP, ΦIR, and HPS scores under three subscripts and a Ranking Score (e.g., 0.404, Rank #1; 0.212, Rank #…).
Figure 17: Metric-based ranking. Panel annotations: Ranking Score 0.765, Rank #1 (Phase 1 Score 0.9, Phase 2 Score 0.85); Ranking Score 0.765, Rank #1 (Phase 1 Score 0.9, Phase 2 Score 0.85); Ranking Score 0.7225, Rank #3 (Phase 1 Score 0.85, Phase 2 Score 0.85); Ranking Score 0.7225, Rank #3 (Phase 1 Score 0.85, Phase 2 Score 0.85).
Figure 20: Extension on vector graph.
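The panel annotations listed under Figure 17 suggest how the two phase scores are combined: each Ranking Score equals the product of its Phase 1 and Phase 2 Scores (0.9 × 0.85 = 0.765 and 0.85 × 0.85 = 0.7225), and tied scores share a rank (#1, #1, #3, #3). The sketch below encodes that reading. The product rule and the competition-style tie handling are inferred from these numbers, not stated in the caption, and the candidate names are placeholders.

```python
def ranking_score(phase1, phase2):
    # Assumption inferred from Figure 17's numbers: the two phase scores are multiplied,
    # so a sketch must do well in both phases to rank highly.
    return phase1 * phase2


def competition_rank(candidates):
    """Rank (name, phase1, phase2) tuples; tied scores share a rank ("1, 1, 3, 3")."""
    scored = [(name, ranking_score(p1, p2)) for name, p1, p2 in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)  # stable, so ties keep input order
    results = []
    for name, score in scored:
        # Rank = 1 + number of candidates with a strictly higher score.
        rank = 1 + sum(1 for _, other in scored if other > score)
        results.append((name, score, rank))
    return results


if __name__ == "__main__":
    # Phase scores copied from Figure 17's panel annotations; names are placeholders.
    panels = [("a", 0.9, 0.85), ("b", 0.9, 0.85), ("c", 0.85, 0.85), ("d", 0.85, 0.85)]
    for name, score, rank in competition_rank(panels):
        print(name, round(score, 4), f"#{rank}")
```

A product rewards sketches that succeed in both phases: a candidate scoring 1.0 in one phase and 0.5 in the other ranks below one scoring 0.8 in both (0.5 vs. 0.64).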
Original abstract

Visual illusions traditionally rely on spatial manipulations such as multi-view consistency. In this work, we introduce Progressive Semantic Illusions, a novel vector sketching task where a single sketch undergoes a dramatic semantic transformation through the sequential addition of strokes. We present Stroke of Surprise, a generative framework that optimizes vector strokes to satisfy distinct semantic interpretations at different drawing stages. The core challenge lies in the "dual-constraint": initial prefix strokes must form a coherent object (e.g., a duck) while simultaneously serving as the structural foundation for a second concept (e.g., a sheep) upon adding delta strokes. To address this, we propose a sequence-aware joint optimization framework driven by a dual-branch Score Distillation Sampling (SDS) mechanism. Unlike sequential approaches that freeze the initial state, our method dynamically adjusts prefix strokes to discover a "common structural subspace" valid for both targets. Furthermore, we introduce a novel Overlay Loss that enforces spatial complementarity, ensuring structural integration rather than occlusion. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art baselines in recognizability and illusion strength, successfully expanding visual anagrams from the spatial to the temporal dimension. Project page: https://stroke-of-surprise.github.io/
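The abstract describes the Overlay Loss only by its goal (spatial complementarity rather than occlusion), and Figure 12 shows that without it new strokes pile onto existing ones. The exact formula is not given on this page, so the following is a hypothetical stand-in: rasterize prefix and delta strokes into soft coverage maps and penalize their per-pixel product. Strokes are straight segments here to keep the sketch short, and because it is plain Python rather than a differentiable rasterizer, it only evaluates the penalty and provides no gradients.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Euclidean distance from point (px, py) to the segment (ax, ay)-(bx, by)."""
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project onto the segment and clamp the projection to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def coverage_map(segments, size=32, width=1.5):
    """Soft raster of segments: coverage falls linearly to 0 at `width` pixels away."""
    grid = []
    for y in range(size):
        row = []
        for x in range(size):
            cx, cy = x + 0.5, y + 0.5  # pixel center
            d = min((point_segment_distance(cx, cy, *seg) for seg in segments),
                    default=math.inf)
            row.append(max(0.0, 1.0 - d / width))
        grid.append(row)
    return grid


def overlay_loss(prefix_segments, delta_segments, size=32, width=1.5):
    """Hypothetical overlay penalty: mean per-pixel product of the two coverage maps.

    It is zero when delta strokes never touch prefix strokes and grows when delta
    strokes are drawn on top of existing ones.
    """
    a = coverage_map(prefix_segments, size, width)
    b = coverage_map(delta_segments, size, width)
    overlap = sum(pa * pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))
    return overlap / (size * size)


if __name__ == "__main__":
    prefix = [(4.0, 16.0, 28.0, 16.0)]       # one horizontal prefix stroke
    stacked = [(4.0, 16.0, 28.0, 16.0)]      # delta drawn directly on top
    crossing = [(16.0, 4.0, 16.0, 28.0)]     # delta crossing the prefix once
    separate = [(4.0, 4.0, 28.0, 4.0)]       # delta elsewhere on the canvas
    for name, delta in [("stacked", stacked), ("crossing", crossing), ("separate", separate)]:
        print(name, round(overlay_loss(prefix, delta), 4))
```

On this toy canvas the penalty is exactly zero for the separate stroke, small for the crossing stroke, and largest for the stroke drawn on top of the prefix, which is the ordering an anti-occlusion term should produce.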

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Progressive Semantic Illusions as a new vector sketching task in which a single sketch undergoes semantic transformation via sequential stroke addition. It proposes the Stroke of Surprise framework, which employs sequence-aware joint optimization driven by a dual-branch Score Distillation Sampling (SDS) mechanism to discover a common structural subspace satisfying two distinct semantic targets (e.g., prefix strokes forming a duck that later supports a sheep), together with a novel Overlay Loss to enforce spatial complementarity rather than occlusion. The authors claim that extensive experiments show the method significantly outperforms state-of-the-art baselines in recognizability and illusion strength, thereby extending visual anagrams from the spatial to the temporal domain.

Significance. If the experimental claims hold, the work would be a meaningful contribution to generative computer vision by formalizing and solving a temporal extension of visual anagrams. The dual-branch SDS and Overlay Loss constitute concrete technical advances for handling the dual-constraint problem, and the absence of free parameters in the core optimization, as described in the abstract, is a positive feature. The approach could influence downstream applications in creative tools and interactive illustration if the progressive coherence is robustly demonstrated.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (dual-branch SDS and dual-constraint description): the central claim that prefix strokes achieve independent coherence for the first concept while serving as foundation for the second relies on joint optimization; no explicit loss term (beyond the final Overlay Loss) is described that enforces standalone recognizability of the prefix alone. This leaves open the possibility that observed success arises from simultaneous gradient balancing rather than discovery of a truly independent common structural subspace, which is load-bearing for the dual-constraint formulation.
  2. [§4] §4 (experiments): the strongest claim of significant outperformance in recognizability and illusion strength is presented without reference to specific quantitative metrics, ablation tables isolating the contribution of the dual-branch versus sequential freezing, or user-study protocols for illusion strength. Concrete results (e.g., Table X or Figure Y) are required to substantiate the temporal-expansion claim.
minor comments (2)
  1. [§3.1] Clarify the precise definition and parameterization of 'delta strokes' and the temporal sequencing mechanism to ensure reproducibility of the progressive addition process.
  2. [§5] Add a short discussion of failure modes, such as cases where no common structural subspace exists for the chosen concept pairs.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and have revised the manuscript to improve clarity on the optimization mechanism and to provide explicit experimental details and references.

Point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (dual-branch SDS and dual-constraint description): the central claim that prefix strokes achieve independent coherence for the first concept while serving as foundation for the second relies on joint optimization; no explicit loss term (beyond the final Overlay Loss) is described that enforces standalone recognizability of the prefix alone. This leaves open the possibility that observed success arises from simultaneous gradient balancing rather than discovery of a truly independent common structural subspace, which is load-bearing for the dual-constraint formulation.

    Authors: We appreciate the referee highlighting this point of potential ambiguity. The dual-branch SDS applies the first semantic target’s score distillation loss exclusively to the prefix strokes (enforcing standalone coherence for the initial concept) while the second branch applies the loss to the full stroke sequence. Joint optimization then discovers the common structural subspace by allowing prefix adjustments under both constraints simultaneously. We agree the description could be more explicit and have added a clarifying subsection in §3 that isolates the contribution of each SDS branch to the dual constraints, along with an updated abstract sentence referencing this mechanism. revision: yes

  2. Referee: [§4] §4 (experiments): the strongest claim of significant outperformance in recognizability and illusion strength is presented without reference to specific quantitative metrics, ablation tables isolating the contribution of the dual-branch versus sequential freezing, or user-study protocols for illusion strength. Concrete results (e.g., Table X or Figure Y) are required to substantiate the temporal-expansion claim.

    Authors: We thank the referee for noting the need for clearer substantiation. The submitted manuscript contains quantitative recognizability metrics (CLIP similarity), an ablation comparing dual-branch SDS to sequential freezing, and a user study on illusion strength, but cross-references were insufficient. We have revised §4 to explicitly cite Table 2 (quantitative results), Figure 10 (ablation isolating dual-branch contribution), and the supplementary material (user-study protocol with 100 participants and pairwise comparison design). These additions directly support the outperformance claims and the temporal extension of visual anagrams. revision: yes
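The first response above says the first SDS branch sees only the prefix while the second sees the full stroke sequence, and Figure 6 generalizes this so that early strokes receive gradients from all subsequent losses. That bookkeeping can be sketched as follows, assuming each phase renders a cumulative prefix of one ordered stroke list. The function names and the three-phase budget are hypothetical; phases are 0-based here, whereas the paper numbers them from 1.

```python
def validate_budgets(cumulative_counts):
    """Cumulative stroke counts must be positive and strictly increasing (S_1 ⊂ ... ⊂ S_K)."""
    if not cumulative_counts or cumulative_counts[0] <= 0:
        raise ValueError("need at least one phase with a positive stroke count")
    for prev, cur in zip(cumulative_counts, cumulative_counts[1:]):
        if cur <= prev:
            raise ValueError("cumulative stroke counts must strictly increase")


def phase_of_stroke(stroke_index, cumulative_counts):
    """0-based phase in which a stroke first appears in the cumulative sketch."""
    validate_budgets(cumulative_counts)
    for phase, count in enumerate(cumulative_counts):
        if stroke_index < count:
            return phase
    raise ValueError("stroke index exceeds the total stroke budget")


def gradient_sources(cumulative_counts):
    """Map each stroke index to the 0-based phase losses that render it.

    Phase i renders strokes [0, cumulative_counts[i]) and scores them against prompt i,
    so a stroke receives gradients from its own phase and every later phase.
    """
    validate_budgets(cumulative_counts)
    total = cumulative_counts[-1]
    return {
        s: [i for i, count in enumerate(cumulative_counts) if s < count]
        for s in range(total)
    }


if __name__ == "__main__":
    # Two phases with the 16 -> 32 default stroke budget mentioned in Figure 13.
    two_phase = gradient_sources([16, 32])
    print("prefix stroke 0:", two_phase[0], "| delta stroke 31:", two_phase[31])
    # Three phases with a hypothetical 8 -> 16 -> 32 budget.
    three_phase = gradient_sources([8, 16, 32])
    print("stroke 0:", three_phase[0], "| stroke 10:", three_phase[10],
          "| stroke 20:", three_phase[20])
```

For the two-phase case this reproduces the split discussed in the referee exchange: prefix strokes are rendered by both branches, delta strokes only by the full-sketch branch.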

Circularity Check

0 steps flagged

No circularity: derivation extends SDS via independent dual-branch optimization and overlay loss without reducing to self-defined inputs.

full rationale

The paper's core contribution is a sequence-aware joint optimization using a dual-branch SDS mechanism plus a novel Overlay Loss to enforce spatial complementarity for progressive semantic illusions. This extends prior SDS work with new components (dual-branch, overlay) to address the dual constraint that prefix strokes must be coherent on their own for the initial concept while also supporting the final concept. No equations or claims reduce by construction to fitted parameters renamed as predictions, self-citations that bear the load of uniqueness, or ansatzes smuggled from prior author work. The method is presented as an engineering extension validated by experiments, remaining self-contained against external benchmarks like standard SDS baselines.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based solely on abstract; no explicit free parameters, axioms, or invented entities are detailed. The approach assumes existence of optimizable common structural subspaces for dual targets and relies on prior SDS techniques.

axioms (1)
  • domain assumption Existence of a common structural subspace valid for both semantic targets that can be discovered by dynamic adjustment of prefix strokes
    Central to the dual-constraint and sequence-aware joint optimization described in the abstract.

pith-pipeline@v0.9.0 · 5744 in / 1242 out tokens · 71481 ms · 2026-05-21T12:40:15.820927+00:00 · methodology

Reference graph

Works this paper leans on

139 extracted references · 139 canonical work pages · 7 internal anchors

  1. [1]

    Cose: Compositional stroke embeddings

    Emre Aksan, Thomas Deselaers, Andrea Tagliasacchi, and Otmar Hilliges. Cose: Compositional stroke embeddings. Advances in Neural Information Processing Systems, 33: 10041–10052, 2020. 2

  2. [2]

    Abstracting sketches through simple primitives

    Stephan Alaniz, Massimiliano Mancini, Anjan Dutta, Diego Marcos, and Zeynep Akata. Abstracting sketches through simple primitives. In European Conference on Computer Vision, pages 396–412. Springer, 2022. 3

  3. [3]

    As-rigid-as-possible shape interpolation

    Marc Alexa, Daniel Cohen-Or, and David Levin. As-rigid-as-possible shape interpolation. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2, pages 165–172. 2023. 3

  4. [4]

    Swiftsketch: A diffusion model for image-to-vector sketch generation

    Ellie Arar, Yarden Frenkel, Daniel Cohen-Or, Ariel Shamir, and Yael Vinker. Swiftsketch: A diffusion model for image-to-vector sketch generation. In Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers, pages 1–12, 2025. 2

  5. [5]

    Break-a-scene: Extracting multiple concepts from a single image

    Omri Avrahami, Kfir Aberman, Ohad Fried, Daniel Cohen-Or, and Dani Lischinski. Break-a-scene: Extracting multiple concepts from a single image. In SIGGRAPH Asia 2023 Conference Papers, pages 1–12, 2023. 3

  6. [6]

    4d-fy: Text-to-4d generation using hybrid score distillation sampling

    Sherwin Bahmani, Ivan Skorokhodov, Victor Rong, Gordon Wetzstein, Leonidas Guibas, Peter Wonka, Sergey Tulyakov, Jeong Joon Park, Andrea Tagliasacchi, and David B Lindell. 4d-fy: Text-to-4d generation using hybrid score distillation sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7996–8006,

  7. [7]

    Sketchinr: A first look into sketches as implicit neural representations

    Hmrishav Bandyopadhyay, Ayan Kumar Bhunia, Pinaki Nath Chowdhury, Aneeshan Sain, Tao Xiang, Timothy Hospedales, and Yi-Zhe Song. Sketchinr: A first look into sketches as implicit neural representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12565–12574, 2024. 3

  8. [8]

    Feature-based image metamorphosis

    Thaddeus Beier and Shawn Neely. Feature-based image metamorphosis. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2, pages 529–536. 2023. 3

  9. [9]

    How renault uses numerical control for car body design and tooling

    Pierre E Bézier. How renault uses numerical control for car body design and tooling. Technical report, SAE Technical Paper, 1968. 3

  10. [10]

    Pixelor: A competitive …

    Ayan Kumar Bhunia, Ayan Das, Umar Riaz Muhammad, Yongxin Yang, Timothy M Hospedales, Tao Xiang, Yulia Gryaditskaya, and Yi-Zhe Song. Pixelor: A competitive …

  11. [11]

    Doodleformer: Creative sketch drawing with transformers

    Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Jorma Laaksonen, and Michael Felsberg. Doodleformer: Creative sketch drawing with transformers. In European Conference on Computer Vision, pages 338–355. Springer, 2022. 2

  12. [12]

    Sketch2saliency: Learning to detect salient objects from human drawings

    Ayan Kumar Bhunia, Subhadeep Koley, Amandeep Kumar, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, and Yi-Zhe Song. Sketch2saliency: Learning to detect salient objects from human drawings. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2733–2743, 2023. 3

  13. [13]

    Recognition-by-components: a theory of human image understanding

    Irving Biederman. Recognition-by-components: a theory of human image understanding. Psychological review, 94(2): 115, 1987. 3

  14. [14]

    Surface versus edge-based determinants of visual recognition

    Irving Biederman and Ginny Ju. Surface versus edge-based determinants of visual recognition. Cognitive psychology, 20(1):38–64, 1988. 3

  15. [15]

    Diffusion illusions: Hiding images in plain sight

    Ryan Burgert, Xiang Li, Abe Leite, Kanchana Ranasinghe, and Michael Ryoo. Diffusion illusions: Hiding images in plain sight. In ACM SIGGRAPH 2024 Conference Papers, pages 1–11, 2024. 3

  16. [16]

    A computational approach to edge detection

    John Canny. A computational approach to edge detection. IEEE Transactions on pattern analysis and machine intelligence, (6):679–698, 2009. 2

  17. [17]

    Deepsvg: A hierarchical generative network for vector graphics animation

    Alexandre Carlier, Martin Danelljan, Alexandre Alahi, and Radu Timofte. Deepsvg: A hierarchical generative network for vector graphics animation. Advances in Neural Information Processing Systems, 33:16351–16361, 2020. 2

  18. [18]

    The artist as neuroscientist

    Patrick Cavanagh. The artist as neuroscientist. Nature, 434(7031):301–307, 2005. 3

  19. [19]

    Lookingglass: Generative anamorphoses via laplacian pyramid warping

    Pascal Chang, Sergio Sancho, Jingwei Tang, Markus Gross, and Vinicius Azevedo. Lookingglass: Generative anamorphoses via laplacian pyramid warping. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 24–33, 2025. 3

  20. [20]

    Denoising likelihood score matching for conditional score-based data generation

    Chen-Hao Chao, Wei-Fang Sun, Bo-Wun Cheng, Yi-Chen Lo, Chia-Che Chang, Yu-Lun Liu, Yu-Lin Chang, Chia-Ping Chen, and Chun-Yi Lee. Denoising likelihood score matching for conditional score-based data generation. arXiv preprint arXiv:2203.14206, 2022. 3

  21. [21]

    Attend-and-excite: Attention-based semantic guidance for text-to-image diffusion models

    Hila Chefer, Yuval Alaluf, Yael Vinker, Lior Wolf, and Daniel Cohen-Or. Attend-and-excite: Attention-based semantic guidance for text-to-image diffusion models. ACM transactions on Graphics (TOG), 42(4):1–10, 2023. 3

  22. [22]

    Svgbuilder: Component-based colored svg generation with text-guided autoregressive transformers

    Zehao Chen and Rong Pan. Svgbuilder: Component-based colored svg generation with text-guided autoregressive transformers. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 2358–2366, 2025. 2

  23. [23]

    Images that sound: Composing images and sounds on a single canvas

    Ziyang Chen, Daniel Geng, and Andrew Owens. Images that sound: Composing images and sounds on a single canvas. Advances in Neural Information Processing Systems, 37: 85045–85073, 2024. 3

  24. [24]

    Camouflage images

    Hung-Kuo Chu, Wei-Hsin Hsu, Niloy J Mitra, Daniel Cohen-Or, Tien-Tsin Wong, and Tong-Yee Lee. Camouflage images. ACM Trans. Graph., 29(4):51–1, 2010. 3

  25. [25]

    Béziersketch: A generative model for scalable vector sketches

    Ayan Das, Yongxin Yang, Timothy Hospedales, Tao Xiang, and Yi-Zhe Song. Béziersketch: A generative model for scalable vector sketches. In European conference on computer vision, pages 632–647. Springer, 2020. 2

  26. [26]

    Drawing apprentice: An enactive co-creative agent for artistic collaboration

    Nicholas Davis, Chih-Pin Hsiao, Kunwar Yashraj Singh, Lisa Li, Sanat Moningi, and Brian Magerko. Drawing apprentice: An enactive co-creative agent for artistic collaboration. In Proceedings of the 2015 ACM SIGCHI Conference on Creativity and Cognition, pages 185–186, 2015. 3

  27. [27]

    Outillages méthodes calcul

    Paul De Casteljau. Outillages méthodes calcul. André Citroën Automobiles SA, Paris, 4:25, 1959. 3

  28. [28]

    Rasp: Revisiting 3d anamorphic art for shadow-guided packing of irregular objects

    Soumyaratna Debnath, Ashish Tiwari, Kaustubh Sadekar, and Shanmuganathan Raman. Rasp: Revisiting 3d anamorphic art for shadow-guided packing of irregular objects. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 5849–5858, 2025. 3

  29. [29]

    How do humans sketch objects?

    Mathias Eitz, James Hays, and Marc Alexa. How do humans sketch objects? ACM Transactions on graphics (TOG), 31(4):1–10, 2012. 2, 3

  30. [30]

    Drawing as a versatile cognitive tool

    Judith E Fan, Wilma A Bainbridge, Rebecca Chamberlain, and Jeffrey D Wammes. Drawing as a versatile cognitive tool. Nature Reviews Psychology, 2(9):556–568, 2023. 3

  31. [31]

    Illusion3d: 3d multiview illusion with 2d diffusion priors

    Yue Feng, Vaibhav Sanjay, Spencer Lutz, Badour AlBahar, Songwei Ge, and Jia-Bin Huang. Illusion3d: 3d multiview illusion with 2d diffusion priors. arXiv preprint arXiv:2412.09625, 2024. 3

  32. [32]

    Clipdraw: Exploring text-to-drawing synthesis through language-image encoders

    Kevin Frans, Lisa Soros, and Olaf Witkowski. Clipdraw: Exploring text-to-drawing synthesis through language-image encoders. Advances in Neural Information Processing Systems, 35:5207–5218, 2022. 2

  33. [33]

    Ptdiffusion: Free lunch for generating optical illusion hidden pictures with phase-transferred diffusion model

    Xiang Gao, Shuai Yang, and Jiaying Liu. Ptdiffusion: Free lunch for generating optical illusion hidden pictures with phase-transferred diffusion model. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 18240–18249, 2025. 3

  34. [34]

    Creative sketch generation

    Songwei Ge, Vedanuj Goswami, C Lawrence Zitnick, and Devi Parikh. Creative sketch generation. arXiv preprint arXiv:2011.10039, 2020. 2

  35. [35]

    Factorized diffusion: Perceptual illusions by noise decomposition

    Daniel Geng, Inbum Park, and Andrew Owens. Factorized diffusion: Perceptual illusions by noise decomposition. In European Conference on Computer Vision, pages 366–384. Springer, 2024. 3

  36. [36]

    Visual anagrams: Generating multi-view optical illusions with diffusion models

    Daniel Geng, Inbum Park, and Andrew Owens. Visual anagrams: Generating multi-view optical illusions with diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24154–24163, 2024. 2, 3, 5

  37. [37]

    Draw: A recurrent neural network for image generation

    Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Rezende, and Daan Wierstra. Draw: A recurrent neural network for image generation. In International conference on machine learning, pages 1462–1471. PMLR, 2015. 2

  38. [38]

    Mix-of-show: Decentralized low-rank adaptation for multi-concept customization of diffusion models

    Yuchao Gu, Xintao Wang, Jay Zhangjie Wu, Yujun Shi, Yunpeng Chen, Zihan Fan, Wuyou Xiao, Rui Zhao, Shuning Chang, Weijia Wu, et al. Mix-of-show: Decentralized low-rank adaptation for multi-concept customization of diffusion models. Advances in Neural Information Processing Systems, 36:15890–15902, 2023. 3

  39. [39]

    A Neural Representation of Sketch Drawings

    David Ha and Douglas Eck. A neural representation of sketch drawings. arXiv preprint arXiv:1704.03477, 2017. 2

  40. [40]

    Delta denoising score

    Amir Hertz, Kfir Aberman, and Daniel Cohen-Or. Delta denoising score. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2328–2337,

  41. [41]

    Painterly rendering with curved brush strokes of multiple sizes

    Aaron Hertzmann. Painterly rendering with curved brush strokes of multiple sizes. In Proceedings of the 25th annual conference on Computer graphics and interactive techniques, pages 453–460, 1998. 2

  42. [42]

    A survey of stroke-based rendering

    Aaron Hertzmann. A survey of stroke-based rendering. Institute of Electrical and Electronics Engineers, 2003. 2

  43. [43]

    Clipscore: A reference-free evaluation metric for image captioning

    Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. Clipscore: A reference-free evaluation metric for image captioning, 2022. 5

  44. [44]

    Optimize & reduce: a top-down approach for image vectorization

    Or Hirschorn, Amir Jevnisek, and Shai Avidan. Optimize & reduce: a top-down approach for image vectorization. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 2148–2156, 2024. 3

  45. [45]

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.

  46. [46]

    Kai-Wen Hsiao, Jia-Bin Huang, and Hung-Kuo Chu. Multi-view wire art. ACM Transactions on Graphics (TOG), 37(6):242, 2018.

  47. [47]

    Teng Hu, Ran Yi, Haokun Zhu, Liang Liu, Jinlong Peng, Yabiao Wang, Chengjie Wang, and Lizhuang Ma. Stroke-based neural painting and stylization with dynamically predicted painting region. In Proceedings of the 31st ACM International Conference on Multimedia, pages 7470–7480.

  48. [48]

    Yi-Chuan Huang, Jiewen Chan, Hao-Jen Chien, and Yu-Lun Liu. Voxify3D: Pixel art meets volumetric rendering. arXiv preprint arXiv:2512.07834, 2025.

  49. [49]

    Zhewei Huang, Wen Heng, and Shuchang Zhou. Learning to paint with model-based deep reinforcement learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8709–8718, 2019.

  50. [50]

    Francisco Ibarrola, Tomas Lawton, and Kazjon Grace. A collaborative, interactive and context-aware drawing agent for co-creative design. IEEE Transactions on Visualization and Computer Graphics, 30(8):5525–5537, 2023.

  51. [51]

    Shir Iluz, Yael Vinker, Amir Hertz, Daniel Berio, Daniel Cohen-Or, and Ariel Shamir. Word-as-image for semantic typography. ACM Transactions on Graphics (TOG), 42(4):1–11, 2023.

  52. [52]

    Ajay Jain, Amber Xie, and Pieter Abbeel. VectorFusion: Text-to-SVG by abstracting pixel-based diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1911–1920, 2023.

  53. [53]

    Jiaxiu Jiang, Yabo Zhang, Kailai Feng, Xiaohe Wu, Wenbo Li, Renjing Pei, Fan Li, and Wangmeng Zuo. MC²: Multi-concept guidance for customized multi-concept generation. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 2802–2812, 2025.

  54. [54]

    Jonas Jongejan, Henry Rowley, Takashi Kawashima, Jongmin Kim, and Nick Fox-Gieg. Quick, Draw! The data. Dataset for online game Quick, Draw!, 2016.

  55. [55]

    Marcelo Isaias de Moraes Junior and Moacir Antonelli Ponti. On the temporality for sketch representation learning. arXiv preprint arXiv:2512.04007, 2025.

  56. [56]

    Gaetano Kanizsa, Paolo Legrenzi, and Paolo Bozzi. Organization in vision: Essays on Gestalt perception.

  57. [57]

    Pegah Karimi, Jeba Rezwana, Safat Siddiqui, Mary Lou Maher, and Nasrin Dehbozorgi. Creative sketching partner: An analysis of human-AI co-creativity. In Proceedings of the 25th International Conference on Intelligent User Interfaces, pages 221–230, 2020.

  58. [58]

    Oren Katzir, Or Patashnik, Daniel Cohen-Or, and Dani Lischinski. Noise-free score distillation. arXiv preprint arXiv:2310.17590, 2023.

  59. [59]

    Bo-Hsu Ke, You-Zhe Xie, Yu-Lun Liu, and Wei-Chen Chiu. Stealthattack: Robust 3D Gaussian splatting poisoning via density-guided illusions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 27400–27411, 2025.

  60. [60]

    Subin Kim, Kyungmin Lee, June Suk Choi, Jongheon Jeong, Kihyuk Sohn, and Jinwoo Shin. Collaborative score distillation for consistent visual synthesis. arXiv preprint arXiv:2307.04787, 2023.

  61. [61]

    Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, and Yi-Zhe Song. How to handle sketch-abstraction in sketch-based image retrieval? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16859–16869, 2024.

  62. [62]

    Juil Koo, Chanho Park, and Minhyuk Sung. Posterior distillation sampling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13352–13361, 2024.

  63. [63]

    Nupur Kumari, Bingliang Zhang, Richard Zhang, Eli Shechtman, and Jun-Yan Zhu. Multi-concept customization of text-to-image diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1931–1941, 2023.

  64. [64]

    Tomas Lawton, Francisco J Ibarrola, Dan Ventura, and Kazjon Grace. Drawing with Reframer: Emergence and control in co-creative AI. In Proceedings of the 28th International Conference on Intelligent User Interfaces, pages 264–277, 2023.

  65. [65]

    Jie-Ying Lee, Yi-Ruei Liu, Shr-Ruei Tsai, Wei-Cheng Chang, Chung-Ho Wu, Jiewen Chan, Zhenjun Zhao, Chieh Hubert Lin, and Yu-Lun Liu. Skyfall-GS: Synthesizing immersive 3D urban scenes from satellite imagery. arXiv preprint arXiv:2510.15869, 2025.

  66. [66]

    Ke Li, Kaiyue Pang, Jifei Song, Yi-Zhe Song, Tao Xiang, Timothy M Hospedales, and Honggang Zhang. Universal sketch perceptual grouping. In Proceedings of the European Conference on Computer Vision (ECCV), pages 582–597.

  67. [67]

    Tzu-Mao Li, Michal Lukáč, Michaël Gharbi, and Jonathan Ragan-Kelley. Differentiable vector graphics rasterization for editing and learning. ACM Transactions on Graphics (TOG), 39(6):1–15, 2020.

  68. [68]

    Yixun Liang, Xin Yang, Jiantao Lin, Haodong Li, Xiaogang Xu, and Yingcong Chen. LucidDreamer: Towards high-fidelity text-to-3D generation via interval score matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6517–6526, 2024.

  69. [69]

    Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. Magic3D: High-resolution text-to-3D content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 300–309, 2023.

  70. [70]

    Hangyu Lin, Yanwei Fu, Xiangyang Xue, and Yu-Gang Jiang. Sketch-BERT: Learning sketch bidirectional encoder representation from transformers by self-supervised learning of sketch gestalt. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6758–6767, 2020.

  71. [71]

    Fang Liu, Xiaoming Deng, Yu-Kun Lai, Yong-Jin Liu, Cuixia Ma, and Hongan Wang. SketchGAN: Joint sketch completion and recognition with generative adversarial network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5830–5839, 2019.

  72. [72]

    Nan Liu, Shuang Li, Yilun Du, Antonio Torralba, and Joshua B Tenenbaum. Compositional visual generation with composable diffusion models. In European Conference on Computer Vision, pages 423–439. Springer, 2022.

  73. [73]

    Songhua Liu, Tianwei Lin, Dongliang He, Fu Li, Ruifeng Deng, Xin Li, Errui Ding, and Hao Wang. Paint Transformer: Feed forward neural painting with stroke prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 6598–6607, 2021.

  74. [74]

    Xi Liu, Chaoyi Zhou, Nanxuan Zhao, and Siyu Huang. Bézier splatting for fast and differentiable vector graphics rendering. arXiv preprint arXiv:2503.16424, 2025.

  75. [75]

    Yuan Liu, Cheng Lin, Zijiao Zeng, Xiaoxiao Long, Lingjie Liu, Taku Komura, and Wenping Wang. SyncDreamer: Generating multiview-consistent images from a single-view image. arXiv preprint arXiv:2309.03453, 2023.

  76. [76]

    Zhiheng Liu, Ruili Feng, Kai Zhu, Yifei Zhang, Kecheng Zheng, Yu Liu, Deli Zhao, Jingren Zhou, and Yang Cao. Cones: Concept neurons in diffusion models for customized generation. arXiv preprint arXiv:2303.05125, 2023.

  77. [77]

    Raphael Gontijo Lopes, David Ha, Douglas Eck, and Jonathon Shlens. A learned representation for scalable vector graphics. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7930–7939, 2019.

  78. [78]

    Artem Lukoianov, Haitz Sáez de Ocáriz Borde, Kristjan Greenewald, Vitor Guizilini, Timur Bagautdinov, Vincent Sitzmann, and Justin M Solomon. Score distillation via reparametrized DDIM. Advances in Neural Information Processing Systems, 37:26011–26044, 2024.

  79. [79]

    Rundong Luo, Noah Snavely, and Wei-Chiu Ma. Shadowdraw: From any object to shadow-drawing compositional art. arXiv preprint arXiv:2512.05110, 2025.

  80. [80]

    Xu Ma, Yuqian Zhou, Xingqian Xu, Bin Sun, Valerii Filev, Nikita Orlov, Yun Fu, and Humphrey Shi. Towards layer-wise image vectorization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16314–16323, 2022.
