SketchDeco: Training-Free Latent Composition for Precise Sketch Colourisation

Chaitat Utintu; Yi-Zhe Song

arxiv: 2405.18716 · v3 · submitted 2024-05-29 · 💻 cs.CV

Pith reviewed 2026-05-24 00:58 UTC · model grok-4.3

classification 💻 cs.CV

keywords: sketch colourisation · diffusion inversion · latent blending · self-attention · training-free editing · region-based control · image composition

The pith

SketchDeco enables precise sketch colourisation by painting user colours into masked regions via diffusion inversion then blending with custom self-attention, all without training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to give artists exact control over sketch colourisation through simple masks and colour palettes rather than tedious manual work or vague text prompts. It recasts the task as a training-free composition problem solved entirely in latent space. Diffusion inversion first paints the chosen colours into the specified regions. A custom self-attention step then merges these local changes into a single consistent image. The result is local colour accuracy together with global visual harmony, produced in 15-20 inference steps on consumer GPUs.

Core claim

The central claim is that a guided latent-space blending process, which first uses diffusion inversion to paint user-defined colours into specified regions and then applies a custom self-attention mechanism to integrate these edits with a globally consistent base image, delivers both local colour fidelity and global harmony without any model fine-tuning.

What carries the argument

The guided latent-space blending process that combines diffusion inversion for precise local colour application with a custom self-attention mechanism for harmonious global integration.
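
The composition idea described above can be pictured, in its simplest form, as a per-element masked blend of two latents. The following is a minimal sketch under stated assumptions, not the paper's implementation: the function name `blend_latents`, the nested-list layout, and the soft-mask convention are all hypothetical, and the diffusion inversion and attention steps are omitted entirely.

```python
from typing import List

Grid = List[List[float]]


def blend_latents(base: Grid, local: Grid, mask: Grid) -> Grid:
    """Per-element blend: mask * local + (1 - mask) * base.

    Hypothetical helper for illustration; a mask value of 1.0 keeps the
    locally edited latent, 0.0 keeps the globally consistent base latent.
    """
    if not (len(base) == len(local) == len(mask)):
        raise ValueError("base, local and mask must have the same height")
    blended = []
    for b_row, l_row, m_row in zip(base, local, mask):
        if not (len(b_row) == len(l_row) == len(m_row)):
            raise ValueError("rows must have the same width")
        blended.append(
            [m * l + (1.0 - m) * b for b, l, m in zip(b_row, l_row, m_row)]
        )
    return blended


# Toy 2x2 "latents": the base is all zeros, the local edit is all ones.
base = [[0.0, 0.0], [0.0, 0.0]]
local = [[1.0, 1.0], [1.0, 1.0]]
mask = [[1.0, 0.0], [0.5, 0.0]]
result = blend_latents(base, local, mask)
print(result)  # [[1.0, 0.0], [0.5, 0.0]]
```

A hard blend of this kind preserves local values exactly but does nothing for harmony at region boundaries, which is the gap the attention step is meant to close.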

If this is right

Artists obtain direct spatial and chromatic control through masks and palettes.
Local edits remain faithful while the full image stays visually coherent.
No fine-tuning or extra training data is needed.
Professional results appear after 15-20 inference steps on consumer GPUs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same inversion-plus-attention pattern could support region edits on photographs or other image types.
Attention customisation might replace fine-tuning for many composition tasks inside pre-trained diffusion models.
Automatic mask generation from text or sketches could reduce the remaining user effort further.

Load-bearing premise

A custom self-attention mechanism will reliably produce harmonious global blending from local diffusion-inversion edits across varied sketches and colour palettes without any fine-tuning.

What would settle it

Generate outputs on a collection of complex sketches containing multiple adjacent colour regions and check whether the masked areas retain their exact assigned colours while the overall image shows consistent lighting and style.
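
The local half of this check could be partly automated by comparing the mean colour inside each mask with the palette colour assigned to it. The sketch below is one possible way to do that, assuming images are nested lists of RGB tuples and masks are nested lists of booleans; the helper names `hex_to_rgb` and `region_colour_error` are hypothetical, and plain Euclidean distance in RGB is an assumption (a perceptual colour space would likely be a better choice). It does not address the lighting and style consistency part of the test.

```python
import math
from typing import List, Tuple

RGB = Tuple[float, float, float]


def hex_to_rgb(hexcode: str) -> RGB:
    """Convert '#rrggbb' to an (r, g, b) tuple in the 0-255 range."""
    h = hexcode.lstrip("#")
    if len(h) != 6:
        raise ValueError(f"expected a 6-digit hex code, got {hexcode!r}")
    return (float(int(h[0:2], 16)), float(int(h[2:4], 16)), float(int(h[4:6], 16)))


def region_colour_error(
    image: List[List[RGB]], mask: List[List[bool]], target_hex: str
) -> float:
    """Euclidean RGB distance between the masked region's mean colour and the target."""
    target = hex_to_rgb(target_hex)
    pixels = [
        px
        for img_row, m_row in zip(image, mask)
        for px, selected in zip(img_row, m_row)
        if selected
    ]
    if not pixels:
        raise ValueError("mask selects no pixels")
    mean = [sum(px[c] for px in pixels) / len(pixels) for c in range(3)]
    return math.sqrt(sum((mean[c] - target[c]) ** 2 for c in range(3)))


# Toy 2x2 image: the top row is (almost) red, the bottom row is blue.
image = [[(255, 0, 0), (250, 0, 0)], [(0, 0, 255), (0, 0, 255)]]
mask_red = [[True, True], [False, False]]
error = region_colour_error(image, mask_red, "#ff0000")
print(error)  # 2.5
```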

Figures

Figures reproduced from arXiv: 2405.18716 by Chaitat Utintu, Yi-Zhe Song.

**Figure 2.** Framework Overview. Given an input sketch and region-specific colour palettes with corresponding masks, our method employs a divide-and-conquer strategy consisting of two sequential stages: Global and Local Sketch Colourisations.

**Figure 3.** Global and Local Sketch Colourisation Stages. (a) In the global stage (Sec. 3.2), given a sketch $\mathcal{S}$ and colour palettes $\{\mathcal{P_H}\}_{i=1}^{n}$, BLIP-2 [38] infers sketch class semantics, a K-D Tree [5] maps palette hexcodes to colour names, and Scribble ControlNet [83] generates globally colourised results …

**Figure 4.** 3D search space constructed using the K-D Tree algorithm.

**Figure 6.** Initial Gaussian noise incorporation in latent space.

**Figure 7.** Qualitative results on diverse datasets (global). Our method outperforms SOTA techniques, showing better sketch fidelity, colour vividness, realism, and overall colourisation quality.

**Figure 8.** Qualitative evaluation on diverse datasets (local). Our method outperforms SOTA techniques, achieving more accurate local colour propagation and enhanced sketch fidelity. By effectively integrating local reference cues through the proposed two-stage pipeline (see Sec. 3), it produces coherent and realistic colourisation with improved consistency and visual quality across diverse styles and datasets.

**Figure 9.** Qualitative evaluation on in-the-wild sketches. We randomly selected sketches from www.freepik.com using the keywords “black-and-white [class] sketch.” Our sketch colourisation technique allows users to specify colour palettes and corresponding region masks (using Photoshop), with the generative priors embedded in the diffusion model determining the best way to apply the chosen palettes.

**Figure 10.** Contributions of Attention Maps. (i) No injection (without attention map injection), (ii-vi) Injection A-E (different combinations of attention map injection: $A = \mathrm{SA}(\mathcal{I}^{\ast}) + \mathrm{CA}(\mathcal{I}^{\ast}, \mathcal{I}^{G}_{\mathcal{M}})$, $B = \mathrm{SA}(\mathcal{I}^{\ast}) + \mathrm{SA}(\mathcal{I}^{G}_{\mathcal{M}})$, C = SA(…

**Figure 11.** Trade-off between harmonisation and faithfulness. A low τ = 0.0 gives better harmonisation of colours but lacks faithfulness (e.g., missing regions in the wings of “butterfly”) whereas a high τ = 1.0 gives highly faithful image generation but lacks harmonisation of colours (e.g., the red patch on “cow”).
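
The captions for Figures 3 and 4 mention a K-D Tree that maps palette hexcodes to colour names over a 3D search space. A minimal sketch of that kind of lookup is given below, assuming the three dimensions are R, G and B and that nearness is squared Euclidean distance; both are assumptions, as is the small colour table, which is only a subset of the CSS named colours picked for illustration. The names `build`, `nearest` and `hex_to_name` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

RGB = Tuple[int, int, int]

# A small subset of CSS named colours, for illustration only.
CSS_COLOURS: Dict[str, RGB] = {
    "black": (0, 0, 0), "white": (255, 255, 255), "red": (255, 0, 0),
    "green": (0, 128, 0), "blue": (0, 0, 255), "yellow": (255, 255, 0),
    "orange": (255, 165, 0), "purple": (128, 0, 128), "gray": (128, 128, 128),
}


class Node:
    def __init__(self, point: RGB, name: str, axis: int,
                 left: Optional["Node"], right: Optional["Node"]) -> None:
        self.point, self.name, self.axis = point, name, axis
        self.left, self.right = left, right


def build(items: List[Tuple[str, RGB]], depth: int = 0) -> Optional[Node]:
    """Recursively split on R, G, B in turn, using the median as the pivot."""
    if not items:
        return None
    axis = depth % 3
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2
    name, point = items[mid]
    return Node(point, name, axis,
                build(items[:mid], depth + 1), build(items[mid + 1:], depth + 1))


def sq_dist(a: RGB, b: RGB) -> int:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node: Optional[Node], target: RGB,
            best: Optional[Node] = None) -> Optional[Node]:
    """Nearest-neighbour search with pruning of branches that cannot win."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, target) < sq_dist(best.point, target):
        best = node
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < sq_dist(best.point, target):
        best = nearest(far, target, best)
    return best


def hex_to_name(hexcode: str, tree: Node) -> str:
    h = hexcode.lstrip("#")
    rgb = (int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16))
    return nearest(tree, rgb).name


tree = build(list(CSS_COLOURS.items()))
print(hex_to_name("#fe0102", tree))  # red
```

With a table this small a linear scan would do just as well; the tree only pays off when the colour vocabulary is large or the lookup is repeated many times.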

Original abstract

We introduce SketchDeco, a training-free approach to sketch colourisation that bridges the gap between professional design needs and intuitive, region-based control. Our method empowers artists to use simple masks and colour palettes for precise spatial and chromatic specification, avoiding both the tediousness of manual assignment and the ambiguity of text-based prompts. We reformulate this task as a novel, training-free composition problem. Our core technical contribution is a guided latent-space blending process: we first leverage diffusion inversion to precisely “paint” user-defined colours into specified regions, and then use a custom self-attention mechanism to harmoniously blend these local edits with a globally consistent base image. This ensures both local colour fidelity and global harmony without requiring any model fine-tuning. Our system produces high-quality results in 15-20 inference steps on consumer GPUs, making professional-quality, controllable colourisation accessible.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces SketchDeco, a training-free method for precise sketch colourisation. It reformulates the task as latent-space composition: diffusion inversion is used to inject user-specified colours into masked regions, after which a custom self-attention operator blends the local edits into a globally consistent base image, claiming both local fidelity and global harmony in 15-20 steps without any fine-tuning.

Significance. If the custom self-attention operator demonstrably enforces harmonious blending from local inversion edits across varied sketches and palettes, the work would supply a practical, training-free interface for region-based control that avoids both manual painting and prompt ambiguity, with direct utility for design workflows on consumer hardware.

major comments (2)

[Abstract / core technical contribution paragraph] The central claim that the custom self-attention mechanism produces reliable global harmony without fine-tuning or dataset-specific tuning is load-bearing yet unsupported by any derivation, pseudocode, or ablation; the abstract presents the operator as solving the blending problem but supplies no concrete formulation or comparison to standard cross-attention or feature blending.
[Abstract] No quantitative metrics, ablation studies, or failure-case analysis are reported, so the assertion of 'high-quality results' and the training-free guarantee cannot be evaluated; this directly affects verifiability of the guided latent-space blending pipeline.

minor comments (1)

[Abstract] The phrase 'consumer GPUs' is used without specifying VRAM, batch size, or exact timing measurements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on the manuscript. We address each major comment below, providing clarifications on the technical details and committing to revisions where the presentation can be strengthened.

Point-by-point responses

Referee: [Abstract / core technical contribution paragraph] The central claim that the custom self-attention mechanism produces reliable global harmony without fine-tuning or dataset-specific tuning is load-bearing yet unsupported by any derivation, pseudocode, or ablation; the abstract presents the operator as solving the blending problem but supplies no concrete formulation or comparison to standard cross-attention or feature blending.

Authors: The abstract is written as a concise overview. The concrete formulation of the custom self-attention operator (a modified self-attention that selectively blends query-key-value features from the colour-inverted latent with those of the base image to enforce local fidelity while preserving global consistency), its derivation from standard attention, pseudocode (Algorithm 1), and direct comparisons to cross-attention and feature blending are fully detailed in Section 3.2 with supporting ablations in Section 4.2. We will revise the abstract to include a brief mathematical description of the operator for improved clarity.

revision: partial

Referee: [Abstract] No quantitative metrics, ablation studies, or failure-case analysis are reported, so the assertion of 'high-quality results' and the training-free guarantee cannot be evaluated; this directly affects verifiability of the guided latent-space blending pipeline.

Authors: The current manuscript prioritises qualitative evaluation across diverse sketches, masks, and palettes to highlight the training-free practicality and visual fidelity on consumer hardware. Standard quantitative metrics for colour harmony are limited and often subjective; however, we acknowledge that explicit ablations and failure-case analysis would strengthen verifiability. In the revision we will expand Section 4 with additional ablation studies on the self-attention blending parameters, include perceptual/user-study metrics where feasible, and add a dedicated discussion of observed failure modes.

revision: yes
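
To make the exchange above more concrete, here is a minimal sketch of one common way to let an edited latent attend to a second image: the queries come from the edited latent while keys and values are concatenated from both sources. This is an assumption chosen for illustration, not the paper's operator, whose exact formulation is not given in the text reproduced here. The names `attention` and `injected_attention` are hypothetical, and the tiny pure-Python matrices stand in for real feature maps.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    out = []
    for q_row in q:
        scores = [
            sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d) for k_row in k
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))]
        )
    return out


def injected_attention(q_edit: Matrix, k_edit: Matrix, v_edit: Matrix,
                       k_base: Matrix, v_base: Matrix) -> Matrix:
    """Queries from the edited latent attend over tokens of both latents.

    Assumed scheme: keys and values are concatenated along the token axis.
    """
    return attention(q_edit, k_edit + k_base, v_edit + v_base)


# One query token, one edited token and one base token, feature size 2.
out = injected_attention(
    q_edit=[[0.0, 0.0]],
    k_edit=[[1.0, 0.0]], v_edit=[[1.0, 0.0]],
    k_base=[[0.0, 1.0]], v_base=[[0.0, 1.0]],
)
print(out)  # [[0.5, 0.5]]
```

In the toy call the zero query scores both tokens equally, so the output is the average of the edited and base values; a non-zero query would shift the mix toward whichever keys it matches.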

Circularity Check

0 steps flagged

No significant circularity; method presented as procedural composition without reduction to fitted inputs or self-citations.

full rationale

The paper describes a training-free latent blending process relying on diffusion inversion followed by a custom self-attention operator. No equations, parameters, or claims in the provided text reduce a prediction or result to its own inputs by construction. The approach is framed as a novel procedural pipeline rather than a fitted or self-defined quantity, with no load-bearing self-citations or ansatzes invoked from prior author work. This is the expected honest non-finding for a methods paper whose central claim is an algorithmic composition rather than a derived statistical result.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on standard properties of pre-trained diffusion models and inversion; the custom attention operator is introduced without additional learned parameters in the abstract description.

axioms (2)

domain assumption: Diffusion inversion can be used to insert user-specified colours into designated latent regions while preserving structural information from the sketch.
Invoked as the first step of the guided blending process.
ad hoc to paper: A modified self-attention mechanism can enforce global harmony without retraining the underlying diffusion model.
Central to the second stage of the method.
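
The first axiom leans on diffusion inversion. As a point of reference only, the deterministic DDIM formulation [63] inverts a latent by running the sampler's update in the direction of increasing noise. The notation below ($\bar{\alpha}_t$ for the cumulative noise schedule, $\epsilon_\theta$ for the noise predictor) is the standard one and is not taken from the paper, which may rely on a different solver, so this should be read as an illustrative sketch rather than the authors' procedure.

```latex
\hat{x}_0 = \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}},
\qquad
x_{t+1} = \sqrt{\bar{\alpha}_{t+1}}\,\hat{x}_0 + \sqrt{1-\bar{\alpha}_{t+1}}\,\epsilon_\theta(x_t, t)
```

The inversion is approximate because $\epsilon_\theta(x_t, t)$ stands in for the noise prediction at the next, noisier step, which is not yet available when the update is computed.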

pith-pipeline@v0.9.0 · 5675 in / 1231 out tokens · 22851 ms · 2026-05-24T00:58:27.262191+00:00 · methodology

Reference graph

Works this paper leans on

93 extracted references · 93 canonical work pages · 6 internal anchors

[1] Anime Painter. https://huggingface.co/xinsir/anime-painter. Accessed: 2024-10-06.
[2] CounterfeitXL. https://huggingface.co/gsdf/CounterfeitXL. Accessed: 2024-10-06.
[3] CSS Color Module Level 3. https://www.w3.org/TR/css-color-3/. Accessed: 2024-03-03.
[4] Hyojin Bahng, Seungjoo Yoo, Wonwoong Cho, David K. Park, Ziming Wu, Xiaojuan Ma, and Jaegul Choo. Coloring with Words: Guiding Image Colorization Through Text-based Palette Generation. In ECCV, 2018.
[5] Jon Louis Bentley. Multidimensional binary search trees used for associative searching. Communications of the ACM,
[6] Ayan Kumar Bhunia, Viswanatha Reddy Gajjala, Subhadeep Koley, Rohit Kundu, Aneeshan Sain, Tao Xiang, and Yi-Zhe Song. Doodle It Yourself: Class Incremental Learning by Drawing a Few Sketches. In CVPR, 2022.
[7] Ayan Kumar Bhunia, Subhadeep Koley, Amandeep Kumar, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, and Yi-Zhe Song. Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings. In CVPR, 2023.
[8] Huiwen Chang, Ohad Fried, Yiming Liu, Stephen DiVerdi, and Adam Finkelstein. Palette-based Photo Recoloring. In SIGGRAPH, 2015.
[9] Zheng Chang, Shuchen Weng, Yu Li, Si Li, and Boxin Shi. L-CoDer: Language-Based Colorization with Color-Object Decoupling Transformer. In ECCV, 2022.
[10] Guillaume Charpiat, Matthias Hofmann, and Bernhard Schölkopf. Automatic Image Colorization Via Multimodal Predictions. In ECCV, 2008.
[11] Jianbo Chen, Yelong Shen, Jianfeng Gao, Jingjing Liu, and Xiaodong Liu. Language-Based Image Editing with Recurrent Attentive Models. In CVPR, 2018.
[12] Wengling Chen and James Hays. SketchyGAN: Towards Diverse and Realistic Sketch to Image Synthesis. In CVPR,
[13] Shin-I Cheng, Yu-Jie Chen, Wei-Chen Chiu, Hung-Yu Tseng, and Hsin-Ying Lee. Adaptively-Realistic Image Generation from Stroke and Sketch with Diffusion Model. In WACV, 2023.
[14] Yen-Chi Cheng, Hsin-Ying Lee, Min Sun, and Ming-Hsuan Yang. Controllable Image Synthesis via SegVAE. In ECCV,
[15] Zezhou Cheng, Qingxiong Yang, and Bin Sheng. Deep Colorization. In ICCV, 2015.
[16] Yunjey Choi, Youngjung Uh, Jaejun Yoo, and Jung-Woo Ha. StarGAN v2: Diverse Image Synthesis for Multiple Domains. In CVPR, 2020.
[17] Xiaoyan Cong, Yue Wu, Qifeng Chen, and Chenyang Lei. Automatic Controllable Colorization via Imagination. In CVPR, 2024.
[18] Prafulla Dhariwal and Alex Nichol. Diffusion Models Beat GANs on Image Synthesis. In NeurIPS, 2021.
[19] Mark Everingham, Luc Gool, Christopher K. Williams, John Winn, and Andrew Zisserman. The pascal visual object classes (voc) challenge. Int. J. Comput. Vision, 2010.
[20] Beck Fabian, Dachsbacher Carsten, and Sadlo Filip. A Fast and Efficient Semi-guided Algorithm for Flat Coloring Line-arts. In VMV, 2018.
[21] Chie Furusawa, Kazuyuki Hiroshiba, Keisuke Ogaki, and Yuri Odagiri. Comicolorization: Semi-Automatic Manga Colorization. In SIGGRAPH, 2017.
[22] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative Adversarial Networks. In NeurIPS, 2014.
[23] Cusuh Ham, Gemma Canet Tarres, Tu Bui, James Hays, Zhe Lin, and John Collomosse. CoGS: Controllable Generation and Search from Sketch and Style. In ECCV, 2022.
[24] Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Prompt-to-Prompt Image Editing with Cross Attention Control. In ICLR, 2023.
[25] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. In NeurIPS, 2017.
[26] Jonathan Ho and Tim Salimans. Classifier-Free Diffusion Guidance. In NeurIPS Workshop on Deep Generative Models and Downstream Applications, 2021.
[27] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising Diffusion Probabilistic Models. In NeurIPS, 2020.
[28] Lianghua Huang, Di Chen, Yu Liu, Yujun Shen, Deli Zhao, and Jingren Zhou. Composer: Creative and Controllable Image Synthesis with Composable Conditions. In ICML, 2023.
[29] Junho Kim, Minjae Kim, Hyeonwoo Kang, and Kwang Hee Lee. U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation. In ICLR, 2020.
[30] Suzi Kim and Sunghee Choi. Dynamic Closest Color Warping to Sort and Compare Palettes. In ACM TOG, 2021.
[31] Sungnyun Kim, Junsoo Lee, Kibeom Hong, Daesik Kim, and Namhyuk Ahn. DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models. arXiv preprint arXiv:2305.15194, 2023.
[32] Diederik P Kingma and Max Welling. Auto-Encoding Variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
[33] Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, and Yi-Zhe Song. Picture that Sketch: Photorealistic Image Generation from Abstract Sketches. In CVPR, 2023.
[34] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In NeurIPS, 2012.
[35] Hsin-Ying Lee, Hung-Yu Tseng, Jia-Bin Huang, Maneesh Kumar Singh, and Ming-Hsuan Yang. Diverse Image-to-Image Translation via Disentangled Representations. In ECCV, 2018.
[36] Junsoo Lee, Eungyeup Kim, Yunsung Lee, Dongjun Kim, Jaehyuk Chang, and Jaegul Choo. Reference-Based Sketch Image Colorization using Augmented-Self Reference and Dense Semantic Correspondence. In CVPR, 2020.
[37] Anat Levin, Dani Lischinski, and Yair Weiss. Colorization Using Optimization. In SIGGRAPH, 2004.
[38] Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. arXiv preprint arXiv:2301.12597, 2023.
[39] Yuheng Li, Haotian Liu, Qingyang Wu, Fangzhou Mu, Jianwei Yang, Jianfeng Gao, Chunyuan Li, and Yong Jae Lee. GLIGEN: Open-Set Grounded Text-to-Image Generation. In CVPR, 2023.
[40] Bingchen Liu, Yizhe Zhu, Kunpeng Song, and Ahmed Elgammal. Self-Supervised Sketch-to-Image Synthesis. In AAAI, 2021.
[41] Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual Instruction Tuning. arXiv preprint arXiv:2304.08485,
[42] Runtao Liu, Qian Yu, and Stella Yu. Unsupervised Sketch-to-Photo Synthesis. In ECCV, 2020.
[43] Xiaopei Liu, Liang Wan, Yingge Qu, Tien-Tsin Wong, Stephen Lin, Chi-Sing Leung, and Pheng-Ann Heng. Intrinsic colorization. In SIGGRAPH, 2008.
[44] Zhiheng Liu, Ka Leong Cheng, Xi Chen, Jie Xiao, Hao Ouyang, Kai Zhu, Yu Liu, Yujun Shen, Qifeng Chen, and Ping Luo. MangaNinja: Line Art Colorization with Precise Reference Following. In CVPR, 2025.
[45] Cheng Lu, Yuhao Zhou, Fan Bao, Jianfei Chen, Chongxuan Li, and Jun Zhu. DPM-Solver++: Fast Solver for Guided Sampling of Diffusion Probabilistic Models. arXiv preprint arXiv:2211.01095, 2023.
[46] Shilin Lu, Yanzhu Liu, and Adams Wai-Kin Kong. TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition. In ICCV, 2023.
[47] Yongyi Lu, Shangzhe Wu, Yu-Wing Tai, and Chi-Keung Tang. Image Generation from Sketch Constraint Using Contextual GAN. In ECCV, 2018.
[48] Andreas Lugmayr, Martin Danelljan, Andres Romero, Fisher Yu, Radu Timofte, and Luc Van Gool. RePaint: Inpainting using Denoising Diffusion Probabilistic Models. In CVPR,
[49] Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations. In ICLR, 2022.
[50] Sicheng Mo, Fangzhou Mu, Kuan Heng Lin, Yanli Liu, Bochen Guan, Yin Li, and Bolei Zhou. FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition. arXiv preprint arXiv:2312.07536, 2023.
[51] Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Null-text Inversion for Editing Real Images using Guided Diffusion Models. arXiv preprint arXiv:2211.09794, 2022.
[52] Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, Ying Shan, and Xiaohu Qie. T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models. arXiv preprint arXiv:2302.08453, 2023.
[53] Taesung Park, Ming-Yu Liu, Ting-Chun Wang, and Jun-Yan Zhu. Semantic Image Synthesis with Spatially-Adaptive Normalization. In CVPR, 2019.
[54] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning Transferable Visual Models From Natural Language Supervision. In ICML,
[55] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-Resolution Image Synthesis with Latent Diffusion Models. In CVPR, 2022.
[56] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. In MICCAI, 2015.
[57] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L. Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, Jonathan Ho, David J. Fleet, and Mohammad Norouzi. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. In NeurIPS, 2022.
[58] Aneeshan Sain, Ayan Kumar Bhunia, Vaishnav Potlapalli, Pinaki Nath Chowdhury, Tao Xiang, and Yi-Zhe Song. Sketch3T: Test-time Training for Zero-Shot SBIR. In CVPR,
[59] Patsorn Sangkloy, Jingwan Lu, Chen Fang, Fisher Yu, and James Hays. Scribbler: Controlling Deep Image Synthesis with Sketch and Color. In CVPR, 2017.
[60] Patsorn Sangkloy, Wittawat Jitkrittum, Diyi Yang, and James Hays. A Sketch Is Worth a Thousand Words: Image Retrieval with Text and Sketch. In ECCV, 2022.
[61] Chenyang Si, Ziqi Huang, Yuming Jiang, and Ziwei Liu. FreeU: Free Lunch in Diffusion U-Net. arXiv preprint arXiv:2309.11497, 2023.
[62] Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep Unsupervised Learning using Nonequilibrium Thermodynamics. In ICML,
[63] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising Diffusion Implicit Models. In ICLR, 2021.
[64] Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-Based Generative Modeling through Stochastic Differential Equations. In ICLR, 2021.
[65] Zhuo Su, Wenzhe Liu, Zitong Yu, Dewen Hu, Qing Liao, Qi Tian, Matti Pietikäinen, and Li Liu. Pixel Difference Networks for Efficient Edge Detection. In ICCV, 2021.
[66] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the Inception Architecture for Computer Vision. In CVPR, 2016.
[67] Narek Tumanyan, Michal Geyer, Shai Bagon, and Tali Dekel. Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation. arXiv preprint arXiv:2211.12572, 2022.
[68] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention Is All You Need. In NeurIPS, 2017.
[69] Andrey Voynov, Kfir Aberman, and Daniel Cohen-Or. Sketch-Guided Text-to-Image Diffusion Models. In SIGGRAPH, 2023.
[70] Bram Wallace, Akash Gokul, and Nikhil Naik. EDICT: Exact Diffusion Inversion via Coupled Transformations. In CVPR, 2023.
[71]

GIT: A Generative Image-to-text Transformer for Vision and Language

Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, and Lijuan Wang. GIT: A Generative Image-to-text Transformer for Vision and Language.arXiv preprint arXiv:2205.14100, 2022. 13

work page internal anchor Pith review Pith/arXiv arXiv 2022
[72]

DiffSketching: Sketch Control Image Synthesis with Diffu- sion Models

Qiang Wang, Di Kong, Fengyin Lin, and Yonggang Qi. DiffSketching: Sketch Control Image Synthesis with Diffu- sion Models. InBMVC, 2022. 2

work page 2022
[73]

Sketch Your Own GAN

Sheng-Yu Wang, David Bau, and Jun-Yan Zhu. Sketch Your Own GAN. InICCV, 2021. 2

work page 2021
[74] Shuchen Weng, Hao Wu, Zheng Chang, Jiajun Tang, Si Li, and Boxin Shi. L-CoDe: Language-Based Colorization Using Color-Object Decoupled Conditions. In AAAI, 2022. 2

[75] Shukai Wu, Xiao Yan, Weiming Liu, Shuchang Xu, and Sanyuan Zhang. Self-driven dual-path learning for reference-based line art colorization under limited data. In IEEE TCSVT. IEEE, 2023. 2

[76] Shukai Wu, Yuhang Yang, Shuchang Xu, Weiming Liu, Xiao Yan, and Sanyuan Zhang. FlexIcon: Flexible Icon Colorization via Guided Images and Palettes. In ACM MM, pages 8662–8673, 2023. 2

[77] Yanze Wu, Xintao Wang, Yu Li, Honglun Zhang, Xun Zhao, and Ying Shan. Towards Vivid and Diverse Image Colorization with Generative Color Prior. In ICCV, 2021. 2, 14

[78] Sihan Xu, Ziqiao Ma, Yidong Huang, Honglak Lee, and Joyce Chai. CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation. In NeurIPS,

[79] Zhongyou Xu, Tingting Wang, Faming Fang, Yun Sheng, and Guixu Zhang. Stylization-Based Architecture for Fast Deep Exemplar Colorization. In CVPR, 2020. 2

[80] Dingkun Yan, Liang Yuan, Erwin Wu, Yuma Nishioka, Issei Fujishiro, and Suguru Saito. ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text. In WACV, 2025. 1, 2, 3, 6, 7

