SketchDeco: Training-Free Latent Composition for Precise Sketch Colourisation
Pith reviewed 2026-05-24 00:58 UTC · model grok-4.3
The pith
SketchDeco enables precise sketch colourisation by painting user colours into masked regions via diffusion inversion then blending with custom self-attention, all without training.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a guided latent-space blending process, which first uses diffusion inversion to paint user-defined colours into specified regions and then applies a custom self-attention mechanism to integrate these edits with a globally consistent base image, delivers both local colour fidelity and global harmony without any model fine-tuning.
What carries the argument
The guided latent-space blending process that combines diffusion inversion for precise local colour application with a custom self-attention mechanism for harmonious global integration.
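The composition step named above can be pictured with a minimal sketch. It assumes the simplest possible rule, a hard mask-weighted mix of two latents, and is not the paper's formulation, which is not reproduced in this text. The names `blend_latents`, `base_latent`, `colour_latent` and the 4x8x8 latent shape are hypothetical.

```python
import numpy as np

def blend_latents(base_latent, colour_latent, mask):
    """Composite two latents of shape (C, H, W) using a binary mask of shape (H, W).

    Assumed rule (illustration only): inside the mask take the colour-edited
    latent, outside the mask keep the base latent.
    """
    if base_latent.shape != colour_latent.shape:
        raise ValueError("latents must share a shape")
    if mask.shape != base_latent.shape[1:]:
        raise ValueError("mask must match the latent's spatial size")
    m = mask.astype(base_latent.dtype)[None, :, :]  # broadcast over channels
    return m * colour_latent + (1.0 - m) * base_latent

# Toy data standing in for real diffusion latents.
rng = np.random.default_rng(0)
base = rng.standard_normal((4, 8, 8))
colour = rng.standard_normal((4, 8, 8))
mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 2:5] = True  # one user-specified region

mixed = blend_latents(base, colour, mask)
```

A hard mix like this leaves a seam at the mask boundary, which is the gap the self-attention stage is claimed to close.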
If this is right
- Artists obtain direct spatial and chromatic control through masks and palettes.
- Local edits remain faithful while the full image stays visually coherent.
- No fine-tuning or extra training data is needed.
- High-quality results are reported in 15-20 inference steps on consumer GPUs.
Where Pith is reading between the lines
- The same inversion-plus-attention pattern could support region edits on photographs or other image types.
- Attention customisation might replace fine-tuning for many composition tasks inside pre-trained diffusion models.
- Automatic mask generation from text or sketches could reduce the remaining user effort further.
Load-bearing premise
A custom self-attention mechanism will reliably produce harmonious global blending from local diffusion-inversion edits across varied sketches and colour palettes without any fine-tuning.
What would settle it
Generate outputs on a collection of complex sketches containing multiple adjacent colour regions and check whether the masked areas retain their exact assigned colours while the overall image shows consistent lighting and style.
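The colour-retention half of this check could be scripted roughly as follows. This is a sketch under assumptions: it compares the mean RGB inside each mask with the assigned colour using Euclidean distance, which is one possible measure and not one the paper is known to use. It does not assess lighting or style consistency. The function name `region_colour_error` and the toy image are hypothetical.

```python
import numpy as np

def region_colour_error(image, masks, targets):
    """Per region, the Euclidean RGB distance between the mean colour
    inside the mask and the colour the user assigned to that region."""
    errors = {}
    for name, mask in masks.items():
        if not mask.any():
            raise ValueError(f"mask '{name}' selects no pixels")
        mean_rgb = image[mask].mean(axis=0)  # image[mask] has shape (N, 3)
        target = np.asarray(targets[name], dtype=float)
        errors[name] = float(np.linalg.norm(mean_rgb - target))
    return errors

# Toy 4x4 image with two adjacent colour regions.
image = np.zeros((4, 4, 3), dtype=float)
image[:, :2] = (255, 0, 0)   # left half matches its target exactly
image[:, 2:] = (0, 0, 250)   # right half is slightly off its target
left = np.zeros((4, 4), dtype=bool)
left[:, :2] = True
masks = {"left": left, "right": ~left}
targets = {"left": (255, 0, 0), "right": (0, 0, 255)}

errors = region_colour_error(image, masks, targets)
```

A perceptual colour space would likely be a better basis for a real evaluation; plain RGB is used here only to keep the sketch dependency-free.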
Original abstract
We introduce SketchDeco, a training-free approach to sketch colourisation that bridges the gap between professional design needs and intuitive, region-based control. Our method empowers artists to use simple masks and colour palettes for precise spatial and chromatic specification, avoiding both the tediousness of manual assignment and the ambiguity of text-based prompts. We reformulate this task as a novel, training-free composition problem. Our core technical contribution is a guided latent-space blending process: we first leverage diffusion inversion to precisely "paint" user-defined colours into specified regions, and then use a custom self-attention mechanism to harmoniously blend these local edits with a globally consistent base image. This ensures both local colour fidelity and global harmony without requiring any model fine-tuning. Our system produces high-quality results in 15-20 inference steps on consumer GPUs, making professional-quality, controllable colourisation accessible.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SketchDeco, a training-free method for precise sketch colourisation. It reformulates the task as latent-space composition: diffusion inversion is used to inject user-specified colours into masked regions, after which a custom self-attention operator blends the local edits into a globally consistent base image, claiming both local fidelity and global harmony in 15-20 steps without any fine-tuning.
Significance. If the custom self-attention operator demonstrably enforces harmonious blending from local inversion edits across varied sketches and palettes, the work would supply a practical, training-free interface for region-based control that avoids both manual painting and prompt ambiguity, with direct utility for design workflows on consumer hardware.
major comments (2)
- [Abstract / core technical contribution paragraph] The central claim that the custom self-attention mechanism produces reliable global harmony without fine-tuning or dataset-specific tuning is load-bearing yet unsupported by any derivation, pseudocode, or ablation; the abstract presents the operator as solving the blending problem but supplies no concrete formulation or comparison to standard cross-attention or feature blending.
- [Abstract] No quantitative metrics, ablation studies, or failure-case analysis are reported, so the assertion of 'high-quality results' and the training-free guarantee cannot be evaluated; this directly affects verifiability of the guided latent-space blending pipeline.
minor comments (1)
- [Abstract] The phrase 'consumer GPUs' is used without specifying VRAM, batch size, or exact timing measurements.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback on the manuscript. We address each major comment below, providing clarifications on the technical details and committing to revisions where the presentation can be strengthened.
- Referee: [Abstract / core technical contribution paragraph] The central claim that the custom self-attention mechanism produces reliable global harmony without fine-tuning or dataset-specific tuning is load-bearing yet unsupported by any derivation, pseudocode, or ablation; the abstract presents the operator as solving the blending problem but supplies no concrete formulation or comparison to standard cross-attention or feature blending.
  Authors: The abstract is written as a concise overview. The concrete formulation of the custom self-attention operator (a modified self-attention that selectively blends query-key-value features from the colour-inverted latent with those of the base image to enforce local fidelity while preserving global consistency), its derivation from standard attention, pseudocode (Algorithm 1), and direct comparisons to cross-attention and feature blending are fully detailed in Section 3.2 with supporting ablations in Section 4.2. We will revise the abstract to include a brief mathematical description of the operator for improved clarity.
  Revision: partial
- Referee: [Abstract] No quantitative metrics, ablation studies, or failure-case analysis are reported, so the assertion of 'high-quality results' and the training-free guarantee cannot be evaluated; this directly affects verifiability of the guided latent-space blending pipeline.
  Authors: The current manuscript prioritises qualitative evaluation across diverse sketches, masks, and palettes to highlight the training-free practicality and visual fidelity on consumer hardware. Standard quantitative metrics for colour harmony are limited and often subjective; however, we acknowledge that explicit ablations and failure-case analysis would strengthen verifiability. In the revision we will expand Section 4 with additional ablation studies on the self-attention blending parameters, include perceptual/user-study metrics where feasible, and add a dedicated discussion of observed failure modes.
  Revision: yes
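The first response describes an operator that mixes query-key-value features from two branches. One way such mixing is sometimes done can be sketched as below. This is an assumed variant for illustration and is not the paper's Algorithm 1, which is not reproduced in this text. It uses single-head attention whose keys and values pool tokens from both the base branch and the colour branch. All names and shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def blended_self_attention(q, k_base, v_base, k_colour, v_colour):
    """Single-head attention over keys/values pooled from two branches.

    q: (N, d) queries for the composite image tokens.
    k_*, v_*: (N, d) keys and values from the base and colour branches.
    Returns the attended features (N, d) and the weights (N, 2N).
    """
    k = np.concatenate([k_base, k_colour], axis=0)
    v = np.concatenate([v_base, v_colour], axis=0)
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v, weights

# Toy tokens standing in for U-Net self-attention features.
rng = np.random.default_rng(0)
n, d = 6, 4
q = rng.standard_normal((n, d))
k_base, v_base = rng.standard_normal((n, d)), rng.standard_normal((n, d))
k_colour, v_colour = rng.standard_normal((n, d)), rng.standard_normal((n, d))

out, weights = blended_self_attention(q, k_base, v_base, k_colour, v_colour)
```

Because every query can attend to both branches, each output token is a convex combination of base and colour values. Whether that alone yields the harmony the paper claims is the point the referee asks the authors to demonstrate.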
Circularity Check
No significant circularity; method presented as procedural composition without reduction to fitted inputs or self-citations.
The paper describes a training-free latent blending process relying on diffusion inversion followed by a custom self-attention operator. No equations, parameters, or claims in the provided text reduce a prediction or result to its own inputs by construction. The approach is framed as a novel procedural pipeline rather than a fitted or self-defined quantity, with no load-bearing self-citations or ansatzes invoked from prior author work. This is the expected honest non-finding for a methods paper whose central claim is an algorithmic composition rather than a derived statistical result.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Diffusion inversion can be used to insert user-specified colors into designated latent regions while preserving structural information from the sketch.
- ad hoc to paper: A modified self-attention mechanism can enforce global harmony without retraining the underlying diffusion model.
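For orientation on the first axiom, one widely used form of diffusion inversion is deterministic DDIM inversion. The text above does not say which inversion scheme SketchDeco uses, so the equations below are an assumption chosen for illustration. Here $\bar\alpha_t$ is the cumulative noise schedule and $\epsilon_\theta$ is the pre-trained noise predictor.

```latex
% Predicted clean latent from the current noisy latent x_t
\hat{x}_0(x_t) = \frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar\alpha_t}}

% One inversion step: move from timestep t to the noisier timestep t+1
x_{t+1} = \sqrt{\bar\alpha_{t+1}}\,\hat{x}_0(x_t) + \sqrt{1-\bar\alpha_{t+1}}\,\epsilon_\theta(x_t, t)
```

The step reuses $\epsilon_\theta(x_t, t)$ in place of the unknown $\epsilon_\theta(x_{t+1}, t+1)$, so the inversion is approximate. The resulting reconstruction error is one place the structure-preservation assumption could fail.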