DealMaTe: Multi-Dimensional Material Transfer via Diffusion Transformer
Pith reviewed 2026-05-19 19:08 UTC · model grok-4.3
The pith
DealMaTe transfers materials across objects using depth, normal, and lighting images in a text-free diffusion framework.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DealMaTe is a simplified diffusion framework for material transfer that relies solely on depth, normal, and lighting images. It introduces Multi-Dim 3D Shader LoRA, which adds mutually compatible 3D control conditions without changing the base model weights, and it applies Shader Causal Mutual Attention with key-value caching to cut the latency introduced by multiple condition inputs while preserving output quality.
What carries the argument
Multi-Dim 3D Shader LoRA, a lightweight adapter that injects depth, normal, and lighting information into the diffusion transformer for compatible control.
Load-bearing premise
The lightweight 3D information injection via Multi-Dim 3D Shader LoRA enables compatible control conditions and achieves harmonious and stable results without modifying the base model weights.
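To make the "without modifying the base model weights" premise concrete, here is a minimal sketch of a generic low-rank adapter in the style of standard LoRA. It is not the paper's Multi-Dim 3D Shader LoRA, whose details are not given on this page; the class name `LoRALinear`, the toy weights, and the hand-set adapter update are all hypothetical.

```python
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update B @ A (hypothetical name)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen base weights, never written to
        # A gets small random values; B starts at zero, so the adapter is a no-op at init.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

base_weight = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
layer = LoRALinear(base_weight, rank=1)
x = [1.0, 2.0, 3.0]
out_at_init = layer.forward(x)       # matches the base layer while B is zero
layer.B = [[0.5], [-0.5]]            # stand-in for a training step on the adapter only
out_after_update = layer.forward(x)  # output shifts; base_weight is untouched
```

Only `A` and `B` would receive gradient updates in training, which is why the base model can be shared across several condition types.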
What would settle it
A case showing feature misalignment or unstable outputs when transferring materials under complex geometry or extreme lighting would undercut the claim of reliable high-fidelity performance.
Original abstract
Recently, diffusion-based material transfer methods rely on image fine-tuning or complex architectures with auxiliary networks but face challenges such as text dependency, additional computational costs, and feature misalignment. To address these limitations, we propose DealMaTe, using DEpth, normAl, and Lighting images for MAterial TransfEr. DealMaTe is a simplified diffusion framework that eliminates text guidance and reference networks. We design a lightweight 3D information injection method, Multi-Dim 3D Shader LoRA, which, without modifying the base model weights, enables compatible control conditions and achieves harmonious and stable results. Additionally, we optimize the attention mechanism with Shader Causal Mutual Attention and key-value (KV) caching to reduce inference latency caused by multiple conditions, improve computational efficiency, and achieve high-quality material transfer results with low architectural complexity. Extensive experiments covering a wide variety of objects and lighting conditions consistently demonstrate that DealMaTe achieves remarkable high-fidelity material transfer under arbitrary input materials. The code is available at https://github.com/haha-lisa/DealMaTe.
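The KV caching idea in the abstract can be illustrated with a toy sketch. It assumes, without confirmation from the paper, that condition tokens (depth, normal, lighting) never attend to the noisy image tokens, so their keys and values are the same at every denoising step and can be projected once. The names `ConditionKVCache`, `toy_project_kv`, and `denoise_step` are hypothetical, and the token itself stands in for a learned query projection.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-query scaled dot-product attention over lists of vectors."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * k for q, k in zip(query, key)) for key in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

class ConditionKVCache:
    """Stores condition keys/values once so later steps can reuse them."""

    def __init__(self):
        self.keys = None
        self.values = None
        self.projections = 0  # counts how often condition tokens were projected

    def get(self, condition_tokens, project_kv):
        if self.keys is None:
            pairs = [project_kv(tok) for tok in condition_tokens]
            self.keys = [k for k, _ in pairs]
            self.values = [v for _, v in pairs]
            self.projections += 1
        return self.keys, self.values

def toy_project_kv(token):
    # Hypothetical stand-in for learned key/value projections.
    return [2.0 * t for t in token], [t + 1.0 for t in token]

def denoise_step(image_tokens, condition_tokens, cache):
    cond_k, cond_v = cache.get(condition_tokens, toy_project_kv)
    img_pairs = [toy_project_kv(tok) for tok in image_tokens]
    keys = [k for k, _ in img_pairs] + cond_k
    values = [v for _, v in img_pairs] + cond_v
    # Image queries see image and condition tokens; condition tokens never
    # look at the noisy image, so their cached keys/values stay valid.
    return [attend(tok, keys, values) for tok in image_tokens]

condition_tokens = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # one toy token per condition
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
cache = ConditionKVCache()
for _ in range(4):  # four toy denoising steps
    image_tokens = denoise_step(image_tokens, condition_tokens, cache)
```

Across the four steps the condition tokens are projected once rather than four times, which is where the latency saving from caching would come from under this assumption.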
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents DealMaTe, a diffusion transformer framework for multi-dimensional material transfer that takes depth, normal, and lighting images as conditions. It eliminates text guidance and reference networks, introducing a lightweight Multi-Dim 3D Shader LoRA for 3D information injection without altering base model weights, plus Shader Causal Mutual Attention and KV caching to improve efficiency and reduce latency from multiple conditions. The authors claim that extensive experiments across varied objects and lighting conditions demonstrate remarkable high-fidelity transfer under arbitrary input materials, with code released at the provided GitHub link.
Significance. If the quantitative validation holds, the work offers a simplified, lower-complexity alternative to existing diffusion-based material transfer methods in computer graphics, potentially reducing text dependency, auxiliary network overhead, and feature misalignment. The emphasis on compatible control conditions via LoRA and efficiency optimizations, combined with code availability, supports reproducibility and practical adoption.
major comments (2)
- [Abstract] Abstract: the central claim that 'extensive experiments... consistently demonstrate that DealMaTe achieves remarkable high-fidelity material transfer' provides no quantitative metrics, baselines, ablation results, or failure cases. This is load-bearing for the contribution, as the support for the method's effectiveness cannot be assessed from the stated claims alone.
- [Method] Method (Multi-Dim 3D Shader LoRA description): the assertion that the lightweight injection 'enables compatible control conditions and achieves harmonious and stable results without modifying the base model weights' lacks any derivation, analysis of feature alignment across material distributions, or ablation isolating the LoRA contribution versus the Shader Causal Mutual Attention. This is the least-secured step for the high-fidelity claim under arbitrary inputs.
minor comments (2)
- [Abstract] The acronym expansion for DealMaTe is given but could be stated more explicitly on first use in the title or abstract for clarity.
- [Method] Notation for the attention mechanism (e.g., 'Shader Causal Mutual Attention') would benefit from a brief equation or diagram reference to distinguish it from standard causal attention.
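For reference, the standard causal attention that the last comment asks the paper to distinguish its variant from is usually written as below. This is the textbook form, not the paper's Shader Causal Mutual Attention, whose definition is not reproduced on this page; the symbols Q, K, V, d_k, and the mask M follow common convention and are assumptions here.

```latex
\[
\mathrm{Attn}(Q, K, V)
  = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}} + M\right) V,
\qquad
M_{ij} =
\begin{cases}
0, & j \le i, \\
-\infty, & j > i.
\end{cases}
\]
```

A variant for multiple condition streams would presumably change only the mask M, which is why an explicit equation or diagram would settle the ambiguity.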
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing clarifications and indicating revisions made to strengthen the presentation of our contributions.
Point-by-point responses
- Referee: [Abstract] Abstract: the central claim that 'extensive experiments... consistently demonstrate that DealMaTe achieves remarkable high-fidelity material transfer' provides no quantitative metrics, baselines, ablation results, or failure cases. This is load-bearing for the contribution, as the support for the method's effectiveness cannot be assessed from the stated claims alone.
  Authors: We agree that the abstract, in its original form, presented the experimental outcomes at a high level without specific quantitative support. The full manuscript contains extensive quantitative evaluations, baseline comparisons, and ablation studies in Section 4, but these were not reflected in the abstract. We have revised the abstract to include key metrics (such as PSNR, SSIM, and LPIPS scores against baselines) and a brief reference to the ablation results, while noting that failure cases are analyzed in the supplementary material. This change directly addresses the concern about substantiating the high-fidelity claims.
  Revision: yes
- Referee: [Method] Method (Multi-Dim 3D Shader LoRA description): the assertion that the lightweight injection 'enables compatible control conditions and achieves harmonious and stable results without modifying the base model weights' lacks any derivation, analysis of feature alignment across material distributions, or ablation isolating the LoRA contribution versus the Shader Causal Mutual Attention. This is the least-secured step for the high-fidelity claim under arbitrary inputs.
  Authors: We appreciate this observation regarding the need for more rigorous justification. The original manuscript describes the Multi-Dim 3D Shader LoRA design and its practical benefits in Section 3, including how it injects conditions without altering base weights. However, we acknowledge the value of explicit derivation and isolation. We have added a derivation of the feature alignment mechanism across material distributions in the revised Section 3.2 and included a dedicated ablation study in Section 4.3 that compares variants with and without the LoRA (versus the attention module alone). These additions demonstrate the contribution to harmonious and stable results under arbitrary inputs.
  Revision: yes
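Of the metrics named in the first response, PSNR is the simplest to state exactly. The sketch below computes it for flat lists of pixel intensities; the function name `psnr`, the 8-bit peak value, and the sample pixels are illustrative assumptions and say nothing about the paper's actual evaluation protocol or scores.

```python
import math

def psnr(reference, result, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(result):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, result)) / len(reference)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)

reference = [0, 64, 128, 255]  # made-up ground-truth pixels
result = [0, 60, 130, 250]     # made-up transferred-material pixels
score = psnr(reference, result)
```

Higher PSNR means lower pixel-wise error. SSIM and LPIPS capture structural and perceptual similarity respectively, so reporting all three against baselines would address the referee's request more fully than PSNR alone.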
Circularity Check
No derivation chain; empirical method is self-contained
The paper describes an engineering contribution: a diffusion-based material transfer pipeline that injects depth/normal/lighting via Multi-Dim 3D Shader LoRA and optimizes attention with Shader Causal Mutual Attention. No equations, first-principles derivations, or parameter-fitting steps are presented that could reduce to their own inputs by construction. Claims of compatibility and high-fidelity results are supported by external experiments on varied objects and lighting, not by any self-referential definition or fitted-input prediction. Self-citations, if present, are not load-bearing for any uniqueness theorem or ansatz. The work is therefore scored as having no significant circularity.