arxiv: 2603.04870 · v2 · pith:6SD74NYRnew · submitted 2026-03-05 · 💻 cs.CV

Diffusion-Based sRGB Real Noise Generation via Prompt-Driven Noise Representation Learning

Pith reviewed 2026-05-21 11:57 UTC · model grok-4.3

classification 💻 cs.CV
keywords sRGB noise generation · diffusion models · prompt learning · real noise synthesis · image denoising · camera metadata · generative modeling

The pith

A diffusion model learns prompt features from limited pairs to generate realistic sRGB noise without camera metadata.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Prompt-Driven Noise Generation (PNG) framework that uses diffusion to synthesize realistic noisy sRGB images. It learns high-dimensional prompt features that capture real-world noise characteristics directly from available noisy-clean pairs. This approach targets the scarcity of such pairs and removes the need for camera metadata that previous generative methods require during training and testing. A sympathetic reader would care because successful noise synthesis from limited data could expand the training sets available for real-world denoising models across different devices.

Core claim

The PNG model acquires high-dimensional prompt features that capture the characteristics of real-world input noise and creates a variety of realistic noisy images consistent with the distribution of the input noise, eliminating the dependency on explicit camera metadata.

What carries the argument

High-dimensional prompt features learned by the PNG diffusion model from noisy-clean image pairs to represent and synthesize input noise distributions.

Load-bearing premise

High-dimensional prompt features learned from limited noisy-clean pairs can reliably capture and generalize the full distribution of real-world sRGB noise across unseen devices and conditions without camera metadata.

What would settle it

Generated noisy images that fail to match the noise statistics or visual appearance of real captures from a previously unseen camera device would falsify the claim.
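
One way to make this check concrete is to compare the distribution of noise residuals (noisy pixel minus clean pixel) from real captures against those from generated images. The sketch below is illustrative only: the histogram-based KL divergence, the function names (`histogram`, `kl_divergence`, `noise_kl`), the bin settings, and the Gaussian toy data are all assumptions made here, not the paper's evaluation protocol.

```python
import math
import random


def histogram(values, lo, hi, bins, eps=1e-6):
    """Normalized histogram over [lo, hi]; eps smoothing keeps every bin non-zero."""
    counts = [eps] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), bins - 1)  # clamp outliers into the edge bins
        counts[idx] += 1.0
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def noise_kl(real_noisy, generated_noisy, clean, lo=-64.0, hi=64.0, bins=64):
    """Compare noise residuals (noisy minus clean) of real and generated pixels."""
    real_noise = [n - c for n, c in zip(real_noisy, clean)]
    gen_noise = [n - c for n, c in zip(generated_noisy, clean)]
    return kl_divergence(
        histogram(real_noise, lo, hi, bins),
        histogram(gen_noise, lo, hi, bins),
    )


# Toy data: Gaussian noise stands in for real sensor noise (an assumption).
rng = random.Random(0)
clean = [rng.uniform(0.0, 255.0) for _ in range(5000)]
real = [c + rng.gauss(0.0, 8.0) for c in clean]
matched = [c + rng.gauss(0.0, 8.0) for c in clean]      # same noise level
mismatched = [c + rng.gauss(0.0, 20.0) for c in clean]  # wrong noise level

print("KL(real || matched):   ", noise_kl(real, matched, clean))
print("KL(real || mismatched):", noise_kl(real, mismatched, clean))
```

A generator that reproduces the target noise should score close to the matched case. A large divergence on captures from an unseen camera would be the kind of evidence described above.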

Figures

Figures reproduced from arXiv: 2603.04870 by Dongjin Kim, Guanghui Wang, Jaekyun Ko, Soomin Lee, Tae Hyun Kim.

Figure 1: Noise Generation Comparison. (a) Metadata-dependent approach. (b) Ours.
Figure 2: Overview of the proposed method. (a) Training pipeline. (b) Inference pipeline.
Figure 3: (a) Sketch of the Prompt Autoencoder (PAE). (b) Details of Global and Local Prompt Blocks.
Figure 4: Visualization of synthetic noisy images on the SIDD
Figure 5: Visual comparison on denoising results with PSNR
Original abstract

Denoising in the sRGB image space is challenging due to large noise variability. Although end-to-end methods perform well, their effectiveness in real-world scenarios is limited by the scarcity of real noisy-clean image pairs, which are expensive and difficult to collect. To address this limitation, several generative methods have been developed to synthesize realistic noisy images from limited data. These approaches often rely on camera metadata during both training and testing to synthesize real-world noise. However, the lack of metadata or inconsistencies between devices restricts their usability. Therefore, we propose a novel framework called Prompt-Driven Noise Generation (PNG). This model is capable of acquiring high-dimensional prompt features that capture the characteristics of real-world input noise and creating a variety of realistic noisy images consistent with the distribution of the input noise. By eliminating the dependency on explicit camera metadata, our approach significantly enhances the generalizability and applicability of noise synthesis. Comprehensive experiments reveal that our model effectively produces realistic noisy images and show the successful application of these generated images in removing real-world noise across various benchmark datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes the Prompt-Driven Noise Generation (PNG) framework, a diffusion-based model that learns high-dimensional prompt features directly from input noisy sRGB images to capture real-world noise characteristics and synthesize diverse realistic noisy images matching the input noise distribution. The central contribution is the elimination of explicit camera metadata during both training and inference, with claims of improved generalizability demonstrated via application to denoising benchmarks across multiple datasets.

Significance. If the generalization claims hold, the work would be significant for practical real-world denoising pipelines, as it removes a key practical barrier (metadata availability and device consistency) that limits prior generative noise synthesis methods. The prompt-driven approach to noise representation learning could enable more flexible use of limited noisy-clean pairs for training data augmentation in sRGB space.

major comments (2)
  1. [§3 and §4.2] §3 (method) and §4.2 (cross-dataset experiments): The claim of device-agnostic generalization without metadata is load-bearing, yet the reported results use benchmarks whose device distributions overlap with typical training collections; no ablation is described that trains on one set of devices and evaluates synthesis on completely disjoint unseen devices/conditions to isolate whether the learned prompts truly encode transferable statistics rather than sensor-specific patterns.
  2. [Tables 2-4] Tables 2-4: While performance on denoising benchmarks is asserted, the absence of an explicit metadata-free ablation (e.g., comparing PNG against metadata-dependent baselines when metadata is withheld at test time) leaves the central advantage unquantified relative to prior work.
minor comments (2)
  1. [Abstract] The abstract states that 'comprehensive experiments reveal...' but provides no numerical values, baselines, or error bars; moving a concise quantitative summary (e.g., PSNR/SSIM deltas on key datasets) into the abstract would improve readability.
  2. [§3.1] Notation for the prompt embedding dimension and its relation to the diffusion timestep conditioning should be clarified in §3.1 to avoid ambiguity when readers compare against standard diffusion conditioning schemes.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address the major comments point by point below and outline the revisions we will make to strengthen the evidence supporting our claims of metadata-free generalization.

Point-by-point responses
  1. Referee: [§3 and §4.2] §3 (method) and §4.2 (cross-dataset experiments): The claim of device-agnostic generalization without metadata is load-bearing, yet the reported results use benchmarks whose device distributions overlap with typical training collections; no ablation is described that trains on one set of devices and evaluates synthesis on completely disjoint unseen devices/conditions to isolate whether the learned prompts truly encode transferable statistics rather than sensor-specific patterns.

    Authors: We appreciate the referee pointing out this gap. Our cross-dataset experiments in §4.2 already span multiple real-world datasets collected under varying camera devices and imaging conditions, which provides some evidence of generalization. However, we agree that a dedicated ablation—training the model exclusively on images from one group of devices and evaluating noise synthesis performance on images from completely disjoint devices and conditions—would more rigorously isolate whether the prompt features capture transferable noise statistics. We will add this controlled ablation study to the revised manuscript. revision: yes

  2. Referee: [Tables 2-4] Tables 2-4: While performance on denoising benchmarks is asserted, the absence of an explicit metadata-free ablation (e.g., comparing PNG against metadata-dependent baselines when metadata is withheld at test time) leaves the central advantage unquantified relative to prior work.

    Authors: We concur that directly quantifying the practical benefit of our metadata-free approach requires an explicit comparison. We will add an ablation to Tables 2-4 (and associated text) in which metadata-dependent baseline methods are evaluated with metadata withheld at test time, while PNG operates without any metadata. This will allow a head-to-head quantification of the advantage on the denoising benchmarks. revision: yes
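
The device-disjoint ablation discussed in the first response can be sketched as a leave-one-device-out split. This is a minimal illustration, not the authors' protocol: the function name `leave_one_device_out`, the device labels (`cam_a`, `cam_b`, `cam_c`), and the pair identifiers are hypothetical. Device labels are used here only to build the evaluation split and would not be passed to the model, so the metadata-free setting is preserved.

```python
from collections import defaultdict


def leave_one_device_out(samples):
    """Yield (held_out_device, train, test) folds with disjoint devices.

    samples: iterable of (device_id, pair_id) tuples.
    """
    by_device = defaultdict(list)
    for device, pair_id in samples:
        by_device[device].append(pair_id)

    for held_out in sorted(by_device):
        # Train on every device except the held-out one.
        train = [
            (device, pair_id)
            for device in sorted(by_device)
            if device != held_out
            for pair_id in by_device[device]
        ]
        # Test only on the held-out device, never seen during training.
        test = [(held_out, pair_id) for pair_id in by_device[held_out]]
        yield held_out, train, test


# Hypothetical noisy-clean pair identifiers grouped by capture device.
samples = [
    ("cam_a", "pair_001"), ("cam_a", "pair_002"),
    ("cam_b", "pair_003"),
    ("cam_c", "pair_004"), ("cam_c", "pair_005"),
]

for held_out, train, test in leave_one_device_out(samples):
    print(held_out, "train:", len(train), "test:", len(test))
```

Reporting synthesis quality per held-out fold, rather than pooled over all devices, would show whether performance drops when the test sensor is absent from training.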

Circularity Check

0 steps flagged

No circularity; standard data-driven generative modeling from observed pairs

Full rationale

The abstract and method description present a diffusion model that learns high-dimensional prompt embeddings directly from limited noisy-clean image pairs to match and synthesize noise distributions. This is a conventional supervised generative setup with no quoted equations or steps that reduce a claimed prediction back to its own fitted inputs by construction. The provided text contains no load-bearing uniqueness theorems that rest on self-citation, no ansatz smuggling, and no renaming of known results. Generalization to unseen devices is asserted empirically rather than derived tautologically, leaving the central claim self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The approach rests on standard assumptions in diffusion modeling and prompt learning for generative tasks. Numerous learned parameters are expected in the neural network components, but specific free parameters are not detailed in the abstract.

free parameters (1)
  • prompt feature dimensionality
    High-dimensional prompt features are learned to capture noise characteristics; the exact dimension and related hyperparameters are fitted during training.
axioms (1)
  • domain assumption: Diffusion processes can model the distribution of real sRGB noise when conditioned on learned prompts
    The framework builds the generation process on diffusion models conditioned via prompts.
invented entities (1)
  • Prompt-Driven Noise Generation (PNG) framework (no independent evidence)
    purpose: To synthesize realistic noisy images without camera metadata by learning prompt features
    New model introduced to address limitations of metadata-dependent methods.
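
The domain assumption in this ledger can be written out using the standard conditional denoising-diffusion (DDPM-style) objective. The notation is illustrative and assumed here: $x_0$ is the quantity the model generates, $p$ is the learned prompt feature used as the condition, $\bar\alpha_t$ is the cumulative noise schedule, and $\epsilon_\theta$ is the denoising network. The paper's own parameterization may differ.

```latex
\begin{aligned}
x_t &= \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,
      \qquad \epsilon \sim \mathcal{N}(0, I),\\
\mathcal{L}(\theta) &= \mathbb{E}_{x_0,\,p,\,t,\,\epsilon}
      \left[\,\lVert \epsilon - \epsilon_\theta(x_t, t, p) \rVert_2^2\,\right].
\end{aligned}
```

Under this reading, the assumption amounts to saying that $p$ carries enough information about the noise source for $\epsilon_\theta$ to be trained without any camera metadata as an additional input.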

pith-pipeline@v0.9.0 · 5719 in / 1348 out tokens · 59512 ms · 2026-05-21T11:57:14.638669+00:00 · methodology
