Controllable Histopathology Image Synthesis with Training-free Structural Initialization and Textural Modulation
Pith reviewed 2026-06-29 05:07 UTC · model grok-4.3
The pith
CHIS lets pretrained diffusion models generate histopathology images that follow given structural masks while keeping reference tissue style.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CHIS guides the sampling trajectory of a pretrained diffusion model through structural initialization at the start and textural modulation during generation. The initial noise state is refined by fusing the phase information from a prior mask with the amplitude of Gaussian noise in the frequency domain, yielding a structurally informed starting point. During the reverse diffusion process, coarse- and fine-grained textures are adaptively modulated at different wavelet decomposition levels. This enables outputs that align with prior structural masks while preserving the reference tissue style.
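A minimal NumPy sketch of the phase-amplitude fusion described above, assuming a single-channel mask and a noise array of the same 2-D shape. The function name `phase_amplitude_init` and the `eps` threshold are hypothetical; whether CHIS performs the fusion in pixel or latent space, per channel, or with any rescaling afterwards is not stated in this summary and is left out.

```python
import numpy as np

def phase_amplitude_init(mask, noise, eps=1e-8):
    """Fuse the Fourier phase of `mask` with the Fourier amplitude of `noise`.

    Both inputs are real 2-D arrays of the same shape. The result is real, has
    the amplitude spectrum of `noise` (up to round-off), and carries the phase
    of `mask` wherever the mask spectrum is non-negligible.
    """
    mask = np.asarray(mask, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if mask.shape != noise.shape:
        raise ValueError("mask and noise must have the same shape")
    mask_spec = np.fft.fft2(mask)
    amplitude = np.abs(np.fft.fft2(noise))
    phase = np.angle(mask_spec)
    # Phase is undefined where the mask spectrum is (numerically) zero; use 0 there
    # so the fused spectrum stays conjugate-symmetric and its inverse FFT is real.
    mag = np.abs(mask_spec)
    phase[mag <= eps * mag.max()] = 0.0
    fused_spec = amplitude * np.exp(1j * phase)
    # Any imaginary part left after the inverse FFT is floating-point round-off.
    return np.real(np.fft.ifft2(fused_spec))
```

For example, `phase_amplitude_init(mask, rng.standard_normal(mask.shape))` returns a starting state whose spectral energy matches the Gaussian draw (its amplitudes are unchanged) while its phase, which carries the spatial layout, follows the mask.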
What carries the argument
Frequency-domain fusion of mask phase with noise amplitude for initialization, followed by multi-level adaptive wavelet modulation of texture.
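The wavelet half can be sketched with a hand-written orthonormal Haar transform. The modulation rule below (renormalizing each detail subband toward the mean and standard deviation of the matching reference subband, blended by a per-level weight) is a hypothetical stand-in for CHIS's adaptive rule, which this summary does not specify; the names `haar_dwt2`, `haar_idwt2`, `modulate_textures` and `weights` are illustrative only. Image sides must be divisible by `2 ** len(weights)`.

```python
import numpy as np

def haar_dwt2(x):
    """One level of the orthonormal 2-D Haar transform (even height and width)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    ll = (a + b + c + d) / 2.0   # local average (approximation band)
    lh = (a - b + c - d) / 2.0   # differences between neighbouring columns
    hl = (a + b - c - d) / 2.0   # differences between neighbouring rows
    hh = (a - b - c + d) / 2.0   # diagonal differences
    return ll, (lh, hl, hh)

def haar_idwt2(ll, details):
    """Exact inverse of haar_dwt2."""
    lh, hl, hh = details
    x = np.empty((ll.shape[0] * 2, ll.shape[1] * 2), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

def modulate_textures(x, ref, weights):
    """Hypothetical multi-level texture modulation.

    weights[i] in [0, 1] controls level i + 1 (level 1 = finest detail). Detail
    subbands of x are pulled toward the statistics of the reference subbands;
    the coarsest approximation band of x is kept, so low-frequency layout is untouched.
    """
    details_x, details_r = [], []
    ax, ar = x, ref
    for _ in weights:
        ax, dx = haar_dwt2(ax)
        ar, dr = haar_dwt2(ar)
        details_x.append(dx)
        details_r.append(dr)
    out = ax
    # Rebuild from the coarsest level back to the finest.
    for w, dx, dr in reversed(list(zip(weights, details_x, details_r))):
        new = []
        for bx, br in zip(dx, dr):
            matched = (bx - bx.mean()) / (bx.std() + 1e-8) * br.std() + br.mean()
            new.append((1.0 - w) * bx + w * matched)
        out = haar_idwt2(out, tuple(new))
    return out
```

Keeping the approximation band from `x` is the design choice that lets texture change without overwriting layout, which is what the paper's load-bearing premise requires of the wavelet step.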
If this is right
- Using images generated this way as training data improves downstream segmentation performance on histopathology tasks.
- The same pretrained model works for multiple masks and styles without any fine-tuning or annotated training data.
- Structural control and texture control can be handled separately through the initialization and modulation stages.
- The framework acts as a plug-in that requires no changes to the underlying diffusion model weights.
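Because both stages act only on sampler state, the plug-in structure can be sketched as an ordinary deterministic DDIM loop (Song, Meng and Ermon) with two entry points: the starting state `x_T` and a per-step hook on the predicted clean image. The hook placement (on the x0 estimate rather than on x_t) and all names here (`ddim_sample_with_hooks`, `eps_model`, `modulate`) are assumptions for illustration, not the CHIS code; `eps_model` stands in for any pretrained noise predictor.

```python
import numpy as np

def ddim_sample_with_hooks(eps_model, alphas_bar, x_T, modulate=None):
    """Deterministic DDIM sampling (eta = 0) with an optional per-step hook.

    eps_model(x, t) returns the predicted noise; alphas_bar[t] is the cumulative
    product of (1 - beta) for t = 0..T-1 (decreasing in t). Model weights are
    never touched: structure enters only through x_T, texture only via `modulate`.
    """
    x = x_T
    for t in range(len(alphas_bar) - 1, -1, -1):
        a_t = alphas_bar[t]
        a_prev = alphas_bar[t - 1] if t > 0 else 1.0
        eps = eps_model(x, t)
        # Estimate of the clean image implied by the current noise prediction.
        x0_hat = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        if modulate is not None:
            x0_hat = modulate(x0_hat, t)   # hypothetical texture-modulation hook
        # DDIM update: re-noise the (possibly modulated) estimate to level t - 1.
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps
    return x
```

Recomputing `eps` from the modulated `x0_hat` is an alternative that keeps each DDIM step self-consistent; the sketch keeps the model's own prediction for simplicity.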
Where Pith is reading between the lines
- The same initialization and modulation steps might transfer to diffusion models trained on other medical image types such as radiology scans.
- Testing the method with masks of varying complexity could show how much structural detail the frequency fusion reliably carries through sampling.
- The separation of phase-based structure from wavelet-based texture offers a route to add similar controls to other unconditional generative models.
Load-bearing premise
The phase-amplitude fusion must produce an initial state whose structure survives the full reverse diffusion process, and the wavelet steps must not distort that imposed structure.
What would settle it
Generate images with the phase-fused initialization and wavelet modulation applied, then measure whether the output tissue boundaries and region layouts match the input structural mask at the pixel level.
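One concrete way to score the pixel-level match is to segment the generated image (with any off-the-shelf segmenter; that step is outside this sketch) and compare the resulting binary map to the input mask with Dice and IoU. `dice_iou` is a hypothetical helper, and scoring two empty masks as a perfect match is an assumed convention.

```python
import numpy as np

def dice_iou(pred, target):
    """Dice coefficient and IoU between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    # Convention (assumed): two empty masks agree perfectly.
    dice = 1.0 if total == 0 else 2.0 * inter / total
    iou = 1.0 if union == 0 else inter / union
    return float(dice), float(iou)
```

Region-level overlap like this does not weight thin boundaries heavily, so a boundary-distance metric could complement it when judging tissue edges specifically.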
Original abstract
Deep learning has demonstrated remarkable success in high-throughput histopathology image analysis. However, the performance of learning-based models critically depends on the quality and size of annotations by expert pathologists, which is a resource-intensive and time-consuming process. To address the limitations of data scarcity and annotation burden, several methods have been proposed to synthesize paired histopathology data. Nevertheless, these frameworks typically still require annotation data, albeit in reduced quantities, to impose structural constraints during training. In this work, we present CHIS, a plug-in framework that guides the sampling trajectory of a pretrained diffusion model through two key stages: structural initialization at the start and textural modulation during generation. The initial noise state is refined by fusing the phase information from a prior mask with the amplitude of Gaussian noise in the frequency domain, yielding a structurally informed starting point. During the reverse diffusion process, we adaptively modulate both coarse-grained and fine-grained textures at different wavelet decomposition levels. This enables a diffusion model pretrained solely on unlabeled images to generate outputs that align with prior structural masks while preserving the reference tissue style. We conducted extensive experiments demonstrating the superiority of CHIS in generation fidelity and its substantial benefits for downstream segmentation tasks. Code is available at https://github.com/IBIL-Code/CHIS.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CHIS, a training-free plug-in framework for controllable histopathology image synthesis. It modifies the sampling trajectory of a diffusion model pretrained only on unlabeled images via two stages: (1) structural initialization that fuses the phase spectrum of a provided mask with the amplitude of Gaussian noise in the frequency domain to create a structurally informed starting latent, and (2) textural modulation that adaptively adjusts coarse- and fine-grained textures at multiple wavelet decomposition levels during reverse diffusion. The central claim is that this produces outputs aligned with the input structural masks while preserving reference tissue style, yielding higher fidelity than prior methods and measurable gains on downstream segmentation tasks.
Significance. If the central mechanism holds, the work would offer a practical way to impose structural control on unconditionally pretrained diffusion models without any annotated training data or fine-tuning, addressing a key bottleneck in medical image synthesis. The training-free design and code release are clear strengths. However, the abstract provides no quantitative metrics, baselines, or ablation results, so the magnitude of any improvement and the reliability of the downstream benefits cannot yet be assessed.
major comments (2)
- [Abstract] Abstract: the assertion of 'superiority in generation fidelity and its substantial benefits for downstream segmentation tasks' is made without any reported metrics, baselines, ablation studies, or error analysis. This prevents evaluation of whether the claimed improvements are load-bearing for the contribution.
- [Method] Method (structural initialization): the claim that fusing the mask's phase with the noise amplitude produces a latent whose structure survives the entire unconditional reverse diffusion trajectory is central, yet no guidance signal is injected after t=T. No analysis, ablation on phase retention, or comparison against standard noise initialization is referenced to substantiate that the mask-derived structure survives.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We address each major comment below. Where the comments identify gaps in the current presentation, we will revise the manuscript accordingly.
- Referee: [Abstract] Abstract: the assertion of 'superiority in generation fidelity and its substantial benefits for downstream segmentation tasks' is made without any reported metrics, baselines, ablation studies, or error analysis. This prevents evaluation of whether the claimed improvements are load-bearing for the contribution.
  Authors: We agree that the abstract should contain concrete quantitative support for its claims rather than relying solely on the statement that extensive experiments were performed. The full manuscript already contains the relevant metrics (FID, LPIPS, downstream Dice/IoU gains on segmentation, and comparisons to baselines), but these were not summarized in the abstract. In the revision we will insert the key numerical results and baseline comparisons into the abstract. (Revision: yes)
- Referee: [Method] Method (structural initialization): the claim that fusing the mask's phase with the noise amplitude produces a latent whose structure survives the entire unconditional reverse diffusion trajectory is central, yet no guidance signal is injected after t=T. No analysis, ablation on phase retention, or comparison against standard noise initialization is referenced to substantiate that the mask-derived structure survives.
  Authors: The structural initialization is performed only at t=T and the subsequent trajectory is unconditional; therefore the survival of mask-derived structure must be demonstrated empirically rather than assumed. The current manuscript shows qualitative alignment and downstream task gains, but does not include an explicit ablation that isolates phase retention (e.g., Fourier-phase similarity across denoising steps) or a direct head-to-head comparison against standard Gaussian initialization. We will add this analysis and the corresponding ablation table in the revised version. (Revision: yes)
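The rebuttal does not define its Fourier-phase similarity; one plausible, hypothetical definition is the amplitude-weighted mean cosine of the phase difference between a sample and the mask, with the DC term excluded. It equals 1 when the phases agree, -1 when they are opposite, and sits near 0 for unrelated phase. Tracking it over saved intermediate states of a CHIS run and of a run started from plain Gaussian noise would give the requested head-to-head comparison. The name `phase_similarity` is illustrative.

```python
import numpy as np

def phase_similarity(x, mask, exclude_dc=True):
    """Amplitude-weighted mean cos(phase(x) - phase(mask)) over 2-D Fourier bins."""
    X = np.fft.fft2(np.asarray(x, dtype=np.float64))
    M = np.fft.fft2(np.asarray(mask, dtype=np.float64))
    # Weight each bin by how much structure the mask places there.
    w = np.abs(M)
    if exclude_dc:
        w[0, 0] = 0.0   # mean intensity carries no layout information
    if w.sum() == 0:
        raise ValueError("mask has no non-DC spectral content")
    cos_diff = np.cos(np.angle(X) - np.angle(M))
    return float((w * cos_diff).sum() / w.sum())
```

For instance, `[phase_similarity(x_t, mask) for x_t in trajectory]`, with `trajectory` a hypothetical list of saved denoising states, gives one retention curve per run; brightness and contrast changes leave the score unchanged because only phase enters the comparison.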
Circularity Check
No significant circularity; procedural modifications are independent of inputs
The paper describes a training-free plug-in method consisting of frequency-domain phase fusion for structural initialization and multi-level wavelet amplitude modulation for texture control. These are presented as direct procedural alterations to the sampling trajectory of an existing unconditionally pretrained diffusion model. No equations, parameters, or claims are shown to reduce by construction to fitted values, self-definitions, or load-bearing self-citations. The central claim rests on the independent effectiveness of these modifications rather than any renaming or circular justification.
Axiom & Free-Parameter Ledger
axioms (2)
- (standard math) Phase information in the frequency domain encodes the structural layout of an image.
- (standard math) Wavelet decomposition separates coarse and fine textures at different scales.