pith. machine review for the scientific record.

arxiv: 2605.01848 · v1 · submitted 2026-05-03 · 💻 cs.CV · cs.AI

Recognition: unknown

Disentangled Anatomy-Disease Diffusion (DADD) for Controllable Ulcerative Colitis Progression Synthesis


Pith reviewed 2026-05-10 14:59 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords ulcerative colitis · diffusion models · image synthesis · disentangled representations · Mayo endoscopic score · medical image generation · ordinal regression · latent diffusion

The pith

A diffusion model disentangles patient anatomy from disease severity to synthesize controllable ulcerative colitis endoscopy images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Disentangled Anatomy-Disease Diffusion (DADD) to generate images of ulcerative colitis at specific stages of progression while keeping the patient's unique anatomical features. It conditions a latent diffusion model on separate embeddings for anatomy and ordinal disease severity, using a Feature Purifier to remove disease signals from the anatomy representation. The model employs Triple-Pathway Cross-Attention to handle different levels of detail and Delta Steering for controlling the disease transition in a single pass. This results in high-quality synthetic images that can rebalance class distributions and improve classification performance on real data.

Core claim

By training a Feature Purifier to suppress disease-correlated channels in anatomy embeddings from a pretrained encoder and combining them with target disease tokens from an ordinal embedder via resolution-aware cross-attention in a diffusion U-Net, along with a directional Delta Steering vector, the framework can produce high-fidelity endoscopy images at any Mayo score while preserving patient-specific structure.

What carries the argument

The Feature Purifier, a cross-attention-based erasure mechanism that identifies and suppresses disease-correlated channels in anatomy embeddings, combined with the Triple-Pathway Cross-Attention for injecting cleaned anatomy and disease tokens at different resolutions.
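To make the idea of suppressing disease-correlated channels concrete, here is a minimal sketch. It is not the paper's Feature Purifier, which is a learned cross-attention erasure mechanism; it swaps in a much simpler stand-in that zeroes any channel whose Pearson correlation with the MES label exceeds a threshold. The function names, the threshold, and the toy embeddings are all hypothetical.

```python
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (sx * sy)

def channel_gates(embeddings, labels, threshold=0.5):
    """One gate per channel: 0.0 if |corr(channel, label)| > threshold, else 1.0."""
    gates = []
    for c in range(len(embeddings[0])):
        column = [e[c] for e in embeddings]
        gates.append(0.0 if abs(pearson(column, labels)) > threshold else 1.0)
    return gates

def purify(embedding, gates):
    """Apply the gates channel-wise to a single embedding."""
    return [g * v for g, v in zip(gates, embedding)]

# Toy data (hypothetical): channel 0 varies with the patient, channel 1 tracks MES.
embeddings = [[0.9, 0.0], [0.1, 1.0], [0.1, 2.0], [0.9, 3.0]]
mes_labels = [0, 1, 2, 3]

gates = channel_gates(embeddings, mes_labels)
purified = [purify(e, gates) for e in embeddings]
print(gates)      # channel 1 is gated off, channel 0 is kept
print(purified)
```

A hard correlation threshold only removes linearly encoded disease signal, which is one reason a learned mechanism may be preferable in practice.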

If this is right

  • Generates high-fidelity images across all severity levels of the Mayo Endoscopic Score.
  • Rebalances skewed class distributions in training datasets for ulcerative colitis.
  • Enhances performance of downstream classification tasks on real images.
  • Enables training-free control over disease progression direction using ordinal embeddings.
  • Supports synthesis of longitudinal sequences with consistent patient anatomy.
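The rebalancing point can be illustrated with a short sketch that computes how many synthetic images per MES class would be needed to match the largest class. The label counts below are made up for illustration and are not LIMUC statistics; `synthetic_quota` is a hypothetical helper, and matching the majority class is only one possible balancing target.

```python
from collections import Counter

def synthetic_quota(labels, classes=(0, 1, 2, 3)):
    """Number of synthetic images per class needed to match the largest class."""
    counts = Counter(labels)
    target = max(counts[c] for c in classes)
    return {c: target - counts[c] for c in classes}

# Hypothetical, skewed label distribution (not taken from LIMUC).
labels = [0] * 6 + [1] * 3 + [2] * 2 + [3] * 1

quota = synthetic_quota(labels)
print(quota)  # {0: 0, 1: 3, 2: 4, 3: 5}
```

Each quota entry would then be filled by generating images at that target MES from real source images of other classes.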

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such controllable synthesis could allow creation of virtual patient cohorts for testing new treatments without additional real data collection.
  • Extending the disentanglement to other imaging modalities or diseases with ordinal progression might improve model robustness in medical AI.
  • The approach highlights the value of separating structural and pathological features in generative models for medical applications.

Load-bearing premise

The Feature Purifier can reliably suppress disease-correlated channels in the anatomy embeddings without discarding essential structural information.

What would settle it

Generate images for the same patient anatomy at different target disease severities and check whether mucosal patterns and vascularity change as expected while the underlying colon structure stays the same, as verified by expert endoscopists or quantitative metrics.

Figures

Figures reproduced from arXiv: 2605.01848 by Alptekin Temizel, Umut Dundar.

Figure 1.

Figure 2. Framework overview. The input image I is mapped into the latent space by the frozen encoder of the Variational Autoencoder (VAE) [28]. Simultaneously, a frozen CLIP image encoder (ViT-L/14) [27] extracts feature tokens from I, which are then compressed into N anatomy tokens via a trainable Perceiver Resampler [15]. The AOE [18] and Projection Module encode the target MES as a cumulative ordinal embedding i…

Figure 3. Triple Pathway Cross-Attention. All pathways share query $Q = W_Q h$ from U-Net hidden states $h$. The anatomy pathway applies pretrained $W_K$, $W_V$ to purified tokens $e_{\text{clean}}$. The disease pathway applies separate bias-free projections $W^d_K$, $W^d_V$ (warm-started from $W_K$, $W_V$) to AOE tokens $e^t_{\text{aoe}}$. The delta pathway (inference only) processes the ordinal difference $\Delta$ using these same bias-free projections, ensuring $\Delta$=…

Figure 4. Qualitative comparison: IP-AOE vs. DADD-H. Each row shows a source image (MES 0-3, input at left) generated at four target MES levels by the IP-AOE and DADD-H

Figure 5. Steering scale selection. A MES 0 source image generated by DADD-H at various steering scales across target MES levels. At $\lambda_{\text{steer}}=3$, the model produces clearly distinguishable severity levels while preserving anatomical consistency. Lower scales under-modulate and higher scales introduce artifacts.
… modulation by increasing the steering scale ($\lambda_{\text{steer}} > 3$) does not recover the missing pathological textures,…

Figure 6.
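The Figure 3 caption describes three attention pathways sharing one query. The sketch below is a rough, single-query illustration under several assumptions not stated in the source: the three pathway outputs are simply summed, the routing gates are scalars (`g_anat`, `g_dis`), every projection is bias-free (the caption only says this for the disease and delta pathways), and the steering scale defaults to the value 3 mentioned in the Figure 5 caption. All names and toy weights are hypothetical.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    """Scaled dot-product attention for a single query over a list of tokens."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w / total * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

def pathway(q, tokens, W_k, W_v):
    """Bias-free key/value projection followed by attention."""
    keys = [matvec(W_k, t) for t in tokens]
    values = [matvec(W_v, t) for t in tokens]
    return attend(q, keys, values)

def triple_pathway(h, e_clean, e_aoe, delta, W,
                   g_anat=1.0, g_dis=1.0, lam_steer=3.0):
    """Assumed combination: gated sum of anatomy, disease, and delta pathways."""
    q = matvec(W["q"], h)                             # shared query
    out_anat = pathway(q, e_clean, W["k"], W["v"])    # anatomy pathway
    out_dis = pathway(q, e_aoe, W["dk"], W["dv"])     # disease pathway
    out_delta = pathway(q, delta, W["dk"], W["dv"])   # delta reuses disease weights
    return [g_anat * a + g_dis * d + lam_steer * s
            for a, d, s in zip(out_anat, out_dis, out_delta)]

# Toy weights and tokens (hypothetical).
identity = [[1.0, 0.0], [0.0, 1.0]]
W = {"q": identity, "k": identity, "v": identity,
     "dk": [[0.5, 0.0], [0.0, 0.5]], "dv": [[2.0, 0.0], [0.0, 2.0]]}
h = [1.0, 0.0]
e_clean = [[1.0, 0.0], [0.0, 1.0]]
e_aoe = [[1.0, 1.0]]
zero_delta = [[0.0, 0.0]]
steer_delta = [[1.0, 0.0]]

no_steer = triple_pathway(h, e_clean, e_aoe, zero_delta, W)
steered = triple_pathway(h, e_clean, e_aoe, steer_delta, W)
print(no_steer, steered)
```

Because a bias-free linear map sends the zero vector to zero, a zero delta token contributes nothing to the output in this sketch, so the steered and unsteered calls differ only by the scaled delta values.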
Original abstract

Synthesizing longitudinal medical images at controllable disease stages while preserving patient-specific anatomy is hindered by the entanglement of pathological textures and structural features. We address this challenge for ulcerative colitis (UC) endoscopy, where severity follows a continuous ordinal progression along the Mayo Endoscopic Score (MES). Our framework, Disentangled Anatomy-Disease Diffusion (DADD), conditions a latent diffusion model on two complementary embeddings: a pretrained image encoder for patient anatomy and a separately trained ordinal embedder for cumulative disease severity. Since image embeddings inevitably capture disease information, we introduce a Feature Purifier, a cross-attention-based erasure mechanism that identifies and suppresses disease-correlated channels, yielding purified anatomical representations. These cleaned anatomy tokens and target disease tokens are injected into the denoising network via a Triple-Pathway Cross-Attention mechanism with resolution-dependent routing gates. This architecture leverages the U-Net hierarchy, in which different network depths encode global structure versus fine-grained pathological texture. Furthermore, we introduce Delta Steering, a training-free directional signal derived from the ordinal embeddings that enables explicit, single-pass control over disease transitions at inference without requiring additional forward passes. Validated on the LIMUC dataset, our approach produces high-fidelity images across all severity levels and effectively rebalances skewed class distributions, enhancing performance for downstream classification tasks. The dataset is available at zenodo.org/records/5827695 and the code base at github.com/umutdundar99/progressive-stable-diffusion
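The abstract mentions an ordinal embedder for cumulative disease severity and a directional Delta Steering signal derived from the ordinal embeddings. The following sketch shows one simple way such quantities could be built; it is an illustration under assumptions, not the paper's formulation. Here a level-k embedding is a base vector plus the first k increment vectors, and the steering direction is the difference between target and source embeddings. The increments and function names are hypothetical.

```python
def cumulative_embedding(level, base, increments):
    """e_k = base + sum of the first k increment vectors (k = MES level)."""
    e = list(base)
    for inc in increments[:level]:
        e = [a + b for a, b in zip(e, inc)]
    return e

def delta_steering(source, target, base, increments):
    """Direction from the source-level embedding to the target-level embedding."""
    e_source = cumulative_embedding(source, base, increments)
    e_target = cumulative_embedding(target, base, increments)
    return [t - s for t, s in zip(e_target, e_source)]

# Hypothetical 2-D embedding space with one increment per MES step (0->1, 1->2, 2->3).
base = [0.0, 0.0]
increments = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

worsen = delta_steering(0, 3, base, increments)   # [2.0, 2.0]
improve = delta_steering(3, 1, base, increments)  # [-1.0, -2.0]
stay = delta_steering(2, 2, base, increments)     # [0.0, 0.0]
print(worsen, improve, stay)
```

In this construction the delta is zero when source and target levels match, and its sign flips when the direction of progression is reversed.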

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes Disentangled Anatomy-Disease Diffusion (DADD), a latent diffusion framework for synthesizing ulcerative colitis endoscopy images at controllable Mayo Endoscopic Score (MES) severity levels while preserving patient-specific anatomy. It conditions the model on anatomy embeddings from a pretrained image encoder that are purified of disease signals via a cross-attention Feature Purifier, combined with ordinal disease embeddings from a separately trained embedder. These are injected through a Triple-Pathway Cross-Attention mechanism with resolution-dependent routing, and Delta Steering provides training-free directional control over disease transitions at inference. Validation on the LIMUC dataset is claimed to yield high-fidelity images across severity levels and to improve downstream classification by rebalancing skewed distributions.

Significance. If the core disentanglement holds, the work could meaningfully advance controllable medical image synthesis for ordinal disease progression, particularly for data augmentation in imbalanced endoscopic datasets. The training-free Delta Steering mechanism offers a practical advantage for inference-time control, and the overall architecture leverages U-Net hierarchy in a structured way. Reproducible code and dataset links are provided, which strengthens potential impact if quantitative support is added.

major comments (3)
  1. [Methods (Feature Purifier and ordinal embedder subsections)] The central claim of effective disentanglement rests on the Feature Purifier reliably suppressing disease-correlated channels in the anatomy embeddings without discarding essential structural information, yet no quantitative verification is provided (e.g., disease classification accuracy on purified vs. unpurified embeddings, or structural similarity metrics when disease level is held fixed). This is load-bearing for the architecture and downstream claims.
  2. [Experiments and Results] No ablation studies or baseline comparisons are described for the Triple-Pathway Cross-Attention, resolution-dependent routing gates, or Delta Steering; without these, it is unclear whether the reported high-fidelity synthesis and rebalancing benefits are attributable to the proposed components rather than the base diffusion model.
  3. [Methods (ordinal embedder)] The training objective and fitting procedure for the separately trained ordinal embedder are not specified, leaving open whether it accurately captures cumulative MES severity or introduces its own biases into the conditioning.
minor comments (2)
  1. The abstract states validation results but the full manuscript should include explicit quantitative metrics, error analysis, and tables comparing against standard conditional diffusion baselines to support the claims.
  2. Notation for the cross-attention operations and routing gates could be clarified with explicit equations to aid reproducibility.
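Major comment 1 asks for disease classification accuracy on purified versus unpurified embeddings. A minimal sketch of such a probe is given below, using a nearest-centroid classifier on toy two-channel embeddings. The probe type, the data, and the assumption that purification zeroes the second channel are all hypothetical; a real evaluation would use the actual embeddings and a held-out classifier.

```python
def centroids(X, y):
    """Mean embedding per class label."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def predict(x, cents):
    """Label of the nearest centroid; ties go to the smallest label."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(sorted(cents), key=lambda label: dist2(cents[label]))

def probe_accuracy(X_train, y_train, X_test, y_test):
    """Fit centroids on the training split and score on the test split."""
    cents = centroids(X_train, y_train)
    hits = sum(predict(x, cents) == t for x, t in zip(X_test, y_test))
    return hits / len(y_test)

def zero_channel(X, channel):
    """Stand-in for purification: zero out one channel in every embedding."""
    return [[0.0 if i == channel else v for i, v in enumerate(x)] for x in X]

# Toy embeddings (hypothetical): channel 0 = anatomy, channel 1 = disease signal.
X_train = [[0.2, 0.0], [0.8, 0.0], [0.2, 1.0], [0.8, 1.0]]
y_train = [0, 0, 1, 1]
X_test = [[0.4, 0.0], [0.6, 1.0], [0.6, 0.0], [0.4, 1.0]]
y_test = [0, 1, 0, 1]

acc_raw = probe_accuracy(X_train, y_train, X_test, y_test)
acc_purified = probe_accuracy(zero_channel(X_train, 1), y_train,
                              zero_channel(X_test, 1), y_test)
print(acc_raw, acc_purified)  # 1.0 0.5
```

A drop toward chance accuracy after purification would suggest the disease signal was removed, although a weak probe can miss nonlinearly encoded information.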

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important areas for strengthening the validation of our disentanglement claims and component contributions. We address each major comment below and will revise the manuscript to incorporate the suggested additions.

Point-by-point responses
  1. Referee: [Methods (Feature Purifier and ordinal embedder subsections)] The central claim of effective disentanglement rests on the Feature Purifier reliably suppressing disease-correlated channels in the anatomy embeddings without discarding essential structural information, yet no quantitative verification is provided (e.g., disease classification accuracy on purified vs. unpurified embeddings, or structural similarity metrics when disease level is held fixed). This is load-bearing for the architecture and downstream claims.

    Authors: We agree that quantitative verification of the Feature Purifier is essential to support the disentanglement claim. In the revised manuscript, we will add experiments reporting disease classification accuracy on purified versus unpurified anatomy embeddings (using a held-out classifier) to demonstrate suppression of disease signals. We will also include structural similarity metrics (e.g., SSIM and LPIPS) computed on images generated at fixed disease severity but varying patient anatomies to confirm preservation of structural information. These results will be presented in the Experiments section with corresponding tables and analysis. revision: yes

  2. Referee: [Experiments and Results] No ablation studies or baseline comparisons are described for the Triple-Pathway Cross-Attention, resolution-dependent routing gates, or Delta Steering; without these, it is unclear whether the reported high-fidelity synthesis and rebalancing benefits are attributable to the proposed components rather than the base diffusion model.

    Authors: We acknowledge that ablations are necessary to isolate the contributions of the proposed components. While the current manuscript reports overall performance, we will add comprehensive ablation studies in the revision. These will include: (i) the model with and without Triple-Pathway Cross-Attention, (ii) resolution-dependent routing gates versus uniform routing, and (iii) with and without Delta Steering. We will report quantitative metrics including FID for synthesis quality and downstream classification accuracy improvements to demonstrate the incremental benefits of each element over the base latent diffusion model. revision: yes

  3. Referee: [Methods (ordinal embedder)] The training objective and fitting procedure for the separately trained ordinal embedder are not specified, leaving open whether it accurately captures cumulative MES severity or introduces its own biases into the conditioning.

    Authors: We will expand the Methods section to fully specify the ordinal embedder. It is trained using an ordinal regression objective that enforces monotonic ordering of embeddings across MES levels 0-3, combined with a contrastive term to separate severity clusters. The fitting procedure involves pretraining on the LIMUC dataset with cross-validation to ensure ordinal consistency. We will include the exact loss formulation, architecture details, hyperparameters, and validation metrics (e.g., ranking accuracy) to clarify how it models cumulative disease progression without introducing unintended biases. revision: yes
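The third response names ranking accuracy as a possible validation metric without defining it. One common pairwise definition is sketched here as an assumption: the fraction of image pairs with different MES labels whose scalar severity scores are ordered the same way as the labels. The scores below are made up, and how a scalar score would be derived from the embedder is left open.

```python
from itertools import combinations

def ranking_accuracy(scores, labels):
    """Fraction of differently labelled pairs whose scores follow the label order."""
    correct = total = 0
    for (s_i, y_i), (s_j, y_j) in combinations(zip(scores, labels), 2):
        if y_i == y_j:
            continue  # pairs with equal labels carry no ordering information
        total += 1
        if (s_i - s_j) * (y_i - y_j) > 0:
            correct += 1
    return correct / total if total else 0.0

# Hypothetical severity scores for four images with MES labels 0..3.
scores = [0.1, 0.4, 0.35, 0.9]
labels = [0, 1, 2, 3]

acc = ranking_accuracy(scores, labels)
print(acc)  # 5 of 6 pairs are ordered correctly
```

Only the MES 1 versus MES 2 pair is misordered in this toy input, which gives 5/6.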

Circularity Check

0 steps flagged

No significant circularity in the DADD derivation chain

Full rationale

The paper presents a method that conditions a latent diffusion model on a pretrained image encoder for anatomy and a separately trained ordinal embedder for disease severity, with a cross-attention Feature Purifier and Triple-Pathway mechanism. No equations, derivations, or claims reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations. The approach relies on standard diffusion training, external pretrained components, and validation on the LIMUC dataset rather than internal redefinitions or predictions that are statistically forced by the inputs. The central claims about high-fidelity synthesis and class rebalancing are presented as empirical outcomes, not tautological results.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review limited to abstract; no explicit free parameters, axioms, or invented entities are detailed beyond standard diffusion assumptions and the domain premise that UC severity is ordinal and cumulative.

axioms (1)
  • domain assumption: Ulcerative colitis severity follows a continuous ordinal progression along the Mayo Endoscopic Score.
    Invoked to justify conditioning the model on cumulative disease severity tokens.

pith-pipeline@v0.9.0 · 5567 in / 1295 out tokens · 66302 ms · 2026-05-10T14:59:10.785445+00:00 · methodology

Reference graph

Works this paper leans on

37 extracted references · 8 canonical work pages · 4 internal anchors

  1. [1] Marco Aversa, Gabriel Nobis, Miriam Hägele, Kai Standvoss, Mihaela Chirica, Roderick Murray-Smith, Ahmed Alaa, Lukas Ruff, Daniela Ivanova, Wojciech Samek, Frederick Klauschen, Bruno Sanguinetti, and Luis Oala. Diffinfinite: large mask-image synthesis via parallel random patch diffusion in histopathology. In Proceedings of the 37th International ...

  2. [2] Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, and Stella Biderman. Leace: Perfect linear concept erasure in closed form. ArXiv, abs/2306.03819, 2023.

  3. [3] Umit Mert Caglar, Alperen Inci, Oguz Hanoglu, Gorkem Polat, and Alptekin Temizel. Ulcerative Colitis Mayo Endoscopic Scoring Classification with Active Learning and Generative Data Augmentation. In 2023 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 462–467, Los Alamitos, CA, USA, 2023. IEEE Computer Society.

  4. [4] Pierre Chambon, Christian Bluethgen, Jean-Benoit Delbrouck, Rogier Van der Sluijs, Małgorzata Połacin, Juan Manuel Zambrano Chaves, Tanishq Mathew Abraham, Shivanshu Purohit, Curtis P Langlotz, and Akshay Chaudhari. Roentgen: vision-language foundation model for chest x-ray generation. arXiv preprint arXiv:2211.12737, 2022.

  5. [5] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. Advances in neural information processing systems, 34:8780–8794, 2021.

  6. [6] Maayan Frid-Adar, Idit Diamant, Eyal Klang, Michal Amitai, Jacob Goldberger, and Hayit Greenspan. Gan-based synthetic medical image augmentation for increased cnn performance in liver lesion classification. Neurocomputing, 321:321–331,

  7. [7] Tatiana Gaintseva, Andreea-Maria Oncescu, Chengcheng Ma, Ziquan Liu, Martin Benning, Gregory Slabaugh, Jiankang Deng, and Ismail Elezi. Casteer: Cross-attention steering for controllable concept erasure. arXiv preprint arXiv:2503.09630, 2025.

  8. [8] Ning Han, Zhenyu Ge, Feng Han, Yuhua Sun, Chengqing Li, and Jingjing Chen. Groce: Graph-guided online concept erasure for text-to-image diffusion models. arXiv preprint arXiv:2511.12968, 2025.

  9. [9] Tiankai Hang, Shuyang Gu, Chen Li, Jianmin Bao, Dong Chen, Han Hu, Xin Geng, and Baining Guo. Efficient diffusion training via min-snr weighting strategy. In Proceedings of the IEEE/CVF international conference on computer vision, pages 7441–7451, 2023.

  10. [10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016.

  11. [11] Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Prompt-to-prompt image editing with cross-attention control. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023.

  12. [12] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Neural Information Processing Systems, 2017.

  13. [13] Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. arXiv preprint arXiv:2207.12598, 2022.

  14. [14] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In 34th International Conference on Neural Information Processing Systems, 2020.

  15. [15] Andrew Jaegle, Felix Gimeno, Andy Brock, Oriol Vinyals, Andrew Zisserman, and Joao Carreira. Perceiver: General perception with iterative attention. In International conference on machine learning, pages 4651–4664. PMLR, 2021.

  16. [16] Sadeep Jayasumana, Srikumar Ramalingam, Andreas Veit, Daniel Glasner, Ayan Chakrabarti, and Sanjiv Kumar. Rethinking fid: Towards a better evaluation metric for image generation. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9307–9315,

  17. [17] Amirhossein Kazerouni, Ehsan Khodapanah Aghdam, Moein Heidari, Reza Azad, Mohsen Fayyaz, Ilker Hacihaliloglu, and Dorit Merhof. Diffusion models in medical imaging: A comprehensive survey. Medical Image Analysis, 88:102846,

  18. [18] Meryem Mine Kurt, Ümit Mert Çağlar, and Alptekin Temizel. Progressive disease image generation with ordinal-aware diffusion models. Diagnostics, 15(20), 2025.

  19. [19] Tuomas Kynkäänniemi, Tero Karras, Samuli Laine, Jaakko Lehtinen, and Timo Aila. Improved precision and recall metric for assessing generative models. In Neural Information Processing Systems, 2019.

  20. [20] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019.

  21. [21] Leland McInnes and John Healy. Umap: Uniform manifold approximation and projection for dimension reduction. ArXiv, abs/1802.03426, 2018.

  22. [22] Chong Mou, Xintao Wang, Liangbin Xie, Jing Zhang, Zhongang Qi, Ying Shan, and Xiaohu Qie. T2i-adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models. In AAAI Conference on Artificial Intelligence, 2023.

  23. [23] Kien Nguyen, Anh Tran, and Cuong Pham. Suma: A subspace mapping approach for robust and effective concept erasure in text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 19587–19596, 2025.

  24. [24] Viet Nguyen, Anh Nguyen, Trung Dao, Khoi Nguyen, Cuong Pham, Toan Tran, and Anh Tran. Supercharged one-step text-to-image diffusion models with negative prompts. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18004–18013, 2025.

  25. [25] Gorkem Polat, Haluk Tarik Kani, Ilkay Ergenc, Yesim Ozen Alahdab, Alptekin Temizel, and Ozlen Atug. Improving the computer-aided estimation of ulcerative colitis severity according to mayo endoscopic score by using regression-based deep learning. Inflammatory Bowel Diseases, 29(9):1431–1439, 2022.

  26. [26] Tianhao Qi, Shancheng Fang, Yanze Wu, Hongtao Xie, Jiawei Liu, Lang Chen, Qian He, and Yongdong Zhang. Deadiff: An efficient stylization diffusion model with disentangled representations. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8693–8702,

  27. [27] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, 2021.

  28. [28] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10674–10685, 2022.

  29. [29] Peter J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53–65, 1987.

  30. [30] Kenneth W. Schroeder, William J. Tremaine, and Duane M. Ilstrup. Coated oral 5-aminosalicylic acid therapy for mildly to moderately active ulcerative colitis. New England Journal of Medicine, 317(26):1625–1629, 1987.

  31. [31] Hoo-Chang Shin, Neil A. Tenenholtz, Jameson K. Rogers, Christopher G. Schwarz, Matthew L. Senjem, Jeffrey L. Gunter, Katherine P. Andriole, and Mark Michalski. Medical image synthesis for data augmentation and anonymization using generative adversarial networks. In Simulation and Synthesis in Medical Imaging, 2018.

  32. [32] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021.

  33. [33] Brandon Trabucco, Kyle Doherty, Max Gurinas, and Ruslan Salakhutdinov. Effective data augmentation with diffusion models. ArXiv, abs/2302.07944, 2023.

  34. [34] Peng Xing, Ning Wang, Jianbo Ouyang, and Zechao Li. Inv-adapter: Id customization generation via image inversion and lightweight parameter adapter. IEEE Transactions on Pattern Analysis and Machine Intelligence, 47:9938–9952, 2025.

  35. [35] Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, and Wei Yang. Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv preprint arXiv:2308.06721,

  36. [36] Xin Yi, Ekta Walia, and Paul Babyn. Generative adversarial network in medical imaging: A review. Medical Image Analysis, 58:101552, 2019.

  37. [37] Lvmin Zhang, Anyi Rao, and Maneesh Agrawala. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF international conference on computer vision, pages 3836–3847, 2023.