Multi-scale GANs for Memory-efficient Generation of High Resolution Medical Images

Alex Frydrychowicz; Fabian Jacob; Heinz Handels; Hristina Uzunova; Jan Ehrhardt

arXiv: 1907.01376 · v2 · pith:Z3343MU6new · submitted 2019-07-02 · eess.IV · cs.CV

Hristina Uzunova, Jan Ehrhardt, Fabian Jacob, Alex Frydrychowicz, Heinz Handels

Pith reviewed 2026-05-25 10:52 UTC · model grok-4.3

classification: eess.IV, cs.CV

keywords: multi-scale GAN · patch-based image generation · medical image synthesis · high-resolution 3D images · memory-efficient generation · domain translation · CT and X-ray synthesis

The pith

A multi-scale patch GAN generates arbitrarily large high-resolution medical images while keeping GPU memory demand constant.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a GAN that first learns a low-resolution image and then synthesizes patches at successively higher resolutions, each conditioned on the output of the prior scale. This progressive scheme produces 512x512x512 CT volumes and 2048x2048 X-rays without the memory scaling that normally limits GANs on large medical data. Because memory use stays fixed, the method can in principle create images of any size. It also reports fewer boundary artifacts and higher overall quality than standard patch-wise GANs that do not use cross-scale conditioning.

Core claim

By training a sequence of generators where each higher-resolution stage receives the already-generated lower-resolution image as conditioning input, the approach decouples image size from memory footprint, enabling synthesis of full-resolution 3D thorax CTs and 2D X-rays in a domain-translation setting while avoiding the inconsistencies typical of independent patch generation.

What carries the argument

The multi-scale conditioning chain: each resolution stage generates patches conditioned on the lower-resolution image produced by the preceding stage.
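To make the chain concrete, here is a minimal 2D sketch in Python/NumPy, assuming a domain-translation setup like the paper's. The generators `lr_generator` and `hr_generator` are hypothetical stand-ins (a `tanh` of their inputs, not trained networks); the 2× step between scales, nearest-neighbor upscaling, and image sizes that are multiples of the patch size are all assumptions. What it illustrates is structural: the LR pass sees the whole downsampled image once, and every HR call sees only one fixed-size upscaled patch of the previous scale plus the matching source patch.

```python
import numpy as np

def upsample2x(img):
    """Nearest-neighbor 2x upsampling; a stand-in for whatever upscaling the paper uses."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def downsample(img, f):
    """Block-average downsampling by an integer factor f (f=1 returns a copy)."""
    h, w = img.shape
    return img.reshape(h // f, f, w // f, f).mean(axis=(1, 3))

def lr_generator(src_lr):
    """Hypothetical LR generator: translates the whole low-resolution source at once."""
    return np.tanh(src_lr)

def hr_generator(cond, src):
    """Hypothetical HR generator: sees only one fixed-size conditioning/source patch pair."""
    return np.tanh(0.5 * cond + 0.5 * src)

def generate_cascade(source, hr_generators, patch, calls):
    """Translate `source` coarse-to-fine: one LR pass, then one 2x HR stage per generator.

    Every HR call receives a (patch x patch) input; `calls` records those input shapes.
    """
    n_scales = len(hr_generators)
    current = lr_generator(downsample(source, 2 ** n_scales))
    half = patch // 2
    for stage, gen in enumerate(hr_generators, start=1):
        src = downsample(source, 2 ** (n_scales - stage))
        out = np.empty_like(src)
        for i in range(0, src.shape[0], patch):
            for j in range(0, src.shape[1], patch):
                # Upscale only the matching region of the previous scale.
                cond = upsample2x(current[i // 2:i // 2 + half, j // 2:j // 2 + half])
                calls.append(cond.shape)
                out[i:i + patch, j:j + patch] = gen(cond, src[i:i + patch, j:j + patch])
        current = out
    return current

rng = np.random.default_rng(0)
source = rng.standard_normal((256, 256))  # stand-in for a 2D source-domain image
calls = []
result = generate_cascade(source, [hr_generator, hr_generator], patch=32, calls=calls)
print(result.shape, len(calls), set(calls))  # (256, 256) 80 {(32, 32)}
```

The recorded `calls` list is the property the constant-memory claim rests on: adding a stage adds more calls, not larger ones.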

If this is right

Images of any spatial extent become feasible on a fixed GPU because each generator call processes a fixed-size input rather than the whole high-resolution image.
Patch-boundary artifacts are avoided because every high-resolution patch is conditioned on the matching region of a globally generated lower-resolution image instead of being generated independently of its neighbors.
The same cascade design applies to both 2D radiographs and 3D CT volumes in a domain-translation task.
Training and inference remain feasible on hardware that cannot hold an entire high-resolution volume in memory.
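The first and last points can be checked with simple arithmetic. Taking the 32³ high-resolution patch size reported in Figure 2 and assuming non-overlapping patches that tile the volume exactly (the paper's actual tiling may differ), the size of each generator call stays fixed while the number of calls grows with the volume:

```python
def patch_budget(image_size, patch_size, dims=3):
    """Return (voxels per HR generator call, number of calls at the finest scale).

    Assumes non-overlapping cubic patches that tile the image exactly.
    """
    if image_size % patch_size:
        raise ValueError("image_size must be a multiple of patch_size")
    per_call = patch_size ** dims
    n_calls = (image_size // patch_size) ** dims
    return per_call, n_calls

for n in (256, 512, 1024):
    per_call, n_calls = patch_budget(n, 32)
    print(f"{n}^3 volume: {per_call} voxels per call, {n_calls} calls at the finest scale")
# 256^3 volume: 32768 voxels per call, 512 calls at the finest scale
# 512^3 volume: 32768 voxels per call, 4096 calls at the finest scale
# 1024^3 volume: 32768 voxels per call, 32768 calls at the finest scale
```

In other words, constant memory is paid for in runtime: doubling the edge length multiplies the number of finest-scale patch calls by eight.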

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same progressive conditioning might be applied to other generative tasks such as super-resolution or inpainting of medical volumes.
If the conditioning signal is sufficiently informative, the method could reduce the need for explicit overlap or blending steps used in current patch-based pipelines.
Extending the cascade to an additional scale would allow direct generation of even larger images without retraining the entire stack from scratch.

Load-bearing premise

That lower-resolution conditioning will automatically enforce global consistency across high-resolution patches without extra mechanisms that would raise memory use.

What would settle it

Generate a 1024×1024×1024 CT volume with the trained model and measure whether peak GPU memory stays at the level measured for the 512³ case; any systematic increase falsifies the constant-memory claim.
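One way to prototype this measurement without a GPU is to track the peak working memory of a patch-wise stage on the CPU with Python's `tracemalloc`, which also records NumPy array allocations. This is a hypothetical stand-in, not the paper's code: the generator is replaced by `tanh(upscaled patch + noise)`, and the input and output arrays are allocated before tracing starts, mirroring a setup where only the per-patch work lives on the device. On real hardware the analogous numbers would come from PyTorch's `torch.cuda.reset_peak_memory_stats()` and `torch.cuda.max_memory_allocated()`. Note that the sketch deliberately leaves the low-resolution conditioning image out of the measurement; whether that image itself stays bounded for arbitrarily large N is a separate question.

```python
import tracemalloc
import numpy as np

def hr_stage_patchwise(lr_image, out, patch, rng):
    """Fill `out` (twice the size of lr_image) one fixed-size patch at a time.

    The "generator" is a hypothetical stand-in: tanh(upscaled LR patch + noise).
    """
    half = patch // 2
    for i in range(0, out.shape[0], patch):
        for j in range(0, out.shape[1], patch):
            lr_patch = lr_image[i // 2:i // 2 + half, j // 2:j // 2 + half]
            cond = np.repeat(np.repeat(lr_patch, 2, axis=0), 2, axis=1)
            out[i:i + patch, j:j + patch] = np.tanh(cond + rng.standard_normal(cond.shape))

def peak_working_memory(size, patch=32):
    """Peak bytes allocated during generation, excluding the preallocated input and output."""
    rng = np.random.default_rng(0)
    lr_image = rng.standard_normal((size // 2, size // 2))
    out = np.empty((size, size))
    tracemalloc.start()  # only allocations made after this point are counted
    hr_stage_patchwise(lr_image, out, patch, rng)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

small = peak_working_memory(256)
large = peak_working_memory(1024)
print(small, large)  # exact numbers vary with NumPy/Python versions
```

If the two peaks differ substantially, something in the per-patch path scales with image size, which is exactly what the proposed test is meant to catch.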

Figures

Figures reproduced from arXiv: 1907.01376 by Alex Frydrychowicz, Fabian Jacob, Heinz Handels, Hristina Uzunova, Jan Ehrhardt.

**Figure 1.** An overview of our method. Generate the whole image with a low resolution (LR) GAN, then subsequently increase the resolution by generating patches with multiple high resolution (HR) GANs conditioned on the previous scales. Blue: patches of original resolution for the current scale; red: upscaled patches of lower resolution.

**Figure 2.** RAM requirements for 3D GANs. Baselines: DCGAN, Pix2Pix and PGGAN. Dashed lines indicate cubic regression approximation. Our methods, for low-resolution images of size 64³ (LR 64) and high-resolution patches of size 32³ (HR 32), have a constant memory requirement regardless of the image size. Dotted lines indicate sizes under the assumed minimal size 64³. Log-scale is used on the y-axis.

**Figure 3.** Exemplary images from the used datasets and results of the experiments. First row: thorax CTs (zoomed): real B80f image; corresponding B20f; B80f translated to B20f with a standard patch-wise approach; our method. Second row: real low-dose image; low-dose translated to high-dose; real X-ray; generated X-ray.

**Figure 4.** LR generator architecture

**Figure 5.** HR generator architecture

**Figure 6.** LR and HR network discriminator architectures

Original abstract

Currently generative adversarial networks (GANs) are rarely applied to medical images of large sizes, especially 3D volumes, due to their large computational demand. We propose a novel multi-scale patch-based GAN approach to generate large high resolution 2D and 3D images. Our key idea is to first learn a low-resolution version of the image and then generate patches of successively growing resolutions conditioned on previous scales. In a domain translation use-case scenario, 3D thorax CTs of size 512x512x512 and thorax X-rays of size 2048x2048 are generated and we show that, due to the constant GPU memory demand of our method, arbitrarily large images of high resolution can be generated. Moreover, compared to common patch-based approaches, our multi-resolution scheme enables better image quality and prevents patch artifacts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The multi-scale conditioning scheme is a practical tweak for reducing patch artifacts in large medical GANs, but the constant-memory claim for arbitrarily large volumes rests on details about the coarsest scale that are not shown.

The paper's core move is to train a low-resolution GAN first and then generate higher-resolution patches conditioned on the prior scale. This produces 512³ CT volumes and 2048² X-rays while keeping GPU memory flat, and the authors claim fewer boundary artifacts than ordinary patch-based GANs. That engineering choice is the actual novelty here: it is not just another progressive GAN but a successive-conditioning scheme tuned for medical volumes, where global consistency matters for clinical use. The demonstration at real thorax data sizes is useful and shows the method can run where standard full-volume GANs cannot. Credit to the authors for targeting a concrete pain point in data augmentation for 3D imaging.

The soft spot is the memory claim. The abstract says memory demand stays constant, so images can be arbitrarily large, yet the low-resolution base still has to be generated or stored before the higher scales are patched. In 3D that base grows with overall image size unless it is itself produced patch-wise, and nothing in the provided text confirms that step or gives measured memory curves versus N. Without those numbers, or an ablation on the size of the conditioning tensor, the 'arbitrarily large' guarantee is not yet secured. No quantitative metrics, FID scores, or reader studies appear in the abstract either, so the quality improvement is stated but not measured.

This work is for groups already running patch-based or progressive GANs on medical data who need a drop-in way to scale resolution without new hardware. It is coherent on its own terms and engages honestly with practical constraints, so it deserves a serious referee who can check the implementation details and ask for the missing memory and quality numbers. I would send it to review rather than desk reject.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a multi-scale patch-based GAN for generating large high-resolution 2D and 3D medical images. The core idea is to first learn a low-resolution version of the target image and then synthesize patches at successively higher resolutions, each conditioned on the outputs of previous scales. In a domain-translation setting the method is demonstrated on 512×512×512 thorax CT volumes and 2048×2048 thorax X-rays; the authors claim that the scheme maintains constant GPU memory (thereby permitting arbitrarily large images) and yields higher visual quality with fewer patch artifacts than conventional single-scale patch-based GANs.

Significance. If the constant-memory property and the claimed quality gains can be rigorously established, the work would enable practical GAN-based synthesis of full-resolution clinical volumes that currently exceed GPU limits, removing a major barrier to the use of generative models in large-scale medical imaging.

major comments (3)

[Abstract, §3] Abstract and §3 (Method description): The central claim that GPU memory remains constant independent of final image size N is not supported by the given scheme. The low-resolution base is described as being generated first at full spatial extent (scaled only by the fixed down-sampling ratio). For 3-D data this base still scales as O((N/s)^3); unless an additional patch-wise or fixed-size coarsest-scale procedure is introduced, memory cannot stay constant for arbitrarily large N. The manuscript provides neither a description of such a mechanism nor a bound on the size of the conditioning tensor passed to higher-resolution patches.
[Abstract, §4] Abstract and §4 (Experiments): The paper asserts qualitative improvements in image quality and the absence of patch artifacts, yet reports no quantitative metrics (FID, PSNR, SSIM, or perceptual scores), no ablation on the number of scales, and no analysis of inter-patch consistency or failure modes. Without these measurements the central claims of superiority over standard patch-based approaches cannot be evaluated.
[§3] §3 (Multi-scale conditioning): The description does not specify whether the same generator weights are reused across all resolution stages or whether scale-specific retraining occurs. If the latter, the constant-memory guarantee is threatened; if the former, the manuscript must demonstrate that a single set of weights can be conditioned on arbitrarily large low-resolution inputs without memory growth.
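The first major comment is easy to quantify. Below, option (a) keeps a fixed downsampling factor `s` (set to 8 here, an assumption chosen so that 512³ maps to the 64³ LR size in Figure 2), and option (b) keeps the coarsest scale fixed at 64³ and adds HR stages instead (the 2× step per stage is also an assumption):

```python
def lr_base_voxels(n, s, dims=3):
    """Voxels in the LR base if an n^dims image is downsampled by a fixed factor s."""
    return (n // s) ** dims

def hr_stages_needed(n, lr_size=64, step=2):
    """HR stages needed if the coarsest scale stays at lr_size and each stage scales by `step`."""
    stages, size = 0, lr_size
    while size < n:  # integer loop avoids floating-point log rounding
        size *= step
        stages += 1
    return stages

for n in (512, 1024, 2048):
    print(n, lr_base_voxels(n, s=8), hr_stages_needed(n))
# 512 262144 3
# 1024 2097152 4
# 2048 16777216 5
```

Under (a) the LR base grows eightfold with each doubling of N, as the referee notes; under (b) it stays constant, but each extra stage means one more HR generator to train, since Figure 1 describes multiple HR GANs rather than a single shared one.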

minor comments (1)

[Abstract, §4] The abstract states results for both 2-D and 3-D data, but the experimental section should explicitly separate quantitative or qualitative findings for each modality.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the revisions we will make to the manuscript.

Referee: [Abstract, §3] Abstract and §3 (Method description): The central claim that GPU memory remains constant independent of final image size N is not supported by the given scheme. The low-resolution base is described as being generated first at full spatial extent (scaled only by the fixed down-sampling ratio). For 3-D data this base still scales as O((N/s)^3); unless an additional patch-wise or fixed-size coarsest-scale procedure is introduced, memory cannot stay constant for arbitrarily large N. The manuscript provides neither a description of such a mechanism nor a bound on the size of the conditioning tensor passed to higher-resolution patches.

Authors: We acknowledge that the current manuscript description does not explicitly introduce a patch-wise mechanism at the coarsest scale, which is required to rigorously support constant memory for arbitrarily large N. We will revise §3 to add a patch-based generation procedure at the lowest resolution (with the same fixed patch size used at higher scales) and will include explicit bounds on the size of the low-resolution conditioning tensor passed to subsequent stages. This ensures the constant-memory property holds throughout and will be reflected in an updated abstract. revision: yes
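A coarsest-scale patch procedure of the kind promised here could start from a tiling like the following. This is a hypothetical sketch, not the authors' revision: windows of one fixed size cover a volume of any shape, the last window on each axis is clamped to the border, and the overlap (`stride < patch`) is an assumption; how overlapping outputs would be blended is left open.

```python
import itertools
import numpy as np

def tile_starts(length, patch, stride):
    """Start offsets of fixed-size windows covering [0, length); the last window is clamped to the end."""
    if patch > length:
        raise ValueError("patch is larger than the axis")
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] + patch < length:
        starts.append(length - patch)
    return starts

def tiles(shape, patch, stride):
    """Yield slice tuples for patch^ndim windows that together cover an array of any `shape`."""
    per_axis = [tile_starts(n, patch, stride) for n in shape]
    for corner in itertools.product(*per_axis):
        yield tuple(slice(c, c + patch) for c in corner)

# Usage: cover an arbitrarily sized LR volume with 64^3 windows overlapping by 16 voxels.
covered = np.zeros((100, 130, 70), dtype=bool)
for idx in tiles(covered.shape, patch=64, stride=48):
    assert covered[idx].shape == (64, 64, 64)  # every generator call sees the same size
    covered[idx] = True
print(covered.all())  # True
```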
Referee: [Abstract, §4] Abstract and §4 (Experiments): The paper asserts qualitative improvements in image quality and the absence of patch artifacts, yet reports no quantitative metrics (FID, PSNR, SSIM, or perceptual scores), no ablation on the number of scales, and no analysis of inter-patch consistency or failure modes. Without these measurements the central claims of superiority over standard patch-based approaches cannot be evaluated.

Authors: We agree that quantitative metrics and ablations would allow a more rigorous evaluation of the claimed quality improvements. In the revised manuscript we will add FID scores, SSIM, and perceptual metrics for both the 2D and 3D experiments, include an ablation study on the number of scales, and provide an analysis of inter-patch consistency (e.g., boundary continuity metrics) together with observed failure modes. These additions will appear in an expanded §4. revision: yes
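The rebuttal does not define its boundary-continuity metric. One simple candidate, chosen here for illustration rather than taken from the paper or the rebuttal, compares the mean absolute intensity jump across patch seams with the mean jump between neighboring pixels inside patches; a ratio near 1 means seams are indistinguishable from ordinary texture. A 2D sketch assuming non-overlapping square patches:

```python
import numpy as np

def seam_ratio(img, patch):
    """Mean |jump| across patch boundaries divided by the mean |jump| between interior neighbors."""
    dx = np.abs(np.diff(img, axis=1))  # dx[:, k]: jump between columns k and k+1
    dy = np.abs(np.diff(img, axis=0))  # dy[k, :]: jump between rows k and k+1
    col_seam = (np.arange(dx.shape[1]) + 1) % patch == 0
    row_seam = (np.arange(dy.shape[0]) + 1) % patch == 0
    seam = np.concatenate([dx[:, col_seam].ravel(), dy[row_seam, :].ravel()])
    interior = np.concatenate([dx[:, ~col_seam].ravel(), dy[~row_seam, :].ravel()])
    return seam.mean() / interior.mean()

rng = np.random.default_rng(0)
ramp = np.add.outer(np.arange(64.0), np.arange(64.0))  # smooth image: no visible seams
blocky = np.kron(rng.standard_normal((4, 4)), np.ones((16, 16)))  # one level per 16x16 patch
blocky += 0.01 * rng.standard_normal(blocky.shape)
print(seam_ratio(ramp, 16), seam_ratio(blocky, 16))  # 1.0 for the ramp, far above 1 for blocky
```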
Referee: [§3] §3 (Multi-scale conditioning): The description does not specify whether the same generator weights are reused across all resolution stages or whether scale-specific retraining occurs. If the latter, the constant-memory guarantee is threatened; if the former, the manuscript must demonstrate that a single set of weights can be conditioned on arbitrarily large low-resolution inputs without memory growth.

Authors: The method reuses a single set of generator weights across all scales; the network is conditioned on the lower-resolution output via the multi-scale patch mechanism. We will explicitly state this design choice in the revised §3 and will add a short demonstration (memory measurements versus conditioning input size) confirming that patch-wise processing keeps memory constant even when the low-resolution conditioning field grows. This clarification will also address the related memory-bound concern raised in the first comment. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical architecture with no self-referential derivations

The paper describes a multi-scale patch-based GAN design for high-resolution medical image generation. No equations, fitted parameters, or predictions are presented that reduce to inputs by construction. Claims rest on the architectural description and empirical evaluation rather than any self-definitional, fitted-input, or self-citation load-bearing steps. The method is self-contained as an engineering proposal without circular reductions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach assumes standard GAN training dynamics and that multi-scale conditioning will enforce consistency; these are domain assumptions not derived from first principles.

axioms (1)

Domain assumption: multi-scale conditioning on lower resolutions produces globally coherent high-resolution patches without additional regularization.
Invoked in the description of the generation process in the abstract.

pith-pipeline@v0.9.0 · 5682 in / 1201 out tokens · 28711 ms · 2026-05-25T10:52:23.622641+00:00 · methodology

Reference graph

Works this paper leans on

12 extracted references

[1] Chollet, F., et al.: Keras. https://keras.io (2015), last access: July 9, 2019

[2] Denton, E.L., Chintala, S., Szlam, A., Fergus, R.: Deep generative image models using a Laplacian pyramid of adversarial networks. In: Advances in Neural Information Processing Systems, pp. 1486–1494 (2015)

[3] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems, pp. 2672–2680 (2014)

[4] Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5967–5976 (2017)

[5] Kamnitsas, K., Ledig, C., Newcombe, V., Simpson, J., Kane, A., Menon, D., Rueckert, D., Glocker, B.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation (2017)

[6] Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. In: International Conference on Learning Representations (2018)

[7] Lei, Y., Wang, T., Liu, Y., Higgins, K., Tian, S., Liu, T., Mao, H., Shim, H., Curran, W.J., Shu, H.K., Yang, X.: MRI-based synthetic CT generation using deep convolutional neural network. In: SPIE Medical Imaging, vol. 10949 (2019)

[8] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (2017)

[9] Shin, H.C., Tenenholtz, N.A., Rogers, J.K., Schwarz, C.G., Senjem, M.L., Gunter, J.L., Andriole, K.P., Michalski, M.: Medical image synthesis for data augmentation and anonymization using generative adversarial networks. In: Simulation and Synthesis in Medical Imaging, pp. 1–11 (2018)

[10] Wu, J., Zhang, C., Xue, T., Freeman, W.T., Tenenbaum, J.B.: Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In: Advances in Neural Information Processing Systems, pp. 82–90 (2016)

[11] Yu, B., Zhou, L., Wang, L., Fripp, J., Bourgeat, P.: 3D cGAN based cross-modality MR image synthesis for brain tumor segmentation. In: IEEE International Symposium on Biomedical Imaging (ISBI), pp. 626–630 (2018)

[12] Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., Metaxas, D.N.: StackGAN++: Realistic image synthesis with stacked generative adversarial networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1 (2018)
