pith.

arxiv: 2606.20095 · v1 · pith:TTSZC7FK · submitted 2026-06-18 · 💻 cs.CV

Stitching and dimensionality effects on large artificially generated volume datasets

Pith reviewed 2026-06-26 17:55 UTC · model grok-4.3

classification 💻 cs.CV
keywords stitching artifacts · cycleGAN · cryo-EM · FID scores · volume generation · segmentation performance · 2D vs 3D models

The pith

Stitching artifacts in cycleGAN-generated cryo-EM volumes evade FID detection yet reduce downstream mitochondria segmentation accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests how three different stitching methods and the choice of 2D versus 3D patching influence the quality of large volumes produced by cycleGAN models trained on cryo-electron microscopy data. It demonstrates that standard perceptual scores such as FID can rate outputs as acceptable while subtle border mismatches still lower performance on a mitochondria segmentation task. Comparisons show that clean 3D stitching gives only small gains over 2D, gains that may not offset the added compute, while 2D training remains more stable because larger batches fit in memory. Orthogonal ensembling helps only the poorer stitched volumes and adds nothing once stitching quality is already high. The work therefore argues that artifact mitigation must be checked directly against task performance rather than relying on perceptual metrics alone when generating large scientific volumes.

Core claim

When large cryo-EM volumes are assembled from cycleGAN patches, FID scores overlook subtle stitching artifacts that nevertheless lower accuracy on mitochondria segmentation; artifact-free 3D stitching yields only marginal downstream gains over 2D that barely justify the extra cost, while 2D models train more stably from larger batch sizes and orthogonal ensembling improves only low-quality outputs.

What carries the argument

Three stitching approaches combined with 2D versus 3D patch dimensionality inside cycleGAN models, evaluated by FID and by accuracy on a mitochondria segmentation downstream task.
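For reference, the Fréchet Inception Distance (Heusel et al., 2017) fits a Gaussian to feature embeddings of real (r) and generated (g) images (Inception-v3 pooling features in the standard implementation) and measures the Fréchet distance between the two fits, with means $\mu$ and covariances $\Sigma$:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Because $\mu$ and $\Sigma$ summarise whole images, a thin band of mismatched pixels along tile borders may move them only slightly. This is one plausible reading of why FID can miss stitching artifacts, not an explanation the source states.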

If this is right

  • Subtle stitching artifacts can degrade segmentation even when FID reports good perceptual quality.
  • Artifact-free 3D stitching produces only marginal segmentation gains over 2D that may not offset higher computational cost.
  • 2D models train more stably because they allow larger batch sizes.
  • Ensembling predictions from three orthogonal directions improves low-quality stitched volumes but adds no value to already high-quality outputs.
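The orthogonal ensembling in the last bullet can be sketched as below. This is a minimal sketch that assumes it means running a 2D model slice by slice along each of the three volume axes and averaging the three reassembled volumes voxel-wise. The averaging rule is our assumption, and `orthogonal_ensemble` and the predictor callable are hypothetical names.

```python
import numpy as np


def orthogonal_ensemble(volume, predict_slice):
    """Run a 2D per-slice predictor along each axis of a 3D volume and average.

    volume: array of shape (Z, Y, X).
    predict_slice: callable mapping a 2D array to a 2D array of the same shape
        (a hypothetical stand-in for a 2D generator or segmenter).
    """
    outputs = []
    for axis in range(3):
        # Bring the slicing axis to the front so iteration yields 2D slices.
        moved = np.moveaxis(volume, axis, 0)
        pred = np.stack([predict_slice(s) for s in moved], axis=0)
        # Put the axes back in (Z, Y, X) order before combining.
        outputs.append(np.moveaxis(pred, 0, axis))
    # Voxel-wise mean of the three orientation-specific reconstructions.
    return np.mean(np.stack(outputs), axis=0)
```

When the three orientations already agree, the mean changes little. When they disagree, for example because of seams that show up only in some orientations, averaging dilutes the orientation-specific errors. That would be consistent with ensembling helping only the weaker volumes, though this is an interpretation.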

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Developers of generative volume models may need task-specific metrics that directly measure border consistency rather than relying solely on FID.
  • Practical pipelines could prioritize improved 2D stitching techniques over switching to 3D if the marginal accuracy gain remains small.
  • The observed mismatch between perceptual and task metrics could appear in other biomedical generation settings such as denoising or super-resolution of volumes.
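The first bullet's idea of a metric that directly measures border consistency could take a form like the following. This score is hypothetical and illustrative, not a metric from the paper. It assumes square 2D tiles of edge `tile` aligned to the image origin. It compares the mean absolute step between neighbouring pixels across tile borders with the same quantity inside tiles.

```python
import numpy as np


def seam_ratio(image, tile):
    """Hypothetical border-consistency score for a 2D image stitched from
    square tiles of edge `tile` (aligned to the image origin).

    Returns mean |neighbour difference| across tile borders divided by the same
    quantity inside tiles: about 1 for seamless images, >1 when seams are present.
    """
    image = np.asarray(image, dtype=float)
    across, within = [], []
    for axis in (0, 1):
        d = np.moveaxis(np.abs(np.diff(image, axis=axis)), axis, 0)
        # d[j] compares pixel j and j+1 along `axis`; the pair crosses a
        # tile border when j + 1 is a multiple of the tile size.
        crosses = (np.arange(d.shape[0]) + 1) % tile == 0
        across.append(d[crosses].ravel())
        within.append(d[~crosses].ravel())
    return np.concatenate(across).mean() / (np.concatenate(within).mean() + 1e-12)
```

Unlike FID, a score of this kind is blind to global style. It would therefore complement a perceptual metric rather than replace one.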

Load-bearing premise

That the mitochondria segmentation task and the chosen cryo-EM datasets are representative of how stitching artifacts will behave in other generative models and other scientific volume tasks.

What would settle it

Running the same cycleGAN training, stitching variants, and FID-plus-segmentation evaluation on a different volume dataset or a different downstream task such as nuclei counting and checking whether FID still fails to predict the segmentation drop.
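One way to make "checking whether FID still fails to predict the segmentation drop" concrete is a rank correlation across checkpoints. It would pair each checkpoint's FID with its downstream segmentation score (for example Dice or IoU). The choice of Spearman's coefficient is our assumption, as the source names no statistic. The sketch below uses only the standard library.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Lower FID is better, so an FID that predicts segmentation quality should correlate strongly negatively with a higher-is-better score. With only a handful of checkpoints (nine in Figure 4), such a coefficient is noisy and worth reporting with that caveat.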

Figures

Figures reproduced from arXiv: 2606.20095 by Dagmar Kainmüller, Jan Philipp Albrecht, Lucas von Chamier.

Figure 1. Experimental design: 1. Three different stitching methods are compared. 2. Three different types of volume assembly are used, combined with the stitching methods. 3. For each mode of assembly (stitching × volume assembly), volumes are generated for each training checkpoint of the generator (rat-to-human) model, and the FID with respect to the ground-truth (human) domain is determined. Out of 20 checkpoints, the three highest,…
Figure 2. Stitching approaches and generated images. Top: example target (Human FIB-SEM) and input (Rat FIB-SEM). Middle (left): "Generated Images" with different stitching methods: A) tile-and-stitch, B) padded convolutions with overlap, C) valid convolutions, no overlap. "Residuals": constructed by subtracting the normalised images from each other, with A-B and A-C referring to the images in columns …
Figure 3. Effect of stitching on downstream segmentation. Probability maps, masks generated by thresholding, and overlays with the input image are shown in rows 1, 3 and 5 for a full slice of the input volume and in rows 2, 4 and 6 for the insets. The green lines indicate the edges of the tiles used to assemble the input volume. The residuals represent the differences between the respective stitching type (shown on…
Figure 4. Comparison of segmentations between different dimensionality settings in tile-and-stitch. A rat FIB-SEM volume is transformed into the human FIB-SEM domain via three inference settings with different assembly strategies (2D, 3D or orthoslice). Nine checkpoints (3 best, 3 mid-range, 3 worst) were chosen for downstream segmentation performance analysis. FID scores are shown on the generated images, …
Figure 5. Effect of parameter choice on FID. Shown are the FID scores for each generated checkpoint during training on the OpenOrganelle dataset, in the direction HeLa-to-Jurkat, compared with the ground-truth OpenOrganelle Jurkat dataset. Markers represent the mean FID over a triplicate of each checkpoint; error bars indicate one standard deviation around these means. The dotted line represents the baseline FID (63.93) b…
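Figures 4 and 5 describe two bookkeeping steps: averaging FID over a triplicate per checkpoint, then picking the 3 best, 3 mid-range and 3 worst checkpoints. This is a minimal sketch of both under stated assumptions. Lower FID is treated as better. "Mid-range" means the checkpoints centred in the ranking. The error bars use the sample standard deviation. None of these details is confirmed by the captions. The FID values in the usage test are synthetic, not the paper's data.

```python
from statistics import mean, stdev


def summarize(fid_runs):
    """Per-checkpoint mean and sample standard deviation of repeated FID runs.

    fid_runs: dict mapping checkpoint id -> list of FID values (e.g. a triplicate).
    """
    return {ck: (mean(v), stdev(v)) for ck, v in fid_runs.items()}


def pick_checkpoints(summary, k=3):
    """Split checkpoints into the k best, k mid-range and k worst by mean FID.

    Lower FID is treated as better; "mid-range" is taken as the k checkpoints
    centred in the ranking (an assumption).
    """
    ranked = sorted(summary, key=lambda ck: summary[ck][0])  # best (lowest) first
    mid = (len(ranked) - k) // 2
    return {"best": ranked[:k], "mid": ranked[mid:mid + k], "worst": ranked[-k:]}
```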
Original abstract

Generating large images via deep learning requires patching input data to accommodate hardware memory limitations, then assembling output patches, a process that can introduce stitching artifacts when neighboring patches do not align at borders. While these artifacts are known to affect segmentation tasks, their impact on generative models for style transfer remains poorly understood. We investigated three stitching approaches and two patch dimensionalities (2D vs 3D) using cycleGAN models trained on cryo-electron microscopy datasets. We evaluated both perceptual quality and performance on downstream mitochondria segmentation. Our key findings reveal that: (1) FID scores fail to detect subtle stitching artifacts that significantly impact downstream segmentation performance, (2) 3D models with artifact-free stitching marginally outperform 2D models on downstream tasks, though the improvement barely justifies the computational cost, and (3) 2D models train more stably due to larger batch sizes. Additionally, we demonstrate that ensembling predictions from three orthogonal directions can improve low-quality volumes but provides no benefit for high-quality outputs. These results demonstrate that maximizing generative model performance on large scientific datasets requires careful consideration and mitigation of stitching artifacts, and that perceptual metrics alone are insufficient for evaluating domain adaptation quality in biomedical imaging.
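To make the patch-and-assemble step concrete, here is a toy 2D sketch of two of the assembly strategies named in Figure 2. It covers tile-and-stitch (A) and an overlap strategy (B), but not valid-convolution assembly (C). `box_filter` is a hypothetical stand-in for the generator: a 3×3 mean filter whose zero padding mimics padded ("same") convolutions. `overlap_and_crop` is one common way to implement overlap (process each tile with a halo of context, keep only the centre) and may differ from the paper's method. `residual` follows the Figure 2 description of subtracting normalised images, with z-score normalisation as our assumption.

```python
import numpy as np


def box_filter(tile):
    """Toy generator stand-in: 3x3 mean filter with zero ("same") padding.

    Pixels near a tile edge see artificial zeros, which is what creates seams
    when tiles are processed independently.
    """
    h, w = tile.shape
    padded = np.pad(tile.astype(float), 1, mode="constant")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def tile_and_stitch(image, model, tile):
    """Run `model` on non-overlapping tiles and paste the outputs side by side."""
    out = np.zeros(image.shape)
    for y in range(0, image.shape[0], tile):
        for x in range(0, image.shape[1], tile):
            out[y:y + tile, x:x + tile] = model(image[y:y + tile, x:x + tile])
    return out


def overlap_and_crop(image, model, tile, halo):
    """Run `model` on tiles enlarged by `halo` pixels of context and keep each
    tile's centre. With halo >= the model's receptive-field radius (1 here),
    the result matches processing the whole image at once."""
    padded = np.pad(image.astype(float), halo, mode="constant")
    out = np.zeros(image.shape)
    for y in range(0, image.shape[0], tile):
        for x in range(0, image.shape[1], tile):
            h = min(tile, image.shape[0] - y)
            w = min(tile, image.shape[1] - x)
            block = padded[y:y + h + 2 * halo, x:x + w + 2 * halo]
            out[y:y + h, x:x + w] = model(block)[halo:halo + h, halo:halo + w]
    return out


def residual(a, b):
    """Difference of two z-score-normalised images."""
    za = (a - a.mean()) / (a.std() + 1e-8)
    zb = (b - b.mean()) / (b.std() + 1e-8)
    return za - zb
```

With a real generator, per-tile normalisation statistics would be a further source of mismatch that a halo alone does not remove. The toy filter deliberately has no such statistics.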

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper examines three stitching approaches and 2D vs. 3D patch dimensionalities when using cycleGAN for style transfer on cryo-EM volumes to generate large artificially stitched datasets. It reports that FID scores miss subtle border artifacts that degrade downstream mitochondria segmentation performance, that artifact-free 3D stitching yields only marginal gains over 2D at high computational cost, that 2D training is more stable due to batch size, and that ensembling orthogonal predictions improves only low-quality outputs.

Significance. If the empirical discrepancy between FID and segmentation holds, the work supplies concrete evidence that standard perceptual metrics are inadequate for evaluating generative models on scientific volumes and offers practical guidance on stitching and dimensionality trade-offs for large biomedical datasets.

major comments (2)
  1. [Abstract / Experiments] The central claim that 'perceptual metrics alone are insufficient' and that stitching artifacts 'significantly impact downstream segmentation' rests on a single cycleGAN + mitochondria segmentation setup on cryo-EM data; no ablation on other generative architectures (e.g., diffusion models) or other downstream tasks is described, so the asserted generality to 'large scientific volume datasets' is not supported by the evidence.
  2. [Results] Results on 3D vs 2D: the statement that 3D 'marginally outperform[s] 2D ... though the improvement barely justifies the computational cost' requires explicit quantitative comparison (performance delta vs. training/inference FLOPs or wall-clock time); without those numbers the cost-benefit conclusion cannot be assessed.
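A back-of-envelope version of the cost comparison the referee asks for counts multiply-accumulates (MACs) per convolution layer. The patch sizes and channel counts below are hypothetical and chosen only for illustration; they are not the paper's settings.

```python
from math import prod


def conv_macs(out_shape, c_in, c_out, kernel):
    """Multiply-accumulate count of one dense convolution layer (bias ignored).

    out_shape: spatial output shape, e.g. (256, 256) for 2D or (64, 64, 64) for 3D.
    kernel: edge length of the square/cubic kernel; its volume is kernel**ndim.
    """
    return prod(out_shape) * c_in * c_out * kernel ** len(out_shape)


# Hypothetical patch sizes and channel counts, for illustration only.
macs_2d = conv_macs((256, 256), 64, 64, 3)
macs_3d = conv_macs((64, 64, 64), 64, 64, 3)
print(macs_2d, macs_3d, macs_3d / macs_2d)  # 2415919104 28991029248 12.0
```

The 12× factor splits into 3× from the kernel volume (27 vs 9 weights) and 4× from voxels per patch (262,144 vs 65,536). Under a fixed activation-memory budget, the 4× larger patch would also allow roughly a 4× smaller batch, ignoring other memory costs. This is in line with the batch-size argument for 2D stability. Measured wall-clock time would still be needed, since MAC counts ignore kernel efficiency.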
minor comments (1)
  1. [Abstract] The abstract states 'we investigated three stitching approaches' but does not name them; a brief enumeration would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below.

  1. Referee: [Abstract / Experiments] The central claim that 'perceptual metrics alone are insufficient' and that stitching artifacts 'significantly impact downstream segmentation' rests on a single cycleGAN + mitochondria segmentation setup on cryo-EM data; no ablation on other generative architectures (e.g., diffusion models) or other downstream tasks is described, so the asserted generality to 'large scientific volume datasets' is not supported by the evidence.

    Authors: We agree that the experiments are limited to cycleGAN and a single downstream segmentation task on cryo-EM data. The study was designed to examine stitching effects in this representative setting for large biomedical volumes. To address the concern about overgeneralization, we will revise the abstract and discussion sections to clarify the scope of the claims and avoid implying broad applicability to all generative architectures or tasks. revision: yes

  2. Referee: [Results] Results on 3D vs 2D: the statement that 3D 'marginally outperform[s] 2D ... though the improvement barely justifies the computational cost' requires explicit quantitative comparison (performance delta vs. training/inference FLOPs or wall-clock time); without those numbers the cost-benefit conclusion cannot be assessed.

    Authors: We agree that the cost-benefit statement requires supporting quantitative data. In the revised manuscript we will add explicit comparisons of segmentation performance deltas against training and inference costs measured in FLOPs and wall-clock time. revision: yes

Circularity Check

0 steps flagged

No circularity; purely empirical comparison

The paper conducts an experimental study training cycleGAN models on cryo-EM volumes, testing three stitching approaches and 2D vs 3D patch dimensionalities, then measuring FID scores and downstream mitochondria segmentation performance. No derivation chain, fitted parameters renamed as predictions, self-citation load-bearing premises, or ansatzes appear in the described work. All reported findings (FID failing to detect artifacts, marginal 3D gains, etc.) rest on direct empirical measurements rather than any reduction to prior inputs by construction. The study is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are stated in the abstract; the work is an empirical comparison study.

pith-pipeline@v0.9.1-grok · 5740 in / 1094 out tokens · 17479 ms · 2026-06-26T17:55:11.130339+00:00 · methodology
