YOSO: single-frame Gerchberg-Saxton phase retrieval with AI-based data augmentation for in-line holography
Pith reviewed 2026-05-07 07:24 UTC · model grok-4.3
The pith
A neural network generates a second hologram from one frame so the Gerchberg-Saxton algorithm can recover phase for three-dimensional biological samples.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
YOSO employs a multi-scale ResNet to predict an additional hologram at a different defocus distance from a single input hologram. The original and generated holograms feed into the Gerchberg-Saxton iterative procedure, which alternates propagation between the hologram and object planes until the complex field converges. Physics-consistent padding replaces conventional padding to maintain consistency with wave propagation. Experiments on resolution targets, adherent and suspended cells, and mouse brain slices show that the method recovers accurate amplitude and phase, including defocused features in three-dimensional objects, across both lens-based and lensless systems.
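The iterative step described above can be sketched roughly as follows. This is a minimal two-height Gerchberg-Saxton loop under stated assumptions, not the authors' released code: it uses the angular spectrum method as the propagator, alternates between the two hologram planes (one common multi-height variant; the paper's exact scheme may differ), and runs a fixed number of iterations instead of testing convergence. The names `angular_spectrum`, `gs_two_plane`, `z1`, `z2`, and `n_iter` are hypothetical.

```python
import numpy as np

def angular_spectrum(field, dz, wavelength, pixel):
    """Propagate a complex field by distance dz (angular spectrum method)."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel)
    fy = np.fft.fftfreq(ny, d=pixel)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    H = np.exp(1j * kz * dz) * (arg > 0)  # evanescent components are dropped
    return np.fft.ifft2(np.fft.fft2(field) * H)

def gs_two_plane(holo1, holo2, z1, z2, wavelength, pixel, n_iter=50):
    """Two-height GS sketch: keep the propagated phase, enforce measured amplitude.

    holo1 is the recorded intensity at distance z1; holo2 is the second
    intensity at distance z2 (in YOSO this one would be network-generated).
    """
    amp1, amp2 = np.sqrt(holo1), np.sqrt(holo2)
    field1 = amp1.astype(complex)  # assumption: start from zero phase
    for _ in range(n_iter):
        field2 = angular_spectrum(field1, z2 - z1, wavelength, pixel)
        field2 = amp2 * np.exp(1j * np.angle(field2))
        field1 = angular_spectrum(field2, z1 - z2, wavelength, pixel)
        field1 = amp1 * np.exp(1j * np.angle(field1))
    # Back-propagate the plane-1 estimate to the object plane.
    return angular_spectrum(field1, -z1, wavelength, pixel)
```

The returned complex field can be passed to `angular_spectrum` again with any `dz` to refocus numerically to another plane.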
What carries the argument
The YOSO multi-scale ResNet that generates a second defocused hologram from a single input hologram, which is then processed together with the original by the Gerchberg-Saxton algorithm.
If this is right
- The recovered object wave supports numerical propagation to arbitrary planes for refocusing of three-dimensional samples.
- The framework applies equally to lens-based and lensless digital in-line holographic microscopy setups.
- Full-sized holograms can be processed directly without patch division and stitching.
- Training runs once on a standard workstation, after which inference is rapid for new inputs.
- The method handles diverse samples including resolution targets, adherent cells, suspended cells, and tissue slices.
Where Pith is reading between the lines
- The hybrid AI-augmented Gerchberg-Saxton approach could be paired with other iterative phase retrieval methods that rely on multiple views.
- Single-frame operation may shorten acquisition time for dynamic biological processes that change between successive captures.
- Training on natural-image simulations suggests the technique could transfer to other computational imaging tasks where matched experimental training data is limited.
- Embedding the network into acquisition hardware could simplify experimental workflows by removing the requirement for precise multi-height positioning stages.
Load-bearing premise
The network trained only on computer-generated holograms from natural images produces a second hologram whose phase and amplitude errors are small enough for the Gerchberg-Saxton iterations to converge to the correct object wave on real biological specimens.
What would settle it
Direct comparison on identical real biological samples between the phase map obtained from YOSO plus Gerchberg-Saxton and the phase map obtained from conventional multi-height experimental captures would reveal large errors or failed refocusing if the central claim is false.
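One way such a comparison could be scored is a phase RMSE that ignores the arbitrary global phase offset between two reconstructions. The sketch below is an illustration only; the function name `phase_rmse` and the choice of a circular-mean offset correction are assumptions, not something the paper specifies.

```python
import numpy as np

def phase_rmse(phase_test, phase_ref):
    """RMSE between two phase maps in radians, ignoring a global offset."""
    diff = np.exp(1j * (phase_test - phase_ref))
    offset = np.angle(diff.mean())  # circular mean of the phase difference
    residual = np.angle(diff * np.exp(-1j * offset))  # wrapped residual
    return float(np.sqrt(np.mean(residual ** 2)))
```

Here `phase_test` would be the YOSO plus Gerchberg-Saxton phase map and `phase_ref` the map from experimentally captured multi-height holograms of the same field of view, registered to the same pixel grid.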
Original abstract
We present YOSO (You Only Shot Once), a single-frame phase retrieval framework for digital in-line holographic microscopy (DIHM) in which supervised deep learning is used to numerically generate an additional hologram corresponding to a different defocus distance, creating a so-called multi-height dataset, which is then conventionally processed with the well-established Gerchberg-Saxton (GS) algorithm. YOSO is trained on computer-generated data derived from natural images, enabling strong generalization. The selected multi-scale ResNet architecture enables rapid training in under two hours on a mid-range workstation, which is done only once, enabling efficient inference thereafter. We further show that the YOSO network can process inputs of varying spatial dimensions, allowing training on small inputs and direct inference on full-sized holograms while bypassing the patch-and-stitch procedure. A further advantage of YOSO is its physics-consistent hologram padding, which replaces conventional zero or edge-value padding with a physically grounded approach compatible with the GS framework. The YOSO framework is tested on various systems (lens-based and lensless DIHM) and diverse samples: a resolution test target, adherent and suspended biological cells, and a mouse brain slice. The results show that YOSO is compatible with 3D objects and correctly recovers defocused object wave features, enabling holographic postprocessing such as numerical refocusing. The results of this work are available publicly as software for end-to-end implementation.
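To make the training-data idea concrete, the following sketch generates one input/target hologram pair from an image. It rests on several assumptions that the abstract does not confirm: the image is mapped to a pure-phase object with a hypothetical `max_phase` scale, propagation uses the paraxial Fresnel transfer function with periodic boundaries, and the two distances `z1` and `z2` are free choices. The function names are illustrative.

```python
import numpy as np

def fresnel_propagate(field, dz, wavelength, pixel):
    """Paraxial (Fresnel) transfer-function propagation over distance dz."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel)
    fy = np.fft.fftfreq(ny, d=pixel)
    FX, FY = np.meshgrid(fx, fy)
    # The constant phase factor exp(i*2*pi*dz/wavelength) is omitted.
    H = np.exp(-1j * np.pi * wavelength * dz * (FX**2 + FY**2))
    return np.fft.ifft2(np.fft.fft2(field) * H)

def make_training_pair(image, z1, z2, wavelength, pixel, max_phase=1.0):
    """Return (input hologram at z1, target hologram at z2) for one image."""
    img = image.astype(float)
    span = img.max() - img.min()
    norm = (img - img.min()) / span if span > 0 else np.zeros_like(img)
    obj = np.exp(1j * max_phase * norm)  # assumption: phase-only object
    holo_in = np.abs(fresnel_propagate(obj, z1, wavelength, pixel)) ** 2
    holo_target = np.abs(fresnel_propagate(obj, z2, wavelength, pixel)) ** 2
    return holo_in, holo_target
```

A network trained on such pairs would learn the mapping from `holo_in` to `holo_target`; at inference time its output stands in for the second measured hologram.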
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents YOSO, a hybrid single-frame phase retrieval framework for digital in-line holographic microscopy. A multi-scale ResNet, trained exclusively on computer-generated holograms derived from natural images, synthesizes a second hologram at a different defocus distance; the resulting two-height dataset is then processed with the classical Gerchberg-Saxton algorithm. The approach is demonstrated on resolution targets, adherent/suspended cells, and mouse brain slices using both lens-based and lensless DIHM systems, with claims that it recovers defocused object-wave features for 3D objects and enables numerical refocusing. Additional features include variable input-size handling and physics-consistent hologram padding.
Significance. If quantitatively validated, YOSO would provide a practical route to multi-height phase retrieval from single acquisitions, which is valuable for dynamic or live-sample holographic imaging where mechanical scanning is undesirable. The hybrid design (AI data augmentation followed by physics-constrained iteration) and the use of synthetic natural-image training data for claimed generalization are conceptually attractive, as are the reported training speed and public code release. However, the current absence of error metrics, baselines, and ablation studies substantially limits the assessed significance.
major comments (4)
- [Results (biological samples and 3D objects)] Results section (experiments on biological samples and 3D objects): The central claim that YOSO 'correctly recovers defocused object wave features' and is 'compatible with 3D objects' rests entirely on qualitative visual comparisons and refocusing demonstrations. No quantitative metrics (RMSE, SSIM, phase-error histograms, or comparison against ground-truth multi-height GS) or error bars are reported for any real specimen, undermining support for the generalization from natural-image training data to phase-dominant biological objects.
- [Methods (training data and network)] Methods section (training data and network): The ResNet is trained solely on CGHs generated from natural images, yet the manuscript provides no quantitative assessment of domain shift, hologram-generation error, or propagation of those errors through the subsequent GS iterations when the input is a real biological hologram. This is load-bearing for the claim that the AI-generated second hologram has sufficiently small amplitude/phase errors for reliable GS convergence on sparse, depth-varying specimens.
- [Results (system comparisons)] Results section (system comparisons): No baseline comparisons are presented against single-height GS, other single-shot phase-retrieval algorithms, or conventional multi-height acquisition on the same samples. Without such controls it is impossible to quantify the incremental benefit of the AI-augmented dataset or to rule out that observed improvements are due to the GS algorithm alone.
- [Methods / supplementary material] Methods or supplementary material (ablation and sensitivity): The manuscript contains no ablation studies on the effect of hologram-generation error, defocus distance choice, or noise level on GS convergence, nor any sensitivity analysis showing how the claimed physics-consistent padding affects iteration stability. These omissions leave the robustness of the hybrid pipeline unquantified.
minor comments (3)
- [Abstract / Methods] The abstract states that training completes 'in under two hours on a mid-range workstation,' but no hardware specifications, batch sizes, or wall-clock timings appear in the main text or supplementary material.
- [Figures] Figure captions and legends would benefit from explicit indication of which panels show single-height GS, YOSO-augmented GS, and any reference multi-height results, together with scale bars and intensity/phase color maps.
- [Methods (padding)] The description of the physics-consistent padding scheme is high-level; a short equation or pseudocode block showing how the padded field is constructed and why it is compatible with the GS propagator would improve reproducibility.
Simulated Author's Rebuttal
We thank the referee for their constructive review and for acknowledging the conceptual appeal of the hybrid YOSO approach. We have carefully addressed each major comment below. Where the manuscript was lacking in quantitative support or controls, we will incorporate the requested analyses in the revised version and supplementary material. These changes will strengthen the validation of generalization to biological samples and the robustness of the pipeline.
Point-by-point responses
Referee: Results section (experiments on biological samples and 3D objects): The central claim that YOSO 'correctly recovers defocused object wave features' and is 'compatible with 3D objects' rests entirely on qualitative visual comparisons and refocusing demonstrations. No quantitative metrics (RMSE, SSIM, phase-error histograms, or comparison against ground-truth multi-height GS) or error bars are reported for any real specimen, undermining support for the generalization from natural-image training data to phase-dominant biological objects.
Authors: We agree that stronger quantitative support is needed for claims on biological samples. The current manuscript provides RMSE and SSIM metrics for the resolution target (a controlled real specimen) against known ground-truth features. For adherent/suspended cells and the mouse brain slice, pixel-wise ground truth is inherently difficult to obtain without a separate reference measurement. In the revision we will add: (i) quantitative comparison of refocusing sharpness (e.g., gradient magnitude and Tenengrad metrics) between single-height GS and YOSO+GS with error bars across multiple ROIs; (ii) consistency metrics between the two recovered planes; and (iii) an expanded discussion of why direct multi-height GS ground truth was not acquired for the 3D biological specimens. These additions will provide objective measures while transparently noting the limitations of ground-truth availability for thick phase objects.
Revision: partial
Referee: Methods section (training data and network): The ResNet is trained solely on CGHs generated from natural images, yet the manuscript provides no quantitative assessment of domain shift, hologram-generation error, or propagation of those errors through the subsequent GS iterations when the input is a real biological hologram. This is load-bearing for the claim that the AI-generated second hologram has sufficiently small amplitude/phase errors for reliable GS convergence on sparse, depth-varying specimens.
Authors: We accept that a dedicated domain-shift analysis is required. Although natural-image training enables broad generalization, we will add in the revised supplementary material: (i) quantitative error statistics (RMSE in amplitude and phase) of the network on a held-out test set of synthetic holograms generated from phase-dominant objects that mimic cell refractive-index distributions; (ii) propagation of these errors through GS iterations, showing convergence behavior and final phase fidelity; and (iii) a brief comparison of network output statistics on real biological holograms versus the synthetic training distribution. These results will directly quantify the residual errors and their impact on GS reliability for sparse, depth-varying specimens.
Revision: yes
Referee: Results section (system comparisons): No baseline comparisons are presented against single-height GS, other single-shot phase-retrieval algorithms, or conventional multi-height acquisition on the same samples. Without such controls it is impossible to quantify the incremental benefit of the AI-augmented dataset or to rule out that observed improvements are due to the GS algorithm alone.
Authors: We agree that explicit baselines are essential. The revised manuscript will include side-by-side results of single-height GS versus YOSO+GS on all experimental samples (resolution target, cells, brain slice), with quantitative metrics (refocusing sharpness, phase variance in background regions, and visual feature recovery). Where multi-height data were acquired on the same specimens, we will add direct comparison to conventional multi-height GS. For other single-shot methods we will include a representative comparison (e.g., to a transport-of-intensity-equation implementation) on at least the resolution target and one biological sample. These controls will allow readers to isolate the contribution of the AI-augmented second view.
Revision: yes
Referee: Methods or supplementary material (ablation and sensitivity): The manuscript contains no ablation studies on the effect of hologram-generation error, defocus distance choice, or noise level on GS convergence, nor any sensitivity analysis showing how the claimed physics-consistent padding affects iteration stability. These omissions leave the robustness of the hybrid pipeline unquantified.
Authors: We acknowledge the value of these robustness checks. In the revised supplementary material we will add: (i) ablation on synthesized-hologram error magnitude and its effect on GS convergence rate and final phase error; (ii) sensitivity curves for the choice of defocus distance used by the network; (iii) performance under varying input noise levels; and (iv) a direct comparison of physics-consistent padding versus zero/edge padding on iteration stability and residual error after a fixed number of GS iterations. All ablations will be performed on both synthetic and experimental data to quantify the pipeline’s practical robustness.
Revision: yes
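The refocusing-sharpness metric named in the first response can be illustrated with a short Tenengrad sketch. This is one common definition (mean squared Sobel gradient magnitude) written with NumPy only; the rebuttal does not state which variant, normalization, or border handling would be used, so the edge-replicated padding and the function name `tenengrad` are assumptions.

```python
import numpy as np

def tenengrad(img):
    """Tenengrad focus score: mean of squared Sobel gradient magnitude."""
    a = np.pad(img.astype(float), 1, mode="edge")  # replicate border pixels
    # Horizontal Sobel response: right column minus left column.
    gx = (a[:-2, 2:] + 2 * a[1:-1, 2:] + a[2:, 2:]) \
       - (a[:-2, :-2] + 2 * a[1:-1, :-2] + a[2:, :-2])
    # Vertical Sobel response: bottom row minus top row.
    gy = (a[2:, :-2] + 2 * a[2:, 1:-1] + a[2:, 2:]) \
       - (a[:-2, :-2] + 2 * a[:-2, 1:-1] + a[:-2, 2:])
    return float(np.mean(gx ** 2 + gy ** 2))
```

Applied to the amplitude of a refocused field over a region of interest, a higher score indicates sharper edges, so scores from single-height GS and YOSO+GS reconstructions of the same region can be compared directly.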
Circularity Check
No significant circularity: hybrid DL+classical GS pipeline relies on external synthetic training and independent algorithm
Full rationale
The YOSO method trains a multi-scale ResNet exclusively on computer-generated holograms derived from natural images to synthesize a second defocused hologram; this output is then processed by the standard, independently derived Gerchberg-Saxton algorithm whose iterations and convergence criteria are not redefined or fitted within the paper. No equation or claim reduces the recovered object wave to a quantity defined by the network's own parameters, nor does any load-bearing premise rest on a self-citation chain. Generalization to real biological samples is presented as empirical validation rather than a closed derivation, and the physics-consistent padding is an implementation detail compatible with GS rather than a self-referential step. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- ResNet weights
axioms (2)
- domain assumption: The Fresnel propagation model used to generate training holograms accurately represents the physical wave propagation in the target microscope setups.
- domain assumption: Gerchberg-Saxton iterations converge to the correct phase when supplied with a sufficiently accurate second-height hologram.
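For reference, the two assumptions can be written in standard textbook form. The notation below is assumed for illustration and is not taken from the paper: $U$ is the complex field, $(f_x, f_y)$ are spatial frequencies, $\lambda$ is the wavelength, $z$ the propagation distance, and $I_j$ the hologram intensity at plane $j$.

```latex
% Free-space propagation as a linear filter in the Fourier domain
U(x, y; z) = \mathcal{F}^{-1}\!\left\{ \mathcal{F}\{U(x, y; 0)\}(f_x, f_y)\, H(f_x, f_y; z) \right\}

% Angular spectrum transfer function and its paraxial (Fresnel) approximation,
% valid when \lambda^2 (f_x^2 + f_y^2) \ll 1
H_{\mathrm{AS}}(f_x, f_y; z) = \exp\!\left( i\, 2\pi z \sqrt{\lambda^{-2} - f_x^2 - f_y^2} \right)
\approx e^{i 2\pi z / \lambda}\, \exp\!\left( -i \pi \lambda z \,(f_x^2 + f_y^2) \right) = H_{\mathrm{F}}(f_x, f_y; z)

% Gerchberg-Saxton amplitude constraint at hologram plane j
U_j \leftarrow \sqrt{I_j}\; \exp\!\left( i \arg U_j \right)
```

The first assumption concerns how well $H_{\mathrm{F}}$ matches the real optical system; the second concerns whether repeated application of the amplitude constraint at two planes reaches the true phase when one $I_j$ is network-generated.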