YOSO: single-frame Gerchberg-Saxton phase retrieval with AI-based data augmentation for in-line holography

Adam Walocha; Aleksandra Rutkowska; José Ángel Picazo-Bueno; Julianna Winnik; Maciej Trusiak; Maria Cywińska; Marzena Stefaniuk; Mikołaj Rogalski; Piotr Arcab; Vicente Micó; Wiktor Forjasz; Wojciech Ogonowski

arxiv: 2604.27777 · v1 · submitted 2026-04-30 · ⚛️ physics.optics

Julianna Winnik, Adam Walocha, Wojciech Ogonowski, Wiktor Forjasz, Piotr Arcab, Mikołaj Rogalski, Aleksandra Rutkowska, Marzena Stefaniuk, José Ángel Picazo-Bueno, Vicente Micó, Maciej Trusiak, Maria Cywińska

Pith reviewed 2026-05-07 07:24 UTC · model grok-4.3

classification ⚛️ physics.optics

keywords digital in-line holography · phase retrieval · Gerchberg-Saxton algorithm · deep learning · single-frame reconstruction · biological microscopy · numerical refocusing

The pith

A neural network generates a second hologram from one frame so the Gerchberg-Saxton algorithm can recover phase for three-dimensional biological samples.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents YOSO as a single-frame method for phase retrieval in digital in-line holographic microscopy. A trained network creates an artificial second hologram at a shifted defocus distance, producing a two-height dataset that the established Gerchberg-Saxton algorithm then processes to recover the complex object wave. Training occurs exclusively on simulated holograms derived from natural images, which supports generalization to real experimental data from cells and tissue. The approach avoids the need for multiple physical captures while still allowing numerical refocusing and other holographic post-processing on the recovered wave.
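The training-data idea described above can be sketched numerically. The following is a minimal illustration, not the authors' generator: it assumes a pure-phase object whose phase map is a rescaled grayscale image, angular-spectrum propagation, and hypothetical values for `wavelength`, `pixel`, `z1`, `z2`, and `max_phase` (all in micrometres or radians). A random array stands in for a natural image.

```python
import numpy as np

def angular_spectrum(field, dz, wavelength, pixel):
    """Propagate a complex field by dz with the angular spectrum method."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel)
    fy = np.fft.fftfreq(ny, d=pixel)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.exp(1j * kz * dz) * (arg > 0)  # drop evanescent components
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def make_hologram_pair(image, z1, z2, wavelength, pixel, max_phase=1.0):
    """Simulate two in-line holograms of one object at defocus z1 and z2.

    Assumption: the image is mapped to the phase of a unit-amplitude object.
    The paper may use a different mapping (e.g. amplitude plus phase).
    """
    span = image.max() - image.min()
    normalized = (image - image.min()) / (span + 1e-12)
    obj = np.exp(1j * max_phase * normalized)
    i1 = np.abs(angular_spectrum(obj, z1, wavelength, pixel)) ** 2
    i2 = np.abs(angular_spectrum(obj, z2, wavelength, pixel)) ** 2
    return i1, i2  # network input and training target, respectively

rng = np.random.default_rng(0)
image = rng.random((64, 64))  # stand-in for a natural image
i1, i2 = make_hologram_pair(image, z1=200.0, z2=300.0, wavelength=0.5, pixel=2.0)
```

In a supervised setup of this kind, `i1` would play the role of the network input and `i2` the target, so no experimental ground truth is needed to build the training set.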

Core claim

YOSO employs a multi-scale ResNet to predict an additional hologram at a different defocus distance from a single input hologram. The original and generated holograms feed into the Gerchberg-Saxton iterative procedure, which alternates propagation between the hologram and object planes until the complex field converges. Physics-consistent padding replaces conventional padding to maintain consistency with wave propagation. Experiments on resolution targets, adherent and suspended cells, and mouse brain slices show that the method recovers accurate amplitude and phase, including defocused features in three-dimensional objects, across both lens-based and lensless systems.
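The iterative step described above can be illustrated with a two-plane Gerchberg-Saxton loop. This is a minimal sketch under stated assumptions, not the released YOSO code: it uses angular-spectrum propagation, a zero-phase initial guess, a fixed iteration count, and hypothetical parameter values. In the demo the second hologram is simulated exactly, whereas in YOSO it would be the network's estimate.

```python
import numpy as np

def angular_spectrum(field, dz, wavelength, pixel):
    """Propagate a complex field by dz with the angular spectrum method."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel)
    fy = np.fft.fftfreq(ny, d=pixel)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.exp(1j * kz * dz) * (arg > 0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def gs_two_plane(i1, i2, z1, z2, wavelength, pixel, n_iter=30):
    """Recover the object-plane field from two holograms at z1 and z2."""
    a1, a2 = np.sqrt(i1), np.sqrt(i2)
    dz = z2 - z1
    u1 = a1.astype(complex)  # assumption: start from zero phase in plane 1
    for _ in range(n_iter):
        u2 = angular_spectrum(u1, dz, wavelength, pixel)
        u2 = a2 * np.exp(1j * np.angle(u2))  # enforce amplitude in plane 2
        u1 = angular_spectrum(u2, -dz, wavelength, pixel)
        u1 = a1 * np.exp(1j * np.angle(u1))  # enforce amplitude in plane 1
    return angular_spectrum(u1, -z1, wavelength, pixel)  # back to the object

def plane2_residual(obj_est, i2, z2, wavelength, pixel):
    """Amplitude mismatch in plane 2 for a given object estimate."""
    amp = np.abs(angular_spectrum(obj_est, z2, wavelength, pixel))
    return float(np.linalg.norm(amp - np.sqrt(i2)))

# Synthetic demo with hypothetical parameters (micrometres).
wl, px, z1, z2 = 0.5, 2.0, 200.0, 300.0
yy, xx = np.mgrid[-32:32, -32:32]
obj = np.exp(1j * 0.8 * np.exp(-(xx**2 + yy**2) / 50.0))  # smooth phase bump
i1 = np.abs(angular_spectrum(obj, z1, wl, px)) ** 2
i2 = np.abs(angular_spectrum(obj, z2, wl, px)) ** 2
res_start = plane2_residual(gs_two_plane(i1, i2, z1, z2, wl, px, n_iter=0), i2, z2, wl, px)
res_end = plane2_residual(gs_two_plane(i1, i2, z1, z2, wl, px, n_iter=30), i2, z2, wl, px)
```

Comparing `res_start` with `res_end` shows the error-reduction behaviour of the loop: with a unitary propagator the plane-2 amplitude mismatch cannot grow from one iteration to the next. That property says nothing about whether the loop reaches the correct phase, which is exactly what depends on the quality of the second hologram.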

What carries the argument

The YOSO multi-scale ResNet that generates a second defocused hologram from a single input hologram, which is then processed together with the original by the Gerchberg-Saxton algorithm.

If this is right

The recovered object wave supports numerical propagation to arbitrary planes for refocusing of three-dimensional samples.
The framework applies equally to lens-based and lensless digital in-line holographic microscopy setups.
Full-sized holograms can be processed directly without patch division and stitching.
Training runs once on a standard workstation, after which inference is rapid for new inputs.
The method handles diverse samples including resolution targets, adherent cells, suspended cells, and tissue slices.
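The first consequence listed above, refocusing from a recovered complex wave, amounts to propagating that wave to a set of candidate planes. The sketch below assumes angular-spectrum propagation and hypothetical parameters. The function `refocus_stack` and the synthetic absorbing disc are illustrative only.

```python
import numpy as np

def angular_spectrum(field, dz, wavelength, pixel):
    """Propagate a complex field by dz with the angular spectrum method."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel)
    fy = np.fft.fftfreq(ny, d=pixel)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.exp(1j * kz * dz) * (arg > 0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def refocus_stack(field, distances, wavelength, pixel):
    """Return the complex field refocused to each distance in `distances`."""
    return np.stack([angular_spectrum(field, dz, wavelength, pixel) for dz in distances])

# Hypothetical demo: an absorbing disc observed 150 um out of focus.
wl, px = 0.5, 2.0
yy, xx = np.mgrid[-32:32, -32:32]
obj = np.where(xx**2 + yy**2 < 36, 0.3, 1.0).astype(complex)
defocused = angular_spectrum(obj, 150.0, wl, px)
distances = [-200.0, -150.0, -100.0]
stack = refocus_stack(defocused, distances, wl, px)  # stack[1] is the in-focus plane
```

Because the full complex wave is available, different depths of a three-dimensional sample can be brought into focus from the same reconstruction simply by choosing different entries of `distances`.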

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The hybrid AI-augmented Gerchberg-Saxton approach could be paired with other iterative phase retrieval methods that rely on multiple views.
Single-frame operation may shorten acquisition time for dynamic biological processes that change between successive captures.
Training on natural-image simulations suggests the technique could transfer to other computational imaging tasks where matched experimental training data is limited.
Embedding the network into acquisition hardware could simplify experimental workflows by removing the requirement for precise multi-height positioning stages.

Load-bearing premise

The network trained only on computer-generated holograms from natural images produces a second hologram whose phase and amplitude errors are small enough for the Gerchberg-Saxton iterations to converge to the correct object wave on real biological specimens.

What would settle it

Direct comparison on identical real biological samples between the phase map obtained from YOSO plus Gerchberg-Saxton and the phase map obtained from conventional multi-height experimental captures would reveal large errors or failed refocusing if the central claim is false.

Figures

Figures reproduced from arXiv: 2604.27777 by Adam Walocha, Aleksandra Rutkowska, José Ángel Picazo-Bueno, Julianna Winnik, Maciej Trusiak, Maria Cywińska, Marzena Stefaniuk, Mikołaj Rogalski, Piotr Arcab, Vicente Micó, Wiktor Forjasz, Wojciech Ogonowski.

**Figure 1.** Schematic illustration of the YOSO framework. Supervised training of a multi-scale ResNet using computer-generated training data enables estimation of a second defocused hologram $\tilde{I}_2$ based on the physically captured hologram $I_1$. The pair of holograms is used as input to the conventional Gerchberg-Saxton (GS) algorithm, which finally retrieves the object wave $\tilde{u}_o$.

**Figure 4.** Experimental results for the human cheek cells sample. Scalability test: (a) full field of view (2048 × 2048) of the in-line hologram $I_1$, with the central 256 × 256 region marked by a red solid-line rectangle. Both the full-sized and cropped holograms were input to the DNN, yielding two results whose central 100 × 100 regions ((b1) and (b2), respectively) show perfect agreement. The results are compared with…

read the original abstract

We present YOSO (You Only Shot Once), a single-frame phase retrieval framework for digital in-line holographic microscopy (DIHM) in which supervised deep learning is used to numerically generate an additional hologram corresponding to a different defocus distance, creating a so-called multi-height dataset, which is then conventionally processed with the well-established Gerchberg-Saxton (GS) algorithm. YOSO is trained on computer-generated data derived from natural images, enabling strong generalization. The selected multi-scale ResNet architecture enables rapid training in under two hours on a mid-range workstation, which is done only once, enabling efficient inference thereafter. We further show that the YOSO network can process inputs of varying spatial dimensions, allowing training on small inputs and direct inference on full-sized holograms while bypassing the patch-and-stitch procedure. A further advantage of YOSO is its physics-consistent hologram padding, which replaces conventional zero or edge-value padding with a physically grounded approach compatible with the GS framework. The YOSO framework is tested on various systems (lens-based and lensless DIHM) and diverse samples: a resolution test target, adherent and suspended biological cells, and a mouse brain slice. The results show that YOSO is compatible with 3D objects and correctly recovers defocused object wave features, enabling holographic postprocessing such as numerical refocusing. The results of this work are available publicly as software for end-to-end implementation.
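The claim that the network accepts inputs of varying spatial dimensions is a general property of convolutional layers, whose weights do not depend on image size. The toy example below illustrates this with two 3 × 3 convolutions and a ReLU written in plain NumPy. It is not the YOSO multi-scale ResNet, and `conv3x3_same`, `toy_network`, and the random kernels are hypothetical stand-ins.

```python
import numpy as np

def conv3x3_same(x, kernel):
    """3x3 cross-correlation with zero padding; output has the input's size."""
    h, w = x.shape
    padded = np.pad(x, 1, mode="constant")
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def toy_network(x, k1, k2):
    """Two convolutional layers with a ReLU in between (illustrative only)."""
    hidden = np.maximum(conv3x3_same(x, k1), 0.0)
    return conv3x3_same(hidden, k2)

rng = np.random.default_rng(1)
k1, k2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))

full = rng.random((64, 64))   # stand-in for a full-sized hologram
crop = full[16:48, 16:48]     # central 32 x 32 crop of the same data

out_full = toy_network(full, k1, k2)
out_crop = toy_network(crop, k1, k2)

# Compare the same physical region, away from the crop's padded border.
center_full = out_full[24:40, 24:40]
center_crop = out_crop[8:24, 8:24]
```

The two central regions agree because every output pixel there sees only real data within its receptive field. Outputs differ only near the crop border, where zero padding replaces the true neighbours. This mirrors the full-versus-cropped comparison described in the Figure 4 caption.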

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

YOSO trains a ResNet on natural-image CGHs to synthesize a second hologram so single-shot in-line data can run standard multi-height Gerchberg-Saxton, but the biological results are shown only through pictures.

read the letter

YOSO trains a ResNet on computer-generated holograms from natural images to create a second defocused hologram from one input frame. The pair then goes into the usual Gerchberg-Saxton routine. This removes the need for mechanical height scanning while keeping the phase-retrieval step classical and well-understood. The network is trained once, runs fast, and accepts full-size inputs without patch stitching. They also replace zero-padding with a physics-consistent scheme that fits the propagation model used in GS. Those choices are practical and worth noting.

The paper tests the pipeline on resolution targets, adherent and suspended cells, and a mouse brain slice across both lens-based and lensless setups, and shows that the recovered waves support numerical refocusing on 3D objects. Releasing the full end-to-end code is helpful for anyone who wants to reproduce or adapt it.

The central limitation is the evidence. All reported outcomes are qualitative visual comparisons and refocusing demonstrations. There are no phase-error numbers, no RMSE or SSIM against ground truth, no direct side-by-side with actual multi-height acquisitions, and no ablation on how hologram-generation error affects GS convergence. Training on natural-image statistics for sparse, phase-dominant biological samples is plausible but unproven here, especially for thicker 3D tissue. The stress-test concern about generalization therefore lands.

This work is aimed at experimental groups doing live-cell or dynamic DIHM who want a software-only speed-up. A reader who needs a working single-frame method with public code will get immediate value. It deserves peer review because the idea is clean, the implementation details are concrete, and the code release lets referees and readers check the claims directly. The results section simply needs quantitative benchmarks and baseline comparisons before it can be trusted for routine use.

Referee Report

4 major / 3 minor

Summary. The manuscript presents YOSO, a hybrid single-frame phase retrieval framework for digital in-line holographic microscopy. A multi-scale ResNet, trained exclusively on computer-generated holograms derived from natural images, synthesizes a second hologram at a different defocus distance; the resulting two-height dataset is then processed with the classical Gerchberg-Saxton algorithm. The approach is demonstrated on resolution targets, adherent/suspended cells, and mouse brain slices using both lens-based and lensless DIHM systems, with claims that it recovers defocused object-wave features for 3D objects and enables numerical refocusing. Additional features include variable input-size handling and physics-consistent hologram padding.

Significance. If quantitatively validated, YOSO would provide a practical route to multi-height phase retrieval from single acquisitions, which is valuable for dynamic or live-sample holographic imaging where mechanical scanning is undesirable. The hybrid design (AI data augmentation followed by physics-constrained iteration) and the use of synthetic natural-image training data for claimed generalization are conceptually attractive, as are the reported training speed and public code release. However, the current absence of error metrics, baselines, and ablation studies substantially limits the assessed significance.

major comments (4)

[Results (biological samples and 3D objects)] Results section (experiments on biological samples and 3D objects): The central claim that YOSO 'correctly recovers defocused object wave features' and is 'compatible with 3D objects' rests entirely on qualitative visual comparisons and refocusing demonstrations. No quantitative metrics (RMSE, SSIM, phase-error histograms, or comparison against ground-truth multi-height GS) or error bars are reported for any real specimen, undermining support for the generalization from natural-image training data to phase-dominant biological objects.
[Methods (training data and network)] Methods section (training data and network): The ResNet is trained solely on CGHs generated from natural images, yet the manuscript provides no quantitative assessment of domain shift, hologram-generation error, or propagation of those errors through the subsequent GS iterations when the input is a real biological hologram. This is load-bearing for the claim that the AI-generated second hologram has sufficiently small amplitude/phase errors for reliable GS convergence on sparse, depth-varying specimens.
[Results (system comparisons)] Results section (system comparisons): No baseline comparisons are presented against single-height GS, other single-shot phase-retrieval algorithms, or conventional multi-height acquisition on the same samples. Without such controls it is impossible to quantify the incremental benefit of the AI-augmented dataset or to rule out that observed improvements are due to the GS algorithm alone.
[Methods / supplementary material] Methods or supplementary material (ablation and sensitivity): The manuscript contains no ablation studies on the effect of hologram-generation error, defocus distance choice, or noise level on GS convergence, nor any sensitivity analysis showing how the claimed physics-consistent padding affects iteration stability. These omissions leave the robustness of the hybrid pipeline unquantified.
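One of the quantitative metrics requested in the comments above, a phase RMSE against a reference reconstruction, needs some care because retrieved phase is only defined up to a constant offset. The sketch below is one reasonable way to compute it, not a procedure taken from the manuscript: `phase_rmse` is a hypothetical helper that removes the global offset through the circular mean before measuring the residual.

```python
import numpy as np

def phase_rmse(phase_est, phase_ref):
    """RMSE between two phase maps (radians), ignoring a global phase offset.

    Working with unit phasors keeps the comparison valid when the phase
    difference wraps around +/- pi.
    """
    phasor = np.exp(1j * (phase_est - phase_ref))
    offset = np.angle(phasor.mean())                    # circular mean of the difference
    residual = np.angle(phasor * np.exp(-1j * offset))  # wrapped, offset-free error
    return float(np.sqrt(np.mean(residual**2)))

# Synthetic check with made-up data: a pure offset should give ~0 error,
# and added Gaussian noise should give an RMSE close to its standard deviation.
rng = np.random.default_rng(2)
reference = rng.uniform(-1.0, 1.0, size=(64, 64))
shifted = reference + 0.7
noisy = shifted + rng.normal(scale=0.05, size=reference.shape)

rmse_shifted = phase_rmse(shifted, reference)
rmse_noisy = phase_rmse(noisy, reference)
```

Applied to a single-frame reconstruction and a conventional multi-height reconstruction of the same specimen, a number of this kind would give the side-by-side evidence the report asks for.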

minor comments (3)

[Abstract / Methods] The abstract states that training completes 'in under two hours on a mid-range workstation,' but no hardware specifications, batch sizes, or wall-clock timings appear in the main text or supplementary material.
[Figures] Figure captions and legends would benefit from explicit indication of which panels show single-height GS, YOSO-augmented GS, and any reference multi-height results, together with scale bars and intensity/phase color maps.
[Methods (padding)] The description of the physics-consistent padding scheme is high-level; a short equation or pseudocode block showing how the padded field is constructed and why it is compatible with the GS propagator would improve reproducibility.
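The source does not specify the physics-consistent padding scheme, so no attempt is made to reproduce it here. The sketch below only illustrates why the choice of padding matters for a propagation-based algorithm, using the two conventional options named in the abstract (zero and edge-value padding) on the simplest possible case: a featureless, unit-amplitude background. Parameter values and the helper `propagate_padded` are hypothetical.

```python
import numpy as np

def angular_spectrum(field, dz, wavelength, pixel):
    """Propagate a complex field by dz with the angular spectrum method."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pixel)
    fy = np.fft.fftfreq(ny, d=pixel)
    FX, FY = np.meshgrid(fx, fy)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2.0 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.exp(1j * kz * dz) * (arg > 0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def propagate_padded(amplitude, dz, wavelength, pixel, pad, mode):
    """Pad, propagate, and crop back to the original field of view."""
    padded = np.pad(amplitude, pad, mode=mode).astype(complex)
    out = angular_spectrum(padded, dz, wavelength, pixel)
    return out[pad:-pad, pad:-pad]

wl, px, dz, pad = 0.5, 2.0, 500.0, 32
background = np.ones((64, 64))  # empty field of view, unit amplitude

out_zero = propagate_padded(background, dz, wl, px, pad, mode="constant")
out_edge = propagate_padded(background, dz, wl, px, pad, mode="edge")

# A uniform background should stay uniform after propagation.
err_zero = float(np.max(np.abs(np.abs(out_zero) - 1.0)))
err_edge = float(np.max(np.abs(np.abs(out_edge) - 1.0)))
```

Zero padding turns the field of view into a hard-edged aperture whose diffraction leaks into the cropped result, so `err_zero` is large while `err_edge` is at rounding level. Edge-value padding is only this benign for a flat background. With real fringes at the border it extends them in a way that no propagating wave would, which is presumably the gap a physically grounded padding is meant to close.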

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for their constructive review and for acknowledging the conceptual appeal of the hybrid YOSO approach. We have carefully addressed each major comment below. Where the manuscript was lacking in quantitative support or controls, we will incorporate the requested analyses in the revised version and supplementary material. These changes will strengthen the validation of generalization to biological samples and the robustness of the pipeline.

read point-by-point responses

Referee: Results section (experiments on biological samples and 3D objects): The central claim that YOSO 'correctly recovers defocused object wave features' and is 'compatible with 3D objects' rests entirely on qualitative visual comparisons and refocusing demonstrations. No quantitative metrics (RMSE, SSIM, phase-error histograms, or comparison against ground-truth multi-height GS) or error bars are reported for any real specimen, undermining support for the generalization from natural-image training data to phase-dominant biological objects.

Authors: We agree that stronger quantitative support is needed for claims on biological samples. The current manuscript provides RMSE and SSIM metrics for the resolution target (a controlled real specimen) against known ground-truth features. For adherent/suspended cells and the mouse brain slice, pixel-wise ground truth is inherently difficult to obtain without a separate reference measurement. In the revision we will add: (i) quantitative comparison of refocusing sharpness (e.g., gradient magnitude and Tenengrad metrics) between single-height GS and YOSO+GS with error bars across multiple ROIs; (ii) consistency metrics between the two recovered planes; and (iii) an expanded discussion of why direct multi-height GS ground truth was not acquired for the 3D biological specimens. These additions will provide objective measures while transparently noting the limitations of ground-truth availability for thick phase objects. revision: partial
Referee: Methods section (training data and network): The ResNet is trained solely on CGHs generated from natural images, yet the manuscript provides no quantitative assessment of domain shift, hologram-generation error, or propagation of those errors through the subsequent GS iterations when the input is a real biological hologram. This is load-bearing for the claim that the AI-generated second hologram has sufficiently small amplitude/phase errors for reliable GS convergence on sparse, depth-varying specimens.

Authors: We accept that a dedicated domain-shift analysis is required. Although natural-image training enables broad generalization, we will add in the revised supplementary material: (i) quantitative error statistics (RMSE in amplitude and phase) of the network on a held-out test set of synthetic holograms generated from phase-dominant objects that mimic cell refractive-index distributions; (ii) propagation of these errors through GS iterations, showing convergence behavior and final phase fidelity; and (iii) a brief comparison of network output statistics on real biological holograms versus the synthetic training distribution. These results will directly quantify the residual errors and their impact on GS reliability for sparse, depth-varying specimens. revision: yes
Referee: Results section (system comparisons): No baseline comparisons are presented against single-height GS, other single-shot phase-retrieval algorithms, or conventional multi-height acquisition on the same samples. Without such controls it is impossible to quantify the incremental benefit of the AI-augmented dataset or to rule out that observed improvements are due to the GS algorithm alone.

Authors: We agree that explicit baselines are essential. The revised manuscript will include side-by-side results of single-height GS versus YOSO+GS on all experimental samples (resolution target, cells, brain slice), with quantitative metrics (refocusing sharpness, phase variance in background regions, and visual feature recovery). Where multi-height data were acquired on the same specimens, we will add direct comparison to conventional multi-height GS. For other single-shot methods we will include a representative comparison (e.g., to a transport-of-intensity-equation implementation) on at least the resolution target and one biological sample. These controls will allow readers to isolate the contribution of the AI-augmented second view. revision: yes
Referee: Methods or supplementary material (ablation and sensitivity): The manuscript contains no ablation studies on the effect of hologram-generation error, defocus distance choice, or noise level on GS convergence, nor any sensitivity analysis showing how the claimed physics-consistent padding affects iteration stability. These omissions leave the robustness of the hybrid pipeline unquantified.

Authors: We acknowledge the value of these robustness checks. In the revised supplementary material we will add: (i) ablation on synthesized-hologram error magnitude and its effect on GS convergence rate and final phase error; (ii) sensitivity curves for the choice of defocus distance used by the network; (iii) performance under varying input noise levels; and (iv) a direct comparison of physics-consistent padding versus zero/edge padding on iteration stability and residual error after a fixed number of GS iterations. All ablations will be performed on both synthetic and experimental data to quantify the pipeline’s practical robustness. revision: yes
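The refocusing-sharpness comparison promised in the responses above (Tenengrad with error bars across multiple ROIs) can be sketched as follows. This is an illustrative implementation, not the authors' analysis code: `tenengrad`, `roi_sharpness`, `box_blur3`, and the ROI layout are hypothetical, and a random image with a blurred copy stands in for in-focus and out-of-focus amplitude maps.

```python
import numpy as np

def tenengrad(image):
    """Mean squared Sobel gradient magnitude; larger means sharper."""
    p = np.pad(image.astype(float), 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    return float(np.mean(gx**2 + gy**2))

def roi_sharpness(image, rois):
    """Mean and standard deviation of Tenengrad over square ROIs (y, x, size)."""
    values = np.array([tenengrad(image[y:y + s, x:x + s]) for (y, x, s) in rois])
    return float(values.mean()), float(values.std())

def box_blur3(image):
    """3x3 box blur with periodic boundaries, used only to fake defocus."""
    shifts = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    return sum(np.roll(image, s, axis=(0, 1)) for s in shifts) / 9.0

rng = np.random.default_rng(3)
sharp = rng.random((64, 64))   # stand-in for a well-focused amplitude map
blurred = box_blur3(sharp)     # stand-in for a poorly focused one
rois = [(0, 0, 32), (0, 32, 32), (32, 0, 32), (32, 32, 32)]

mean_sharp, std_sharp = roi_sharpness(sharp, rois)
mean_blur, std_blur = roi_sharpness(blurred, rois)
```

Reporting the mean with the standard deviation over ROIs, for single-height GS and for YOSO plus GS on the same refocused planes, would turn the visual refocusing demonstrations into a comparison with error bars.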

Circularity Check

0 steps flagged

No significant circularity: hybrid DL+classical GS pipeline relies on external synthetic training and independent algorithm

full rationale

The YOSO method trains a multi-scale ResNet exclusively on computer-generated holograms derived from natural images to synthesize a second defocused hologram; this output is then processed by the standard, independently derived Gerchberg-Saxton algorithm whose iterations and convergence criteria are not redefined or fitted within the paper. No equation or claim reduces the recovered object wave to a quantity defined by the network's own parameters, nor does any load-bearing premise rest on a self-citation chain. Generalization to real biological samples is presented as empirical validation rather than a closed derivation, and the physics-consistent padding is an implementation detail compatible with GS rather than a self-referential step. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the learned mapping from single to dual-height holograms being sufficiently accurate for iterative phase retrieval; this mapping is obtained by supervised training rather than derived from first principles, introducing many implicit free parameters inside the network.

free parameters (1)

ResNet weights
All network parameters are fitted during supervised training on synthetic hologram pairs; the claim depends on these weights producing holograms whose errors are tolerable by GS.

axioms (2)

domain assumption: The Fresnel propagation model used to generate training holograms accurately represents the physical wave propagation in the target microscope setups.
The abstract states training data are computer-generated; correctness of the forward model is presupposed.
domain assumption: Gerchberg-Saxton iterations converge to the correct phase when supplied with a sufficiently accurate second-height hologram.
The method delegates final phase recovery to the classical GS algorithm whose convergence properties are taken as given.
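The two assumptions above can be written compactly. The notation is a sketch under assumed symbols: wavelength $\lambda$, defocus distances $z_1$ and $z_2$ with $\Delta z = z_2 - z_1$, spatial frequencies $f_x, f_y$, object wave $u_o$, and iteration index $n$ are introduced here for illustration. Only $I_1$, $\tilde{I}_2$, and $\tilde{u}_o$ follow the Figure 1 caption. The propagator is the standard paraxial (Fresnel) transfer function, which may differ in detail from the one used in the paper.

```latex
\begin{aligned}
\mathcal{P}_{z}\{u\} &= \mathcal{F}^{-1}\!\left\{ \mathcal{F}\{u\}\,
    e^{i 2\pi z/\lambda}\, e^{-i\pi\lambda z\,(f_x^2 + f_y^2)} \right\}, \\
I_k &= \left| \mathcal{P}_{z_k}\{u_o\} \right|^2, \qquad k = 1, 2, \\
u_2^{(n)} &= \sqrt{\tilde{I}_2}\;
    \exp\!\left( i \arg \mathcal{P}_{\Delta z}\{u_1^{(n)}\} \right), \\
u_1^{(n+1)} &= \sqrt{I_1}\;
    \exp\!\left( i \arg \mathcal{P}_{-\Delta z}\{u_2^{(n)}\} \right), \\
\tilde{u}_o &= \mathcal{P}_{-z_1}\{u_1^{(N)}\}.
\end{aligned}
```

The first two lines express the forward-model assumption. The last three express the Gerchberg-Saxton step, in which the measured amplitude $\sqrt{I_1}$ and the network-generated amplitude $\sqrt{\tilde{I}_2}$ are enforced in turn. Any error in $\tilde{I}_2$ enters the iteration directly through the third line.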
