Dual-Control Frequency-Aware Diffusion Model for Depth-Dependent Optical Microrobot Microscopy Image Generation

Dandan Zhang; Jian-Qing Zheng; Kangyi Lu; Lan Wei; Zongcai Tan

arxiv: 2604.11680 · v1 · submitted 2026-04-13 · 💻 cs.RO

Pith reviewed 2026-05-10 15:06 UTC · model grok-4.3

classification 💻 cs.RO

keywords: diffusion models, image synthesis, optical microrobots, depth estimation, frequency domain, ControlNet, microscopy imaging, optical tweezers

The pith

A dual-control diffusion model generates physically consistent, depth-dependent microscopy images of optical microrobots from small datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a generative method for creating training data for 3D perception of optical microrobots, for which large microscopy datasets are scarce because fabrication is complex and annotation is labor-intensive. Existing GAN approaches fail to capture how images change with depth through optical effects such as diffraction and defocus. The proposed Du-FreqNet uses two ControlNet branches to condition generation on 3D geometry and depth information, plus an adaptive frequency-domain loss that supervises the image's frequency content according to its distance from the focal plane. This allows the model to produce realistic images that improve performance on tasks such as estimating the robot's 3D pose and depth.

Core claim

Du-FreqNet is a dual-control, frequency-aware diffusion model that encodes microrobot 3D point clouds and depth-specific mesh layers through separate ControlNet branches and applies an adaptive frequency-domain loss that reweights components based on distance to the focal plane using differentiable FFT. This enables controllable synthesis of depth-dependent microscopy images that match physical optical characteristics.
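To make the idea of a depth-reweighted frequency loss concrete, here is a minimal NumPy sketch. It is not the authors' loss: the summary above does not give the functional form, so the schedule in `depth_weights` (an exponential decay of high-frequency emphasis with `|depth|`) and the parameters `alpha` and `tau` are assumptions chosen only for illustration. NumPy's FFT is also not differentiable; a training version would use an autodiff framework's FFT (for example `torch.fft.fft2` or `jax.numpy.fft.fft2`).

```python
import numpy as np

def radial_frequency(h, w):
    """Radial spatial frequency on an h x w FFT grid, scaled so the axis Nyquist limit is 1."""
    fy = np.fft.fftfreq(h)[:, None]   # cycles per pixel, in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]
    return np.sqrt(fx ** 2 + fy ** 2) / 0.5

def depth_weights(h, w, depth, alpha=4.0, tau=5.0):
    """ASSUMED schedule: extra weight on high frequencies, decaying with |depth|."""
    focus = np.exp(-abs(depth) / tau)   # 1 at the focal plane, near 0 far from it
    return 1.0 + alpha * focus * radial_frequency(h, w)

def adaptive_frequency_loss(pred, target, depth, alpha=4.0, tau=5.0):
    """Weighted mean squared difference between the 2D spectra of two images."""
    diff = np.fft.fft2(pred, norm="ortho") - np.fft.fft2(target, norm="ortho")
    weights = depth_weights(*pred.shape, depth, alpha, tau)
    return float(np.mean(weights * np.abs(diff) ** 2))

# Illustrative use with synthetic arrays (depth units are hypothetical).
rng = np.random.default_rng(0)
target = rng.random((32, 32))
pred = target + 0.05 * rng.standard_normal((32, 32))
loss_in_focus = adaptive_frequency_loss(pred, target, depth=0.0)
loss_defocused = adaptive_frequency_loss(pred, target, depth=20.0)
```

With `alpha=0` every frequency gets weight 1 and, because of the orthonormal FFT scaling, the loss reduces to the ordinary pixel-space mean squared error. That special case is one way to set up a uniform-frequency baseline for comparison.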

What carries the argument

The dual ControlNet branches for 3D point clouds and depth mesh layers, combined with the adaptive frequency-domain loss supervised via differentiable FFT.
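As a rough picture of how two control branches can steer one backbone, the sketch below follows the general ControlNet pattern of adding each branch's output to the backbone features through a zero-initialised projection. How Du-FreqNet actually fuses its two branches is not stated in this summary, so the additive fusion, the `ZeroLinear` class, and the `scale_pc` / `scale_mesh` knobs are all hypothetical.

```python
import numpy as np

class ZeroLinear:
    """Stand-in for a zero-initialised 1x1 convolution (per-pixel linear map over channels)."""
    def __init__(self, channels):
        self.weight = np.zeros((channels, channels))
        self.bias = np.zeros(channels)

    def __call__(self, x):              # x has shape (H, W, C)
        return x @ self.weight.T + self.bias

def fuse_controls(backbone_feat, pointcloud_feat, mesh_feat,
                  proj_pc, proj_mesh, scale_pc=1.0, scale_mesh=1.0):
    """ASSUMED fusion: add both projected control residuals to the backbone features."""
    return (backbone_feat
            + scale_pc * proj_pc(pointcloud_feat)
            + scale_mesh * proj_mesh(mesh_feat))

# Toy feature maps standing in for the backbone and the two control branches.
backbone = np.ones((4, 4, 8))
from_pointcloud = np.full((4, 4, 8), 0.5)
from_mesh = np.full((4, 4, 8), -0.25)
fused = fuse_controls(backbone, from_pointcloud, from_mesh,
                      ZeroLinear(8), ZeroLinear(8))
```

Because both projections start at zero, `fused` equals `backbone` before any training, which is the property that lets control branches be attached without disturbing a pretrained model at initialisation.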

If this is right

Achieves controllable depth-dependent image synthesis from limited data.
Improves SSIM by 20.7% compared to baseline methods.
Generalizes to poses not seen during training.
Enhances accuracy of downstream 3D pose and depth estimation tasks.
Supports robust closed-loop control in microrobotic systems.
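For readers who want to see what the SSIM figure in the list above measures, the following is a simplified sketch. It computes SSIM from whole-image statistics, whereas standard implementations average the same expression over local sliding windows, so the values will differ from library output. This summary does not say whether 20.7% is a relative or an absolute gain; `relative_gain_percent` assumes a relative one, and the numbers in the example are placeholders rather than results from the paper.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """SSIM from whole-image means, variances and covariance (single-window simplification)."""
    c1 = (0.01 * data_range) ** 2   # stabilising constants from the SSIM definition
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return float(num / den)

def relative_gain_percent(new, baseline):
    """Percentage improvement of `new` over `baseline`, relative to `baseline`."""
    return 100.0 * (new - baseline) / baseline

# Placeholder images: a reference and a noisier copy clipped to [0, 1].
rng = np.random.default_rng(1)
reference = rng.random((64, 64))
noisy = np.clip(reference + 0.1 * rng.standard_normal((64, 64)), 0.0, 1.0)
score = global_ssim(reference, noisy)
```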

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method may allow researchers to train perception models without extensive real-world data collection for similar microscale optical systems.
Frequency reweighting based on focal distance could be applied to generate synthetic data for other depth-sensitive imaging modalities.
Improved image generation might accelerate the development of autonomous microrobots for biological applications like cell manipulation.
The approach highlights the value of incorporating physical priors, such as Fourier transforms, into generative models for scientific imaging.

Load-bearing premise

The adaptive frequency-domain loss, which reweights high- and low-frequency components according to distance to the focal plane, accurately captures real physical diffraction and defocus effects without adding artifacts or overfitting the small dataset.

What would settle it

The claim would be undermined if real microscopy images at known depths showed frequency distributions that do not match those produced by the model when conditioned on the same depth and pose, or if downstream-task performance did not improve when the generated images are used for training.
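One way such a frequency-distribution check could be run is to compare radially averaged power spectra of real and generated images at the same depth. The sketch below is an illustration under assumptions: the bin count, the L1 distance, and the function names are choices made here, not a protocol from the paper, and a 3x3 box blur stands in for defocus purely to exercise the code.

```python
import numpy as np

def radial_power_spectrum(image, n_bins=16):
    """Mean spectral power per radial-frequency bin, normalised to sum to 1."""
    h, w = image.shape
    power = np.abs(np.fft.fft2(image - image.mean())) ** 2
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2).ravel()
    edges = np.linspace(0.0, 0.5, n_bins + 1)
    idx = np.digitize(radius, edges) - 1
    keep = (idx >= 0) & (idx < n_bins)   # drop frequencies at or beyond 0.5 cycles/pixel
    sums = np.bincount(idx[keep], weights=power.ravel()[keep], minlength=n_bins)
    counts = np.bincount(idx[keep], minlength=n_bins)
    profile = sums / np.maximum(counts, 1)
    total = profile.sum()
    return profile / total if total > 0 else profile

def spectral_distance(real, generated, n_bins=16):
    """L1 distance between two normalised radial spectra (0 means identical profiles)."""
    return float(np.abs(radial_power_spectrum(real, n_bins)
                        - radial_power_spectrum(generated, n_bins)).sum())

# Synthetic stand-ins: a sharp image and a box-blurred copy mimicking defocus.
rng = np.random.default_rng(2)
sharp = rng.random((32, 32))
shifts = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
blurred = sum(np.roll(np.roll(sharp, dy, 0), dx, 1) for dy, dx in shifts) / 9.0
gap = spectral_distance(sharp, blurred)
```

In a real test, `sharp` and `blurred` would be replaced by a measured image and a generated image conditioned on the same depth and pose, and the distance would be tracked as a function of depth.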

Figures

Figures reproduced from arXiv: 2604.11680 by Dandan Zhang, Jian-Qing Zheng, Kangyi Lu, Lan Wei, Zongcai Tan.

**Figure 2.** Overview of the proposed Du-FreqNet framework. (A) Multi-Modal Input Conditions: The model utilises two distinct geometric priors: volumetric …

**Figure 3.** Visualisation of image synthesis results. White numbers indicate the SSIM value relative to the Ground Truth. Our method preserves high-frequency …

**Figure 4.** Qualitative generalization results on unseen poses. The numbers …

Original abstract

Optical microrobots actuated by optical tweezers (OT) are important for cell manipulation and microscale assembly, but their autonomous operation depends on accurate 3D perception. Developing such perception systems is challenging because large-scale, high-quality microscopy datasets are scarce, owing to complex fabrication processes and labor-intensive annotation. Although generative AI offers a promising route for data augmentation, existing generative adversarial network (GAN)-based methods struggle to reproduce key optical characteristics, particularly depth-dependent diffraction and defocus effects. To address this limitation, we propose Du-FreqNet, a dual-control, frequency-aware diffusion model for physically consistent microscopy image synthesis. The framework features two independent ControlNet branches to encode microrobot 3D point clouds and depth-specific mesh layers, respectively. We introduce an adaptive frequency-domain loss that dynamically reweights high- and low-frequency components based on the distance to the focal plane. By leveraging differentiable FFT-based supervision, Du-FreqNet captures physically meaningful frequency distributions often missed by pixel-space methods. Trained on a limited dataset (e.g., 80 images per pose), our model achieves controllable, depth-dependent image synthesis, improving SSIM by 20.7% over baselines. Extensive experiments demonstrate that Du-FreqNet generalizes effectively to unseen poses and significantly enhances downstream tasks, including 3D pose and depth estimation, thereby facilitating robust closed-loop control in microrobotic systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Du-FreqNet adapts dual ControlNets and a frequency-reweighting loss to generate depth-dependent microrobot images, delivering SSIM gains and downstream-task improvements from limited data, but the loss lacks an optical derivation and the small dataset invites overfitting questions.

The paper introduces Du-FreqNet, a diffusion model with dual independent ControlNet branches for 3D point clouds and depth meshes, plus an adaptive frequency loss for depth-dependent microrobot image generation. This combination is new for optical microrobotics, where data is hard to get. They train on limited data and report a 20.7% SSIM improvement over baselines, with better results on 3D pose and depth estimation for unseen poses. That directly helps the perception side of autonomous microrobots for cell work and assembly.

The setup separates the geometry control from the depth effects, which fits the problem. Using differentiable FFT to reweight frequencies based on focal distance is a smart way to target the optical characteristics that pixel losses ignore.

The main soft spot is the small training set of 80 images per pose. This raises a legitimate overfitting risk, and the abstract lacks details on baseline implementations, error bars, or exact evaluation criteria. The frequency reweighting is heuristic rather than derived from wave optics or the optical transfer function, so the gains may not guarantee physical accuracy on new cases. Without ablations against a simulator, it's possible the improvements come from the dual architecture alone.

This is for people working on data generation for scientific robotics or microscopy-based perception. A reader focused on practical applications in microscale systems would get concrete value from the reported downstream task boosts. It has enough substance and addresses a real gap, so it should go to peer review for a closer look at the experiments and the loss justification.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces Du-FreqNet, a dual-control frequency-aware diffusion model for synthesizing depth-dependent optical microrobot microscopy images. It uses two independent ControlNet branches to encode 3D point clouds and depth-specific mesh layers, respectively, together with an adaptive frequency-domain loss that dynamically reweights high- and low-frequency components via differentiable FFT according to distance to the focal plane. Trained on a small dataset (80 images per pose), the model is reported to achieve controllable depth-dependent synthesis with a 20.7% SSIM improvement over baselines, effective generalization to unseen poses, and improved performance on downstream 3D pose and depth estimation tasks.

Significance. If the frequency-aware loss produces images whose frequency content is physically consistent with diffraction and defocus rather than merely fitting training-set statistics, the approach would provide a useful data-augmentation tool for perception in optical microrobotics, where real annotated datasets are scarce. The dual-control architecture is a constructive design choice for separating pose and depth conditioning, and the use of differentiable FFT supervision is a clear technical strength for frequency-aware generation.

major comments (2)

Method section (adaptive frequency-domain loss): The loss is described as dynamically reweighting FFT components based on distance to the focal plane, yet no derivation from the optical transfer function, pupil function, or measured point-spread function is supplied. Without such grounding or an ablation against a physics-based simulator, it remains unclear whether the 20.7% SSIM gain and downstream-task improvements reflect physical consistency or overfitting to the limited training distribution.
Experiments section: The reported 20.7% SSIM improvement and generalization to unseen poses are presented without baseline implementation details, statistical significance tests, error bars, or explicit train/test split and data-exclusion criteria. Given the small training set size (80 images per pose), these omissions make it difficult to evaluate the robustness of the central empirical claims.

minor comments (1)

Abstract: The abstract and results would benefit from a brief statement of the total number of distinct poses and the precise train/validation/test partitioning to allow readers to assess the scale of the generalization experiments.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which have helped us identify areas for improvement in clarity and rigor. We address each major comment point by point below and will revise the manuscript accordingly to strengthen the presentation of our method and experiments.

Referee: Method section (adaptive frequency-domain loss): The loss is described as dynamically reweighting FFT components based on distance to the focal plane, yet no derivation from the optical transfer function, pupil function, or measured point-spread function is supplied. Without such grounding or an ablation against a physics-based simulator, it remains unclear whether the 20.7% SSIM gain and downstream-task improvements reflect physical consistency or overfitting to the limited training distribution.

Authors: We appreciate the referee highlighting this point. The adaptive frequency-domain loss was developed from empirical observations of frequency attenuation in our microscopy dataset, where high-frequency components diminish with increasing distance from the focal plane due to defocus. The reweighting is implemented via a differentiable FFT that modulates the loss based on a distance-dependent schedule derived from measured image statistics rather than a closed-form optical model. In the revised manuscript, we will expand the method section with the explicit weighting formula, its empirical motivation, and a new ablation comparing the adaptive loss to a uniform-frequency baseline. We will also add a limitations paragraph acknowledging that the approach approximates observed optical effects without a direct derivation from the optical transfer function or pupil function, and that future work could incorporate physics-based simulators for stricter consistency. These changes will clarify the distinction between data-driven frequency awareness and full physical modeling while demonstrating that the reported gains are supported by improved generalization to unseen poses.

revision: partial

Referee: Experiments section: The reported 20.7% SSIM improvement and generalization to unseen poses are presented without baseline implementation details, statistical significance tests, error bars, or explicit train/test split and data-exclusion criteria. Given the small training set size (80 images per pose), these omissions make it difficult to evaluate the robustness of the central empirical claims.

Authors: We agree that these details are essential for assessing robustness, particularly with the modest dataset size. In the revised manuscript, we will add a dedicated experimental details subsection that includes full baseline implementations (architectures, hyperparameters, and training protocols), results with standard error bars computed over five independent runs, and statistical significance testing (paired t-tests with p-values) for the SSIM improvements. The train/test protocol will be explicitly stated, specifying that 80 images per pose were used for training with a held-out set of unseen poses for generalization evaluation, along with the precise data-exclusion criteria applied during collection. These additions will enable readers to better judge the reliability of the empirical claims.

revision: yes
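The statistics promised in this response (standard error over five runs and a paired t-test) can be sketched with the Python standard library alone. The per-run SSIM values below are placeholders invented for illustration, not numbers from the paper, and the function names are hypothetical. The sketch stops at the t statistic; turning it into a p-value needs the Student t distribution, which the standard library does not provide.

```python
import math
import statistics

def mean_and_standard_error(values):
    """Sample mean and standard error of the mean (sample std / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

def paired_t_statistic(ours, baseline):
    """Paired t statistic and degrees of freedom for matched runs.

    Raises ZeroDivisionError if every paired difference is identical.
    """
    diffs = [a - b for a, b in zip(ours, baseline)]
    mean_diff, se = mean_and_standard_error(diffs)
    return mean_diff / se, len(diffs) - 1

# PLACEHOLDER per-run SSIM scores for five matched runs (not from the paper).
ours_runs = [0.71, 0.69, 0.72, 0.70, 0.73]
baseline_runs = [0.60, 0.58, 0.59, 0.61, 0.60]

mean_ours, se_ours = mean_and_standard_error(ours_runs)
t_value, dof = paired_t_statistic(ours_runs, baseline_runs)

# With 4 degrees of freedom, the two-sided 5% critical value is about 2.776.
significant_at_5_percent = abs(t_value) > 2.776
```

Pairing matters here: each run of the proposed model is compared with the baseline run that shares its seed or data split, so run-to-run variation common to both cancels in the differences.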

Circularity Check

0 steps flagged

No circularity: empirical training and external evaluation

The paper proposes a dual-ControlNet diffusion architecture with an adaptive frequency-domain loss, trains it on a small set of real microscopy images (80 per pose), and reports SSIM gains plus downstream improvements on held-out poses and tasks. No load-bearing step reduces a claimed prediction or result to its own inputs by construction, no self-citation chain justifies a uniqueness claim, and the loss is presented as a design choice rather than a derived identity. The central claims rest on standard empirical benchmarks against external image data and baselines, making the work self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach relies on standard diffusion model assumptions plus domain-specific claims about frequency content in optical microscopy; limited free parameters are introduced in the loss weighting.

free parameters (1)

frequency reweighting schedule
Dynamically adjusts high- and low-frequency emphasis based on focal-plane distance; exact functional form and hyperparameters are not specified in the abstract.

axioms (1)

Domain assumption: Differentiable FFT supervision captures physically meaningful frequency distributions missed by pixel-space losses
Invoked to justify the adaptive frequency loss; treated as self-evident for optical microscopy.

[1] [1]

Distributed force control for microrobot manipulation via planar multi-spot optical tweezer,

D. Zhang, A. Barbot, B. Lo, and G.-Z. Yang, “Distributed force control for microrobot manipulation via planar multi-spot optical tweezer,” Advanced Optical Materials, vol. 8, no. 21, p. 2000543, 2020

work page 2020

[2] [2]

Optical chiral microrobot for out-of-plane rotation,

A. M. Ali, E. Gerena, J. A. I. Mart ´ınez, G. Ulliac, B. Lemkalli, A. Mohand-Ousaid, S. Haliyo, A. Bolopion, and M. Kadic, “Optical chiral microrobot for out-of-plane rotation,”Communications Physics, vol. 8, no. 1, p. 230, 2025

work page 2025

[3] [3]

Optical-driven miniature robots: driving mechanism, applications and future trends,

X. Wang, S. Jia, Y . Gao, C. Liu, Y . Wang, A. Liu, and W. Yang, “Optical-driven miniature robots: driving mechanism, applications and future trends,”Lab on a Chip, vol. 25, pp. 4473–4507, 2025

work page 2025

[4] [4]

Optical tweezers in single-molecule biophysics,

C. J. Bustamante, Y . R. Chemla, S. Liu, and M. D. Wang, “Optical tweezers in single-molecule biophysics,”Nature Reviews Methods Primers, vol. 1, no. 1, p. 25, 2021

work page 2021

[5] [5]

Physics-informed machine learn- ing with adaptive grids for optical microrobot depth estimation,

L. Wei, L. Genoud, and D. Zhang, “Physics-informed machine learn- ing with adaptive grids for optical microrobot depth estimation,” in 2025 IEEE International Conference on Cyborg and Bionic Systems (CBS). IEEE, 2025, pp. 1–6

work page 2025

[6] [6]

A dataset and benchmarks for deep learning- based optical microrobot pose and depth perception,

L. Wei and D. Zhang, “A dataset and benchmarks for deep learning- based optical microrobot pose and depth perception,” in2025 Interna- tional Conference on Manipulation, Automation and Robotics at Small Scales (MARSS). IEEE, 2025, pp. 1–8

work page 2025

[7] [7]

Diffusion models in medical imaging: A comprehensive survey,

L. Qiegen, G. Yu, W. Weiwen, S. Hongming, and L. Dong, “Diffusion models in medical imaging: A comprehensive survey,”CT Theory and Applications, vol. 34, no. 3, pp. 506–524, 2025

work page 2025

[8] [8]

Deep learning approaches for data augmentation in medical imaging: a review,

A. Kebaili, J. Lapuyade-Lahorgue, and S. Ruan, “Deep learning approaches for data augmentation in medical imaging: a review,” Journal of imaging, vol. 9, no. 4, p. 81, 2023

work page 2023

[9] [9]

Medical image data augmentation: techniques, compar- isons and interpretations,

E. Goceri, “Medical image data augmentation: techniques, compar- isons and interpretations,”Artificial intelligence review, vol. 56, no. 11, pp. 12 561–12 605, 2023

work page 2023

[10] [10]

A review and systematic guide to counteracting medical data scarcity for ai applications,

F. Gr ¨oger, L. Amruthalingam, S. Lionetti, A. A. Navarini, F. Ille, and M. Pouly, “A review and systematic guide to counteracting medical data scarcity for ai applications,”Computer Methods and Programs in Biomedicine Update, p. 100220, 2025

work page 2025

[11] [11]

Data-driven microscopic pose and depth estimation for optical microrobot manipulation,

D. Zhang, F. P.-W. Lo, J.-Q. Zheng, W. Bai, G.-Z. Yang, and B. Lo, “Data-driven microscopic pose and depth estimation for optical microrobot manipulation,”Acs Photonics, vol. 7, no. 11, pp. 3003– 3014, 2020

work page 2020

[12] [12]

Fabrication and optical manipulation of micro-robots for biomedical applications,

D. Zhang, Y . Ren, A. Barbot, F. Seichepine, B. Lo, Z.-C. Ma, and G.-Z. Yang, “Fabrication and optical manipulation of micro-robots for biomedical applications,”Matter, vol. 5, no. 10, pp. 3135–3160, 2022

work page 2022

[13] [13]

Incorporating the image formation process into deep learning improves network performance,

Y . Li, Y . Su, M. Guo, X. Han, J. Liu, H. D. Vishwasrao, X. Li, R. Christensen, T. Sengupta, M. W. Moyleet al., “Incorporating the image formation process into deep learning improves network performance,”Nature Methods, vol. 19, no. 11, pp. 1427–1437, 2022

work page 2022

[14] K. Ning, B. Lu, X. Wang, X. Zhang, S. Nie, T. Jiang, A. Li, G. Fan, X. Wang, Q. Luo et al., “Deep self-learning enables fast, high-fidelity isotropic resolution restoration for volumetric fluorescence microscopy,” Light: Science & Applications, vol. 12, no. 1, p. 204, 2023.

[15] M. Guo, Y. Wu, C. M. Hobson, Y. Su, S. Qian, E. Krueger, R. Christensen, G. Kroeschell, J. Bui, M. Chaw et al., “Deep learning-based aberration compensation improves contrast and resolution in fluorescence microscopy,” Nature Communications, vol. 16, no. 1, p. 313, 2025.

[16] Y. Liu, T. Jiang, R. Li, L. Yuan, M. Grzegorzek, C. Li, and X. Li, “A state-of-the-art review of diffusion model applications for microscopic image and micro-alike image analysis,” Frontiers in Medicine, vol. 12, p. 1551894, 2025.

[17] C. Qiao, Y. Zeng, Q. Meng, X. Chen, H. Chen, T. Jiang, R. Wei, J. Guo, W. Fu, H. Lu et al., “Zero-shot learning enables instant denoising and super-resolution in optical fluorescence microscopy,” Nature Communications, vol. 15, no. 1, p. 4180, 2024.

[18] Y. Zhang, L. Huang, N. Pillar, Y. Li, H. Chen, and A. Ozcan, “Pixel super-resolved virtual staining of label-free tissue using diffusion models,” Nature Communications, vol. 16, no. 1, p. 5016, 2025.

[19] Z. Tan and D. Zhang, “Interactive OT gym: A reinforcement learning-based interactive optical tweezer (OT)-driven microrobotics simulation platform,” in 2025 IEEE International Conference on Robotics and Automation (ICRA), 2025, pp. 1–7.

[20] D. Zhang, A. Barbot, F. Seichepine, F. P.-W. Lo, W. Bai, G.-Z. Yang, and B. Lo, “Micro-object pose estimation with sim-to-real transfer learning using small dataset,” Communications Physics, vol. 5, no. 1, p. 80, 2022.

[21] Z. Tan, L. Wei, and D. Zhang, “Physics-informed machine learning for efficient sim-to-real data augmentation in micro-object pose estimation,” arXiv preprint arXiv:2511.16494, 2025.

[22] M. Khayatkhoei and A. Elgammal, “Spatial frequency bias in convolutional generative adversarial networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 7, 2022, pp. 7152–7159.

[23] M. M. Saad, R. O’Reilly, and M. H. Rehmani, “A survey on training challenges in generative adversarial networks for biomedical image analysis,” Artificial Intelligence Review, vol. 57, no. 2, p. 19, 2024.

[24] P. Dhariwal and A. Nichol, “Diffusion models beat GANs on image synthesis,” Advances in Neural Information Processing Systems, vol. 34, pp. 8780–8794, 2021.

[25] L. Jiang, B. Dai, W. Wu, and C. C. Loy, “Focal frequency loss for image reconstruction and synthesis,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 13 919–13 929.

[26] N. Rahaman, A. Baratin, D. Arpit, F. Draxler, M. Lin, F. Hamprecht, Y. Bengio, and A. Courville, “On the spectral bias of neural networks,” in International Conference on Machine Learning. PMLR, 2019, pp. 5301–5310.

[27] A. Saguy, T. Nahimov, M. Lehrman, E. Gómez-de Mariscal, I. Hidalgo-Cenalmor, O. Alalouf, A. Balakrishnan, M. Heilemann, R. Henriques, and Y. Shechtman, “This microtubule does not exist: Super-resolution microscopy image generation by a diffusion model,” Small Methods, vol. 9, no. 3, p. 2400672, 2025.

[28] D. Eschweiler, R. Yilmaz, M. Baumann, I. Laube, R. Roy, A. Jose, D. Brückner, and J. Stegmaier, “Denoising diffusion probabilistic models for generation of realistic fully-annotated microscopy image datasets,” PLOS Computational Biology, vol. 20, no. 2, p. e1011890, 2024.

[29] R. Li, G. Della Maggiora, V. Andriasyan, A. Petkidis, A. Yushkevich, N. Deshpande, M. Kudryashev, and A. Yakimovich, “Microscopy image reconstruction with physics-informed denoising diffusion probabilistic model,” Communications Engineering, vol. 3, no. 1, p. 186, 2024.

[30] X. Liu, J. Z. Li, X. F. Chen, S. An, Y. Lu, N. Ali, K. Wen, P. Gao, J. J. Zheng, L. Liu et al., “Conditional diffusion model to enhance optical sectioning microscopy,” Optics Express, vol. 33, no. 21, pp. 45 381–45 397, 2025.

[31] S. Choudhary, F. Sadak, E. Gerena, and S. Haliyo, “Three-dimensional optical microrobot orientation estimation and tracking using deep learning,” Robotica, vol. 43, no. 2, pp. 616–637, 2025.

[32] M. T. Halma, S. Kumar, J. van Eck, S. Abeln, A. Gates, and G. J. Wuite, “FAIR data for optical tweezers experiments,” Biophysical Journal, vol. 124, no. 8, pp. 1255–1272, 2025.

[33] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10 684–10 695.

[34] K. Prakash, D. Baddeley, C. Eggeling, R. Fiolka, R. Heintzmann, S. Manley, A. Radenovic, H. Shroff, C. Smith, and L. Schermelleh, “Resolution in super-resolution microscopy–facts, artifacts, technological advancements and biological applications,” Journal of Cell Science, vol. 138, no. 10, p. jcs263567, 2025.

[35] L. Zhang, A. Rao, and M. Agrawala, “Adding conditional control to text-to-image diffusion models,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 3836–3847.

[36] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134.