
arxiv: 2604.00993 · v1 · pith:DSDUZMT5new · submitted 2026-04-01 · 🌌 astro-ph.IM · astro-ph.EP · cs.LG · cs.RO

Focal plane wavefront control with model-based reinforcement learning

Pith reviewed 2026-05-13 22:00 UTC · model grok-4.3

classification 🌌 astro-ph.IM · astro-ph.EP · cs.LG · cs.RO
keywords non-common path aberrations · reinforcement learning · focal plane control · high contrast imaging · exoplanets · adaptive optics · wavefront sensing

The pith

A model-based reinforcement learning algorithm corrects non-common-path aberrations from focal-plane images alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces PO4NCPA, a reinforcement learning method that uses focal-plane images and sequential phase diversity to automatically detect and correct both static and dynamic non-common-path aberrations in high-contrast imaging. Conventional methods rely on mechanical mirror probes that can degrade performance, but this approach learns corrections without prior system knowledge. Simulations on ground-based telescope models and water-vapor-induced dynamic errors show it achieves near-optimal focal-plane light suppression with a coronagraph and near-optimal Strehl without one. For dynamic cases, it performs as well as modal least-squares reconstruction with a delay integrator, even under noise and for ELT pupils with vector vortex coronagraphs. The sub-millisecond inference makes it viable for real-time applications beyond exoplanet imaging.

Core claim

PO4NCPA interprets focal-plane images as input and, through sequential phase diversity, determines phase corrections that optimize both non-coronagraphic and post-coronagraphic point spread functions, robustly compensating static and dynamic NCPAs in numerical simulations to near-optimal levels.
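
One standard motivation for phase diversity, sketched here rather than taken from the paper: a single focal-plane image of a symmetric pupil cannot reveal the sign of an even aberration such as defocus, whereas a second image taken after a known phase change can. In a sequential scheme the corrector's own previous command can play the role of that known change. The NumPy check below uses a toy circular pupil; the grid size, the 0.5 rad aberration and the 0.3 rad probe are illustrative assumptions, not values from the paper.

```python
import numpy as np

def make_grid(n=64):
    """Centered pixel coordinates, scaled so the pupil radius equals 1."""
    x = (np.arange(n) - n // 2) / (n / 4)  # pupil diameter spans half the array
    return np.meshgrid(x, x)

def psf(phase, aperture):
    """Focal-plane intensity for a pupil with the given phase (radians)."""
    field = aperture * np.exp(1j * phase)
    return np.abs(np.fft.fft2(field)) ** 2

xx, yy = make_grid()
r2 = xx**2 + yy**2
aperture = (r2 <= 1.0).astype(float)
defocus = (2.0 * r2 - 1.0) * aperture  # an even (centro-symmetric) mode

unknown = 0.5 * defocus  # hypothetical aberration to be sensed
probe = 0.3 * defocus    # known phase change, e.g. the previous correction

# Without diversity, +unknown and -unknown produce the same image.
same_without_diversity = np.allclose(psf(unknown, aperture),
                                     psf(-unknown, aperture))
# With a known probe added, the two signs become distinguishable.
same_with_diversity = np.allclose(psf(unknown + probe, aperture),
                                  psf(-unknown + probe, aperture))

print(same_without_diversity, same_with_diversity)
```

The first comparison holds because, for a real symmetric aperture and an even phase, flipping the sign of the phase only conjugates the focal-plane field, which leaves the intensity unchanged.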

What carries the argument

Policy Optimization for NCPAs (PO4NCPA), a model-based reinforcement learning algorithm that maps focal plane images to phase corrections via sequential phase diversity to optimize PSFs.
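
To make the term "model-based" concrete, here is a deliberately small sketch of the general pattern: collect closed-loop data, fit a dynamics model to it, then derive a controller from the fitted model. It is an assumption-laden toy and not the paper's method: the figure captions indicate that PO4NCPA uses neural networks for both the dynamics model and the policy and works on focal-plane images, whereas this example uses a linear least-squares model on a handful of modal coefficients. The names `true_response`, `step` and `policy` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_modes = 5

# Hidden response of the residual to a command (unknown to the learner).
true_response = np.eye(n_modes) + 0.1 * rng.standard_normal((n_modes, n_modes))

def step(residual, command):
    """Toy plant: a command changes the residual through the hidden response."""
    return residual + true_response @ command

# 1) Collect exploration data with random commands.
commands = rng.standard_normal((200, n_modes))
residual = rng.standard_normal(n_modes)
deltas = []
for u in commands:
    nxt = step(residual, u)
    deltas.append(nxt - residual)
    residual = nxt
deltas = np.array(deltas)

# 2) Fit a linear dynamics model: deltas ~= commands @ W, so W = response^T.
W, *_ = np.linalg.lstsq(commands, deltas, rcond=None)
learned_response = W.T

# 3) Derive a policy from the learned model: cancel the predicted residual.
def policy(residual):
    return -np.linalg.solve(learned_response, residual)

# 4) Run the policy in closed loop on a fresh aberration.
residual = rng.standard_normal(n_modes)
initial_norm = np.linalg.norm(residual)
for _ in range(3):
    residual = step(residual, policy(residual))
final_norm = np.linalg.norm(residual)

print(initial_norm, final_norm)
```

Because the toy plant is noise-free and exactly linear, the fitted model is essentially exact and the loop converges in one step; a real focal-plane problem is nonlinear, which is where learned neural models come in.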

If this is right

  • It allows NCPA correction without mechanical probes, avoiding performance compromises during operation.
  • The method works for standard imaging as well as any coronagraph setup.
  • It matches conventional performance for dynamic NCPAs while needing no prior system model (the abstract's sense of "model-free"; the RL algorithm itself is model-based).
  • Fast inference enables potential real-time low-order turbulence correction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Real-world deployment on operating telescopes could validate or reveal gaps in the simulation assumptions about noise and temporal dynamics.
  • Integration with existing adaptive optics systems might further enhance contrast ratios for habitable exoplanet detection.
  • Since it requires no prior system knowledge, it could simplify instrument calibration procedures across different facilities.

Load-bearing premise

The simulations of static NCPA on ground-based telescopes and water-vapor-induced dynamic NCPA capture the actual statistics, noise, and behavior on real ELT-class instruments with vector vortex coronagraphs.

What would settle it

On-sky testing with an actual high-contrast imager where the achieved light suppression or Strehl ratio deviates significantly from the simulated near-optimal performance would falsify the robustness claim.

Figures

Figures reproduced from arXiv: 2604.00993 by Gilles Orban De Xivry, Iremsu Taskin, Jalo Nousiainen, Markus Kasper, Olivier Absil.

Figure 1: Illustration of the preprocessing step of the focal plane…
Figure 2: Dynamics model NN design. Trained on closed-loop data…
Figure 3: Policy model NN design. In the control loop, the im…
Figure 4: Training plots of PO4NCPA on circular pupil with SI and PC. Here we plot the negative cumulative reward (loss) after each…
Figure 5: PO4NCPA convergence on circular pupil with SI (left) and PC (right) in the case of static NCPA. Top row: reward on time…
Figure 8 (fragment extracted from page 9): In the SI case, we can clearly see the di…
Figure 6: Circular pupil PSF sharpness/raw contrast with static NCPA for (a) standard imaging and (b) perfect coronagraph cases. Here, we take the average (over 500 episodes) of the last PSF frame and plot the resulting PSF and radial average divided by the peak intensity.
[Plot residue: x-axis Timestep (t); y-axis RMS (nm); panel "Standard imaging"; legend: Wavefront residual, Fitting error, Fitting error + delay err…]
Figure 7: A 100-time-step window during the long exposure dynamic NCPA, showing the wavefront error RMS as a function of time…
Figure 8: RMS per mode over the long episode with dynamic NCPA for (a) standard imaging and (b) perfect coronagraph cases.
Figure 9: PSF sharpness/contrast with dynamic NCPA for (a) standard imaging and (b) perfect coronagraph cases.
5.2.4. Robustness to larger wavefront errors. The NCPA errors (from the WV seeing spectrum) simulated in the previous subsections are rather small and have a small impact on the Strehl ratio. A natural follow-up question is: can PO4NCPA handle larger wavefront errors? This way it could be used, for example,…
Figure 10: PO4NCPA performance compared against “Fitting error…
Figure 11: Raw “residual” PSF contrast for ELT-METIS. Upper…
Original abstract

The direct imaging of potentially habitable exoplanets is a prime science case for high-contrast imaging instruments on extremely large telescopes. Most such exoplanets orbit close to their host stars, where their observation is limited by fast-moving atmospheric speckles and quasi-static non-common-path aberrations (NCPA). Conventional NCPA correction methods often use mechanical mirror probes, which compromise performance during operation. This work presents machine-learning-based NCPA control methods that automatically detect and correct both dynamic and static NCPA errors by leveraging sequential phase diversity. We extend previous work in reinforcement learning for AO to focal plane control. A new model-based RL algorithm, Policy Optimization for NCPAs (PO4NCPA), interprets the focal-plane image as input data and, through sequential phase diversity, determines phase corrections that optimize both non-coronagraphic and post-coronagraphic PSFs without prior system knowledge. Further, we demonstrate the effectiveness of this approach by numerically simulating static NCPA errors on a ground-based telescope and an infrared imager affected by water-vapor-induced seeing (dynamic NCPAs). Simulations show that PO4NCPA robustly compensates static and dynamic NCPAs. In static cases, it achieves near-optimal focal-plane light suppression with a coronagraph and near-optimal Strehl without one. With dynamic NCPAs, it matches the performance of the modal least-squares reconstruction combined with a 1-step delay integrator in these metrics. The method remains effective for the ELT pupil, with a vector vortex coronagraph, and under photon and background noise. PO4NCPA is model-free and can be directly applied to standard imaging as well as to any coronagraph. Its sub-millisecond inference times and performance also make it suitable for real-time low-order correction of atmospheric turbulence beyond HCI.
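
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of a modal least-squares reconstructor feeding an integrator, under stated assumptions. The linear sensor `interaction`, the gain of 0.5, and the reading of "1-step delay" as "the command computed from frame t is applied at frame t+1" are all assumptions made for illustration; the paper's exact baseline configuration is not given on this page.

```python
import numpy as np

def run_integrator(aberration, interaction, gain=0.5, n_steps=40):
    """Closed loop on a static aberration; returns residual RMS per frame."""
    # Modal least-squares reconstructor: pseudo-inverse of the sensor model.
    reconstructor = np.linalg.pinv(interaction)
    command = np.zeros_like(aberration)
    rms_history = []
    for _ in range(n_steps):
        residual = aberration - command          # what the sensor sees now
        rms_history.append(np.sqrt(np.mean(residual**2)))
        measurement = interaction @ residual     # noise-free linear sensor
        estimate = reconstructor @ measurement   # least-squares modal estimate
        # Integrator update; it only takes effect on the next frame,
        # which is the one-step delay assumed in this sketch.
        command = command + gain * estimate
    return np.array(rms_history)

rng = np.random.default_rng(1)
n_modes, n_meas = 6, 20
interaction = rng.standard_normal((n_meas, n_modes))  # hypothetical sensor
static_ncpa = rng.standard_normal(n_modes)            # hypothetical modes

rms = run_integrator(static_ncpa, interaction)
print(rms[0], rms[1], rms[-1])
```

For a static aberration and a perfect reconstructor, each frame multiplies the residual by (1 - gain), so the error decays geometrically. With a time-varying aberration the same loop would leave a nonzero delay error, since every command is based on a measurement that is one frame old.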

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces PO4NCPA, a model-based reinforcement learning algorithm for focal-plane wavefront control of non-common-path aberrations (NCPA). It interprets focal-plane images via sequential phase diversity to compute phase corrections without prior system knowledge or mechanical probes. Numerical simulations on ground-based telescopes (including ELT pupil and vector vortex coronagraph) show that the method compensates both static NCPA (near-optimal light suppression with coronagraph, near-optimal Strehl without) and dynamic water-vapor-induced NCPA (matching modal least-squares reconstruction plus 1-step delay integrator), remaining effective under photon/background noise with sub-millisecond inference.

Significance. If the simulation results hold under realistic conditions, PO4NCPA would provide a model-free, real-time alternative for NCPA correction that avoids probe-induced performance loss and extends RL techniques from AO to focal-plane control. The reported applicability to both coronagraphic and non-coronagraphic imaging, plus suitability for low-order atmospheric turbulence correction, strengthens its potential impact for ELT-class high-contrast imaging.

major comments (2)
  1. [Simulation results] Simulation results section: the headline claims of 'near-optimal' focal-plane suppression and Strehl, plus equivalence to modal least-squares + integrator, are presented without quantitative metrics (e.g., exact Strehl values, suppression factors in dB, RMS wavefront error), error bars, or statistics on training stability and hyper-parameter sensitivity. This absence makes it impossible to judge whether the reported performance is statistically distinguishable from the baseline.
  2. [Methods] Methods and simulation setup: the central performance claims rest on the assumption that the generated static NCPA fields and water-vapor dynamic NCPA (spatial spectra, temporal correlation times, noise properties) are statistically representative of on-sky ELT NCPA with a vector vortex coronagraph. No cross-validation against measured NCPA from existing instruments (e.g., SPHERE, GPI) is reported, so any mismatch would directly invalidate the 'robustly compensates' conclusion.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'near-optimal' is used without defining the theoretical optimum or supplying the numerical values achieved.
  2. [Algorithm description] The manuscript would benefit from an explicit equation or pseudocode block showing how the RL policy maps the sequence of phase-diversity images to the correction command.
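
The quantities requested in major comment 1 have standard definitions, sketched below so that the requested tables are unambiguous. This is an illustrative helper set, not code from the paper: the toy pupil, the 0.1 rad RMS defocus-like phase and the 1000 nm wavelength are assumptions chosen only to exercise the functions.

```python
import numpy as np

def rms_wavefront_nm(phase_rad, aperture, wavelength_nm):
    """RMS wavefront error over the aperture, converted from radians to nm."""
    phi = phase_rad[aperture > 0]
    rms_rad = np.sqrt(np.mean((phi - phi.mean()) ** 2))
    return rms_rad * wavelength_nm / (2.0 * np.pi)

def strehl_on_axis(phase_rad, aperture):
    """On-axis intensity relative to the aberration-free pupil."""
    field = aperture * np.exp(1j * phase_rad)
    return np.abs(field.sum()) ** 2 / aperture.sum() ** 2

def strehl_marechal(rms_rad):
    """Extended Marechal approximation, valid for small RMS phase errors."""
    return np.exp(-rms_rad ** 2)

def suppression_db(intensity_before, intensity_after):
    """Light suppression factor expressed in decibels."""
    return 10.0 * np.log10(intensity_before / intensity_after)

# Toy pupil with a defocus-like phase scaled to 0.1 rad RMS (illustrative).
n = 128
x = (np.arange(n) - n // 2) / (n / 4)
xx, yy = np.meshgrid(x, x)
r2 = xx**2 + yy**2
aperture = (r2 <= 1.0).astype(float)
inside = aperture > 0
mode = np.where(inside, 2.0 * r2 - 1.0, 0.0)
mode[inside] -= mode[inside].mean()
phase = 0.1 * mode / mode[inside].std()

wavelength_nm = 1000.0  # hypothetical, used only for the unit conversion
print(rms_wavefront_nm(phase, aperture, wavelength_nm))
print(strehl_on_axis(phase, aperture), strehl_marechal(0.1))
print(suppression_db(100.0, 1.0))
```

Reporting the exact on-axis Strehl alongside the Marechal estimate is one simple way to state what "near-optimal" is being measured against, which is the point of minor comment 1.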

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to strengthen the quantitative presentation and clarify simulation assumptions.

Point-by-point responses
  1. Referee: Simulation results section: the headline claims of 'near-optimal' focal-plane suppression and Strehl, plus equivalence to modal least-squares + integrator, are presented without quantitative metrics (e.g., exact Strehl values, suppression factors in dB, RMS wavefront error), error bars, or statistics on training stability and hyper-parameter sensitivity. This absence makes it impossible to judge whether the reported performance is statistically distinguishable from the baseline.

    Authors: We agree that explicit quantitative metrics are needed to support the claims. The current manuscript relies on descriptive language and figures without tabulated values or statistical analysis. In the revised version, we will add tables reporting exact Strehl ratios, suppression factors in dB, RMS wavefront errors (in nm), error bars from multiple independent runs, and results on training stability and hyperparameter sensitivity. This will enable direct statistical comparison to the modal least-squares baseline. revision: yes

  2. Referee: Methods and simulation setup: the central performance claims rest on the assumption that the generated static NCPA fields and water-vapor dynamic NCPA (spatial spectra, temporal correlation times, noise properties) are statistically representative of on-sky ELT NCPA with a vector vortex coronagraph. No cross-validation against measured NCPA from existing instruments (e.g., SPHERE, GPI) is reported, so any mismatch would directly invalidate the 'robustly compensates' conclusion.

    Authors: Our NCPA models are constructed from established physical descriptions of static optical errors and water-vapor turbulence spectra with appropriate temporal correlation times for infrared observations. We acknowledge that no direct cross-validation against measured data from SPHERE, GPI or similar instruments is included. In the revision we will expand the methods section with explicit parameter justification and add a dedicated limitations paragraph discussing the modeled conditions and the value of future empirical validation, while retaining the demonstration under the simulated regimes. revision: partial

Circularity Check

0 steps flagged

PO4NCPA performance shown in simulations; minor extension of prior RL work but no load-bearing reduction to self-citation or fitted inputs

Full rationale

The paper introduces PO4NCPA as an extension of prior reinforcement learning work for adaptive optics to focal-plane NCPA control. Performance is demonstrated through numerical simulations of static and dynamic NCPA cases, with direct comparison to an external modal least-squares reconstruction plus integrator method. No equations or derivations in the presented material reduce the reported suppression or Strehl metrics to a fitted parameter by construction, nor does the central claim rely on a self-citation chain for its validity. The algorithm is described as model-free at inference, with training on simulated data serving as an independent validation step rather than a tautological input. This results in low circularity, consistent with a score of 2 for the minor self-referential extension without affecting the simulation-based claims.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no explicit free parameters, axioms, or invented entities are described. The method relies on standard RL assumptions (Markov decision process, reward defined on PSF quality) and simulation fidelity.
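
As an illustration of what "reward defined on PSF quality" could look like in practice, the sketch below gives one plausible reward for each imaging mode: a sharpness score for standard imaging and a negative mean intensity in a chosen focal-plane region for the coronagraphic case. Both functions are assumptions made for this note; the reward actually used in the paper is not stated on this page.

```python
import numpy as np

def reward_standard_imaging(image):
    """Fraction of total flux in the brightest pixel (higher means sharper)."""
    return float(image.max() / image.sum())

def reward_coronagraph(image, region):
    """Negative mean intensity inside a boolean region (higher means darker)."""
    return -float(image[region].mean())

def gaussian_spot(n, width):
    """Synthetic stand-in for a PSF: a centered Gaussian of the given width."""
    x = np.arange(n) - n // 2
    xx, yy = np.meshgrid(x, x)
    return np.exp(-(xx**2 + yy**2) / (2.0 * width**2))

n = 64
sharp = gaussian_spot(n, 2.0)    # well-corrected toy image
blurred = gaussian_spot(n, 6.0)  # aberrated toy image

# Annular scoring region for the coronagraphic reward (hypothetical radii).
x = np.arange(n) - n // 2
xx, yy = np.meshgrid(x, x)
radius = np.hypot(xx, yy)
region = (radius >= 4) & (radius <= 12)

bright_leak = 1e-4 * blurred  # more residual starlight
faint_leak = 1e-6 * blurred   # less residual starlight

print(reward_standard_imaging(sharp), reward_standard_imaging(blurred))
print(reward_coronagraph(faint_leak, region), reward_coronagraph(bright_leak, region))
```

Either score can be computed from the science image alone, which is consistent with the claim that no separate wavefront sensor or mechanical probe is needed.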



