Focal plane wavefront control with model-based reinforcement learning
Pith reviewed 2026-05-13 22:00 UTC · model grok-4.3
The pith
A model-based reinforcement learning algorithm corrects non-common-path aberrations from focal-plane images alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PO4NCPA interprets focal-plane images as input and, through sequential phase diversity, determines phase corrections that optimize both non-coronagraphic and post-coronagraphic point spread functions, robustly compensating static and dynamic NCPAs in numerical simulations to near-optimal levels.
What carries the argument
Policy Optimization for NCPAs (PO4NCPA), a model-based reinforcement learning algorithm that maps focal plane images to phase corrections via sequential phase diversity to optimize PSFs.
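The physical effect that sequential phase diversity exploits can be shown with a toy model. A single focal-plane image cannot tell an even (symmetric) aberration such as defocus from its negative, because a phase φ(x) and its "twin" -φ(-x) produce identical PSFs. Two frames recorded with a known command difference between them break that ambiguity.

This is a minimal 1D sketch under stated assumptions: a uniform 1D pupil, a direct DFT in place of a 2D FFT, and illustrative grid sizes `N_PUPIL` and `N_PAD`. It is not the PO4NCPA implementation. It only illustrates the sign ambiguity that the known diversity removes.

```python
import cmath
import math

N_PUPIL = 16  # samples across the 1D pupil (illustrative choice)
N_PAD = 64    # DFT length; zero padding sets the focal-plane sampling

def pupil_coords():
    """Normalized pupil coordinate in [-1, 1], symmetric about the centre."""
    return [-1 + 2 * i / (N_PUPIL - 1) for i in range(N_PUPIL)]

def psf(phase):
    """Focal-plane intensity |DFT(exp(i*phase))|^2 of a uniform 1D pupil."""
    field = [cmath.exp(1j * p) for p in phase]
    out = []
    for k in range(N_PAD):
        # Samples outside the pupil are zero, so only pupil samples are summed.
        s = sum(field[x] * cmath.exp(-2j * math.pi * k * x / N_PAD)
                for x in range(N_PUPIL))
        out.append(abs(s) ** 2)
    return out

def defocus(amplitude):
    """Even (symmetric) mode a*x^2, the 1D analogue of defocus, in radians."""
    return [amplitude * x * x for x in pupil_coords()]

def add(*phases):
    """Pointwise sum of several phase screens."""
    return [sum(vals) for vals in zip(*phases)]

def max_diff(a, b):
    return max(abs(u - v) for u, v in zip(a, b))

if __name__ == "__main__":
    a = 0.8                # unknown aberration amplitude (illustrative)
    probe = defocus(1.0)   # known command difference between two frames
    # One frame: +a and -a defocus give the same PSF (difference at round-off level).
    print(max_diff(psf(defocus(a)), psf(defocus(-a))))
    # With the known diversity added, the two hypotheses give clearly different PSFs.
    print(max_diff(psf(add(defocus(a), probe)), psf(add(defocus(-a), probe))))
```

In a closed loop, the correction applied between consecutive frames is known, so each new frame acts as a diversity image for the previous one without a dedicated probe.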
If this is right
- It allows NCPA correction without mechanical probes, avoiding performance compromises during operation.
- The method works for standard imaging as well as any coronagraph setup.
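- For dynamic NCPAs, it matches a conventional modal least-squares reconstructor with a 1-step delay integrator, while requiring no prior knowledge of the optical system.
- Sub-millisecond inference times make real-time low-order correction of atmospheric turbulence potentially feasible.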
Where Pith is reading between the lines
- Real-world deployment on operating telescopes could validate or reveal gaps in the simulation assumptions about noise and temporal dynamics.
- Integration with existing adaptive optics systems might further enhance contrast ratios for habitable exoplanet detection.
- Since it requires no prior system knowledge, it could simplify instrument calibration procedures across different facilities.
Load-bearing premise
The simulations of static NCPA on ground-based telescopes and water-vapor-induced dynamic NCPA capture the actual statistics, noise, and behavior on real ELT-class instruments with vector vortex coronagraphs.
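This premise depends partly on how the dynamic NCPA is assumed to evolve in time. A common, generic way to synthesize a modal coefficient with a prescribed correlation time is a first-order autoregressive (AR(1)) process. The sketch uses that model purely for illustration; `correlation_time`, `dt` and `sigma` are hypothetical parameters. The paper's water-vapor model is not described on this page and may differ, for example in its temporal power spectrum.

```python
import math
import random

def ar1_coefficient(dt, correlation_time):
    """Frame-to-frame correlation rho = exp(-dt / tau) of an AR(1) process."""
    return math.exp(-dt / correlation_time)

def ar1_series(n_steps, dt, correlation_time, sigma, a0=0.0, seed=None):
    """Modal coefficient with stationary std `sigma` and autocorrelation
    exp(-|lag| * dt / tau), generated as
        a_{t+1} = rho * a_t + sqrt(1 - rho^2) * sigma * w_t,  w_t ~ N(0, 1).
    This is an assumed generic model, not the paper's water-vapor model."""
    rng = random.Random(seed)
    rho = ar1_coefficient(dt, correlation_time)
    innovation = math.sqrt(1.0 - rho * rho) * sigma
    series = [a0]
    for _ in range(n_steps - 1):
        series.append(rho * series[-1] + innovation * rng.gauss(0.0, 1.0))
    return series
```

As an example, `ar1_series(1000, dt=1e-3, correlation_time=0.5, sigma=30.0, seed=1)` would produce one second of a 1 kHz sequence with a 0.5 s correlation time and 30 (say, nm) RMS. These values are illustrative and are not taken from the paper. A simulation whose correlation times or spectra do not match on-sky conditions is exactly the gap that the falsification test below would expose.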
What would settle it
On-sky testing with an actual high-contrast imager where the achieved light suppression or Strehl ratio deviates significantly from the simulated near-optimal performance would falsify the robustness claim.
Original abstract
The direct imaging of potentially habitable exoplanets is one prime science case for high-contrast imaging instruments on extremely large telescopes. Most such exoplanets orbit close to their host stars, where their observation is limited by fast-moving atmospheric speckles and quasi-static non-common-path aberrations (NCPA). Conventional NCPA correction methods often use mechanical mirror probes, which compromise performance during operation. This work presents machine-learning-based NCPA control methods that automatically detect and correct both dynamic and static NCPA errors by leveraging sequential phase diversity. We extend previous work in reinforcement learning for AO to focal plane control. A new model-based RL algorithm, Policy Optimization for NCPAs (PO4NCPA), interprets the focal-plane image as input data and, through sequential phase diversity, determines phase corrections that optimize both non-coronagraphic and post-coronagraphic PSFs without prior system knowledge. Further, we demonstrate the effectiveness of this approach by numerically simulating static NCPA errors on a ground-based telescope and an infrared imager affected by water-vapor-induced seeing (dynamic NCPAs). Simulations show that PO4NCPA robustly compensates static and dynamic NCPAs. In static cases, it achieves near-optimal focal-plane light suppression with a coronagraph and near-optimal Strehl without one. With dynamic NCPAs, it matches the performance of the modal least-squares reconstruction combined with a 1-step delay integrator in these metrics. The method remains effective for the ELT pupil, vector vortex coronagraph, and under photon and background noise. PO4NCPA is model-free and can be directly applied to standard imaging as well as to any coronagraph. Its sub-millisecond inference times and performance also make it suitable for real-time low-order correction of atmospheric turbulence beyond HCI.
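The baseline named in the abstract is a modal least-squares reconstructor driving an integrator with a one-step delay. Readers unfamiliar with it can picture it as the sketch below, which rests on several assumptions, all hypothetical:

- The focal-plane measurement has been linearized as y = D r + noise for a small interaction matrix `D`. Focal-plane intensity is nonlinear in phase, so this holds only for small aberrations.
- `D` and the gain are illustrative values.
- The update computed from frame t takes effect at frame t+1.

It is a generic textbook controller, not the paper's exact baseline configuration.

```python
import math
import random

def solve(a, b):
    """Solve the square system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def lsq_reconstruct(D, y):
    """Modal least squares c = (D^T D)^{-1} D^T y for a tall interaction matrix D."""
    rows, k = len(D), len(D[0])
    dtd = [[sum(D[i][p] * D[i][q] for i in range(rows)) for q in range(k)] for p in range(k)]
    dty = [sum(D[i][p] * y[i] for i in range(rows)) for p in range(k)]
    return solve(dtd, dty)

def run_integrator(D, aberration, gain=0.5, steps=30, noise=0.0, seed=0):
    """Closed loop where residual = aberration + command. The command used at
    frame t was computed from frames up to t-1 (one-step delay)."""
    rng = random.Random(seed)
    k = len(D[0])
    command = [0.0] * k
    rms_history = []
    for t in range(steps):
        residual = [a + c for a, c in zip(aberration(t), command)]
        rms_history.append(math.sqrt(sum(r * r for r in residual) / k))
        y = [sum(row[j] * residual[j] for j in range(k)) + rng.gauss(0.0, noise) for row in D]
        estimate = lsq_reconstruct(D, y)
        # Integrator: accumulate the negated modal estimate; it is applied next frame.
        command = [c - gain * e for c, e in zip(command, estimate)]
    return rms_history

if __name__ == "__main__":
    D = [[1.0, 0.2], [0.1, 1.0], [0.5, -0.3]]  # hypothetical 3-measurement, 2-mode matrix
    static_ncpa = lambda t: [40.0, -25.0]      # illustrative static modal coefficients
    print(run_integrator(D, static_ncpa)[-1])  # residual RMS decays as (1 - gain)^t
```

For a static aberration and noise-free measurements, the residual shrinks by a factor (1 - gain) every frame. For a time-varying aberration, the delay and gain set a lag error. A learned controller such as the one described here would have to match or beat that trade-off.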
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PO4NCPA, a model-based reinforcement learning algorithm for focal-plane wavefront control of non-common-path aberrations (NCPA). It interprets focal-plane images via sequential phase diversity to compute phase corrections without prior system knowledge or mechanical probes. Numerical simulations on ground-based telescopes (including ELT pupil and vector vortex coronagraph) show that the method compensates both static NCPA (near-optimal light suppression with coronagraph, near-optimal Strehl without) and dynamic water-vapor-induced NCPA (matching modal least-squares reconstruction plus 1-step delay integrator), remaining effective under photon/background noise with sub-millisecond inference.
Significance. If the simulation results hold under realistic conditions, PO4NCPA would provide a model-free, real-time alternative for NCPA correction that avoids probe-induced performance loss and extends RL techniques from AO to focal-plane control. The reported applicability to both coronagraphic and non-coronagraphic imaging, plus suitability for low-order atmospheric turbulence correction, strengthens its potential impact for ELT-class high-contrast imaging.
major comments (2)
- [Simulation results] Simulation results section: the headline claims of 'near-optimal' focal-plane suppression and Strehl, plus equivalence to modal least-squares + integrator, are presented without quantitative metrics (e.g., exact Strehl values, suppression factors in dB, RMS wavefront error), error bars, or statistics on training stability and hyper-parameter sensitivity. This absence makes it impossible to judge whether the reported performance is statistically distinguishable from the baseline.
- [Methods] Methods and simulation setup: the central performance claims rest on the assumption that the generated static NCPA fields and water-vapor dynamic NCPA (spatial spectra, temporal correlation times, noise properties) are statistically representative of on-sky ELT NCPA with a vector vortex coronagraph. No cross-validation against measured NCPA from existing instruments (e.g., SPHERE, GPI) is reported, so any mismatch would directly invalidate the 'robustly compensates' conclusion.
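The first major comment asks for exact Strehl values, suppression factors in dB, RMS wavefront error, error bars and a test of distinguishability from the baseline. The sketch below shows how such summary numbers could be computed from per-run results.

Two assumptions apply. First, it uses the extended Maréchal approximation to convert RMS wavefront error to Strehl, which is reasonable only for small residuals. Second, Welch's t statistic is one choice of comparison test among several. No results from the paper are reproduced here.

```python
import math
import statistics

def strehl_marechal(rms_nm, wavelength_nm):
    """Extended Marechal approximation S ~ exp(-(2*pi*rms/lambda)^2).
    Roughly valid only for small residuals (S above ~0.1); otherwise use the
    measured PSF peak ratio directly."""
    sigma_rad = 2.0 * math.pi * rms_nm / wavelength_nm
    return math.exp(-sigma_rad ** 2)

def to_db(ratio):
    """Linear suppression or raw-contrast ratio expressed in decibels."""
    return 10.0 * math.log10(ratio)

def mean_sem(values):
    """Mean and standard error of the mean over independent runs."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
```

You could apply `mean_sem` to the final Strehl or contrast values from several independently seeded runs of PO4NCPA and of the integrator baseline, then apply `welch_t` to the two samples. That would give the error bars and the statistical comparison the referee requests. Any wavelength passed to `strehl_marechal` (for example 1650 nm in H band) is an illustrative choice, not a parameter stated on this page.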
minor comments (2)
- [Abstract] Abstract: the phrase 'near-optimal' is used without defining the theoretical optimum or supplying the numerical values achieved.
- [Algorithm description] The manuscript would benefit from an explicit equation or pseudocode block showing how the RL policy maps the sequence of phase-diversity images to the correction command.
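The second minor comment asks how the policy consumes a sequence of phase-diversity images. One plausible data flow, assumed here and not taken from the manuscript, works as follows:

- Stack the last k focal-plane frames together with the k commands applied while recording them. The known command differences then play the role of diversity phases.
- Feed that vector to the policy.

`DiversityHistory`, `linear_policy` and `control_step` are hypothetical names. In a model-based scheme, a trained neural network would replace the placeholder linear map, and that network would be optimized through a learned dynamics model.

```python
from collections import deque

class DiversityHistory:
    """Hypothetical policy-state builder: the last k frames plus the k commands
    applied while they were recorded, most recent first."""

    def __init__(self, k, n_pixels, n_modes):
        self.frames = deque([[0.0] * n_pixels for _ in range(k)], maxlen=k)
        self.commands = deque([[0.0] * n_modes for _ in range(k)], maxlen=k)

    def push(self, frame, command):
        # appendleft on a full bounded deque drops the oldest entry on the right.
        self.frames.appendleft(list(frame))
        self.commands.appendleft(list(command))

    def state(self):
        """Flatten frames, then commands, into one policy input vector."""
        flat = [v for f in self.frames for v in f]
        flat += [v for c in self.commands for v in c]
        return flat

def linear_policy(weights, state):
    """Placeholder policy: command increment = W @ state."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]

def control_step(history, weights, frame, command):
    """One loop iteration: record (frame, command), then return the next command."""
    history.push(frame, command)
    delta = linear_policy(weights, history.state())
    return [c + d for c, d in zip(command, delta)]
```

With k frames of P pixels and M modes, the state has k(P + M) entries. Keeping the applied commands in the state is what lets a policy, in principle, resolve the sign ambiguity that a single frame leaves open.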
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the paper to strengthen the quantitative presentation and clarify simulation assumptions.
- Referee: Simulation results section: the headline claims of 'near-optimal' focal-plane suppression and Strehl, plus equivalence to modal least-squares + integrator, are presented without quantitative metrics (e.g., exact Strehl values, suppression factors in dB, RMS wavefront error), error bars, or statistics on training stability and hyper-parameter sensitivity. This absence makes it impossible to judge whether the reported performance is statistically distinguishable from the baseline.
  Authors: We agree that explicit quantitative metrics are needed to support the claims. The current manuscript relies on descriptive language and figures without tabulated values or statistical analysis. In the revised version, we will add tables reporting exact Strehl ratios, suppression factors in dB, RMS wavefront errors (in nm), error bars from multiple independent runs, and results on training stability and hyperparameter sensitivity. This will enable direct statistical comparison to the modal least-squares baseline.
  Revision: yes.
- Referee: Methods and simulation setup: the central performance claims rest on the assumption that the generated static NCPA fields and water-vapor dynamic NCPA (spatial spectra, temporal correlation times, noise properties) are statistically representative of on-sky ELT NCPA with a vector vortex coronagraph. No cross-validation against measured NCPA from existing instruments (e.g., SPHERE, GPI) is reported, so any mismatch would directly invalidate the 'robustly compensates' conclusion.
  Authors: Our NCPA models are constructed from established physical descriptions of static optical errors and water-vapor turbulence spectra with appropriate temporal correlation times for infrared observations. We acknowledge that no direct cross-validation against measured data from SPHERE, GPI or similar instruments is included. In the revision we will expand the methods section with explicit parameter justification and add a dedicated limitations paragraph discussing the modeled conditions and the value of future empirical validation, while retaining the demonstration under the simulated regimes.
  Revision: partial.
Circularity Check
PO4NCPA performance shown in simulations; minor extension of prior RL work but no load-bearing reduction to self-citation or fitted inputs
The paper introduces PO4NCPA as an extension of prior reinforcement learning work for adaptive optics to focal-plane NCPA control. Performance is demonstrated through numerical simulations of static and dynamic NCPA cases, with direct comparison to an external modal least-squares reconstruction plus integrator method. No equations or derivations in the presented material reduce the reported suppression or Strehl metrics to a fitted parameter by construction, nor does the central claim rely on a self-citation chain for its validity. The algorithm is described as model-free at inference, with training on simulated data serving as an independent validation step rather than a tautological input. This results in low circularity, consistent with a score of 2 for the minor self-referential extension without affecting the simulation-based claims.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "A new model-based RL algorithm, Policy Optimization for NCPAs (PO4NCPA), interprets the focal-plane image as input data and, through sequential phase diversity, determines phase corrections..."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.