On-sky demonstration of reinforcement learning for adaptive optics control
Pith reviewed 2026-06-27 11:38 UTC · model grok-4.3
The pith
The reinforcement learning controller PO4AO outperforms a standard integrator in the first on-sky tests of reinforcement learning for adaptive optics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PO4AO, a policy-optimization reinforcement learning controller, was interfaced with the existing real-time controller via shared memory and tested on sky against a standard integrator. It delivered higher performance in every configuration tested, compensated for vibrations, remained robust to noise, and required no retuning of hyperparameters when flux levels or atmospheric conditions changed.
What carries the argument
The PO4AO reinforcement learning policy that maps wavefront-sensor measurements to deformable-mirror commands and learns corrections online instead of using a fixed integrator.
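For contrast with the learned policy, the fixed integrator baseline can be written in a few lines. The following is a minimal sketch, assuming a modal integrator with an optional leak factor; the gain value, the leak value, and the name `integrator_step` are illustrative assumptions and are not taken from the paper or from the DAO RTC.

```python
def integrator_step(commands, residuals, gain=0.4, leak=1.0):
    """One step of a (leaky) modal integrator: c_next = leak * c + gain * r.

    `gain` and `leak` are hypothetical values chosen for illustration.
    """
    return [leak * c + gain * r for c, r in zip(commands, residuals)]


# Toy closed loop on a static aberration (arbitrary units, three modes).
aberration = [1.0, -0.5, 0.25]
commands = [0.0] * len(aberration)
for _ in range(50):
    # The sensor sees what the mirror has not yet corrected.
    residuals = [a - c for a, c in zip(aberration, commands)]
    commands = integrator_step(commands, residuals)

final_residuals = [a - c for a, c in zip(aberration, commands)]
```

The update rule is fixed once the gain is chosen, which is the point of comparison: PO4AO replaces this hand-tuned rule with a policy that is learned online.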
If this is right
- The controller learns and compensates for vibration patterns present in the real telescope environment.
- It maintains performance under photon and detector noise without special tuning.
- A single set of hyperparameters suffices across a range of flux levels and atmospheric conditions.
- Once ported to an optimized real-time implementation, the method could become a practical turnkey option for single-conjugate adaptive optics.
Where Pith is reading between the lines
- The same learning approach could be tested on multi-conjugate or extreme adaptive optics systems where vibrations and misregistrations are more complex.
- Faster implementations would remove the current latency penalty and allow direct comparison of control bandwidths.
- RL controllers might be applied to other real-time astronomy tasks such as tip-tilt or coronagraph alignment once the on-sky proof is established.
Load-bearing premise
The performance comparison remains fair even though the Python implementation of PO4AO added 750 microseconds of latency, control jitter, and occasional frame drops that the baseline integrator did not experience.
What would settle it
A side-by-side test in which the standard integrator is also run through the same Python interface with identical added latency and frame-drop statistics, checking whether PO4AO still outperforms.
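The effect such a matched test would probe can be illustrated with a toy simulation. This is a sketch under stated assumptions, not a model of Papyrus: a single mode, an AR(1) disturbance standing in for turbulence, a noise-free sensor, and an extra delay expressed in whole frames. How many frames 750 microseconds corresponds to depends on the loop rate, which is not given here, so `extra_delay=2` is a hypothetical value.

```python
import random
import statistics


def ar1_disturbance(n, alpha, seed=0):
    """Scalar AR(1) sequence used as a stand-in for one turbulent mode."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(n):
        x = alpha * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def integrator_residuals(disturbance, gain, extra_delay):
    """Residuals of an integrator loop with `extra_delay` additional frames of lag."""
    history = [0.0]  # history[t] is the command computed from frames before t
    residuals = []
    for t, phi in enumerate(disturbance):
        idx = t - extra_delay
        applied = history[idx] if idx >= 0 else 0.0  # stale command reaches the mirror
        r = phi - applied
        residuals.append(r)
        history.append(history[-1] + gain * r)
    return residuals


phi = ar1_disturbance(20000, alpha=0.99)
var_native = statistics.pvariance(integrator_residuals(phi, gain=0.3, extra_delay=0))
var_delayed = statistics.pvariance(integrator_residuals(phi, gain=0.3, extra_delay=2))
print(f"native: {var_native:.2f}  delayed: {var_delayed:.2f}")
```

In this toy setting the delayed integrator leaves a larger residual variance than the native one, which is why the matched-latency comparison would be informative: it separates the cost of the interface from the benefit of the algorithm.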
Original abstract
Reinforcement learning (RL)-based algorithms have recently emerged as a promising approach for adaptive optics (AO) control. In simulations and laboratory experiments, they have demonstrated robustness to real-world effects such as photon and detector noise, misregistration, vibrations, and rapid variations in seeing conditions. However, their performance has not yet been validated on sky. We report the first on-sky demonstration of a reinforcement learning controller for adaptive optics, named Policy Optimization for AO (PO4AO). We further analyze its on-sky behavior and identify directions for improving the algorithm and its implementation. PO4AO was implemented and deployed on the Papyrus adaptive optics system installed at the Coudé focus of the 1.52 m telescope (T152) at the OHP. A Python-based implementation was interfaced with the existing real-time controller (DAO RTC) via shared-memory buffers. The performance of PO4AO was compared to that of a standard integrator controller over several nights, covering a range of flux levels and atmospheric conditions. PO4AO consistently outperformed the standard integrator in all tested configurations. The controller successfully learned and compensated for vibration patterns and demonstrated strong robustness to measurement noise. Once tuned for Papyrus, PO4AO operated in a turnkey fashion, using a single set of hyperparameters across varying observing conditions and science targets. These performance gains were achieved despite a non-optimized Python implementation introducing approximately 750 μs of additional latency, along with control jitter and occasional frame drops. When properly implemented and optimized, PO4AO constitutes a robust and high-performance turnkey controller for single-conjugate adaptive optics systems, paving the way for broader adoption of reinforcement learning strategies in on-sky AO operations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports the first on-sky demonstration of a reinforcement learning controller (PO4AO) for adaptive optics, implemented on the Papyrus system at the 1.52 m OHP telescope. Across multiple nights and a range of flux levels and atmospheric conditions, PO4AO outperformed a standard integrator controller, learned and compensated for vibration patterns, and exhibited robustness to measurement noise while operating in a turnkey manner with fixed hyperparameters. These gains occurred despite added latency (~750 μs), jitter, and frame drops from a non-optimized Python implementation interfaced via shared memory to the DAO RTC.
Significance. If the performance comparison is shown to be fair, this constitutes a significant empirical result as the first on-sky validation of RL-based AO control. The multi-night dataset across conditions, combined with the demonstration of vibration compensation and noise robustness, provides concrete evidence supporting RL as a practical alternative to classical integrators for single-conjugate AO, with potential for broader operational adoption once optimized.
major comments (1)
- [Abstract] Abstract: The central claim that PO4AO 'consistently outperformed the standard integrator in all tested configurations' and that 'performance gains were achieved despite' the Python implementation's added latency, jitter, and frame drops does not state whether the baseline integrator was retuned, re-optimized, or evaluated under matched latency/jitter conditions. This detail is load-bearing for attributing the reported margin to algorithmic differences rather than implementation asymmetry.
minor comments (1)
- [Abstract] Abstract: No quantitative performance metrics (e.g., Strehl ratio, residual wavefront error, or improvement factors with uncertainties) are provided to support the outperformance claim; adding these would improve clarity and allow readers to assess the magnitude of the gains.
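The kind of quantitative summary requested above could be derived from residual wavefront variances. The sketch below assumes the extended Maréchal approximation, S ≈ exp(−σ²) with σ² in rad² at the observing wavelength, and defines the improvement factor as integrator variance divided by PO4AO variance. The two variance values are hypothetical placeholders, not measurements from the paper.

```python
import math


def strehl_marechal(residual_variance_rad2):
    """Extended Marechal approximation: S ~ exp(-sigma^2), sigma^2 in rad^2."""
    return math.exp(-residual_variance_rad2)


def variance_gain(var_integrator, var_po4ao):
    """Improvement factor: values above 1 mean PO4AO leaves less residual."""
    return var_integrator / var_po4ao


# Hypothetical residual variances (rad^2), for illustration only.
var_int, var_rl = 0.50, 0.25

gain = variance_gain(var_int, var_rl)
strehl_int = strehl_marechal(var_int)
strehl_rl = strehl_marechal(var_rl)
print(f"gain={gain:.2f}  Strehl: integrator={strehl_int:.3f}, PO4AO={strehl_rl:.3f}")
```

The approximation is only reasonable for small residuals, so a report based on it should state the wavelength and the variance range over which it is applied.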
Simulated Author's Rebuttal
We thank the referee for their thorough review and for recognizing the significance of our on-sky demonstration of PO4AO. We provide a point-by-point response to the major comment below.
Point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that PO4AO 'consistently outperformed the standard integrator in all tested configurations' and that 'performance gains were achieved despite' the Python implementation's added latency, jitter, and frame drops does not state whether the baseline integrator was retuned, re-optimized, or evaluated under matched latency/jitter conditions. This detail is load-bearing for attributing the reported margin to algorithmic differences rather than implementation asymmetry.
Authors: We agree that this information is important for a fair interpretation of the results. The standard integrator controller is the one already deployed in the DAO RTC and was used in its standard operational configuration without additional retuning or optimization for the purpose of this comparison. PO4AO was interfaced via shared memory, introducing the reported additional latency, jitter, and frame drops, while the integrator operated at the native latency of the RTC. We will revise the abstract to explicitly clarify that the baseline integrator was evaluated under its native conditions without matched implementation overhead, thereby strengthening the claim that the performance gains are attributable to the RL algorithm despite these disadvantages.
Revision: yes
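The implementation overhead mentioned in this response (jitter and frame drops) can be quantified from loop telemetry. The following is a minimal sketch, assuming the controller logs a frame counter and a timestamp for every frame it processes; the function name `loop_timing_report` and the sample values are hypothetical and do not describe the DAO RTC interface.

```python
import statistics


def loop_timing_report(frame_ids, timestamps_s):
    """Summarize dropped frames and timing jitter from per-frame telemetry."""
    gaps = [b - a for a, b in zip(frame_ids, frame_ids[1:])]
    # A gap of k between consecutive processed frames means k - 1 frames were skipped.
    dropped = sum(g - 1 for g in gaps if g > 1)
    intervals = [t1 - t0 for t0, t1 in zip(timestamps_s, timestamps_s[1:])]
    return {
        "dropped_frames": dropped,
        "mean_interval_s": statistics.fmean(intervals),
        # Spread of the interval between processed frames; drops appear as long intervals.
        "jitter_s": statistics.pstdev(intervals),
    }


# Hypothetical log: frames 3 and 4 never reached the controller.
frame_ids = [0, 1, 2, 5, 6]
timestamps_s = [0.000, 0.001, 0.002, 0.005, 0.006]
report = loop_timing_report(frame_ids, timestamps_s)
print(report)
```

Reporting these statistics for both controllers would make the implementation asymmetry explicit alongside the performance numbers.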
Circularity Check
Empirical demonstration with no derivation chain present
full rationale
The paper reports on-sky experimental results comparing the PO4AO reinforcement learning controller to a standard integrator across flux levels and conditions. No mathematical derivation, first-principles result, fitted parameter renamed as prediction, or self-citation chain is invoked to support a claimed prediction. The central claim (outperformance) is measured against an external baseline controller and is therefore falsifiable by direct observation rather than reducing to the paper's own inputs by construction. No steps matching the enumerated circularity patterns exist.
Appendix A: Additional telemetry analysis
Excerpt from J. Nousiainen et al., "On-sky demonstration of reinforcement learning for adaptive optics control" (A&A proofs, manuscript no. aa59769-26).
For interested readers, we have added several additional telemetry plots. For each dataset presented in the paper, we plot the wavefront mean-squared error (MSE) at each time step and compare...
Fig. A.1: Additional first night telemetry analysis (target: Vega, V = 0.09). (a) Mean-squared wavefront error at each time step: the blue line is for the integrator and the orange line for PO4AO. (b) Comparison between residual modal variance, i.e., integrator variance divided by the PO4AO variance for each KL mode. Moreover,...
[Figure panels, caption not recovered. Targets: Vega, V = 0.09, and HD177809, V = 5.72. Panels show total MSE versus time step (t) for the integrator and PO4AO, and residual variance gain (integrator / PO4AO) versus KL mode index.]
Fig. A.3: Additional second night telemetry analysis (last panel title as extracted: "Cygni, V = 6.66"). (a, c, e) Mean-squared wavefront error at each time step for each target: blue lines are for the integrator and orange lines for PO4AO. (b, d, f) Comparison between residual modal variance, i.e., integrator variance divided by the PO4AO vari...
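The two quantities plotted in the appendix figures, total MSE per time step and the per-mode variance gain (integrator variance divided by PO4AO variance for each KL mode), can be computed from modal telemetry as sketched below. The data layout (a list of time steps, each a list of modal coefficients), the definition of MSE as the mean of squared modal coefficients, and the toy numbers are assumptions made for illustration.

```python
import statistics


def total_mse(telemetry):
    """Mean of squared modal coefficients at each time step (assumed MSE definition)."""
    return [sum(c * c for c in frame) / len(frame) for frame in telemetry]


def modal_variance_gain(integrator_telemetry, po4ao_telemetry):
    """Per-mode ratio: integrator residual variance / PO4AO residual variance."""
    gains = []
    # zip(*telemetry) transposes time-major data into one series per mode.
    for int_mode, rl_mode in zip(zip(*integrator_telemetry), zip(*po4ao_telemetry)):
        gains.append(statistics.pvariance(int_mode) / statistics.pvariance(rl_mode))
    return gains


# Toy telemetry: four time steps, two modes (not real data).
integrator = [[2.0, 1.0], [-2.0, -1.0], [2.0, 1.0], [-2.0, -1.0]]
po4ao = [[1.0, 1.0], [-1.0, -1.0], [1.0, 1.0], [-1.0, -1.0]]

gains = modal_variance_gain(integrator, po4ao)
mse_integrator = total_mse(integrator)
print(gains, mse_integrator[0])
```

A gain above 1 for a given mode means PO4AO left less residual variance on that mode than the integrator did.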