On-sky demonstration of reinforcement learning for adaptive optics control

Angelie Alagao; Benoit Neichel; Byron Engler; Jalo Nousiainen; Jean-Francois Sauvage; Jonathan Dray; Markus Kasper; Romain Fetick; Sylvain Cetre; Vincent Chambouleyron

arxiv: 2606.10771 · v1 · pith:VMK2H2XNnew · submitted 2026-06-09 · 🌌 astro-ph.IM · cs.LG · cs.RO

Pith reviewed 2026-06-27 11:38 UTC · model grok-4.3

classification 🌌 astro-ph.IM · cs.LG · cs.RO

keywords adaptive optics · reinforcement learning · on-sky demonstration · vibration compensation · real-time control · telescope instrumentation · policy optimization

The pith

Reinforcement learning controller PO4AO outperforms standard integrator in first on-sky adaptive optics tests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the first on-sky validation of a reinforcement learning controller for adaptive optics, deployed on the Papyrus system at a 1.52 m telescope. PO4AO beat the conventional integrator across multiple nights and conditions while learning vibration patterns and resisting measurement noise. It ran with a single fixed set of hyperparameters across targets and seeing levels, and it kept its advantage even though the Python implementation added about 750 μs of latency, control jitter, and occasional frame drops. The results indicate that a properly optimized version could serve as a reliable, turnkey controller for single-conjugate adaptive optics.

Core claim

PO4AO, a policy-optimization reinforcement learning controller, was interfaced with the existing real-time controller via shared memory and tested on sky against a standard integrator. It delivered higher performance in every configuration tested, compensated for vibrations, remained robust to noise, and required no retuning of hyperparameters when flux levels or atmospheric conditions changed.
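The shared-memory hand-off can be pictured with a minimal sketch. It uses Python's standard `multiprocessing.shared_memory` module as a stand-in; it is not the DAO RTC interface, and the buffer size, dtype, and variable names are assumptions made for illustration only.

```python
import numpy as np
from multiprocessing import shared_memory

n_meas = 8  # hypothetical number of wavefront-sensor measurements

# "RTC side": create a named buffer and write one measurement frame into it.
shm = shared_memory.SharedMemory(create=True, size=n_meas * 4)
try:
    writer = np.ndarray((n_meas,), dtype=np.float32, buffer=shm.buf)
    writer[:] = np.arange(n_meas, dtype=np.float32)

    # "Controller side": attach to the same buffer by name and read the frame.
    reader_shm = shared_memory.SharedMemory(name=shm.name)
    reader = np.ndarray((n_meas,), dtype=np.float32, buffer=reader_shm.buf)
    received = reader.copy()  # copy out before releasing the view

    # NumPy views must be dropped before the segments can be closed.
    del reader
    reader_shm.close()
    del writer
finally:
    shm.close()
    shm.unlink()
```

In a real loop the two sides would live in separate processes and would need a synchronization mechanism (for example a frame counter or semaphore), which this sketch leaves out.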

What carries the argument

The PO4AO reinforcement learning policy that maps wavefront-sensor measurements to deformable-mirror commands and learns corrections online instead of using a fixed integrator.
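For orientation, the fixed baseline that such a learned policy replaces can be sketched as a textbook leaky integrator. This is a generic illustration, not the integrator deployed in the DAO RTC: the gain, leak factor, reconstruction matrix, identity optics, and sign convention are all assumptions.

```python
import numpy as np

def integrator_step(command, slopes, recon, gain=0.4, leak=1.0):
    """One update of a (leaky) integrator.

    command: previous deformable-mirror command, shape (n_act,)
    slopes:  current wavefront-sensor measurement, shape (n_meas,)
    recon:   reconstruction matrix, shape (n_act, n_meas)
    """
    return leak * command - gain * (recon @ slopes)

# Toy closed loop: static aberration, noiseless sensor, identity optics.
rng = np.random.default_rng(0)
n_act = 4
interaction = np.eye(n_act)            # hypothetical DM-to-sensor response
recon = np.linalg.pinv(interaction)
aberration = rng.normal(size=n_act)
command = np.zeros(n_act)
for _ in range(50):
    residual = aberration + command    # wavefront left after correction
    slopes = interaction @ residual    # what the sensor reports
    command = integrator_step(command, slopes, recon)
final_residual = aberration + command
```

In this toy case each iteration shrinks the residual by a factor of `1 - gain`. The update rule is the same at every step, whereas a learned policy such as PO4AO adapts its mapping from data.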

If this is right

The controller learns and compensates for vibration patterns present in the real telescope environment.
It maintains performance under photon and detector noise without special tuning.
A single set of hyperparameters suffices across a range of flux levels and atmospheric conditions.
When ported to an optimized real-time implementation, the method could become a practical turnkey option for single-conjugate adaptive optics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same learning approach could be tested on multi-conjugate or extreme adaptive optics systems where vibrations and misregistrations are more complex.
Faster implementations would remove the current latency penalty and allow direct comparison of control bandwidths.
RL controllers might be applied to other real-time astronomy tasks such as tip-tilt or coronagraph alignment once the on-sky proof is established.

Load-bearing premise

The performance comparison remains fair even though the Python implementation of PO4AO added 750 microseconds of latency, control jitter, and occasional frame drops that the baseline integrator did not experience.

What would settle it

A side-by-side test in which the standard integrator is also run through the same Python interface with identical added latency and frame-drop statistics, checking whether PO4AO still outperforms.
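The effect such a matched test would need to isolate can be illustrated with a toy simulation: a scalar integrator loop run with and without one extra frame of delay on the same disturbance. Everything here is an assumption for illustration, including the AR(1) disturbance model, the gain, and the use of whole frames (the summary does not state the loop rate, so 750 μs is not converted into frames).

```python
from collections import deque
import numpy as np

def residual_variance(extra_frames, gain=0.3, n_steps=20000, seed=0):
    """Residual variance of a scalar integrator loop with added delay.

    extra_frames: whole frames of delay added on top of the minimal
    one-frame loop delay (hypothetical stand-in for interface latency).
    """
    rng = np.random.default_rng(seed)
    # Queue of computed commands; pending[0] is the one applied this frame.
    pending = deque([0.0] * (1 + extra_frames))
    disturbance = 0.0
    residuals = np.empty(n_steps)
    for t in range(n_steps):
        # Slowly varying AR(1) disturbance (toy turbulence model).
        disturbance = 0.99 * disturbance + rng.normal()
        residual = disturbance + pending[0]
        residuals[t] = residual
        newest = pending[-1] - gain * residual  # integrator update
        pending.popleft()
        pending.append(newest)
    return residuals.var()

var_native = residual_variance(extra_frames=0)
var_delayed = residual_variance(extra_frames=1)
```

Because both runs share the same seed, the difference between `var_delayed` and `var_native` comes from the added delay alone. A real test would also have to reproduce jitter and frame drops, which this sketch ignores.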

Figures

Figures reproduced from arXiv: 2606.10771 by Angelie Alagao, Benoit Neichel, Byron Engler, Jalo Nousiainen, Jean-Francois Sauvage, Jonathan Dray, Markus Kasper, Romain Fetick, Sylvain Cetre, Vincent Chambouleyron.

**Figure 2.** PSF during the 1st night, under the strong vibration. Left [PITH_FULL_IMAGE:figures/full_fig_p005_2.png]

**Figure 3.** First night telemetry analysis. Top: Residual variance per [PITH_FULL_IMAGE:figures/full_fig_p005_3.png]

**Figure 2.** The vibration was irregular, making it di [PITH_FULL_IMAGE:figures/full_fig_p006_2.png]

**Figure 4.** PSF during the second night. Each row compares the best [PITH_FULL_IMAGE:figures/full_fig_p006_4.png]

**Figure 5.** Strehl estimation of each PSF data set ( [PITH_FULL_IMAGE:figures/full_fig_p007_5.png]

**Figure 6.** Second night telemetry analysis: Vega modal variance [PITH_FULL_IMAGE:figures/full_fig_p007_6.png]

**Figure 9.** Control matrices $\hat{C}$ for the integrator and PO4AO controllers. Top: integrator. Bottom: PO4AO.

5.2. Application on PAPYRUS telemetry. Using PAPYRUS telemetry acquired with the integrator controller and the PO4AO closed-loop configuration described in Sec. 3, we computed the control matrix $\hat{C}$ for each case. The computation used $l = 15\,000$ timesteps and included only the actuators fully illuminated on the d…

**Figure 8.** Third night telemetry analysis.

To retrieve the control matrix $\hat{C}$, one can build two history matrices $H_s \in \mathbb{R}^{n_\mathrm{act} \times l}$ and $H \in \mathbb{R}^{2 n_\mathrm{act} \times l}$ from $l$ timesteps, such as:

$$H_s = \begin{bmatrix} a(1) & \cdots & a(t) & \cdots & a(l) \end{bmatrix}, \qquad H = \begin{bmatrix} o(1) & \cdots & o(t) & \cdots & o(l) \\ o(0) & \cdots & o(t-1) & \cdots & o(l-1) \end{bmatrix} \tag{7}$$

Using these history matrices, we can estimate the control matrix:

$$\hat{C} = H_s H^{+} \tag{8}$$

where $\cdot^{+}$ denotes the pseudo-inverse.
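The history-matrix estimate of Eqs. (7) and (8) can be sketched in NumPy. The sketch assumes that a(t) are the commands and o(t) the observations, both of dimension n_act; the data are synthetic, generated from a known random linear controller so the estimate can be checked. Sizes and names are illustrative, not taken from the paper.

```python
import numpy as np

def estimate_control_matrix(actions, observations):
    """Estimate C_hat = H_s H^+ from telemetry.

    actions:      shape (n_act, l), columns a(1) ... a(l)
    observations: shape (n_act, l + 1), columns o(0) ... o(l)
    """
    H_s = actions
    # Stack current observations o(t) over previous observations o(t-1).
    H = np.vstack([observations[:, 1:], observations[:, :-1]])
    return H_s @ np.linalg.pinv(H)

# Synthetic check: commands produced by a known linear controller.
rng = np.random.default_rng(1)
n_act, l = 5, 400
C_true = rng.normal(size=(n_act, 2 * n_act))
obs = rng.normal(size=(n_act, l + 1))
stacked = np.vstack([obs[:, 1:], obs[:, :-1]])
acts = C_true @ stacked
C_hat = estimate_control_matrix(acts, obs)
```

With noiseless data and more timesteps than rows in H, the estimate recovers `C_true` up to numerical precision. For a nonlinear controller, the result is a least-squares linear fit to the observed behavior.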

**Figure 12.** Eigenvalues for the first $n_\mathrm{act} \times n_\mathrm{act}$ block of the control matrices of the integrator and PO4AO controllers. [PITH_FULL_IMAGE:figures/full_fig_p009_12.png]
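A minimal sketch of the eigenvalue computation behind this kind of plot, assuming the "first block" means the first n_act columns of an n_act × 2n_act control matrix. The helper name and the toy matrix are hypothetical.

```python
import numpy as np

def first_block_eigenvalues(control_matrix):
    """Eigenvalues of the leading square block, sorted by decreasing modulus."""
    n_act = control_matrix.shape[0]
    block = control_matrix[:, :n_act]
    eigvals = np.linalg.eigvals(block)
    return eigvals[np.argsort(-np.abs(eigvals))]

# Toy control matrix whose first block is 0.5 * identity.
demo = np.hstack([0.5 * np.eye(3), np.zeros((3, 3))])
demo_eigs = first_block_eigenvalues(demo)
```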

Original abstract

Reinforcement learning (RL)-based algorithms have recently emerged as a promising approach for adaptive optics (AO) control. In simulations and laboratory experiments, they have demonstrated robustness to real-world effects such as photon and detector noise, misregistration, vibrations, and rapid variations in seeing conditions. However, their performance has not yet been validated on sky. We report the first on-sky demonstration of a reinforcement learning controller for adaptive optics, named Policy Optimization for AO (PO4AO). We further analyze its on-sky behavior and identify directions for improving the algorithm and its implementation. PO4AO was implemented and deployed on the Papyrus adaptive optics system installed at the Coudé focus of the 1.52 m telescope (T152) at the OHP. A Python-based implementation was interfaced with the existing real-time controller (DAO RTC) via shared-memory buffers. The performance of PO4AO was compared to that of a standard integrator controller over several nights, covering a range of flux levels and atmospheric conditions. PO4AO consistently outperformed the standard integrator in all tested configurations. The controller successfully learned and compensated for vibration patterns and demonstrated strong robustness to measurement noise. Once tuned for Papyrus, PO4AO operated in a turnkey fashion, using a single set of hyperparameters across varying observing conditions and science targets. These performance gains were achieved despite a non-optimized Python implementation introducing approximately $750\,\mu\text{s}$ of additional latency, along with control jitter and occasional frame drops. When properly implemented and optimized, PO4AO constitutes a robust and high-performance turnkey controller for single-conjugate adaptive optics systems, paving the way for broader adoption of reinforcement learning strategies in on-sky AO operations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

First on-sky RL AO controller that beats the integrator on multiple nights, but the baseline comparison needs explicit checks on latency and jitter matching.

The paper reports the first on-sky demonstration of reinforcement learning for adaptive optics control. PO4AO ran on the Papyrus system at the 1.52 m OHP telescope and outperformed the standard integrator across several nights and flux levels.

The work moves the method from simulation and lab tests to real telescope data. It learned vibration patterns, stayed robust to measurement noise, and used one fixed set of hyperparameters across targets and conditions. The Python implementation with shared-memory interface to the existing RTC is a concrete deployment step, even if not optimized.

The main soft spot is the fairness of the integrator comparison. The abstract states that gains occurred despite the added 750 μs latency, jitter, and occasional frame drops from the Python code. It does not say whether the baseline integrator was retuned to the same conditions or left in its native state. Without that detail, the margin could partly reflect mismatched operating conditions rather than controller performance. The abstract also omits quantitative metrics and error bars, which limits how much can be judged from the summary alone.

The evidence is empirical and measured against an external baseline, with no circular fitting or invented parameters. Data from multiple nights and conditions supports the robustness claim.

This is for AO control and instrumentation groups. It deserves a serious referee because the on-sky validation is new and practically relevant, provided the full paper supplies the numbers and clarifies the baseline setup.

Referee Report

1 major / 1 minor

Summary. The manuscript reports the first on-sky demonstration of a reinforcement learning controller (PO4AO) for adaptive optics, implemented on the Papyrus system at the 1.52 m OHP telescope. Across multiple nights and a range of flux levels and atmospheric conditions, PO4AO outperformed a standard integrator controller, learned and compensated for vibration patterns, and exhibited robustness to measurement noise while operating in a turnkey manner with fixed hyperparameters. These gains occurred despite added latency (~750 μs), jitter, and frame drops from a non-optimized Python implementation interfaced via shared memory to the DAO RTC.

Significance. If the performance comparison is shown to be fair, this constitutes a significant empirical result as the first on-sky validation of RL-based AO control. The multi-night dataset across conditions, combined with the demonstration of vibration compensation and noise robustness, provides concrete evidence supporting RL as a practical alternative to classical integrators for single-conjugate AO, with potential for broader operational adoption once optimized.

major comments (1)

[Abstract] The central claim that PO4AO 'consistently outperformed the standard integrator in all tested configurations' and that 'performance gains were achieved despite' the Python implementation's added latency, jitter, and frame drops does not state whether the baseline integrator was retuned, re-optimized, or evaluated under matched latency/jitter conditions. This detail is load-bearing for attributing the reported margin to algorithmic differences rather than implementation asymmetry.

minor comments (1)

[Abstract] No quantitative performance metrics (e.g., Strehl ratio, residual wavefront error, or improvement factors with uncertainties) are provided to support the outperformance claim; adding these would improve clarity and allow readers to assess the magnitude of the gains.
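As an illustration of how two of the metrics named here relate, the extended Maréchal approximation converts a residual wavefront error into an approximate Strehl ratio, S ≈ exp(−σ²) with σ = 2π·rms/λ in radians. The sketch below is generic; the wavelength and rms values are hypothetical and are not results from the paper.

```python
import math

def strehl_marechal(rms_nm, wavelength_nm):
    """Approximate Strehl ratio from residual wavefront rms (same units as wavelength)."""
    sigma_rad = 2.0 * math.pi * rms_nm / wavelength_nm
    return math.exp(-sigma_rad ** 2)

# Hypothetical example: compare two residual levels at one wavelength.
strehl_low_error = strehl_marechal(rms_nm=60.0, wavelength_nm=800.0)
strehl_high_error = strehl_marechal(rms_nm=120.0, wavelength_nm=800.0)
```

The approximation is only reliable for small residuals, so it serves as a rough cross-check rather than a replacement for Strehl measured on the PSF.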

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their thorough review and for recognizing the significance of our on-sky demonstration of PO4AO. We provide a point-by-point response to the major comment below.

Referee: [Abstract] The central claim that PO4AO 'consistently outperformed the standard integrator in all tested configurations' and that 'performance gains were achieved despite' the Python implementation's added latency, jitter, and frame drops does not state whether the baseline integrator was retuned, re-optimized, or evaluated under matched latency/jitter conditions. This detail is load-bearing for attributing the reported margin to algorithmic differences rather than implementation asymmetry.

Authors: We agree that this information is important for a fair interpretation of the results. The standard integrator controller is the one already deployed in the DAO RTC and was used in its standard operational configuration without additional retuning or optimization for the purpose of this comparison. PO4AO was interfaced via shared memory, introducing the reported additional latency, jitter, and frame drops, while the integrator operated at the native latency of the RTC. We will revise the abstract to explicitly clarify that the baseline integrator was evaluated under its native conditions without matched implementation overhead, thereby strengthening the claim that the performance gains are attributable to the RL algorithm despite these disadvantages.

revision: yes

Circularity Check

0 steps flagged

Empirical demonstration with no derivation chain present

The paper reports on-sky experimental results comparing the PO4AO reinforcement learning controller to a standard integrator across flux levels and conditions. No mathematical derivation, first-principles result, fitted parameter renamed as prediction, or self-citation chain is invoked to support a claimed prediction. The central claim (outperformance) is measured against an external baseline controller and is therefore falsifiable by direct observation rather than reducing to the paper's own inputs by construction. No steps matching the enumerated circularity patterns exist.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical on-sky demonstration rather than a theoretical derivation, so the ledger contains no free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5880 in / 1035 out tokens · 17694 ms · 2026-06-27T11:38:26.093171+00:00 · methodology
