pith. machine review for the scientific record.

arxiv: 2604.06497 · v1 · submitted 2026-04-07 · 💻 cs.CE · cs.SY · eess.SY

Recognition: 2 theorem links · Lean Theorem

HyperFastRL: Hypernetwork-based reinforcement learning for unified control of parametric chaotic PDEs

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 18:04 UTC · model grok-4.3

classification 💻 cs.CE · cs.SY · eess.SY
keywords hypernetwork · reinforcement learning · parametric control · chaotic PDE · Kuramoto-Sivashinsky equation · Kolmogorov-Arnold network · distributional value estimation · unified control manifold

The pith

Mapping a forcing parameter directly to policy weights lets one reinforcement learner stabilize chaotic PDEs across regimes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that hypernetworks can take a scalar physical forcing parameter and output the weights of a spatial feedback policy, creating a single trained controller that works for multiple operating points in chaotic systems. This addresses the intractability of classical adjoint methods, which must recompute an entire control law whenever the parameter changes. The authors test the idea on the Kuramoto-Sivashinsky equation using three hypernetwork forms and show that all stabilize the system, while Kolmogorov-Arnold networks maintain the most consistent energy suppression and tracking on unseen parameter values. Parallel simulation further allows a 37 percent cut in training time by accepting a small drop in peak reward.

Core claim

By mapping the physical forcing parameter μ directly to the weights of a spatial feedback policy, the hyperFastRL architecture learns a unified parametric control manifold for the Kuramoto-Sivashinsky equation. Residual MLPs, periodic Fourier networks, and Kolmogorov-Arnold networks are all shown to produce stabilizing controllers. Pessimistic distributional value estimation is used to manage the high variance of chaotic rewards. Kolmogorov-Arnold networks deliver the most consistent energy-cascade suppression and tracking performance on parameter values not seen during training, while Fourier networks show greater extrapolation variability.

What carries the argument

Hypernetwork that accepts the scalar forcing parameter μ as input and produces the weights of the reinforcement-learning policy network used for spatial boundary feedback control.
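
To make the mechanism concrete, the sketch below shows a μ-conditioned hypernetwork in PyTorch: a scalar forcing parameter goes in, a full set of policy weights comes out, and that policy is then applied to the spatial state. The layer sizes, the Fourier-feature encoding of μ, and all names are illustrative assumptions, not the paper's reported architecture.

    import torch
    import torch.nn as nn

    class MuHypernetwork(nn.Module):
        """Sketch: map a scalar forcing parameter mu to the weights of a
        small feedback policy (sizes and encoding are assumptions)."""

        def __init__(self, obs_dim=64, act_dim=8, hidden=32, n_freq=4):
            super().__init__()
            self.obs_dim, self.act_dim, self.hidden = obs_dim, act_dim, hidden
            # Parameters the hypernetwork must emit: (W1, b1) and (W2, b2).
            self.n_params = obs_dim * hidden + hidden + hidden * act_dim + act_dim
            self.freqs = torch.arange(1, n_freq + 1).float()
            self.net = nn.Sequential(
                nn.Linear(2 * n_freq, 64), nn.Tanh(),
                nn.Linear(64, self.n_params),
            )

        def forward(self, mu):
            # Periodic (Fourier-feature) embedding of the scalar parameter.
            feats = torch.cat([torch.sin(self.freqs * mu), torch.cos(self.freqs * mu)])
            flat = self.net(feats)
            i = 0
            def take(shape):          # slice the flat vector into tensors
                nonlocal i
                n = int(torch.tensor(shape).prod())
                out = flat[i:i + n].reshape(shape)
                i += n
                return out
            return [take((self.hidden, self.obs_dim)), take((self.hidden,)),
                    take((self.act_dim, self.hidden)), take((self.act_dim,))]

    def policy(u, params):
        """Spatial feedback policy whose weights came from the hypernetwork."""
        W1, b1, W2, b2 = params
        return torch.tanh(u @ W1.T + b1) @ W2.T + b2

    hyper = MuHypernetwork()
    action = policy(torch.randn(64), hyper(torch.tensor(0.1125)))

Swapping the Fourier embedding for a residual MLP or a KAN block would give the other two functional forms the paper compares; the separation between parameter conditioning and spatial feedback stays the same.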

If this is right

  • All three hypernetwork forms achieve robust stabilization of the chaotic PDE.
  • Kolmogorov-Arnold networks produce the most consistent energy-cascade suppression and tracking on unseen parametrizations.
  • Fourier hypernetworks show higher variability when applied to parameter values outside the training set.
  • Massively parallel environment ensembles permit a 37 percent reduction in wall-clock training time by trading a modest amount of final reward.
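
The wall-clock bullet above is a throughput argument: many short rollouts advanced in lockstep beat one long rollout. The sketch below shows that vectorized pattern in numpy with placeholder dynamics standing in for the KS solver; the batch size, horizon, and reward are assumptions for illustration, not the paper's setup.

    import numpy as np

    N_ENVS, N_X, HORIZON = 1024, 64, 200        # assumed sizes, for illustration
    rng = np.random.default_rng(0)

    def step_batch(u, a, mu):
        """Advance every environment one step at once. Placeholder dynamics,
        NOT a Kuramoto-Sivashinsky solver."""
        return 0.95 * u + 0.05 * a + 0.01 * mu * rng.standard_normal(u.shape)

    def policy_batch(u, mu):
        """Stand-in for the hypernetwork-generated policy: a single batched
        forward pass instead of a Python loop over environments."""
        return -0.1 * u

    u = rng.standard_normal((N_ENVS, N_X))
    mu = rng.uniform(-0.25, 0.25, size=(N_ENVS, 1))   # one forcing level per env
    returns = np.zeros(N_ENVS)
    for t in range(HORIZON):
        a = policy_batch(u, mu)
        u = step_batch(u, a, mu)
        returns += -np.mean(u ** 2, axis=1)           # reward: negative energy
    print(returns.mean(), returns.std())

The 37 percent figure itself comes from the abstract's reported trade of peak asymptotic reward for wall-clock time; the sketch only illustrates why batching makes that trade available.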

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same hypernetwork pattern could be tested on other parametric chaotic systems such as Navier-Stokes or reaction-diffusion equations.
  • Combining the hypernetwork output with additional physics-based constraints on the policy weights might strengthen stability guarantees beyond empirical demonstration.
  • The pessimistic distributional critic may transfer to other high-variance reinforcement-learning tasks outside fluid control.

Load-bearing premise

The hypernetwork produces policy weights that remain stabilizing for both trained and unseen values of the forcing parameter, and the pessimistic distributional estimator reduces reward variance without adding bias that breaks the learned mapping.
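
The second half of this premise concerns the estimator. In the truncated-quantile style of pessimistic distributional critic the paper appears to build on (its FastTD3 and TQC workflow), the value target is the mean of pooled return quantiles with the largest ones discarded. A minimal numpy sketch, with the quantile counts assumed:

    import numpy as np

    def pessimistic_target(quantiles, drop_per_critic=2):
        """Truncated-quantile pessimistic value estimate (sketch).

        quantiles: array of shape (n_critics, n_quantiles) holding each
        critic's predicted return quantiles for one state-action pair.
        Pooling all quantiles and discarding the largest ones biases the
        estimate downward, which counters overestimation under chaotic,
        high-variance rewards."""
        q = np.asarray(quantiles)
        n_critics, n_quantiles = q.shape
        pooled = np.sort(q.reshape(-1))                      # pool and order
        keep = n_critics * (n_quantiles - drop_per_critic)   # cut the top tail
        return pooled[:keep].mean()

    # Illustrative numbers only: 2 critics, 25 quantiles each.
    rng = np.random.default_rng(1)
    q = rng.normal(loc=10.0, scale=5.0, size=(2, 25))
    print(pessimistic_target(q), q.mean())   # truncated mean sits below the plain mean

The referee's third major comment asks exactly whether this built-in downward bias ever hides a destabilized regime, which is why an ablation matters for the unseen-μ claim.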

What would settle it

Finding a new value of the forcing parameter for which the generated policy leaves the Kuramoto-Sivashinsky equation in uncontrolled chaotic growth.
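
Stated as a procedure, the test is: generate the policy for a candidate μ, roll the closed loop forward, and check whether the state's mean energy stays bounded after a burn-in. The sketch below assumes hypothetical make_env and hypernet interfaces and an illustrative energy threshold; none of these come from the paper.

    import numpy as np

    def stabilization_check(make_env, hypernet, mu, horizon=2000, burn_in=500,
                            energy_bound=5.0):
        """Does the generated policy keep the KS state's mean energy bounded
        at a forcing value mu it was never trained on? `make_env`, `hypernet`,
        and the bound are assumed stand-ins, not the paper's setup."""
        env = make_env(mu)
        params = hypernet(mu)                  # mu -> policy weights
        u = env.reset()
        energies = []
        for t in range(horizon):
            u = env.step(env.policy(u, params))
            if t >= burn_in:
                energies.append(np.mean(u ** 2))
        tail_energy = float(np.mean(energies))
        return tail_energy, tail_energy > energy_bound   # True would falsify the claim

    # Hypothetical usage: sweep forcing values outside the training range.
    # for mu in np.linspace(-0.5, 0.5, 41):
    #     energy, failed = stabilization_check(make_ks_env, hyper, mu)
    #     print(mu, energy, "uncontrolled" if failed else "stabilized")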

Figures

Figures reproduced from arXiv:2604.06497 by Anil Sapkota and Omer San.

Figure 1. Architecture of the parameter-conditioned hypernetwork topology (adapted from …).
Figure 2. Detailed internal structure of the hypernetwork …
Figure 3. FastTD3 and TQC optimization workflow; 1,024 KS environments are simulated in parallel with …
Figure 4. Training, evaluation, and test reward domains across varying gradient-step (GS) configurations for the …
Figure 5. Direct multi-seed performance comparison of MLP, Fourier, and KAN encoders navigating the baseline …
Figure 6. Comparative RL learning dynamics across three core physical tasks: (a) zero-reference stabilization, (b) …
Figure 7. Physical representation of Case 1 (zero-reference control) via spatiotemporal contour fields …
Figure 8. Spacetime evaluations y(x, t) for Case 2, requiring the agent to continuously force the chaotic KS medium into a structured four-mode spatial oscillation. Columns compare an extrapolative forcing environment (μ = −0.25) against an interpolative case (μ = 0.1125). Here, the policy must not only halt runaway turbulence but intelligently distribute specific energy profiles matching the structural geometry of …
Figure 9. Spacetime evaluations y(x, t) for Case 3, combining four-mode geometric tracking with an explicitly enforced non-zero spatial background mean. This configuration tests the agents' ability to maintain a prescribed standing wave pattern while simultaneously shifting the equilibrium state of the chaotic medium. Columns compare extrapolative (μ = −0.25) and interpolative (μ = 0.1125) forcing levels. While all …
original abstract

Spatiotemporal chaos in fluid systems exhibits severe parametric sensitivity, rendering classical adjoint-based optimal control intractable because each operating regime requires recomputing the control law. We address this bottleneck with hyperFastRL, a parameter-conditioned reinforcement learning framework that leverages Hypernetworks to shift from tuning isolated controllers per-regime to learning a unified parametric control manifold. By mapping a physical forcing parameter μ directly to the weights of a spatial feedback policy, the architecture cleanly decouples parametric adaptation from spatial boundary stabilization. To overcome the extreme variance inherent to chaotic reward landscapes, we deploy a pessimistic distributional value estimation over a massively parallel environment ensemble. We evaluate three Hypernetwork functional forms, ranging from residual MLPs to periodic Fourier and Kolmogorov-Arnold (KAN) representations, on the Kuramoto-Sivashinsky equation under varying spatial forcing. All forms achieve robust stabilization. KAN yields the most consistent energy-cascade suppression and tracking across unseen parametrizations, while Fourier networks exhibit worse extrapolation variability. Furthermore, leveraging high-throughput parallelization allows us to intentionally trade a fraction of peak asymptotic reward for a 37% reduction in training wall-clock time, identifying an optimal operating regime for practical deployment in complex, parameter-varying chaotic PDEs.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated authors' rebuttal, a circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents HyperFastRL, a hypernetwork-based reinforcement learning framework for unified control of parametric chaotic PDEs such as the Kuramoto-Sivashinsky equation. It maps a scalar physical forcing parameter μ directly to the weights of a spatial feedback policy via hypernetworks (residual MLP, Fourier, and KAN forms), employs pessimistic distributional value estimation over parallel environments to handle chaotic reward variance, and claims that all three architectures achieve robust stabilization, with KAN providing the most consistent energy-cascade suppression and tracking on unseen parametrizations, plus a 37% reduction in training wall-clock time through high-throughput parallelization.

Significance. If the empirical claims are substantiated with quantitative metrics, the work offers a conceptually clean way to decouple parametric adaptation from spatial stabilization in chaotic systems, potentially reducing the need for per-regime controller recomputation in fluid and spatiotemporal chaos applications. The parallel training trade-off for speed is a practical contribution.

major comments (3)
  1. [Abstract] Abstract: The statements that 'all forms achieve robust stabilization' and that the approach yields a '37% reduction in training wall-clock time' are presented without any supporting quantitative metrics (e.g., time-averaged energy norms, stabilization times, L2 tracking errors), error bars, baseline comparisons to non-hypernetwork RL or per-μ controllers, or statistical details, leaving the central empirical claims only partially supported.
  2. [Results] Results (evaluation on KS equation): The claim that KAN yields the 'most consistent energy-cascade suppression and tracking across unseen parametrizations' while Fourier shows 'worse extrapolation variability' requires explicit closed-loop metrics (e.g., energy bounds or cascade spectra) for specific unseen μ values; without these, the generalization assertion cannot be assessed given the exponential divergence property of the KS equation under small policy perturbations.
  3. [Method] Method (pessimistic distributional estimator): No ablation study or analysis is provided to show that the pessimistic distributional value estimation reduces variance without introducing bias that could mask regime-specific destabilization on the learned weight manifold, which is load-bearing for the unseen-μ stabilization claim.
minor comments (2)
  1. [Abstract] The acronym 'hyperFastRL' is introduced in the title and abstract but not expanded or defined on first use.
  2. [Method] Notation for the hypernetwork mapping μ → policy weights would benefit from an explicit equation or schematic diagram in the methods to clarify the architecture variants.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, providing the strongest honest defense supported by the current work while committing to revisions where the concerns are valid and require additional evidence.

point-by-point responses
  1. Referee: [Abstract] Abstract: The statements that 'all forms achieve robust stabilization' and that the approach yields a '37% reduction in training wall-clock time' are presented without any supporting quantitative metrics (e.g., time-averaged energy norms, stabilization times, L2 tracking errors), error bars, baseline comparisons to non-hypernetwork RL or per-μ controllers, or statistical details, leaving the central empirical claims only partially supported.

    Authors: We acknowledge that the abstract, as a high-level summary, does not embed the full quantitative details present in the results section. The manuscript reports time-averaged energy norms, stabilization times, and L2 tracking errors across architectures in the evaluation on the KS equation, along with baseline comparisons to non-hypernetwork RL and per-μ controllers, with error bars from multiple random seeds. To directly address the concern, we will revise the abstract to include key quantitative highlights (e.g., specific energy reduction percentages and the 37% wall-clock reduction from the parallelization study) while referencing the supporting figures and tables. revision: yes

  2. Referee: [Results] Results (evaluation on KS equation): The claim that KAN yields the 'most consistent energy-cascade suppression and tracking across unseen parametrizations' while Fourier shows 'worse extrapolation variability' requires explicit closed-loop metrics (e.g., energy bounds or cascade spectra) for specific unseen μ values; without these, the generalization assertion cannot be assessed given the exponential divergence property of the KS equation under small policy perturbations.

    Authors: We agree that explicit per-μ closed-loop metrics are essential to substantiate generalization claims in a chaotic system like the KS equation. The results section already includes energy cascade spectra and tracking performance aggregated over unseen μ values, showing KAN's lower variability compared to Fourier networks. To strengthen verifiability, we will add a dedicated table in the revised manuscript with specific energy bounds, L2 errors, and stabilization times for individual unseen μ values (e.g., μ = 0.5 and μ = 1.5), including statistical details from repeated trials. This will allow direct evaluation against the exponential divergence concern. revision: yes

  3. Referee: [Method] Method (pessimistic distributional estimator): No ablation study or analysis is provided to show that the pessimistic distributional value estimation reduces variance without introducing bias that could mask regime-specific destabilization on the learned weight manifold, which is load-bearing for the unseen-μ stabilization claim.

    Authors: This is a fair and important point, as the pessimistic distributional estimator is key to managing chaotic reward variance and supporting the unseen-μ claims. The current manuscript motivates and describes the estimator but does not include a dedicated ablation. In the revision, we will incorporate an ablation study comparing the pessimistic approach against standard value estimation, reporting variance reduction metrics, bias indicators, and stabilization performance on unseen regimes to confirm it does not mask destabilization on the weight manifold. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical RL training on KS simulations yields stabilization results independent of method definition.

full rationale

The paper introduces a hypernetwork-conditioned RL controller for parametric KS equation stabilization, with claims resting on direct training runs, parallel environment ensembles, and post-training evaluation of energy suppression and tracking for seen/unseen μ values. No equations or steps reduce by construction to self-definitions, fitted inputs renamed as predictions, or load-bearing self-citations. The pessimistic distributional estimator and hypernetwork forms (MLP, Fourier, KAN) are architectural choices whose performance is measured externally via simulation metrics rather than being implied tautologically. The derivation chain is self-contained against the benchmark of chaotic PDE control tasks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 1 invented entity

The central claim rests on the empirical effectiveness of hypernetwork parameter conditioning and parallel pessimistic value estimation; these are validated only through simulation on one equation and are not derived from first principles.

free parameters (1)
  • Hypernetwork architecture hyperparameters
    Weights and structure of the hypernetworks are learned from data; specific values and selection criteria are not reported in the abstract.
axioms (2)
  • domain assumption: The PDE control task can be cast as a Markov decision process with high-variance rewards
    Standard RL modeling assumption invoked to justify distributional estimation.
  • ad hoc to paper: Pessimistic distributional value estimation mitigates variance without harming policy quality
    Introduced specifically to handle chaotic reward landscapes.
invented entities (1)
  • Unified parametric control manifold (no independent evidence)
    purpose: Single model that adapts to multiple forcing regimes
    Conceptual construct enabled by the hypernetwork mapping

pith-pipeline@v0.9.0 · 5522 in / 1457 out tokens · 97109 ms · 2026-05-10T18:04:53.386669+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
