Koopman-Assisted Reinforcement Learning
Pith reviewed 2026-05-24 02:47 UTC · model grok-4.3
The pith
The controlled Koopman tensor linearizes the expected evolution of the value function so that soft value iteration and soft actor-critic become tractable for nonlinear systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By constructing a controlled Koopman tensor from data, the method reformulates soft value iteration and soft actor-critic to estimate the optimal value function using linear dynamics in lifted coordinates. The paper reports state-of-the-art performance relative to neural-network soft actor-critic baselines on a linear state-space system, the Lorenz system, fluid flow past a cylinder, and a double-well potential with non-isotropic stochastic forcing.
What carries the argument
The controlled Koopman tensor, which parameterizes the data-driven Koopman operator by control actions so that the expected time evolution of the value function is captured by linear dynamics.
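A minimal sketch of how such a tensor could be fit from data, assuming the bilinear form K_u ϕ(x) = M(ψ(u) ⊗ ϕ(x)) that appears in the paper passage quoted on this page. The monomial dictionaries `phi` and `psi`, the toy system x' = 0.9x + 0.5u, the ordinary least-squares fit, and all function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def phi(x):
    # Assumed state dictionary for a scalar state: [1, x, x^2].
    return np.array([1.0, x, x * x])

def psi(u):
    # Assumed action dictionary for a scalar action: [1, u, u^2].
    return np.array([1.0, u, u * u])

def fit_koopman_tensor(X, U, X_next):
    # Regress phi(x') on kron(psi(u), phi(x)) by ordinary least squares.
    Z = np.stack([np.kron(psi(u), phi(x)) for x, u in zip(X, U)])  # (N, d_u * d_x)
    Y = np.stack([phi(xn) for xn in X_next])                       # (N, d_x)
    M_T, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return M_T.T                                                   # (d_x, d_u * d_x)

def koopman_matrix(M, u):
    # Contract the tensor with psi(u) so that K_u @ phi(x) == M @ kron(psi(u), phi(x)).
    d_x = M.shape[0]
    d_u = M.shape[1] // d_x
    return np.einsum("ijk,j->ik", M.reshape(d_x, d_u, d_x), psi(u))

# Toy data from a hypothetical linear system x' = 0.9 x + 0.5 u.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=500)
U = rng.uniform(-1.0, 1.0, size=500)
X_next = 0.9 * X + 0.5 * U
M = fit_koopman_tensor(X, U, X_next)

# One-step prediction in lifted coordinates for a new state-action pair.
prediction = koopman_matrix(M, 0.3) @ phi(0.2)
```

For this toy system the chosen dictionaries span the lifted next state exactly, so the fit is essentially exact; for nonlinear or stochastic systems the same regression would only give an approximation whose quality depends on the dictionaries.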
If this is right
- The framework covers deterministic and stochastic systems as well as discrete and continuous dynamics.
- Reformulated soft actor-critic exceeds traditional neural-network soft actor-critic on the four tested systems.
- Value-function estimation reduces to linear operations once the system is lifted by the controlled Koopman tensor.
- The same tensor construction supports both soft value iteration and soft actor-critic.
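To illustrate how value estimation can reduce to linear operations, the following sketch runs a fitted soft value iteration with a value function of the form V(x) = w·ϕ(x), so that the expected next value is w·K_u ϕ(x). Everything concrete here is an assumption for illustration: the toy linear system, the quadratic reward, the discrete action set, the temperature `alpha`, and the least-squares projection step. It is not the paper's algorithm.

```python
import numpy as np

# Assumed toy problem: x' = a x + b u, reward -(x^2 + 0.1 u^2), three discrete actions.
a, b, gamma, alpha = 0.9, 0.5, 0.9, 1.0
actions = np.array([-1.0, 0.0, 1.0])

def phi(x):
    # Assumed dictionary [1, x, x^2].
    return np.array([1.0, x, x * x])

def koopman_matrix(u):
    # Exact K_u for this toy system: phi(a x + b u) == K_u @ phi(x).
    return np.array([[1.0, 0.0, 0.0],
                     [b * u, a, 0.0],
                     [(b * u) ** 2, 2 * a * b * u, a * a]])

def reward(x, u):
    return -(x * x + 0.1 * u * u)

def soft_backup(w, x):
    # Soft (log-sum-exp) Bellman backup; the expectation of V is linear in phi(x).
    q = np.array([reward(x, u) + gamma * (w @ koopman_matrix(u) @ phi(x))
                  for u in actions])
    m = q.max()
    return m + alpha * np.log(np.sum(np.exp((q - m) / alpha)))

states = np.linspace(-2.0, 2.0, 41)
Phi = np.stack([phi(x) for x in states])
w = np.zeros(3)
for _ in range(50):
    targets = np.array([soft_backup(w, x) for x in states])
    # Project the backed-up values onto the span of the dictionary.
    w, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
```

The only nonlinear step is the log-sum-exp over actions; the propagation of the value function through the dynamics is a matrix-vector product.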
Where Pith is reading between the lines
- The linear structure in lifted space could make learned policies easier to analyze or verify than black-box neural policies.
- The method might combine with other data-driven linearization techniques for hybrid control algorithms on robotic systems.
- Performance on fluid and chaotic examples suggests the tensor could scale to additional physical domains with similar lifting properties.
Load-bearing premise
The Koopman operator, once parameterized by control actions into a controlled Koopman tensor, accurately captures the expectation of the time evolution of the value function for the systems considered.
What would settle it
Running the Koopman-assisted soft actor-critic on the fluid flow past a cylinder or Lorenz system and finding no performance improvement over standard neural-network soft actor-critic would falsify the state-of-the-art claim.
Original abstract
The Bellman equation and its continuous form, the Hamilton-Jacobi-Bellman equation, are ubiquitous in reinforcement learning and control theory. However, these equations become intractable for high-dimensional or nonlinear systems. This paper develops two new reinforcement learning algorithms based on the data-driven Koopman operator, which lifts a nonlinear system into new coordinates where the dynamics become approximately linear, and where Hamilton-Jacobi-Bellman-based methods are more tractable. In particular, the Koopman operator captures the expectation of the time evolution of the value function via linear dynamics in the lifted coordinates. By parameterizing the Koopman operator with the control actions, we construct a "controlled Koopman tensor" that facilitates the estimation of the optimal value function. This enables us to reformulate two max-entropy RL algorithms: soft value iteration and soft actor-critic. This flexible and interpretable framework includes deterministic and stochastic systems, as well as discrete and continuous dynamics. Koopman-assisted reinforcement learning attains state-of-the-art performance with respect to traditional neural network-based soft actor-critic baselines on a linear state-space system, the Lorenz system, fluid flow past a cylinder, and a double-well potential with non-isotropic stochastic forcing.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces two max-entropy RL algorithms (Koopman-assisted soft value iteration and soft actor-critic) that construct a data-driven controlled Koopman tensor by parameterizing the Koopman operator with control inputs. This tensor is used to obtain linear dynamics in lifted coordinates that approximate the expectation of the value-function evolution under the Bellman operator, thereby making HJB-based methods tractable for nonlinear and stochastic systems. The central empirical claim is that the resulting algorithms attain state-of-the-art performance relative to standard neural-network soft actor-critic baselines on a linear state-space system, the Lorenz attractor, fluid flow past a cylinder, and a double-well potential with non-isotropic stochastic forcing.
Significance. If the controlled Koopman tensor accurately recovers (or bounds) the controlled expectation of the value function after finite lifting and data-driven fitting, the framework supplies an interpretable, potentially lower-dimensional alternative to deep RL that inherits linear-algebraic tools while retaining the max-entropy objective. The explicit handling of both deterministic/stochastic and discrete/continuous cases, together with the reported outperformance on the four benchmark systems, would constitute a concrete advance for systems where suitable observables exist.
Major comments (3)
- [§3.2] Controlled Koopman tensor construction: the claim that the tensor 'facilitates the estimation of the optimal value function' by capturing the expectation of the time evolution under the Bellman operator is load-bearing for all four experimental systems, yet the manuscript provides neither an a-priori error bound on the lifted approximation nor a quantitative residual analysis of ||K_u V - E[V(x_{t+1}) | u]|| for the stochastic double-well and cylinder-flow cases.
- [§4.3, Table 3] Lorenz and cylinder results: the reported SOTA margins are obtained after finite data-driven fitting of the tensor; without an ablation on observable choice, lifting dimension, or tensor rank, it is impossible to determine whether the performance gain is attributable to the Koopman linearization or to implicit regularization that the neural SAC baseline does not receive.
- [§5.1] Soft actor-critic reformulation: the actor update is derived under the assumption that the controlled tensor yields exact linear dynamics for the soft value function; any mismatch between the tensor action and the true controlled expectation propagates directly into the policy gradient, yet no sensitivity analysis or robustness check against tensor approximation error is supplied.
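The residual named in the §3.2 comment can be estimated by Monte Carlo. The sketch below does this at a single state-action pair for an assumed stochastic toy system x' = a x + b u + sigma * noise, with assumed value weights `w` and an assumed dictionary [1, x, x^2]; the function names are hypothetical. It compares a Koopman matrix that accounts for the noise variance with one that ignores it.

```python
import numpy as np

# Assumed stochastic toy system: x' = a x + b u + sigma * standard normal noise.
a, b, sigma = 0.9, 0.5, 0.2

def phi(x):
    # Assumed dictionary [1, x, x^2]; works for scalars and for sample arrays.
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x * x], axis=-1)

def koopman_matrix(u, noise_var):
    # E[phi(x') | x, u] == K_u @ phi(x) holds exactly when noise_var == sigma**2.
    return np.array([[1.0, 0.0, 0.0],
                     [b * u, a, 0.0],
                     [(b * u) ** 2 + noise_var, 2 * a * b * u, a * a]])

def value_residual(w, K_u, x, u, rng, n_samples=100_000):
    # |w . K_u phi(x) - E[V(x') | x, u]| with the expectation estimated by sampling.
    predicted = w @ K_u @ phi(x)
    x_next = a * x + b * u + sigma * rng.standard_normal(n_samples)
    empirical = np.mean(phi(x_next) @ w)
    return abs(predicted - empirical)

rng = np.random.default_rng(0)
w = np.array([0.0, 0.0, -1.0])   # assumed value function V(x) = -x^2
x, u = 0.5, 0.3
res_exact = value_residual(w, koopman_matrix(u, sigma ** 2), x, u, rng)
res_biased = value_residual(w, koopman_matrix(u, 0.0), x, u, rng)
```

In this toy case the noise-free matrix misses the variance term, so its residual is close to sigma squared, while the noise-aware matrix leaves only sampling error. A full analysis would aggregate such pointwise residuals over a set of states and actions.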
Minor comments (3)
- [§3.2] Notation for the controlled tensor K_u is introduced without an explicit statement of its dimensions or the precise least-squares objective used for its data-driven estimation.
- [Figure 4] The double-well trajectory figure lacks error bars or multiple random seeds, making it difficult to assess statistical significance of the reported improvement.
- [Abstract and §4] The abstract states 'state-of-the-art performance' but the main text compares only against NN-SAC; a brief discussion of other Koopman or linear-embedding baselines would strengthen the positioning.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the controlled Koopman tensor framework. We address each major point below and outline revisions to strengthen the empirical and robustness aspects of the manuscript.
Point-by-point responses
- Referee: [§3.2] the claim that the tensor 'facilitates the estimation of the optimal value function' by capturing the expectation of the time evolution under the Bellman operator is load-bearing for all four experimental systems, yet the manuscript provides neither an a-priori error bound on the lifted approximation nor a quantitative residual analysis of ||K_u V - E[V(x_{t+1}) | u]|| for the stochastic double-well and cylinder-flow cases.
  Authors: We agree that quantitative residual analysis would strengthen validation of the approximation quality. Deriving a general a-priori error bound for arbitrary nonlinear stochastic systems is challenging without further assumptions on observables and is left for future work. In revision we will add explicit residual computations ||K_u V - E[V(x_{t+1}) | u]|| for the double-well and cylinder cases.
  Revision: partial
- Referee: [§4.3, Table 3] the reported SOTA margins are obtained after finite data-driven fitting of the tensor; without an ablation on observable choice, lifting dimension, or tensor rank, it is impossible to determine whether the performance gain is attributable to the Koopman linearization or to implicit regularization that the neural SAC baseline does not receive.
  Authors: We concur that systematic ablations would help isolate the contribution of the Koopman linearization. The revised manuscript will include additional results varying lifting dimension and tensor rank (with discussion of observable selection) on the Lorenz and cylinder benchmarks.
  Revision: yes
- Referee: [§5.1] the actor update is derived under the assumption that the controlled tensor yields exact linear dynamics for the soft value function; any mismatch between the tensor action and the true controlled expectation propagates directly into the policy gradient, yet no sensitivity analysis or robustness check against tensor approximation error is supplied.
  Authors: We will add a sensitivity study in the revision that perturbs the fitted tensor entries and reports resulting changes in policy performance and value estimates, thereby quantifying robustness to approximation error.
  Revision: yes
- Deriving a general a-priori error bound on the lifted approximation for arbitrary nonlinear stochastic systems without additional assumptions.
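One simple form the sensitivity study mentioned in the §5.1 response could take is sketched below: Gaussian perturbations of increasing scale are added to a Koopman matrix, and the resulting change in the one-step backed-up value w·K_u ϕ(x) is compared with the bound ||w|| ||ΔK|| ||ϕ(x)||. The toy system, the value weights, the perturbation scales, and the function names are assumptions for illustration. The sketch covers value estimates only, not policy performance.

```python
import numpy as np

# Assumed toy system x' = a x + b u and assumed value weights V(x) = -x^2.
a, b = 0.9, 0.5
w = np.array([0.0, 0.0, -1.0])

def phi(x):
    # Assumed dictionary [1, x, x^2].
    return np.array([1.0, x, x * x])

def koopman_matrix(u):
    # Exact K_u for the toy system: phi(a x + b u) == K_u @ phi(x).
    return np.array([[1.0, 0.0, 0.0],
                     [b * u, a, 0.0],
                     [(b * u) ** 2, 2 * a * b * u, a * a]])

def backed_up_value(K, x):
    # One-step expected value in lifted coordinates.
    return w @ K @ phi(x)

rng = np.random.default_rng(0)
states = np.linspace(-2.0, 2.0, 21)
K = koopman_matrix(0.5)

results = []  # (perturbation scale, worst-case value change, worst-case bound)
for eps in [0.0, 1e-3, 1e-2, 1e-1]:
    dK = eps * rng.standard_normal(K.shape)
    errs = [abs(backed_up_value(K + dK, x) - backed_up_value(K, x)) for x in states]
    # Since the backed-up value is linear in K, the change is at most
    # ||w||_2 * ||dK||_2 * ||phi(x)||_2 (spectral norm for dK).
    bounds = [np.linalg.norm(w) * np.linalg.norm(dK, 2) * np.linalg.norm(phi(x))
              for x in states]
    results.append((eps, max(errs), max(bounds)))
```

Because the backed-up value is linear in the matrix entries, the value error grows at most linearly with the perturbation size. The effect on a learned policy passes through the nonlinear actor update and would need to be measured by retraining or re-evaluating the policy.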
Circularity Check
No significant circularity; derivation builds on independent Koopman theory and standard RL
Full rationale
The paper constructs a controlled Koopman tensor from data-driven approximation of the Koopman operator to linearize value-function evolution for reformulating soft value iteration and actor-critic. This relies on established Koopman lifting (not self-defined here) and standard Bellman operators; performance is evaluated on external benchmark systems (Lorenz, cylinder flow, etc.) rather than reducing any claimed result to a fitted quantity defined by the same equations. No self-citation chains, ansatz smuggling, or fitted-input-as-prediction patterns appear in the derivation.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: The Koopman operator can be parameterized with control actions to form a controlled Koopman tensor that captures the expectation of the time evolution of the value function.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: By parameterizing the Koopman operator with the control actions, we construct a 'controlled Koopman tensor' that facilitates the estimation of the optimal value function... K_u ϕ(x) = M(ψ(u) ⊗ ϕ(x))
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: V^π(x) = r(x, u) + γ K V^π(x) ... reformulate two max-entropy RL algorithms: soft value iteration and soft actor-critic
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- SALSA-RL: Stability Analysis in the Latent Space of Actions for Reinforcement Learning
  SALSA-RL introduces latent-space stability analysis for actions of pretrained RL agents using encoder-decoder and state-dependent linear dynamics to enable non-invasive interpretability.