Mission-Aligned Learning-Informed Control of Autonomous Systems: Formulation and Foundations

Akhil Anand; Gustav Sir; Haozhe Tian; Homayoun Hamedmoghadam; Monicah Cherop Naibei; Sebastien Gros; Vyacheslav Kungurtsev

arXiv: 2507.04356 · v2 · submitted 2025-07-06 · 🧮 math.OC · cs.AI · cs.RO

Vyacheslav Kungurtsev, Monicah Cherop Naibei, Gustav Sir, Akhil Anand, Sebastien Gros, Haozhe Tian, Homayoun Hamedmoghadam

Pith reviewed 2026-05-19 06:29 UTC · model grok-4.3

keywords: autonomous systems · two-level optimization · control · classical planning · reinforcement learning · robotic care · physical safety · interpretability

The pith

Autonomous systems achieve greater safety and interpretability by framing decisions as a two-level optimization scheme that combines control, classical planning, and learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper formulates the control of autonomous physical agents, using a stylized robotic care scenario as an example, as a two-level optimization problem rather than a pure two-level reinforcement learning procedure. The lower level handles physical movement via control methods while the higher level addresses conceptual tasks through classical planning, with learning integrated across both. This structure is presented as a way to gain greater insight into algorithm design and to deliver more efficient performance with improved physical safety and interpretability for users and regulators. A sympathetic reader would care because current autonomous systems often operate as opaque boxes, which raises legitimate concerns about reliability and oversight.

Core claim

We present the general formulation of mission-aligned control of autonomous systems as a two-level optimization scheme which incorporates control at the lower level and classical planning at the higher level, integrated with a capacity for learning. This synergistic integration of control, classical planning, and RL presents an opportunity for greater insight for algorithm development, leading to more efficient and reliable performance, where reliability pertains to physical safety and interpretability into an otherwise black-box operation.

What carries the argument

The two-level optimization scheme that places control for physical movements at the lower level and classical planning for tasks at the higher level while incorporating learning.
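
A generic bilevel template makes this structure concrete. The notation below is assumed for illustration and is not taken from the paper: $\pi$ is a high-level plan chosen from a set $\Pi$ of plans admissible for the mission, $u = (u_0, \dots, u_{N-1})$ is the low-level control sequence, $x_k$ is the physical state, $f_\theta$ is a dynamics model whose parameters $\theta$ may be learned, $\ell$ is a stage cost shaped by the current plan, $g$ collects physical constraints, and $J$ is the mission-level objective.

```latex
\begin{aligned}
\min_{\pi \in \Pi} \quad & J\bigl(\pi, u^\star(\pi)\bigr) \\
\text{s.t.} \quad & u^\star(\pi) \in \arg\min_{u} \; \sum_{k=0}^{N-1} \ell(x_k, u_k; \pi) \\
& \qquad \text{s.t. } x_{k+1} = f_\theta(x_k, u_k), \quad g(x_k, u_k) \le 0, \quad k = 0, \dots, N-1 .
\end{aligned}
```

In this reading, the planner only sees the lower level through the optimal response $u^\star(\pi)$, and learning enters through $\theta$. The paper's own formulation may differ in each of these choices.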

If this is right

The integration yields more efficient and reliable performance in autonomous physical agents.
Physical safety improves because decisions are structured rather than purely learned.
Interpretability increases, addressing user and regulator concerns about black-box behavior.
Greater insight becomes available for developing new algorithms that blend the three methodologies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same two-level structure could apply directly to the industrial robots, UAVs, and embedded devices listed in the introduction.
High-level planning might make regulatory verification of safety constraints simpler than in end-to-end learned policies.
The framework could support incremental deployment where only the planning layer is updated while the control layer remains fixed.

Load-bearing premise

That casting a stylized robotic care problem as a two-level optimization integrating control, planning, and learning will inherently produce better physical safety and interpretability than existing methods.

What would settle it

A side-by-side test of a robotic care task in simulation or on hardware that measures physical safety violations and decision transparency scores for the two-level optimization system versus a standard reinforcement learning policy.
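
One way such a comparison could be scored is sketched below. It covers only the safety-violation half of the test; the transparency score is not defined in the text and is left out. The function names, the 1-D state, the safety predicate, and the placeholder trajectories are all assumptions for illustration, and the numbers are made up rather than results from the paper.

```python
from typing import Callable, Dict, List, Sequence


def violation_rate(trajectories: Sequence[Sequence[float]],
                   is_safe: Callable[[float], bool]) -> float:
    """Fraction of visited states that violate the safety predicate."""
    total = 0
    violations = 0
    for traj in trajectories:
        for state in traj:
            total += 1
            if not is_safe(state):
                violations += 1
    return violations / total if total else 0.0


def compare(runs_by_system: Dict[str, List[List[float]]],
            is_safe: Callable[[float], bool]) -> Dict[str, float]:
    """Apply the same metric to every system under test."""
    return {name: violation_rate(trajs, is_safe)
            for name, trajs in runs_by_system.items()}


# Placeholder data only: hypothetical 1-D positions for two unnamed systems.
runs = {
    "system_a": [[0.0, 0.4, 0.8], [0.1, 0.5, 0.9]],
    "system_b": [[0.0, 0.7, 1.2], [0.2, 1.1, 0.9]],
}
# Hypothetical safety rule: the position must stay at or below 1.0.
report = compare(runs, is_safe=lambda x: x <= 1.0)
```

Scoring both systems with one shared predicate keeps the comparison honest: any difference in the reported rates then comes from the trajectories, not from the metric.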

Figures

Figures reproduced from arXiv: 2507.04356 by Akhil Anand, Gustav Sir, Haozhe Tian, Homayoun Hamedmoghadam, Monicah Cherop Naibei, Sebastien Gros, Vyacheslav Kungurtsev.

Original abstract

Research, innovation and practical capital investment have been increasing rapidly toward the realization of autonomous physical agents. This includes industrial and service robots, unmanned aerial vehicles, embedded control devices, and a number of other realizations of cybernetic/mechatronic implementations of intelligent autonomous devices. In this paper, we consider a stylized version of robotic care, which would normally involve a two-level Reinforcement Learning procedure that trains a policy for both lower level physical movement decisions as well as higher level conceptual tasks and their sub-components. In order to deliver greater safety and reliability in the system, we present the general formulation of this as a two-level optimization scheme which incorporates control at the lower level, and classical planning at the higher level, integrated with a capacity for learning. This synergistic integration of multiple methodologies -- control, classical planning, and RL -- presents an opportunity for greater insight for algorithm development, leading to more efficient and reliable performance. Here, the notion of reliability pertains to physical safety and interpretability into an otherwise black box operation of autonomous agents, concerning users and regulators. This work presents the necessary background and general formulation of the optimization framework, detailing each component and its integration with the others.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper proposes a general formulation for mission-aligned control of autonomous systems, using a stylized robotic care example. It frames the task as a two-level optimization scheme: lower-level control for physical movements, upper-level classical planning for conceptual tasks and sub-tasks, with an integrated learning (RL) capacity. The central claim is that this synergistic integration of control, planning, and RL yields greater physical safety, reliability, and interpretability than standard two-level RL or pure planning approaches.

Significance. If the integration can be equipped with explicit preservation mechanisms, the framework could offer a principled route to combine stability guarantees from control theory, mission constraints from planning, and adaptability from learning. The focus on interpretability for users and regulators addresses a practical gap in autonomous systems. As a foundational formulation paper without theorems, examples, or empirical validation, its significance rests on enabling subsequent rigorous developments rather than delivering immediate results.

major comments (1)

[Abstract and general formulation of the two-level optimization scheme] The abstract and formulation claim that the two-level scheme delivers greater physical safety and reliability than existing approaches. However, no safety invariant, constraint qualification, or post-learning verification mechanism is stated that would ensure the learned lower-level policy respects upper-level mission constraints or control-level stability margins when the RL component is active. This link is load-bearing for the reliability assertion.

minor comments (1)

[Abstract] The abstract refers to a 'stylized version of robotic care' without providing a concrete mathematical example, diagram, or small-scale instance that illustrates how the levels interact.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive and insightful review. We appreciate the recognition of the framework's potential to combine stability guarantees, mission constraints, and adaptability, as well as the focus on interpretability. We address the major comment below and will revise the manuscript to strengthen the presentation of the formulation.

Point-by-point responses

Referee: The abstract and formulation claim that the two-level scheme delivers greater physical safety and reliability than existing approaches. However, no safety invariant, constraint qualification, or post-learning verification mechanism is stated that would ensure the learned lower-level policy respects upper-level mission constraints or control-level stability margins when the RL component is active. This link is load-bearing for the reliability assertion.

Authors: We agree that the manuscript, as a foundational formulation, does not provide explicit safety invariants, constraint qualifications, or post-learning verification mechanisms. The abstract and introduction frame the two-level scheme as delivering an opportunity for greater physical safety and reliability through the synergistic integration of control (for stability margins), classical planning (for mission constraints), and RL (for adaptability), with reliability tied to both physical safety and interpretability. This is positioned as an improvement over standard two-level RL or pure planning by design, but we acknowledge the current text does not detail how the lower-level learned policy is guaranteed to respect upper-level constraints. In the revised manuscript we will add a new subsection under the formulation that outlines possible interfaces for preserving these properties, such as embedding Lyapunov-based stability or control barrier functions at the lower level and propagating temporal or logical constraints from the planner to bound RL actions. We will also revise the abstract and introduction to clarify that the framework is structured to enable such preservation mechanisms rather than claiming they are automatically delivered in the general formulation.

revision: yes
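
The control-barrier-function interface named in the rebuttal can be illustrated on a toy system. The sketch below is an assumption-laden example, not the authors' method: it uses a 1-D single integrator x' = u, a barrier h(x) = x_max - x, and a stand-in "learned" policy that always pushes toward the boundary. The filter enforces the standard condition dh/dt >= -alpha * h, which here reduces to u <= alpha * (x_max - x).

```python
def cbf_filter(x, u_rl, x_max=1.0, alpha=2.0):
    """Clip the proposed input so that h(x) = x_max - x satisfies
    dh/dt >= -alpha * h for the single integrator x' = u."""
    u_bound = alpha * (x_max - x)
    return min(u_rl, u_bound)


def rollout(policy, x0=0.0, steps=50, dt=0.1, x_max=1.0, alpha=2.0,
            filtered=True):
    """Forward-Euler simulation of x' = u, optionally with the filter.
    With alpha * dt <= 1 the barrier stays nonnegative in discrete time."""
    x = x0
    traj = [x]
    for _ in range(steps):
        u = policy(x)
        if filtered:
            u = cbf_filter(x, u, x_max, alpha)
        x = x + dt * u
        traj.append(x)
    return traj


def aggressive_policy(x):
    """Hypothetical stand-in for a learned policy: always drive forward."""
    return 1.0


safe_traj = rollout(aggressive_policy, filtered=True)
unsafe_traj = rollout(aggressive_policy, filtered=False)
```

The filter leaves the learned action untouched whenever it already satisfies the barrier condition, so it bounds the RL component without replacing it. In higher dimensions the `min` would typically become a small quadratic program.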

Circularity Check

0 steps flagged

Formulation paper structures existing methodologies without self-referential reductions or fitted predictions

Full rationale

The manuscript presents a general two-level optimization formulation that places classical planning at the upper level, control at the lower level, and an integrated learning capacity. No equations, predictions, or first-principles derivations are advanced that reduce by construction to the inputs; the text instead describes the components and their integration as an independent structuring of control, planning, and RL. No self-citations are invoked as load-bearing uniqueness theorems, no parameters are fitted and then renamed as predictions, and no ansatzes are smuggled via prior work. The claims of improved safety and interpretability are asserted as opportunities arising from the proposed architecture rather than results obtained through any circular chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on standard domain assumptions about hierarchical decomposition of autonomous tasks rather than new fitted parameters or invented entities; no specific numerical values or novel postulates are introduced in the abstract.

axioms (1)

domain assumption: Autonomous systems can be decomposed into a lower physical control level and a higher conceptual planning level.
This decomposition is invoked as the basis for the two-level optimization scheme in the abstract.
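
A minimal sketch of what this decomposition looks like in code, under stated assumptions: the task names, the 1-D target positions, and the saturated proportional controller are all hypothetical, the "planner" is a trivial lookup standing in for a classical planner, and the learning component is omitted entirely.

```python
def plan(goal_tasks, targets):
    """Higher level: map ordered conceptual tasks to physical targets
    (a stand-in for a classical planner)."""
    return [targets[task] for task in goal_tasks]


def control_step(x, target, gain=0.5, u_max=0.2):
    """Lower level: saturated proportional control toward the target."""
    u = gain * (target - x)
    return max(-u_max, min(u_max, u))


def run_mission(x0, goal_tasks, targets, tol=1e-2, max_steps=1000):
    """Execute each planned subtask in order with the low-level controller."""
    x = x0
    log = []
    for task, target in zip(goal_tasks, plan(goal_tasks, targets)):
        steps = 0
        while abs(target - x) > tol and steps < max_steps:
            x += control_step(x, target)  # unit-step single integrator
            steps += 1
        log.append((task, round(x, 3)))
    return x, log


# Hypothetical care-robot subtasks with made-up 1-D target positions.
targets = {"fetch_medication": 1.0, "deliver_to_patient": -0.5}
tasks = ["fetch_medication", "deliver_to_patient"]
final_x, mission_log = run_mission(0.0, tasks, targets)
```

The log records which conceptual task each physical motion served, which is the kind of traceability the interpretability claim relies on.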

pith-pipeline@v0.9.0 · 5769 in / 1252 out tokens · 59593 ms · 2026-05-19T06:29:47.668275+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: "we present the general formulation of this as a two-level optimization scheme which incorporates control at the lower level, and classical planning at the higher level, integrated with a capacity for learning"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: "This synergistic integration of multiple methodologies -- control, classical planning, and RL -- presents an opportunity for greater insight for algorithm development"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

86 extracted references · 86 canonical work pages · 3 internal anchors

[1]

Constrained policy optimization

Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel. Constrained policy optimization. In International conference on machine learning, pages 22–31. PMLR, 2017

work page 2017
[2]

Reinforcement learn- ing based mpc with neural dynamical models

Saket Adhau, S´ ebastien Gros, and Sigurd Skogestad. Reinforcement learn- ing based mpc with neural dynamical models. European Journal of Control, page 101048, 2024

work page 2024
[3]

Constrained Markov decision processes

Eitan Altman. Constrained Markov decision processes. Routledge, 2021

work page 2021
[4]

Safe learning for control using control lyapunov functions and control barrier functions: A review

Akhil Anand, Katrine Seel, Vilde Gjærum, Anne H˚ akansson, Haakon Robinson, and Aya Saad. Safe learning for control using control lyapunov functions and control barrier functions: A review. Procedia Computer Sci- ence, 192:3987–3997, 2021

work page 2021
[5]

Optimality conditions for model predictive control: Rethinking predictive model design

Akhil S Anand, Arash Bahari Kordabad, Mario Zanon, and Sebastien Gros. Optimality conditions for model predictive control: Rethinking predictive model design. arXiv preprint arXiv:2412.18268 , 2024

work page arXiv 2024
[6]

A painless deterministic policy gradient method for learning-based mpc

Akhil S Anand, Dirk Reinhardt, Shambhuraj Sawant, Jan Tommy Grav- dahl, and Sebastien Gros. A painless deterministic policy gradient method for learning-based mpc. In 2023 European Control Conference (ECC) , pages 1–7. IEEE, 2023. 41

work page 2023
[7]

Anand, S

A.S. Anand, S. Sawant, D. Reinhardt, and S. Gros. Data-driven predic- tive control and MPC: Do we achieve optimality? IFAC-PapersOnLine, 58(15):73–78, 2024. 20th IFAC Symposium on System Identification SYSID 2024

work page 2024
[8]

Fundamentals of fuzzy logic control—fuzzy sets, fuzzy rules and defuzzifications

Ying Bai and Dali Wang. Fundamentals of fuzzy logic control—fuzzy sets, fuzzy rules and defuzzifications. Advanced fuzzy logic technologies in indus- trial applications, pages 17–36, 2006

work page 2006
[9]

Principles of sequencing and schedul- ing

Kenneth R Baker and Dan Trietsch. Principles of sequencing and schedul- ing. John Wiley & Sons, 2018

work page 2018
[10]

Constraint-based scheduling: applying constraint programming to scheduling problems , vol- ume 39

Philippe Baptiste, Claude Le Pape, and Wim Nuijten. Constraint-based scheduling: applying constraint programming to scheduling problems , vol- ume 39. Springer Science & Business Media, 2001

work page 2001
[11]

Neuro-dynamic programming

DP Bertsekas. Neuro-dynamic programming. Athena Scientific, 1996

work page 1996
[12]

Safe learning in robotics: From learning-based control to safe reinforcement learning

Lukas Brunke, Melissa Greeff, Adam W Hall, Zhaocong Yuan, Siqi Zhou, Jacopo Panerati, and Angela P Schoellig. Safe learning in robotics: From learning-based control to safe reinforcement learning. Annual Review of Control, Robotics, and Autonomous Systems , 5(1):411–444, 2022

work page 2022
[13]

Optimal management of the peak power penalty for smart grids using mpc- based reinforcement learning

Wenqi Cai, Hossein N Esfahani, Arash B Kordabad, and S´ ebastien Gros. Optimal management of the peak power penalty for smart grids using mpc- based reinforcement learning. In 2021 60th IEEE Conference on Decision and Control (CDC) , pages 6365–6370. IEEE, 2021

work page 2021
[14]

Mpc-based reinforcement learning for a simplified freight mission of autonomous surface vehicles

Wenqi Cai, Arash B Kordabad, Hossein N Esfahani, Anastasios M Lekkas, and S´ ebastien Gros. Mpc-based reinforcement learning for a simplified freight mission of autonomous surface vehicles. In 2021 60th IEEE Con- ference on Decision and Control (CDC) , pages 2990–2995. IEEE, 2021

work page 2021
[15]

A learning-based model predictive control strategy for home energy management systems

Wenqi Cai, Shambhuraj Sawant, Dirk Reinhardt, Soroush Rastegarpour, and Sebastien Gros. A learning-based model predictive control strategy for home energy management systems. IEEE Access, 2023

work page 2023
[16]

Control regularization for reduced variance re- inforcement learning

Richard Cheng, Abhinav Verma, Gabor Orosz, Swarat Chaudhuri, Yisong Yue, and Joel Burdick. Control regularization for reduced variance re- inforcement learning. In International Conference on Machine Learning , pages 1141–1150. PMLR, 2019

work page 2019
[17]

Adaptive Multilevel Stochastic Approximation of the Value-at-Risk

St´ ephane Cr´ epey, Noufel Frikha, Azar Louzi, and Jonathan Spence. Adap- tive multilevel stochastic approximation of the value-at-risk.arXiv preprint arXiv:2408.06531, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[18]

Magnetic control of tokamak plasmas through deep reinforcement learning

Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdol- maleki, Diego de Las Casas, et al. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature, 602(7897):414–419, 2022. 42

work page 2022
[19]

General multilevel adap- tations for stochastic approximation algorithms of robbins–monro and polyak–ruppert type

Steffen Dereich and Thomas M¨ uller-Gronbach. General multilevel adap- tations for stochastic approximation algorithms of robbins–monro and polyak–ruppert type. Numerische Mathematik, 142:279–328, 2019

work page 2019
[20]

Scheduling: theory, algorithms, and systems

Jeremy Dick, Johann M Schumann, NAHB Remodelers, Bart L Weathing- ton, Ray Floyd, and Gerardus Blokdyk. Scheduling: theory, algorithms, and systems. 2022

work page 2022
[21]

Optimiza- tion with learning-informed differential equation constraints and its appli- cations

Guozhi Dong, Michael Hinterm¨ uller, and Kostas Papafitsoros. Optimiza- tion with learning-informed differential equation constraints and its appli- cations. ESAIM: Control, Optimisation and Calculus of Variations , 28:3, 2022

work page 2022
[22]

A descent al- gorithm for the optimal control of relu neural network informed pdes based on approximate directional derivatives

Guozhi Dong, Michael Hinterm¨ uller, and Kostas Papafitsoros. A descent al- gorithm for the optimal control of relu neural network informed pdes based on approximate directional derivatives. SIAM Journal on Optimization , 34(3):2314–2349, 2024

work page 2024
[23]

Lie group forced variational integrator networks for learning and control of robot systems

Valentin Duruisseaux, Thai P Duong, Melvin Leok, and Nikolay Atanasov. Lie group forced variational integrator networks for learning and control of robot systems. In Learning for Dynamics and Control Conference , pages 731–744. PMLR, 2023

work page 2023
[24]

Staff scheduling and rostering: A review of applications, methods and mod- els

Andreas T Ernst, Houyuan Jiang, Mohan Krishnamoorthy, and David Sier. Staff scheduling and rostering: A review of applications, methods and mod- els. European journal of operational research, 153(1):3–27, 2004

work page 2004
[25]

Learning explanatory rules from noisy data

Richard Evans and Edward Grefenstette. Learning explanatory rules from noisy data. Journal of Artificial Intelligence Research , 61:1–64, 2018

work page 2018
[26]

A stochastic planning framework for the discovery of complementary, agricultural systems

Hector Flores and J Rene Villalobos. A stochastic planning framework for the discovery of complementary, agricultural systems. European Journal of Operational Research, 280(2):707–729, 2020

work page 2020
[27]

Addressing function ap- proximation error in actor-critic methods

Scott Fujimoto, Herke Hoof, and David Meger. Addressing function ap- proximation error in actor-critic methods. In International Conference on Machine Learning, pages 1587–1596. PMLR, 2018

work page 2018
[28]

A comprehensive survey on safe reinforcement learning

Javier Garcıa and Fernando Fern´ andez. A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 16(1):1437– 1480, 2015

work page 2015
[29]

A single timescale stochastic approximation method for nested stochastic optimiza- tion

Saeed Ghadimi, Andrzej Ruszczynski, and Mengdi Wang. A single timescale stochastic approximation method for nested stochastic optimiza- tion. SIAM Journal on Optimization , 30(1):960–979, 2020

work page 2020
[30]

Automated Planning: the- ory and practice

Malik Ghallab, Dana Nau, and Paolo Traverso. Automated Planning: the- ory and practice. Elsevier, 2004. 43

work page 2004
[31]

Data-driven economic nmpc using rein- forcement learning

S´ ebastien Gros and Mario Zanon. Data-driven economic nmpc using rein- forcement learning. IEEE TAC, 65(2):636–648, 2019

work page 2019
[32]

Reinforcement learning for mixed-integer problems based on MPC

Sebastien Gros and Mario Zanon. Reinforcement learning for mixed-integer problems based on MPC. IFAC-PapersOnLine, 53(2):5219–5224, 2020

work page 2020
[33]

Learning for mpc with stability & safety guarantees

Sebastien Gros and Mario Zanon. Learning for mpc with stability & safety guarantees. Automatica, 146:110598, 2022

work page 2022
[34]

Development and validation of qrisk3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study

Julia Hippisley-Cox, Carol Coupland, and Peter Brindle. Development and validation of qrisk3 risk prediction algorithms to estimate future risk of cardiovascular disease: prospective cohort study. British Medical Journal , 357, 2017

work page 2017
[35]

Logic in Computer Science: Modelling and reasoning about systems

Michael Huth and Mark Ryan. Logic in Computer Science: Modelling and reasoning about systems. Cambridge university press, 2004

work page 2004
[36]

Neural logic reinforcement learning

Zhengyao Jiang and Shan Luo. Neural logic reinforcement learning. In International conference on machine learning , pages 3110–3119. PMLR, 2019

work page 2019
[37]

Approximately optimal approximate re- inforcement learning

Sham Kakade and John Langford. Approximately optimal approximate re- inforcement learning. In Proceedings of the Nineteenth International Con- ference on Machine Learning, pages 267–274, 2002

work page 2002
[38]

MPC-based re- inforcement learning for economic problems with application to battery storage

Arash Bahari Kordabad, Wenqi Cai, and Sebastien Gros. MPC-based re- inforcement learning for economic problems with application to battery storage. In 2021 European Control Conference (ECC) , pages 2573–2578. IEEE, 2021

work page 2021
[39]

Multi-agent bat- tery storage management using mpc-based reinforcement learning

Arash Bahari Kordabad, Wenqi Cai, and Sebastien Gros. Multi-agent bat- tery storage management using mpc-based reinforcement learning. In 2021 IEEE Conference on Control Technology and Applications (CCTA) , pages 57–62. IEEE, 2021

work page 2021
[40]

Reinforcement learning based on scenario- tree mpc for asvs

Arash Bahari Kordabad, Hossein Nejatbakhsh Esfahani, Anastasios M Lekkas, and S´ ebastien Gros. Reinforcement learning based on scenario- tree mpc for asvs. In 2021 American Control Conference (ACC) , pages 1985–1990. IEEE, 2021

work page 2021
[41]

Reinforcement learning for mpc: Fundamentals and current chal- lenges

Arash Bahari Kordabad, Dirk Reinhardt, Akhil S Anand, and Sebastien Gros. Reinforcement learning for mpc: Fundamentals and current chal- lenges. IFAC-PapersOnLine, 56(2):5773–5780, 2023

work page 2023
[42]

Safe reinforcement learning using wasserstein distributionally robust MPC and chance constraint

Arash Bahari Kordabad, Rafael Wisniewski, and Sebastien Gros. Safe reinforcement learning using wasserstein distributionally robust MPC and chance constraint. IEEE Access, 10:130058–130067, 2022

work page 2022
[43]

Reinforce- ment learning in robotics: Applications and real-world challenges

Petar Kormushev, Sylvain Calinon, and Darwin G Caldwell. Reinforce- ment learning in robotics: Applications and real-world challenges. Robotics, 2(3):122–148, 2013. 44

work page 2013
[44]

Inter-level cooperation in hi- erarchical reinforcement learning

Abdul Rahman Kreidieh, Glen Berseth, Brandon Trabucco, Samyak Para- juli, Sergey Levine, and Alexandre M Bayen. Inter-level cooperation in hi- erarchical reinforcement learning. arXiv preprint arXiv:1912.02368 , 2019

work page arXiv 1912
[45]

A predictor-corrector path- following algorithm for dual-degenerate parametric optimization problems

Vyacheslav Kungurtsev and Johannes Jaschke. A predictor-corrector path- following algorithm for dual-degenerate parametric optimization problems. SIAM Journal on Optimization , 27(1):538–564, 2017

work page 2017
[46]

Dynamic stochastic approxima- tion for multi-stage stochastic optimization

Guanghui Lan and Zhiqiang Zhou. Dynamic stochastic approxima- tion for multi-stage stochastic optimization. Mathematical Programming, 187(1):487–532, 2021

work page 2021
[47]

Planning algorithms

Steven M LaValle. Planning algorithms. Cambridge university press, 2006

work page 2006
[48]

End-to- end training of deep visuomotor policies

Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. End-to- end training of deep visuomotor policies. Journal of Machine Learning Research, 17(39):1–40, 2016

work page 2016
[49]

A framework for parameter estimation and model selection from experimental data in systems biology using approximate bayesian computation

Juliane Liepe, Paul Kirk, Sarah Filippi, Tina Toni, Chris P Barnes, and Michael PH Stumpf. A framework for parameter estimation and model selection from experimental data in systems biology using approximate bayesian computation. Nature Protocols, 9(2):439–456, 2014

work page 2014
[50]

Online planner selection with graph neural networks and adaptive scheduling

Tengfei Ma, Patrick Ferber, Siyu Huo, Jie Chen, and Michael Katz. Online planner selection with graph neural networks and adaptive scheduling. In Proceedings of the AAAI Conference on Artificial Intelligence , volume 34, pages 5077–5084, 2020

work page 2020
[51]

An experiment in linguistic syn- thesis with a fuzzy logic controller

Ebrahim H Mamdani and Sedrak Assilian. An experiment in linguistic syn- thesis with a fuzzy logic controller. International journal of man-machine studies, 7(1):1–13, 1975

work page 1975
[52]

Rein- forcement learning-based nmpc for tracking control of asvs: Theory and experiments

Andreas B Martinsen, Anastasios M Lekkas, and S´ ebastien Gros. Rein- forcement learning-based nmpc for tracking control of asvs: Theory and experiments. Control Engineering Practice, 120:105024, 2022

work page 2022
[53]

Simultaneous on- line model identification and production optimization using modifier adap- tation

Jos´ e Matias, Vyacheslav Kungurtsev, and Malcolm Egan. Simultaneous on- line model identification and production optimization using modifier adap- tation. Journal of Process Control, 110:110–120, 2022

work page 2022
[54]

Multilevel optimization: algorithms and applications , volume 20

Athanasios Migdalas, Panos M Pardalos, and Peter V¨ arbrand. Multilevel optimization: algorithms and applications , volume 20. Springer Science & Business Media, 2013

work page 2013
[55]

Playing Atari with Deep Reinforcement Learning

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602 , 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[56]

Model-based reinforcement learning: A survey

Thomas M Moerland, Joost Broekens, Aske Plaat, Catholijn M Jonker, et al. Model-based reinforcement learning: A survey. Foundations and Trends® in Machine Learning, 16(1):1–118, 2023. 45

work page 2023
[57]

Second-order fast-slow stochastic systems

Nhu N Nguyen and George Yin. Second-order fast-slow stochastic systems. SIAM Journal on Mathematical Analysis , 56(4):5175–5208, 2024

work page 2024
[58]

Assessing Generalization in Deep Reinforcement Learning

Charles Packer, Katelyn Gao, Jernej Kos, Philipp Kr¨ ahenb¨ uhl, Vladlen Koltun, and Dawn Song. Assessing generalization in deep reinforcement learning. arXiv preprint arXiv:1810.12282 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[59]

Fuzzy con- trol, volume 42

Kevin M Passino, Stephen Yurkovich, and Michael Reinfrank. Fuzzy con- trol, volume 42. Addison-wesley Reading, MA, 1998

work page 1998
[60]

Clearing the jungle of stochastic optimization

Warren B Powell. Clearing the jungle of stochastic optimization. In Bridg- ing data and decisions , pages 109–137. Informs, 2014

work page 2014
[61]

A unified framework for stochastic optimization

Warren B Powell. A unified framework for stochastic optimization. Euro- pean Journal of Operational Research, 275(3):795–821, 2019

work page 2019
[62]

Model predictive control: theory, computation, and design , volume 2

James Blake Rawlings, David Q Mayne, and Moritz Diehl. Model predictive control: theory, computation, and design , volume 2. Nob Hill Publishing Madison, WI, 2017

work page 2017
[63]

A tour of reinforcement learning: The view from continu- ous control

Benjamin Recht. A tour of reinforcement learning: The view from continu- ous control. Annual Review of Control, Robotics, and Autonomous Systems, 2(1):253–279, 2019

work page 2019
[64]

Learning-based mpc from big data using reinforcement learning

Shambhuraj Sawant, Akhil S Anand, Dirk Reinhardt, and Sebastien Gros. Learning-based mpc from big data using reinforcement learning. arXiv preprint arXiv:2301.01667, 2023

work page arXiv 2023
[65]

Model-free data-driven predictive control using reinforcement learning

Shambhuraj Sawant, Dirk Reinhardt, Arash Bahari Kordabad, and Se- bastien Gros. Model-free data-driven predictive control using reinforcement learning. In 2023 62nd IEEE Conference on Decision and Control (CDC) , pages 4046–4052. IEEE, 2023

work page 2023
[66]

Trust region policy optimization

John Schulman, Sergey Levine, Pieter Abbeel, Michael I Jordan, and Philipp Moritz. Trust region policy optimization. In International con- ference on machine learning , pages 1889–1897. PMLR, 2015

work page 2015
[67]

Trilevel and multilevel optimization using monotone operator theory

Allahkaram Shafiei, Vyacheslav Kungurtsev, and Jakub Marecek. Trilevel and multilevel optimization using monotone operator theory. Mathematical Methods of Operations Research, 99(1):77–114, 2024

work page 2024
[68]

A single-timescale analysis for stochastic approximation with multiple coupled sequences

Han Shen and Tianyi Chen. A single-timescale analysis for stochastic approximation with multiple coupled sequences. Advances in Neural Information Processing Systems, 35:17415–17429, 2022

[69]

Deterministic policy gradient algorithms

David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. Deterministic policy gradient algorithms. In International Conference on Machine Learning, pages 387–395. PMLR, 2014

[70]

Lifted relational neural networks: Efficient learning of latent relational structures

Gustav Sourek, Vojtech Aschenbrenner, Filip Zelezny, Steven Schockaert, and Ondrej Kuzelka. Lifted relational neural networks: Efficient learning of latent relational structures. Journal of Artificial Intelligence Research, 62:69–100, 2018

[71]

Reinforcement learning: An introduction

Richard S Sutton and Andrew G Barto. Reinforcement learning: An introduction. A Bradford Book, 2018

[72]

Policy gradient methods for reinforcement learning with function approximation

Richard S Sutton, David McAllester, Satinder Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, 12, 1999

[73]

Fuzzy identification of systems and its applications to modeling and control

Tomohiro Takagi and Michio Sugeno. Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man, and Cybernetics, (1):116–132, 1985

[74]

Reinforcement learning with adaptive regularization for safe control of critical systems

Haozhe Tian, Homayoun Hamedmoghadam, Robert Shorten, and Pietro Ferraro. Reinforcement learning with adaptive regularization for safe control of critical systems. In The 38th Advances in Neural Information Processing Systems (NeurIPS), volume 37, pages 2528–2557, 2024

[75]

Q-learning

Christopher JCH Watkins and Peter Dayan. Q-learning. Machine learning, 8:279–292, 1992

[76]

Multi-level policy and reward-based deep reinforcement learning framework for image captioning

Ning Xu, Hanwang Zhang, An-An Liu, Weizhi Nie, Yuting Su, Jie Nie, and Yongdong Zhang. Multi-level policy and reward-based deep reinforcement learning framework for image captioning. IEEE Transactions on Multimedia, 22(5):1372–1383, 2019

[77]

Asymptotic expansions of backward equations for two-time-scale Markov chains in continuous time

G Yin and Dung Tien Nguyen. Asymptotic expansions of backward equations for two-time-scale Markov chains in continuous time. Acta Mathematicae Applicatae Sinica, English Series, 25(3):457–476, 2009

[78]

Continuous-time Markov chains and applications: a two-time-scale approach, volume 37

G George Yin and Qing Zhang. Continuous-time Markov chains and applications: a two-time-scale approach, volume 37. Springer Science & Business Media, 2012

[79]

Fuzzy sets

Lotfi A Zadeh. Fuzzy sets. Information and Control, 1965

[80]

GraphMP: Graph neural network-based motion planning with efficient graph search

Xiao Zang, Miao Yin, Jinqi Xiao, Saman Zonouz, and Bo Yuan. GraphMP: Graph neural network-based motion planning with efficient graph search. Advances in Neural Information Processing Systems, 36:3131–3142, 2023

Showing first 80 references.
