Mission-Aligned Learning-Informed Control of Autonomous Systems: Formulation and Foundations
Pith reviewed 2026-05-19 06:29 UTC · model grok-4.3
The pith
Autonomous systems achieve greater safety and interpretability by framing decisions as a two-level optimization scheme that combines control, classical planning, and learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present the general formulation of mission-aligned control of autonomous systems as a two-level optimization scheme which incorporates control at the lower level and classical planning at the higher level, integrated with a capacity for learning. This synergistic integration of control, classical planning, and reinforcement learning (RL) presents an opportunity for greater insight into algorithm development, leading to more efficient and reliable performance, where reliability refers to physical safety and to interpretability of an otherwise black-box operation.
What carries the argument
The two-level optimization scheme that places control for physical movements at the lower level and classical planning for tasks at the higher level while incorporating learning.
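As a rough illustration of how the two levels could interact, the sketch below pairs a breadth-first symbolic planner (upper level) with a saturated proportional controller on a 1-D single integrator (lower level). The care domain, action names, target positions, and gains are hypothetical assumptions chosen for readability, not the paper's model, and the learning component is omitted; in a fuller version a learned module might, for example, tune the controller gain or the planner's action costs.

```python
from collections import deque

# Hypothetical toy care domain (illustrative assumption, not the paper's model).
# name -> (preconditions, add effects, delete effects, target position on a 1-D corridor)
ACTIONS = {
    "go_to_cabinet": (set(), {"at_cabinet"}, {"at_base", "at_bed"}, 2.0),
    "pick_medicine": ({"at_cabinet"}, {"holding_medicine"}, set(), 2.0),
    "go_to_bed": (set(), {"at_bed"}, {"at_base", "at_cabinet"}, 5.0),
    "give_medicine": ({"at_bed", "holding_medicine"}, {"patient_served"},
                      {"holding_medicine"}, 5.0),
}


def plan(initial, goal):
    """Upper level: breadth-first search for a shortest symbolic plan."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:  # every goal predicate holds
            return steps
        for name, (pre, add, delete, _) in ACTIONS.items():
            if pre <= state:  # action is applicable
                nxt = frozenset((state - delete) | add)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, steps + [name]))
    return None  # goal unreachable


def track(x, target, gain=2.0, u_max=1.0, dt=0.1, tol=1e-3, max_steps=1000):
    """Lower level: saturated proportional control of dx/dt = u (forward Euler)."""
    for k in range(max_steps):
        if abs(target - x) < tol:
            return x, k
        u = max(-u_max, min(u_max, gain * (target - x)))
        x += dt * u
    return x, max_steps


def execute(initial, goal, x0=0.0):
    """Hand each planned task's target position to the controller in turn."""
    steps = plan(initial, goal)
    if steps is None:
        raise ValueError("no plan found")
    x, log = x0, []
    for name in steps:
        x, n = track(x, ACTIONS[name][3])
        log.append((name, round(x, 2), n))
    return log


for entry in execute({"at_base"}, {"patient_served"}):
    print(entry)
```

Keeping the planner and controller behind separate functions mirrors the decomposition: either level can be replaced (for instance, swapping `track` for an MPC solver) without touching the other.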
If this is right
- The integration yields more efficient and reliable performance in autonomous physical agents.
- Physical safety improves because decisions are structured rather than purely learned.
- Interpretability increases, addressing user and regulator concerns about black-box behavior.
- Greater insight becomes available for developing new algorithms that blend the three methodologies.
Where Pith is reading between the lines
- The same two-level structure could apply directly to the industrial robots, UAVs, and embedded devices listed in the introduction.
- High-level planning might make regulatory verification of safety constraints simpler than in end-to-end learned policies.
- The framework could support incremental deployment where only the planning layer is updated while the control layer remains fixed.
Load-bearing premise
That casting a stylized robotic care problem as a two-level optimization integrating control, planning, and learning will inherently produce better physical safety and interpretability than existing methods.
What would settle it
A side-by-side test of a robotic care task in simulation or on hardware that measures physical safety violations and decision transparency scores for the two-level optimization system versus a standard reinforcement learning policy.
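The safety half of such a test reduces to counting constraint violations along recorded trajectories. A minimal sketch follows, assuming a hypothetical scalar safety function `h` where `h(x) >= 0` means the state is safe; the trajectories are placeholder values, not measurements, and the decision-transparency score is not covered because the source does not define it.

```python
from typing import Callable, Sequence, Tuple


def safety_metrics(episodes: Sequence[Sequence[float]],
                   h: Callable[[float], float]) -> Tuple[float, float]:
    """Return (fraction of unsafe steps, fraction of episodes with any unsafe step).

    A state x counts as unsafe when h(x) < 0.
    """
    total_steps = sum(len(ep) for ep in episodes)
    unsafe_steps = sum(1 for ep in episodes for x in ep if h(x) < 0)
    unsafe_episodes = sum(1 for ep in episodes if any(h(x) < 0 for x in ep))
    return unsafe_steps / total_steps, unsafe_episodes / len(episodes)


def h(x: float) -> float:
    return 1.0 - x  # hypothetical constraint: keep x <= 1.0


# Placeholder trajectories for illustration only (not experimental data).
rl_runs = [[0.2, 0.8, 1.1], [0.1, 0.5, 0.9]]
filtered_runs = [[0.2, 0.8, 1.0], [0.1, 0.5, 0.9]]

print("RL policy:", safety_metrics(rl_runs, h))
print("two-level:", safety_metrics(filtered_runs, h))
```

Reporting both rates is useful because a policy can have a low per-step violation rate while still violating the constraint in many episodes.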
Original abstract
Research, innovation and practical capital investment have been increasing rapidly toward the realization of autonomous physical agents. This includes industrial and service robots, unmanned aerial vehicles, embedded control devices, and a number of other realizations of cybernetic/mechatronic implementations of intelligent autonomous devices. In this paper, we consider a stylized version of robotic care, which would normally involve a two-level Reinforcement Learning procedure that trains a policy for both lower-level physical movement decisions and higher-level conceptual tasks and their sub-components. In order to deliver greater safety and reliability in the system, we present the general formulation of this as a two-level optimization scheme which incorporates control at the lower level, and classical planning at the higher level, integrated with a capacity for learning. This synergistic integration of multiple methodologies -- control, classical planning, and RL -- presents an opportunity for greater insight for algorithm development, leading to more efficient and reliable performance. Here, the notion of reliability pertains to physical safety and to interpretability of the otherwise black-box operation of autonomous agents, both of which concern users and regulators. This work presents the necessary background and general formulation of the optimization framework, detailing each component and its integration with the others.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a general formulation for mission-aligned control of autonomous systems, using a stylized robotic care example. It frames the task as a two-level optimization scheme: lower-level control for physical movements, upper-level classical planning for conceptual tasks and sub-tasks, with an integrated learning (RL) capacity. The central claim is that this synergistic integration of control, planning, and RL yields greater physical safety, reliability, and interpretability than standard two-level RL or pure planning approaches.
Significance. If the integration can be equipped with explicit preservation mechanisms, the framework could offer a principled route to combine stability guarantees from control theory, mission constraints from planning, and adaptability from learning. The focus on interpretability for users and regulators addresses a practical gap in autonomous systems. Because the paper is a foundational formulation without theorems, examples, or empirical validation, its significance rests on enabling subsequent rigorous developments rather than on delivering immediate results.
major comments (1)
- [Abstract and general formulation of the two-level optimization scheme] The abstract and formulation claim that the two-level scheme delivers greater physical safety and reliability than existing approaches. However, no safety invariant, constraint qualification, or post-learning verification mechanism is stated that would ensure the learned lower-level policy respects upper-level mission constraints or control-level stability margins when the RL component is active. This link is load-bearing for the reliability assertion.
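For concreteness, one standard way to state such an invariant (an illustrative sketch, not a formulation taken from the paper) uses a control barrier function h for control-affine dynamics. The symbols f, g, h, the admissible input set U, the extended class-K function alpha, and the learned policy pi_theta are assumed notation:

```latex
\begin{aligned}
&\mathcal{C} = \{x : h(x) \ge 0\}, \qquad \dot{x} = f(x) + g(x)\,u, \quad u \in \mathcal{U},\\
&\sup_{u \in \mathcal{U}} \big[L_f h(x) + L_g h(x)\,u\big] \;\ge\; -\alpha\big(h(x)\big)
  \quad \forall x \in \mathcal{C},\\
&u^{\star}(x) = \arg\min_{u \in \mathcal{U}} \ \tfrac{1}{2}\,\lVert u - \pi_\theta(x)\rVert^2
  \quad \text{s.t.} \quad L_f h(x) + L_g h(x)\,u \ge -\alpha\big(h(x)\big).
\end{aligned}
```

Under standard regularity assumptions (h continuously differentiable with 0 a regular value, and u* locally Lipschitz), the filtered input u* keeps C forward invariant whatever pi_theta proposes. A Lyapunov decrease condition such as dV/dt <= -gamma V(x) could be added as a second constraint to encode the stability margins, typically with a slack variable so that the safety constraint takes priority.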
minor comments (1)
- [Abstract] The abstract refers to a 'stylized version of robotic care' without providing a concrete mathematical example, diagram, or small-scale instance that illustrates how the levels interact.
Simulated Author's Rebuttal
We thank the referee for their constructive and insightful review. We appreciate the recognition of the framework's potential to combine stability guarantees, mission constraints, and adaptability, as well as the focus on interpretability. We address the major comment below and will revise the manuscript to strengthen the presentation of the formulation.
Point-by-point responses
- Referee: The abstract and formulation claim that the two-level scheme delivers greater physical safety and reliability than existing approaches. However, no safety invariant, constraint qualification, or post-learning verification mechanism is stated that would ensure the learned lower-level policy respects upper-level mission constraints or control-level stability margins when the RL component is active. This link is load-bearing for the reliability assertion.
Authors: We agree that the manuscript, as a foundational formulation, does not provide explicit safety invariants, constraint qualifications, or post-learning verification mechanisms. The abstract and introduction frame the two-level scheme as delivering an opportunity for greater physical safety and reliability through the synergistic integration of control (for stability margins), classical planning (for mission constraints), and RL (for adaptability), with reliability tied to both physical safety and interpretability. This is positioned as an improvement over standard two-level RL or pure planning by design, but we acknowledge the current text does not detail how the lower-level learned policy is guaranteed to respect upper-level constraints. In the revised manuscript we will add a new subsection under the formulation that outlines possible interfaces for preserving these properties, such as embedding Lyapunov-based stability or control barrier functions at the lower level and propagating temporal or logical constraints from the planner to bound RL actions. We will also revise the abstract and introduction to clarify that the framework is structured to enable such preservation mechanisms rather than claiming they are automatically delivered in the general formulation.
Revision: yes
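The interface described in the response can be sketched for the simplest case. Below, a hypothetical planner-supplied bound `x_max` defines h(x) = x_max - x for a 1-D single integrator dx/dt = u, and the control barrier function quadratic program, minimize (u - u_rl)^2 subject to u <= alpha (x_max - x), has the closed-form solution min(u_rl, alpha (x_max - x)). The constant `aggressive_policy` stands in for a learned RL policy; all names and numbers are illustrative assumptions, not the authors' design.

```python
def cbf_filter(u_rl: float, x: float, x_max: float, alpha: float) -> float:
    """Closed-form CBF-QP for dx/dt = u with h(x) = x_max - x.

    Minimizes (u - u_rl)**2 subject to dh/dt >= -alpha * h(x),
    i.e. u <= alpha * (x_max - x), so the solution is a simple min().
    """
    return min(u_rl, alpha * (x_max - x))


def rollout(policy, x0, x_max, steps=50, dt=0.1, alpha=5.0, use_filter=True):
    """Forward-Euler simulation; returns the list of visited states."""
    x, xs = x0, [x0]
    for _ in range(steps):
        u = policy(x)
        if use_filter:
            u = cbf_filter(u, x, x_max, alpha)
        x = x + dt * u
        xs.append(x)
    return xs


def aggressive_policy(x: float) -> float:
    return 2.0  # hypothetical stand-in for a learned RL policy


x_max = 1.0  # hypothetical bound handed down by the upper-level planner
safe = rollout(aggressive_policy, x0=0.0, x_max=x_max)
raw = rollout(aggressive_policy, x0=0.0, x_max=x_max, use_filter=False)
print(f"max x with filter: {max(safe):.3f}, without: {max(raw):.3f}")
```

With forward-Euler steps the filtered update gives h(x_{k+1}) >= (1 - alpha*dt) h(x_k), so the bound holds in discrete time as long as alpha*dt <= 1 (here 0.5). The unfiltered rollout, by contrast, drives x well past x_max.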
Circularity Check
Formulation paper structures existing methodologies without self-referential reductions or fitted predictions
Full rationale
The manuscript presents a general two-level optimization formulation that places classical planning at the upper level, control at the lower level, and an integrated learning capacity. No equations, predictions, or first-principles derivations are advanced that reduce by construction to the inputs; the text instead describes the components and their integration as an independent structuring of control, planning, and RL. No self-citations are invoked as load-bearing uniqueness theorems, no parameters are fitted and then renamed as predictions, and no ansatzes are smuggled via prior work. The claims of improved safety and interpretability are asserted as opportunities arising from the proposed architecture rather than results obtained through any circular chain.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: autonomous systems can be decomposed into a lower physical control level and a higher conceptual planning level.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "we present the general formulation of this as a two-level optimization scheme which incorporates control at the lower level, and classical planning at the higher level, integrated with a capacity for learning"
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Passage: "This synergistic integration of multiple methodologies -- control, classical planning, and RL -- presents an opportunity for greater insight for algorithm development"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.