Learning to Adapt: Reptile-D-Learning for Robust and Efficient Control Under Parametric Uncertainty

Haipeng Cao; Quan Quan; Zhaolong Shen

arxiv: 2606.25659 · v1 · pith:V3ZDW3QMnew · submitted 2026-06-24 · 💻 cs.RO

Learning to Adapt: Reptile-D-Learning for Robust and Efficient Control Under Parametric Uncertainty

Haipeng Cao , Zhaolong Shen , Quan Quan This is my paper

Pith reviewed 2026-06-25 20:50 UTC · model grok-4.3

classification 💻 cs.RO

keywords meta-learningLyapunov controlparametric uncertaintyD-learningnonlinear systemsadaptationgeneralizationrobust control

0 comments

The pith

Reptile-D-learning uses meta-learning to initialize Lyapunov networks that adapt quickly to unseen parameter changes in nonlinear systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Reptile-D-learning to make learning-based Lyapunov control work when system parameters vary or are uncertain. It applies the Reptile meta-learning algorithm to D-learning so that a Lyapunov network learns an initialization capturing structures common to multiple parameter settings. This initialization then supports fast adaptation and stable control for new configurations without starting from scratch or retraining fully. A reader would care because parameter shifts otherwise invalidate stability guarantees and force repeated expensive training. The approach keeps the model-free benefits of D-learning while adding the meta-learning step for generalization.

Core claim

Reptile-D-learning leverages the Reptile meta-learning algorithm to capture shared dynamical structures across systems with different parameters, thereby learning a generalizable Lyapunov network initialization and a high-performance controller. Experiments on multiple nonlinear control systems demonstrate that this significantly improves both generalization and rapid adaptation to unseen parameter configurations.

What carries the argument

Reptile meta-learning applied to D-learning for Lyapunov derivative estimation, which produces an initialization supporting quick fine-tuning across parameter variations.

If this is right

Controllers retain formal stability guarantees when parameters change without full retraining.
Adaptation to new configurations uses fewer samples and less time than training from scratch.
The framework extends to multiple classes of nonlinear systems.
Model-free estimation of Lyapunov derivatives remains available while gaining meta-learning benefits.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Control design could shift toward initializing once on a family of models rather than identifying exact parameters each time.
The method might support continuous online updates in settings where parameters drift gradually.
Similar meta-learning steps could be tested with other derivative-estimation techniques beyond D-learning.

Load-bearing premise

Systems with different parameters share enough dynamical structure that Reptile can extract a single initialization useful for fast adaptation.

What would settle it

If tests on systems with large parameter shifts show no faster adaptation or better stability retention than plain D-learning, the claim would not hold.

Figures

Figures reproduced from arXiv: 2606.25659 by Haipeng Cao, Quan Quan, Zhaolong Shen.

**Figure 1.** Figure 1: 3D trajectory comparison under 1.5× mass perturbation. The D-learning controller diverges, while Reptile-Dlearning converges to the target. tractable for the cascaded Lyapunov, Dfunction, and policy networks. To address these challenges, we propose Reptile-Dlearning, a unified bilevel framework for robust crossparameter stabilization. We represent the Lyapunov network, D-network, and control policy as … view at source ↗

**Figure 2.** Figure 2: Overview of the Reptile-D-learning framework. Three meta-networks (Meta-Lyapunov NN, Meta-Dfunction NN, [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: UAV stabilization trajectories under the benchmark system. Compared with the baseline and standard D-learning, [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Phase-portrait comparison for the inverted pendulum [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 6.** Figure 6: Comparison of estimated regions of attraction. Reptile [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗

**Figure 8.** Figure 8: Generalization under UAV mass shifts. Reptile-D [PITH_FULL_IMAGE:figures/full_fig_p006_8.png] view at source ↗

**Figure 10.** Figure 10: Adaptation-window analysis. Within early adaptation [PITH_FULL_IMAGE:figures/full_fig_p007_10.png] view at source ↗

**Figure 11.** Figure 11: Unified ablation study across three systems. (a) Inverted pendulum: standard D-learning becomes non-convergent [PITH_FULL_IMAGE:figures/full_fig_p007_11.png] view at source ↗

read the original abstract

Learning-based Lyapunov Control (LLC) provides formal stability guarantees for nonlinear systems, but its validity relies heavily on accurate system models. Parameter variations and uncertainties may invalidate stability constraints, leading to costly retraining. Although D-learning can estimate Lyapunov derivatives without relying on explicit dynamics models, it remains limited by single-task dynamics and degrades under large parameter shifts. We propose Reptile-D-learning, a framework that leverages the Reptile meta-learning algorithm to capture shared dynamical structures across systems with different parameters, thereby learning a generalizable Lyapunov network initialization and a high-performance controller. Experiments on multiple nonlinear control systems demonstrate that Reptile-D-learning significantly improves both generalization and rapid adaptation to unseen parameter configurations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Reptile-D-learning applies an off-the-shelf meta-learning step to D-learning for Lyapunov control under parameter shifts, but the transfer of stability properties lacks supporting analysis.

read the letter

The main point is that the paper takes Reptile, a first-order meta-learning method, and uses it to produce an initialization for D-learning of Lyapunov networks so the controller adapts faster to unseen parameters without full retraining. The abstract reports that this improves generalization on multiple nonlinear systems.

They handle a genuine practical issue: learning-based Lyapunov control loses its guarantees when parameters vary, and D-learning already sidesteps explicit models for the derivative estimate. Layering Reptile on top to capture shared structure across parameter instances is a reasonable, lightweight extension rather than a new algorithm from scratch.

The soft spot is exactly the stress-test concern. Reptile averages task gradients in a single outer update, but nothing in that construction ensures the resulting point keeps the Lyapunov loss feasible or convex once the dynamics change. Nonlinear systems routinely have shifting loss basins, so the averaged initialization can easily be no better than a random start, which would make the claimed rapid adaptation depend entirely on the specific experiments. The abstract supplies no equations, no proof sketch, and no discussion of how the meta-step interacts with the Lyapunov derivative condition. Without those, the central claim rests on unverified empirical results.

This is aimed at roboticists and control researchers who already use learning-based Lyapunov methods and need something that tolerates moderate parameter drift. It is not for readers seeking new theory or formal guarantees. The work deserves peer review because the problem is relevant and the method is simple enough for referees to check the experiments directly, though it will need tighter analysis of the initialization step and clearer baseline comparisons to be convincing.

Referee Report

2 major / 2 minor

Summary. The paper proposes Reptile-D-learning, a meta-learning framework that applies the Reptile algorithm to D-learning for Lyapunov-based control. It aims to learn a shared initialization for the Lyapunov network across systems with varying parameters, enabling better generalization and rapid adaptation to unseen parameter configurations without full retraining. Experiments on multiple nonlinear control systems are reported to demonstrate significant improvements in both generalization and adaptation performance under parametric uncertainty.

Significance. If the central empirical claims hold with rigorous validation, the approach could meaningfully extend learning-based Lyapunov control to settings with model mismatch, reducing the need for per-instance retraining while retaining formal stability guarantees. The combination of first-order meta-learning with derivative-free Lyapunov estimation is a targeted contribution if the initialization reliably transfers across qualitatively different loss landscapes.

major comments (2)

[§3] §3 (Reptile-D-learning formulation): The claim that Reptile produces a generalizable initialization rests on the assumption that first-order averaging of task-specific gradients yields a point from which D-learning recovers a valid V̇ < 0 controller for new parameters. No argument is given showing that the averaged point remains in the feasible region of the Lyapunov loss when the underlying dynamics change; if the loss landscapes differ in basin structure, the initialization may be no better than random, undermining the rapid-adaptation result.
[Experiments] Experiments section (multiple nonlinear systems): The reported 'significant improvement' in generalization and adaptation is load-bearing for the central claim, yet the manuscript provides no quantitative baselines (e.g., plain D-learning, MAML variants), no error bars or statistical tests, and no specification of the parameter ranges or failure rates on unseen configurations. Without these, it is impossible to determine whether the gains are attributable to Reptile or to other factors.

minor comments (2)

[Abstract] The abstract and introduction use 'Reptile-D-learning' without an explicit acronym expansion or reference to the original Reptile paper on first use.
[§2] Notation for the Lyapunov network and the D-learning loss should be introduced with a clear table or equation block to avoid ambiguity when comparing across parameter values.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below and indicate the revisions we will make.

read point-by-point responses

Referee: [§3] §3 (Reptile-D-learning formulation): The claim that Reptile produces a generalizable initialization rests on the assumption that first-order averaging of task-specific gradients yields a point from which D-learning recovers a valid V̇ < 0 controller for new parameters. No argument is given showing that the averaged point remains in the feasible region of the Lyapunov loss when the underlying dynamics change; if the loss landscapes differ in basin structure, the initialization may be no better than random, undermining the rapid-adaptation result.

Authors: We acknowledge that the manuscript provides no formal argument establishing that the Reptile-averaged initialization necessarily lies in the feasible region of the Lyapunov loss under arbitrary dynamics changes. The method is presented as an empirical extension of first-order meta-learning, relying on observed shared structure across parameter variations rather than a theoretical guarantee. In revision we will expand §3 to state this assumption explicitly, note the absence of such a guarantee as a limitation, and clarify that rapid adaptation is demonstrated empirically rather than proven. revision: partial
Referee: [Experiments] Experiments section (multiple nonlinear systems): The reported 'significant improvement' in generalization and adaptation is load-bearing for the central claim, yet the manuscript provides no quantitative baselines (e.g., plain D-learning, MAML variants), no error bars or statistical tests, and no specification of the parameter ranges or failure rates on unseen configurations. Without these, it is impossible to determine whether the gains are attributable to Reptile or to other factors.

Authors: We agree that the experimental evaluation requires these additions to support the claims. The revised manuscript will include direct comparisons against plain D-learning and MAML variants, report results with error bars and statistical tests, and provide complete details on the parameter ranges explored together with observed failure rates on unseen configurations. revision: yes

Circularity Check

0 steps flagged

No derivation chain or equations present to analyze for circularity

full rationale

The provided abstract and context describe the Reptile-D-learning proposal at a conceptual level only, with no equations, loss functions, update rules, or derivation steps shown. The central claim rests on experimental results for generalization and adaptation rather than any first-principles derivation that could reduce to fitted inputs or self-citations by construction. No load-bearing steps exist in the visible text that match the enumerated circularity patterns, so the finding is no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the method implicitly assumes meta-learning can extract shared structures without detailing how.

pith-pipeline@v0.9.1-grok · 5642 in / 1019 out tokens · 16405 ms · 2026-06-25T20:50:05.353398+00:00 · methodology

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Stabilization Learning: A Paradigm Transition Bridging Control Theory and Machine Learning
cs.RO 2026-06 unverdicted novelty 3.0

Stabilization learning is introduced as a stability-centric framework bridging control theory and machine learning via a six-tuple mathematical model applicable to control, observation, and recognition tasks.

Reference graph

Works this paper leans on

28 extracted references · 1 linked inside Pith · cited by 1 Pith paper

[1]

Neural lyapunov control,

Y .-C. Chang, N. Roohi, and S. Gao, “Neural lyapunov control,” inAdv. Neural Inf. Process. Syst. (NeurIPS), vol. 32, 2019, pp. 3240–3249

2019
[2]

The lyapunov neural network adaptive stability certification for safe learning of dynamical systems,

S. M. Richards, F. Berkenkamp, and A. Krause, “The lyapunov neural network adaptive stability certification for safe learning of dynamical systems,” inProc. Conf. Robot Learn. (CoRL), 2018, pp. 466–476

2018
[3]

Neural lyapunov control of unknown nonlinear systems with stability guarantees,

R. Zhou, T. Quartz, H. D. Sterck, and J. Liu, “Neural lyapunov control of unknown nonlinear systems with stability guarantees,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 35, 2022, pp. 29 113– 29 125

2022
[4]

Lyapunov- stable neural-network control,

H. Dai, B. Landry, L. Yang, M. Pavone, and R. Tedrake, “Lyapunov- stable neural-network control,” inProc. Robot. Sci. Syst. (RSS), Virtual, Jul. 2021

2021
[5]

Safe model-based reinforcement learning with stability guarantees,

F. Berkenkamp, M. Turchetta, A. P. Schoellig, and A. Krause, “Safe model-based reinforcement learning with stability guarantees,” inAdv. Neural Inf. Process. Syst. (NeurIPS), vol. 30, 2017, pp. 908–919

2017
[6]

Lyapunov-regularized reinforcement learning for power system transient stability,

W. Cui and B. Zhang, “Lyapunov-regularized reinforcement learning for power system transient stability,”IEEE Control Syst. Lett., vol. 6, pp. 974–979, 2022

2022
[7]

Lyapunov-based distributed reinforce- ment learning control with stability guarantee,

J. Yao, M. Han, and X. Yin, “Lyapunov-based distributed reinforce- ment learning control with stability guarantee,”Comput. Chem. Eng., vol. 195, p. 108979, 2025

2025
[8]

Control with patterns a D-learning method,

Q. Quan, K.-Y . Cai, and C. Wang, “Control with patterns a D-learning method,” inProc. Conf. Robot Learn. (CoRL), vol. 270, 2025, pp. 1384–1401

2025
[9]

DOPT D-learning with off-policy target toward sample efficiency and fast convergence control,

Z. Shen and Q. Quan, “DOPT D-learning with off-policy target toward sample efficiency and fast convergence control,” inProc. IEEE Int. Conf. Robot. Autom. (ICRA), 2025

2025
[10]

DL-Clip online D-learning with clipping operation for fast model-free stabilizing control,

J. Liu, C. Wang, Z. Shen, and Q. Quan, “DL-Clip online D-learning with clipping operation for fast model-free stabilizing control,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), 2025

2025
[11]

On first-order meta-learning algorithms,

A. Nichol, J. Achiam, and J. Schulman, “On first-order meta-learning algorithms,”arXiv preprint arXiv1803.02999, 2018

Pith/arXiv arXiv 2018
[12]

Model-agnostic meta-learning for fast adaptation of deep networks,

C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” inProc. Int. Conf. Mach. Learn. (ICML), 2017, pp. 1126–1135

2017
[13]

Physics-informed neural network lyapunov functions PDE characterization, learning, and verification,

J. Liu, Y . Meng, M. Fitzsimmons, and R. Zhou, “Physics-informed neural network lyapunov functions PDE characterization, learning, and verification,”Automatica, vol. 175, p. 112193, 2025

2025
[14]

Neural lyapunov redesign,

A. Mehrjou, M. Ghavamzadeh, and B. Scholkopf, “Neural lyapunov redesign,”arXiv preprint arXiv2006.03947, 2020

arXiv 2020
[15]

Lyapunov design for robust and efficient robotic reinforcement learn- ing,

T. Westenbroek, F. Castaneda, A. Agrawal, S. Sastry, and K. Sreenath, “Lyapunov design for robust and efficient robotic reinforcement learn- ing,” inProc. Conf. Robot Learn. (CoRL), 2023, pp. 17–36

2023
[16]

Meta-reinforcement learning for adaptive control of second order systems,

D. G. McClementet al., “Meta-reinforcement learning for adaptive control of second order systems,” inProc. IEEE Int. Symp. Adv. Control Ind. Process. (AdCONIP), 2022, pp. 78–83

2022
[17]

Learning to adapt in dynamic, real-world environments through meta-reinforcement learning,

A. Nagabandi, I. Clavera, S. Liu, R. S. Fearing, P. Abbeel, S. Levine, and C. Finn, “Learning to adapt in dynamic, real-world environments through meta-reinforcement learning,” inProc. Int. Conf. Learn. Represent. (ICLR), 2019

2019
[18]

Meta- learning in neural networks: A survey,

T. Hospedales, A. Antoniou, P. Micaelli, and A. Storkey, “Meta- learning in neural networks: A survey,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 9, pp. 5149–5169, 2022

2022
[19]

Transfer learning in deep reinforcement learning a survey,

Z. Zhu, K. Lin, A. K. Jain, and J. Zhou, “Transfer learning in deep reinforcement learning a survey,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 11, pp. 13 344–13 362, 2023

2023
[20]

Q-learning,

C. J. C. H. Watkins and P. Dayan, “Q-learning,”Mach. Learn., vol. 8, no. 3-4, pp. 279–292, 1992

1992
[21]

Meta-learning-based adaptive stability certificates for dynamical systems,

A. Jena, D. Kalathil, and L. Xie, “Meta-learning-based adaptive stability certificates for dynamical systems,” inProc. AAAI Conf. Artif. Intell., vol. 38, no. 11, 2024, pp. 12 801–12 809

2024
[22]

Meta-learning with implicit gradients,

A. Rajeswaran, C. Finn, S. M. Kakade, and S. Levine, “Meta-learning with implicit gradients,” inAdv. Neural Inf. Process. Syst. (NeurIPS), vol. 32, 2019

2019
[23]

H. K. Khalil,Nonlinear Systems, 3rd ed. Upper Saddle River, NJ, USA: Prentice Hall, 2002

2002
[24]

Commonroad composable benchmarks for motion planning on roads,

M. Althoff, M. Koschi, and S. Manzinger, “Commonroad composable benchmarks for motion planning on roads,” inProc. IEEE Intell. Veh. Symp. (IV), 2017, pp. 719–726

2017
[25]

Crazyflie 2.0 quadrotor as a platform for research and education in robotics and control engineering,

W. Giernacki, M. Skwierczy ´nski, W. Witwicki, P. Wro ´nski, and P. Kozierski, “Crazyflie 2.0 quadrotor as a platform for research and education in robotics and control engineering,” inProc. Int. Conf. Methods Models Autom. Robot. (MMAR), 2017, pp. 37–42

2017
[26]

System identification of the Crazyflie 2.0 nano quadro- copter,

J. Förster, “System identification of the Crazyflie 2.0 nano quadro- copter,” ETH Zürich, Tech. Rep., 2015

2015
[27]

Domain randomization for transferring deep neural networks from simulation to the real world,

J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain randomization for transferring deep neural networks from simulation to the real world,” inProc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), 2017, pp. 23–30

2017
[28]

Sim-to- real transfer of robotic control with dynamics randomization,

X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to- real transfer of robotic control with dynamics randomization,” inProc. IEEE Int. Conf. Robot. Autom. (ICRA), 2018, pp. 7294–7301

2018

[1] [1]

Neural lyapunov control,

Y .-C. Chang, N. Roohi, and S. Gao, “Neural lyapunov control,” inAdv. Neural Inf. Process. Syst. (NeurIPS), vol. 32, 2019, pp. 3240–3249

2019

[2] [2]

The lyapunov neural network adaptive stability certification for safe learning of dynamical systems,

S. M. Richards, F. Berkenkamp, and A. Krause, “The lyapunov neural network adaptive stability certification for safe learning of dynamical systems,” inProc. Conf. Robot Learn. (CoRL), 2018, pp. 466–476

2018

[3] [3]

Neural lyapunov control of unknown nonlinear systems with stability guarantees,

R. Zhou, T. Quartz, H. D. Sterck, and J. Liu, “Neural lyapunov control of unknown nonlinear systems with stability guarantees,” in Adv. Neural Inf. Process. Syst. (NeurIPS), vol. 35, 2022, pp. 29 113– 29 125

2022

[4] [4]

Lyapunov- stable neural-network control,

H. Dai, B. Landry, L. Yang, M. Pavone, and R. Tedrake, “Lyapunov- stable neural-network control,” inProc. Robot. Sci. Syst. (RSS), Virtual, Jul. 2021

2021

[5] [5]

Safe model-based reinforcement learning with stability guarantees,

F. Berkenkamp, M. Turchetta, A. P. Schoellig, and A. Krause, “Safe model-based reinforcement learning with stability guarantees,” inAdv. Neural Inf. Process. Syst. (NeurIPS), vol. 30, 2017, pp. 908–919

2017

[6] [6]

Lyapunov-regularized reinforcement learning for power system transient stability,

W. Cui and B. Zhang, “Lyapunov-regularized reinforcement learning for power system transient stability,”IEEE Control Syst. Lett., vol. 6, pp. 974–979, 2022

2022

[7] [7]

Lyapunov-based distributed reinforce- ment learning control with stability guarantee,

J. Yao, M. Han, and X. Yin, “Lyapunov-based distributed reinforce- ment learning control with stability guarantee,”Comput. Chem. Eng., vol. 195, p. 108979, 2025

2025

[8] [8]

Control with patterns a D-learning method,

Q. Quan, K.-Y . Cai, and C. Wang, “Control with patterns a D-learning method,” inProc. Conf. Robot Learn. (CoRL), vol. 270, 2025, pp. 1384–1401

2025

[9] [9]

DOPT D-learning with off-policy target toward sample efficiency and fast convergence control,

Z. Shen and Q. Quan, “DOPT D-learning with off-policy target toward sample efficiency and fast convergence control,” inProc. IEEE Int. Conf. Robot. Autom. (ICRA), 2025

2025

[10] [10]

DL-Clip online D-learning with clipping operation for fast model-free stabilizing control,

J. Liu, C. Wang, Z. Shen, and Q. Quan, “DL-Clip online D-learning with clipping operation for fast model-free stabilizing control,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), 2025

2025

[11] [11]

On first-order meta-learning algorithms,

A. Nichol, J. Achiam, and J. Schulman, “On first-order meta-learning algorithms,”arXiv preprint arXiv1803.02999, 2018

Pith/arXiv arXiv 2018

[12] [12]

Model-agnostic meta-learning for fast adaptation of deep networks,

C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” inProc. Int. Conf. Mach. Learn. (ICML), 2017, pp. 1126–1135

2017

[13] [13]

Physics-informed neural network lyapunov functions PDE characterization, learning, and verification,

J. Liu, Y . Meng, M. Fitzsimmons, and R. Zhou, “Physics-informed neural network lyapunov functions PDE characterization, learning, and verification,”Automatica, vol. 175, p. 112193, 2025

2025

[14] [14]

Neural lyapunov redesign,

A. Mehrjou, M. Ghavamzadeh, and B. Scholkopf, “Neural lyapunov redesign,”arXiv preprint arXiv2006.03947, 2020

arXiv 2020

[15] [15]

Lyapunov design for robust and efficient robotic reinforcement learn- ing,

T. Westenbroek, F. Castaneda, A. Agrawal, S. Sastry, and K. Sreenath, “Lyapunov design for robust and efficient robotic reinforcement learn- ing,” inProc. Conf. Robot Learn. (CoRL), 2023, pp. 17–36

2023

[16] [16]

Meta-reinforcement learning for adaptive control of second order systems,

D. G. McClementet al., “Meta-reinforcement learning for adaptive control of second order systems,” inProc. IEEE Int. Symp. Adv. Control Ind. Process. (AdCONIP), 2022, pp. 78–83

2022

[17] [17]

Learning to adapt in dynamic, real-world environments through meta-reinforcement learning,

A. Nagabandi, I. Clavera, S. Liu, R. S. Fearing, P. Abbeel, S. Levine, and C. Finn, “Learning to adapt in dynamic, real-world environments through meta-reinforcement learning,” inProc. Int. Conf. Learn. Represent. (ICLR), 2019

2019

[18] [18]

Meta- learning in neural networks: A survey,

T. Hospedales, A. Antoniou, P. Micaelli, and A. Storkey, “Meta- learning in neural networks: A survey,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 9, pp. 5149–5169, 2022

2022

[19] [19]

Transfer learning in deep reinforcement learning a survey,

Z. Zhu, K. Lin, A. K. Jain, and J. Zhou, “Transfer learning in deep reinforcement learning a survey,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 11, pp. 13 344–13 362, 2023

2023

[20] [20]

Q-learning,

C. J. C. H. Watkins and P. Dayan, “Q-learning,”Mach. Learn., vol. 8, no. 3-4, pp. 279–292, 1992

1992

[21] [21]

Meta-learning-based adaptive stability certificates for dynamical systems,

A. Jena, D. Kalathil, and L. Xie, “Meta-learning-based adaptive stability certificates for dynamical systems,” inProc. AAAI Conf. Artif. Intell., vol. 38, no. 11, 2024, pp. 12 801–12 809

2024

[22] [22]

Meta-learning with implicit gradients,

A. Rajeswaran, C. Finn, S. M. Kakade, and S. Levine, “Meta-learning with implicit gradients,” inAdv. Neural Inf. Process. Syst. (NeurIPS), vol. 32, 2019

2019

[23] [23]

H. K. Khalil,Nonlinear Systems, 3rd ed. Upper Saddle River, NJ, USA: Prentice Hall, 2002

2002

[24] [24]

Commonroad composable benchmarks for motion planning on roads,

M. Althoff, M. Koschi, and S. Manzinger, “Commonroad composable benchmarks for motion planning on roads,” inProc. IEEE Intell. Veh. Symp. (IV), 2017, pp. 719–726

2017

[25] [25]

Crazyflie 2.0 quadrotor as a platform for research and education in robotics and control engineering,

W. Giernacki, M. Skwierczy ´nski, W. Witwicki, P. Wro ´nski, and P. Kozierski, “Crazyflie 2.0 quadrotor as a platform for research and education in robotics and control engineering,” inProc. Int. Conf. Methods Models Autom. Robot. (MMAR), 2017, pp. 37–42

2017

[26] [26]

System identification of the Crazyflie 2.0 nano quadro- copter,

J. Förster, “System identification of the Crazyflie 2.0 nano quadro- copter,” ETH Zürich, Tech. Rep., 2015

2015

[27] [27]

Domain randomization for transferring deep neural networks from simulation to the real world,

J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain randomization for transferring deep neural networks from simulation to the real world,” inProc. IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), 2017, pp. 23–30

2017

[28] [28]

Sim-to- real transfer of robotic control with dynamics randomization,

X. B. Peng, M. Andrychowicz, W. Zaremba, and P. Abbeel, “Sim-to- real transfer of robotic control with dynamics randomization,” inProc. IEEE Int. Conf. Robot. Autom. (ICRA), 2018, pp. 7294–7301

2018