Improving Action Smoothness for a Cascaded Online Learning Flight Control System

Erik-jan van Kampen; Yifei Li

arXiv: 2507.04346 · v7 · submitted 2025-07-06 · eess.SY · cs.AI · cs.SY

Pith reviewed 2026-05-19 06:33 UTC · model grok-4.3

classification eess.SY · cs.AI · cs.SY

keywords action smoothness · online learning · flight control · cascaded system · low-pass filter · temporal smoothness · Fast Fourier Transform · oscillatory control

The pith

An online temporal smoothness technique and a low-pass filter reduce the amplitude and frequency of control actions in cascaded online learning flight control systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper sets out to show that oscillatory control actions in cascaded online learning flight controllers can be tamed by two specific additions. The cascaded structure is common in flight control but its stability suffers when online learning produces rapid, high-amplitude changes in the commands. The authors add an online temporal smoothness technique that penalizes abrupt changes over time and a low-pass filter that attenuates high-frequency content. Frequency-domain analysis with the Fast Fourier Transform then quantifies the drop in both amplitude and oscillation rate, and simulations confirm the result. A reader would care because smoother commands make these learning-based controllers far more suitable for real aircraft hardware where vibration and actuator wear matter.

Core claim

The central claim is that the introduction of an online temporal smoothness technique and a low-pass filter reduces the amplitude and frequency of the control actions in the cascaded online learning flight control system. Fast Fourier Transform is used to analyze policy performance in the frequency domain. Simulation results demonstrate the improvements achieved by the two proposed techniques.

What carries the argument

The online temporal smoothness technique, which penalizes large changes between successive actions, paired with a low-pass filter that removes high-frequency components from the control signal.
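The two ingredients can be illustrated with a minimal sketch. The page does not give the paper's exact formulation, so both forms below are assumptions: a quadratic penalty on the difference between successive actions, and a first-order discrete low-pass filter. The function names, the weight, the cutoff frequency, and the test signal are all hypothetical.

```python
import numpy as np

def temporal_smoothness_penalty(action, prev_action, weight=1.0):
    """Assumed form: weighted squared change between successive actions."""
    diff = np.asarray(action, dtype=float) - np.asarray(prev_action, dtype=float)
    return weight * float(np.sum(diff ** 2))

def low_pass_filter(signal, cutoff_hz, dt):
    """First-order discrete low-pass filter (exponential smoothing)."""
    tau = 1.0 / (2.0 * np.pi * cutoff_hz)   # filter time constant
    alpha = dt / (tau + dt)                 # smoothing factor in (0, 1)
    filtered = np.empty(len(signal), dtype=float)
    filtered[0] = signal[0]
    for k in range(1, len(signal)):
        filtered[k] = filtered[k - 1] + alpha * (signal[k] - filtered[k - 1])
    return filtered

# Synthetic command: slow 0.5 Hz motion plus a 20 Hz oscillation.
dt = 0.01
t = np.arange(200) * dt
raw = np.sin(2 * np.pi * 0.5 * t) + 0.3 * np.sin(2 * np.pi * 20.0 * t)
smooth = low_pass_filter(raw, cutoff_hz=2.0, dt=dt)

# Accumulated step-to-step penalty before and after filtering.
raw_penalty = sum(temporal_smoothness_penalty(raw[k], raw[k - 1])
                  for k in range(1, len(raw)))
smooth_penalty = sum(temporal_smoothness_penalty(smooth[k], smooth[k - 1])
                     for k in range(1, len(smooth)))
print(raw_penalty, smooth_penalty)
```

In an online learning setting the penalty would typically be added to the actor's loss, while the filter acts on the command after the actor has produced it; how the paper combines the two is not stated on this page.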

If this is right

The frequency content of control signals drops, lowering actuator wear.
Amplitude of sudden command changes decreases, improving closed-loop stability margins.
The cascaded structure remains usable for online learning while satisfying practical smoothness requirements.
FFT-based frequency analysis becomes a standard check for controller quality in simulation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same smoothness additions could be applied to other cascaded adaptive controllers outside aviation.
Hardware-in-the-loop tests would be needed to confirm that reduced oscillations translate to lower vibration on actual airframes.
The approach may trade a small amount of responsiveness for smoothness, an effect worth quantifying in future experiments.

Load-bearing premise

The oscillatory behavior seen in the baseline cascaded system is caused primarily by the online learning component and can be reduced by the proposed techniques without creating new stability problems or losing tracking performance.

What would settle it

A side-by-side simulation or hardware flight test that records actuator commands, computes their amplitude and frequency spectra via FFT, and checks whether the smoothed version shows a measurable drop in both metrics relative to the baseline.
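A comparison of this kind could be scripted roughly as follows. This is a sketch on synthetic signals, not the paper's data: `baseline` and `smoothed` are hypothetical stand-ins for recorded actuator commands, and the sample time is assumed.

```python
import numpy as np

def amplitude_spectrum(signal, dt):
    """One-sided amplitude spectrum of a real signal sampled every dt seconds."""
    n = len(signal)
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=dt)
    amplitude = 2.0 * np.abs(spectrum) / n
    amplitude[0] /= 2.0            # the DC bin is not doubled
    if n % 2 == 0:
        amplitude[-1] /= 2.0       # neither is the Nyquist bin
    return freqs, amplitude

def dominant_frequency(freqs, amplitude):
    """Frequency and amplitude of the largest non-DC component."""
    k = 1 + int(np.argmax(amplitude[1:]))
    return float(freqs[k]), float(amplitude[k])

dt = 0.01
t = np.arange(400) * dt                            # 4 s of data, 0.25 Hz resolution
baseline = 2.0 * np.sin(2 * np.pi * 10.0 * t)      # hypothetical oscillatory command
smoothed = 0.5 * np.sin(2 * np.pi * 2.0 * t)       # hypothetical smoothed command

f_base, a_base = dominant_frequency(*amplitude_spectrum(baseline, dt))
f_smooth, a_smooth = dominant_frequency(*amplitude_spectrum(smoothed, dt))
print(f_base, a_base, f_smooth, a_smooth)
```

On real recordings the signal will rarely contain a whole number of periods, so a window function and averaging over several runs would normally be applied before comparing peaks.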

Figures

Figures reproduced from arXiv: 2507.04346 by Erik-jan van Kampen, Yifei Li.

**Figure 3.** The network structures. The critic network takes as input the pitch rate $q_t$ and the tracking error $e_q(t)$. The output is the estimated value function $V_2(t)$, with an absolute-value activation applied at the output layer to ensure the positive definiteness of $V_2(t)$. The actor network uses an additional input $\alpha_t$ to account for dynamic coupling in model 1. The output is the control surface deflection…

**Figure 4.** Comparison between control systems without and with temporal…

**Figure 5.** Comparison of network parameters for flight control systems…

**Figure 7.** Comparison of control surface deflection in the time and frequency…

**Figure 8.** Comparison of actors' output increments over time.

**Figure 10.** The dynamical saturation level of the activation function in the output layer of the actors. The vanilla control system eventually drives tanh(·) into saturation over the input intervals [-4, -2] and [2, 4] (the higher-level agent) and [-3, -2] and [2, 3] (the lower-level agent). In these intervals, the derivative of tanh(·) is less than 0.1, providing only a small gradient for actor training. As a compari…

**Figure 11.** Comparison of sensitivity measures before and after using a TS…

**Figure 13.** Detailed comparison of pitch rate tracking in the time and…

**Figure 17.** Comparison of sensitivity measures between TS method and…

**Figure 15.** Comparison of one-step cost of two control systems using the TS…
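The saturation figure quoted in the Figure 10 caption can be checked numerically. The sketch below evaluates the derivative of tanh, which is 1 − tanh²(x), over the positive intervals named in the caption; the negative intervals follow by symmetry. The variable names are illustrative only.

```python
import numpy as np

def tanh_derivative(x):
    """Derivative of tanh: d/dx tanh(x) = 1 - tanh(x)**2."""
    return 1.0 - np.tanh(x) ** 2

# Input intervals from the caption (positive side).
higher_level = np.linspace(2.0, 4.0, 201)
lower_level = np.linspace(2.0, 3.0, 101)

max_higher = float(tanh_derivative(higher_level).max())
max_lower = float(tanh_derivative(lower_level).max())
print(max_higher, max_lower)   # both maxima occur at x = 2
```

Because the derivative decreases monotonically away from zero, the largest value in each interval sits at the endpoint closest to the origin, which is why both intervals share the same bound.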

Original abstract

This paper aims to improve the action smoothness of a cascaded online learning flight control system. Although the cascaded structure is widely used in flight control design, its stability can be compromised by oscillatory control actions, which poses challenges for practical engineering applications. To address this issue, we introduce an online temporal smoothness technique and a low-pass filter to reduce the amplitude and frequency of the control actions. Fast Fourier Transform (FFT) is used to analyze policy performance in the frequency domain. Simulation results demonstrate the improvements achieved by the two proposed techniques.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that introducing an online temporal smoothness technique together with a low-pass filter into a cascaded online-learning flight controller reduces the amplitude and frequency of control actions. FFT analysis is used to quantify the frequency-domain improvement, and simulation results are presented to demonstrate the overall benefit for practical engineering use.

Significance. If the central claim holds, the work addresses a recognized practical obstacle to deploying online learning in cascaded flight-control architectures. The combination of a temporal smoothness operator and low-pass filtering is a direct, implementable addition that could improve actuator longevity and reduce wear without requiring a complete redesign of the inner-loop controller.

major comments (3)

[Simulation results] Simulation results (and abstract): the reported FFT reductions in amplitude and frequency are presented without accompanying time-domain metrics (RMSE, rise time, overshoot, or steady-state error) comparing the baseline cascaded system to the modified system. Without these quantities it is impossible to verify that tracking performance and disturbance rejection are preserved.
[Method] Method section describing the low-pass filter and temporal smoothness operator: no Bode or Nyquist analysis, gain/phase margins, or closed-loop pole locations are provided to confirm that the added phase lag does not degrade inner-loop responsiveness or stability margins in the cascaded architecture.
[FFT analysis] FFT analysis: the paper does not state whether the smoothness parameters were tuned after observing the baseline oscillations or fixed a priori; post-hoc tuning would weaken the claim that the improvement is a general property of the proposed techniques.
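The time-domain metrics requested in the first major comment are straightforward to compute from a logged step response. The following is a minimal sketch under stated assumptions: a constant positive reference, a response that starts below 10% of it, a 10% to 90% rise-time definition, and steady-state error taken as the mean error over the final 10% of samples. The function name and the first-order test response are hypothetical.

```python
import numpy as np

def tracking_metrics(t, y, ref):
    """RMSE, overshoot (%), 10-90% rise time and steady-state error for a step of size ref > 0."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    error = ref - y
    rmse = float(np.sqrt(np.mean(error ** 2)))
    overshoot_pct = max(0.0, float((y.max() - ref) / abs(ref) * 100.0))
    # First sample at which the response reaches 10% and 90% of the reference.
    # (np.argmax returns 0 if a level is never reached, so check that separately.)
    t10 = t[np.argmax(y >= 0.1 * ref)]
    t90 = t[np.argmax(y >= 0.9 * ref)]
    tail = max(1, len(y) // 10)
    steady_state_error = float(abs(np.mean(error[-tail:])))
    return {
        "rmse": rmse,
        "overshoot_pct": overshoot_pct,
        "rise_time": float(t90 - t10),
        "steady_state_error": steady_state_error,
    }

# Hypothetical first-order response with a 0.5 s time constant.
t = np.arange(5001) * 0.001
y = 1.0 - np.exp(-t / 0.5)
metrics = tracking_metrics(t, y, ref=1.0)
print(metrics)
```

Running the same function on the baseline and the modified controller for each scenario would give the side-by-side table the referee asks for.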

minor comments (2)

[Method] Notation for the temporal smoothness operator should be defined explicitly with a discrete-time equation or pseudocode so that the reader can reproduce the exact implementation.
[Figures] Figure captions for the FFT plots should include the exact frequency range, windowing method, and number of simulation runs averaged.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the changes planned for the revised manuscript.

Point-by-point responses

Referee: [Simulation results] Simulation results (and abstract): the reported FFT reductions in amplitude and frequency are presented without accompanying time-domain metrics (RMSE, rise time, overshoot, or steady-state error) comparing the baseline cascaded system to the modified system. Without these quantities it is impossible to verify that tracking performance and disturbance rejection are preserved.

Authors: We agree that time-domain metrics are necessary to demonstrate that the proposed modifications preserve tracking performance. In the revised manuscript we will add RMSE, rise time, overshoot, and steady-state error comparisons between the baseline and modified controllers across the presented simulation scenarios. Revision: yes.

Referee: [Method] Method section describing the low-pass filter and temporal smoothness operator: no Bode or Nyquist analysis, gain/phase margins, or closed-loop pole locations are provided to confirm that the added phase lag does not degrade inner-loop responsiveness or stability margins in the cascaded architecture.

Authors: We acknowledge the absence of explicit stability-margin analysis. Although the simulations exhibit stable closed-loop behavior, we will include Bode plots together with gain and phase margin calculations for the inner-loop controller after insertion of the low-pass filter and temporal smoothness operator. Revision: yes.

Referee: [FFT analysis] FFT analysis: the paper does not state whether the smoothness parameters were tuned after observing the baseline oscillations or fixed a priori; post-hoc tuning would weaken the claim that the improvement is a general property of the proposed techniques.

Authors: The smoothness parameters were chosen a priori on the basis of typical actuator bandwidths and expected flight-control frequency content, before any baseline oscillation data were examined. We will add an explicit statement of this selection procedure in the revised FFT analysis section. Revision: yes.
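The phase-lag concern raised in the second exchange can be made concrete for the simplest case. The sketch below assumes a first-order continuous-time low-pass filter H(s) = 1 / (1 + s/ω_c); the filter order and cutoff actually used in the paper are not given on this page, so the 2 Hz cutoff and the probe frequencies are hypothetical.

```python
import numpy as np

def first_order_lowpass_response(freq_hz, cutoff_hz):
    """Gain (dB) and phase (degrees) of H(jw) = 1 / (1 + j*w/wc) at the given frequencies."""
    ratio = np.asarray(freq_hz, dtype=float) / cutoff_hz
    h = 1.0 / (1.0 + 1j * ratio)
    gain_db = 20.0 * np.log10(np.abs(h))
    phase_deg = np.degrees(np.angle(h))
    return gain_db, phase_deg

# Probe below, at, and above an assumed 2 Hz cutoff.
freqs = np.array([0.5, 2.0, 10.0])
gain_db, phase_deg = first_order_lowpass_response(freqs, cutoff_hz=2.0)
for f, g, p in zip(freqs, gain_db, phase_deg):
    print(f"{f:5.1f} Hz: gain {g:6.2f} dB, phase {p:6.1f} deg")
```

At the cutoff frequency such a filter contributes about 3 dB of attenuation and 45 degrees of lag, and that lag is subtracted directly from the phase margin of whichever loop the filter sits in.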

Circularity Check

0 steps flagged

No circularity: additive techniques validated empirically without self-referential reduction

full rationale

The paper introduces an online temporal smoothness technique and low-pass filter as direct modifications to a cascaded online learning flight control system, then evaluates them via FFT frequency analysis and simulation results showing reduced control action amplitude and frequency. No derivation chain, equations, or claims reduce these improvements to fitted parameters, self-definitions, or self-citation chains by construction; the central claim rests on independent methodological additions and external simulation benchmarks rather than tautological equivalence to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no explicit free parameters, axioms, or invented entities; the work appears to rest on standard assumptions from control theory and online learning without new postulates.

pith-pipeline@v0.9.0 · 5615 in / 1025 out tokens · 32488 ms · 2026-05-19T06:33:54.724813+00:00 · methodology

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages

[1]

J. D. Anderson, Fundamentals of Aerodynamics. McGraw-Hill Series in Aeronautical and Aerospace Engineering. New York, USA: McGraw-Hill, 2017.

[2]

S. Sieberling, Q. P. Chu, and J. A. Mulder, "Robust Flight Control Using Incremental Nonlinear Dynamic Inversion and Angular Acceleration Prediction," Journal of Guidance, Control and Dynamics, vol. 33, no. 6, pp. 1732–1742, 2010, doi: 10.2514/1.49978.

[3]

X. R. Wang, E. van Kampen, Q. P. Chu, and P. Lu, "Stability Analysis for Incremental Nonlinear Dynamic Inversion Control," Journal of Guidance, Control and Dynamics, vol. 42, no. 5, pp. 1116–1129, 2019, doi: 10.2514/1.G003791.

[4]

Z. C. Liu, Y. F. Zhang, J. J. Liang, and H. X. Chen, "Application of the Improved Incremental Nonlinear Dynamic Inversion in Fixed-Wing UAV Flight Tests," Journal of Aerospace Engineering, vol. 35, no. 6, pp. 1–13, 2022, doi: 10.1061/(ASCE)AS.1943-5525.0001495.

[5]

Y. C. Wang, W. S. Chen, S. X. Zhang, J. W. Zhu, and L. J. Cao, "Command-Filtered Incremental Backstepping Controller for Small Unmanned Aerial Vehicles," Journal of Guidance, Control, and Dynamics, vol. 41, no. 4, pp. 952–965, 2018, doi: 10.2514/1.G003001.

[6]

X. R. Wang, E. van Kampen, Q. P. Chu, and P. Lu, "Incremental Sliding-Mode Fault-Tolerant Flight Control," Journal of Guidance, Control, and Dynamics, vol. 42, no. 2, pp. 244–259, 2019, doi: 10.2514/1.G003497.

[7]

W. H. Chen, J. Yang, L. Guo, and S. H. Li, "Disturbance-Observer-Based Control and Related Methods—An Overview," IEEE Transactions on Industrial Electronics, vol. 63, no. 2, pp. 1083–1095, 2016, doi: 10.1109/TIE.2015.2478397.

[8]

D. M. Acosta and S. S. Joshi, "Adaptive Nonlinear Dynamic Inversion Control of an Autonomous Airship for the Exploration of Titan," AIAA Guidance, Navigation and Control Conference and Exhibit, Hilton Head, South Carolina, USA, August 2007, doi: 10.2514/6.2007-6502.

[9]

E. J. J. Smeur, Q. P. Chu, and G. H. E. de Croon, "Adaptive Incremental Nonlinear Dynamic Inversion for Attitude Control of Micro Air Vehicles," Journal of Guidance, Control, and Dynamics, vol. 39, no. 3, pp. 450–461, 2016, doi: 10.2514/1.G001490.

[10]

B. Smit, T. S. C. Pollack, and E. van Kampen, "Adaptive Incremental Nonlinear Dynamic Inversion Flight Control for Consistent Handling Qualities," AIAA SciTech 2022 Forum, San Diego, CA & Virtual, USA, January 2022, doi: 10.2514/6.2022-1394.

[11]

J. Harris, C. M. Elliott, and G. S. Tallant, "L1 Adaptive Nonlinear Dynamic Inversion Control for the Innovative Control Effectors Aircraft," AIAA SciTech 2022 Forum, San Diego, CA & Virtual, USA, January 2022, doi: 10.2514/6.2022-0791.

[12]

L. Sonneveldt, Q. P. Chu, and J. A. Mulder, "Constrained Adaptive Backstepping Flight Control: Application to a Nonlinear F-16/MATV Model," AIAA Guidance, Navigation and Control Conference and Exhibit, Keystone, Colorado, USA, August 2006, doi: 10.2514/6.2006-6413.

[13]

L. Sonneveldt, Q. P. Chu, and J. A. Mulder, "Nonlinear Flight Control Design Using Constrained Adaptive Backstepping," Journal of Guidance, Control, and Dynamics, vol. 30, no. 2, pp. 322–336, 2007, doi: 10.2514/1.25834.

[14]

Q. Hu, Y. Meng, C. L. Wang, and Y. M. Zhang, "Adaptive Backstepping Control for Air-breathing Hypersonic Vehicles with Input Nonlinearities," Aerospace Science and Technology, vol. 73, pp. 289–299, 2018, doi: 10.1016/j.ast.2017.12.001.
[15]

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. 6th edition, Sigma Series in Pure Mathematics. Cambridge, MA, USA: A Bradford Book, 2018.

[16]

Y. Zhou, E. van Kampen, and Q. P. Chu, "Incremental Model Based Online Dual Heuristic Programming for Nonlinear Adaptive Control," Control Engineering Practice, vol. 73, pp. 13–25, 2018, doi: 10.1016/j.conengprac.2017.12.011.

[17]

Y. Zhou, E. van Kampen, and Q. P. Chu, "Incremental Approximate Dynamic Programming for Nonlinear Flight Control Design," in Proceedings of the EuroGNC 2015, Toulouse, France, 2015.

[18]

Y. Zhou, E. van Kampen, and Q. P. Chu, "Incremental Approximate Dynamic Programming for Nonlinear Adaptive Tracking Control with Partial Observability," Journal of Guidance, Control, and Dynamics, vol. 41, no. 12, pp. 2554–2567, 2018, doi: 10.2514/1.G003472.

[19]

Y. Zhou, E. van Kampen, and Q. P. Chu, "An Incremental Approximate Dynamic Programming Flight Controller Based on Output Feedback," AIAA Guidance, Navigation, and Control Conference, San Diego, California, USA, January 2016, doi: 10.2514/6.2016-0360.

[20]

B. Sun and E. van Kampen, "Incremental Model-Based Heuristic Dynamic Programming with Output Feedback Applied to Aerospace System Identification and Control," 2020 IEEE Conference on Control Technology and Applications, Montreal, QC, Canada, pp. 366–371, August 2020, doi: 10.1109/CCTA41146.2020.9206261.

[21]

B. Sun and E. van Kampen, "Intelligent Adaptive Optimal Control Using Incremental Model-Based Global Dual Heuristic Programming Subject to Partial Observability," Applied Soft Computing, vol. 103, pp. 1–15, 2021, doi: 10.1016/j.asoc.2021.107153.

[22]

B. Sun and E. van Kampen, "Reinforcement-Learning-Based Adaptive Optimal Flight Control with Output Feedback and Input Constraints," Journal of Guidance, Control, and Dynamics, vol. 44, no. 9, pp. 1685–1691, 2021, doi: 10.2514/1.G005715.

[23]

B. Sun and E. van Kampen, "Event-Triggered Constrained Control Using Explainable Global Dual Heuristic Programming for Nonlinear Discrete-Time Systems," Neurocomputing, vol. 468, pp. 452–463, 2022, doi: 10.1016/j.neucom.2021.10.046.

[24]

Y. Zhou, E. van Kampen, and Q. P. Chu, "Incremental Model Based Heuristic Dynamic Programming for Nonlinear Adaptive Flight Control," in Proceedings of the International Micro Air Vehicles Conference and Competition, Beijing, China, October 2016, url: https://www.imavs.org/papers/2016/25.pdf

[25]

Y. Zhou, E. van Kampen, and Q. P. Chu, "Incremental Model based Online Heuristic Dynamic Programming for Nonlinear Adaptive Tracking Control with Partial Observability," Aerospace Science and Technology, vol. 105, pp. 1–14, 2020, doi: 10.1016/j.ast.2020.106013.

[26]

S. Heyer, D. Kroezen, and E. van Kampen, "Online Adaptive Incremental Reinforcement Learning Flight Control for a CS-25 Class Aircraft," AIAA Scitech 2020 Forum, Orlando, FL, USA, October 2020, doi: 10.2514/6.2020-1844.

[27]

L. G. Sun, C. C. de Visser, Q. P. Chu, and W. Falkena, "Hybrid Sensor-Based Backstepping Control Approach with Its Application to Fault-Tolerant Flight Control," Journal of Guidance, Control, and Dynamics, vol. 31, no. 1, pp. 59–71, 2014, doi: 10.2514/1.61890.

[28]

Y. Zhou, E. van Kampen, and Q. P. Chu, "Launch Vehicle Adaptive Flight Control with Incremental Model Based Heuristic Dynamic Programming," International Astronautical Congress, Adelaide, Australia, September 2017.
[29]

S. Mysore, B. Mabsout, R. Mancuso, and K. Saenko, "Regularizing Action Policies for Smooth Control with Reinforcement Learning," IEEE International Conference on Robotics and Automation, Xi'an, China, October 2021, doi: 10.1109/ICRA48506.2021.9561138.

[30]

A. N. Kalliny, A. A. El-Badawy, and S. M. Elkhamisy, "Command-Filtered Integral Backstepping Control of Longitudinal Flapping-Wing Flight," Journal of Guidance, Control, and Dynamics, vol. 41, no. 7, pp. 1556–1568, 2018, doi: 10.2514/1.G003267.

[31]

J. A. Farrell, M. Polycarpou, M. Sharma, and W. Dong, "Command Filtered Backstepping," IEEE Transactions on Automatic Control, vol. 54, no. 6, pp. 1391–1395, 2009, doi: 10.1109/TAC.2009.2015562.

[32]

R. A. Hull, D. Schumacher, and Z. H. Qu, "Design and Evaluation of Robust Nonlinear Missile Autopilots from a Performance Perspective," in Proceedings of 1995 American Control Conference, Seattle, WA, USA, June 1995, doi: 10.1109/ACC.1995.529235.

[33]

P. J. Werbos, "Advanced Forecasting Methods for Global Crisis Warning and Models of Intelligence," General Systems, vol. 22, pp. 25–38, 1977, url: https://gwern.net/doc/reinforcement-learning/1977-werbos.pdf

[34]

K. Shibata, "Dynamic Reinforcement Learning for Actors," arXiv preprint, 2025, doi: 10.48550/arXiv.2502.10200.

APPENDIX A. Derivation of incremental model

Taking the Taylor expansion of systems 6:

$$\alpha_{t+1} = \alpha_t + F^1_{t-1}(\alpha_t - \alpha_{t-1}) + G^1_{t-1}(q_t - q_{t-1}) + O\big((\alpha_t - \alpha_{t-1})^2, (q_t - q_{t-1})^2\big)$$

$$q_{t+1} = q_t + F^2_{t-1}(q_t - q_{t-1}) + G^2_{t-1}(\delta_t - \delta_{t-1}) + O\big((q_t - q_{t-1})^2, (\delta_t - \delta_{t-1})^2\big)$$ (3...
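The first-order part of the incremental model in Appendix A amounts to a one-step prediction from the two most recent states and inputs. A minimal scalar sketch follows, with the higher-order terms dropped; the function name and the numerical values of F and G are hypothetical and are not taken from the paper.

```python
def incremental_predict(x, x_prev, u, u_prev, F, G):
    """One-step prediction x_{t+1} = x_t + F*(x_t - x_{t-1}) + G*(u_t - u_{t-1}),
    neglecting the higher-order Taylor terms."""
    return x + F * (x - x_prev) + G * (u - u_prev)

# Lower-level loop: state is pitch rate q, input is the deflection delta.
q_next = incremental_predict(x=0.10, x_prev=0.08, u=-0.02, u_prev=-0.01, F=0.9, G=-2.0)

# Higher-level loop: state is angle of attack alpha, input is pitch rate q.
alpha_next = incremental_predict(x=0.05, x_prev=0.05, u=0.10, u_prev=0.08, F=0.9, G=0.5)
print(q_next, alpha_next)
```

The same function serves both loops of the cascade because the two expansions share one structure, with the pitch rate acting as the input of the angle-of-attack model.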
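[35]

Critic: The gradient of $L^{C1}_t$ with respect to $\psi_1$ is

$$\frac{\partial L^{C1}_t}{\partial \psi_1(t)} = \frac{\partial L^{C1}_t}{\partial \delta_1(t)} \frac{\partial \delta_1(t)}{\partial \psi_1(t)} = -\delta_1(t) \frac{\partial \hat{V}_1(\alpha_t, e_\alpha(t))}{\partial \psi_1(t)} \tag{39}$$

The parameter set $\psi_1$ is updated as

$$\psi_1(t+1) = \psi_1(t) - \eta_{C1} \frac{\partial L^{C1}_t}{\partial \psi_1(t)} \tag{40}$$

where $\eta_{C1}$ is the learning rate.

[36]

Target Critic: The target critic network is used to stabilize learning by delaying the updates:

$$\psi'_1(t+1) = \tau \psi_1(t+1) + (1 - \tau) \psi'_1(t) \tag{41}$$

where $\tau$ is the delay factor and $\psi'_1$ is the parameter set of the target critic.

[37]

Actor: The gradient of $L^{A1}_t$ with respect to $\vartheta_1$ is

$$\frac{\partial L^{A1}_t}{\partial \vartheta_1(t)} = \frac{\partial \left[ c_1(\hat{e}_\alpha(t+1), q_t) + \gamma \hat{V}_1(\hat{\alpha}_{t+1}, \hat{e}_\alpha(t+1)) \right]}{\partial \vartheta_1(t)}$$

$$= \left[ \frac{\partial c_1(\hat{e}_\alpha(t+1), q_t)}{\partial \hat{\alpha}_{t+1}} + \gamma \frac{\partial \hat{V}_1(\hat{\alpha}_{t+1}, \hat{e}_\alpha(t+1))}{\partial \hat{\alpha}_{t+1}} \right] \frac{\partial \hat{\alpha}_{t+1}}{\partial q_{ref}(t)} \frac{\partial q_{ref}(t)}{\partial \vartheta_1(t)}$$

$$= \left[ \frac{\partial c_1(\hat{e}_\alpha(t+1), q_t)}{\partial \hat{\alpha}_{t+1}} + \gamma \frac{\partial \hat{V}_1(\hat{\alpha}_{t+1}, \hat{e}_\alpha(t+1))}{\partial \hat{\alpha}_{t+1}} \right] \hat{G}^1_{t-1} \frac{\partial q_{ref}(t)}{\partial \vartheta_1(t)} \tag{42}$$

The parameter set $\vartheta_1$ is updated as $\vartheta_1(t+1)$ ...

[38]

Critic: The gradient of $L^{C2}_t$ with respect to $\psi_2$ is

$$\frac{\partial L^{C2}_t}{\partial \psi_2(t)} = \frac{\partial L^{C2}_t}{\partial \delta_2(t)} \frac{\partial \delta_2(t)}{\partial \psi_2(t)} = -\delta_2(t) \frac{\partial \hat{V}_2(q_t, e_q(t))}{\partial \psi_2(t)} \tag{44}$$

The parameter set $\psi_2$ is updated as

$$\psi_2(t+1) = \psi_2(t) - \eta_{C2} \frac{\partial L^{C2}_t}{\partial \psi_2(t)} \tag{45}$$

where $\eta_{C2}$ is the learning rate.

[39]

Target Critic: The target critic is used to stabilize the learning by slowing down the network update:

$$\psi'_2(t+1) = \tau \psi_2(t+1) + (1 - \tau) \psi'_2(t) \tag{46}$$

where $\tau$ is the delay factor and $\psi'_2$ is the parameter set of the target critic.

[40]

Actor: The gradient of $L^{A2}_t$ with respect to $\vartheta_2$ is

$$\frac{\partial L^{A2}_t}{\partial \vartheta_2(t)} = \frac{\partial \left[ c_2(\hat{e}_q(t+1), \delta_t) + \gamma \hat{V}_{2,target}(\hat{q}_{t+1}, \hat{e}_q(t+1)) \right]}{\partial \vartheta_2(t)}$$

$$= \left[ \frac{\partial c_2(\hat{e}_q(t+1), \delta_t)}{\partial \hat{q}_{t+1}} + \gamma \frac{\partial \hat{V}_{2,target}(\hat{q}_{t+1}, \hat{e}_q(t+1))}{\partial \hat{q}_{t+1}} \right] \frac{\partial \hat{q}_{t+1}}{\partial \delta_e(t)} \frac{\partial \delta_e(t)}{\partial \vartheta_2(t)}$$

$$= \left[ \frac{\partial c_2(\hat{e}_q(t+1), \delta_t)}{\partial \hat{q}_{t+1}} + \gamma \frac{\partial \hat{V}_{2,target}(\hat{q}_{t+1}, \hat{e}_q(t+1))}{\partial \hat{q}_{t+1}} \right] \hat{G}^2_{t-1} \frac{\partial \delta_e(t)}{\partial \vartheta_2(t)} \tag{47}$$

The parameter set $\vartheta_2$ is updated as $\vartheta$...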
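The target-critic rule in equations (41) and (46) is a soft (Polyak-style) parameter update. A minimal sketch is given below, assuming the parameters are stored as NumPy arrays; the function name, the value of the delay factor, and the toy parameter vectors are hypothetical.

```python
import numpy as np

def soft_update(target_params, online_params, tau):
    """Target update psi' <- tau * psi + (1 - tau) * psi', as in equations (41) and (46)."""
    return tau * online_params + (1.0 - tau) * target_params

online = np.ones(3)      # stand-in for the critic parameters psi
target = np.zeros(3)     # stand-in for the target-critic parameters psi'

first_step = soft_update(target, online, tau=0.1)

# With a fixed online network the target converges geometrically toward it.
for _ in range(50):
    target = soft_update(target, online, tau=0.1)
print(first_step, target)
```

A small delay factor makes the target lag further behind the critic, which steadies the learning target at the cost of slower propagation of new value estimates.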

[1] [1]

J. D. Anderson Fundamentals of Aerodynamics. McGraw-Hill Series in Aeronautical and Aerospace Engineering. New York, USA: McGraw-Hill, 2017

work page 2017

[2] [2]

Sieberling, Q

S. Sieberling, Q. P. Chu, and J. A. Mulder Robust Flight Control Using Incremental Nonlinear Dynamic Inversion and Angular Ac- celeration Prediction Journal of Guidance, Control and Dynamics , vol. 33, no. 6, pp. 1732–1742, 2010, doi: 10.2514/1.49978

work page doi:10.2514/1.49978 2010

[3] [3]

X. R. Wang, E. van Kampen, Q. P. Chu and P. Lu Stability Analysis for Incremental Nonlinear Dynamic Inversion Control Journal of Guidance, Control and Dynamics , vol. 42, no. 5, pp. 1116–1129, 2019, doi: 10.2514/1.G003791

work page doi:10.2514/1.g003791 2019

[4] [4]

Z. C. Liu, Y . F. Zhang, and J. J. Liang and H. X. Chen Application of the Improved Incremental Nonlinear Dynamic Inversion in Fixed- Wing UA V Flight Tests Journal of Aerospace Engineering , vol. 35, no. 6, pp. 1–13, 2022, doi: 10.1061/(ASCE)AS.1943-5525.0001495

work page doi:10.1061/(asce)as.1943-5525.0001495 2022

[5] [5]

Y . C. Wang, W. S. Chen, S. X. Zhang, J. W. Zhu and L. J. Cao Command-Filtered Incremental Backstepping Controller for Small Unmanned Aerial Vehicles Journal of Guidance, Control, and Dy- namics, vol. 41, no. 4, pp. 952–965, 2018, doi: 10.2514/1.G003001

work page doi:10.2514/1.g003001 2018

[6] [6]

X. R. Wang, E. van Kampen, Q. P. Chu and P. Lu Incremental Sliding-Mode Fault-Tolerant Flight Control Journal of Guidance, Control, and Dynamics , vol. 42, no. 2, pp. 244–259, 2019, doi: 10.2514/1.G003497

work page doi:10.2514/1.g003497 2019

[7] [7]

W. H. Chen, J. Yang, L. Guo and S. H. Li Disturbance-Observer- Based Control and Related Methods—An Overview IEEE Transac- tions on Industrial Electronics , vol. 63, no. 2, pp. 1083–1095, 2016, doi: 10.1109/TIE.2015.2478397

work page doi:10.1109/tie.2015.2478397 2016

[8] [8]

D. M. Acosta and S. S. Joshi Adaptive Nonlinear Dynamic Inversion Control of an Autonomous Airship for the Exploration of Titan AIAA Guidance, Navigation and Control Conference and Exhibit , Hilton Head, South Carolina, USA, August, 2007, doi: 10.2514/6.2007- 6502

work page doi:10.2514/6.2007- 2007

[9] [9]

E. J. J. Smeur, Q. P. Chu and G. H. E. de Croon Adaptive Incremental Nonlinear Dynamic Inversion for Attitude Control of Micro Air Vehicles Journal of Guidance, Control, and Dynamics , vol. 39, no. 3, pp. 450–461, 2016, doi: 10.2514/1.G001490

work page doi:10.2514/1.g001490 2016

[10] [10]

B. Smit, T. S. C. Pollack and E. van Kampen Adaptive Incremental Nonlinear Dynamic Inversion Flight Control for Consistent Handling Qualities AIAA SciTech 2022 Forum, San Diego, CA&Virtual, USA, January, 2022, doi: 10.2514/6.2022-1394

work page doi:10.2514/6.2022-1394 2022

[11] [11]

Harris, C

J. Harris, C. M. Elliott and G. S. Tallant L1 Adaptive Nonlinear Dynamic Inversion Control for the Innovative Control Effectors Aircraft AIAA SciTech 2022 Forum, San Diego, CA&Virtual, USA, January, 2022, doi: 10.2514/6.2022-0791

work page doi:10.2514/6.2022-0791 2022

[12] [12]

Sonneveldt, Q

L. Sonneveldt, Q. P. Chu and J. A. Mulder Constrained Adaptive Backstepping Flight Control: Application to a Nonlinear F-16/MATV Model AIAA Guidance, Navigation and Control Conference and Ex- hibit, Keystone, Colorado, USA, August, 2006, doi: 10.2514/6.2006- 6413

work page doi:10.2514/6.2006- 2006

[13] [13]

Sonneveldt, Q

L. Sonneveldt, Q. P. Chu and J. A. Mulder Nonlinear Flight Control Design Using Constrained Adaptive Backstepping Journal of Guidance, Control, and Dynamics , vol. 30, no. 2, pp. 322–336, 2007, doi: 10.2514/1.25834

work page doi:10.2514/1.25834 2007

[14] [14]

Q. Hu, Y . Meng, C. L. Wang and Y . M. Zhang Adaptive Back- stepping Control for Air-breathing Hypersonic Vehicles with Input Nonlinearities Aerospace Science and Technology, vol. 73, pp. 289– 299, 2018, doi: 10.1016/j.ast.2017.12.001

work page doi:10.1016/j.ast.2017.12.001 2018

[15] [15]

R. S. Sutton and A. G. Barto Reinforcement Learning: An Intro- duction. 6th edition, Sigma Series in Pure Mathematics. Cambridge, MA, USA: A Bradford Book, 2018

work page 2018

[16] [16]

Y . Zhou, E. van Kampen and Q.P. Chu Incremental Model Based Online Dual Heuristic Programming for Nonlinear Adaptive Con- trol Control Engineering Practice , vol. 73, pp. 13–25, 2018, doi: 10.1016/j.conengprac.2017.12.011

work page doi:10.1016/j.conengprac.2017.12.011 2018

[17] [17]

Y . Zhou, E. van Kampen and Q. P. Chu Incremental Approximate Dynamic Programming for Nonlinear Flight Control Design In Proceedings of the EuroGNC 2015 , Toulouse, France, 2015

work page 2015

[18] [18]

Y . Zhou, E. van Kampen and Q. P. Chu Incremental Approximate Dynamic Programming for Nonlinear Adaptive Tracking Control with Partial Observability Journal of Guidance, Control, and Dynam- ics, vol. 41, no. 12, pp. 2554–2567, 2018, doi: 10.2514/1.G003472

work page doi:10.2514/1.g003472 2018

[19] [19]

Y . Zhou, E. van Kampen and Q.P. Chu An Incremental Approximate Dynamic Programming Flight Controller Based on Output Feedback AIAA Guidance, Navigation, and Control Conference , San Diego, California, USA, January, 2016, doi: 10.2514/6.2016-0360

work page doi:10.2514/6.2016-0360 2016

[20] [20]

Sun and E

B. Sun and E. van Kampen Incremental Model-Based Heuristic Dynamic Programming with Output Feedback Applied to Aerospace System Identification and Control 2020 IEEE Conference on Control Technology and Applications, Montreal, QC, Canada, 2020, pp. 366- 371, August, 2020, doi: 10.1109/CCTA41146.2020.9206261

work page doi:10.1109/ccta41146.2020.9206261 2020

[21] [21]

Sun and E

B. Sun and E. van Kampen Intelligent Adaptive Optimal Control Us- ing Incremental Model-Based Global Dual Heuristic Programming Subject to Partial Observability Applied Soft Computing , vol. 103, pp. 1–15, 2021, doi: 10.1016/j.asoc.2021.107153

work page doi:10.1016/j.asoc.2021.107153 2021

[22] [22]

Sun and E

B. Sun and E. van Kampen Reinforcement-Learning-Based Adaptive Optimal Flight Control with Output Feedback and Input Constraints Journal of Guidance, Control, and Dynamics , vol. 44, no. 9, pp. 1685–1691, 2021, doi: 10.2514/1.G005715

work page doi:10.2514/1.g005715 2021

[23] [23]

Sun and E

B. Sun and E. van Kampen Event-Triggered Constrained Control Using Explainable Global Dual Heuristic Programming for Nonlin- ear Discrete-Time Systems Neurocomputing, vol. 468, pp. 452–463, 2022, doi: 10.1016/j.neucom.2021.10.046

work page doi:10.1016/j.neucom.2021.10.046 2022

[24] [24]

Y . Zhou, E. van Kampen and Q.P. Chu Incremental Model Based Heuristic Dynamic Programming for Nonlinear Adaptive Flight Control In Proceedings of the International Micro Air Vehicles Conference and Competition , Beijing, China, October, 2016, url: https://www.imavs.org/papers/2016/25.pdf

work page 2016

[25] [25]

Y . Zhou, E. van Kampen and Q. P. Chu Incremental Model based On- line Heuristic Dynamic Programming for Nonlinear Adaptive Track- ing Control with Partial Observability Aerospace Science and Tech- nology, vol. 105, pp. 1–14, 2020, doi: 10.1016/j.ast.2020.106013

work page doi:10.1016/j.ast.2020.106013 2020

[26] [26]

Heyer, D

S. Heyer, D. Kroezen and E. van Kampen Online Adaptive Incre- mental Reinforcement Learning Flight Control for a CS-25 Class Aircraft AIAA Scitech 2020 Forum , Orlando, FL, USA, October, 2020, doi: 10.2514/6.2020-1844

work page doi:10.2514/6.2020-1844 2020

[27] [27]

Sun, C.C

L.G. Sun, C.C. de Visser, Q.P. Chu and W. Falkena Hybrid Sensor- Based Backstepping Control Approach with Its Application to Fault- Tolerant Flight Control Journal of Guidance, Control, and Dynamics, vol. 31, no. 1, pp. 59–71, 2014, doi: 10.2514/1.61890

[28]

Y. Zhou, E. van Kampen and Q.P. Chu. Launch Vehicle Adaptive Flight Control with Incremental Model Based Heuristic Dynamic Programming. International Astronautical Congress, Adelaide, Australia, September, 2017

[29]

S. Mysore, B. Mabsout, R. Mancuso and K. Saenko. Regularizing Action Policies for Smooth Control with Reinforcement Learning. IEEE International Conference on Robotics and Automation, Xi’an, China, October, 2021, doi: 10.1109/ICRA48506.2021.9561138

[30]

A.N. Kalliny, A.A. El-Badawy and S.M. Elkhamisy. Command-Filtered Integral Backstepping Control of Longitudinal Flapping-Wing Flight. Journal of Guidance, Control, and Dynamics, vol. 41, no. 7, pp. 1556–1568, 2018, doi: 10.2514/1.G003267

[31]

J.A. Farrell, M. Polycarpou, M. Sharma and W. Dong. Command Filtered Backstepping. IEEE Transactions on Automatic Control, vol. 54, no. 6, pp. 1391–1395, 2009, doi: 10.1109/TAC.2009.2015562

[32]

R.A. Hull, D. Schumacher and Z.H. Qu. Design and Evaluation of Robust Nonlinear Missile Autopilots from a Performance Perspective. In Proceedings of 1995 American Control Conference, Seattle, WA, USA, June, 1995, doi: 10.1109/ACC.1995.529235

[33]

P.J. Werbos. Advanced Forecasting Methods for Global Crisis Warning and Models of Intelligence. General Systems, vol. 22, pp. 25–38, 1977, url: https://gwern.net/doc/reinforcement-learning/1977-werbos.pdf

[34]

K. Shibata. Dynamic Reinforcement Learning for Actors. arXiv preprint, 2025, doi: 10.48550/arXiv.2502.10200

APPENDIX

A. Derivation of incremental model

Taking the Taylor expansion of system (6):

\[
\begin{aligned}
\alpha_{t+1} &= \alpha_t + F^{1}_{t-1}(\alpha_t - \alpha_{t-1}) + G^{1}_{t-1}(q_t - q_{t-1}) + O\big((\alpha_t - \alpha_{t-1})^2, (q_t - q_{t-1})^2\big) \\
q_{t+1} &= q_t + F^{2}_{t-1}(q_t - q_{t-1}) + G^{2}_{t-1}(\delta_t - \delta_{t-1}) + O\big((q_t - q_{t-1})^2, (\delta_t - \delta_{t-1})^2\big)
\end{aligned}
\]

(3...


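Critic: The gradient of $L^{C_1}_t$ with respect to $\psi_1$ is

\[
\frac{\partial L^{C_1}_t}{\partial \psi_1(t)} = \frac{\partial L^{C_1}_t}{\partial \delta_1(t)} \frac{\partial \delta_1(t)}{\partial \psi_1(t)} = -\delta_1(t) \frac{\partial \hat{V}_1(\alpha_t, e_\alpha(t))}{\partial \psi_1(t)} \tag{39}
\]

The parameter set $\psi_1$ is updated as

\[
\psi_1(t+1) = \psi_1(t) - \eta_{C_1} \frac{\partial L^{C_1}_t}{\partial \psi_1(t)} \tag{40}
\]

where $\eta_{C_1}$ is the learning rate.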
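The critic update in (39) and (40) can be illustrated with a minimal sketch. It assumes a quadratic loss $L = \tfrac{1}{2}\delta^2$ with the temporal-difference target held constant, which is consistent with the $-\delta_1(t)$ factor in (39) but is not stated in this appendix, and it swaps the critic network for a linear approximator $\hat{V}(x) = \psi^\top \phi(x)$. The names `critic_step`, `features` and `td_target` are hypothetical.

```python
import numpy as np

def critic_step(psi, features, td_target, lr):
    """One gradient-descent step for a linear critic (illustrative only).

    Assumes L = 0.5 * delta**2 with delta = td_target - V_hat and the
    target treated as a constant, so dL/dpsi = -delta * dV_hat/dpsi.
    """
    v_hat = float(psi @ features)      # V_hat(x) = psi^T phi(x)
    delta = td_target - v_hat          # TD error
    grad = -delta * features           # gradient of the loss w.r.t. psi
    psi_new = psi - lr * grad          # gradient-descent step
    return psi_new, delta

psi = np.zeros(2)
phi = np.array([1.0, 0.5])
psi_new, delta = critic_step(psi, phi, td_target=1.0, lr=0.1)
_, delta_after = critic_step(psi_new, phi, td_target=1.0, lr=0.1)
```

With a fixed target, the second call sees a smaller TD error than the first, which is the behavior expected from a descent step on this loss.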


Target Critic: The target critic network stabilizes learning by delaying the parameter updates:

\[
\psi'_1(t+1) = \tau \psi_1(t+1) + (1 - \tau) \psi'_1(t) \tag{41}
\]

where $\tau$ is the delay factor and $\psi'_1$ is the parameter set of the target critic.
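The delayed update in (41) is an exponential moving average of the critic parameters. The following is a minimal sketch, assuming the parameters are stored as a flat NumPy array; `soft_update` and the numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_update(target_params, online_params, tau):
    """Blend the online critic parameters into the target critic, as in (41)."""
    return tau * online_params + (1.0 - tau) * target_params

psi = np.array([1.0, -2.0])        # online critic parameters (held fixed here)
psi_target = np.zeros(2)           # target critic parameters
for _ in range(3):
    psi_target = soft_update(psi_target, psi, tau=0.5)
```

With the online parameters held fixed, the target closes a fraction $\tau$ of the remaining gap at each step, so a smaller $\tau$ gives a slower-moving target.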


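Actor: The gradient of $L^{A_1}_t$ with respect to $\vartheta_1$ is

\[
\begin{aligned}
\frac{\partial L^{A_1}_t}{\partial \vartheta_1(t)}
&= \frac{\partial \left[ c_1(\hat{e}_\alpha(t+1), q_t) + \gamma \hat{V}_1(\hat{\alpha}_{t+1}, \hat{e}_\alpha(t+1)) \right]}{\partial \vartheta_1(t)} \\
&= \left[ \frac{\partial c_1(\hat{e}_\alpha(t+1), q_t)}{\partial \hat{\alpha}_{t+1}} + \gamma \frac{\partial \hat{V}_1(\hat{\alpha}_{t+1}, \hat{e}_\alpha(t+1))}{\partial \hat{\alpha}_{t+1}} \right] \frac{\partial \hat{\alpha}_{t+1}}{\partial q_{\mathrm{ref}}(t)} \frac{\partial q_{\mathrm{ref}}(t)}{\partial \vartheta_1(t)} \\
&= \left[ \frac{\partial c_1(\hat{e}_\alpha(t+1), q_t)}{\partial \hat{\alpha}_{t+1}} + \gamma \frac{\partial \hat{V}_1(\hat{\alpha}_{t+1}, \hat{e}_\alpha(t+1))}{\partial \hat{\alpha}_{t+1}} \right] \hat{G}^{1}_{t-1} \frac{\partial q_{\mathrm{ref}}(t)}{\partial \vartheta_1(t)}
\end{aligned}
\tag{42}
\]

The parameter set $\vartheta_1$ is updated as $\vartheta_1(t+1)$ ...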


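Critic: The gradient of $L^{C_2}_t$ with respect to $\psi_2$ is

\[
\frac{\partial L^{C_2}_t}{\partial \psi_2(t)} = \frac{\partial L^{C_2}_t}{\partial \delta_2(t)} \frac{\partial \delta_2(t)}{\partial \psi_2(t)} = -\delta_2(t) \frac{\partial \hat{V}_2(q_t, e_q(t))}{\partial \psi_2(t)} \tag{44}
\]

The parameter set $\psi_2$ is updated as

\[
\psi_2(t+1) = \psi_2(t) - \eta_{C_2} \frac{\partial L^{C_2}_t}{\partial \psi_2(t)} \tag{45}
\]

where $\eta_{C_2}$ is the learning rate.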


Target Critic: The target critic is used to stabilize learning by slowing down the network update:

\[
\psi'_2(t+1) = \tau \psi_2(t+1) + (1 - \tau) \psi'_2(t) \tag{46}
\]

where $\tau$ is the delay factor and $\psi'_2$ is the parameter set of the target critic.


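Actor: The gradient of $L^{A_2}_t$ with respect to $\vartheta_2$ is

\[
\begin{aligned}
\frac{\partial L^{A_2}_t}{\partial \vartheta_2(t)}
&= \frac{\partial \left[ c_2(\hat{e}_q(t+1), \delta_t) + \gamma \hat{V}_{2,\mathrm{target}}(\hat{q}_{t+1}, \hat{e}_q(t+1)) \right]}{\partial \vartheta_2(t)} \\
&= \left[ \frac{\partial c_2(\hat{e}_q(t+1), \delta_t)}{\partial \hat{q}_{t+1}} + \gamma \frac{\partial \hat{V}_{2,\mathrm{target}}(\hat{q}_{t+1}, \hat{e}_q(t+1))}{\partial \hat{q}_{t+1}} \right] \frac{\partial \hat{q}_{t+1}}{\partial \delta_e(t)} \frac{\partial \delta_e(t)}{\partial \vartheta_2(t)} \\
&= \left[ \frac{\partial c_2(\hat{e}_q(t+1), \delta_t)}{\partial \hat{q}_{t+1}} + \gamma \frac{\partial \hat{V}_{2,\mathrm{target}}(\hat{q}_{t+1}, \hat{e}_q(t+1))}{\partial \hat{q}_{t+1}} \right] \hat{G}^{2}_{t-1} \frac{\partial \delta_e(t)}{\partial \vartheta_2(t)}
\end{aligned}
\tag{47}
\]

The parameter set $\vartheta_2$ is updated as ...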
