
arXiv: 2507.04346 · v7 · submitted 2025-07-06 · 📡 eess.SY · cs.AI · cs.SY

Improving Action Smoothness for a Cascaded Online Learning Flight Control System

Pith reviewed 2026-05-19 06:33 UTC · model grok-4.3

classification 📡 eess.SY · cs.AI · cs.SY
keywords action smoothness · online learning · flight control · cascaded system · low-pass filter · temporal smoothness · Fast Fourier Transform · oscillatory control

The pith

An online temporal smoothness technique and a low-pass filter reduce the amplitude and frequency of control actions in cascaded online learning flight control systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper sets out to show that oscillatory control actions in cascaded online learning flight controllers can be tamed by two specific additions. The cascaded structure is common in flight control, but its stability suffers when online learning produces rapid, high-amplitude changes in the commands. The authors add an online temporal smoothness technique that penalizes abrupt changes over time and a low-pass filter that attenuates high-frequency content. Frequency-domain analysis with the Fast Fourier Transform (FFT) then quantifies the drop in both amplitude and oscillation rate, and simulations confirm the result. A reader would care because smoother commands make these learning-based controllers better suited to real aircraft hardware, where vibration and actuator wear matter.

Core claim

The central claim is that introducing an online temporal smoothness technique and a low-pass filter reduces the amplitude and frequency of the control actions in the cascaded online learning flight control system. The Fast Fourier Transform (FFT) is used to analyze policy performance in the frequency domain. Simulation results demonstrate the improvements achieved by the two proposed techniques.

What carries the argument

The online temporal smoothness technique, which penalizes large changes between successive actions, paired with a low-pass filter that removes high-frequency components from the control signal.
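
As a rough illustration of how these two pieces act on a command sequence, the Python sketch below pairs a squared-difference penalty on successive actions with a first-order discrete low-pass filter. The function names, the penalty form, the filter order, and the `weight` and `alpha` values are all assumptions made for illustration; this page does not give the authors' exact formulation.

```python
def temporal_smoothness_penalty(actions, weight=1.0):
    """Weighted sum of squared differences between successive actions (assumed form)."""
    return weight * sum((b - a) ** 2 for a, b in zip(actions, actions[1:]))


def low_pass_filter(signal, alpha=0.2):
    """First-order discrete low-pass filter: y[k] = y[k-1] + alpha * (u[k] - y[k-1])."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    filtered = []
    y = signal[0] if signal else 0.0  # start the filter state at the first sample
    for u in signal:
        y = y + alpha * (u - y)
        filtered.append(y)
    return filtered


# Hypothetical chattering command that flips sign every step.
raw_actions = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
smoothed_actions = low_pass_filter(raw_actions, alpha=0.2)

raw_penalty = temporal_smoothness_penalty(raw_actions)
smoothed_penalty = temporal_smoothness_penalty(smoothed_actions)
print(f"penalty before filtering: {raw_penalty:.3f}")
print(f"penalty after filtering:  {smoothed_penalty:.3f}")
```

In an online learning setting, a penalty of this kind would typically be added to the actor's loss so that training itself discourages abrupt changes, while the filter acts on the command after the policy has produced it.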

If this is right

  • The frequency content of control signals drops, lowering actuator wear.
  • Amplitude of sudden command changes decreases, improving closed-loop stability margins.
  • The cascaded structure remains usable for online learning while satisfying practical smoothness requirements.
  • FFT-based frequency analysis becomes a standard check for controller quality in simulation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same smoothness additions could be applied to other cascaded adaptive controllers outside aviation.
  • Hardware-in-the-loop tests would be needed to confirm that reduced oscillations translate to lower vibration on actual airframes.
  • The approach may trade a small amount of responsiveness for smoothness, an effect worth quantifying in future experiments.

Load-bearing premise

The oscillatory behavior seen in the baseline cascaded system is caused primarily by the online learning component and can be reduced by the proposed techniques without creating new stability problems or losing tracking performance.

What would settle it

A side-by-side simulation or hardware flight test that records actuator commands, computes their amplitude and frequency spectra via FFT, and checks whether the smoothed version shows a measurable drop in both metrics relative to the baseline.
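
A check of this kind could look roughly like the sketch below, which extracts the dominant non-DC frequency and its amplitude from two command records. It uses a naive discrete Fourier transform so that it runs on the standard library alone; an FFT routine computes the same spectrum faster. The two signals are synthetic stand-ins, not data from the paper, and the function names and sample rate are hypothetical.

```python
import cmath
import math


def amplitude_spectrum(samples, sample_rate_hz):
    """One-sided amplitude spectrum computed with a naive DFT (O(n^2))."""
    n = len(samples)
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        # DC and (for even n) Nyquist bins are not doubled.
        scale = 1.0 if k == 0 or (n % 2 == 0 and k == n // 2) else 2.0
        freqs.append(k * sample_rate_hz / n)
        amps.append(scale * abs(coeff) / n)
    return freqs, amps


def dominant_component(samples, sample_rate_hz):
    """Return (frequency_hz, amplitude) of the largest non-DC bin."""
    freqs, amps = amplitude_spectrum(samples, sample_rate_hz)
    k = max(range(1, len(amps)), key=lambda i: amps[i])
    return freqs[k], amps[k]


# Synthetic stand-ins for logged actuator commands (not the paper's data).
SAMPLE_RATE_HZ = 100.0
N = 100
t = [i / SAMPLE_RATE_HZ for i in range(N)]
baseline = [1.0 * math.sin(2 * math.pi * 20.0 * ti) for ti in t]
smoothed = [0.2 * math.sin(2 * math.pi * 5.0 * ti) for ti in t]

base_freq, base_amp = dominant_component(baseline, SAMPLE_RATE_HZ)
smooth_freq, smooth_amp = dominant_component(smoothed, SAMPLE_RATE_HZ)
print(f"baseline: {base_freq:.1f} Hz, amplitude {base_amp:.2f}")
print(f"smoothed: {smooth_freq:.1f} Hz, amplitude {smooth_amp:.2f}")
```

A real comparison would also need matching record lengths, a stated windowing method, and averaging over several runs, since a single dominant peak can hide broadband content.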

Figures

Figures reproduced from arXiv: 2507.04346 by Erik-jan van Kampen, Yifei Li.

Figure 1: Cascaded online learning flight control system. Two actors are …
Figure 3 shows the network structures. The critic network takes as input the pitch rate q_t and the tracking error e_q(t). The output is the estimated value function V_2(t), with an absolute value activation applied at the output layer to ensure the positive definiteness of V_2(t). The actor network utilizes an additional input α_t to account for dynamic coupling in model 1. The output is the control surface deflection…
Figure 4: Comparison between control systems without and with temporal …
Figure 5: Comparison of network parameters for flight control systems …
Figure 7: Comparison of control surface deflection in the time and frequency …
Figure 8: Comparison of actors' output increments over time.
Figure 10 shows the dynamical saturation level of the activation function in the output layer of the actors. The vanilla control system eventually saturates tanh(·) over the input intervals [-4, -2] and [2, 4] (the higher-level agent) and [-3, -2] and [2, 3] (the lower-level agent). In these intervals, the derivative of tanh(·) is less than 0.1, providing only a slight gradient for actor training. As a compari…
Figure 11: Comparison of sensitivity measures before and after using a TS …
Figure 13: Detailed comparison of pitch rate tracking in the time and …
Figure 17: Comparison of sensitivity measures between TS method and …
Figure 15: Comparison of one-step cost of two control systems using the TS …
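
The saturation figure quoted in the Figure 10 caption can be checked numerically. The short Python sketch below evaluates the derivative of tanh, which is 1 - tanh(x)^2, at a few pre-activation values; the helper name and the sample points are illustrative choices, not taken from the paper.

```python
import math


def tanh_derivative(x):
    """Derivative of tanh: d/dx tanh(x) = 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


# Pre-activation values spanning the intervals quoted in the Figure 10 caption.
for x in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(f"x = {x:+.1f}: tanh'(x) = {tanh_derivative(x):.4f}")
```

Because the derivative is even in x and decreases as |x| grows, any pre-activation with magnitude of 2 or more sits below the 0.1 threshold mentioned in the caption, which is why a saturated output layer passes little gradient back to the actor.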
Original abstract

This paper aims to improve the action smoothness of a cascaded online learning flight control system. Although the cascaded structure is widely used in flight control design, its stability can be compromised by oscillatory control actions, which poses challenges for practical engineering applications. To address this issue, we introduce an online temporal smoothness technique and a low-pass filter to reduce the amplitude and frequency of the control actions. Fast Fourier Transform (FFT) is used to analyze policy performance in the frequency domain. Simulation results demonstrate the improvements achieved by the two proposed techniques.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that introducing an online temporal smoothness technique together with a low-pass filter into a cascaded online-learning flight controller reduces the amplitude and frequency of control actions. FFT analysis is used to quantify the frequency-domain improvement, and simulation results are presented to demonstrate the overall benefit for practical engineering use.

Significance. If the central claim holds, the work addresses a recognized practical obstacle to deploying online learning in cascaded flight-control architectures. The combination of a temporal smoothness operator and low-pass filtering is a direct, implementable addition that could improve actuator longevity and reduce wear without requiring a complete redesign of the inner-loop controller.

major comments (3)
  1. [Simulation results] Simulation results (and abstract): the reported FFT reductions in amplitude and frequency are presented without accompanying time-domain metrics (RMSE, rise time, overshoot, or steady-state error) comparing the baseline cascaded system to the modified system. Without these quantities it is impossible to verify that tracking performance and disturbance rejection are preserved.
  2. [Method] Method section describing the low-pass filter and temporal smoothness operator: no Bode or Nyquist analysis, gain/phase margins, or closed-loop pole locations are provided to confirm that the added phase lag does not degrade inner-loop responsiveness or stability margins in the cascaded architecture.
  3. [FFT analysis] FFT analysis: the paper does not state whether the smoothness parameters were tuned after observing the baseline oscillations or fixed a priori; post-hoc tuning would weaken the claim that the improvement is a general property of the proposed techniques.
minor comments (2)
  1. [Method] Notation for the temporal smoothness operator should be defined explicitly with a discrete-time equation or pseudocode so that the reader can reproduce the exact implementation.
  2. [Figures] Figure captions for the FFT plots should include the exact frequency range, windowing method, and number of simulation runs averaged.
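
The time-domain quantities requested in major comment 1 are straightforward to compute from logged responses. The Python sketch below shows one conventional set of definitions for a step toward a positive target; the function names, the 10%-to-90% rise-time convention, the tail length, and the sample response are assumptions for illustration and do not come from the paper.

```python
import math


def rmse(reference, response):
    """Root-mean-square tracking error over the whole record."""
    return math.sqrt(
        sum((r - y) ** 2 for r, y in zip(reference, response)) / len(reference)
    )


def percent_overshoot(response, target):
    """Peak excursion past a positive step target, in percent of the target."""
    return max(0.0, (max(response) - target) / target * 100.0)


def rise_time(response, target, dt):
    """Time between the first 10% and first 90% crossings of a positive target."""
    i10 = next(i for i, y in enumerate(response) if y >= 0.1 * target)
    i90 = next(i for i, y in enumerate(response) if y >= 0.9 * target)
    return (i90 - i10) * dt


def steady_state_error(response, target, tail=4):
    """Target minus the mean of the last `tail` samples."""
    window = response[-tail:]
    return target - sum(window) / len(window)


# Hypothetical step response sampled every 0.01 s (illustrative numbers only).
target = 1.0
dt = 0.01
response = [0.0, 0.5, 0.95, 1.2, 1.05, 0.98, 1.0, 1.0, 1.0, 1.0]
reference = [target] * len(response)

metrics = {
    "rmse": rmse(reference, response),
    "overshoot_pct": percent_overshoot(response, target),
    "rise_time_s": rise_time(response, target, dt),
    "steady_state_error": steady_state_error(response, target),
}
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting the same four numbers for the baseline and the smoothed controller, on identical reference signals, would show directly whether smoothness was bought at the cost of tracking.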

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and indicate the changes planned for the revised manuscript.

Point-by-point responses
  1. Referee: [Simulation results] Simulation results (and abstract): the reported FFT reductions in amplitude and frequency are presented without accompanying time-domain metrics (RMSE, rise time, overshoot, or steady-state error) comparing the baseline cascaded system to the modified system. Without these quantities it is impossible to verify that tracking performance and disturbance rejection are preserved.

    Authors: We agree that time-domain metrics are necessary to demonstrate that the proposed modifications preserve tracking performance. In the revised manuscript we will add RMSE, rise time, overshoot, and steady-state error comparisons between the baseline and modified controllers across the presented simulation scenarios.
    Revision: yes

  2. Referee: [Method] Method section describing the low-pass filter and temporal smoothness operator: no Bode or Nyquist analysis, gain/phase margins, or closed-loop pole locations are provided to confirm that the added phase lag does not degrade inner-loop responsiveness or stability margins in the cascaded architecture.

    Authors: We acknowledge the absence of explicit stability-margin analysis. Although the simulations exhibit stable closed-loop behavior, we will include Bode plots together with gain and phase margin calculations for the inner-loop controller after insertion of the low-pass filter and temporal smoothness operator.
    Revision: yes

  3. Referee: [FFT analysis] FFT analysis: the paper does not state whether the smoothness parameters were tuned after observing the baseline oscillations or fixed a priori; post-hoc tuning would weaken the claim that the improvement is a general property of the proposed techniques.

    Authors: The smoothness parameters were chosen a priori on the basis of typical actuator bandwidths and expected flight-control frequency content, before any baseline oscillation data were examined. We will add an explicit statement of this selection procedure in the revised FFT analysis section.
    Revision: yes
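
The phase-lag concern raised in major comment 2 can be made concrete with a small calculation. The Python sketch below evaluates the gain and phase of a first-order low-pass filter H(s) = 1 / (1 + s / w_c) at a few frequencies. The first-order form and the 10 Hz cutoff are assumptions for illustration only; this page does not state the order or cutoff of the filter used in the paper.

```python
import math


def first_order_lpf_response(freq_hz, cutoff_hz):
    """Gain (dB) and phase (degrees) of H(s) = 1 / (1 + s / w_c) at freq_hz."""
    ratio = freq_hz / cutoff_hz
    gain_db = -10.0 * math.log10(1.0 + ratio ** 2)
    phase_deg = -math.degrees(math.atan(ratio))
    return gain_db, phase_deg


# Hypothetical numbers: 10 Hz filter cutoff, evaluated at three frequencies.
CUTOFF_HZ = 10.0
for freq_hz in (1.0, 10.0, 50.0):
    gain_db, phase_deg = first_order_lpf_response(freq_hz, CUTOFF_HZ)
    print(f"{freq_hz:5.1f} Hz: gain {gain_db:6.2f} dB, phase {phase_deg:6.1f} deg")
```

If the loop's gain crossover lies well below the cutoff, the filter barely changes the crossover frequency, and the phase margin shrinks by roughly the filter's phase lag at that frequency. A crossover near the cutoff would cost about 45 degrees, which is the kind of effect a Bode analysis would expose.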

Circularity Check

0 steps flagged

No circularity: additive techniques validated empirically without self-referential reduction

Full rationale

The paper introduces an online temporal smoothness technique and low-pass filter as direct modifications to a cascaded online learning flight control system, then evaluates them via FFT frequency analysis and simulation results showing reduced control action amplitude and frequency. No derivation chain, equations, or claims reduce these improvements to fitted parameters, self-definitions, or self-citation chains by construction; the central claim rests on independent methodological additions and external simulation benchmarks rather than tautological equivalence to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no explicit free parameters, axioms, or invented entities; the work appears to rest on standard assumptions from control theory and online learning without new postulates.

pith-pipeline@v0.9.0 · 5615 in / 1025 out tokens · 32488 ms · 2026-05-19T06:33:54.724813+00:00 · methodology

