Deep deterministic policy gradient with symmetric data augmentation for lateral attitude tracking control of a fixed-wing aircraft

Erik-jan van Kampen; Yifei Li

arxiv: 2407.11077 · v4 · submitted 2024-07-13 · 💻 cs.LG · cs.AI


Pith reviewed 2026-05-23 23:03 UTC · model grok-4.3


keywords: reinforcement learning · data augmentation · symmetry · fixed-wing aircraft · DDPG · attitude control · offline RL · flight control


The pith

Symmetry in fixed-wing aircraft lateral dynamics enables data augmentation that accelerates DDPG convergence for attitude tracking control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the symmetry of the aircraft's lateral attitude dynamics, modeled as a Markov Decision Process, can be exploited to generate augmented training samples. These samples are added to the replay buffer of Deep Deterministic Policy Gradient to increase coverage of the state-action space, and a second critic network trained only on the augmented data forms a dual-critic structure that improves sample utilization. Flight-control simulations confirm both the model's symmetry and faster policy convergence when the augmented samples are used compared with standard DDPG.

Core claim

Under the symmetry assumption for the MDP of lateral attitude dynamics, a symmetric data augmentation method produces equivalent samples that are added to the DDPG dataset, and a dual-critic structure raises sample efficiency. The aircraft model is verified to be symmetric, and the resulting controller shows accelerated policy convergence in simulation.

What carries the argument

Symmetric data augmentation that transforms state-action-reward-next-state tuples according to the MDP symmetry, combined with a dual-critic DDPG architecture where one critic learns from the augmented samples.
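The augmentation step can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the symmetry is a reflection of the state about a reference point `x_ref` (so x' = 2·x_ref − x), that the action changes sign, and that the reward is unchanged. The names `Transition`, `reflect`, `mirror_transition`, and `augment` are hypothetical. The paper summarizes its own map as s'_t = A s_t + B x* (Eq. 25), whose matrices are not given in the text available here.

```python
from typing import List, NamedTuple


class Transition(NamedTuple):
    state: List[float]
    action: List[float]
    reward: float
    next_state: List[float]


def reflect(x: List[float], x_ref: List[float]) -> List[float]:
    """Reflect x about x_ref: x' = 2 * x_ref - x (assumed symmetry map)."""
    return [2.0 * r - v for v, r in zip(x, x_ref)]


def mirror_transition(t: Transition, x_ref: List[float]) -> Transition:
    """Build the mirrored transition under the assumed symmetry."""
    return Transition(
        state=reflect(t.state, x_ref),
        action=[-a for a in t.action],   # assumption: control inputs flip sign
        reward=t.reward,                 # assumption: the reward is symmetric
        next_state=reflect(t.next_state, x_ref),
    )


def augment(buffer: List[Transition], x_ref: List[float]) -> List[Transition]:
    """Return one mirrored copy of every explored transition."""
    return [mirror_transition(t, x_ref) for t in buffer]
```

Because a reflection is its own inverse, mirroring a transition twice returns the original, which gives a cheap sanity check on any concrete symmetry map.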

If this is right

Augmented samples raise coverage of the state-action space without collecting new flight data.
The dual-critic structure improves utilization of every collected transition.
Policy convergence occurs in fewer training episodes for this attitude-tracking task.
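The data routing behind the dual-critic idea can be sketched as below. This is a hedged illustration only: the class `DualBuffer`, the function `td_targets`, and the caller-supplied `mirror` map are hypothetical names, and the one-step target is the standard DDPG target rather than anything specific to this paper.

```python
import random
from typing import Callable, List, Sequence, Tuple

Transition = Tuple[List[float], List[float], float, List[float]]  # (s, a, r, s_next)
QFunc = Callable[[Sequence[float], Sequence[float]], float]
Policy = Callable[[Sequence[float]], List[float]]


def td_targets(batch: List[Transition], target_q: QFunc,
               target_policy: Policy, gamma: float = 0.99) -> List[float]:
    """Standard DDPG one-step targets: y = r + gamma * Q'(s_next, mu'(s_next))."""
    return [r + gamma * target_q(s_next, target_policy(s_next))
            for (_, _, r, s_next) in batch]


class DualBuffer:
    """Stores explored and mirrored transitions separately, one buffer per critic."""

    def __init__(self, mirror: Callable[[Transition], Transition]):
        self.explored: List[Transition] = []
        self.augmented: List[Transition] = []
        self._mirror = mirror  # hypothetical symmetry map supplied by the caller

    def add(self, transition: Transition) -> None:
        self.explored.append(transition)
        self.augmented.append(self._mirror(transition))

    def sample(self, batch_size: int, rng: random.Random):
        """Draw one batch for each critic from its own buffer."""
        k = min(batch_size, len(self.explored))
        return rng.sample(self.explored, k), rng.sample(self.augmented, k)
```

Each critic would then regress onto `td_targets` computed from its own batch. How the two critics are combined in the actor update is not stated in the text available here, so that step is left out.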

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same augmentation technique could be applied to any dynamical system whose MDP satisfies an analogous symmetry, such as certain robotic or marine vehicles.
If the symmetry holds only approximately on hardware, the method may still reduce the volume of real-flight data needed for controller training.
Extending the dual-critic idea to other off-policy algorithms would test whether the sample-efficiency gain generalizes beyond DDPG.

Load-bearing premise

The lateral attitude dynamics of the fixed-wing aircraft obey the symmetry assumption of the underlying Markov Decision Process.
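One way to probe this premise numerically is to test whether stepping a mirrored state-action pair gives the mirror of the original next state. The sketch below does that on a toy linear model; the matrices, the sign-flip mirror, and the names `symmetry_residual`, `make_linear_step`, and `negate` are illustrative assumptions, not the aircraft model or the verification procedure used in the paper.

```python
import random
from typing import Callable, List, Optional

Vec = List[float]


def symmetry_residual(step: Callable[[Vec, Vec], Vec],
                      mirror_x: Callable[[Vec], Vec],
                      mirror_a: Callable[[Vec], Vec],
                      states: List[Vec], actions: List[Vec]) -> float:
    """Largest |step(M x, M a) - M step(x, a)| over the given samples."""
    worst = 0.0
    for x, a in zip(states, actions):
        lhs = step(mirror_x(x), mirror_a(a))
        rhs = mirror_x(step(x, a))
        worst = max(worst, max(abs(p - q) for p, q in zip(lhs, rhs)))
    return worst


def make_linear_step(A: List[Vec], B: List[Vec], bias: Optional[Vec] = None):
    """x_next = A x + B a + bias; a nonzero bias stands in for an asymmetry."""
    n = len(A)
    bias = bias or [0.0] * n

    def step(x: Vec, a: Vec) -> Vec:
        return [sum(A[i][j] * x[j] for j in range(len(x)))
                + sum(B[i][k] * a[k] for k in range(len(a)))
                + bias[i]
                for i in range(n)]
    return step


def negate(v: Vec) -> Vec:
    """Assumed mirror map: reflection about the origin."""
    return [-c for c in v]
```

A residual near zero over a wide sample of states and actions supports the symmetry assumption for the dynamics. A constant bias, used here as a stand-in for an actuator offset or a steady disturbance, produces a residual of twice the bias, which is the kind of violation the premise rules out.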

What would settle it

A direct comparison of DDPG training curves on the same aircraft model, with and without the symmetric augmentation, that shows identical convergence speed would falsify the claimed acceleration benefit.
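Such a comparison needs a concrete convergence metric. A minimal sketch, assuming per-episode returns are logged for several independently seeded runs of each agent: the functions `episodes_to_threshold` and `summarize`, the moving-average window, and the threshold are all hypothetical choices, not the paper's evaluation protocol.

```python
from statistics import mean, stdev
from typing import List, Optional, Sequence


def episodes_to_threshold(returns: Sequence[float], threshold: float,
                          window: int = 10) -> Optional[int]:
    """First episode (1-based) whose trailing moving average reaches threshold."""
    for end in range(window, len(returns) + 1):
        if mean(returns[end - window:end]) >= threshold:
            return end
    return None  # the run never reached the threshold


def summarize(runs: List[Sequence[float]], threshold: float, window: int = 10):
    """Mean and sample standard deviation of episodes-to-threshold over runs."""
    hits = [episodes_to_threshold(r, threshold, window) for r in runs]
    reached = [h for h in hits if h is not None]
    spread = stdev(reached) if len(reached) > 1 else 0.0
    return (mean(reached) if reached else None), spread, len(reached)
```

Calling `summarize` once on the baseline runs and once on the augmented runs gives two means with spreads. Overlapping values across enough seeds would point toward the "identical convergence speed" outcome described above.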

Figures

Figures reproduced from arXiv: 2407.11077 by Erik-jan van Kampen, Yifei Li.

**Figure 1.** State-action set $S \times A$ of dynamical system (1), with two contained subsets: (1) explored state-action set $(S \times A)_{\mathrm{exp}}$ by explored transitions of an RL agent; (2) symmetric state-action set $(S \times A)_{\mathrm{aug}}$ by augmented transitions defined in Section III.A.

A. Symmetric Data Augmentation. The symmetric data augmentation method is implemented by Eqs. (19), (20), (21), and (23), and summarized as $s'_t = A s_t + B x^*$ (25), where $s_t =$ …

**Figure 2.** Learning performance of three RL agents in 500 episodes. Solid …

**Figure 3.** Operation performance of three RL agents with fixed actor weights.

**Figure 5.** Aircraft lateral states and control input histories by three RL agents. Solid line represents mean of states and control inputs for 5 instances, …

**Figure 6.** Learning performance of three RL algorithms in 3000-episode …

Original abstract

The symmetry of dynamical systems can be exploited for state-transition prediction and to facilitate control policy optimization. This paper leverages system symmetry to develop sample-efficient offline reinforcement learning (RL) approaches. Under the symmetry assumption for a Markov Decision Process (MDP), a symmetric data augmentation method is proposed. The augmented samples are integrated into the dataset of Deep Deterministic Policy Gradient (DDPG) to enhance its coverage rate of the state-action space. Furthermore, sample utilization efficiency is improved by introducing a second critic trained on the augmented samples, resulting in a dual-critic structure. The aircraft's model is verified to be symmetric, and flight control simulations demonstrate accelerated policy convergence when augmented samples are employed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Routine application of known symmetry augmentation to DDPG for aircraft control, but the abstract supplies no numbers or verification details so the convergence claim cannot be checked.


The paper applies symmetric data augmentation inside a dual-critic DDPG setup to the lateral attitude tracking task on a fixed-wing aircraft and reports that the model is symmetric and that the extra samples speed up policy convergence in simulation. That combination for this specific control problem is the only concrete new element; the rest follows directly from earlier symmetry-exploitation work in RL.

The approach is sensible on paper: if the MDP really is symmetric, then mirroring samples should improve coverage without extra environment interaction. The dual-critic trick is a clean way to keep the original and augmented data from fighting each other during updates.

Beyond that, the manuscript does little else. The abstract states the symmetry verification and the convergence improvement but supplies no numbers, no baseline curves, no error bars, and no description of how the symmetry check was performed or whether it included the reward function and disturbance terms. The stress-test note is therefore on target: nominal dynamics symmetry does not automatically carry over to the full MDP once rewards and unmodeled effects enter. Without those checks the reported gain could be an artifact of biased augmentation.

This work is aimed at the narrow group already working on sample-efficient RL for aerospace vehicles. Anyone outside that niche will find almost nothing to use. A serious editor should desk-reject rather than send it out; the central claim is not yet supported by evidence that can be evaluated.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes a symmetric data augmentation technique for offline DDPG, exploiting an assumed MDP symmetry in the lateral attitude dynamics of a fixed-wing aircraft. A dual-critic architecture is introduced (one critic on original data, one on augmented samples) to improve state-action coverage and sample efficiency. The abstract states that the aircraft model has been verified symmetric and that flight-control simulations show accelerated policy convergence under the augmentation.

Significance. If the symmetry verification extends rigorously to the full MDP (including rewards and disturbances) and the reported convergence gains are reproducible with proper baselines, the dual-critic augmentation could offer a lightweight way to improve sample efficiency in symmetric control tasks. The approach is conceptually straightforward and could be useful for other symmetric dynamical systems, but the current presentation leaves the magnitude and robustness of the benefit unclear.

major comments (2)

[Abstract] Abstract: the central claim that 'the aircraft's model is verified to be symmetric' and that augmentation yields accelerated convergence is load-bearing, yet the abstract supplies no quantitative metrics, error bars, baseline comparisons, or description of the verification procedure. Without these, the empirical support for the dual-critic improvement cannot be assessed.
[Abstract] The symmetry assumption for the MDP: the skeptic concern is valid on the available text. Verification appears limited to the nominal aircraft dynamics; it is not shown that the symmetry mapping preserves the reward function and is robust to unmodeled disturbances (wind gusts, actuator asymmetries). If the augmented samples systematically misrepresent the true MDP, the reported convergence benefit would be biased.
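The reward-preservation condition raised in the second comment can be written out for one common case. The derivation below is an illustration under assumptions that are not taken from the paper: a quadratic tracking reward with symmetric weight matrices $Q$ and $R$, and a symmetry that reflects both the state and the reference about a point $x^*$ and negates the action.

```latex
\begin{align}
  % Assumed reward: quadratic penalty on tracking error and control effort
  r(x_t, a_t) &= -(x_t - x_t^{\mathrm{ref}})^\top Q \,(x_t - x_t^{\mathrm{ref}}) - a_t^\top R \, a_t \\
  % Assumed symmetry map
  x'_t &= 2x^* - x_t, \qquad x_t^{\mathrm{ref}\prime} = 2x^* - x_t^{\mathrm{ref}}, \qquad a'_t = -a_t \\
  % The tracking error only changes sign
  x'_t - x_t^{\mathrm{ref}\prime} &= -(x_t - x_t^{\mathrm{ref}}) \\
  % Quadratic forms are even, so the reward is preserved
  r(x'_t, a'_t) &= r(x_t, a_t)
\end{align}
```

Under these assumptions the equality holds because both terms are even in their arguments. A reward term that is odd in the error, such as a linear penalty $c^\top (x_t - x_t^{\mathrm{ref}})$, would change sign under the same map and break it, which is why the check has to be made on the actual reward rather than inferred from the dynamics.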

minor comments (1)

The description of how augmented samples are generated and how the second critic is trained could be clarified with an explicit equation or pseudocode block.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address the major points below and indicate planned revisions to the manuscript.


Referee: [Abstract] Abstract: the central claim that 'the aircraft's model is verified to be symmetric' and that augmentation yields accelerated convergence is load-bearing, yet the abstract supplies no quantitative metrics, error bars, baseline comparisons, or description of the verification procedure. Without these, the empirical support for the dual-critic improvement cannot be assessed.

Authors: We agree that the abstract would be strengthened by including quantitative support. The revised abstract will report key metrics on policy convergence (e.g., episodes to reach target performance with standard deviations across runs), explicit baseline comparisons to standard DDPG, and a concise statement of the symmetry verification procedure applied to the aircraft dynamics. revision: yes
Referee: [Abstract] The symmetry assumption for the MDP: the skeptic concern is valid on the available text. Verification appears limited to the nominal aircraft dynamics; it is not shown that the symmetry mapping preserves the reward function and is robust to unmodeled disturbances (wind gusts, actuator asymmetries). If the augmented samples systematically misrepresent the true MDP, the reported convergence benefit would be biased.

Authors: The manuscript verifies symmetry on the nominal lateral dynamics model and assumes the full MDP symmetry (including reward) follows from the control task formulation. Explicit checks that the symmetry mapping preserves the reward under disturbances are not provided. We will revise the text to clarify this assumption, add discussion of potential limitations from unmodeled effects, and note that the reported gains are observed under the simulated conditions. revision: partial

Circularity Check

0 steps flagged

No circularity: symmetry is an explicit input assumption, not derived from results


The paper treats aircraft model symmetry as a verified premise that is then applied to generate augmented samples for DDPG training. No equations, fitted parameters, or predictions are shown to reduce by construction to those inputs; the reported convergence improvement is presented as an empirical outcome of the augmentation method rather than a tautological restatement of the symmetry assumption. No self-citations or uniqueness theorems are invoked as load-bearing steps. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the unproven symmetry assumption for the aircraft MDP and on the unstated details of how the model verification and simulations were performed.

axioms (1)

domain assumption: the aircraft lateral dynamics satisfy the symmetry assumption for the MDP.
Stated explicitly in the abstract as the foundation for the augmentation method.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · echoes

ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Paper passage: "Under the symmetry assumption for a Markov Decision Process (MDP), a symmetric data augmentation method is proposed... The aircraft's model is verified to be symmetric"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

UNCLEAR: Relation between the paper passage and the cited Recognition theorem.

Paper passage: "Definition 3. (Symmetric reward function) $r(x_t, a_t) = r(x'_t, a'_t)$"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

