Learning-Accelerated Optimization-based Trajectory Planning for Cooperative Aerial-Ground Handover Missions

Bochen Yu; Henrik Ebel; Jingshan Chen; Peter Eberhard

arxiv: 2605.19562 · v1 · pith:3CVTLOVJnew · submitted 2026-05-19 · 💻 cs.RO · cs.LG · math.OC

Pith reviewed 2026-05-20 05:13 UTC · model grok-4.3

classification 💻 cs.RO · cs.LG · math.OC

keywords UAV · UGV · trajectory planning · handover missions · LSTM networks · warm-start optimization · learning-augmented planning · multi-robot systems

The pith

LSTM-based predictions warm-start optimization to deliver over threefold faster trajectory planning for UAV-UGV handovers with 100 percent success.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a hybrid framework can make centralized trajectory optimization practical for real-time use in cooperative aerial-ground missions. Decoupled encoder-decoder LSTM networks first predict coordinated handover trajectories directly from task specifications. These predictions then serve as informed starting points that let the downstream optimizer converge quickly to dynamically feasible and task-optimal solutions. If the approach works, it removes the main barrier to deploying model-based planners on heterogeneous robot teams by cutting computation time dramatically while preserving reliability. Benchmark results show the combined system runs more than three times faster than pure optimization and never fails to find a solution.

Core claim

The central claim is that decoupled encoder-decoder LSTM networks generate coordinated handover trajectory predictions from task specifications; these predictions act as warm starts for a centralized trajectory optimizer and thereby produce more than a threefold speedup together with a 100 percent optimization success rate relative to cold-start optimization.

What carries the argument

Decoupled encoder-decoder LSTM networks that produce coordinated trajectory predictions used as warm starts for the centralized optimizer.
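The warm-start idea can be illustrated on a toy problem. The sketch below is not the paper's optimizer or its LSTM surrogate: it assumes a small convex smoothing objective solved by plain gradient descent, and it imitates a "prediction" by moving the cold guess 90% of the way toward a previously computed solution. All names and parameters (`gradient`, `solve`, `w`, `step`, `tol`, the reference path) are hypothetical.

```python
import math

def gradient(x, ref, w):
    """Gradient of 0.5*sum((x-ref)^2) + 0.5*w*sum((x[k+1]-x[k])^2)."""
    g = [xi - ri for xi, ri in zip(x, ref)]
    for k in range(len(x) - 1):
        d = w * (x[k + 1] - x[k])
        g[k] -= d
        g[k + 1] += d
    return g

def solve(x0, ref, w=1.0, step=0.2, tol=1e-6, max_iters=1000):
    """Plain gradient descent; returns (solution, iterations, converged)."""
    x = list(x0)
    for it in range(max_iters):
        g = gradient(x, ref, w)
        if math.sqrt(sum(v * v for v in g)) < tol:
            return x, it, True
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, max_iters, False

# Toy task (made-up values): a jagged reference path with 20 samples.
ref = [math.sin(0.3 * k) + 0.1 * (-1) ** k for k in range(20)]

# Cold start: no prior knowledge, so initialize at zero.
cold_guess = [0.0] * len(ref)
expert, cold_iters, cold_ok = solve(cold_guess, ref)

# Assumed stand-in for a learned prediction: an imperfect copy of a known
# solution (10% of the cold-start error remains). A real surrogate would
# produce this from the task specification instead.
prediction = [c + 0.9 * (e - c) for c, e in zip(cold_guess, expert)]

# Warm start: the same solver refines the prediction.
refined, warm_iters, warm_ok = solve(prediction, ref)
print(cold_iters, warm_iters)
```

Because the refinement uses the same solver and stopping criterion, the warm-started result meets the same tolerance as the cold-started one; in this toy setting only the iteration count changes.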

If this is right

Real-time trajectory generation becomes feasible for dynamic aerial-ground handover operations.
Dynamic feasibility and task optimality are retained through the final model-based optimization step.
The method supports reliable planning across varied task specifications for heterogeneous robot teams.
Data-driven inference combined with model-based refinement reduces overall planning latency without sacrificing guarantees.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same warm-start pattern could accelerate optimization in other multi-robot coordination problems such as formation control or object transport.
Online retraining of the LSTM component might allow the system to adapt when mission conditions drift beyond the original training distribution.
The framework offers a reusable template for speeding up any model-based planner that currently suffers from poor initial guesses.

Load-bearing premise

The LSTM predictions are accurate enough to serve as warm starts that let the optimizer reliably reach feasible, task-optimal solutions for the tested mission specifications.

What would settle it

Running the optimizer on a new set of handover tasks where the learning-augmented version either fails to converge or requires more time than cold-start optimization would falsify the speedup and success-rate claims.
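The falsification test described above amounts to scanning matched runs for counterexamples. A minimal sketch, assuming each run records a cold-start time, a warm-start time, and whether the warm-started optimization converged; the `MatchedRun` type and `counterexamples` function are hypothetical names, and the numbers in the usage lines are placeholders rather than results from the paper.

```python
from typing import List, NamedTuple

class MatchedRun(NamedTuple):
    """One handover task solved twice: cold start and warm start."""
    cold_time: float       # seconds, cold-start optimization
    warm_time: float       # seconds, prediction plus warm-start optimization
    warm_converged: bool   # did the warm-started optimizer find a solution?

def counterexamples(runs: List[MatchedRun]) -> List[int]:
    """Indices of runs that contradict the speedup or success-rate claim."""
    return [
        i for i, r in enumerate(runs)
        if (not r.warm_converged) or r.warm_time > r.cold_time
    ]

# Placeholder data, for illustration only.
runs = [
    MatchedRun(cold_time=1.0, warm_time=0.3, warm_converged=True),
    MatchedRun(cold_time=1.0, warm_time=1.2, warm_converged=True),
    MatchedRun(cold_time=1.0, warm_time=0.2, warm_converged=False),
]
print(counterexamples(runs))  # [1, 2]
```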

Figures

Figures reproduced from arXiv: 2605.19562 by Bochen Yu, Henrik Ebel, Jingshan Chen, Peter Eberhard.

**Figure 1.** Overview of the proposed learning-accelerated trajectory planning pipeline. The baseline planner generates expert demonstrations for surrogate training offline, and the trained surrogate then provides informed warm starts to the same planner during online deployment.

Excerpt following the caption in the source: 2 Related Work. Learning-based methods have been widely explored in robotics to complement or even replace traditional planning and control pi…

**Figure 2.** Surrogate architecture with an agent-decoupled encoder–decoder LSTM structure.

Excerpt following the caption in the source: … $\mathbb{R}^{4\times 3}$ be defined by $A_{11} = A_{22} = 1$ and $A_{ij} = 0$ otherwise. For the UGV state, $C = A$. For the UAV, we set $C = [A \;\; 0_{8\times 3}]^{\top}$. After the spatial shift, the input vector of our surrogate planner is constructed by concatenating the relative start and goal states, $\tau := [\,x^{\mathrm{rel}\,\top}_{\mathrm{ugv},0} \;\; x^{\mathrm{rel}\,\top}_{\mathrm{uav},0} \;\; x^{\mathrm{rel}\,\top}_{\mathrm{ugv},N} \;\; x^{\mathrm{rel}\,\top}_{\mathrm{uav},N}\,]^{\top} \in \mathbb{R}^{32}$. Co…
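The input construction quoted with Figure 2 can be sketched in a few lines. Several things here are assumptions, since the source excerpt is truncated: the shift is taken to be $x^{\mathrm{rel}} = x - C\,p$ for a 3D reference position $p$, the UAV matrix is read as $A$ stacked on top of an $8\times 3$ zero block (the only reading that yields a 12-dimensional UAV state and a 32-dimensional input), and the first two entries of each state are taken to be planar positions. The helper names are hypothetical.

```python
def make_A():
    """4x3 matrix with A[0][0] = A[1][1] = 1 and zeros elsewhere."""
    A = [[0.0] * 3 for _ in range(4)]
    A[0][0] = 1.0
    A[1][1] = 1.0
    return A

def matvec(M, v):
    """Matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def shift(state, C, p):
    """Assumed spatial shift: subtract C @ p from the state."""
    return [s - o for s, o in zip(state, matvec(C, p))]

def build_input(ugv0, uav0, ugvN, uavN, p):
    """Concatenate shifted start and goal states into one input vector."""
    C_ugv = make_A()                                   # 4 x 3
    C_uav = make_A() + [[0.0] * 3 for _ in range(8)]   # 12 x 3 (assumed stacking)
    return (shift(ugv0, C_ugv, p) + shift(uav0, C_uav, p)
            + shift(ugvN, C_ugv, p) + shift(uavN, C_uav, p))

# Illustrative values only; p is a hypothetical reference position.
p = [1.0, 2.0, 3.0]
ugv0 = [1.0, 2.0, 0.5, 0.0]
uav0 = [1.0, 2.0, 3.0] + [0.0] * 9
ugvN = [4.0, 6.0, 0.0, 0.0]
uavN = [4.0, 6.0, 1.0] + [0.0] * 9
tau = build_input(ugv0, uav0, ugvN, uavN, p)
print(len(tau))  # 32
```

Under these assumptions only the first two entries of each state are translated, because the third column of $A$ is zero; altitude and all remaining state entries pass through unchanged.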

**Figure 3.** Comparison of convergence iterations for cold and warm starts. Each pair of connected points represents a matched run, i.e., a cold and warm start for the same start and goal states. The white circular markers indicate empirical means, with red bars showing 95% confidence intervals.

Excerpt following the caption in the source: … starts are clearly reflected in the reduced iteration counts. Although a few outliers near the boundary of the training dist…

**Figure 4.** Representative coordinated trajectories for a randomly selected handover mission, illustrating the raw predictions from the surrogate planner (IL prediction), the refined trajectories obtained by using these predictions as a warm start (warm), and the solutions from a cold-start optimization (cold).

Excerpt following the caption in the source: … UAV and UGV. Using these initial guesses, our approach achieves a more than 60% reduction in computational …

Original abstract

This paper presents a learning-augmented trajectory planning framework for cooperative unmanned aerial vehicle (UAV) and unmanned ground vehicle (UGV) handover missions. While centralized trajectory optimization ensures dynamic feasibility and task optimality, its high computational cost limits real-time applicability. We propose a neural surrogate planner utilizing decoupled encoder-decoder long short-term memory (LSTM) networks to generate coordinated handover trajectory predictions from the task specifications. These predictions serve as informed warm starts for the downstream centralized optimizer, thereby accelerating convergence to dynamically feasible solutions. Benchmark evaluations demonstrate that the learning-augmented planning framework achieves more than a threefold speedup and 100% optimization success rate compared to cold start optimization. The results indicate that combining data-driven inference with model-based refinement enables fast and reliable trajectory generation for heterogeneous multi-robot systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LSTM warm starts give over 3x speedup and 100% success for UAV-UGV handover optimization in the reported benchmarks.

The paper shows that a decoupled encoder-decoder LSTM can provide effective warm starts for centralized trajectory optimization in cooperative UAV and UGV handover missions. This leads to more than a threefold reduction in computation time and a 100% success rate in their benchmark tests compared to starting from scratch.

They describe training the LSTM directly on task specifications to output predicted trajectories for both vehicles. These predictions then initialize the optimizer, which refines them to ensure dynamic feasibility and optimality for the handover. The separation into encoder-decoder for each robot type keeps things manageable while the central optimizer handles coordination. The benchmarks include comparisons against cold-start optimization across various mission setups, with reported speedups and reliability metrics.

This approach does a good job of making optimization methods practical for real-time applications in heterogeneous robot teams without changing the underlying problem formulation. The results give a clear picture of the performance improvement in simulation.

A soft spot is the limited information on how the training data was generated and what range of handover scenarios it covers. If the test cases are close to the training distribution, the high success rate may not extend as far as hoped to new situations. The paper also doesn't compare against other possible warm-start strategies like simpler heuristics or different network types, which would help put the gains in context.

Overall, this is for people working on trajectory planning for multi-robot systems, particularly those interested in learning to accelerate model-based methods. It would be relevant for applications in logistics or disaster response where quick planning for aerial-ground cooperation matters. I would recommend sending it for peer review. The central claims are supported by the described experiments, and the work is a solid incremental step that others can build on.

Referee Report

1 major / 1 minor

Summary. The paper presents a learning-augmented trajectory planning framework for cooperative UAV-UGV handover missions. It uses decoupled encoder-decoder LSTM networks to predict coordinated handover trajectories from task specifications; these predictions initialize a centralized optimizer as warm starts to accelerate convergence while preserving dynamic feasibility and task optimality. Benchmark evaluations claim more than a threefold speedup and 100% optimization success rate relative to cold-start optimization.

Significance. If the reported performance gains hold under wider conditions, the approach would meaningfully advance real-time planning for heterogeneous multi-robot systems by cleanly separating data-driven prediction from model-based refinement. The absence of circularity between the LSTM surrogate and the optimizer, together with direct empirical comparisons, strengthens the contribution.

major comments (1)

[Benchmark evaluations] Benchmark evaluations section: the central claims of >3x speedup and 100% success rate are stated without specification of training-data distribution, baseline optimizer settings, number of scenarios or trials, or statistical significance testing. These omissions leave the performance assertions only partially supported and require additional reporting to substantiate the speedup and reliability results.

minor comments (1)

[Abstract] The abstract and experimental description would benefit from explicit mention of the range of task specifications used for training and testing to clarify generalization.

Simulated Author's Rebuttal

1 response · 0 unresolved

We are grateful to the referee for the positive overall assessment and the recommendation for minor revision. We address the single major comment below and will revise the manuscript to incorporate the requested details.

Referee: [Benchmark evaluations] Benchmark evaluations section: the central claims of >3x speedup and 100% success rate are stated without specification of training-data distribution, baseline optimizer settings, number of scenarios or trials, or statistical significance testing. These omissions leave the performance assertions only partially supported and require additional reporting to substantiate the speedup and reliability results.

Authors: We thank the referee for this constructive observation. We agree that the Benchmark evaluations section would be strengthened by explicit reporting of these elements. In the revised manuscript, we will expand the section to specify the training-data distribution used for the LSTM networks, the precise settings of the baseline optimizer (including solver tolerances and termination criteria for the cold-start comparisons), the number of scenarios evaluated and the number of trials conducted per scenario, and the results of statistical significance testing (e.g., paired t-tests on computation times). These additions will provide full substantiation for the reported >3x speedup and 100% success rate while preserving the existing empirical findings.

revision: yes
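The paired t-test mentioned in the rebuttal compares each cold-start time with the warm-start time for the same task. A minimal sketch using only the Python standard library follows; the function name `paired_t` and the timing values are placeholders for illustration, not data from the paper, and the p-value lookup is left out because the standard library has no Student t distribution.

```python
import math
import statistics

def paired_t(cold, warm):
    """Paired t statistic for matched cold/warm computation times.

    Returns (mean difference, t statistic, degrees of freedom).
    """
    if len(cold) != len(warm) or len(cold) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [c - w for c, w in zip(cold, warm)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / std_err, len(diffs) - 1

# Placeholder timings in seconds (hypothetical, for illustration only).
cold_times = [4.0, 6.0, 5.0, 9.0]
warm_times = [2.0, 3.0, 3.0, 4.0]

mean_diff, t_stat, dof = paired_t(cold_times, warm_times)
speedup = sum(cold_times) / sum(warm_times)
print(mean_diff, round(t_stat, 4), dof, speedup)
```

Pairing matters here because task difficulty varies widely between scenarios; testing the per-task differences removes that between-task variance from the comparison.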

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper separates data-driven LSTM prediction of warm-start trajectories from the downstream centralized optimizer. Task specifications feed the decoupled encoder-decoder networks to produce initial guesses; the optimizer then refines them under the original dynamic feasibility and optimality constraints. Reported metrics (speedup and 100% success rate) are obtained from direct benchmark comparisons against cold-start baselines on concrete task specifications, providing independent empirical grounding rather than any reduction of predictions to fitted parameters by construction. No self-citation chains, uniqueness theorems, or ansatzes are invoked to force the central claims, and the architecture does not rename or smuggle in prior results as new derivations.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The framework rests on the assumption that LSTM-generated trajectories are adequate warm starts for the optimizer and that the centralized optimizer itself guarantees dynamic feasibility when started from such points. The abstract introduces no invented entities and gives no values for the one free-parameter group listed below.

free parameters (1)

LSTM network architecture and training hyperparameters
Chosen to fit trajectory data for the surrogate planner; specific values not stated in abstract.

axioms (2)

domain assumption: Centralized trajectory optimization ensures dynamic feasibility and task optimality when provided with suitable initial guesses.
Invoked in the abstract as the reason the neural predictions are useful.
domain assumption: Decoupled encoder-decoder LSTM networks can generate coordinated handover trajectory predictions from task specifications.
Core premise of the neural surrogate component.

pith-pipeline@v0.9.0 · 5673 in / 1359 out tokens · 53829 ms · 2026-05-20T05:13:29.878007+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation: unclear

Cited paper passage: "Benchmark evaluations demonstrate that the learning-augmented planning framework achieves more than a threefold speedup and 100% optimization success rate compared to cold start optimization."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

[1] Agarwal, S., Alonso-Mora, J., Sun, S.: Decentralized real-time planning for multi-UAV cooperative manipulation via imitation learning. arXiv preprint (2025). https://doi.org/10.48550/ARXIV.2510.17143

[2] Andersson, J.A.E., Gillis, J., Horn, G., Rawlings, J.B., Diehl, M.: CasADi: a software framework for nonlinear optimization and optimal control. Mathematical Programming Computation 11(1), 1–36 (2018). https://doi.org/10.1007/s12532-018-0139-4

[3] Banerjee, S., Cauligi, A., Pavone, M.: Deep learning warm starts for trajectory optimization on the International Space Station. arXiv preprint (2025). https://doi.org/10.48550/ARXIV.2505.05588

[4] Cao, P., Hwang, J.T., Bewley, T., Kuester, F.: Mission-oriented trajectory optimization for search-and-rescue multirotor UAVs in cluttered and GPS-denied environments. In: AIAA AVIATION 2022 Forum. American Institute of Aeronautics and Astronautics (2022). https://doi.org/10.2514/6.2022-3999

[5] Chen, J., Luo, W., Ebel, H., Eberhard, P.: Optimization-based trajectory planning for transport collaboration of heterogeneous systems. at - Automatisierungstechnik 72(2), 80–90 (2024). https://doi.org/10.1515/auto-2023-0078

[6] Chen, J., Xu, L., Ebel, H., Eberhard, P.: An online optimization-based trajectory planning approach for cooperative landing tasks. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 12108–12114 (2025). https://doi.org/10.1109/iros60139.2025.11245920

[7] Diaz, M., Fevens, T., Paull, L.: Uncertainty-aware policy sampling and mixing for safe interactive imitation learning. In: Conference on Robots and Vision (CRV). pp. 72–78 (2021). https://doi.org/10.1109/crv52889.2021.00018

[8] Ebel, H.: Distributed control and organization of communicating mobile robots. Ph.D. thesis, Düren (2021)

[9] Hewing, L., Wabersich, K.P., Menner, M., Zeilinger, M.N.: Learning-based model predictive control: Toward safe learning in control. Annual Review of Control, Robotics, and Autonomous Systems 3(1), 269–296 (2020). https://doi.org/10.1146/annurev-control-090419-075625

[10] Kou, Y., Liu, X., Ma, X., Xiang, Y., Zang, J.: Learning-based intelligent trajectory planning for auto navigation of magnetic robots. Frontiers in Robotics and AI 10 (2023). https://doi.org/10.3389/frobt.2023.1281362

[11] Li, Y., Eslamiat, H., Wang, N., Zhao, Z., Sanyal, A.K., Qiu, Q.: Autonomous waypoints planning and trajectory generation for multi-rotor UAVs. In: Proceedings of the Workshop on Design Automation for CPS and IoT. pp. 31–40 (2019). https://doi.org/10.1145/3313151.3313163

[12] Lin, A.T., Debord, M., Estabridis, K., Hewer, G., Montufar, G., Osher, S.: Decentralized multi-agents by imitation of a centralized controller. In: Proceedings of the 2nd Mathematical and Scientific Machine Learning Conference. PMLR, vol. 145, pp. 619–651 (2022), https://proceedings.mlr.press/v145/lin22a.html

[13] Madridano, Á., Al-Kaff, A., Martín, D., de la Escalera, A.: Trajectory planning for multi-robot systems: Methods and applications. Expert Systems with Applications 173, 114660 (2021). https://doi.org/10.1016/j.eswa.2021.114660

[14] Pham, H., Pham, Q.C.: Time-optimal path tracking via reachability analysis. In: IEEE International Conference on Robotics and Automation (ICRA). pp. 3007–3012 (2018). https://doi.org/10.1109/icra.2018.8460576

[15] Ross, S., Gordon, G., Bagnell, D.: A reduction of imitation learning and structured prediction to no-regret online learning. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics. PMLR, vol. 15, pp. 627–635 (2011), https://proceedings.mlr.press/v15/ross11a.html

[16] Sajja, A., Khorshidi, S., Houben, S., Bennewitz, M.: End-to-end multi-task policy learning from NMPC for quadruped locomotion. In: European Conference on Mobile Robots (ECMR). pp. 1–6 (2025). https://doi.org/10.1109/ecmr65884.2025.11163057

[17] Shiyu, F.: Reinforcement learning-driven deep learning approaches for optimized robot trajectory planning. Scientific Reports 15(1) (2025). https://doi.org/10.1038/s41598-025-21664-5

[18] Siciliano, B., Sciavicco, L., Villani, L., Oriolo, G.: Robotics: Modelling, Planning and Control. Springer London (2009). https://doi.org/10.1007/978-1-84628-642-1

[19] Xu, Z., Zhou, R., Yin, Y., Gao, H., Tomizuka, M., Li, J.: MATRIX: Multi-agent trajectory generation with diverse contexts. In: IEEE International Conference on Robotics and Automation (ICRA). pp. 12650–12657 (2024). https://doi.org/10.1109/icra57147.2024.10610944
