ATRS: Adaptive Trajectory Re-splitting via a Shared Neural Policy for Parallel Optimization

Chao Xu; Fei Gao; Guodong Liu; Jiajun Yu; Li Wang; Pengxiang Zhou; Wentao Liu; Yanjun Cao; Yin He

arXiv: 2604.22715 · v1 · submitted 2026-04-24 · 💻 cs.RO

Pith reviewed 2026-05-08 11:27 UTC · model grok-4.3

classification 💻 cs.RO

keywords adaptive re-splitting · shared neural policy · parallel ADMM · trajectory optimization · motion planning · reinforcement learning · zero-shot generalization · multi-agent MDP


The pith

A shared neural policy inside the ADMM loop lets parallel planners adaptively re-split stagnating trajectory segments.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Fixed decompositions in parallel ADMM trajectory optimization often stall in tightly constrained areas because a few lagging segments hold up the whole solve. The paper shows that turning the choice of when and where to re-split into a multi-agent decision process solved by one shared neural policy removes this rigidity. Every segment acts as an identical agent, so the same network handles any number of pieces and needs no environment geometry as input. Only solver internal states drive the policy, which supports direct use on new problems. A separate election step picks only the single most stuck segment for re-splitting to keep the iterations stable.

Core claim

ATRS embeds a shared Deep Reinforcement Learning policy into the parallel ADMM loop by casting adaptive re-splitting as a Multi-Agent Shared-Policy Markov Decision Process in which all trajectory segments are homogeneous agents. The single neural network produces size-invariant decisions and operates solely on internal solver states, enabling zero-shot generalization to unseen environments. A Confidence-Based Election mechanism selects only the most stagnating segment for re-splitting at each step to preserve numerical stability. Simulations confirm up to 26.0 percent fewer iterations and 19.1 percent less computation time, with real-world tests showing real-time replanning within 35 ms per cycle.
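The interplay between the shared policy and the election step can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the paper's implementation: the three-element feature vector, the tiny untrained network, the names `shared_policy` and `elect_segment`, and the 0.5 threshold are all hypothetical. The point is only that one set of weights scores any number of segments, and that at most one segment is elected per step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature layout: each segment is described by a few
# solver-internal quantities (for example primal residual, dual residual,
# recent residual change). The paper's actual feature set is not given here.
FEATURE_DIM = 3
HIDDEN_DIM = 16

# One set of weights shared by every segment (random, i.e. untrained).
W1 = rng.normal(scale=0.5, size=(FEATURE_DIM, HIDDEN_DIM))
b1 = np.zeros(HIDDEN_DIM)
W2 = rng.normal(scale=0.5, size=(HIDDEN_DIM, 1))
b2 = np.zeros(1)


def shared_policy(features: np.ndarray) -> np.ndarray:
    """Map an (N, FEATURE_DIM) array to N confidence scores in (0, 1)."""
    hidden = np.tanh(features @ W1 + b1)
    logits = (hidden @ W2 + b2).squeeze(-1)
    return 1.0 / (1.0 + np.exp(-logits))


def elect_segment(confidence: np.ndarray, threshold: float = 0.5):
    """Return the index of the single highest-scoring segment, or None.

    The threshold is an assumed design choice: if no segment is confident
    enough, no re-split happens in this step.
    """
    best = int(np.argmax(confidence))
    return best if confidence[best] > threshold else None


# The same weights handle 5 segments and 8 segments without any change.
for n_segments in (5, 8):
    feats = rng.random((n_segments, FEATURE_DIM))
    conf = shared_policy(feats)
    print(n_segments, conf.shape, elect_segment(conf))
```

Because the network is applied row by row, permuting the segments permutes the scores in the same way, which is one way to read the size-invariance claim.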

What carries the argument

The shared neural policy network within the Multi-Agent Shared-Policy Markov Decision Process that maps solver internal states to re-splitting actions for homogeneous trajectory-segment agents.

Load-bearing premise

That the policy-driven re-splitting of only the most stagnating segment will consistently speed global convergence without creating new delays or instability in the ADMM iterations.

What would settle it

An experiment on a constrained motion-planning instance in which the adaptive re-splitting procedure increases the total number of ADMM iterations required to reach the same solution tolerance.

Figures

Figures reproduced from arXiv: 2604.22715 by Chao Xu, Fei Gao, Guodong Liu, Jiajun Yu, Li Wang, Pengxiang Zhou, Wentao Liu, Yanjun Cao, Yin He.

**Figure 1.** Real-world quadrotor navigation through a forest environment. The trajectory color encodes segment density from low (blue) to high (red). The inset illustrates adaptive re-splitting: the stagnating segment P1P2 is subdivided by inserting intermediate waypoints S1–S4, injecting local degrees of freedom to accelerate convergence.

**Figure 2.** Illustration of the re-splitting mechanism. When a trajectory segment stagnates in a highly constrained region (pink), the policy πθ triggers a re-splitting. The newly introduced sub-segments (green) inject local degrees of freedom, enabling faster convergence.

**Figure 3.** System architecture. The system consists of four modules: (A) Parallel Environments, (B) Decision Making via Shared Policy Inference and Confidence-Based Election, (C) Trajectory Re-splitting and ADMM Solving, and (D) Centralized TD3 Policy Update. During training (solid blue lines), A produces parallel rollouts that flow through B and C, with transitions stored in the replay buffer for off-policy updates …

**Figure 4.** Overview of the benchmark environments. (a) Map A: Sparse obstacle distribution generated by 3D Perlin noise (used for training). (b) Map B: A highly cluttered environment containing 2,500 random cubic obstacles. (c) Map C: A structured environment with 120 irregular polygonal cells connected by narrow passages.

**Figure 5.** Iteration distribution across unseen environments. ATRS (gray) exhibits a lower median and less variance than TOP (brown) in all three scenarios.

**Figure 6.** Ablation study on Map A. Left axis shows ADMM iterations (bars); right axis shows success rate (line).

**Figure 7.** Real-world experiments. (a) Global trajectory (85 m, 20 segments) planned on a pre-built point-cloud map. ATRS reduces iterations from 1007 to 436. (b) Online replanning in a sparse region; no split occurs. (c) Online replanning near dense obstacles; orange dots mark the boundaries of the adaptively inserted segment.
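The subdivision described in the captions of Figures 1 and 2, where a stagnating segment P1P2 receives intermediate waypoints S1–S4, can be sketched at the waypoint level. This is a simplified illustration under stated assumptions: the function name `resplit_segment` is hypothetical, the new waypoints are placed at even spacing on the straight line between the endpoints, and the polynomial coefficients and ADMM variables that a real solver would also need to re-initialize are ignored. How the paper actually places the new waypoints is not stated in this summary.

```python
import numpy as np


def resplit_segment(waypoints, index, n_new):
    """Insert n_new evenly spaced waypoints into segment `index`.

    Segment `index` runs from waypoints[index] to waypoints[index + 1].
    Even spacing on a straight line is an illustrative assumption.
    """
    waypoints = np.asarray(waypoints, dtype=float)
    if not 0 <= index < len(waypoints) - 1:
        raise IndexError("segment index out of range")
    if n_new < 1:
        raise ValueError("n_new must be at least 1")
    start, end = waypoints[index], waypoints[index + 1]
    # Interior fractions along the segment, excluding both endpoints.
    fractions = np.arange(1, n_new + 1) / (n_new + 1)
    inserted = start + fractions[:, None] * (end - start)
    return np.vstack([waypoints[: index + 1], inserted, waypoints[index + 1:]])


# Two segments (three waypoints); segment 0 plays the role of P1P2.
path = [[0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [5.0, 5.0, 0.0]]
new_path = resplit_segment(path, index=0, n_new=4)
print(len(path) - 1, "segments before,", len(new_path) - 1, "segments after")
```

With four inserted waypoints, the single segment becomes five shorter ones, so the total segment count grows from two to six. That growth is why the decision policy has to cope with a changing number of segments.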

Original abstract

Parallel trajectory optimization via the Alternating Direction Method of Multipliers (ADMM) has emerged as a scalable approach to long-horizon motion planning. However, existing frameworks typically decompose the problem into parallel subproblems based on a predefined fixed structure. Such structural rigidity often causes optimization stagnation in highly constrained regions, where a few lagging subproblems delay global convergence. A natural remedy is to adaptively re-split these stagnating segments online. Yet, deciding when, where, and how to split exceeds the capability of rule-based heuristics. To this end, we propose ATRS, a novel framework that embeds a shared Deep Reinforcement Learning policy into the parallel ADMM loop. We formulate this adaptive adjustment as a Multi-Agent Shared-Policy Markov Decision Process, where all trajectory segments act as homogeneous agents and share a unified neural policy network. This parameter-sharing architecture endows the system with size invariance, enabling it to handle dynamically changing segment counts during re-splitting and generalize to arbitrary trajectory lengths. Furthermore, our formulation inherently supports zero-shot generalization to unseen environments, as our network relies solely on the internal states of the numerical solver rather than on the geometric features of the environment. To ensure solver stability, a Confidence-Based Election mechanism selects only the most stagnating segment for re-splitting at each step. Extensive simulations demonstrate that ATRS accelerates convergence, reducing the number of iterations by up to 26.0% and the computation time by up to 19.1%. Real-world experiments further confirm its applicability to both large-scale offline global planning and real-time onboard replanning within 35 ms per cycle, with no sim-to-real degradation.
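For readers less familiar with ADMM, the following toy example shows what "parallel subproblems" and "internal states of the numerical solver" can look like. It is a standard scaled-form global consensus ADMM on scalar quadratics, not the paper's trajectory formulation: the function name `consensus_admm`, the cost terms, and the parameter values are assumptions chosen so that every update has a closed form. Each subproblem minimizes 0.5 * a_i * (x - b_i)^2 and all copies must agree on one value z.

```python
import numpy as np


def consensus_admm(a, b, rho=1.0, iters=100):
    """Scaled-form consensus ADMM for min_x sum_i 0.5 * a_i * (x - b_i)^2.

    Returns the consensus value and a per-iteration history of
    (per-subproblem primal residuals, dual residual magnitude).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)  # local copies, one per subproblem
    u = np.zeros_like(b)  # scaled dual variables
    z = 0.0               # consensus variable
    history = []
    for _ in range(iters):
        # Local solves: independent of each other, so they can run in parallel.
        x = (a * b + rho * (z - u)) / (a + rho)
        z_prev = z
        z = float(np.mean(x + u))      # consensus (averaging) step
        u = u + x - z                  # dual update
        primal = np.abs(x - z)         # disagreement of each copy with z
        dual = rho * abs(z - z_prev)   # movement of the consensus value
        history.append((primal.copy(), dual))
    return z, history


z, history = consensus_admm(a=[1.0, 2.0, 4.0], b=[0.0, 1.0, 4.0], rho=2.0)
print("consensus value:", z)
print("primal residuals after 5 iterations:", history[4][0])
```

The minimizer of this toy problem is the weighted mean of the b_i with weights a_i. The per-subproblem primal residuals recorded in `history` are the kind of geometry-free signal a re-splitting policy could consume; a subproblem whose residual decays more slowly than the others is a rough analogue of a lagging segment.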

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

ATRS trains a shared RL policy to pick which ADMM trajectory segment to re-split next, and the tests show up to 26% fewer iterations with real-robot timing under 35 ms.


The paper's main contribution is embedding a parameter-shared DRL policy inside the ADMM loop so that trajectory segments act as agents and decide online whether and where to re-split when some subproblems lag. The policy takes only solver-internal quantities such as primal and dual residuals, which gives size invariance and the claimed zero-shot transfer to new environments without retraining on geometry. They add a confidence-based election step that picks only the single most-stagnating segment, which is meant to keep the parallel solves stable.

Simulations report an iteration drop of up to 26.0% and a computation-time cut of up to 19.1%, and the real-robot section shows both offline global plans and onboard replanning that stays inside 35 ms per cycle with no obvious sim-to-real gap. That hardware timing result is the part that stands out for anyone who actually deploys these planners. The approach is straightforward to understand once you see the MDP formulation and the shared-network trick.

The weakest part is the lack of clear ablations showing whether the learned policy beats a simple rule or a random re-split on worst-case runs. Without those, it is still possible that a re-split occasionally creates new dual-variable jumps or forces extra iterations elsewhere, even if the average numbers look good. The paper does report multiple scenarios and real hardware, but the tables would need to include failure rates or variance across seeds to make the stability claim fully convincing.

This is useful reading for people who already work with ADMM or other parallel trajectory optimizers in robotics. If you run long-horizon planning on robots and care about iteration count, the integration details are worth seeing. I would send it out for peer review; the empirical side on hardware is concrete enough that referees can check the numbers directly.

Referee Report

2 major / 1 minor

Summary. The paper proposes ATRS, a framework embedding a shared Deep Reinforcement Learning policy into parallel ADMM trajectory optimization to adaptively re-split stagnating segments. It models the problem as a Multi-Agent Shared-Policy MDP with parameter sharing for size invariance and zero-shot generalization to unseen environments using only solver-internal states. A Confidence-Based Election mechanism selects the most stagnating segment for re-splitting to maintain stability. Simulations report up to 26.0% fewer iterations and up to 19.1% less computation time, with real-world validation for offline planning and real-time replanning within 35 ms per cycle.

Significance. If the empirical claims hold with proper controls, ATRS offers a scalable way to overcome fixed-decomposition stagnation in parallel motion planning, with the parameter-sharing architecture providing a clear advantage for variable-length trajectories and environment-agnostic operation. The integration of RL directly into the solver loop is a practical contribution to robotics optimization.

major comments (2)

[Abstract] The headline performance claims (26.0% iteration reduction, 19.1% time reduction) are presented without reference to the specific baselines, number of trials, variance, or statistical tests used; this information is load-bearing for attributing gains to the learned policy rather than re-splitting in general.

[Experimental evaluation] No ablation replacing the shared policy with a random or null re-split baseline is reported, nor is a worst-case failure rate or oscillation analysis provided for the Confidence-Based Election; without this, it remains unclear whether adaptive re-splitting consistently avoids new stagnation or dual-variable discontinuities in ADMM.

minor comments (1)

[Abstract] The statement of 'no sim-to-real degradation' would be strengthened by explicit metrics (e.g., success rate, cost difference) used to quantify it.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment point by point below, indicating where revisions will be made to improve clarity and rigor.


Referee: [Abstract] The headline performance claims (26.0% iteration reduction, 19.1% time reduction) are presented without reference to the specific baselines, number of trials, variance, or statistical tests used; this information is load-bearing for attributing gains to the learned policy rather than re-splitting in general.

Authors: We agree that the abstract would benefit from explicit context on the performance claims. The reported reductions are measured relative to standard parallel ADMM with fixed decomposition. In the revised manuscript, we will update the abstract to name this baseline, note that full details on the number of trials, variance, and statistical tests appear in the experimental evaluation, and clarify that gains are attributed to the learned policy through controlled comparisons. This revision ensures the abstract is self-contained while preserving the manuscript's emphasis on the experimental section.

Revision: yes

Referee: [Experimental evaluation] No ablation replacing the shared policy with a random or null re-split baseline is reported, nor is a worst-case failure rate or oscillation analysis provided for the Confidence-Based Election; without this, it remains unclear whether adaptive re-splitting consistently avoids new stagnation or dual-variable discontinuities in ADMM.

Authors: We acknowledge that additional controls would strengthen the attribution of benefits to the learned policy and the stability of the election mechanism. Our current results compare ATRS to fixed-decomposition ADMM and rule-based heuristics, but we will add a random re-split ablation in the revised experimental section to isolate the policy's contribution. We will also incorporate an analysis of the Confidence-Based Election, reporting worst-case failure rates (e.g., fraction of trials where re-splitting increases iterations) and checking for oscillations or dual-variable discontinuities via post-re-split residual monitoring across all trials. These additions directly address concerns about new stagnation.

Revision: yes
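The worst-case failure rate mentioned in this response, the fraction of trials where re-splitting increases iterations, is simple to compute from paired runs. The sketch below is one possible way to do it. The function name `re_split_failure_stats` is hypothetical, and the four-trial iteration counts are made-up numbers used only to exercise the code, not results from the paper. The single pair 1007 and 436 is the one figure taken from the source (the Figure 7 caption).

```python
def re_split_failure_stats(baseline_iters, adaptive_iters):
    """Summarize paired trials of fixed-decomposition vs. adaptive runs.

    failure_rate: fraction of trials where the adaptive run needed MORE
    iterations than the baseline on the same problem instance.
    worst_reduction / best_reduction: extremes of the relative iteration
    reduction (baseline - adaptive) / baseline; negative means a slowdown.
    """
    if len(baseline_iters) != len(adaptive_iters) or not baseline_iters:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(baseline_iters, adaptive_iters))
    worse = sum(1 for base, adapt in pairs if adapt > base)
    reductions = [(base - adapt) / base for base, adapt in pairs]
    return {
        "failure_rate": worse / len(pairs),
        "worst_reduction": min(reductions),
        "best_reduction": max(reductions),
    }


# Made-up iteration counts for four paired trials (illustration only).
print(re_split_failure_stats([100, 120, 80, 200], [80, 130, 80, 150]))

# The one pair reported in the Figure 7 caption: 1007 iterations down to 436.
print(re_split_failure_stats([1007], [436]))
```

For the Figure 7 pair, the relative reduction is (1007 - 436) / 1007, roughly 56.7%. Reporting the minimum reduction alongside the failure rate separates "rarely worse" from "occasionally much worse", which is the distinction the referee's stability concern turns on.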

Circularity Check

0 steps flagged

No circularity: empirical RL-solver integration with external validation


The paper introduces ATRS as a practical embedding of a shared DRL policy into ADMM iterations for adaptive re-splitting. All reported gains (up to 26.0% fewer iterations, up to 19.1% less computation time) and the zero-shot claim are presented as outcomes of simulation and real-world experiments rather than as quantities derived from the method's own equations or fitted parameters. No self-definitional loops, fitted inputs presented as predictions, or load-bearing self-citations appear in the abstract or the described framework; the policy network, the Confidence-Based Election, and size invariance are design decisions whose net benefit is tested externally. The argument is therefore not circular: its claims are checked against benchmarks that do not depend on the fitted values.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on empirical effectiveness of the trained policy and stability of re-splitting; no first-principles derivation is provided.

free parameters (1)

Shared neural policy network weights
Learned via reinforcement learning on internal solver states; specific architecture and training details not given in abstract.

axioms (2)

domain assumption: Internal states of the ADMM numerical solver are sufficient to decide beneficial re-splitting actions without access to environment geometry.
Invoked to support the zero-shot generalization claim.

domain assumption: Selective re-splitting of only the most stagnating segment via Confidence-Based Election preserves overall ADMM convergence properties.
Required for solver stability during adaptive adjustments.


Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages

[1] A. Fascista, “Toward integrated large-scale environmental monitoring using WSN/UAV/crowdsensing: A review of applications, signal processing, and future perspectives,” Sensors, vol. 22, no. 5, p. 1824, 2022.

[2] H. Dong, X. Ma, and S. Zhang, “Multi-flight path planning for a single agricultural drone in a regular farmland area,” Sustainability, vol. 17, no. 6, p. 2433, 2025.

[3] M. A. Patterson and A. V. Rao, “GPOPS-II: A MATLAB software for solving multiple-phase optimal control problems using hp-adaptive Gaussian quadrature collocation methods and sparse nonlinear programming,” ACM Transactions on Mathematical Software (TOMS), vol. 41, no. 1, pp. 1–37, 2014.

[4] J. Schulman, J. Ho, A. X. Lee, I. Awwal, H. Bradlow, and P. Abbeel, “Finding locally optimal, collision-free trajectories with sequential convex optimization,” in Robotics: Science and Systems, vol. 9, no. 1. Berlin, Germany, 2013, pp. 1–10.

[5] Z. Wang, X. Zhou, C. Xu, and F. Gao, “Geometrically constrained trajectory optimization for multicopters,” IEEE Transactions on Robotics, vol. 38, no. 5, pp. 3259–3278, 2022.

[6] C. Wang, J. Bingham, and M. Tomizuka, “Trajectory splitting: A distributed formulation for collision avoiding trajectory optimization,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021, pp. 8113–8120.

[7] J. Yu, N. Chen, G. Liu, C. Xu, F. Gao, and Y. Cao, “TOP: Trajectory optimization via parallel optimization towards constant time complexity,” IEEE Robotics and Automation Letters, 2025.

[8] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms. MIT Press, 2022.

[9] R. Sambharya, G. Hall, B. Amos, and B. Stellato, “Learning to warm-start fixed-point optimization algorithms,” Journal of Machine Learning Research, vol. 25, no. 166, pp. 1–46, 2024.

[10] J. Ichnowski, P. Jain, B. Stellato, G. Banjac, M. Luo, F. Borrelli, J. E. Gonzalez, I. Stoica, and K. Goldberg, “Accelerating quadratic optimization with reinforcement learning,” Advances in Neural Information Processing Systems, vol. 34, pp. 21043–21055, 2021.

[11] D. Mellinger and V. Kumar, “Minimum snap trajectory generation and control for quadrotors,” in IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2011, pp. 2520–2525.

[12] Z. Xu, X. Han, H. Shen, H. Jin, and K. Shimada, “NavRL: Learning safe flight in dynamic environments,” IEEE Robotics and Automation Letters, 2025.

[13] Y. Chen, J. Li, W. Qin, Y. Hua, X. Dong, and Q. Li, “Learning to initialize trajectory optimization for vision-based autonomous flight in unknown environments,” in 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025, pp. 9525–9532.

[14] Y. Wu, X. Sun, I. Spasojevic, and V. Kumar, “Deep learning for optimization of trajectories for quadrotors,” IEEE Robotics and Automation Letters, vol. 9, no. 3, pp. 2479–2486, 2024.

[15] W. Huang, I. Mordatch, and D. Pathak, “One policy to control them all: Shared modular policies for agent-agnostic control,” in International Conference on Machine Learning. PMLR, 2020, pp. 4455–4464.

[16] R. Lowe, Y. I. Wu, A. Tamar, J. Harb, O. Pieter Abbeel, and I. Mordatch, “Multi-agent actor-critic for mixed cooperative-competitive environments,” Advances in Neural Information Processing Systems, vol. 30, 2017.

[17] M. L. Littman, “Markov games as a framework for multi-agent reinforcement learning,” in Machine Learning Proceedings 1994. Elsevier, 1994, pp. 157–163.

[18] S. Fujimoto, H. Hoof, and D. Meger, “Addressing function approximation error in actor-critic methods,” in International Conference on Machine Learning. PMLR, 2018, pp. 1587–1596.

[19] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein et al., “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine Learning, vol. 3, no. 1, pp. 1–122, 2011.

[20] W. Xu, Y. Cai, D. He, J. Lin, and F. Zhang, “FAST-LIO2: Fast direct LiDAR-inertial odometry,” IEEE Transactions on Robotics, vol. 38, no. 4, pp. 2053–2073, 2022.

[21] Y. Ren, F. Zhu, G. Lu, Y. Cai, L. Yin, F. Kong, J. Lin, N. Chen, and F. Zhang, “Safety-assured high-speed navigation for MAVs,” Science Robotics, vol. 10, no. 98, p. eado6187, 2025.
