Diffusion-based 4D Trajectory Prediction and Distributed Control for UAV Swarms

Haoang Li; Hongliang Lu; Tianshun Li; Xinhu Zheng

arxiv: 2606.31197 · v1 · pith:AP6KLWSZnew · submitted 2026-06-30 · 💻 cs.RO

Tianshun Li, Hongliang Lu, Haoang Li, Xinhu Zheng

Pith reviewed 2026-07-01 05:31 UTC · model grok-4.3

classification 💻 cs.RO

keywords UAV swarm · 4D trajectory prediction · diffusion model · distributed nonlinear model predictive control · formation control · residual dynamics · real-time control

The pith

A diffusion model that refines axis-wise forecasts supplies the uncertainty estimates needed for real-time distributed nonlinear model predictive control of UAV swarms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that UAV swarms can predict and track four-dimensional trajectories more accurately by first forecasting motion separately along each axis and then using a diffusion process to correct the remaining time-correlated errors. These corrected predictions are fed directly into a distributed controller that keeps the vehicles in formation while respecting real-time limits. A reader would care because reliable swarm flight in cluttered low-altitude spaces would open practical uses such as coordinated inspection or delivery without constant human oversight. The authors also release a new synchronized dataset covering six different airspace scenarios to support this kind of work.

Core claim

The authors claim that a dimension-decoupled coarse-to-fine forecaster combined with a diffusion-based residual dynamics refinement module, when inserted into an uncertainty-aware distributed nonlinear model predictive control loop, produces formation-stable trajectories whose average tracking error falls below 0.07 meters at 34 frames per second in urban and industrial settings.

What carries the argument

The diffusion-based residual dynamics refinement module, which models the remaining temporally correlated uncertainties after an initial axis-wise forecast and supplies those corrections to the distributed controller.
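To make the refinement idea concrete, here is a minimal sketch of DDPM-style ancestral sampling (the sampler from Denoising Diffusion Probabilistic Models) applied to a one-axis residual sequence. It is not the authors' implementation: the noise schedule, the `oracle_eps` stand-in for a trained denoiser, the sample values, and the assumption that the residual is simply added to the coarse forecast are all illustrative.

```python
import math
import random

def linear_betas(steps, beta_start=1e-4, beta_end=2e-2):
    """Linear noise schedule (illustrative values, not taken from the paper)."""
    if steps == 1:
        return [beta_start]
    return [beta_start + (beta_end - beta_start) * i / (steps - 1)
            for i in range(steps)]

def sample_residual(eps_model, horizon, steps=50, seed=0):
    """DDPM-style ancestral sampling of a length-`horizon` residual sequence."""
    rng = random.Random(seed)
    betas = linear_betas(steps)
    alpha_bars, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b                  # cumulative product of alphas
        alpha_bars.append(prod)
    x = [rng.gauss(0.0, 1.0) for _ in range(horizon)]   # start from pure noise
    for t in reversed(range(steps)):
        beta, abar = betas[t], alpha_bars[t]
        eps = eps_model(x, t, abar)      # predicted noise at step t
        coef = beta / math.sqrt(1.0 - abar)
        mean = [(xi - coef * ei) / math.sqrt(1.0 - beta)
                for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(beta)      # add noise on all but the final step
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x

def oracle_eps(target):
    """Hypothetical stand-in for a trained network: exact noise for a known target."""
    def model(x, t, abar):
        return [(xi - math.sqrt(abar) * ri) / math.sqrt(1.0 - abar)
                for xi, ri in zip(x, target)]
    return model

coarse_x = [0.0, 0.5, 1.0, 1.5]            # hypothetical coarse forecast, one axis (m)
true_residual = [0.02, 0.03, 0.05, 0.04]   # hypothetical time-correlated error (m)
residual = sample_residual(oracle_eps(true_residual), horizon=len(coarse_x))
refined_x = [c + r for c, r in zip(coarse_x, residual)]  # assumed additive correction
```

Because the stand-in denoiser knows the target, the sampler recovers it; a trained network would only approximate it, and the size of that approximation error is what an ablation would need to quantify.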

If this is right

Trajectory tracking error drops by up to 10-15 percent relative to state-of-the-art baselines while per-frame latency stays below 30 ms.
Formation stability holds across six distinct airspace scenarios using the released synchronized three-UAV dataset.
Real-time inference at 34 FPS remains feasible for agile flight in complex environments.
The same prediction-control loop can be applied to other multi-vehicle tasks that require handling dynamic uncertainties.
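The frame-rate and latency figures in this list can be cross-checked with simple arithmetic. A minimal sketch, assuming one inference must fit within one frame period (the variable names are illustrative, not from the paper):

```python
# Cross-check: does 34 FPS imply a per-frame budget under 30 ms?
fps = 34                          # reported inference rate
frame_period_ms = 1000.0 / fps    # time available per frame, in milliseconds

print(f"{frame_period_ms:.2f} ms per frame")  # 29.41 ms per frame
assert frame_period_ms < 30.0     # consistent with the "sub-30 ms" claim
```

The margin is under 1 ms, so the two numbers are consistent but leave little headroom for anything else that has to run inside the same frame.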

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The axis-wise decoupling step may reduce the cost of extending the method to larger swarms without a proportional rise in computation.
If the released dataset is used for training other predictors, the diffusion refinement step could be tested as a plug-in module for existing forecasters.
The approach implicitly assumes that the dominant uncertainties are temporally correlated rather than spatially correlated across vehicles; relaxing that assumption would require a different refinement architecture.
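One way to probe the temporal-versus-spatial assumption is to compare how strongly a vehicle's residuals correlate with their own past against how strongly they correlate with another vehicle's residuals. The sketch below is only an illustration: the residual values are made up, and the helper names `pearson` and `lag1_autocorr` are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def lag1_autocorr(r):
    """Correlation between a residual sequence and itself shifted by one step."""
    return pearson(r[:-1], r[1:])

# Hypothetical per-step prediction residuals (metres) for two UAVs.
uav_a = [0.01, 0.02, 0.03, 0.04, 0.05, 0.04, 0.03, 0.02]
uav_b = [0.03, 0.01, 0.04, 0.02, 0.05, 0.01, 0.04, 0.02]

temporal = lag1_autocorr(uav_a)   # high if errors drift smoothly over time
spatial = pearson(uav_a, uav_b)   # high if vehicles share the same disturbance
```

If `spatial` were comparable to `temporal` on real data, a per-vehicle refinement module would be ignoring a large share of the structure in the errors.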

Load-bearing premise

The diffusion-based residual dynamics refinement module must meaningfully reduce prediction error beyond the coarse-to-fine forecasting step alone.

What would settle it

Ablating the diffusion refinement module and measuring whether average tracking error rises above 0.07 m or the reported 10-15 percent improvement disappears on the same test sequences.
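The check described here reduces to two numbers per test set: the relative error reduction and whether the full model stays under the 0.07 m threshold. A minimal sketch with made-up per-sequence errors (none of these values come from the paper, and `relative_reduction` is a hypothetical helper):

```python
from statistics import mean

def relative_reduction(baseline_err, method_err):
    """Fractional drop in mean error going from baseline to method."""
    b, m = mean(baseline_err), mean(method_err)
    return (b - m) / b

# Hypothetical per-sequence average tracking errors in metres (not paper data).
without_diffusion = [0.080, 0.075, 0.082, 0.078]
with_diffusion = [0.068, 0.066, 0.070, 0.067]

gain = relative_reduction(without_diffusion, with_diffusion)
below_threshold = mean(with_diffusion) < 0.07   # the sub-0.07 m criterion
```

Both runs must use the same test sequences; otherwise the difference mixes the module's effect with scenario difficulty.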

Figures

Figures reproduced from arXiv: 2606.31197 by Haoang Li, Hongliang Lu, Tianshun Li, Xinhu Zheng.

**Figure 2.** Framework of the proposed distributed trajectory prediction and control scheme for UAV swarms. Future trajectories are first predicted in a decoupled …

**Figure 3.** Visualization of real-world 4D UAV swarm trajectories in one urban scenario in the dataset. Each trajectory is parameterized by time and represented …

**Figure 4.** Convergence comparison of the inter-UAV distance dynamics under …

**Figure 5.** Steady-state tracking error and solver time during a hovering task …

**Figure 6.** t-SNE visualization of residual dynamics …

Original abstract

Accurate 4D trajectory prediction and closed-loop tracking are essential for Unmanned Aerial Vehicle (UAV) swarms to achieve safe and efficient operations in complex low-altitude environments such as urban airspaces, industrial sites, and indoor facilities. However, this task remains challenging due to the intrinsic nonlinearity of UAV swarm dynamics and the strict real-time constraints of swarm formation control. To address these challenges, we propose a unified framework that couples coarse-to-fine trajectory forecasting with uncertainty-aware Distributed Nonlinear Model Predictive Control (DNMPC). Our approach features two key innovations: 1) a dimension-decoupled trajectory prediction module that reduces computational complexity by forecasting axis-wise motion, and 2) a diffusion-based residual dynamics refinement module that captures temporally correlated dynamic uncertainties. These refined predictions are then integrated into a DNMPC loop to ensure formation stability. We also introduce a synchronized multi-scenario 4D UAV swarm dataset spanning six representative airspace scenarios. The dataset contains over 7,900 frames of synchronized three-UAV trajectories with frame-level annotations of speed intention and target sector. Extensive experiments demonstrate that our approach outperforms state-of-the-art baselines, reducing trajectory tracking error by up to 10-15% and achieving sub-0.07 m average tracking error in complex urban and industrial environments, while maintaining real-time inference speeds of 34 FPS (sub-30 ms latency) suitable for agile flight.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper pairs axis-decoupled 4D forecasting with a diffusion residual module inside DNMPC for UAV swarms and releases a new multi-scenario dataset, but supplies no ablation that isolates what the diffusion step actually contributes to the reported 10-15% error drop.

The core of this work is a concrete pipeline: forecast each axis separately to keep compute down, run a diffusion model on the residuals to capture time-correlated uncertainties, then close the loop with distributed nonlinear MPC for three-UAV formations. They also ship a synchronized dataset of over 7,900 frames across six airspace scenarios with speed-intention and target-sector labels. That dataset and the end-to-end real-time claim (34 FPS, sub-0.07 m tracking error) are the parts that could be useful to other groups.

The integration itself is not revolutionary on its own, but the specific combination for swarm control in tight airspace is a reasonable engineering move. The performance numbers, if they hold up under scrutiny, would be worth noting for anyone doing online prediction-plus-control.

The main gap is the missing ablation. The abstract presents the diffusion module as one of the two key innovations that captures the uncertainties driving the error reduction, yet there is no table or delta showing performance with the module turned off. Without that, it is impossible to tell whether the decoupled forecaster already carries most of the gain or whether the diffusion step is load-bearing. The abstract also gives no equations, no baseline descriptions, and no validation protocol, so the soundness of the 10-15% claim cannot be checked from what is provided.

This paper is for robotics and control researchers who work on UAV swarm prediction and formation control. A reader who needs a new multi-UAV dataset or is already running similar DNMPC loops could extract practical value even if the diffusion contribution remains unproven.

It is worth sending to peer review. The dataset and the focused application give it enough substance to justify referee time, provided the authors are asked to add the ablation and more method detail.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes a unified framework for 4D trajectory prediction and distributed control of UAV swarms. It combines a dimension-decoupled coarse-to-fine forecasting module with a diffusion-based residual dynamics refinement module, which is integrated into a Distributed Nonlinear Model Predictive Control (DNMPC) scheme. A new dataset of over 7,900 frames across six scenarios is introduced, and the approach is claimed to achieve up to 10-15% reduction in trajectory tracking error, sub-0.07 m average error, and real-time performance at 34 FPS.

Significance. If the empirical claims are substantiated, this work could offer a practical advancement in real-time UAV swarm coordination under uncertainty, with potential applications in urban airspaces and industrial settings. The provision of a new multi-scenario dataset represents a useful contribution to the community for benchmarking.

major comments (1)

[Abstract] The abstract presents the diffusion-based residual dynamics refinement module as one of the two key innovations that 'captures temporally correlated dynamic uncertainties,' yet supplies no ablation study, error delta, or statistical test demonstrating its isolated contribution to the reported 10-15% tracking error reduction or sub-0.07 m performance. Without this, it remains possible that the dimension-decoupled forecaster alone drives the gains, rendering the diffusion module non-load-bearing for the DNMPC integration claim.

minor comments (2)

The abstract references 'state-of-the-art baselines' and 'extensive experiments' but omits any description of the baselines, validation protocol, error-bar details, or dataset access information, which prevents verification of the performance numbers.
The dataset is described as containing 'over 7,900 frames of synchronized three-UAV trajectories' with 'frame-level annotations of speed intention and target sector'; providing the exact frame count, per-scenario breakdown, and annotation format would improve reproducibility.
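To illustrate what an explicit annotation format and per-scenario breakdown could look like, here is a purely hypothetical record layout. The abstract does not describe the real schema, so every field name, label value, and sample row below is an assumption made for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SwarmFrame:
    """One synchronized frame; every field name here is hypothetical."""
    scenario: str                                # one of the six airspace scenarios
    t: float                                     # timestamp in seconds (the 4th dimension)
    positions: List[Tuple[float, float, float]]  # (x, y, z) per UAV, in metres
    speed_intention: str                         # frame-level label (values assumed)
    target_sector: int                           # frame-level label, sector index (assumed)

def frames_per_scenario(frames: List[SwarmFrame]) -> Counter:
    """Per-scenario frame counts, the breakdown the referee asks for."""
    return Counter(f.scenario for f in frames)

# Made-up rows, only to exercise the helper.
demo = [
    SwarmFrame("urban", 0.0, [(0, 0, 5), (1, 0, 5), (0, 1, 5)], "hold", 2),
    SwarmFrame("urban", 0.1, [(0, 0, 5), (1, 0, 5), (0, 1, 5)], "hold", 2),
    SwarmFrame("industrial", 0.0, [(0, 0, 3), (1, 0, 3), (0, 1, 3)], "accelerate", 0),
]
counts = frames_per_scenario(demo)
```

Publishing a table of such counts alongside the exact total would replace "over 7,900 frames" with something a reader can verify.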

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment point by point below.

Referee: [Abstract] The abstract presents the diffusion-based residual dynamics refinement module as one of the two key innovations that 'captures temporally correlated dynamic uncertainties,' yet supplies no ablation study, error delta, or statistical test demonstrating its isolated contribution to the reported 10-15% tracking error reduction or sub-0.07 m performance. Without this, it remains possible that the dimension-decoupled forecaster alone drives the gains, rendering the diffusion module non-load-bearing for the DNMPC integration claim.

Authors: We agree that an explicit ablation isolating the contribution of the diffusion-based residual dynamics refinement module is necessary to substantiate its role in the reported performance improvements and to support the DNMPC integration claim. The manuscript currently demonstrates overall gains relative to baselines but does not include a dedicated ablation with error deltas and statistical tests for this module alone. We will add such an ablation study (including quantitative deltas and significance testing) to the experiments section and revise the abstract to reference these results, ensuring the claims are properly evidenced.

Revision: yes
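For the significance testing mentioned in this response, one option among several is an exact paired sign-flip permutation test on per-sequence error differences. The sketch below is an assumption about how such a test could be run, not the authors' protocol; the difference values are invented and `sign_flip_p_value` is a hypothetical helper.

```python
from itertools import product

def sign_flip_p_value(diffs):
    """Exact two-sided paired sign-flip permutation test on the mean difference."""
    observed = abs(sum(diffs))
    hits = total = 0
    # Under the null, each paired difference is equally likely to be + or -.
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:     # small tolerance for float rounding
            hits += 1
    return hits / total

# Hypothetical per-sequence differences (ablated error minus full-model error), metres.
diffs = [0.012, 0.009, 0.012, 0.011, 0.010, 0.008, 0.013, 0.009]
p = sign_flip_p_value(diffs)
```

Exact enumeration costs 2^n evaluations, so it suits a handful of test sequences; with many sequences, random sign flips would be sampled instead.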

Circularity Check

0 steps flagged

Empirical pipeline with no self-referential derivations or fitted predictions

full rationale

The paper presents a modular framework combining a dimension-decoupled forecaster, diffusion-based residual refinement, and DNMPC integration, validated through experiments on a new dataset. No equations, fitted parameters, or self-citations are shown that reduce any claimed prediction or performance metric to its own inputs by construction. The reported error reductions (10-15%, sub-0.07 m) are positioned as empirical outcomes rather than algebraic identities or renamed fits, making the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the diffusion model and DNMPC components are treated as standard building blocks whose internal hyperparameters are not enumerated.

pith-pipeline@v0.9.1-grok · 5797 in / 1369 out tokens · 39405 ms · 2026-07-01T05:31:06.188113+00:00 · methodology

[1] [1]

Aeroduo: Aerial duo for uav-based vision and language navigation,

R. Wu, Y . Zhang, J. Chen, L. Huang, S. Zhang, X. Zhou, L. Wang, and S. Liu, “Aeroduo: Aerial duo for uav-based vision and language navigation,” inProceedings of the 33rd ACM International Conference on Multimedia, ser. MM ’25. New York, NY , USA: Association for Computing Machinery, 2025, p. 2576–2585. [Online]. Available: https://doi.org/10.1145/3746027.3754498

work page doi:10.1145/3746027.3754498 2025

[2] [2]

Cooperative task allocation with si- multaneous arrival and resource constraint for multi-uav using a genetic algorithm,

F. Yan, J. Chu, J. Hu, and X. Zhu, “Cooperative task allocation with si- multaneous arrival and resource constraint for multi-uav using a genetic algorithm,”Expert Syst. Appl., vol. 245, p. 123023, 2023. [Online]. Available: https://api.semanticscholar.org/CorpusID:266666228

2023

[3] [3]

Adaptive conflict resolution for multi- uav 4d routes optimization using stochastic fractal search algorithm,

B. Pang, K. H. Low, and C. Lv, “Adaptive conflict resolution for multi- uav 4d routes optimization using stochastic fractal search algorithm,” Transportation Research Part C: Emerging Technologies, vol. 139, p. 103666, 2022

2022

[4] [4]

Trajectron++: Dynamically-feasible trajectory forecasting with heterogeneous data,

T. Salzmann, B. Ivanovic, P. Chakravarty, and M. Pavone, “Trajectron++: Dynamically-feasible trajectory forecasting with heterogeneous data,” in Proceedings of the European Conference on Computer Vision (ECCV), 2020

2020

[5] [5]

Agentformer: Agent-aware transformers for socio-temporal multi-agent forecasting,

Y . Yuan, X. Weng, Y . Ou, and K. M. Kitani, “Agentformer: Agent-aware transformers for socio-temporal multi-agent forecasting,” inProceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 9813–9823

2021

[6] [6]

Chauffeurnet: Learning to drive by imitating the best and synthesizing the worst,

M. Bansal, A. Krizhevsky, and A. Ogale, “Chauffeurnet: Learning to drive by imitating the best and synthesizing the worst,”Robotics: Science and Systems (RSS), 2019

2019

[7] [7]

Unified multi-agent trajectory modeling with masked trajectory diffusion,

S. Yang, Z. Shi, and Z. Zou, “Unified multi-agent trajectory modeling with masked trajectory diffusion,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 27 563–27 574

2025

[8] [8]

Heterogeneous-agent trajectory forecasting incorporating class uncertainty,

B. Ivanovic, K.-H. Lee, P. Tokmakov, B. Wulfe, R. T. McAllister, A. Gaidon, and M. Pavone, “Heterogeneous-agent trajectory forecasting incorporating class uncertainty,”2022 IEEE/RSJ International Confer- ence on Intelligent Robots and Systems (IROS), pp. 12 196–12 203, 2021

2022

[9] [9]

Vnagt: Variational non-autoregressive graph transformer network for multi-agent trajectory prediction,

X. Chen, H. Zhang, Y . Hu, J. Liang, and H. Wang, “Vnagt: Variational non-autoregressive graph transformer network for multi-agent trajectory prediction,”IEEE Transactions on Vehicular Technology, vol. 72, no. 10, pp. 12 540–12 552, 2023

2023

[10] [10]

Denoising Diffusion Probabilistic Models

J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,”ArXiv, vol. abs/2006.11239, 2020. [Online]. Available: https://api.semanticscholar.org/CorpusID:219955663

work page internal anchor Pith review Pith/arXiv arXiv 2006

[11] [11]

Swarmdiff: Swarm robotic trajectory planning in cluttered environments via diffusion transformer,

K. Ding, C. Jiao, Y . Hu, K. Zhou, P. Wu, Y . Mu, and C. Liu, “Swarmdiff: Swarm robotic trajectory planning in cluttered environments via diffusion transformer,”2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 4164–4173, 2025. [Online]. Available: https://api.semanticscholar.org/CorpusID:278783000

2025

[12] [12]

Motiondiffuser: Controllable multi-agent motion prediction using diffusion,

C. Jiang, A. Cornman, C. Park, B. Sapp, Y . Zhou, D. Anguelov et al., “Motiondiffuser: Controllable multi-agent motion prediction using diffusion,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 9644–9653

2023

[13] [13]

Diffusion policy: Visuomotor policy learning via action diffusion,

C. Chi, S. Feng, Y . Du, Z. Xu, E. Cousineau, B. Burchfiel, and S. Song, “Diffusion policy: Visuomotor policy learning via action diffusion,”The International Journal of Robotics Research, vol. 44, pp. 1684 – 1704, 2023. [Online]. Available: https://api.semanticscholar.org/CorpusID:257378658

2023

[14] [14]

Planning with diffu- sion for flexible behavior synthesis,

M. Janner, Y . Du, J. B. Tenenbaum, and S. Levine, “Planning with diffu- sion for flexible behavior synthesis,” inProceedings of the International Conference on Machine Learning (ICML), 2022

2022

[15] [15]

Edmp: Ensemble-of-costs-guided diffu- sion for motion planning,

K. Saha, V . Mandadi, J. Reddy, A. Srikanth, A. Agarwal, B. Sen, A. Singh, and M. Krishna, “Edmp: Ensemble-of-costs-guided diffu- sion for motion planning,” in2024 IEEE International Conference on Robotics and Automation (ICRA), 2024, pp. 10 351–10 358

2024

[16] [16]

Navidiffusor: Cost-guided diffusion model for visual navigation,

Y . Zeng, H. Ren, S. Wang, J. Huang, and H. Cheng, “Navidiffusor: Cost-guided diffusion model for visual navigation,” 2025. [Online]. Available: https://arxiv.org/abs/2504.10003

work page arXiv 2025

[17] [17]

Motiondiffuse: Text-driven human motion generation with diffusion model,

M. Zhang, Z. Cai, L. Pan, F. Hong, X. Guo, L. Yang, and Z. Liu, “Motiondiffuse: Text-driven human motion generation with diffusion model,”IEEE Transactions on Pattern Analysis and Machine Intelli- gence, vol. 46, pp. 4115–4128, 2022

2022

[18] [18]

Guided motion diffusion for controllable human motion synthesis,

K. Karunratanakul, K. Preechakul, S. Suwajanakorn, and S. Tang, “Guided motion diffusion for controllable human motion synthesis,” in 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 2023, pp. 2151–2162

2023

[19] [19]

Hierarchical diffusion policy: Manipulation trajectory generation via contact guidance,

D. Wang, C. Liu, F. Chang, and Y . Xu, “Hierarchical diffusion policy: Manipulation trajectory generation via contact guidance,”IEEE Trans- actions on Robotics, vol. 41, pp. 2086–2104, 2025

2086

[20] [20]

Trajectory prediction with latent belief energy-based model,

B. Pang, T. Zhao, X. Xie, and Y . N. Wu, “Trajectory prediction with latent belief energy-based model,”2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11 809–11 819, 2021. [Online]. Available: https://api.semanticscholar.org/CorpusID:233168683

2021

[21] [21]

Group normalization,

Y . Wu and K. He, “Group normalization,”International Journal of Computer Vision, vol. 128, pp. 742 – 755, 2018. [Online]. Available: https://api.semanticscholar.org/CorpusID:4076251

2018

[22] D. Misra, “Mish: A self regularized non-monotonic activation function,” Proceedings of the British Machine Vision Conference 2020, 2020. [Online]. Available: https://api.semanticscholar.org/CorpusID:221113156

[23] G. Stomberg, A. Engelmann, M. Diehl, and T. Faulwasser, “Decentralized real-time iterations for distributed nonlinear model predictive control,” arXiv preprint arXiv:2401.14898, 2024.

[24] G. Torrente, E. Kaufmann, P. Föhn, and D. Scaramuzza, “Data-driven mpc for quadrotors,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 3769–3776, 2021.

[25] G. Shi, X. Shi, M. O’Connell, R. Yu, K. Azizzadenesheli, A. Anandkumar, Y. Yue, and S.-J. Chung, “Neural lander: Stable drone landing control using learned dynamics,” in 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019, pp. 9784–9790.

[26] X. Liu, Y. Liu, H. Qiu, Q. Yang, and Z. Lian, “Indooruav: Benchmarking vision-language uav navigation in continuous indoor environments,” in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), vol. 40, 2026.

[27] E. Wan and R. Van Der Merwe, “The unscented kalman filter for nonlinear estimation,” in Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No.00EX373), 2000, pp. 153–158.

[28] S. Bai, J. Z. Kolter, and V. Koltun, “An empirical evaluation of generic convolutional and recurrent networks for sequence modeling,” arXiv:1803.01271, 2018.

[29] Y. Wu, L. Wang, S. Zhou, J. Duan, G. Hua, and W. Tang, “Multi-stream representation learning for pedestrian trajectory prediction,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 37, 2023, pp. 2875–2882.

[30] I. Bae, J.-H. Park, and H.-G. Jeon, “Non-probability sampling network for stochastic human trajectory prediction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.

[31] V. Kosaraju, A. Sadeghian, R. Martín-Martín, I. Reid, S. H. Rezatofighi, and S. Savarese, “Social-bigat: multimodal trajectory forecasting using bicycle-gan and graph attention networks,” Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2019.

[32] Z. Zhou, G. Huang, Z. Su, Y. Li, and W. Hua, “Dynamic attention-based cvae-gan for pedestrian trajectory prediction,” IEEE Robotics and Automation Letters, vol. 8, no. 2, pp. 704–711, 2022.

[33] S. Shi, L. Jiang, D. Dai, and B. Schiele, “Motion transformer with global intention localization and local movement refinement,” Advances in Neural Information Processing Systems, 2022.

[34] J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” ArXiv, vol. abs/2010.02502, 2020. [Online]. Available: https://api.semanticscholar.org/CorpusID:222140788

[35] J. Lee, T. Miyanishi, S. Kurita, K. Sakamoto, D. Azuma, Y. Matsuo, and N. Inoue, “Citynav: A large-scale dataset for real-world aerial navigation,” 2025. [Online]. Available: https://arxiv.org/abs/2406.14240