HumanFlow -- Diffusion-Driven MAV Navigation Among Humans via Tightly-Coupled Motion Tracking, Forecasting, and Control

Joshua Näf; Simon Schaefer; Stefan Leutenegger

arxiv: 2605.25685 · v1 · pith:7W7WTGPVnew · submitted 2026-05-25 · 💻 cs.RO

Pith reviewed 2026-06-29 21:41 UTC · model grok-4.3

classification 💻 cs.RO

keywords: latent diffusion model, human motion tracking, motion forecasting, MAV navigation, flow-matching MPC, social navigation, partial observability, 3D scene context

The pith

HumanFlow couples a latent diffusion model for human motion tracking and forecasting with a flow-matching MPC policy to enable collision-free MAV navigation under partial observability.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents HumanFlow, a latent diffusion model that unifies tracking and forecasting of human motion while conditioning on 3D scene context. This produces smooth and accurate predictions even with heavy occlusions and improves on prior methods in accuracy and computational efficiency. The model's latent representations are then used to condition a flow-matching approximate MPC policy for controlling a micro aerial vehicle. Simulation tests with real human trajectories show the integrated system achieves better navigation performance while staying collision-free despite incomplete observations of the humans.

Core claim

HumanFlow is a latent diffusion model that unifies human motion tracking and forecasting conditioned on the 3D scene context. Its latent space can be tightly coupled with control by conditioning a flow-matching-based approximate MPC policy on these representations. Validation in simulation with real human trajectories for MAV social navigation shows superior navigation performance and collision-free behavior even under partial observability of the human.

What carries the argument

Latent diffusion model for unified human motion tracking and forecasting, whose representations directly condition a flow-matching approximate MPC policy.
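
As a rough illustration of the mechanism (not the authors' implementation, which the abstract does not describe), a flow-matching policy draws an action by integrating a velocity field from noise at t=0 to an action at t=1, with the human-motion latent `z` as conditioning. In the sketch below, `toy_velocity`, `sample_action`, the matrix `W`, and the latent values are all hypothetical: the learned network is replaced by the closed-form velocity of a straight-line path toward a latent-dependent target, so the result can be checked by hand.

```python
import numpy as np

def toy_velocity(a, t, z, W):
    """Stand-in for a learned velocity field v(a, t | z).

    Assumption: instead of a neural network, use the closed-form
    conditional velocity of the straight-line path toward the
    latent-dependent target W @ z.
    """
    target = W @ z
    return (target - a) / (1.0 - t)

def sample_action(z, W, steps=8, seed=0):
    """Euler-integrate da/dt = v(a, t | z) from t=0 to t=1."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(W.shape[0])  # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # t stays below 1, so the division above is safe
        a = a + dt * toy_velocity(a, t, z, W)
    return a

z = np.array([0.5, -1.0, 2.0])           # hypothetical human-motion latent
W = np.array([[1.0, 0.0, 0.5],
              [0.0, 2.0, 0.0]])           # hypothetical latent-to-action map
action = sample_action(z, W)
```

With this toy field the sampled action lands on `W @ z` regardless of the starting noise, which makes the role of the conditioning explicit: changing the latent changes where the flow ends. A trained policy would replace `toy_velocity` with a network and would generally not have such a closed form.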

If this is right

The diffusion model produces smooth and accurate human motion predictions under heavy occlusions.
It outperforms state-of-the-art methods in tracking accuracy while running significantly faster.
The tightly coupled controller achieves superior navigation performance in simulation using real human trajectories.
The MAV remains collision-free even when human visibility is only partial.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Direct use of the same latent space for both forecasting and control may reduce the lag between perception and action in dynamic settings.
The scene-conditioned approach could transfer to navigation tasks involving multiple interacting humans or other robot platforms.
Testing the same coupling on physical MAV hardware would reveal whether simulation results hold under real sensor noise and dynamics.
Alternative ways to embed the diffusion latents into the policy might further stabilize the closed-loop behavior.

Load-bearing premise

The latent representations from the diffusion model contain sufficient information to condition the flow-matching MPC policy without losing critical dynamic constraints or causing closed-loop instability.
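
One way to probe this premise from the outside is to decode a forecast or a planned trajectory and test it against explicit speed and acceleration limits. The sketch below is only an assumption about how such a check could look: `within_bounds`, the sampling interval, and the limit values are hypothetical and do not come from the paper.

```python
import numpy as np

def within_bounds(positions, dt, v_max, a_max):
    """Finite-difference check of speed and acceleration limits.

    positions: array of shape (T, 3), sampled every dt seconds.
    Returns (speed_ok, acceleration_ok).
    """
    vel = np.diff(positions, axis=0) / dt   # shape (T-1, 3)
    acc = np.diff(vel, axis=0) / dt         # shape (T-2, 3)
    v_ok = np.linalg.norm(vel, axis=1).max() <= v_max
    a_ok = np.linalg.norm(acc, axis=1).max() <= a_max
    return bool(v_ok), bool(a_ok)

def along_x(x):
    """Build a (T, 3) trajectory that moves only along the x axis."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x, np.zeros_like(x), np.zeros_like(x)])

# Hypothetical limits: 1.5 m/s and 5 m/s^2 at a 0.1 s sampling interval.
smooth = within_bounds(along_x([0.0, 0.1, 0.2, 0.3, 0.4]), 0.1, 1.5, 5.0)
jumpy = within_bounds(along_x([0.0, 0.1, 0.2, 1.0, 1.1]), 0.1, 1.5, 5.0)
```

A check like this only detects violations after decoding; it does not show that the latent encoding itself preserves the constraints.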

What would settle it

A recorded collision between the MAV and a human during a navigation test that uses the HumanFlow controller under partial human observability.
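
Such a test needs an operational definition of a collision. A minimal sketch, assuming the human is reduced to a single point and a collision means the MAV comes closer than a fixed safety radius (the paper may well use a richer body model); `first_collision` and the numbers are hypothetical:

```python
import numpy as np

def first_collision(mav_xyz, human_xyz, safety_radius):
    """Return the first time index at which the MAV is closer than
    safety_radius to the human, or None if that never happens.

    Both trajectories are arrays of shape (T, 3) on the same time grid.
    """
    dists = np.linalg.norm(mav_xyz - human_xyz, axis=1)
    hits = np.flatnonzero(dists < safety_radius)
    return int(hits[0]) if hits.size else None

# MAV flies along the x axis; the human stands still.
mav = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [2.0, 0.0, 0.0],
                [3.0, 0.0, 0.0]])
near = np.tile([2.0, 0.3, 0.0], (4, 1))  # 0.3 m from the flight path
far = np.tile([2.0, 5.0, 0.0], (4, 1))   # 5 m from the flight path

hit = first_collision(mav, near, 0.5)
miss = first_collision(mav, far, 0.5)
```

Here `hit` is the third sample (index 2), where the MAV passes 0.3 m from the human, and `miss` is `None`.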

Figures

Figures reproduced from arXiv: 2605.25685 by Joshua Näf, Simon Schaefer, Stefan Leutenegger.

**Figure 2.** Overview of HumanFlow. First, clean human motion is encoded into a compact latent space (I). Subsequently, a latent diffusion model is trained …

**Figure 3.** Exemplary offline ground-truth trajectory generation with three …

**Figure 4.** Qualitative examples of denoised and in-filled body motion with and …

**Figure 5.** Qualitative evaluation of the motion tracking model across diverse …

Original abstract

Robust and accurate perception of humans in their 3D scene context is essential for integrating robots into everyday environments. Existing approaches, however, often fail to predict plausible and accurate human motion estimates that are consistent with the surrounding scene, especially in the presence of heavy occlusions or partial visibility. This can limit both safety and efficiency for robotic operations. We introduce HumanFlow, a latent diffusion model that unifies human motion tracking and forecasting, conditioned on the 3D scene context. We show that our human motion model produces smooth and accurate predictions under challenging conditions, including heavy occlusions, and outperforms state-of-the-art methods in tracking accuracy while being significantly more efficient. Furthermore, we show how HumanFlow's latent space can be tightly coupled with control by conditioning a flow-matching-based, approximate MPC policy on these representations. We validate our policy in simulation with real human trajectories for MAV social navigation, demonstrating superior navigation performance and remaining collision-free, even under partial observability of the human.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

HumanFlow puts tracking, forecasting, and flow-matching MPC into one latent diffusion pipeline for occluded human motion, but the abstract gives no equations, ablations, or numbers to check whether the coupling actually works.

The core move is folding human motion tracking and forecasting into a single latent diffusion model that takes 3D scene context, then feeding that latent directly into a flow-matching approximate MPC for MAV control. That integration is the main thing on offer.

It targets a real deployment issue: keeping drones safe around people when visibility drops. The abstract says the model stays accurate and smooth under heavy occlusion, beats prior trackers on accuracy and speed, and lets the controller stay collision-free in simulation with real trajectories. If the experiments hold, the tight coupling could reduce the usual hand-off errors between perception and planning.

The problem is we only have the abstract. No training details, no loss functions, no ablation tables, no error bars, and no comparison numbers appear. Without those, the "outperforms SOTA" and "significantly more efficient" claims stay uncheckable. The stress-test worry about dynamic constraints is also open: nothing shows whether the latent space keeps velocity or acceleration bounds that the MPC needs, or whether occlusion uncertainty gets passed through properly. If the diffusion step smooths too aggressively, the controller could plan on an incomplete manifold.

This is for groups already working on human-aware robot navigation or diffusion models for dynamics. A reader who wants to see a concrete end-to-end pipeline might get something from the full version, but only if it includes the missing experimental backbone.

I would send it to review if the full manuscript supplies the methods and results; the idea is practical enough to justify referee time even if revisions are needed. Right now the evidence is too thin to cite or bring to a reading group.

Referee Report

2 major / 2 minor

Summary. The paper introduces HumanFlow, a latent diffusion model that unifies 3D scene-conditioned human motion tracking and forecasting. It claims to produce smooth, accurate predictions under heavy occlusions that outperform SOTA methods in tracking accuracy while being more efficient. The model’s latent space is then tightly coupled to a flow-matching approximate MPC policy to enable collision-free MAV social navigation under partial observability, with validation on real human trajectories in simulation.

Significance. If the central claims hold, the work would offer a concrete route to tightly integrated perception-control pipelines for robots in human environments, leveraging diffusion latents for both forecasting and policy conditioning. The efficiency and occlusion robustness claims, if substantiated with ablations and error bars, would be a notable contribution to the MAV navigation literature.

major comments (2)

[Abstract] Abstract and methods description: the headline claim that the diffusion latent space can be directly conditioned into the flow-matching approximate MPC without loss of dynamic constraints or introduction of instability is load-bearing for the navigation results, yet no derivation, constraint-preservation proof, or ablation is referenced showing that velocity/acceleration bounds or occlusion uncertainty are preserved in the latent encoding.
[Validation] Validation section: the reported superior navigation performance and collision-free behavior under partial observability rest on the assumption that the latent representations transmit scene-consistent human dynamics; without ablations that isolate the effect of the latent conditioning versus a baseline MPC, it is impossible to confirm that the coupling itself is responsible for the gains rather than other implementation choices.

minor comments (2)

[Abstract] The abstract states performance gains but provides no quantitative metrics, dataset descriptions, or error bars; these should be added to the results section for reproducibility.
[Methods] Notation for the latent space and flow-matching policy should be introduced with explicit equations early in the methods to clarify the conditioning mechanism.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on the manuscript. We address each major comment below and outline the revisions we will make to strengthen the presentation of the latent-to-control coupling.

Referee: [Abstract] Abstract and methods description: the headline claim that the diffusion latent space can be directly conditioned into the flow-matching approximate MPC without loss of dynamic constraints or introduction of instability is load-bearing for the navigation results, yet no derivation, constraint-preservation proof, or ablation is referenced showing that velocity/acceleration bounds or occlusion uncertainty are preserved in the latent encoding.

Authors: We agree that an explicit derivation would improve clarity. In the revised manuscript we will add a short derivation in the methods section showing that the flow-matching approximate MPC preserves velocity and acceleration bounds when conditioned on the latent representations, together with a targeted ablation that quantifies stability under varying occlusion levels. revision: yes

Referee: [Validation] Validation section: the reported superior navigation performance and collision-free behavior under partial observability rest on the assumption that the latent representations transmit scene-consistent human dynamics; without ablations that isolate the effect of the latent conditioning versus a baseline MPC, it is impossible to confirm that the coupling itself is responsible for the gains rather than other implementation choices.

Authors: We concur that isolating the contribution of the latent conditioning is necessary. We will expand the validation section with new ablation experiments that compare the full HumanFlow-conditioned policy against an otherwise identical baseline MPC that receives only raw observations, thereby confirming that the performance gains under partial observability arise from the tight coupling. revision: yes

Circularity Check

0 steps flagged

No circularity; abstract presents empirical claims without visible derivation chain

The provided abstract and context contain no equations, parameter-fitting descriptions, self-citations, or uniqueness theorems that could reduce any prediction or result to its inputs by construction. Claims of outperforming SOTA in tracking accuracy and achieving collision-free navigation under partial observability are stated as validation outcomes without reference to internal reductions or load-bearing self-references. The work is therefore self-contained against external benchmarks on the basis of the given text, with no identifiable circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no information on free parameters, background axioms, or new postulated entities; ledger left empty.


[1]

Abien Fred Agarap. Deep learning using rectified linear units (ReLU), 2019. URL https://arxiv.org/abs/1803.08375

[2]

Haoyu Bai, Shaojun Cai, Nan Ye, David Hsu, and Wee Sun Lee. Intention-aware online POMDP planning for autonomous driving in a crowd. In 2015 IEEE International Conference on Robotics and Automation (ICRA), pages 454–460, 2015. doi: 10.1109/ICRA.2015.7139219

[3]

Fabien Baradel*, Matthieu Armando, Salma Galaaoui, Romain Brégier, Philippe Weinzaepfel, Grégory Rogez, and Thomas Lucas*. Multi-HMR: Multi-person whole-body human mesh recovery in a single shot. In ECCV, 2024

[4]

Federica Bogo, Angjoo Kanazawa, Christoph Lassner, Peter Gehler, Javier Romero, and Michael J. Black. Keep it SMPL: Automatic estimation of 3D human pose and shape from a single image. In Computer Vision – ECCV 2016, Lecture Notes in Computer Science. Springer International Publishing, October 2016

[5]

Zhe Cao, Hang Gao, Karttikeya Mangalam, Qizhi Cai, Minh Vo, and Jitendra Malik. Long-term human motion prediction with scene context. In ECCV, 2020

[6]

Hanzhi Chen, Boyang Sun, Anran Zhang, Marc Pollefeys, and Stefan Leutenegger. VidBot: Learning generalizable 3d actions from in-the-wild 2d human videos for zero-shot robotic manipulation. In CVPR, 2025

[7]

Xin Chen, Biao Jiang, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, and Gang Yu. Executing your commands via motion diffusion in latent space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 18000–18010, 2023

[8]

Richard Cheng, Gabor Orosz, Richard M. Murray, and Joel W. Burdick. End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks, 2019

[9]

Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. In RSS, 2023

[10]

Cecilia Curreli, Dominik Muhle, Abhishek Saroha, Zhenzhang Ye, Riccardo Marin, and Daniel Cremers. Non-isotropic gaussian diffusion for realistic 3d human motion prediction. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), pages 1871–1882, June 2025

[11]

Georgios Darivianakis, Kostas Alexis, Michael Burri, and Roland Siegwart. Hybrid predictive control for aerial robotic physical interaction towards inspection operations. In IEEE International Conference on Robotics and Automation (ICRA), pages 53–58, 2014. doi: 10.1109/ICRA.2014.6906589

[12]

Anca D. Dragan. Robot planning with mathematical models of human state and action, 2017. URL https://arxiv.org/abs/1705.04226

[13]

Michael Everett, Yu Fan Chen, and Jonathan P. How. Collision avoidance in pedestrian-rich environments with deep reinforcement learning. IEEE Access, 9:10357–10377, 2021. ISSN 2169-3536

[14]

Davide Falanga, Philipp Foehn, Peng Lu, and Davide Scaramuzza. PAMPC: Perception-aware model predictive control for quadrotors. In IEEE/RSJ Int. Conf. Intell. Robot. Syst. (IROS), 2018

[15]

Lin Geng Foo, Jia Gong, Hossein Rahmani, and Jun Liu. Distribution-aligned diffusion for human mesh recovery. arXiv preprint arXiv:2211.16940, 2023

[16]

David Fridovich-Keil, Sylvia L. Herbert, Jaime F. Fisac, Sampada Deglurkar, and Claire J. Tomlin. Planning, fast and slow: A framework for adaptive real-time safe trajectory planning. IEEE International Conference on Robotics and Automation, 2018

[17]

Shubham Goel, Georgios Pavlakos, Jathushan Rajasegaran, Angjoo Kanazawa*, and Jitendra Malik*. Humans in 4D: Reconstructing and tracking humans with transformers. In International Conference on Computer Vision (ICCV), 2023

[18]

Jia Gong, Lin Geng Foo, Zhipeng Fan, Qiuhong Ke, Hossein Rahmani, and Jun Liu. Diffpose: Toward more reliable 3d pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2023

[19]

Vladimir Guzov, Yifeng Jiang, Fangzhou Hong, Gerard Pons-Moll, Richard Newcombe, C. Karen Liu, Yuting Ye, and Lingni Ma. HMD2: Environment-aware motion generation from single egocentric head-mounted device. In International Conference on 3D Vision (3DV), March 2025

[20]

Dan Hendrycks and Kevin Gimpel. Gaussian error linear units (GELUs). arXiv preprint arXiv:1606.08415, 2016

[21]

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. arXiv preprint arXiv:2006.11239, 2020

[22]

Zhiming Hu, Zheming Yin, Daniel Haeufle, Syn Schmitt, and Andreas Bulling. Hoimotion: Forecasting human motion during human-object interactions using egocentric 3d object bounding boxes. IEEE Transactions on Visualization and Computer Graphics (TVCG), pages 1–11, 2024

[23]

Biao Jiang, Xin Chen, Wen Liu, Jingyi Yu, Gang Yu, and Tao Chen. Motiongpt: Human motion as a foreign language. Advances in Neural Information Processing Systems, 36, 2024

[24]

Korrawe Karunratanakul, Konpat Preechakul, Supasorn Suwajanakorn, and Siyu Tang. Guided motion diffusion for controllable human motion synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2151–2162, 2023

[25]

Bingxin Ke, Kevin Qu, Tianfu Wang, Nando Metzger, Shengyu Huang, Bo Li, Anton Obukhov, and Konrad Schindler. Marigold: Affordable adaptation of diffusion-based image generators for image analysis, 2025

[26]

Rawal Khirodkar, Shashank Tripathi, and Kris Kitani. Occluded human mesh recovery. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), pages 1705–1715, Piscataway, NJ, June 2022. IEEE. doi: 10.1109/CVPR52688.2022.00176

[28]

Muhammed Kocabas, Chun-Hao P. Huang, Otmar Hilliges, and Michael J. Black. PARE: Part attention regressor for 3D human body estimation. In Proc. International Conference on Computer Vision (ICCV), pages 11127–11137, October 2021

[29]

Dieter Kraft. A software package for sequential quadratic programming. Technical Report DFVLR-FB 88-28, Institut für Dynamik der Flugsysteme, Oberpfaffenhofen, July 1988

[30]

Henrik Kretzschmar, Markus Spies, Christoph Sprunk, and Wolfram Burgard. Socially compliant mobile robot navigation via inverse reinforcement learning. The International Journal of Robotics Research, 35(11):1289–1307, 2016. doi: 10.1177/0278364915619772

[31]

Jiahao Li, Zongxin Yang, Xiaohan Wang, Jianxin Ma, Chang Zhou, and Yi Yang. Jotr: 3d joint contrastive learning with transformers for occluded human mesh recovery. ICCV, 2023

[32]

Ruilong Li, Shan Yang, David A. Ross, and Angjoo Kanazawa. Learn to dance with AIST++: Music conditioned 3d dance generation. ICCV, 2021

[33]

Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. ICLR, 2023

[34]

Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. SMPL: A skinned multi-person linear model. ACM Trans. Graphics (Proc. SIGGRAPH Asia), 34(6):248:1–248:16, October 2015

[35]

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7

[36]

Naureen Mahmood, Nima Ghorbani, Nikolaus F. Troje, Gerard Pons-Moll, and Michael J. Black. AMASS: Archive of motion capture as surface shapes. In International Conference on Computer Vision, pages 5442–5451, October 2019

[37]

Pau Marquez Julbe, Julian Nubert, Henrik Hose, Sebastian Trimpe, and Katherine J. Kuchenbecker. Diffusion-based approximate MPC: Fast and consistent imitation of multi-modal action distributions. In IROS, 2025

[38]

I.M. Mitchell, A.M. Bayen, and C.J. Tomlin. A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games. IEEE Transactions on Automatic Control, 50(7):947–957, 2005. doi: 10.1109/TAC.2005.851439

[39]

Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. Pytorch: an imperative style, high-perf...

[40]

Georgios Pavlakos, Vasileios Choutas, Nima Ghorbani, Timo Bolkart, Ahmed A. A. Osman, Dimitrios Tzionas, and Michael J. Black. Expressive body capture: 3d hands, face, and body from a single image. In Proceedings IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2019

[41]

Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, and Aaron C. Courville. Film: Visual reasoning with a general conditioning layer. In AAAI, 2018

[42]

Davis Rempe, Tolga Birdal, Aaron Hertzmann, Jimei Yang, Srinath Sridhar, and Leonidas J. Guibas. HuMoR: 3d human motion model for robust pose estimation. In International Conference on Computer Vision (ICCV), 2021

[43]

Tim Salzmann, Boris Ivanovic, Punarjay Chakravarty, and Marco Pavone. Trajectron++: Dynamically-feasible trajectory forecasting with heterogeneous data. In Andrea Vedaldi, Horst Bischof, Thomas Brox, and Jan-Michael Frahm, editors, Computer Vision – ECCV 2020, pages 683–700, Cham, 2020. Springer International Publishing

[44]

Sepehr Samavi, James R. Han, Florian Shkurti, and Angela P. Schoellig. Sicnav: Safe and interactive crowd navigation using model predictive control and bilevel optimization. IEEE Transactions on Robotics, 41:801–818, 2025

[45]

Simon Schaefer, Karen Leung, Boris Ivanovic, and Marco Pavone. Leveraging neural network gradients within trajectory optimization for proactive human-robot interactions. IEEE International Conference on Robotics and Automation, 2020

[46]

Simon Schaefer, Helen Oleynikova, Sandra Hirche, and Stefan Leutenegger. HumanHalo - safe and efficient 3d navigation among humans via minimally conservative MPC. In arXiv, 2024

[47]

Mingyi Shi, Sebastian Starke, Yuting Ye, Taku Komura, and Jungdam Won. Phasemp: Robust 3d pose estimation via phase-conditioned human motion prior. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 14679–14691, 2023. doi: 10.1109/ICCV51070.2023.01353

[48]

Caiyi Sun, Yujing Sun, Xiao Han, Zemin Yang, Jiawei Liu, Xinge Zhu, Siu Ming Yiu, and Yuexin Ma. HUMOF: Human motion forecasting in interactive social scenes. In The Fourteenth International Conference on Learning Representations, 2026

[49]

Muchen Sun, Francesca Baldini, Peter Trautman, and Todd Murphey. Move beyond trajectories: Distribution space coupling for crowd navigation. Robotics: Science and Systems, 2021

[50]

Rahul Tallamraju, Nitin Saini, Elia Bonetto, Michael Pabst, Yu Tang Liu, Michael Black, and Aamir Ahmad. Aircaprl: Autonomous aerial human motion capture using deep reinforcement learning. IEEE Robotics and Automation Letters, 5(4):6678–6685, October 2020. URL https://ieeexplore.ieee.org/document/9158379

[51]

Guy Tevet, Sigal Raab, Brian Gordon, Yoni Shafir, Daniel Cohen-or, and Amit Haim Bermano. Human motion diffusion model. In ICLR, 2023

[52]

Pete Trautman, Jeremy Ma, Richard M. Murray, and Andreas Krause. Robot navigation in dense human crowds: Statistical models and experimental studies of human–robot cooperation. The International Journal of Robotics Research, 34(3):335–356, 2015. doi: 10.1177/0278364914557874

[53]

Dimos Tzoumanikas, Wenbin Li, Marius Grimm, Ketao Zhang, Mirko Kovac, and Stefan Leutenegger. Fully autonomous micro air vehicle flight and landing on a moving target using visual–inertial estimation and model-predictive control. Journal of Field Robotics, 36(1):49–77, 2019. doi: https://doi.org/10.1002/rob.21821. URL https://onlinelibrary.wiley.com/doi/a...

[54]

Jur van den Berg, Stephen J. Guy, Ming Lin, and Dinesh Manocha. Reciprocal n-body collision avoidance. In Cédric Pradalier, Roland Siegwart, and Gerhard Hirzinger, editors, Robotics Research, pages 3–19, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg. ISBN 978-3-642-19457-3

[55]

Emanuele Vespa, Nils Funk, Paul H. J. Kelly, and Stefan Leutenegger. Adaptive-resolution octree-based volumetric SLAM. In International Conference on 3D Vision (3DV), pages 654–662, Québec City, QC, Canada, September 2019. doi: 10.1109/3DV.2019.00077

[56]

Rodolfo Villalobos-Salazar, Abimael Contreras-Carlos, Carlos A. Toro-Arcila, and America Morales-Diaz. Hierarchical drone navigation control for ensuring safety with stationary humans. In 2024 21st International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE), pages 1–6, 2024. doi: 10.1109/CCE62852.2024.10771000

[57]

Yin Wang, Zhiying Leng, Frederick W. B. Li, Shun-Cheng Wu, and Xiaohui Liang. Fg-t2m: Fine-grained text-driven human motion generation via diffusion model. In 2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 21978–21987, 2023. doi: 10.1109/ICCV51070.2023.02014
