pith. sign in

arxiv: 2605.22017 · v1 · pith:5S3YHFR2new · submitted 2026-05-21 · 💻 cs.CV

Diverse Yet Consistent: Context-Guided Diffusion with Energy-Based Joint Refinement for Multi-Agent Motion Prediction

Pith reviewed 2026-05-22 07:11 UTC · model grok-4.3

classification 💻 cs.CV
keywords multi-agent motion predictiondiffusion modelsenergy-based modelstrajectory forecastingjoint consistencycontext guidancepedestrian datasets
0
0 comments X

The pith

A context-guided diffusion process generates diverse multi-agent motions that energy-based refinement then adjusts to enforce interaction consistency.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to generate motion predictions for multiple interacting agents that are both varied in their individual behaviors and coherent as a group. It does this by first running a diffusion model whose sampling is steered by historical trajectory context, producing a range of plausible paths, and then applying an energy-based adjustment to the overall joint distribution. The goal is to keep the refined paths realistic for each agent while satisfying interaction constraints that separate models often miss. Readers would care because applications such as autonomous vehicles and crowd monitoring require forecasts that respect both personal options and mutual influences, and current methods tend to excel at one but not both. The reported experiments on standard pedestrian datasets show gains on both single-agent and multi-agent error measures.

Core claim

By embedding contextual information from past trajectories into the diffusion sampling steps, the model produces a diverse set of candidate trajectories; an energy-based term is then used to reshape the joint distribution so that the selected trajectories satisfy interaction consistency while leaving the individual trajectory plausibilities largely intact.

What carries the argument

Context-guided diffusion sampling followed by energy-based refinement of the joint trajectory distribution.

If this is right

  • Marginal prediction errors (ADE/FDE) on ETH/UCY drop below those of strong single-agent baselines.
  • Joint errors (JADE/JFDE) also improve over marginal baselines while remaining competitive with earlier joint methods.
  • The same framework delivers larger marginal gains than prior joint-prediction techniques without sacrificing joint performance.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same two-stage pattern could be tested on vehicle or drone trajectory data where interaction rules differ from pedestrian crowds.
  • Replacing the energy term with a learned critic might allow the refinement stage to adapt to new scene types without hand-designed potentials.
  • Extending the diffusion guidance to include map or semantic context could further increase diversity while the energy stage still enforces physical constraints.

Load-bearing premise

The energy-based step can adjust the joint distribution for interaction consistency without making the individual trajectories less plausible than those produced by the diffusion stage alone.

What would settle it

On the ETH or UCY datasets, run the method and check whether joint metrics such as JADE or JFDE improve while marginal ADE/FDE stay at least as good as the diffusion-only version; if either the joint scores fail to rise or the individual paths become visibly unrealistic, the central claim would be undercut.

Figures

Figures reproduced from arXiv: 2605.22017 by Lei Chu, Yuhuan Zhao.

Figure 1
Figure 1. Figure 1: Core concept and result of CODA: By incorporating rich interaction context and applying energy-based optimization, CODA [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Illustrates the framework of our CODA. It consists of three key modules: (1) Dynamic Context as Guidance Condition (DCGC), [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Prediction samples (Th=20) on the NBA dataset. Light blue shows historical trajectories, dark blue shows future ground truth, white curves are sampled predictions, and the violet curve denotes the mean estimate [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Qualitative trajectory prediction results on the Univ [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
read the original abstract

Deepgenerative models havebecomeapromisingapproach for human motion prediction due to their ability to capture multimodal distributions and represent diverse human be haviors. However, generating predictions that are both di verse and jointly consistent among interacting agents re mains challenging. In addition, most existing approaches are primarily evaluated using single-agent (marginal) met rics, which fail to fully reflect the joint dynamics of multi agent interactions. We propose a diffusion-based frame work that improves multi-agent motion prediction by lever aging rich contextual information from historical trajecto ries. This information is incorporated through a guidance mechanism to enhance the diversity and expressiveness of predicted motions. To further enforce interaction consis tency, we introduce an energy-based formulation that re fines the joint trajectory distribution while preserving the plausibility of individual trajectories. Extensive experi ments on four benchmark datasets demonstrate that our approach consistently outperforms existing methods. No tably, our approach substantially improves both marginal (ADE/FDE) and joint (JADE/JFDE) metrics on ETH/UCY over strong marginal baselines. Compared with prior joint prediction methods, it delivers significant gains in marginal metrics while maintaining competitive joint performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a diffusion-based framework for multi-agent motion prediction that leverages contextual information from historical trajectories via a guidance mechanism to enhance diversity and expressiveness. It introduces an energy-based formulation for joint trajectory refinement to enforce interaction consistency while preserving individual trajectory plausibility. The approach is evaluated on four benchmark datasets, showing consistent outperformance, with substantial improvements in marginal (ADE/FDE) and joint (JADE/JFDE) metrics on ETH/UCY over strong baselines, and gains in marginal metrics compared to prior joint methods.

Significance. If the results hold, this work would be significant for advancing multi-agent motion prediction by addressing the trade-off between diversity and joint consistency, which is key for applications in autonomous driving and human-robot interaction. The combination of context-guided diffusion and energy-based refinement provides a novel way to generate diverse yet consistent predictions, potentially outperforming methods focused solely on marginal or joint metrics. The reported improvements on standard benchmarks suggest practical relevance.

major comments (2)
  1. [Abstract and §4 Experiments] Abstract and §4 Experiments: The central claim that the energy-based joint refinement enforces interaction consistency without harming the plausibility or diversity of individual trajectories is load-bearing for the title and reported gains over marginal baselines. However, the manuscript provides no quantitative before-vs-after comparison of marginal ADE/FDE (or diversity measures such as mode coverage) after the refinement step. This leaves open the possibility that the energy term over-penalizes valid but less common interaction modes, directly undermining the 'diverse yet consistent' claim.
  2. [§3.3 Energy-based joint refinement] §3.3 Energy-based joint refinement: The exact mathematical form of the energy function, its weighting relative to the diffusion score, and the sampling procedure used for refinement are not specified with sufficient precision to evaluate whether the refinement step can be guaranteed to preserve marginal plausibility. Without this, it is impossible to assess the risk that the joint consistency term collapses modes or pushes trajectories into low-probability regions.
minor comments (2)
  1. [Abstract] The abstract contains multiple typographical and spacing errors (e.g., 'Deepgenerative models havebecomeapromisingapproach', 'be haviors'). These should be corrected.
  2. [Throughout] Ensure first use of acronyms (ADE, FDE, JADE, JFDE) is accompanied by their full definitions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address each of the major comments below and outline the revisions we plan to make to strengthen the paper.

read point-by-point responses
  1. Referee: [Abstract and §4 Experiments] Abstract and §4 Experiments: The central claim that the energy-based joint refinement enforces interaction consistency without harming the plausibility or diversity of individual trajectories is load-bearing for the title and reported gains over marginal baselines. However, the manuscript provides no quantitative before-vs-after comparison of marginal ADE/FDE (or diversity measures such as mode coverage) after the refinement step. This leaves open the possibility that the energy term over-penalizes valid but less common interaction modes, directly undermining the 'diverse yet consistent' claim.

    Authors: We acknowledge that a quantitative before-and-after analysis would provide stronger evidence for our claim. To address this, we will add a new ablation study in Section 4 of the revised manuscript. This study will report marginal ADE/FDE and diversity metrics (such as the number of modes covered) computed on the trajectories before and after the energy-based joint refinement. We expect this to show that the refinement step improves joint metrics while maintaining or even enhancing marginal performance and diversity, consistent with our overall results on the benchmarks. revision: yes

  2. Referee: [§3.3 Energy-based joint refinement] §3.3 Energy-based joint refinement: The exact mathematical form of the energy function, its weighting relative to the diffusion score, and the sampling procedure used for refinement are not specified with sufficient precision to evaluate whether the refinement step can be guaranteed to preserve marginal plausibility. Without this, it is impossible to assess the risk that the joint consistency term collapses modes or pushes trajectories into low-probability regions.

    Authors: We agree that the description in §3.3 could be more precise to allow full evaluation and reproducibility. In the revised manuscript, we will expand §3.3 to include the exact mathematical formulation of the energy function E(·), the specific weighting parameter λ used to balance it with the diffusion score, and the details of the sampling procedure (e.g., the number of refinement steps and the optimization method employed). This will enable readers to assess the preservation of marginal plausibility and the risk of mode collapse. revision: yes

Circularity Check

0 steps flagged

No circularity: method builds on standard diffusion and energy-based models with empirical validation

full rationale

The paper presents a context-guided diffusion process for diverse multi-agent motion predictions followed by an energy-based joint refinement step to enforce interaction consistency. No equations, derivations, or self-citations are visible in the provided abstract or description that reduce any claimed result to its own inputs by construction. The approach is described as leveraging established diffusion guidance mechanisms and energy-based formulations without self-referential fitting, parameter renaming, or load-bearing uniqueness theorems from the authors' prior work. Performance claims rest on experimental comparisons against baselines on ETH/UCY and other benchmarks rather than tautological predictions, rendering the derivation self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

The central claim rests on the effectiveness of the context guidance mechanism and the energy-based refinement step, neither of which is derived from first principles or supported by external benchmarks in the abstract.

invented entities (1)
  • Energy-based joint refinement no independent evidence
    purpose: To enforce interaction consistency among agents while preserving individual trajectory plausibility
    Introduced as the key mechanism for joint consistency; no independent evidence or falsifiable prediction outside the paper is mentioned.

pith-pipeline@v0.9.0 · 5735 in / 1156 out tokens · 32072 ms · 2026-05-22T07:11:03.935775+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

68 extracted references · 68 canonical work pages · 2 internal anchors

  1. [1]

    Tripod: Human trajectory and pose dynamics forecasting in the wild

    Vida Adeli, Mahsa Ehsanpour, Ian Reid, Juan Car- los Niebles, Silvio Savarese, Ehsan Adeli, and Hamid Rezatofighi. Tripod: Human trajectory and pose dynamics forecasting in the wild. InICCV, pages 13390–13400, 2021. 2, 3

  2. [2]

    So- cial lstm: Human trajectory prediction in crowded spaces

    Alexandre Alahi, Kratarth Goel, Vignesh Ramanathan, Alexandre Robicquet, Li Fei-Fei, and Silvio Savarese. So- cial lstm: Human trajectory prediction in crowded spaces. In CVPR, pages 961–971, 2016. 1, 2

  3. [3]

    Non- probability sampling network for stochastic human trajectory prediction

    Inhwan Bae, Jin-Hwi Park, and Hae-Gon Jeon. Non- probability sampling network for stochastic human trajectory prediction. InCVPR, pages 6477–6487, 2022. 6

  4. [4]

    Eigentrajectory: Low-rank descriptors for multi-modal trajectory forecasting

    Inhwan Bae, Jean Oh, and Hae-Gon Jeon. Eigentrajectory: Low-rank descriptors for multi-modal trajectory forecasting. InICCV, pages 10017–10029, 2023. 6

  5. [5]

    Singu- lartrajectory: Universal trajectory predictor using diffusion model

    Inhwan Bae, Young-Jae Park, and Hae-Gon Jeon. Singu- lartrajectory: Universal trajectory predictor using diffusion model. InCVPR, pages 17890–17901, 2024. 1, 5, 6

  6. [6]

    Pointwise: Predicting points and valuing deci- sions in real time with nba optical tracking data

    Dan Cervone, Alexander D’amour, Luke Bornn, and Kirk Goldsberry. Pointwise: Predicting points and valuing deci- sions in real time with nba optical tracking data. InPro- ceedings of the 8th MIT Sloan Sports Analytics Conference, Boston, MA, USA, 2014. 6

  7. [7]

    Context-conditioned spatio-temporal predictive learning for reliable v2v channel prediction.TITS,

    Lei Chu, Daoud Burghal, Rui Wang, Michael Neuman, and Andreas F Molisch. Context-conditioned spatio-temporal predictive learning for reliable v2v channel prediction.TITS,

  8. [8]

    Lookout: Diverse multi-future predic- tion and planning for self-driving

    Alexander Cui, Sergio Casas, Abolfazl Sadat, Renjie Liao, and Raquel Urtasun. Lookout: Diverse multi-future predic- tion and planning for self-driving. InCVPR, pages 16107– 16116, 2021. 1

  9. [9]

    Nachiket Deo and Mohan M. Trivedi. Convolutional social pooling for vehicle trajectory prediction. InCVPR, pages 1468–1476, 2018. 1

  10. [10]

    Diffusion models beat gans on image synthesis

    Prafulla Dhariwal and Alexander Nichol. Diffusion models beat gans on image synthesis. InNeurIPS, pages 8780–8794,

  11. [11]

    Implicit generation and mod- eling with energy based models.NeurIPS, 32, 2019

    Yilun Du and Igor Mordatch. Implicit generation and mod- eling with energy based models.NeurIPS, 32, 2019. 2, 5

  12. [12]

    Neuralized markov random field for interaction-aware stochastic human trajectory prediction

    Zilin Fang, David Hsu, Gim Hee Lee, and Gim Hee Lee. Neuralized markov random field for interaction-aware stochastic human trajectory prediction. InICLR, 2025. 2, 3, 6, 7

  13. [13]

    Moflow: One-step flow matching for human trajectory fore- casting via implicit maximum likelihood estimation based distillation

    Yuxiang Fu, Qi Yan, Lele Wang, Ke Li, and Renjie Liao. Moflow: One-step flow matching for human trajectory fore- casting via implicit maximum likelihood estimation based distillation. InCVPR, pages 17282–17293, 2025. 1, 5, 6, 7

  14. [14]

    Transformer networks for trajectory forecasting

    Francesco Giuliari, Irtiza Hasan, Marco Cristani, and Fabio Galasso. Transformer networks for trajectory forecasting. In ICPR, pages 10335–10342. IEEE, 2021. 2

  15. [15]

    Social gan: Socially acceptable trajec- tories with generative adversarial networks

    Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, and Alexandre Alahi. Social gan: Socially acceptable trajec- tories with generative adversarial networks. InCVPR, pages 2255–2264, 2018. 6, 7

  16. [16]

    Classifier-free diffusion guidance.ICLR, 2022

    Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance.ICLR, 2022. 2, 4

  17. [17]

    Denoising dif- fusion probabilistic models.NeurIPS, 33:6840–6851, 2020

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising dif- fusion probabilistic models.NeurIPS, 33:6840–6851, 2020. 2

  18. [18]

    Action-reaction: Fore- casting the dynamics of human interaction

    De-An Huang and Kris M Kitani. Action-reaction: Fore- casting the dynamics of human interaction. InECCV, pages 489–504. Springer, 2014. 2, 3

  19. [19]

    Stgat: Modeling spatial-temporal interactions for human trajectory prediction

    Yingfan Huang, Huikun Bi, Zhaoxin Li, Tianlu Mao, and Zhaoqi Wang. Stgat: Modeling spatial-temporal interactions for human trajectory prediction. InCVPR, pages 6272–6281,

  20. [20]

    An edit friendly ddpm noise space: Inversion and manipulations

    Inbar Huberman-Spiegelglas, Vladimir Kulikov, and Tomer Michaeli. An edit friendly ddpm noise space: Inversion and manipulations. InCVPR, pages 12469–12478, 2024. 2

  21. [21]

    Planning with Diffusion for Flexible Behavior Synthesis

    Michael Janner, Yilun Du, Joshua B Tenenbaum, and Sergey Levine. Planning with diffusion for flexible behavior synthe- sis.arXiv preprint arXiv:2205.09991, 2022. 2

  22. [22]

    Multi-agent long-term 3d human pose forecasting via interaction-aware trajectory conditioning

    Jaewoo Jeong, Daehee Park, and Kuk-Jin Yoon. Multi-agent long-term 3d human pose forecasting via interaction-aware trajectory conditioning. InCVPR, pages 1617–1628, 2024. 1

  23. [23]

    Multi-modal knowledge distillation-based human trajectory forecasting

    Jaewoo Jeong, Seohee Lee, Daehee Park, Giwon Lee, and Kuk-Jin Yoon. Multi-modal knowledge distillation-based human trajectory forecasting. InCVPR, pages 24222–24233,

  24. [24]

    Motiondiffuser: Controllable multi-agent motion prediction using diffusion

    Chiyu Jiang, Andre Cornman, Cheolho Park, Benjamin Sapp, Yin Zhou, Dragomir Anguelov, et al. Motiondiffuser: Controllable multi-agent motion prediction using diffusion. InCVPR, pages 9644–9653, 2023. 1

  25. [25]

    Human trajectory forecasting in crowds: A deep learning perspec- tive.TITS, 23(7):7386–7400, 2021

    Parth Kothari, Sven Kreiss, and Alexandre Alahi. Human trajectory forecasting in crowds: A deep learning perspec- tive.TITS, 23(7):7386–7400, 2021. 1

  26. [26]

    Bcdiff: Bidirectional consistent diffusion for instantaneous trajectory prediction

    Rongqing Li, Changsheng Li, Dongchun Ren, Guangyi Chen, Ye Yuan, and Guoren Wang. Bcdiff: Bidirectional consistent diffusion for instantaneous trajectory prediction. NeurIPS, 36:14400–14413, 2023. 1

  27. [27]

    Joint pedestrian tra- jectory prediction through posterior sampling

    Haotian Lin, Yixiao Wang, Mingxiao Huo, Chensheng Peng, Zhiyuan Liu, and Masayoshi Tomizuka. Joint pedestrian tra- jectory prediction through posterior sampling. InProceed- ings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5672–5679, 2024. 6, 7

  28. [28]

    Progressive pretext task learning for human trajec- tory prediction

    Xiaotong Lin, Tianming Liang, Jianhuang Lai, and Jian- Fang Hu. Progressive pretext task learning for human trajec- tory prediction. InECCV, pages 197–214. Springer, 2024. 2, 5

  29. [29]

    A structured self-attentive sentence embedding

    Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. A structured self-attentive sentence embedding. InICLR, 2017. 3

  30. [30]

    Self-supervised point cloud registration with deep versatile descriptors for intelligent driving.TITS, 24(9):9767–9779, 2023

    Dongrui Liu, Chuanchaun Chen, Changqing Xu, Robert C Qiu, and Lei Chu. Self-supervised point cloud registration with deep versatile descriptors for intelligent driving.TITS, 24(9):9767–9779, 2023. 1

  31. [31]

    It is not the journey but the destination: Endpoint conditioned trajectory prediction

    Karttikeya Mangalam, Harshayu Girase, Shreyas Agarwal, Kuan-Hui Lee, Ehsan Adeli, Jitendra Malik, and Adrien Gaidon. It is not the journey but the destination: Endpoint conditioned trajectory prediction. InECCV, pages 759–776,

  32. [32]

    From goals, waypoints & paths to long term human trajectory forecasting

    Karttikeya Mangalam, Yang An, Harshayu Girase, and Ji- tendra Malik. From goals, waypoints & paths to long term human trajectory forecasting. InCVPR, pages 15233–15242,

  33. [33]

    Leapfrog diffusion model for stochastic trajec- tory prediction

    Weibo Mao, Chenxin Xu, Qi Zhu, Siheng Chen, and Yan- feng Wang. Leapfrog diffusion model for stochastic trajec- tory prediction. InCVPR, pages 5517–5526, 2023. 1, 6

  34. [34]

    Forecasting hu- man trajectory from scene history

    Mancheng Meng, Ziyan Wu, Terrence Chen, Xiran Cai, Xi- ang Zhou, Fan Yang, and Dinggang Shen. Forecasting hu- man trajectory from scene history. InNeurIPS, pages 24920– 24933, 2022. 1, 2

  35. [35]

    Social-stgcnn: A social spatio-temporal graph convolutional neural network for human trajectory prediction

    Abduallah Mohamed, Kun Qian, Mohamed Elhoseiny, and Christian Claudel. Social-stgcnn: A social spatio-temporal graph convolutional neural network for human trajectory prediction. InCVPR, pages 14424–14432, 2020. 1, 2, 6

  36. [36]

    Tra- jectory prediction with latent belief energy-based model

    Bo Pang, Tianyang Zhao, Xu Xie, and Ying Nian Wu. Tra- jectory prediction with latent belief energy-based model. In CVPR, pages 11814–11824, 2021. 2, 5

  37. [37]

    Stirnet: A spatial-temporal interaction-aware re- cursive network for human trajectory prediction

    Yusheng Peng, Gaofeng Zhang, Xiangyu Li, and Liping Zheng. Stirnet: A spatial-temporal interaction-aware re- cursive network for human trajectory prediction. InCVPR, pages 2285–2293, 2021. 2

  38. [38]

    A self-adaptive parameter selection trajectory prediction approach via hidden markov models.TITS, 16(1): 284–296, 2014

    Shaojie Qiao, Dayong Shen, Xiaoteng Wang, Nan Han, and William Zhu. A self-adaptive parameter selection trajectory prediction approach via hidden markov models.TITS, 16(1): 284–296, 2014. 2

  39. [39]

    Efficient learning of sparse representations with an energy-based model.NeurIPS, 19, 2006

    Marc’Aurelio Ranzato, Christopher Poultney, Sumit Chopra, and Yann Cun. Efficient learning of sparse representations with an energy-based model.NeurIPS, 19, 2006. 2, 5

  40. [40]

    Trace and pace: Controllable pedestrian animation via guided tra- jectory diffusion

    Davis Rempe, Zhengyi Luo, Xue Bin Peng, Ye Yuan, Kris Kitani, Karsten Kreis, Sanja Fidler, and Or Litany. Trace and pace: Controllable pedestrian animation via guided tra- jectory diffusion. InCVPR, pages 13756–13766, 2023. 2, 4

  41. [41]

    Learning social etiquette: Human tra- jectory understanding in crowded scenes

    Alexandre Robicquet, Amir Sadeghian, Alexandre Alahi, and Silvio Savarese. Learning social etiquette: Human tra- jectory understanding in crowded scenes. InECCV, pages 549–565. Springer, 2016. 6

  42. [42]

    Joint long-term prediction of human motion using a planning- based social force approach

    Andrey Rudenko, Luigi Palmieri, and Kai O Arras. Joint long-term prediction of human motion using a planning- based social force approach. InICLR, pages 4571–4577. IEEE, 2018. 2

  43. [43]

    Social-transmotion: Promptable human trajectory prediction

    Saeed Saadatnejad, Yang Gao, Kaouther Messaoud, and Alexandre Alahi. Social-transmotion: Promptable human trajectory prediction. InICLR, 2023. 6

  44. [44]

    Jrdb-traj: A dataset and benchmark for trajectory forecasting in crowds.arXiv preprint arXiv:2311.02736, 2023

    Saeed Saadatnejad, Yang Gao, Hamid Rezatofighi, and Alexandre Alahi. Jrdb-traj: A dataset and benchmark for trajectory forecasting in crowds.arXiv preprint arXiv:2311.02736, 2023. 6

  45. [45]

    Progressive Distillation for Fast Sampling of Diffusion Models

    Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models.arXiv preprint arXiv:2202.00512, 2022. 2

  46. [46]

    Trajectron++: Dynamically-feasible trajec- tory forecasting with heterogeneous data

    Tim Salzmann, Boris Ivanovic, Punarjay Chakravarty, and Marco Pavone. Trajectron++: Dynamically-feasible trajec- tory forecasting with heterogeneous data. InECCV, pages 683–700. Springer, 2020. 1, 6, 7

  47. [47]

    Pose forecasting in industrial human-robot collaboration

    Alessio Sampieri, Guido Maria D’Amely di Melendugno, Andrea Avogaro, Federico Cunico, Francesco Setti, Geri Sk- enderi, Marco Cristani, and Fabio Galasso. Pose forecasting in industrial human-robot collaboration. InECCV, pages 51–

  48. [48]

    Tra- jectory unified transformer for pedestrian trajectory predic- tion

    Liushuai Shi, Le Wang, Sanping Zhou, and Gang Hua. Tra- jectory unified transformer for pedestrian trajectory predic- tion. InICCV, pages 9675–9684, 2023. 2, 6

  49. [49]

    Pip: Planning- informed trajectory prediction for autonomous driving

    Haoran Song, Wenchao Ding, Yuxuan Chen, Shaojie Shen, Michael Yu Wang, and Qifeng Chen. Pip: Planning- informed trajectory prediction for autonomous driving. In ECCV, pages 598–614. Springer, 2020. 1

  50. [50]

    Social diffusion: Long-term multiple hu- man motion anticipation

    Julian Tanke, Linguang Zhang, Amy Zhao, Chengcheng Tang, Yujun Cai, Lezi Wang, Po-Chen Wu, Juergen Gall, and Cem Keskin. Social diffusion: Long-term multiple hu- man motion anticipation. InICCV, pages 9601–9611, 2023. 1

  51. [51]

    Ana- lyzing the variety loss in the context of probabilistic trajec- tory prediction

    Luca Anthony Thiede and Pratik Prabhanjan Brahma. Ana- lyzing the variety loss in the context of probabilistic trajec- tory prediction. InICCV, pages 9954–9963, 2019. 1

  52. [52]

    An uncertain future: Forecasting from static images using variational autoencoders

    Jacob Walker, Carl Doersch, Abhinav Gupta, and Martial Hebert. An uncertain future: Forecasting from static images using variational autoencoders. InECCV, pages 835–851. Springer, 2016. 1

  53. [53]

    Graphtcn: Spatio-temporal interaction modeling for human trajectory prediction

    Chengxin Wang, Shaofeng Cai, and Gary Tan. Graphtcn: Spatio-temporal interaction modeling for human trajectory prediction. InWACV, pages 3450–3459, 2021. 2, 3

  54. [54]

    Joint metrics matter: A better standard for trajectory forecasting

    Erica Weng, Hana Hoshino, Deva Ramanan, and Kris Ki- tani. Joint metrics matter: A better standard for trajectory forecasting. InICCV, pages 20315–20326, 2023. 2, 6, 7

  55. [55]

    View vertically: A hierarchical network for trajectory prediction via fourier spectrums

    Conghao Wong, Beihao Xia, Ziming Hong, Qinmu Peng, Wei Yuan, Qiong Cao, Yibo Yang, and Xinge You. View vertically: A hierarchical network for trajectory prediction via fourier spectrums. InECCV, pages 682–700, 2020. 1, 6, 7

  56. [56]

    Timestamp-supervised wearable- based activity segmentation and recognition with contrastive learning and order-preserving optimal transport.TMC, 23 (12):10734–10751, 2024

    Songpengcheng Xia, Lei Chu, Ling Pei, Jiarui Yang, Wenx- ian Yu, and Robert C Qiu. Timestamp-supervised wearable- based activity segmentation and recognition with contrastive learning and order-preserving optimal transport.TMC, 23 (12):10734–10751, 2024. 1

  57. [57]

    Envposer: Environment-aware realistic human motion estimation from sparse observations with uncertainty modeling

    Songpengcheng Xia, Yu Zhang, Zhuo Su, Xiaozheng Zheng, Zheng Lv, Guidong Wang, Yongjie Zhang, Qi Wu, Lei Chu, and Ling Pei. Envposer: Environment-aware realistic human motion estimation from sparse observations with uncertainty modeling. InProceedings of the Computer Vision and Pat- tern Recognition Conference, pages 1839–1849, 2025. 1

  58. [58]

    Groupnet: Multiscale hypergraph neural net- works for trajectory prediction with relational reasoning

    Chenxin Xu, Maosen Li, Zhenyang Ni, Ya Zhang, and Si- heng Chen. Groupnet: Multiscale hypergraph neural net- works for trajectory prediction with relational reasoning. In CVPR, pages 6498–6507, 2022. 6

  59. [59]

    Remember intentions: Retrospective-memory-based trajec- tory prediction

    Chenxin Xu, Weibo Mao, Wenjun Zhang, and Siheng Chen. Remember intentions: Retrospective-memory-based trajec- tory prediction. InCVPR, pages 6488–6497, 2022. 6, 7

  60. [60]

    Eq- motion: Equivariant multi-agent motion prediction with in- variant interaction reasoning

    Chenxin Xu, Robby T Tan, Yuhong Tan, Siheng Chen, Yu Guang Wang, Xinchao Wang, and Yanfeng Wang. Eq- motion: Equivariant multi-agent motion prediction with in- variant interaction reasoning. InCVPR, pages 1410–1420,

  61. [61]

    So- cialvae: Human trajectory prediction using timewise latents

    Pei Xu, Jean-Bernard Hayet, and Ioannis Karamouzas. So- cialvae: Human trajectory prediction using timewise latents. InECCV, pages 511–528. Springer, 2022. 1, 6

  62. [62]

    Mobt- cast: Leveraging auxiliary trajectory forecasting for human mobility prediction.NeurIPS, 34:30380–30391, 2021

    Hao Xue, Flora Salim, Yongli Ren, and Nuria Oliver. Mobt- cast: Leveraging auxiliary trajectory forecasting for human mobility prediction.NeurIPS, 34:30380–30391, 2021. 3

  63. [63]

    Dlow: Diversifying latent flows for diverse human motion prediction

    Ye Yuan and Kris Kitani. Dlow: Diversifying latent flows for diverse human motion prediction. InEuropean Conference on Computer Vision, pages 346–364. Springer, 2020. 2

  64. [64]

    Agentformer: Agent-aware transformers for socio-temporal multi-agent forecasting

    Ye Yuan, Xinshuo Weng, Yanglan Ou, and Kris M Kitani. Agentformer: Agent-aware transformers for socio-temporal multi-agent forecasting. InICCV, pages 9813–9823, 2021. 3, 6

  65. [65]

    Physdiff: Physics-guided human motion diffusion model

    Ye Yuan, Jiaming Song, Umar Iqbal, Arash Vahdat, and Jan Kautz. Physdiff: Physics-guided human motion diffusion model. InICCV, pages 16010–16021, 2023. 2

  66. [66]

    Dynamic inertial poser (dynaip): Part- based motion dynamics learning for enhanced human pose estimation with sparse inertial sensors

    Yu Zhang, Songpengcheng Xia, Lei Chu, Jiarui Yang, Qi Wu, and Ling Pei. Dynamic inertial poser (dynaip): Part- based motion dynamics learning for enhanced human pose estimation with sparse inertial sensors. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1889–1899, 2024. 1

  67. [67]

    Multi-agent tensor fusion for contextual trajectory predic- tion

    Tianyang Zhao, Yifei Xu, Mathew Monfort, Wongun Choi, Chris Baker, Yibiao Zhao, Yizhou Wang, and Ying Nian Wu. Multi-agent tensor fusion for contextual trajectory predic- tion. InCVPR, pages 12126–12134, 2019. 1

  68. [68]

    Query-centric trajectory prediction

    Zikang Zhou, Jianping Wang, Yung-Hui Li, and Yu-Kai Huang. Query-centric trajectory prediction. InCVPR, pages 17863–17873, 2023. 2