arxiv: 2605.11385 · v1 · submitted 2026-05-12 · 💻 cs.CV · cs.RO

Recognition: no theorem link

JACoP: Joint Alignment for Compliant Multi-Agent Prediction

Qingze Liu , Alen Mrdovic , Danrui Li , Mathew Schwartz , Sejong Yoon , Mubbasir Kapadia


Pith reviewed 2026-05-13 02:02 UTC · model grok-4.3

classification 💻 cs.CV cs.RO

keywords: trajectory prediction · multi-agent prediction · Markov Random Field · scene compliance · social collisions · human trajectory prediction · generative modeling · joint inference


The pith

Representing agent interactions as energy potentials in a Markov Random Field enables joint selection of multi-agent trajectories that minimize scene violations and collisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents JACoP as a way to fix a common flaw in stochastic trajectory prediction models, which often produce accurate single-agent paths that still collide with each other or the environment when used together. It adds an initial filtering stage based on agent-centric profiles, then applies a Markov Random Field aligner that treats spatial distances and social behaviors as energy terms in a joint distribution. By inferring and sampling from this combined distribution, the method picks sets of trajectories that are collectively plausible. A sympathetic reader would care because this makes generated predictions usable in real applications like navigation or simulation, where isolated accuracy is not enough.

Core claim

JACoP is a multi-stage framework that first uses an Anchor-Based Agent-Centric Profiler to filter for initial compliance and then employs a Markov Random Field based aligner to formalize joint selection of scene predictions. Inter-agent spatial and social costs are represented as MRF energy potentials, allowing inference and sampling from the joint trajectory distribution to achieve prediction with optimal scene compliance.
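
To make the energy-potential framing concrete, here is a minimal Python sketch of a scene energy over discrete candidate choices. It assumes each agent has a few candidate trajectories, a per-candidate unary cost (for example a negative log score from the profiler), and a single hypothetical spatial potential that applies a hinge penalty to the minimum inter-agent distance. The function names, the hinge form, `radius`, and `w_pair` are illustrative assumptions, not the paper's definitions; a social potential would be added as another pairwise term in the same loop.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]  # one (x, y) position per future timestep


def min_distance(a: Trajectory, b: Trajectory) -> float:
    """Smallest distance between two agents at matching timesteps."""
    return min(math.dist(p, q) for p, q in zip(a, b))


def scene_energy(
    candidates: Sequence[Sequence[Trajectory]],  # candidates[i][k]: k-th proposal of agent i
    unary: Sequence[Sequence[float]],            # unary[i][k]: per-agent cost of that proposal
    config: Sequence[int],                       # config[i]: chosen proposal index for agent i
    w_pair: float = 10.0,                        # hypothetical weight of the spatial potential
    radius: float = 0.2,                         # hypothetical comfort radius (scene units)
) -> float:
    """Unary costs plus a hinge penalty for every agent pair closer than `radius`."""
    energy = sum(unary[i][k] for i, k in enumerate(config))
    n = len(config)
    for i in range(n):
        for j in range(i + 1, n):
            d = min_distance(candidates[i][config[i]], candidates[j][config[j]])
            energy += w_pair * max(0.0, radius - d)
    return energy


# Toy scene: the two cheapest proposals pass within 0.1 units of each other.
candidates = [
    [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]],    # agent 0
    [[(0.0, 0.1), (1.0, 0.1)], [(0.0, -1.0), (1.0, -1.0)]],  # agent 1
]
unary = [[0.0, 0.5], [0.0, 0.5]]
for config in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(config, round(scene_energy(candidates, unary, config), 3))
```

In this toy, the independently cheapest pair (0, 0) scores about 1.0 once the near-collision is penalized, while (0, 1) and (1, 0) tie at 0.5, so a minimum-energy selection accepts a small unary cost to remove the close approach.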

What carries the argument

The Markov Random Field based aligner, which encodes spatial and social costs as energy potentials to perform joint inference over candidate trajectories from multiple agents.

If this is right

The framework produces predictions with fewer environmental violations than prior generative models.
Social collisions between agents are reduced while individual trajectory accuracy remains competitive.
Sampling from the joint distribution yields scene-level plausibility that independent per-agent models lack.
The resulting outputs are more suitable for downstream tasks that require collective feasibility.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same energy-potential approach could be applied to other multi-object prediction settings where global consistency matters more than isolated accuracy.
If the MRF potentials prove robust across environments, they offer a lightweight post-processing step that existing trajectory generators could adopt without retraining.
Datasets that explicitly annotate interaction violations would allow direct measurement of how completely the energy terms cover real-world constraints.

Load-bearing premise

That the defined MRF energy potentials for spatial and social costs capture the relevant interactions, and that joint inference yields compliant sets without discarding high-accuracy individual trajectories or introducing new inconsistencies.

What would settle it

On standard multi-agent datasets, compare violation counts and accuracy when using the MRF joint sampler versus simply taking the highest-scoring individual predictions; if the joint version shows no reduction in collisions or a clear drop in per-agent accuracy, the central claim does not hold.
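
The proposed test can be prototyped on a toy scene before running it on real data. The sketch below, under stated assumptions, compares picking each agent's highest-scoring candidate independently against an exhaustive minimum-energy joint selection, and counts agent-to-agent collisions (pairs closer than a hypothetical `radius` at any shared timestep). The scores, geometry, `radius`, and collision weight `w_coll` are made up for illustration. Exhaustive search over all K^N combinations is only feasible for a handful of agents, which is why a real aligner needs approximate inference.

```python
import itertools
import math


def min_dist(a, b):
    """Smallest distance between two trajectories at matching timesteps."""
    return min(math.dist(p, q) for p, q in zip(a, b))


def count_collisions(trajs, radius=0.2):
    """Number of agent pairs that come closer than `radius` at some timestep."""
    return sum(1 for a, b in itertools.combinations(trajs, 2) if min_dist(a, b) < radius)


def independent_top1(scores):
    """Per-agent argmax of candidate scores, ignoring all other agents."""
    return [max(range(len(s)), key=s.__getitem__) for s in scores]


def joint_select(candidates, scores, w_coll=10.0, radius=0.2):
    """Exhaustive minimum of -sum(scores) + w_coll * collisions (tiny scenes only)."""
    best, best_e = None, math.inf
    for config in itertools.product(*(range(len(c)) for c in candidates)):
        trajs = [candidates[i][k] for i, k in enumerate(config)]
        e = -sum(scores[i][k] for i, k in enumerate(config))
        e += w_coll * count_collisions(trajs, radius)
        if e < best_e:
            best, best_e = list(config), e
    return best


# Two agents walking head-on: both straight paths meet at (1, 0).
candidates = [
    [[(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 0.5), (2, 0)]],   # agent A: straight, detour
    [[(2, 0), (1, 0), (0, 0)], [(2, 0), (1, -0.5), (0, 0)]],  # agent B: straight, detour
]
scores = [[0.9, 0.6], [0.8, 0.7]]

for name, config in [("independent", independent_top1(scores)),
                     ("joint", joint_select(candidates, scores))]:
    trajs = [candidates[i][k] for i, k in enumerate(config)]
    print(name, config, "collisions:", count_collisions(trajs))
```

Here the independent choice keeps the two highest scores but collides, while the joint choice gives up a little score on agent B to remove the collision. On real benchmarks the same comparison also needs per-agent ADE/FDE to check the accuracy side of the claim.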

Figures

Figures reproduced from arXiv: 2605.11385 by Alen Mrdovic, Danrui Li, Mathew Schwartz, Mubbasir Kapadia, Qingze Liu, Sejong Yoon.

**Figure 2.** Model Architecture. Our framework operates in two stages: (Left) Latent embeddings from agents’ historical movement, social context, and environment query prototype trajectories, which are filtered and refined. (Right) We then use the refined proposals to infer a joint distribution of future trajectories via a Markov Random Field (MRF), with the final scene prediction sampled using Gibbs sampling.

**Figure 3.** Radar plot of all evaluation metrics across the five test splits of the ETH-UCY dataset.

**Figure 4.** Visualization of all predictions for two individual agents from the Hotel (top) and Zara1 (bottom) splits of the ETH-UCY dataset.

**Figure 5.** Scene prediction with the best JADE performance from the Hotel (top) and Zara2 (bottom) splits.

**Figure 6.** Radar chart of normalized SDD evaluation results.

**Figure 7.** A2A collision rate versus number of agents on the ETH

**Figure 8.** Radar plot of all metrics across the five ETH-UCY dataset splits, comparing AgentFormer+JACoP Aligner with JACoP.

**Figure 9.** Scene prediction with the best JADE performance from Hotel (Row 1), Univ (Row 2), Zara1 (Row 3) and Zara2 (Row 4).

**Figure 10.** Individual predictions from ETH (Row 1), Hotel (Row 2) and Zara1 (Row 3).
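
Figures 5 and 9 select scene predictions by JADE. As a hedged reference point, the sketch below assumes the common joint-metric definition (the minimum over scene-level samples of the agent-averaged ADE, as in Weng et al.'s joint metrics) and contrasts it with marginal minADE, where each agent picks its own best sample. The paper's exact evaluation code may differ in details such as averaging order.

```python
import math
from typing import List, Sequence, Tuple

Trajectory = List[Tuple[float, float]]


def ade(pred: Trajectory, gt: Trajectory) -> float:
    """Average displacement error over timesteps for one agent."""
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(gt)


def marginal_min_ade(samples: Sequence[Sequence[Trajectory]], gt: Sequence[Trajectory]) -> float:
    """Each agent keeps its own best sample, then average over agents."""
    n_agents = len(gt)
    return sum(min(ade(s[i], gt[i]) for s in samples) for i in range(n_agents)) / n_agents


def jade(samples: Sequence[Sequence[Trajectory]], gt: Sequence[Trajectory]) -> float:
    """One sample index shared by all agents: the best scene-level sample wins."""
    n_agents = len(gt)
    return min(sum(ade(s[i], gt[i]) for i in range(n_agents)) / n_agents for s in samples)


gt = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]]
samples = [  # samples[k][i]: agent i's trajectory in scene sample k
    [gt[0], [(0.0, 2.0), (1.0, 2.0)]],    # sample 0: agent 0 exact, agent 1 off by 1
    [[(0.0, -1.0), (1.0, -1.0)], gt[1]],  # sample 1: agent 0 off by 1, agent 1 exact
]
print("minADE:", marginal_min_ade(samples, gt), "JADE:", jade(samples, gt))
```

Because a minimum of averages is never smaller than the average of minima, JADE is always at least marginal minADE. Here every agent is matched perfectly by some sample, yet no single scene sample is right for both agents, so minADE is 0.0 while JADE is 0.5.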

Original abstract

Stochastic Human Trajectory Prediction (HTP) using generative modeling has emerged as a significant area of research. Although state-of-the-art models excel in optimizing the accuracy of individual agents, they often struggle to generate predictions that are collectively compliant, leading to output trajectories marred by social collisions and environmental violations, thus rendering them impractical for real-world applications. To bridge this gap, we present JACoP: Joint Alignment for Compliant Multi-Agent Prediction, an innovative multi-stage framework that ensures scene-level plausibility. JACoP incorporates an Anchor-Based Agent-Centric Profiler for effective initial compliance filtering and employs a Markov Random Field (MRF) based aligner to formalize the joint selection for scene predictions. By representing inter-agent spatial and social costs as MRF energy potentials, we successfully infer and sample from the joint trajectory distribution, achieving prediction with optimal scene compliance. Comprehensive experiments show that JACoP not only achieves competitive accuracy, but also sets a new standard in reducing both environmental violations and social collisions, thereby confirming its ability to produce collectively feasible and practically applicable trajectory predictions.
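
The step of inferring and sampling from the joint trajectory distribution can be illustrated with Gibbs sampling over candidate indices, which the Figure 2 caption names as the sampler. This is a minimal sketch, assuming a Boltzmann distribution p(x) ∝ exp(-E(x)/T) over one candidate index per agent, with E built from per-agent unary costs and a user-supplied pairwise cost. The `temperature` parameter, the `collide` cost, and the function names are hypothetical, not JACoP's actual potentials or schedule.

```python
import math
import random
from typing import Callable, List, Sequence

# pair_cost(i, a, j, b): cost of agent i taking candidate a while agent j takes candidate b.
PairCost = Callable[[int, int, int, int], float]


def gibbs_sample(unary: Sequence[Sequence[float]], pair_cost: PairCost,
                 n_sweeps: int = 200, temperature: float = 1.0, seed: int = 0) -> List[int]:
    """Return one joint configuration after n_sweeps of single-site Gibbs updates."""
    rng = random.Random(seed)
    n = len(unary)
    config = [rng.randrange(len(u)) for u in unary]  # random initial assignment
    for _ in range(n_sweeps):
        for i in range(n):
            # Energy of each candidate for agent i with every other agent held fixed.
            local = [unary[i][a] + sum(pair_cost(i, a, j, config[j]) for j in range(n) if j != i)
                     for a in range(len(unary[i]))]
            lowest = min(local)  # shift before exponentiating for numerical stability
            weights = [math.exp(-(e - lowest) / temperature) for e in local]
            config[i] = rng.choices(range(len(local)), weights=weights)[0]
    return config


def collide(i: int, a: int, j: int, b: int) -> float:
    """Hypothetical pairwise cost: the first candidates of the two agents collide."""
    return 10.0 if a == 0 and b == 0 else 0.0


unary = [[0.0, 0.5], [0.0, 0.5]]
print(gibbs_sample(unary, collide, temperature=0.05))
```

At low temperature the sampler concentrates on low-energy configurations (here it avoids the colliding pair); at higher temperature it returns more diverse scene samples, which is the trade-off a sampling-based aligner has to set.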

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

JACoP adds an MRF-based selection step after per-agent candidate generation to cut collisions and violations in multi-agent trajectory prediction, but the abstract leaves the inference method and experimental controls too vague to judge the gains.


The key point of this paper is that it uses an MRF to jointly select from per-agent trajectory candidates in order to enforce scene compliance, a practical concern in multi-agent human trajectory prediction. The new part is the specific pipeline: an Anchor-Based Agent-Centric Profiler for initial candidate filtering, followed by an MRF aligner that treats spatial and social costs as energy potentials. This is not just another generative model but a post-selection mechanism for making outputs collectively feasible. The paper frames the problem clearly and claims that this yields fewer environmental violations and social collisions while keeping accuracy competitive. The soft spots are inference and evaluation. For N agents with K candidates each, the joint state space has K^N configurations, so any real implementation must approximate joint inference, yet the abstract does not say which method is used or whether the result was checked against the true minimum energy. The energy weights are free parameters that could be tuned to flatter the compliance metrics. The experiments are summarized without specifics on baselines, agent counts, or significance testing, which makes the strength of the claims hard to gauge. A reader interested in multi-agent prediction for robotics or autonomous driving would get some ideas from the MRF formulation. It is worth a serious referee because the problem it targets is real and the approach is distinct from pure generative modeling. I would recommend sending it out for peer review, mainly to get feedback on the missing implementation and analysis details.

Referee Report

3 major / 2 minor

Summary. The paper introduces JACoP, a multi-stage framework for stochastic human trajectory prediction. It uses an Anchor-Based Agent-Centric Profiler for initial compliance filtering of per-agent candidates, followed by an MRF-based aligner that represents inter-agent spatial and social costs as energy potentials, then performs joint inference and sampling over the trajectory distribution to produce scene-compliant multi-agent predictions while maintaining competitive individual accuracy.

Significance. If the MRF-based joint selection can be shown to reliably recover compliant combinations without discarding high-accuracy trajectories or introducing artifacts from approximation, the work would meaningfully advance multi-agent prediction by directly optimizing scene-level feasibility rather than post-hoc correction, with potential impact on applications requiring collision-free outputs.

major comments (3)

[MRF-based aligner] Abstract and MRF aligner description: the claim of achieving 'prediction with optimal scene compliance' via MRF energy minimization assumes that the inference procedure recovers a global or near-global mode of the joint distribution. For N agents each with K candidates the state space is K^N and thus intractable for exact inference; the manuscript must specify the exact algorithm (loopy BP, MCMC, etc.) and report an optimality-gap or approximation-quality analysis, as this is load-bearing for the optimality claim.
[MRF energy potentials] MRF energy potential definitions: the spatial and social costs are encoded as potentials whose weights are free parameters. Without an explicit statement of how these weights are set (fixed a priori, cross-validated, or learned) and an ablation showing sensitivity, the procedure risks implicit fitting to compliance metrics on the validation set, undermining the assertion that the joint distribution is inferred rather than tuned.
[Experiments] Experiments section: the abstract asserts competitive accuracy together with reduced environmental violations and social collisions, yet provides no statistical significance tests, no ablation on candidate-set diversity, and no comparison against strong joint baselines that also enforce compliance. These omissions prevent verification that the MRF step improves compliance without trading off accuracy or simply selecting from an already-filtered pool.
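
The optimality-gap analysis requested in the first major comment can be run on scenes small enough for exhaustive search. The sketch below is an illustration under assumptions, not the paper's procedure: it draws a random pairwise MRF over N agents with K candidates each, finds the exact minimum by enumerating all K^N configurations, and compares it with iterated conditional modes (ICM), a simple greedy approximate solver used here as a stand-in for whatever approximate method the aligner actually uses. The random costs and all function names are hypothetical.

```python
import itertools
import random
from typing import Dict, List, Tuple

Unary = List[List[float]]                        # unary[i][a]: cost of candidate a for agent i
Pair = Dict[Tuple[int, int], List[List[float]]]  # pair[(i, j)][a][b] for i < j


def pair_cost(pair: Pair, i: int, a: int, j: int, b: int) -> float:
    """Symmetric lookup into the upper-triangular pairwise tables."""
    return pair[(i, j)][a][b] if i < j else pair[(j, i)][b][a]


def energy(config, unary: Unary, pair: Pair) -> float:
    n = len(config)
    total = sum(unary[i][config[i]] for i in range(n))
    total += sum(pair[(i, j)][config[i]][config[j]] for i in range(n) for j in range(i + 1, n))
    return total


def exact_map(unary: Unary, pair: Pair):
    """Exhaustive search over all K^N configurations: only viable for tiny scenes."""
    return min(itertools.product(*(range(len(u)) for u in unary)),
               key=lambda c: energy(c, unary, pair))


def icm(unary: Unary, pair: Pair, max_iters: int = 100) -> List[int]:
    """Iterated conditional modes: greedy coordinate descent, may stop at a local minimum."""
    config = [min(range(len(u)), key=u.__getitem__) for u in unary]
    n = len(config)
    for _ in range(max_iters):
        changed = False
        for i in range(n):
            def local(a: int) -> float:
                return unary[i][a] + sum(pair_cost(pair, i, a, j, config[j])
                                         for j in range(n) if j != i)
            best = min(range(len(unary[i])), key=local)
            if best != config[i]:
                config[i], changed = best, True
        if not changed:
            break
    return config


def random_problem(n_agents: int, k: int, seed: int = 0) -> Tuple[Unary, Pair]:
    rng = random.Random(seed)
    unary = [[rng.random() for _ in range(k)] for _ in range(n_agents)]
    pair = {(i, j): [[2.0 * rng.random() for _ in range(k)] for _ in range(k)]
            for i in range(n_agents) for j in range(i + 1, n_agents)}
    return unary, pair


unary, pair = random_problem(n_agents=5, k=4, seed=1)
e_exact = energy(exact_map(unary, pair), unary, pair)
e_icm = energy(icm(unary, pair), unary, pair)
print(f"exact={e_exact:.3f} icm={e_icm:.3f} gap={e_icm - e_exact:.3f}")
```

The same harness works for any approximate solver (loopy BP, Gibbs, annealing): the distribution of gaps on enumerable scenes gives a rough sense of how far "optimal scene compliance" is from the true minimum. With K = 4, enumeration at N = 10 agents already means 4^10, about one million, configurations.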

minor comments (2)

[Abstract] Abstract: the phrasing 'sets a new standard' is stronger than the reported metrics support; replace with quantitative deltas relative to the strongest baseline.
[Method] Notation: the distinction between the per-agent candidate set and the joint configuration space should be introduced with explicit symbols early in the method section to avoid later ambiguity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which has identified several areas where the manuscript can be strengthened in terms of clarity, rigor, and experimental validation. We address each major comment below and commit to incorporating the necessary revisions.


Referee: [MRF-based aligner] Abstract and MRF aligner description: the claim of achieving 'prediction with optimal scene compliance' via MRF energy minimization assumes that the inference procedure recovers a global or near-global mode of the joint distribution. For N agents each with K candidates the state space is K^N and thus intractable for exact inference; the manuscript must specify the exact algorithm (loopy BP, MCMC, etc.) and report an optimality-gap or approximation-quality analysis, as this is load-bearing for the optimality claim.

Authors: We agree that the optimality claim requires explicit support through details on the inference procedure. The current manuscript describes the MRF formulation and joint inference but does not specify the algorithm or provide approximation analysis. In the revised version, we will state that approximate inference is performed via loopy belief propagation and include an analysis of solution quality (e.g., energy comparisons against multiple random restarts and, where computationally feasible, exact inference on small agent subsets). This will clarify the practical meaning of 'optimal scene compliance' under the exponential state space. revision: yes
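
For readers unfamiliar with the method named in this response, below is a minimal min-sum loopy belief propagation sketch for a fully connected pairwise MRF over candidate indices. It assumes tabulated unary and pairwise costs, uses damping and message normalization as common stabilizers, and decodes each agent's lowest-cost belief. With three or more mutually connected agents the graph has loops, so the result is approximate and convergence is not guaranteed. Names and defaults are hypothetical; note also that the Figure 2 caption describes Gibbs sampling rather than BP.

```python
from typing import Dict, List, Tuple

Unary = List[List[float]]                        # unary[i][a]: cost of candidate a for agent i
Pair = Dict[Tuple[int, int], List[List[float]]]  # pair[(i, j)][a][b] for i < j


def pair_cost(pair: Pair, i: int, a: int, j: int, b: int) -> float:
    """Symmetric lookup into the upper-triangular pairwise tables."""
    return pair[(i, j)][a][b] if i < j else pair[(j, i)][b][a]


def loopy_bp_min_sum(unary: Unary, pair: Pair, n_iters: int = 50,
                     damping: float = 0.5) -> List[int]:
    n = len(unary)
    # msgs[(i, j)][b]: message from agent i to agent j about j taking candidate b.
    msgs = {(i, j): [0.0] * len(unary[j]) for i in range(n) for j in range(n) if i != j}
    for _ in range(n_iters):
        new = {}
        for (i, j), old in msgs.items():
            out = [min(unary[i][a] + pair_cost(pair, i, a, j, b)
                       + sum(msgs[(k, i)][a] for k in range(n) if k not in (i, j))
                       for a in range(len(unary[i])))
                   for b in range(len(unary[j]))]
            shift = min(out)  # normalize so messages stay bounded
            new[(i, j)] = [damping * o + (1 - damping) * (v - shift) for o, v in zip(old, out)]
        msgs = new  # synchronous (parallel) update schedule
    beliefs = [[unary[i][a] + sum(msgs[(k, i)][a] for k in range(n) if k != i)
                for a in range(len(unary[i]))] for i in range(n)]
    return [min(range(len(b)), key=b.__getitem__) for b in beliefs]


unary = [[0.0, 0.4], [0.0, 0.5]]
pair = {(0, 1): [[10.0, 0.0], [0.0, 0.0]]}  # both agents picking candidate 0 collide
print(loopy_bp_min_sum(unary, pair))
```

With two agents the graph is a tree, so min-sum BP recovers the exact minimum, here (1, 0) with energy 0.4; with three or more agents the same code runs but carries no optimality guarantee.
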
Referee: [MRF energy potentials] MRF energy potential definitions: the spatial and social costs are encoded as potentials whose weights are free parameters. Without an explicit statement of how these weights are set (fixed a priori, cross-validated, or learned) and an ablation showing sensitivity, the procedure risks implicit fitting to compliance metrics on the validation set, undermining the assertion that the joint distribution is inferred rather than tuned.

Authors: We acknowledge the need for transparency on the potential weights. The manuscript does not currently provide this detail. In the revision, we will explicitly state that the weights are determined via cross-validation on a held-out validation set to balance the spatial and social terms. We will also add a sensitivity ablation showing how compliance and accuracy metrics vary with weight perturbations around the selected values, confirming robustness rather than overfitting to validation compliance scores. revision: yes
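
A sensitivity ablation of the kind promised here can be as simple as sweeping the pairwise weight and recording what the selection does to compliance and to the unary (accuracy-proxy) cost. The toy below assumes one collision-indicator potential with a single hypothetical weight `w_pair` and exact brute-force selection; all numbers are made up. It shows the pattern the referee is worried about: below a threshold weight the collision survives, above it the selection flips, so reported compliance can hinge on where the chosen weight sits relative to that threshold.

```python
import itertools


def select(unary, collide, w_pair):
    """Exact minimum-energy configuration by brute force (tiny scenes only)."""
    n = len(unary)

    def energy(c):
        e = sum(unary[i][c[i]] for i in range(n))
        return e + w_pair * sum(table[c[i]][c[j]] for (i, j), table in collide.items())

    return min(itertools.product(*(range(len(u)) for u in unary)), key=energy)


def sweep(unary, collide, weights):
    """For each weight, record the selection, its collision count, and its unary cost."""
    rows = []
    for w in weights:
        c = select(unary, collide, w)
        n_coll = sum(table[c[i]][c[j]] for (i, j), table in collide.items())
        unary_cost = sum(unary[i][c[i]] for i in range(len(c)))
        rows.append((w, c, n_coll, unary_cost))
    return rows


# Hypothetical scene: agents 0 and 1 collide only if both take candidate 0.
unary = [[0.0, 0.3], [0.0, 0.2]]
collide = {(0, 1): [[1, 0], [0, 0]]}  # collide[(i, j)][a][b] in {0, 1}
for w, c, n_coll, cost in sweep(unary, collide, [0.05, 0.1, 0.5, 1.0, 5.0]):
    print(f"w_pair={w:<5} selection={c} collisions={n_coll} unary_cost={cost:.2f}")
```
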
Referee: [Experiments] Experiments section: the abstract asserts competitive accuracy together with reduced environmental violations and social collisions, yet provides no statistical significance tests, no ablation on candidate-set diversity, and no comparison against strong joint baselines that also enforce compliance. These omissions prevent verification that the MRF step improves compliance without trading off accuracy or simply selecting from an already-filtered pool.

Authors: We thank the referee for these suggestions to strengthen the empirical claims. The current experiments section reports mean metrics but lacks the requested elements. In the revised manuscript, we will add paired statistical significance tests for the reported reductions in violations and collisions. We will include an ablation varying the diversity of the candidate sets produced by the profiler. We will also expand the baselines to include additional strong joint methods that enforce compliance, allowing direct comparison of the MRF aligner's contribution beyond the initial filtering stage. revision: yes
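
For the promised paired significance tests, one distribution-free option is a sign-flip permutation test on per-scene differences (for example, collisions per scene from a baseline versus from JACoP on the same scenes). This is a minimal standard-library sketch; the choice of test, the mean-difference statistic, and the number of permutations are assumptions, and a Wilcoxon signed-rank test would be a common alternative. The per-scene counts in the usage lines are hypothetical.

```python
import random
from typing import Sequence


def paired_permutation_test(a: Sequence[float], b: Sequence[float],
                            n_perm: int = 10000, seed: int = 0) -> float:
    """Two-sided p-value for mean(a - b) == 0 via random sign flips of paired differences."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty paired samples of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs)) / len(diffs)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis each paired difference is equally likely to have either sign.
        stat = abs(sum(d if rng.random() < 0.5 else -d for d in diffs)) / len(diffs)
        if stat >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p strictly positive


# Hypothetical per-scene collision counts on the same eight scenes.
baseline = [3, 2, 4, 1, 5, 2, 3, 4]
jacop = [1, 1, 2, 1, 2, 0, 1, 2]
print(f"p = {paired_permutation_test(baseline, jacop):.4f}")
```

Collision counts from the same scenes are paired, so testing per-scene differences avoids treating the two methods' outputs as independent samples.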

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper describes a multi-stage framework that first generates per-agent trajectory candidates via an Anchor-Based Agent-Centric Profiler, then uses an MRF whose energy potentials are explicitly defined in terms of spatial and social costs to select a joint configuration. The inference step produces a distribution whose samples minimize that energy by construction, but this is an explicit modeling and optimization choice rather than a reduction of the claimed result to its own inputs. No evidence appears of fitted parameters being relabeled as predictions, self-citation load-bearing uniqueness theorems, or ansatzes smuggled via prior work. The central claim (joint selection yields compliant outputs) rests on the modeling assumption that the chosen potentials capture relevant interactions, which is an empirical modeling decision open to external validation rather than a definitional tautology. The derivation is therefore self-contained against the stated inputs.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

Framework rests on standard MRF modeling assumptions for joint distributions and likely requires hand-tuned or fitted weights for energy potentials; no new entities postulated.

free parameters (1)

MRF energy potential weights
Weights balancing spatial, social, and environmental costs in the aligner, chosen or optimized to achieve compliance gains.

axioms (1)

domain assumption: Inter-agent interactions and environmental constraints can be accurately represented as additive energy potentials in an MRF for joint inference.
Invoked in the description of the MRF aligner to enable sampling from the joint trajectory distribution.
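
Written out, this additive assumption corresponds to the standard pairwise MRF form below. It is a generic sketch rather than the paper's equation: x_i ∈ {1, …, K} indexes agent i's candidate, s_i is a hypothetical per-candidate score, φ^env, φ^sp and φ^soc are environmental, spatial and social potentials, and the λ terms are the free weights listed above. The exact terms and weights in JACoP may differ.

```latex
\psi_i(x_i) = -\log s_i(x_i) + \lambda_{\mathrm{env}}\, \phi^{\mathrm{env}}_i(x_i)

E(\mathbf{x}) = \sum_{i=1}^{N} \psi_i(x_i)
  + \sum_{1 \le i < j \le N} \Big[ \lambda_{\mathrm{sp}}\, \phi^{\mathrm{sp}}_{ij}(x_i, x_j)
  + \lambda_{\mathrm{soc}}\, \phi^{\mathrm{soc}}_{ij}(x_i, x_j) \Big]

p(\mathbf{x}) = \frac{\exp\big(-E(\mathbf{x})\big)}
  {\sum_{\mathbf{x}' \in \{1,\dots,K\}^N} \exp\big(-E(\mathbf{x}')\big)}
```

The normalizer sums over all K^N joint configurations, which is the source of the intractability raised in the referee report.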

pith-pipeline@v0.9.0 · 5504 in / 1251 out tokens · 82890 ms · 2026-05-13T02:02:18.994525+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages

[1]

So- cial lstm: Human trajectory prediction in crowded spaces

Alexandre Alahi, Kratarth Goel, Vignesh Ramanathan, Alexandre Robicquet, Li Fei-Fei, and Silvio Savarese. So- cial lstm: Human trajectory prediction in crowded spaces. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 961–971, 2016. 2

work page 2016
[2]

Learning pedestrian group representations for multi-modal trajectory prediction

Inhwan Bae, Jin-Hwi Park, and Hae-Gon Jeon. Learning pedestrian group representations for multi-modal trajectory prediction. InEuropean Conference on Computer Vision, pages 270–289. Springer, 2022. 2, 6

work page 2022
[3]

Singu- lartrajectory: Universal trajectory predictor using diffusion model

Inhwan Bae, Young-Jae Park, and Hae-Gon Jeon. Singu- lartrajectory: Universal trajectory predictor using diffusion model. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17890– 17901, 2024. 1, 2, 3, 6

work page 2024
[4]

Unified uncertainty-aware diffusion for multi-agent trajectory modeling

Guillem Capellera, Antonio Rubio, Luis Ferraz, and Antonio Agudo. Unified uncertainty-aware diffusion for multi-agent trajectory modeling. InProceedings of the Computer Vision and Pattern Recognition Conference, pages 22476–22486,

work page
[5]

Mgf: Mixed gaussian flow for diverse trajectory prediction.Advances in Neural Information Processing Sys- tems, 37:57539–57563, 2024

Jiahe Chen, Jinkun Cao, Dahua Lin, Kris Kitani, and Jiang- miao Pang. Mgf: Mixed gaussian flow for diverse trajectory prediction.Advances in Neural Information Processing Sys- tems, 37:57539–57563, 2024. 2, S4

work page 2024
[6]

Socialmoif: Multi- order intention fusion for pedestrian trajectory prediction

Kai Chen, Xiaodong Zhao, Yujie Huang, Guoyu Fang, Xiao Song, Ruiping Wang, and Ziyuan Wang. Socialmoif: Multi- order intention fusion for pedestrian trajectory prediction. In Proceedings of the Computer Vision and Pattern Recognition Conference, pages 22465–22475, 2025. 2

work page 2025
[7]

Neuralized markov random field for interaction-aware stochastic human trajectory prediction

Zilin Fang, David Hsu, Gim Hee Lee, and Gim Hee Lee. Neuralized markov random field for interaction-aware stochastic human trajectory prediction. InICLR, 2025. S4

work page 2025
[8]

Moflow: One-step flow matching for human trajectory fore- casting via implicit maximum likelihood estimation based distillation

Yuxiang Fu, Qi Yan, Lele Wang, Ke Li, and Renjie Liao. Moflow: One-step flow matching for human trajectory fore- casting via implicit maximum likelihood estimation based distillation. InProceedings of the Computer Vision and Pat- tern Recognition Conference, pages 17282–17293, 2025. 1, 2

work page 2025
[9]

Vectornet: Encoding hd maps and agent dynamics from vectorized rep- resentation

Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Anguelov, Congcong Li, and Cordelia Schmid. Vectornet: Encoding hd maps and agent dynamics from vectorized rep- resentation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11525– 11533, 2020. 2

work page 2020
[10]

Stochastic trajectory pre- diction via motion indeterminacy diffusion

Tianpei Gu, Guangyi Chen, Junlong Li, Chunze Lin, Yong- ming Rao, Jie Zhou, and Jiwen Lu. Stochastic trajectory pre- diction via motion indeterminacy diffusion. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 17113–17122, 2022. 2

work page 2022
[11]

Social gan: Socially acceptable tra- jectories with generative adversarial networks

Agrim Gupta, Justin Johnson, Li Fei-Fei, Silvio Savarese, and Alexandre Alahi. Social gan: Socially acceptable tra- jectories with generative adversarial networks. InProceed- ings of the IEEE conference on computer vision and pattern recognition, pages 2255–2264, 2018. 2, 5

work page 2018
[12]

Stgat: Modeling spatial-temporal interac- tions for human trajectory prediction

Yingfan Huang, Huikun Bi, Zhaoxin Li, Tianlu Mao, and Zhaoqi Wang. Stgat: Modeling spatial-temporal interac- tions for human trajectory prediction. InProceedings of the IEEE/CVF international conference on computer vision, pages 6272–6281, 2019. 2

work page 2019
[13]

The trajectron: Proba- bilistic multi-agent trajectory modeling with dynamic spa- tiotemporal graphs

Boris Ivanovic and Marco Pavone. The trajectron: Proba- bilistic multi-agent trajectory modeling with dynamic spa- tiotemporal graphs. InProceedings of the IEEE/CVF inter- national conference on computer vision, pages 2375–2384,

work page
[14]

Higher-order relational reasoning for pedestrian trajectory prediction

Sungjune Kim, Hyung-gun Chi, Hyerin Lim, Karthik Ra- mani, Jinkyu Kim, and Sangpil Kim. Higher-order relational reasoning for pedestrian trajectory prediction. InProceed- ings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15251–15260, 2024. 2

work page 2024
[15]

Social- bigat: Multimodal trajectory forecasting using bicycle-gan and graph attention networks.Advances in neural informa- tion processing systems, 32, 2019

Vineet Kosaraju, Amir Sadeghian, Roberto Mart ´ın-Mart´ın, Ian Reid, Hamid Rezatofighi, and Silvio Savarese. Social- bigat: Multimodal trajectory forecasting using bicycle-gan and graph attention networks.Advances in neural informa- tion processing systems, 32, 2019. 2

work page 2019
[16]

Muse-vae: multi-scale vae for environment-aware long term trajectory prediction

Mihee Lee, Samuel S Sohn, Seonghyeon Moon, Sejong Yoon, Mubbasir Kapadia, and Vladimir Pavlovic. Muse-vae: multi-scale vae for environment-aware long term trajectory prediction. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2221–2230,

work page
[17]

Desire: Distant future prediction in dynamic scenes with interacting agents

Namhoon Lee, Wongun Choi, Paul Vernaza, Christopher B Choy, Philip HS Torr, and Manmohan Chandraker. Desire: Distant future prediction in dynamic scenes with interacting agents. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 336–345, 2017. 2

work page 2017
[18]

Mart: Multiscale relational transformer networks for multi-agent trajectory prediction

Seongju Lee, Junseok Lee, Yeonguk Yu, Taeri Kim, and Ky- oobin Lee. Mart: Multiscale relational transformer networks for multi-agent trajectory prediction. InEuropean Confer- ence on Computer Vision, pages 89–107. Springer, 2024. S4

work page 2024
[19]

Crowds by example

Alon Lerner, Yiorgos Chrysanthou, and Dani Lischinski. Crowds by example. InComputer graphics forum, pages 655–664. Wiley Online Library, 2007. 5

work page 2007
[20]

Learning lane graph represen- tations for motion forecasting

Ming Liang, Bin Yang, Rui Hu, Yun Chen, Renjie Liao, Song Feng, and Raquel Urtasun. Learning lane graph represen- tations for motion forecasting. InComputer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23– 28, 2020, Proceedings, Part II 16, pages 541–556. Springer,

work page 2020
[21]

Trajdiffuse: A conditional diffusion model for environment-aware trajec- tory prediction

Qingze Tony Liu, Danrui Li, Samuel S Sohn, Sejong Yoon, Mubbasir Kapadia, and Vladimir Pavlovic. Trajdiffuse: A conditional diffusion model for environment-aware trajec- tory prediction. InInternational Conference on Pattern Recognition, pages 382–397. Springer, 2024. 1, 2

work page 2024
[22]

JFP: Joint future prediction with interactive multi-agent modeling for autonomous driving

Wenjie Luo, Cheolho Park, Andre Cornman, Benjamin Sapp, and Dragomir Anguelov. JFP: Joint future prediction with interactive multi-agent modeling for autonomous driving. InProceedings of The 6th Conference on Robot Learning, pages 1457–1467. PMLR, 2023. 4

work page 2023
[23]

From goals, waypoints & paths to long term hu- man trajectory forecasting

Karttikeya Mangalam, Yang An, Harshayu Girase, and Jiten- dra Malik. From goals, waypoints & paths to long term hu- man trajectory forecasting. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 15233– 15242, 2021. 1, 2, 6

work page 2021
[24]

Leapfrog diffusion model for stochastic trajectory 9 prediction

Weibo Mao, Chenxin Xu, Qi Zhu, Siheng Chen, and Yanfeng Wang. Leapfrog diffusion model for stochastic trajectory 9 prediction. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5517–5526,

work page
[25]

Coloss-gan: Collision-free human trajectory generation with a collision loss and gan

Martin Moder and Josef Pauli. Coloss-gan: Collision-free human trajectory generation with a collision loss and gan. In 2021 20th International Conference on Advanced Robotics (ICAR), pages 625–632. IEEE, 2021. 2

work page 2021
[26]

Social-stgcnn: A social spatio-temporal graph convolutional neural network for human trajectory prediction

Abduallah Mohamed, Kun Qian, Mohamed Elhoseiny, and Christian Claudel. Social-stgcnn: A social spatio-temporal graph convolutional neural network for human trajectory prediction. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 14424– 14432, 2020. 2

work page 2020
[27]

You’ll never walk alone: Modeling social behav- ior for multi-target tracking

Stefano Pellegrini, Andreas Ess, Konrad Schindler, and Luc Van Gool. You’ll never walk alone: Modeling social behav- ior for multi-target tracking. In2009 IEEE 12th international conference on computer vision, pages 261–268. IEEE, 2009. 5

work page 2009
[28]

Amd: Adap- tive momentum and decoupled contrastive learning frame- work for robust long-tail trajectory prediction.arXiv preprint arXiv:2507.01801, 2025

Bin Rao, Haicheng Liao, Yanchen Guan, Chengyue Wang, Bonan Wang, Jiaxun Zhang, and Zhenning Li. Amd: Adap- tive momentum and decoupled contrastive learning frame- work for robust long-tail trajectory prediction.arXiv preprint arXiv:2507.01801, 2025. 2

work page arXiv 2025
[29]

R2p2: A reparameterized pushforward policy for diverse, precise generative path forecasting

Nicholas Rhinehart, Kris M Kitani, and Paul Vernaza. R2p2: A reparameterized pushforward policy for diverse, precise generative path forecasting. InProceedings of the European Conference on Computer Vision (ECCV), pages 772–788,

work page
[30]

Precog: Prediction conditioned on goals in visual multi-agent settings

Nicholas Rhinehart, Rowan McAllister, Kris Kitani, and Sergey Levine. Precog: Prediction conditioned on goals in visual multi-agent settings. InProceedings of the IEEE/CVF international conference on computer vision, pages 2821– 2830, 2019. 2

work page 2019
[31]

FJMP: Factorized joint multi-agent motion prediction over learned directed acyclic interaction graphs

Luke Rowe, Martin Ethier, Eli-Henry Dykhne, and Krzysztof Czarnecki. FJMP: Factorized joint multi-agent motion prediction over learned directed acyclic interaction graphs. In2023 IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition (CVPR), pages 13745–13755. IEEE, 2023. 2

work page 2023
[32]

Trajectron++: Dynamically-feasible tra- jectory forecasting with heterogeneous data

Tim Salzmann, Boris Ivanovic, Punarjay Chakravarty, and Marco Pavone. Trajectron++: Dynamically-feasible tra- jectory forecasting with heterogeneous data. InComputer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVIII 16, pages 683–700. Springer, 2020. 2

work page 2020
[33]

Shaoshuai Shi, Li Jiang, Dengxin Dai, and Bernt Schiele. Mtr++: Multi-agent motion prediction with symmetric scene modeling and guided intention querying.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(5):3955– 3971, 2024. 2

work page 2024
[34]

A2x: An agent and environment interac- tion benchmark for multimodal human trajectory prediction

Samuel S Sohn, Mihee Lee, Seonghyeon Moon, Gang Qiao, Muhammad Usman, Sejong Yoon, Vladimir Pavlovic, and Mubbasir Kapadia. A2x: An agent and environment interac- tion benchmark for multimodal human trajectory prediction. InProceedings of the 14th ACM SIGGRAPH Conference on Motion, Interaction and Games, pages 1–9, 2021. 3

work page 2021
[35]

Recursive social behavior graph for trajectory prediction

Jianhua Sun, Qinhong Jiang, and Cewu Lu. Recursive social behavior graph for trajectory prediction. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 660–669, 2020. 2

work page 2020
[36]

Fourier features let networks learn high frequency functions in low dimen- sional domains.Advances in neural information processing systems, 33:7537–7547, 2020

Matthew Tancik, Pratul Srinivasan, Ben Mildenhall, Sara Fridovich-Keil, Nithin Raghavan, Utkarsh Singhal, Ravi Ra- mamoorthi, Jonathan Barron, and Ren Ng. Fourier features let networks learn high frequency functions in low dimen- sional domains.Advances in neural information processing systems, 33:7537–7547, 2020. 3

work page 2020
[37]

An- alyzing the variety loss in the context of probabilistic tra- jectory prediction

Luca Anthony Thiede and Pratik Prabhanjan Brahma. An- alyzing the variety loss in the context of probabilistic tra- jectory prediction. InProceedings of the IEEE/CVF Inter- national Conference on Computer Vision, pages 9954–9963,

work page
[38]

multipath++: Efficient information fusion and trajectory aggregation for behavior prediction,

B Varadarajan, A Hefny, A Srivastava, KS Refaat, N Nayakanti, A Cornman, K Chen, B Douillard, and CP Lam. D. anguelovet al.,“multipath++: Efficient information fusion and trajectory aggregation for behavior prediction,”.arXiv preprint arXiv, 2111, 2021. 2

work page 2021
[39]

Fend: A future enhanced distribution-aware contrastive learning framework for long-tail trajectory prediction

Yuning Wang, Pu Zhang, Lei Bai, and Jianru Xue. Fend: A future enhanced distribution-aware contrastive learning framework for long-tail trajectory prediction. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1400–1409, 2023. 2

[40]

Erica Weng, Hana Hoshino, Deva Ramanan, and Kris Kitani. Joint metrics matter: A better standard for trajectory forecasting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 20315–20326, 2023. 3, 5, S1

[41]

Chenxin Xu, Weibo Mao, Wenjun Zhang, and Siheng Chen. Remember intentions: Retrospective-memory-based trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6488–6497, 2022. 2

[42]

Chenxin Xu, Robby T Tan, Yuhong Tan, Siheng Chen, Yu Guang Wang, Xinchao Wang, and Yanfeng Wang. EqMotion: Equivariant multi-agent motion prediction with invariant interaction reasoning. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 1410–1420, 2023. 6

[43]

Pei Xu, Jean-Bernard Hayet, and Ioannis Karamouzas. SocialVAE: Human trajectory prediction using timewise latents. In European Conference on Computer Vision, pages 511–

[44]

Ye Yuan, Xinshuo Weng, Yanglan Ou, and Kris M Kitani. AgentFormer: Agent-aware transformers for socio-temporal multi-agent forecasting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9813–9823, 2021. 1, 2, 6, S3

[45]

Zikang Zhou, Luyao Ye, Jianping Wang, Kui Wu, and Kejie Lu. HiVT: Hierarchical vector transformer for multi-agent motion prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8823–8833, 2022. 2

[46]

Zikang Zhou, Jianping Wang, Yung-Hui Li, and Yu-Kai Huang. Query-centric trajectory prediction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17863–17873, 2023. 2, 3

JACoP: Joint Alignment for Compliant Multi-Agent Prediction
Supplementary Material

Section A details our sampling algorithm for generating sc...

JADE/JFDE. The Joint Accuracy metrics, referred to as JADE/JFDE, were introduced in [40] to improve on the widely used marginal $\min_k$ADE/FDE metrics. These joint metrics are designed to better reflect a model's ability to predict the collective future trajectories of all agents in a given scene. Unli...
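To make the joint-versus-marginal distinction concrete, here is a minimal pure-Python sketch of JADE/JFDE following the standard definition of [40] as we understand it (pick the single joint sample k with the lowest scene-averaged error), next to the marginal $\min_k$ADE/FDE it improves on (each agent picks its own best sample). It assumes predictions are stored as `preds[k][i][t]`, the (x, y) position of agent i at future step t in joint sample k; the functions `joint_ade_fde` and `marginal_min_ade_fde` are hypothetical names, not the JACoP implementation.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
# One joint scene sample (hypothetical layout): scene[i][t] = (x, y) of agent i at step t.
Scene = Sequence[Sequence[Point]]


def joint_ade_fde(preds: Sequence[Scene], gt: Scene) -> Tuple[float, float]:
    """JADE/JFDE: minimum over joint samples k of the error averaged over all agents."""
    n_agents, n_steps = len(gt), len(gt[0])
    jade = min(
        sum(math.dist(s[i][t], gt[i][t]) for i in range(n_agents) for t in range(n_steps))
        / (n_agents * n_steps)
        for s in preds
    )
    # JFDE uses only the final step; its best sample may differ from JADE's.
    jfde = min(
        sum(math.dist(s[i][-1], gt[i][-1]) for i in range(n_agents)) / n_agents
        for s in preds
    )
    return jade, jfde


def marginal_min_ade_fde(preds: Sequence[Scene], gt: Scene) -> Tuple[float, float]:
    """Marginal min_k ADE/FDE: every agent independently picks its own best sample."""
    n_agents, n_steps = len(gt), len(gt[0])
    ade = sum(
        min(sum(math.dist(s[i][t], gt[i][t]) for t in range(n_steps)) / n_steps for s in preds)
        for i in range(n_agents)
    ) / n_agents
    fde = sum(min(math.dist(s[i][-1], gt[i][-1]) for s in preds) for i in range(n_agents)) / n_agents
    return ade, fde


# Toy scene: two agents, two future steps, K = 2 joint samples.
gt = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]]
preds = [
    [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 3.0), (1.0, 3.0)]],  # agent 0 exact, agent 1 off by 2 m
    [[(0.0, 2.0), (1.0, 2.0)], [(0.0, 1.0), (1.0, 1.0)]],  # agent 0 off by 2 m, agent 1 exact
]
print(joint_ade_fde(preds, gt))         # (1.0, 1.0)
print(marginal_min_ade_fde(preds, gt))  # (0.0, 0.0)
```

In this toy scene each sample gets exactly one agent right, so the marginal metric reports a perfect score by mixing agents across samples, while JADE/JFDE, which must score one coherent scene at a time, reports 1.0.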

Agent-to-Agent Collision. The agent-to-agent collision rate measures the proportion of predicted trajectories that intersect another agent's path within the same scene prediction. Two agents are considered to collide if their positions come within $r = 0.2$ meters of each other at any future time step. We first define an indicator function for collision as 𝟙_col(Ŷ^(k...
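A minimal sketch of this collision rate, under stated assumptions: predictions use the same hypothetical `preds[k][i][t]` layout as above, each (sample, agent) trajectory gets one binary collision flag, and the rate is the mean of those flags over all agents and all K joint samples. The exact aggregation in the supplement is truncated here, so this normalization, like the strict `<` threshold, is an assumption; `collision_rate` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def collision_rate(preds: Sequence[Sequence[Sequence[Point]]], r: float = 0.2) -> float:
    """Fraction of predicted agent trajectories that collide with another agent.

    preds[k][i][t] is the (x, y) position of agent i at future step t in joint
    sample k. Agents are only compared within the same joint sample k.
    """
    flags = []
    for scene in preds:
        n_agents = len(scene)
        for i in range(n_agents):
            # Collision if any other agent j is closer than r at the same step t
            # (strict "<" is an assumption; the text only says "within r").
            hit = any(
                math.dist(scene[i][t], scene[j][t]) < r
                for j in range(n_agents) if j != i
                for t in range(len(scene[i]))
            )
            flags.append(hit)
    return sum(flags) / len(flags)


# Two joint samples of three agents over two future steps.
colliding = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 0.1)], [(5.0, 5.0), (6.0, 6.0)]]
clear = [[(0.0, 0.0), (1.0, 0.0)], [(0.0, 3.0), (1.0, 3.0)], [(5.0, 5.0), (6.0, 6.0)]]
print(collision_rate([colliding, clear]))  # 0.3333333333333333 (2 of 6 trajectories collide)
```

Because the check is pairwise within one joint sample, a collision is counted against both agents involved; this is what ties the metric to scene-level compliance rather than to individual trajectories.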

The key difference between $\min_k$ADE/FDE and avgADE/FDE lies in their aggregation strategy: instead of selecting the best sample (i.e., the minimum error) from the $K$ outputs, avgADE/FDE averages the displacement error across all $K$ model outputs. A smaller value for these average metrics indicates that the model's entire distribution of predictions is tightl...
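The aggregation difference is easiest to see for a single agent. The sketch below computes both statistics over K samples of one agent, using a hypothetical list-of-(x, y) layout and hypothetical helper names (`ade`, `fde`, `min_vs_avg`); averaging these per-agent values over all agents to obtain a scene-level number is an assumption, not a detail recovered from the truncated text.

```python
import math
from statistics import mean
from typing import Sequence, Tuple

Point = Tuple[float, float]


def ade(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Average displacement error of one predicted trajectory."""
    return mean(math.dist(p, g) for p, g in zip(pred, gt))


def fde(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Final displacement error of one predicted trajectory."""
    return math.dist(pred[-1], gt[-1])


def min_vs_avg(samples: Sequence[Sequence[Point]], gt: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Return (minADE, avgADE, minFDE, avgFDE) over the K samples of one agent."""
    ades = [ade(s, gt) for s in samples]
    fdes = [fde(s, gt) for s in samples]
    # min_k keeps only the best sample; avg scores the whole predicted distribution.
    return min(ades), mean(ades), min(fdes), mean(fdes)


gt = [(0.0, 0.0), (1.0, 0.0)]
samples = [
    [(0.0, 0.0), (1.0, 0.0)],  # exact
    [(0.0, 1.0), (1.0, 1.0)],  # 1 m off
    [(0.0, 3.0), (1.0, 3.0)],  # 3 m off
]
print(min_vs_avg(samples, gt))  # minADE = minFDE = 0.0, avgADE = avgFDE = 4/3 (about 1.333)
```

One accurate sample is enough to drive $\min_k$ADE to zero, but the two poor samples still raise avgADE, which is why the average metrics reward a tightly concentrated predictive distribution.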
