
arxiv: 2605.15424 · v1 · pith:TMSREE44new · submitted 2026-05-14 · 💻 cs.CV

Social-Mamba: Socially-Aware Trajectory Forecasting with State-Space Models

Pith reviewed 2026-05-19 15:12 UTC · model grok-4.3

classification 💻 cs.CV
keywords trajectory forecasting · state-space models · social interactions · crowd navigation · efficient sequence modeling · egocentric representation

The pith

Social-Mamba reformulates social interactions as structured scans on an egocentric grid using a Cycle Mamba block to achieve accurate trajectory forecasts with linear-time computation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Social-Mamba as a way to forecast human paths in crowded spaces by treating social dependencies as sequential processes rather than dense attention graphs. It organizes nearby agents on an egocentric grid and breaks interactions into three complementary scans: temporal, egocentric, and goal-centric. These scans run through a Cycle Mamba block that keeps bidirectional information flowing continuously, then combine via a learnable social gate. If the approach holds, forecasts remain accurate while computation grows only linearly with the number of people instead of quadratically, which matters for real-time systems that must operate amid dozens of moving agents. Experiments on five standard benchmarks support the claim of state-of-the-art accuracy paired with better parameter counts and speed.

Core claim

Social-Mamba reformulates social interactions as structured sequential processes by placing agents on an egocentric grid and introducing social triplet factorization to decompose interactions into temporal, egocentric, and goal-centric scans. These are processed through a Cycle Mamba block that enables continuous bidirectional information flow and are dynamically integrated through a learnable social gate and global scan to generate accurate and efficient trajectory predictions.
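
To make the egocentric grid more concrete, here is a minimal Python sketch under stated assumptions: neighbors are translated into the ego agent's frame, rotated so that the ego's heading points along +x, and then sorted nearest first. The distance-based ordering, the rotation step, and the helper name `egocentric_order` are illustrative assumptions; the text above says only that agents are placed on an egocentric grid.

```python
import math

def egocentric_order(ego_xy, ego_heading, neighbors_xy):
    """Return neighbor positions in the ego frame, nearest first.

    Hypothetical helper: the distance-based ordering is an assumption,
    not the paper's documented grid construction.
    """
    cos_h, sin_h = math.cos(-ego_heading), math.sin(-ego_heading)
    local = []
    for x, y in neighbors_xy:
        dx, dy = x - ego_xy[0], y - ego_xy[1]
        # Rotate by -heading so the ego's direction of travel becomes +x.
        lx = dx * cos_h - dy * sin_h
        ly = dx * sin_h + dy * cos_h
        local.append((lx, ly))
    return sorted(local, key=lambda p: math.hypot(p[0], p[1]))

# Ego at (1, 1) facing +y; neighbors ahead, to the right, and behind.
ordered = egocentric_order((1.0, 1.0), math.pi / 2,
                           [(1.0, 4.0), (3.0, 1.0), (1.0, 0.0)])
```

Once the neighbors are in a fixed order, they form a sequence that a scan can traverse, which is the property the structured scans rely on.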

What carries the argument

The Cycle Mamba block, which enables continuous bidirectional information flow while processing social triplet factorizations on an egocentric grid of agents.
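
For intuition about bidirectional flow in a linear-time scan, the sketch below runs a scalar linear recurrence h_t = a * h_{t-1} + b * x_t once forward and once backward, then sums the two states. This is a generic bidirectional state-space scan, not the Cycle Mamba block itself, whose internal design is not spelled out on this page. The coefficients `a` and `b` are fixed illustrative constants, whereas selective state-space models make them depend on the input.

```python
def linear_scan(xs, a=0.5, b=1.0):
    """One pass of h_t = a * h_{t-1} + b * x_t; cost is linear in len(xs)."""
    h, out = 0.0, []
    for x in xs:
        h = a * h + b * x
        out.append(h)
    return out

def bidirectional_scan(xs, a=0.5, b=1.0):
    """Sum of a forward and a backward pass (generic, not Cycle Mamba)."""
    fwd = linear_scan(xs, a, b)
    # Scan the reversed sequence, then flip back to the original order.
    bwd = linear_scan(xs[::-1], a, b)[::-1]
    return [f + r for f, r in zip(fwd, bwd)]

states = bidirectional_scan([1.0, 0.0, 0.0, 2.0])
```

Positions whose own input is zero still end up with non-zero states, because each position receives information from both earlier and later elements.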

If this is right

  • Forecasting models can scale to denser crowds without quadratic growth in compute or memory.
  • The architecture can be inserted into flow-matching pipelines for further gains in accuracy and speed.
  • Lower parameter counts make real-time deployment feasible on edge hardware for navigation tasks.
  • Social-Mamba supplies a reusable backbone for other multi-agent prediction problems that currently rely on attention.
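
The scaling argument in the first bullet can be illustrated by counting operations. The sketch below compares the number of pairwise scores in dense attention over N agents with the number of steps taken by a constant number of scans over the same agents. Using three scans is an assumption that loosely mirrors the triplet factorization, and the counts ignore hidden-state size and constant factors, so they indicate growth rates only.

```python
def attention_pairs(n_agents):
    """Dense attention scores every ordered pair of agents."""
    return n_agents * n_agents

def scan_steps(n_agents, n_scans=3):
    """Each scan visits every agent once; n_scans=3 is an assumption."""
    return n_scans * n_agents

for n in (8, 32, 128):
    print(n, attention_pairs(n), scan_steps(n))
```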

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The egocentric grid plus cyclic scan pattern may transfer to other unstructured multi-agent settings such as traffic or sports analytics.
  • If the linear scaling holds under real sensor noise, it could reduce the hardware requirements for onboard crowd-aware planners in robots.
  • Future work could test whether adding explicit goal uncertainty modeling inside the scans improves long-horizon stability.

Load-bearing premise

Decomposing social interactions into temporal, egocentric, and goal-centric scans on an egocentric grid sufficiently captures the unstructured and dynamic nature of real-world social dependencies without major information loss.

What would settle it

Run the model on a new crowd dataset containing highly irregular agent arrangements or abrupt goal changes that break the assumed triplet structure, then check whether accuracy falls below attention-based baselines while computation remains linear.
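
One way to run this comparison is with displacement-error metrics. The sketch below computes average and final displacement error (ADE and FDE), which are widely used in trajectory forecasting. The text above does not name the paper's metrics, so their use here is an assumption, as is the function name `ade_fde`.

```python
import math

def ade_fde(pred, truth):
    """pred, truth: equal-length lists of (x, y) future positions."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and equal length")
    # Euclidean distance between prediction and ground truth at each step.
    dists = [math.dist(p, t) for p, t in zip(pred, truth)]
    return sum(dists) / len(dists), dists[-1]

ade, fde = ade_fde([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
                   [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)])
```

Computing the same metrics for the scan-based model and an attention baseline on the irregular scenes, together with wall-clock time as the crowd grows, would address both halves of the test.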

Figures

Figures reproduced from arXiv: 2605.15424 by Alexandre Alahi, Po-Chien Luan, Wuyang Li, Yang Gao.

Figure 1: Comparison of interaction modeling approaches.
Figure 2: Comparison of Bidirectional Mamba architectures.
Figure 3: Overview of the Social-Mamba framework. The model first establishes an ego-centric view by creating a sorted social grid. This grid is then processed by three parallel interaction modules—temporal, egocentric, and goal-centric—which use our Cycle Mamba blocks to capture different facets of social influence. A dynamic gating network fuses these representations, and a final scan across agents captures global…
Figure 4: Qualitative results of the NBA-Full. (a)-(c): Comparison with Multi-Transmotion. (d)-(i): Multimodal outputs.
Figure 5: Additional qualitative results on JRDB
Figure 6: Pipeline of MoFlow with Social Mamba. We utilize Social-Mamba to model context and social-temporal interactions for MoFlow.
Figure 7: Additional qualitative results of the NBA-Full
Original abstract

Human trajectory forecasting is crucial for safe navigation in crowded environments, requiring models that balance accuracy with computational efficiency. Efficiently modeling social interactions is key to performance in dense crowds. Yet, most recent methods rely on attention mechanisms, which are effective at capturing complex dependencies, but incur quadratic computational costs that scale poorly with the growing number of neighbors. Recently, Selective State-Space Models have provided a linear-time alternative; however, their inherently sequential design is misaligned with the unstructured and dynamic nature of social interactions. To address this challenge, we propose Social-Mamba, a forecasting architecture that reformulates social interactions as structured sequential processes. At its core is the Cycle Mamba block, a novel module that enables continuous bidirectional information flow. Social-Mamba organizes agents on an egocentric grid and introduces social triplet factorization, which decomposes interactions into temporal, egocentric, and goal-centric scans. These are dynamically integrated through a learnable social gate and global scan to generate accurate and efficient trajectory predictions. Extensive experiments on five trajectory forecasting benchmarks show that Social-Mamba achieves state-of-the-art accuracy while offering superior parameter efficiency and computational scalability. Furthermore, embedding Social-Mamba into a flow-matching framework further enhances both accuracy and efficiency, establishing it as a flexible and robust foundation for future trajectory forecasting research. The code is publicly available: https://github.com/vita-epfl/Social-Mamba

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Social-Mamba, a trajectory forecasting architecture that replaces quadratic attention with linear-time state-space models. It proposes the Cycle Mamba block for bidirectional information flow, organizes agents on an egocentric grid, and applies social triplet factorization to decompose interactions into temporal, egocentric, and goal-centric scans. These components are dynamically combined via a learnable social gate and global scan. The paper reports state-of-the-art accuracy together with improved parameter efficiency and scalability on five benchmarks, plus further gains when the model is embedded in a flow-matching framework. Code is released publicly.

Significance. If the empirical results are robust, the work would provide a practical advance in socially-aware forecasting by demonstrating that structured state-space scans can substitute for attention while preserving accuracy and gaining linear scaling. This is relevant for real-time applications in dense crowds. Public code and the flow-matching integration are additional strengths that support reproducibility and extensibility.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (Cycle Mamba block and social triplet factorization): The headline claim of SOTA accuracy while retaining full modeling power rests on the assumption that the three fixed scan directions on the egocentric grid plus the Cycle Mamba block capture all relevant unstructured social dependencies. The decomposition imposes an ordering and locality bias; if simultaneous or non-axis-aligned couplings are common, the linear reformulation could under-model them. Direct evidence (e.g., an ablation that restores full attention on the same grid or a diagnostic measuring information loss) is needed to secure this precondition for the accuracy half of the result.
  2. [Experiments] Experiments section: The abstract states SOTA results on five benchmarks with efficiency gains, yet the support for both accuracy and scalability claims would be stronger with explicit reporting of baseline re-implementations, standard deviations across runs, ablation tables isolating the social gate and each scan direction, and precise train/validation/test splits. Without these, it is difficult to assess whether the reported margins are stable or sensitive to implementation details.
minor comments (2)
  1. [§3] Notation in §3: Define the precise construction of the egocentric grid and the ordering of the three scans inside the Cycle Mamba block more formally so that the linear-time claim can be verified from the equations alone.
  2. [Figure 3] Figure 3 (architecture diagram): Ensure the figure explicitly shows how the three scan outputs are fused by the learnable social gate before the global scan; current diagrams sometimes leave the integration step implicit.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the insightful comments on our work. We provide detailed responses to each major comment and describe the revisions we plan to make to address the concerns raised.

Point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (Cycle Mamba block and social triplet factorization): The headline claim of SOTA accuracy while retaining full modeling power rests on the assumption that the three fixed scan directions on the egocentric grid plus the Cycle Mamba block capture all relevant unstructured social dependencies. The decomposition imposes an ordering and locality bias; if simultaneous or non-axis-aligned couplings are common, the linear reformulation could under-model them. Direct evidence (e.g., an ablation that restores full attention on the same grid or a diagnostic measuring information loss) is needed to secure this precondition for the accuracy half of the result.

    Authors: We agree that providing direct evidence for the modeling capacity of our structured scans is important to support the SOTA accuracy claims. The design of social triplet factorization is based on the structured nature of pedestrian interactions, where temporal, spatial-egocentric, and goal-centric relations are primary. The Cycle Mamba block enables bidirectional processing to reduce ordering bias. To address the request for evidence, we will add to the revised manuscript an ablation study that compares our model against a variant using standard attention within the same egocentric grid setup (noting the computational trade-off), along with a diagnostic plot showing how much information is preserved across the factorization steps. This will help quantify any potential under-modeling of complex dependencies. revision: yes

  2. Referee: [Experiments] Experiments section: The abstract states SOTA results on five benchmarks with efficiency gains, yet the support for both accuracy and scalability claims would be stronger with explicit reporting of baseline re-implementations, standard deviations across runs, ablation tables isolating the social gate and each scan direction, and precise train/validation/test splits. Without these, it is difficult to assess whether the reported margins are stable or sensitive to implementation details.

    Authors: We acknowledge that additional experimental details would strengthen the presentation. In the revised version, we will explicitly state that all baselines were re-implemented following their original papers or official repositories. We will report standard deviations from multiple independent runs (with at least three seeds per experiment). We will expand the ablation studies to include dedicated tables for the social gate and each individual scan direction (temporal, egocentric, goal-centric). Finally, we will provide precise details on the train, validation, and test splits used for each of the five benchmarks in the main text or supplementary material. revision: yes
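
The run-to-run reporting described in this response could look like the following sketch, which summarizes a metric over independent seeds as a mean and a sample standard deviation. The three scores are placeholders for illustration, not results from the paper, and `summarize_runs` is a hypothetical helper name.

```python
import statistics

def summarize_runs(scores):
    """Mean and sample standard deviation over per-seed scores."""
    if len(scores) < 2:
        raise ValueError("need at least two runs for a standard deviation")
    return statistics.mean(scores), statistics.stdev(scores)

# Placeholder values for three seeds; not taken from the paper.
mean, std = summarize_runs([0.40, 0.42, 0.44])
print(f"{mean:.3f} ± {std:.3f}")
```

With only three seeds the standard deviation is itself a noisy estimate, so margins smaller than it deserve caution.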

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical benchmarks and independent architectural design

full rationale

The paper presents Social-Mamba as a novel architecture that reformulates social interactions via an egocentric grid, social triplet factorization, and a Cycle Mamba block to enable linear-time processing. These components are introduced as original contributions without any derivation chain that reduces by construction to fitted parameters, self-citations, or renamed inputs. Performance claims (SOTA accuracy and efficiency on five benchmarks) are validated through direct experimental comparison rather than mathematical predictions that equate to the model's own design choices. No load-bearing self-citation chains, ansatz smuggling, or uniqueness theorems imported from prior author work appear in the central argument; the model is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim depends on the effectiveness of the newly introduced Cycle Mamba block and the decomposition of interactions into three specific scan types, which are validated only through downstream empirical performance.

free parameters (1)
  • learnable social gate parameters
    Parameters that dynamically integrate the temporal, egocentric, and goal-centric scans.
axioms (1)
  • domain assumption Social interactions in crowds can be adequately represented as structured sequential processes on an egocentric grid without losing critical dynamic dependencies.
    This premise underpins the reformulation of interactions and the design of the triplet factorization.
invented entities (1)
  • Cycle Mamba block no independent evidence
    purpose: To enable continuous bidirectional information flow across the structured social scans.
    New module introduced to address the sequential limitation of standard state-space models in social settings.
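
As a rough picture of what the social gate parameters in this ledger do, the sketch below fuses three per-scan feature vectors using softmax weights computed from gate logits. The softmax form, the one-logit-per-scan layout, and all names are assumptions for illustration; the ledger states only that the gate parameters dynamically integrate the three scans. A dynamic gate would presumably compute the logits from the features rather than take them as fixed inputs.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a short list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gate_fuse(temporal, egocentric, goal, logits):
    """Weighted sum of three equal-length feature vectors (hypothetical gate)."""
    w = softmax(logits)
    return [w[0] * t + w[1] * e + w[2] * g
            for t, e, g in zip(temporal, egocentric, goal)]

# Equal logits give each scan the same weight of one third.
fused = gate_fuse([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], logits=[0.0, 0.0, 0.0])
```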

pith-pipeline@v0.9.0 · 5785 in / 1327 out tokens · 47201 ms · 2026-05-19T15:12:52.765359+00:00 · methodology

