GoRela: Go Relative for Viewpoint-Invariant Motion Forecasting
read the original abstract
The task of motion forecasting is critical for self-driving vehicles (SDVs) to be able to plan a safe maneuver. Towards this goal, modern approaches reason about the map, the agents' past trajectories and their interactions in order to produce accurate forecasts. The predominant approach has been to encode the map and other agents in the reference frame of each target agent. However, this approach is computationally expensive for multi-agent prediction as inference needs to be run for each agent. To tackle the scaling challenge, the solution thus far has been to encode all agents and the map in a shared coordinate frame (e.g., the SDV frame). However, this is sample inefficient and vulnerable to domain shift (e.g., when the SDV visits uncommon states). In contrast, in this paper, we propose an efficient shared encoding for all agents and the map without sacrificing accuracy or generalization. Towards this goal, we leverage pair-wise relative positional encodings to represent geometric relationships between the agents and the map elements in a heterogeneous spatial graph. This parameterization allows us to be invariant to scene viewpoint, and save online computation by re-using map embeddings computed offline. Our decoder is also viewpoint agnostic, predicting agent goals on the lane graph to enable diverse and context-aware multimodal prediction. We demonstrate the effectiveness of our approach on the urban Argoverse 2 benchmark as well as a novel highway dataset.
This paper has not been read by Pith yet.
Forward citations
Cited by 3 Pith papers
-
Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting
Argoverse 2 introduces three new datasets with annotated sensor data, massive lidar collections, and challenging motion forecasting scenarios for autonomous driving research.
-
Goal-Oriented Reactive Simulation for Closed-Loop Trajectory Prediction
Closed-loop on-policy training with a reactive goal-oriented scene decoder cuts collision rates by up to 79.5% in dense traffic compared to standard open-loop baselines.
-
Conditional Flow-VAE for Safety-Critical Traffic Scenario Generation
A conditional flow matching model generates realistic safety-critical traffic scenarios by turning nominal scenes into dangerous rollouts using combined simulation and real data.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.