
arxiv: 2511.22039 · v3 · submitted 2025-11-27 · 💻 cs.CV

Recognition: unknown

SparseWorld-TC: Trajectory-Conditioned Sparse Occupancy World Model

keywords: occupancy, future, discrete, forecasting, scene, sparse, trajectory-conditioned, transformer

This paper introduces a novel architecture for trajectory-conditioned forecasting of future 3D scene occupancy. In contrast to methods that rely on variational autoencoders (VAEs) to generate discrete occupancy tokens, which inherently limit representational capacity, our approach predicts multi-frame future occupancy in an end-to-end manner directly from raw image features. Inspired by the success of attention-based transformer architectures in foundational vision and language models such as GPT and VGGT, we employ a sparse occupancy representation that bypasses the intermediate bird's-eye-view (BEV) projection and its explicit geometric priors. This design allows the transformer to capture spatiotemporal dependencies more effectively. By avoiding both the finite-capacity constraint of discrete tokenization and the structural limitations of BEV representations, our method achieves state-of-the-art performance on the nuScenes benchmark for 1-3 second occupancy forecasting, outperforming existing approaches by a significant margin. Furthermore, it demonstrates robust scene dynamics understanding, consistently delivering high accuracy under arbitrary future trajectory conditioning.
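
The abstract describes the idea only at a high level, so the following is a minimal NumPy sketch of what trajectory-conditioned sparse forecasting could look like, not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`cross_attention`, `forecast_sparse_occupancy`), the random placeholder weights (`w_traj`, `w_xyz`, `w_occ`), the `(x, y, yaw)` waypoint encoding, and the use of a single attention layer. It shows a set of sparse occupancy queries being shifted by an embedding of each future ego waypoint, attending directly to image feature tokens (no BEV grid and no discrete tokenizer), and being decoded into continuous 3D points with occupancy logits for each future frame.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys, values):
    # Scaled dot-product attention; returns the attended features and the weights.
    d = queries.shape[-1]
    weights = softmax(queries @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values, weights

def forecast_sparse_occupancy(image_feats, queries, trajectory, params):
    """Hypothetical single-layer forecaster (illustrative only).

    image_feats: (M, D) image feature tokens
    queries:     (N, D) sparse occupancy queries
    trajectory:  (T, 3) future ego waypoints, assumed to be (x, y, yaw)
    Returns points (T, N, 3) and occupancy logits (T, N).
    """
    traj_emb = trajectory @ params["w_traj"]             # (T, D)
    cond_q = queries[None, :, :] + traj_emb[:, None, :]  # (T, N, D)
    T, N, D = cond_q.shape
    attended, _ = cross_attention(cond_q.reshape(T * N, D), image_feats, image_feats)
    attended = attended.reshape(T, N, D)
    points = attended @ params["w_xyz"]                  # continuous 3D locations
    logits = (attended @ params["w_occ"])[..., 0]        # one occupancy logit per query
    return points, logits

# Toy sizes and random weights; all values are placeholders, not trained parameters.
rng = np.random.default_rng(0)
D, M, N, T = 16, 32, 8, 3  # T = 3 future frames, chosen arbitrarily
params = {
    "w_traj": rng.normal(size=(3, D)),
    "w_xyz": rng.normal(size=(D, 3)),
    "w_occ": rng.normal(size=(D, 1)),
}
image_feats = rng.normal(size=(M, D))
queries = rng.normal(size=(N, D))
trajectory = rng.normal(size=(T, 3))

points, logits = forecast_sparse_occupancy(image_feats, queries, trajectory, params)
print(points.shape, logits.shape)
```

Because the waypoint embedding is added to the queries before attention, changing `trajectory` changes which image tokens each query attends to, which is one simple way to make the forecast depend on the planned ego motion.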

This paper has not been read by Pith yet.


Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Height-Guided Projection Reparameterization for Camera-LiDAR Occupancy

    cs.CV 2026-05 conditional novelty 6.0

    HiPR improves 3D occupancy prediction by adaptively reparameterizing projection sampling ranges using LiDAR height priors instead of fixed uniform pillars.

  2. Height-Guided Projection Reparameterization for Camera-LiDAR Occupancy

    cs.CV 2026-05 unverdicted novelty 6.0

    HiPR improves 3D occupancy prediction by reparameterizing image-to-voxel projections using LiDAR-derived height priors to adapt sampling ranges to scene sparsity and height variations.