pith. sign in

arxiv: 2606.03909 · v1 · pith:YAFXOY4Gnew · submitted 2026-06-02 · 💻 cs.CV

SparseStreet: Sparse Gaussian Splatting for Real-Time Street Scene Simulation

Pith reviewed 2026-06-28 10:50 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D Gaussian SplattingScene CompressionStreet Scene ReconstructionDynamic ObjectsReal-Time RenderingPruningWaymonuScenes
0
0 comments X

The pith

SparseStreet prunes 3D Gaussians in street scenes to cut storage by up to 80 percent while keeping moving objects intact.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that 3D Gaussian Splatting for street scenes wastes many primitives on static backgrounds that can be pruned away. Dynamic objects require dense, high-fidelity Gaussians to hold their shape and motion across frames, yet the background holds enough redundancy that a two-stage process can remove most of it. The first stage uses node-based learnable pruning to drop low-contributing primitives; the second applies extra compression once the scene stabilizes. A reader would care because the result turns an otherwise memory-heavy representation into one that still supports real-time, high-quality rendering of city driving scenes on the Waymo and nuScenes datasets.

Core claim

SparseStreet introduces a compression framework for Gaussian Splatting in street scenes that first applies node-based learnable pruning to remove low-contributing primitives and then compresses static background regions, achieving up to 80% reduction in primitives while preserving the geometry and appearance of dynamic objects on Waymo and nuScenes datasets.

What carries the argument

Node-based learnable pruning strategy that removes low-contributing Gaussian primitives while preserving critical regions, followed by background compression.

If this is right

  • Real-time rendering becomes feasible on hardware with limited memory because the total number of Gaussians drops sharply.
  • Dynamic objects retain their geometry and appearance, so simulation applications can still track moving traffic accurately.
  • Storage costs for large-scale street scene datasets fall enough to allow more scenes to be kept on disk or in memory.
  • The same pruning logic can be applied after any initial Gaussian optimization step once the scene representation stabilizes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same dynamic-versus-static distinction could guide compression in other outdoor reconstruction tasks such as aerial mapping.
  • Combining the pruning with existing acceleration structures for Gaussian splatting might push frame rates higher without further quality loss.
  • If the redundancy pattern holds in indoor scenes, the framework could be adapted by redefining which regions count as background.

Load-bearing premise

Static background regions contain substantial redundancy that can be pruned without harming overall scene quality or the temporal consistency of dynamic objects.

What would settle it

Rendering the compressed model on held-out Waymo sequences and measuring whether temporal consistency metrics for vehicles and pedestrians drop below the uncompressed baseline.

Figures

Figures reproduced from arXiv: 2606.03909 by Hao Wang, Ming Lu, Nan Huang, Ningning Ma, Peng Chen, Qingpo Wuwu, Shanghang Zhang, Xiaobao Wei, Zhongyu Zhao.

Figure 1
Figure 1. Figure 1: Visualization of Gaussian projections before and after pruning across three camera views. The first row displays the [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of SparseStreet. Given the street video as input, our method first constructs a hierarchical scene graph [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Qualitative comparisons of ground truth (GT), OmniRe, and OmniRe + Ours. The fourth column shows Gaussian [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Comparison of different pruning strategies on dynamic objects across three camera views. First row: Ground truth [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
read the original abstract

While 3D Gaussian Splatting has shown promising results in street scene reconstruction, existing methods require massive numbers of Gaussian primitives to capture fine details, leading to prohibitive storage costs and slow rendering speeds. We observe that dynamic objects (e.g., vehicles and pedestrians) demand high-fidelity representations to maintain temporal consistency, while static background regions often contain substantial redundancy. Motivated by this, we propose SparseStreet, a general compression framework specifically designed for street scenes. First, we introduce a node-based learnable pruning strategy that systematically removes low-contributing Gaussian primitives while preserving visually critical regions. Second, after the scene representation stabilizes, we apply background compression, further reducing redundancy in static regions. Our method effectively preserves the geometry and appearance of dynamic objects while significantly reducing the total number of Gaussian primitives. Extensive experiments on the Waymo and nuScenes demonstrate that SparseStreet achieves up to 80% compression ratio with minimal quality degradation, enabling resource-efficient, high-fidelity dynamic scene reconstruction. Project website: https://sparsestreet.github.io/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes SparseStreet, a compression framework for 3D Gaussian Splatting in street scenes. It introduces a node-based learnable pruning strategy to remove low-contributing primitives while preserving visually critical regions, followed by background compression on static areas. The central claim is that this achieves up to 80% compression ratio on Waymo and nuScenes with minimal quality degradation, while preserving the geometry and appearance of dynamic objects for temporal consistency.

Significance. If the empirical claims hold with proper validation, the work could support more resource-efficient real-time rendering and simulation of complex urban environments, which is relevant for applications requiring both fidelity on moving objects and reduced storage/rendering costs.

major comments (2)
  1. Abstract: the central claim of 'up to 80% compression ratio with minimal quality degradation' and preservation of dynamic objects is stated without any quantitative metrics, baselines, ablation studies, or error analysis, so the result cannot be assessed from the provided text.
  2. Abstract (motivation and method): the node-based learnable pruning is said to remove low-contributing Gaussians while 'preserving visually critical regions' and dynamic objects, but no explicit mechanism (dynamic mask, motion-aware weighting, or post-pruning re-optimization) is described; contribution-based pruning can assign low scores to dynamic-object Gaussians due to limited views and occlusions, directly risking the temporal-consistency claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful comments on the abstract. We address each point below and indicate where revisions will be made to strengthen the presentation of claims and method details.

read point-by-point responses
  1. Referee: Abstract: the central claim of 'up to 80% compression ratio with minimal quality degradation' and preservation of dynamic objects is stated without any quantitative metrics, baselines, ablation studies, or error analysis, so the result cannot be assessed from the provided text.

    Authors: We agree that the abstract summarizes results at a high level without embedding specific metrics. The full manuscript reports quantitative results on Waymo and nuScenes, including compression ratios up to 80%, PSNR/SSIM/LPIPS comparisons to 3DGS baselines and prior compression methods, ablation studies on the pruning components, and separate analysis of dynamic-object fidelity. To improve assessability from the abstract itself, we will revise it to incorporate one or two key quantitative indicators (e.g., “80% compression with <0.5 dB average PSNR drop”) while remaining within length limits. revision: yes

  2. Referee: Abstract (motivation and method): the node-based learnable pruning is said to remove low-contributing Gaussians while 'preserving visually critical regions' and dynamic objects, but no explicit mechanism (dynamic mask, motion-aware weighting, or post-pruning re-optimization) is described; contribution-based pruning can assign low scores to dynamic-object Gaussians due to limited views and occlusions, directly risking the temporal-consistency claim.

    Authors: The node-based pruning mechanism, which operates on learned per-node importance scores aggregated across multiple views rather than per-Gaussian contribution alone, is detailed in Section 3.2; this grouping helps mitigate the low-view-count problem for dynamic objects. Experiments in Section 4.3 and 4.4 demonstrate that temporal consistency on moving vehicles and pedestrians is maintained (quantified via per-object PSNR and visual inspection across frames). Because the abstract is brief, we will add a short clarifying phrase on the node-level aggregation. We acknowledge the referee’s concern about potential bias against dynamic Gaussians and will ensure the revised abstract and introduction explicitly note this design choice. revision: partial

Circularity Check

0 steps flagged

No significant circularity; claims rest on external empirical validation

full rationale

The paper introduces a node-based learnable pruning strategy followed by background compression for 3D Gaussian Splatting in street scenes. The central result (up to 80% compression with minimal quality loss on Waymo and nuScenes) is presented as an experimental outcome measured against public datasets, not derived from any self-referential equation or fitted parameter renamed as a prediction. No self-citation chain, uniqueness theorem, or ansatz is invoked to force the compression ratio. The motivation (dynamic objects need high fidelity while backgrounds are redundant) is an observation, not a definitional loop. This matches the default case of a non-circular empirical methods paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the method implicitly relies on standard Gaussian Splatting assumptions not detailed here.

pith-pipeline@v0.9.1-grok · 5732 in / 1094 out tokens · 38502 ms · 2026-06-28T10:50:29.219504+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

72 extracted references · 19 canonical work pages · 3 internal anchors

  1. [1]

    Yoshua Bengio, Nicholas Léonard, and Aaron Courville. 2013. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432(2013)

  2. [2]

    Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. 2020. nuscenes: A multimodal dataset for autonomous driving. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition. 11621–11631

  3. [3]

    Jiajun Cao, Qizhe Zhang, Peidong Jia, Xuhui Zhao, Bo Lan, Xiaoan Zhang, Xiaobao Wei, Sixiang Chen, Liyun Li, Xianming Liu, et al. 2026. Fastdrivevla: Efficient end-to-end driving via plug-and-play reconstruction-based token pruning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 40. 2571–2579

  4. [4]

    Jiajun Cao, Xiaoan Zhang, Xiaobao Wei, Liyuqiu Huang, Wang Zijian, Hanzhen Zhang, Zhengyu Jia, Wei Mao, Hao Wang, Xianming Liu, et al . 2026. Evo- DriveVLA: Evolving Autonomous Driving Vision-Language-Action Model via Collaborative Perception-Planning Distillation.arXiv preprint arXiv:2603.09465 (2026)

  5. [5]

    Peng Chen, Xiaobao Wei, Qingpo Wuwu, Xinyi Wang, Xingyu Xiao, and Ming Lu. 2025. Mixedgaussianavatar: Realistically and geometrically accurate head avatar via mixed 2d-3d gaussians. InProceedings of the 33rd ACM International Conference on Multimedia. 945–954

  6. [6]

    Peng Chen, Xiaobao Wei, Yi Yang, Naiming Yao, Hui Chen, and Feng Tian. 2026. RSATalker: Realistic Socially-Aware Talking Head Generation for Multi-Turn Conversation.arXiv preprint arXiv:2601.10606(2026)

  7. [7]

    Yurui Chen, Chun Gu, Junzhe Jiang, Xiatian Zhu, and Li Zhang. 2023. Peri- odic Vibration Gaussian: Dynamic Urban Scene Reconstruction and Real-time Rendering.arXiv:2311.18561(2023)

  8. [8]

    Ziyu Chen, Jiawei Yang, Jiahui Huang, Riccardo de Lutio, Janick Martinez Esturo, Boris Ivanovic, Or Litany, Zan Gojcic, Sanja Fidler, Marco Pavone, et al . 2024. Omnire: Omni urban scene reconstruction.arXiv preprint arXiv:2408.16760(2024)

  9. [9]

    Howell, Ruohan Gao, Jiajun Wu, Zachary Manchester, and Mac Schwager

    Simon Le Cleac’h, Hong Yu, Michelle Guo, Taylor A. Howell, Ruohan Gao, Jiajun Wu, Zachary Manchester, and Mac Schwager. 2022. Differentiable Physics Simu- lation of Dynamics-Augmented Neural Objects.IEEE Robotics and Automation Letters8 (2022), 2780–2787. https://api.semanticscholar.org/CorpusID:252967901

  10. [10]

    Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. 2017. CARLA: An open urban driving simulator. InConference on robot learning. PMLR, 1–16

  11. [11]

    Zhiwen Fan, Kevin Wang, Kairun Wen, Zehao Zhu, Dejia Xu, Zhangyang Wang, et al. 2025. Lightgaussian: Unbounded 3d gaussian compression with 15x reduc- tion and 200+ fps.Advances in neural information processing systems37 (2025), 140138–140158

  12. [12]

    Guangchi Fang and Bing Wang. 2024. Mini-splatting: Representing scenes with a constrained number of gaussians. InEuropean Conference on Computer Vision. Springer, 165–181

  13. [13]

    Tobias Fischer, Lorenzo Porzi, Samuel Rota Bulo, Marc Pollefeys, and Peter Kontschieder. 2024. Multi-level neural scene graphs for dynamic urban en- vironments. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 21125–21135

  14. [14]

    Sharath Girish, Kamal Gupta, and Abhinav Shrivastava. 2024. Eagles: Efficient accelerated 3d gaussians with lightweight encodings. InEuropean Conference on Computer Vision. Springer, 54–71

  15. [15]

    Nan Huang, Xiaobao Wei, Wenzhao Zheng, Pengju An, Ming Lu, Wei Zhan, Masayoshi Tomizuka, Kurt Keutzer, and Shanghang Zhang. 2026. S3Gaussian: Self-Supervised Street Gaussians for Autonomous Driving. (2026)

  16. [16]

    Sheng Yu Huang, Zan Gojcic, Zian Wang, Francis Williams, Yoni Kasten, Sanja Fidler, Konrad Schindler, and Or Litany. 2023. Neural LiDAR Fields for Novel View Synthesis.2023 IEEE/CVF International Conference on Computer Vision (ICCV) (2023), 18190–18200. https://api.semanticscholar.org/CorpusID:258437311

  17. [17]

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Dret- takis. 2023. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. arXiv:2308.04079 [cs.GR] https://arxiv.org/abs/2308.04079

  18. [18]

    Segment Anything

    Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. 2023. Segment Anything.arXiv:2304.02643(2023)

  19. [19]

    Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, and Eunbyung Park

  20. [20]

    InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    Compact 3d gaussian representation for radiance field. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 21719–21728

  21. [21]

    Hao Li, Jingfeng Li, Dingwen Zhang, Chenming Wu, Jieqi Shi, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, and Junwei Han. 2024. VDG: Vision- Only Dynamic Gaussian for Driving Simulation.arXiv preprint arXiv:2406.18198 (2024)

  22. [22]

    Ying Li, Xiaobao Wei, Xiaowei Chi, Yuming Li, Zhongyu Zhao, Hao Wang, Ningn- ing Ma, Ming Lu, and Sirui Han. 2026. Manipdreamer3d: Synthesizing plausible robotic manipulation video with occupancy-aware 3d trajectory. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 40. 6644–6652

  23. [23]

    Zhengqi Li, Simon Niklaus, Noah Snavely, and Oliver Wang. 2021. Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

  24. [24]

    Jeffrey Yunfan Liu, Yun Chen, Ze Yang, Jingkang Wang, Sivabalan Manivasagam, and Raquel Urtasun. 2023. Real-Time Neural Rasterization for Large Scenes.2023 IEEE/CVF International Conference on Computer Vision (ICCV)(2023), 8382–8393. https://api.semanticscholar.org/CorpusID:264870747

  25. [25]

    Matthew Loper, Naureen Mahmood, Javier Romero, Gerard Pons-Moll, and Michael J. Black. 2015. SMPL: a skinned multi-person linear model.ACM Trans. Graph.34, 6, Article 248 (Oct. 2015), 16 pages. doi:10.1145/2816795.2818013

  26. [26]

    Fan Lu, Yan Xu, Guang-Sheng Chen, Hongsheng Li, Kwan-Yee Lin, and Changjun Jiang. 2023. Urban Radiance Field Representation with Deformable Neural Mesh Primitives.2023 IEEE/CVF International Conference on Computer Vision (ICCV) (2023), 465–476. https://api.semanticscholar.org/CorpusID:259991347

  27. [27]

    Mechanical Simulation. 2024. CarSim. Available online: https://www.carsim. com/products/carsim/. Accessed on 16 July 2024

  28. [28]

    Mildenhall, P

    Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. 2020. NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis. arXiv:2003.08934 [cs.CV] https://arxiv.org/abs/2003. 08934

  29. [29]

    Simon Niedermayr, Josef Stumpfegger, and Rüdiger Westermann. 2024. Com- pressed 3d gaussian splatting for accelerated novel view synthesis. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 10349– 10358

  30. [30]

    NVIDIA. 2023. NVIDIA DRIVE Sim. Available online: https://developer.nvidia. com/drive/drive-sim. Accessed on 18 April 2024

  31. [32]

    Julian Ost, Fahim Mannan, Nils Thuerey, Julian Knodt, and Felix Heide. 2021. Neu- ral Scene Graphs for Dynamic Scenes. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2856–2865

  32. [33]

    Julian Ost, Fahim Mannan, Nils Thuerey, Julian Knodt, and Felix Heide. 2021. Neural scene graphs for dynamic scenes. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2856–2865

  33. [34]

    Keunhong Park, Utkarsh Sinha, Jonathan T Barron, Sofien Bouaziz, Dan B Gold- man, Steven M Seitz, and Ricardo Martin-Brualla. 2021. Nerfies: Deformable neural radiance fields. InProceedings of the IEEE/CVF International Conference on Computer Vision. 5865–5874

  34. [35]

    Barron, Sofien Bouaziz, Dan B Goldman, Ricardo Martin-Brualla, and Steven M

    Keunhong Park, Utkarsh Sinha, Peter Hedman, Jonathan T. Barron, Sofien Bouaziz, Dan B Goldman, Ricardo Martin-Brualla, and Steven M. Seitz. 2021. HyperNeRF: A Higher-Dimensional Representation for Topologically Varying Neural Radiance Fields.ACM Trans. Graph.40, 6, Article 238 (dec 2021)

  35. [36]

    Chensheng Peng, Chengwei Zhang, Yixiao Wang, Chenfeng Xu, Yichen Xie, Wen- zhao Zheng, Kurt Keutzer, Masayoshi Tomizuka, and Wei Zhan. 2024. Desire-gs: 4d street gaussians for static-dynamic decomposition and surface reconstruction for urban driving scenes.arXiv preprint arXiv:2411.11921(2024). SparseStreet: Sparse Gaussian Splatting for Real-Time Street ...

  36. [37]

    Albert Pumarola, Enric Corona, Gerard Pons-Moll, and Francesc Moreno-Noguer

  37. [38]

    InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

    D-NeRF: Neural Radiance Fields for Dynamic Scenes. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

  38. [39]

    Srinivasan, Jonathan T

    Konstantinos Rematas, Andrew Liu, Pratul P. Srinivasan, Jonathan T. Barron, Andrea Tagliasacchi, Tom Funkhouser, and Vittorio Ferrari. 2022. Urban Radiance Fields.CVPR(2022)

  39. [40]

    Shital Shah, Debadeepta Dey, Chris Lovett, and Ashish Kapoor. 2018. Airsim: High-fidelity visual and physical simulation for autonomous vehicles. InField and Service Robotics: Results of the 11th International Conference. Springer, 621–635

  40. [41]

    Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla, Aurelien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, et al

  41. [42]

    In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

    Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2446–2454

  42. [43]

    Yihong Sun and Bharath Hariharan. 2023. Dynamo-Depth: Fixing Unsupervised Depth Estimation for Dynamical Scenes. InThirty-seventh Conference on Neural Information Processing Systems

  43. [44]

    Srinivasan, Jonathan T

    Matthew Tancik, Vincent Casser, Xinchen Yan, Sabeek Pradhan, Ben Mildenhall, Pratul P. Srinivasan, Jonathan T. Barron, and Henrik Kretzschmar. 2022. Block- NeRF: Scalable Large Scene Neural View Synthesis.2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)(2022), 8238–8248. https: //api.semanticscholar.org/CorpusID:246706356

  44. [45]

    Adam Tonderski, Carl Lindström, Georg Hess, William Ljungbergh, Lennart Svensson, and Christoffer Petersson. 2023. NeuRAD: Neural Rendering for Autonomous Driving.arXiv preprint arXiv:2311.15260(2023)

  45. [46]

    Edgar Tretschk, Ayush Tewari, Vladislav Golyanik, Michael Zollhöfer, Christoph Lassner, and Christian Theobalt. 2020. Non-Rigid Neural Radiance Fields: Recon- struction and Novel View Synthesis of a Dynamic Scene From Monocular Video. arXiv:2012.12247 [cs.CV]

  46. [47]

    Haithem Turki, Deva Ramanan, and Mahadev Satyanarayanan. 2021. Mega- NeRF: Scalable Construction of Large-Scale NeRFs for Virtual Fly- Throughs. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2021), 12912–12921. https://api.semanticscholar.org/CorpusID:245334780

  47. [48]

    Haithem Turki, Jason Y Zhang, Francesco Ferroni, and Deva Ramanan. 2023. SUDS: Scalable Urban Dynamic Scenes. InComputer Vision and Pattern Recogni- tion (CVPR)

  48. [49]

    Hao Wang, Xiaobao Wei, Ying Li, Qingpo Wuwu, Dongli Wu, Jiajun Cao, Ming Lu, Wenzhao Zheng, and Shanghang Zhang. 2025. RoboArmGS: High-Quality Robotic Arm Splatting via B\’ezier Curve Refinement.arXiv preprint arXiv:2511.17961 (2025)

  49. [50]

    Hao Wang, Xiaobao Wei, Xiaoan Zhang, Jianing Li, Chengyu Bai, Ying Li, Ming Lu, Wenzhao Zheng, and Shanghang Zhang. 2025. Embodiedocc++: Boosting embodied 3d occupancy prediction with plane regularization and uncertainty sampler. InProceedings of the 33rd ACM International Conference on Multimedia. 925–934

  50. [51]

    Liao Wang, Jiakai Zhang, Xinhang Liu, Fuqiang Zhao, Yanshun Zhang, Yingliang Zhang, Minye Wu, Jingyi Yu, and Lan Xu. 2022. Fourier PlenOctrees for Dynamic Radiance Field Rendering in Real-Time. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 13524–13534

  51. [52]

    Yu Wang, Xiaobao Wei, Ming Lu, and Guoliang Kang. 2025. Plgs: Robust panoptic lifting with 3d gaussian splatting.IEEE Transactions on Image Processing(2025)

  52. [53]

    Xiaobao Wei, Jiajun Cao, Yizhu Jin, Ming Lu, Guangyu Wang, and Shanghang Zhang. 2024. I-medsam: Implicit medical image segmentation with segment anything. InEuropean Conference on Computer Vision. Springer, 90–107

  53. [54]

    Xiaobao Wei, Peng Chen, Guangyu Li, Ming Lu, Hui Chen, and Feng Tian. 2025. Gazegaussian: High-fidelity gaze redirection with 3d gaussian splatting. InPro- ceedings of the IEEE/CVF International Conference on Computer Vision. 13293– 13303

  54. [55]

    Xiaobao Wei, Peng Chen, Ming Lu, Hui Chen, and Feng Tian. 2025. Graphavatar: Compact head avatars with gnn-generated 3d gaussians. InProceedings of the AAAI Conference on Artificial Intelligence, Vol. 39. 8295–8303

  55. [56]

    Xiaobao Wei, Qingpo Wuwu, Zhongyu Zhao, Zhuangzhe Wu, Nan Huang, Ming Lu, Ningning Ma, and Shanghang Zhang. 2025. Emd: Explicit motion modeling for high-quality street gaussian splatting. InProceedings of the IEEE/CVF international conference on computer vision. 28462–28472

  56. [57]

    Xiaobao Wei, Zhangjie Ye, Yuxiang Gu, Zunjie Zhu, Yunfei Guo, Yingying Shen, Shan Zhao, Ming Lu, Haiyang Sun, Bing Wang, et al . 2026. ParkGaussian: Surround-view 3D Gaussian Splatting for Autonomous Parking.arXiv preprint arXiv:2601.01386(2026)

  57. [58]

    Xiaobao Wei, Renrui Zhang, Jiarui Wu, Jiaming Liu, Ming Lu, Yandong Guo, and Shanghang Zhang. 2024. Nto3d: Neural target object 3d reconstruction with segment anything. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 20352–20362

  58. [59]

    Tianhao Wu, Fangcheng Zhong, Andrea Tagliasacchi, Forrester Cole, and Cengiz Öztireli. 2022. D2NeRF: Self-Supervised Decoupling of Dynamic and Static Objects from a Monocular Video.ArXivabs/2205.15838 (2022). https://api. semanticscholar.org/CorpusID:249210189

  59. [60]

    Zirui Wu, Tianyu Liu, Liyi Luo, Zhide Zhong, Jianteng Chen, Hongmin Xiao, Chao Hou, Haozhe Lou, Yuantao Chen, Runyi Yang, Yuxin Huang, Xiaoyu Ye, Zike Yan, Yongliang Shi, Yiyi Liao, and Hao Zhao. 2023. MARS: An Instance-aware, Modular and Realistic Simulator for Autonomous Driving.CICAI(2023)

  60. [61]

    Wenqi Xian, Jia-Bin Huang, Johannes Kopf, and Changil Kim. 2020. Space-time Neural Irradiance Fields for Free-Viewpoint Video.2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)(2020), 9416–9426. https: //api.semanticscholar.org/CorpusID:227162620

  61. [62]

    Enze Xie, Wenhai Wang, Zhiding Yu, Anima Anandkumar, Jose M Alvarez, and Ping Luo. 2021. SegFormer: Simple and efficient design for semantic segmentation with transformers.Advances in neural information processing systems34 (2021), 12077–12090

  62. [63]

    Hongyi Xu, Thiemo Alldieck, and Cristian Sminchisescu. 2021. H-NeRF: Neural Radiance Fields for Rendering and Temporal Reconstruction of Humans in Mo- tion. InNeural Information Processing Systems. https://api.semanticscholar.org/ CorpusID:239885257

  63. [64]

    Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, and Sida Peng. 2024. Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting. InECCV

  64. [65]

    Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim, Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, and Yue Wang. 2023. EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision.arXiv preprint arXiv:2311.02077(2023)

  65. [66]

    Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, and Heng- shuang Zhao. 2024. Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. InCVPR

  66. [67]

    Ziyi Yang, Xinyu Gao, Wen Zhou, Shaohui Jiao, Yuqing Zhang, and Xiaogang Jin

  67. [68]

    arXiv preprint arXiv:2309.13101 (2023)

    Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction. arXiv:2309.13101 [cs.CV] https://arxiv.org/abs/2309.13101

  68. [69]

    Zhifan Ye, Chenxi Wan, Chaojian Li, Jihoon Hong, Sixu Li, Leshu Li, Yongan Zhang, and Yingyan Celine Lin. 2024. 3D Gaussian Rendering Can Be Sparser: Ef- ficient Rendering via Learned Fragment Pruning.Advances in Neural Information Processing Systems37 (2024), 5850–5869

  69. [70]

    Kai Zeng, Zhanqian Wu, Kaixin Xiong, Xiaobao Wei, Xiangyu Guo, Zhenxin Zhu, Kalok Ho, Lijun Zhou, Bohan Zeng, Ming Lu, et al . 2025. Rethinking Driving World Model as Synthetic Data Generator for Perception Tasks.arXiv preprint arXiv:2510.19195(2025)

  70. [71]

    Hongyu Zhou, Jiahao Shao, Lu Xu, Dongfeng Bai, Weichao Qiu, Bingbing Liu, Yue Wang, Andreas Geiger, and Yiyi Liao. 2024. Hugs: Holistic urban 3d scene understanding via gaussian splatting. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 21336–21345

  71. [72]

    Hongyu Zhou, Jiahao Shao, Lu Xu, Dongfeng Bai, Weichao Qiu, Bingbing Liu, Yue Wang, Andreas Geiger, and Yiyi Liao. 2024. HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 21336–21345

  72. [73]

    Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, and Ming- Hsuan Yang. 2024. Drivinggaussian: Composite gaussian splatting for surround- ing dynamic autonomous driving scenes. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition. 21634–21643