arxiv: 2507.18713 · v2 · submitted 2025-07-24 · 💻 cs.CV · cs.RO

SaLF: Sparse Local Fields for Multi-Sensor Rendering in Real-Time

Pith reviewed 2026-05-19 02:34 UTC · model grok-4.3

classification 💻 cs.CV cs.RO
keywords Sparse Local Fields, multi-sensor rendering, real-time simulation, volumetric representation, autonomous driving, sensor simulation, implicit fields, voxel primitives

The pith

SaLF represents driving scenes as sparse 3D voxel primitives with local implicit fields to support fast unified rendering across cameras and LiDARs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a new scene representation to make high-fidelity sensor simulation practical for testing self-driving systems. Prior approaches either train and render too slowly (NeRF-based methods) or are largely restricted to pinhole cameras (3D Gaussian Splatting). SaLF stores the environment as a sparse collection of 3D voxels, each holding its own local implicit field, so the same data can be rendered by rasterization or by ray tracing to produce both camera images and LiDAR scans. The method adds and removes voxels as needed to cover large areas, with reported training times under 30 minutes and rendering above 50 FPS for cameras and 600 FPS for LiDAR. A reader would care because this combination of speed, flexibility, and realism could let developers run many more varied tests before putting vehicles on the road.

Core claim

SaLF is a volumetric representation built from a sparse set of 3D voxel primitives, each encoding a local implicit field. This structure supports both rasterization and ray tracing, allowing the same model to render camera images and LiDAR scans without changing the underlying data. Training finishes in under 30 minutes, camera rendering exceeds 50 frames per second, LiDAR rendering exceeds 600 frames per second, and adaptive pruning plus densification scales the representation to large driving environments while supporting non-pinhole cameras and spinning LiDARs. The authors report realism similar to existing self-driving sensor simulation methods, including NeRF- and 3DGS-based ones.

What carries the argument

Sparse Local Fields, a collection of 3D voxel primitives where each voxel stores a local implicit field; this decouples scene content from any single rendering algorithm so the same primitives can feed both rasterization and raytracing pipelines.
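To make the idea of "sparse voxels, each with a local field" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the class name `SparseLocalField`, the dictionary layout, and the choice of trilinear interpolation over eight corner densities are all hypothetical stand-ins for whatever functional form the paper actually uses.

```python
import numpy as np

class SparseLocalField:
    """Toy sparse voxel set. Each stored voxel holds 8 corner densities
    (a hypothetical local field; the paper's parameterization may differ)."""

    def __init__(self, voxel_size):
        self.voxel_size = float(voxel_size)
        self.voxels = {}  # (i, j, k) integer index -> (2, 2, 2) corner densities

    def add_voxel(self, index, corner_density):
        self.voxels[tuple(index)] = np.asarray(corner_density, dtype=float).reshape(2, 2, 2)

    def query_density(self, point):
        """Density at a world-space point; 0.0 where no voxel is stored."""
        p = np.asarray(point, dtype=float) / self.voxel_size
        base = np.floor(p)
        corners = self.voxels.get(tuple(int(v) for v in base))
        if corners is None:
            return 0.0  # empty space costs no storage and no evaluation
        x, y, z = p - base  # normalized local coordinates in [0, 1)
        # Trilinear interpolation of the corner values.
        c00 = corners[0, 0, 0] * (1 - x) + corners[1, 0, 0] * x
        c10 = corners[0, 1, 0] * (1 - x) + corners[1, 1, 0] * x
        c01 = corners[0, 0, 1] * (1 - x) + corners[1, 0, 1] * x
        c11 = corners[0, 1, 1] * (1 - x) + corners[1, 1, 1] * x
        c0 = c00 * (1 - y) + c10 * y
        c1 = c01 * (1 - y) + c11 * y
        return float(c0 * (1 - z) + c1 * z)

field = SparseLocalField(voxel_size=2.0)
field.add_voxel((0, 0, 0), np.ones(8))
inside = field.query_density((1.0, 1.0, 1.0))
outside = field.query_density((5.0, 5.0, 5.0))
```

The point of the sketch is that `query_density` knows nothing about how it will be called: a rasterizer can evaluate it per projected voxel and a ray tracer can evaluate it per ray sample, which is the decoupling this section describes.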

If this is right

  • A single trained model can produce both camera images and LiDAR scans without separate representations.
  • Rendering runs fast enough for interactive or batch testing of autonomy stacks.
  • Non-pinhole camera models and rotating LiDAR patterns can be simulated directly from the same primitives.
  • Large outdoor scenes are handled by adding or removing voxels only where detail is required.
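The first and third points can be illustrated with a small Python sketch in which two different sensor models only differ in how they generate rays, while the scene query is shared. Everything here is an assumption made for illustration: the function names, the ideal pinhole and spinning-LiDAR geometry, and the naive fixed-step ray march are not taken from the paper.

```python
import numpy as np

def pinhole_rays(width, height, focal):
    """Unit ray directions (H, W, 3) for an ideal pinhole camera looking down +z."""
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    dirs = np.stack([(u - width / 2.0) / focal,
                     (v - height / 2.0) / focal,
                     np.ones_like(u)], axis=-1)
    return dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)

def spinning_lidar_rays(num_azimuth, elevations_deg):
    """Unit ray directions (num_beams, num_azimuth, 3) for a spinning LiDAR."""
    az = np.linspace(0.0, 2.0 * np.pi, num_azimuth, endpoint=False)
    el = np.deg2rad(np.asarray(elevations_deg, dtype=float))
    az_grid, el_grid = np.meshgrid(az, el)
    return np.stack([np.cos(el_grid) * np.cos(az_grid),
                     np.cos(el_grid) * np.sin(az_grid),
                     np.sin(el_grid)], axis=-1)

def first_hit_depth(origin, direction, density_fn, step=0.05, max_dist=20.0, threshold=0.5):
    """Distance to the first sample whose density exceeds the threshold, else inf."""
    t = step
    while t < max_dist:
        if density_fn(origin + t * direction) > threshold:
            return t
        t += step
    return np.inf

# Stand-in scene: empty inside a sphere of radius 5, solid outside.
def shell(p):
    return 1.0 if np.linalg.norm(p) >= 5.0 else 0.0

origin = np.zeros(3)
camera_depth = first_hit_depth(origin, pinhole_rays(4, 3, 2.0)[0, 0], shell)
lidar_depth = first_hit_depth(origin, spinning_lidar_rays(8, [-10.0, 0.0, 10.0])[1, 2], shell)
```

Both sensors call the same `first_hit_depth` against the same `density_fn`; swapping in a fisheye or panoramic model would only require a different ray generator.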

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same voxel structure might extend to other sensors such as radar if their ray or projection models can be expressed as queries into the local fields.
  • Because the representation separates content from renderer, it could support mixed real-time and offline simulation pipelines in a single framework.
  • Dynamic elements could be added by updating only the affected local fields rather than rebuilding the entire sparse set.

Load-bearing premise

Adaptive pruning and densification of the sparse voxel primitives keeps scene detail high enough across large driving areas that no visible artifacts appear in the final camera or LiDAR outputs.
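A rough Python sketch of what one pruning and densification pass could look like is given below. The criteria are hypothetical: this page does not state which signals the paper uses, so a per-voxel `density` and a per-voxel reconstruction `error` are assumed here, along with the threshold names `min_density` and `max_error`.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class Voxel:
    center: tuple   # (x, y, z) in world units
    size: float     # edge length
    density: float  # assumed summary of how occupied the voxel is
    error: float    # assumed per-voxel reconstruction error

def prune_and_densify(voxels, min_density=0.01, max_error=0.5):
    """One pass: drop near-empty voxels, split high-error voxels into 8 children."""
    result = []
    for v in voxels:
        if v.density < min_density:
            continue  # prune: treated as empty space
        if v.error > max_error:
            quarter = v.size / 4.0  # child centers sit a quarter edge from the parent center
            for dx, dy, dz in product((-quarter, quarter), repeat=3):
                child_center = (v.center[0] + dx, v.center[1] + dy, v.center[2] + dz)
                result.append(Voxel(child_center, v.size / 2.0, v.density, 0.0))
        else:
            result.append(v)
    return result

scene = [
    Voxel((0.0, 0.0, 0.0), 4.0, density=0.0, error=0.0),   # empty, will be pruned
    Voxel((4.0, 0.0, 0.0), 4.0, density=0.8, error=0.9),   # detailed, will be split
    Voxel((8.0, 0.0, 0.0), 4.0, density=0.6, error=0.1),   # kept as is
]
refined = prune_and_densify(scene)
```

The premise above amounts to the claim that thresholds like these can be chosen so that nothing visible is lost when voxels are dropped and enough detail is added where voxels are split.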

What would settle it

A direct comparison of SaLF-rendered images and LiDAR point clouds against real sensor captures from the same large driving scene, measured by standard image metrics or point-cloud error. The claim would fail if that comparison showed lower fidelity than prior methods or new artifacts that prior methods do not produce.
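As a sketch of the kind of measurement this implies, the Python below computes PSNR for images and a symmetric Chamfer distance for point clouds. These are common choices rather than the paper's stated protocol, and Chamfer distance has several conventions (squared or unsquared, summed or averaged); the unsquared, summed form used here is an assumption.

```python
import numpy as np

def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_value]."""
    diff = np.asarray(rendered, dtype=float) - np.asarray(reference, dtype=float)
    mse = np.mean(diff ** 2)
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))

def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between (N, 3) and (M, 3) point sets.

    Uses brute-force pairwise distances, so it is only suitable for small clouds.
    """
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(pairwise.min(axis=1).mean() + pairwise.min(axis=0).mean())

image_score = psnr(np.full((4, 4), 0.1), np.zeros((4, 4)))
cloud_score = chamfer_distance([[0.0, 0.0, 0.0]], [[1.0, 0.0, 0.0]])
```

For real LiDAR sweeps a spatial index such as a k-d tree would replace the dense pairwise matrix.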

Figures

Figures reproduced from arXiv: 2507.18713 by Jingkang Wang, Krzysztof Baron-Lis, Matthew Haines, Raquel Urtasun, Sahil Jain, Sivabalan Manivasagam, Yun Chen, Ze Yang.

Figure 1. SaLF combines high efficiency with advanced sensor modeling capabilities for self-driving simulations.
Figure 2. Real-time self-driving sensor simulation with SaLF representation. Our method achieves high-performance rendering for both camera and LiDAR, and supports advanced features including secondary effects (e.g., refraction, reflection and shadow) and complex sensor models (e.g., fisheye, rolling-shutter and panoramic cameras). This is made possible by an efficient and unified representation that supports both r…
Figure 3. SaLF Representation. Left: SaLF models scenes using an adaptive sparse voxel grid with variable scales. Each voxel is characterized by static parameters and learnable parameters. Right: Within a voxel, for any point with normalized coordinates x, the density σ and color c values are derived from W_σ and W_c along with the encoded view direction γ(ω) modulating W_sh. The opacity α of a ray is calculated usin…
Figure 5. Initialization and Densification. SaLF initializes the scene representation with a coarse regular grid partitioning, then adaptively prunes empty regions while densifying regions that need fine details.
Figure 6. Qualitative comparison on camera novel view synthesis. We achieve comparable photorealism compared to SoTA approaches.
(Panel labels extracted between Figures 6 and 7: Ground Truth; NeuRAD; SaLF (ours); FPS 3.8; FPS 430.)
Figure 7. Qualitative comparison on LiDAR novel view synthesis. Our method achieves comparable LiDAR rendering performance compared to NeuRAD, while being 100× faster in rendering.
Figure 10. Rolling-shutter simulation via efficient ray-based rendering. Top: We render the same view using global shutter and rolling-shutter camera models (see highlighted distorted region). Bottom: SaLF simulates the rolling-shutter effect and accurately matches ground truth point clouds (see LiDAR sweep seam and relative position for the dynamic actor in the highlighted region).
Figure 11. Additional qualitative comparison on camera novel view synthesis.
(Panel labels extracted between Figures 11 and 12: Ground Truth; NeuRAD; SaLF (ours); FPS 3.8; FPS 430.)
Figure 12. Additional qualitative comparison on LiDAR novel view synthesis.
Figure 13. Controllable simulation: actor removal.
Figure 14. Controllable simulation: SDV manipulation.
Figure 15. Simulating ray-tracing based effects.
Figure 16. Simulating distorted fish-eye cameras.
(Panel labels extracted between Figures 16 and 17: Rolling shutter with different actor speed; Rolling shutter with different SDV speed; Rolling shutter with different SDV speed.)
Figure 17. Simulating rolling-shutter LiDAR and camera.
Figure 18. Rendered Surface Normal.
Figure 19. Artifacts.
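Several captions above refer to rolling-shutter simulation through ray-based rendering. A minimal Python sketch of the underlying idea follows: each image row is exposed at a slightly different time, so each row gets its own sensor position. The function names, the constant top-to-bottom readout, and the linear position interpolation are assumptions for illustration; orientation interpolation is left out.

```python
import numpy as np

def rolling_shutter_row_times(num_rows, frame_start, readout_time):
    """Timestamp per image row, assuming rows are read top to bottom at a constant rate."""
    return frame_start + readout_time * np.arange(num_rows) / num_rows

def interpolate_position(t, t0, p0, t1, p1):
    """Linear interpolation of sensor position between two timestamped poses."""
    w = (t - t0) / (t1 - t0)
    return (1.0 - w) * np.asarray(p0, dtype=float) + w * np.asarray(p1, dtype=float)

def row_origins(num_rows, t0, p0, t1, p1):
    """Ray origin for every row of a frame whose readout spans [t0, t1)."""
    times = rolling_shutter_row_times(num_rows, t0, t1 - t0)
    return np.stack([interpolate_position(t, t0, p0, t1, p1) for t in times])

# A vehicle moving 4 units along x during one 40 ms readout of a 4-row image.
origins = row_origins(4, 0.0, [0.0, 0.0, 0.0], 0.04, [4.0, 0.0, 0.0])
```

A global-shutter camera is the special case where every row shares one origin. Per-row (or per-ray) origins are straightforward when rendering by ray casting, which is why ray-based rendering suits this effect.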
Original abstract

High-fidelity sensor simulation of light-based sensors such as cameras and LiDARs is critical for safe and accurate autonomy testing. Neural radiance field (NeRF)-based methods that reconstruct sensor observations via ray-casting of implicit representations have demonstrated accurate simulation of driving scenes, but are slow to train and render, hampering scalability. 3D Gaussian Splatting (3DGS) has demonstrated faster training and rendering times through rasterization, but is primarily restricted to pinhole camera sensors, preventing usage for realistic multi-sensor autonomy evaluation. Moreover, both NeRF and 3DGS couple the representation with the rendering procedure (implicit networks for ray-based evaluation, particles for rasterization), preventing interoperability, which is key for general usage. In this work, we present Sparse Local Fields (SaLF), a novel volumetric representation that supports rasterization and raytracing for unified multi-sensor simulation. SaLF represents volumes as a sparse set of 3D voxel primitives, where each voxel is a local implicit field. SaLF has fast training (<30 min) and rendering capabilities (50+ FPS for camera and 600+ FPS for LiDAR), has adaptive pruning and densification to easily handle large scenes, and can support non-pinhole cameras and spinning LiDARs. We demonstrate that SaLF has similar realism as existing self-driving sensor simulation methods while improving efficiency and enhancing capabilities, enabling more scalable simulation.
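For orientation, the ray-based volume rendering that the abstract refers to is usually written in the standard NeRF-style form below. This is the textbook formulation, offered as an assumption about the general setting; this page does not reproduce SaLF's own equations, which may differ in detail. Here $\sigma_i$ and $c_i$ are the density and color at the $i$-th sample along a ray, $\delta_i$ is the spacing between samples, and $t_i$ is the distance of the sample from the sensor.

```latex
\alpha_i = 1 - \exp(-\sigma_i \delta_i), \qquad
T_i = \prod_{j < i} (1 - \alpha_j), \qquad
\hat{C} = \sum_i T_i \, \alpha_i \, c_i, \qquad
\hat{D} = \sum_i T_i \, \alpha_i \, t_i
```

The color estimate $\hat{C}$ serves camera rendering and the expected depth $\hat{D}$ serves LiDAR range rendering, which is one way the same density field can drive both sensor types.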

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Sparse Local Fields (SaLF), a volumetric scene representation consisting of a sparse collection of 3D voxel primitives, each encoding a local implicit field. This decouples the representation from sensor-specific rendering, enabling both rasterization (for cameras, including non-pinhole) and ray-tracing (for spinning LiDARs) within the same model. The method incorporates adaptive pruning and densification to scale to large driving scenes, reports training times under 30 minutes, rendering speeds of 50+ FPS for cameras and 600+ FPS for LiDAR, and claims comparable realism to prior NeRF- and 3DGS-based self-driving simulators while adding multi-sensor interoperability.

Significance. If the fidelity claims are substantiated, SaLF would offer a practical advance for scalable, unified sensor simulation in autonomy testing by combining the speed of explicit primitives with the flexibility of local implicits, addressing key bottlenecks in training/rendering time and sensor coupling that limit current approaches.

major comments (2)
  1. [Section describing adaptive pruning and densification (likely §3.3 or §4)] The central realism claim rests on the assertion that adaptive pruning and densification of the sparse voxel primitives preserves geometric and appearance fidelity across large driving scenes (hundreds of meters). The manuscript must supply concrete quantitative evidence—such as PSNR/SSIM deltas or LiDAR point-cloud metrics on pruned versus unpruned representations, plus visual inspection of high-frequency surfaces and distant geometry—to demonstrate that no visible artifacts are introduced when the same primitives are evaluated by rasterization versus ray-tracing. Without these, the unified multi-sensor fidelity argument remains unsupported.
  2. [Results and Experiments section (likely §5)] The abstract and results sections report performance numbers (training <30 min, 50+ FPS camera, 600+ FPS LiDAR) and qualitative parity but supply no quantitative metrics, error bars, or ablation details on the pruning strategy. The full evaluation must include baseline comparisons with error statistics and scene-scale ablations to verify that the reported realism is not affected by post-hoc scene selection or insufficient coverage of expansive environments.
minor comments (2)
  1. [Methods overview] Clarify the exact definition and parameterization of the local implicit field inside each voxel primitive early in the methods; the current description leaves the functional form and any learned parameters ambiguous.
  2. [Abstract] The abstract states performance claims without referencing the specific tables or figures that contain the supporting numbers; add these cross-references for immediate verifiability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight important areas where additional quantitative support can strengthen the presentation of our adaptive pruning and densification approach as well as the experimental evaluation. We address each major comment below and will incorporate the suggested evidence in the revised manuscript.

Point-by-point responses
  1. Referee: [Section describing adaptive pruning and densification (likely §3.3 or §4)] The central realism claim rests on the assertion that adaptive pruning and densification of the sparse voxel primitives preserves geometric and appearance fidelity across large driving scenes (hundreds of meters). The manuscript must supply concrete quantitative evidence—such as PSNR/SSIM deltas or LiDAR point-cloud metrics on pruned versus unpruned representations, plus visual inspection of high-frequency surfaces and distant geometry—to demonstrate that no visible artifacts are introduced when the same primitives are evaluated by rasterization versus ray-tracing. Without these, the unified multi-sensor fidelity argument remains unsupported.

    Authors: We agree that quantitative evidence is necessary to substantiate the fidelity preservation claim. In the revised manuscript we will add a new table and accompanying text reporting PSNR and SSIM deltas between pruned and unpruned representations on the evaluated driving scenes. We will also include LiDAR point-cloud metrics (e.g., RMSE and Chamfer distance) for the same comparisons. Additional qualitative figures will show zoomed views of high-frequency surfaces and distant geometry under both rasterization and ray-tracing to confirm the absence of visible artifacts. These revisions will directly support the unified multi-sensor fidelity argument. revision: yes

  2. Referee: [Results and Experiments section (likely §5)] The abstract and results sections report performance numbers (training <30 min, 50+ FPS camera, 600+ FPS LiDAR) and qualitative parity but supply no quantitative metrics, error bars, or ablation details on the pruning strategy. The full evaluation must include baseline comparisons with error statistics and scene-scale ablations to verify that the reported realism is not affected by post-hoc scene selection or insufficient coverage of expansive environments.

    Authors: We acknowledge that the current version lacks detailed quantitative metrics, error bars, and ablations. The revised manuscript will expand the experiments section with baseline comparisons against prior NeRF- and 3DGS-based simulators, reporting mean and standard deviation of PSNR, SSIM, and LiDAR metrics across all scenes together with error bars on the reported FPS and training-time figures. We will also add scene-scale ablations that evaluate performance on driving scenes of varying spatial extent (including expansive environments of several hundred meters) to demonstrate that realism is maintained independently of scene selection. revision: yes

Circularity Check

0 steps flagged

No circularity in derivation or performance claims

The paper introduces SaLF as a sparse volumetric representation consisting of 3D voxel primitives each containing a local implicit field, enabling both rasterization and ray-tracing for multi-sensor simulation. Performance metrics such as training time under 30 minutes, 50+ FPS camera rendering, and 600+ FPS LiDAR rendering are presented as empirical outcomes of the representation's design and adaptive pruning/densification, without any visible equations, fitted parameters, or self-citations that would reduce these results to inputs by construction. The claims of similar realism to prior methods rest on demonstration rather than self-referential fitting or uniqueness theorems imported from the authors' prior work. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the empirical observation that local implicit fields inside sparse voxels can be pruned and densified while maintaining sensor realism; no explicit free parameters, axioms, or invented entities are stated in the abstract.

pith-pipeline@v0.9.0 · 5812 in / 1149 out tokens · 23118 ms · 2026-05-19T02:34:00.048760+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Flux4D: Flow-based Unsupervised 4D Reconstruction

    cs.CV 2025-12 unverdicted novelty 6.0

    Flux4D reconstructs large-scale dynamic 4D scenes unsupervised by predicting moving 3D Gaussians from photometric losses and static regularization when trained across multiple scenes.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · cited by 1 Pith paper

  1. [1]

    Mip-NeRF 360: Unbounded anti-aliased neural radiance fields

    Jonathan T Barron, Ben Mildenhall, Dor Verbin, Pratul P Srinivasan, and Peter Hedman. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In CVPR, 2022. 3

  2. [2]

    TensoRF: Tensorial radiance fields

    Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. TensoRF: Tensorial radiance fields. In ECCV, 2022. 3

  3. [3]

    LiDAR-GS: Real-time LiDAR re-simulation using gaussian splatting

    Qifeng Chen, Sheng Yang, Sicong Du, Tao Tang, Peng Chen, and Yuchi Huo. LiDAR-GS: Real-time LiDAR re-simulation using gaussian splatting. arXiv, 2024. 3

  4. [4]

    Periodic vibration gaussian: Dynamic urban scene reconstruction and real-time rendering

    Yurui Chen, Chun Gu, Junzhe Jiang, Xiatian Zhu, and Li Zhang. Periodic vibration gaussian: Dynamic urban scene reconstruction and real-time rendering. arXiv, 2023. 3

  5. [5]

    MobileNeRF: Exploiting the polygon rasterization pipeline for efficient neural field rendering on mobile architectures

    Zhiqin Chen, Thomas Funkhouser, Peter Hedman, and Andrea Tagliasacchi. MobileNeRF: Exploiting the polygon rasterization pipeline for efficient neural field rendering on mobile architectures. arXiv, 2022. 1, 3

  6. [6]

    OmniRe: Omni urban scene reconstruction

    Ziyu Chen, Jiawei Yang, Jiahui Huang, Riccardo de Lutio, Janick Martinez Esturo, Boris Ivanovic, Or Litany, Zan Gojcic, Sanja Fidler, Marco Pavone, et al. OmniRe: Omni urban scene reconstruction. arXiv, 2024. 3, 8

  7. [7]

    Gaussianpro: 3D gaussian splatting with progressive propagation

    Kai Cheng, Xiaoxiao Long, Kaizhi Yang, Yao Yao, Wei Yin, Yuexin Ma, Wenping Wang, and Xuejin Chen. Gaussianpro: 3D gaussian splatting with progressive propagation. arXiv,

  8. [8]

    Carla: An open urban driving simulator

    Alexey Dosovitskiy, German Ros, Felipe Codevilla, Antonio Lopez, and Vladlen Koltun. Carla: An open urban driving simulator. Conference on robot learning, 2017. 3

  9. [9]

    Multi-level neural scene graphs for dynamic urban environments

    Tobias Fischer, Lorenzo Porzi, Samuel Rota Bulo, Marc Pollefeys, and Peter Kontschieder. Multi-level neural scene graphs for dynamic urban environments. In CVPR, 2024. 8

  10. [10]

    FastNeRF: High-fidelity neural rendering at 200fps

    Stephan J Garbin, Marek Kowalski, Matthew Johnson, Jamie Shotton, and Julien Valentin. FastNeRF: High-fidelity neural rendering at 200fps. ICCV, 2021. 3

  11. [11]

    Streetsurf: Extending multi-view implicit surface reconstruction to street views

    Jianfei Guo, Nianchen Deng, Xinyang Li, Yeqi Bai, Botian Shi, Chiyu Wang, Chenjing Ding, Dongliang Wang, and Yikang Li. Streetsurf: Extending multi-view implicit surface reconstruction to street views. arXiv, 2023. 1, 4

  12. [12]

    Baking neural radiance fields for real-time view synthesis

    Peter Hedman, Pratul P Srinivasan, Ben Mildenhall, Jonathan T Barron, and Paul Debevec. Baking neural radiance fields for real-time view synthesis. In ICCV, 2021. 1, 3

  13. [13]

    Splatad: Real-time lidar and camera rendering with 3d gaussian splatting for autonomous driving

    Georg Hess, Carl Lindström, Maryam Fatemi, Christoffer Petersson, and Lennart Svensson. Splatad: Real-time lidar and camera rendering with 3d gaussian splatting for autonomous driving. arXiv preprint arXiv:2411.16816, 2024. 3

  14. [14]

    Taichi: a language for high-performance computation on spatially sparse data structures

    Yuanming Hu, Tzu-Mao Li, Luke Anderson, Jonathan Ragan-Kelley, and Frédo Durand. Taichi: a language for high-performance computation on spatially sparse data structures. TOG, 2019. 6

  15. [15]

    2d gaussian splatting for geometrically accurate radiance fields

    Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2d gaussian splatting for geometrically accurate radiance fields. In ACM SIGGRAPH 2024 Conference Papers, 2024. 5

  16. [16]

    S3gaussian: Self-supervised street gaussians for autonomous driving

    Nan Huang, Xiaobao Wei, Wenzhao Zheng, Pengju An, Ming Lu, Wei Zhan, Masayoshi Tomizuka, Kurt Keutzer, and Shanghang Zhang. S3gaussian: Self-supervised street gaussians for autonomous driving. arXiv, 2024. 3

  17. [17]

    3D gaussian splatting for real-time radiance field rendering

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 3D gaussian splatting for real-time radiance field rendering. TOG, 2023. 2, 3

  18. [18]

    Autosplat: Constrained gaussian splatting for autonomous driving scene reconstruction

    Mustafa Khan, Hamidreza Fazlali, Dhruv Sharma, Tongtong Cao, Dongfeng Bai, Yuan Ren, and Bingbing Liu. Autosplat: Constrained gaussian splatting for autonomous driving scene reconstruction. arXiv preprint arXiv:2407.02598, 2024. 6

  19. [19]

    Adam: A method for stochastic optimization

    Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. ICLR, 2015. 6

  20. [20]

    Efficient NeRF optimization–not all samples remain equally hard

    Juuso Korhonen, Goutham Rangu, Hamed R Tavakoli, and Juho Kannala. Efficient NeRF optimization–not all samples remain equally hard. arXiv, 2024. 3

  21. [21]

    Panoptic neural fields: A semantic object-aware neural scene representation

    Abhijit Kundu, Kyle Genova, Xiaoqi Yin, Alireza Fathi, Caroline Pantofaru, Leonidas J Guibas, Andrea Tagliasacchi, Frank Dellaert, and Thomas Funkhouser. Panoptic neural fields: A semantic object-aware neural scene representation. In CVPR, 2022. 3

  22. [22]

    Lihi-gs: Lidar-supervised gaussian splatting for highway driving scene reconstruction

    Pou-Chun Kung, Xianling Zhang, Katherine A Skinner, and Nikita Jaipuria. Lihi-gs: Lidar-supervised gaussian splatting for highway driving scene reconstruction. arXiv preprint arXiv:2412.15447, 2024. 3

  23. [23]

    Fisheye-gs: Lightweight and extensible gaussian splatting module for fisheye cameras

    Zimu Liao, Siyan Chen, Rong Fu, Yi Wang, Zhongling Su, Hao Luo, Li Ma, Linning Xu, Bo Dai, Hengjie Li, et al. Fisheye-gs: Lightweight and extensible gaussian splatting module for fisheye cameras. arXiv preprint arXiv:2409.04751, 2024. 3

  24. [24]

    Efficient neural radiance fields for interactive free-viewpoint video

    Haotong Lin, Sida Peng, Zhen Xu, Yunzhi Yan, Qing Shuai, Hujun Bao, and Xiaowei Zhou. Efficient neural radiance fields for interactive free-viewpoint video. In SIGGRAPH Asia 2022 Conference Papers, 2022. 3

  25. [25]

    Real-time neural rasterization for large scenes

    Jeffrey Yunfan Liu, Yun Chen, Ze Yang, Jingkang Wang, Sivabalan Manivasagam, and Raquel Urtasun. Real-time neural rasterization for large scenes. In ICCV, 2023. 1, 3

  26. [26]

    Ever: Exact volumetric ellipsoid rendering for real-time view synthesis

    Alexander Mai, Peter Hedman, George Kopanas, Dor Verbin, David Futschik, Qiangeng Xu, Falko Kuester, Jonathan T Barron, and Yinda Zhang. Ever: Exact volumetric ellipsoid rendering for real-time view synthesis. arXiv preprint arXiv:2410.01804, 2024. 20

  27. [27]

    Towards zero domain gap: A comprehensive study of realistic LiDAR simulation for autonomy testing

    Sivabalan Manivasagam, Ioan Andrei Bârsan, Jingkang Wang, Ze Yang, and Raquel Urtasun. Towards zero domain gap: A comprehensive study of realistic LiDAR simulation for autonomy testing. In ICCV, 2023. 2, 8

  28. [28]

    Nerf: Representing scenes as neural radiance fields for view synthesis

    Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. ECCV, 2020. 1, 3

  29. [29]

    3D gaussian ray tracing: Fast tracing of particle scenes

    Nicolas Moenne-Loccoz, Ashkan Mirzaei, Or Perel, Riccardo de Lutio, Janick Martinez Esturo, Gavriel State, Sanja Fidler, Nicholas Sharp, and Zan Gojcic. 3D gaussian ray tracing: Fast tracing of particle scenes. In SIGGRAPH Asia 2024, 2024. 3

  30. [30]

    Instant neural graphics primitives with a multiresolution hash encoding

    Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. Instant neural graphics primitives with a multiresolution hash encoding. 2022. 3

  31. [31]

    Neural scene graphs for dynamic scenes

    Julian Ost, Fahim Mannan, Nils Thuerey, Julian Knodt, and Felix Heide. Neural scene graphs for dynamic scenes. CVPR,

  32. [32]

    Neural lighting simulation for urban scenes

    Ava Pun, Gary Sun, Jingkang Wang, Yun Chen, Ze Yang, Sivabalan Manivasagam, Wei-Chiu Ma, and Raquel Urtasun. Neural lighting simulation for urban scenes. In NeurIPS,

  33. [33]

    Stopthepop: Sorted gaussian splatting for view-consistent real-time rendering

    Lukas Radl, Michael Steiner, Mathias Parger, Alexander Weinrauch, Bernhard Kerbl, and Markus Steinberger. Stopthepop: Sorted gaussian splatting for view-consistent real-time rendering. ACM Transactions on Graphics (TOG), 43(4):1–17, 2024. 20

  34. [34]

    KiloNeRF: Speeding up neural radiance fields with thousands of tiny MLPs

    Christian Reiser, Songyou Peng, Yiyi Liao, and Andreas Geiger. KiloNeRF: Speeding up neural radiance fields with thousands of tiny MLPs. ICCV, 2021. 3

  35. [35]

    MERF: Memory-efficient radiance fields for real-time view synthesis in unbounded scenes

    Christian Reiser, Richard Szeliski, Dor Verbin, Pratul P. Srinivasan, Ben Mildenhall, Andreas Geiger, Jonathan T. Barron, and Peter Hedman. MERF: Memory-efficient radiance fields for real-time view synthesis in unbounded scenes. arXiv, 2023. 2, 3

  36. [36]

    Unigaussian: Driving scene reconstruction from multiple camera models via unified gaussian representations

    Yuan Ren, Guile Wu, Runhao Li, Zheyuan Yang, Yibo Liu, Xingxin Chen, Tongtong Cao, and Bingbing Liu. Unigaussian: Driving scene reconstruction from multiple camera models via unified gaussian representations. arXiv preprint arXiv:2411.15355, 2024. 3

  37. [37]

    Airsim: High-fidelity visual and physical simulation for autonomous vehicles

    Shital Shah, Debadeepta Dey, Chris Lovett, and Ashish Kapoor. Airsim: High-fidelity visual and physical simulation for autonomous vehicles. In Field and service robotics,

  38. [38]

    Meta 3d assetgen: Text-to-mesh generation with high-quality geometry, texture, and pbr materials

    Yawar Siddiqui, Tom Monnier, Filippos Kokkinos, Mahendra Kariya, Yanir Kleiman, Emilien Garreau, Oran Gafni, Natalia Neverova, Andrea Vedaldi, Roman Shapovalov, et al. Meta 3d assetgen: Text-to-mesh generation with high-quality geometry, texture, and pbr materials. arXiv, 2024. 4

  39. [39]

    Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction

    Cheng Sun, Min Sun, and Hwann-Tzong Chen. Direct voxel grid optimization: Super-fast convergence for radiance fields reconstruction. CVPR, 2022. 3, 5

  40. [40]

    Sparse voxels rasterization: Real-time high-fidelity radiance field rendering

    Cheng Sun, Jaesung Choe, Charles Loop, Wei-Chiu Ma, and Yu-Chiang Frank Wang. Sparse voxels rasterization: Real-time high-fidelity radiance field rendering. arXiv preprint arXiv:2412.04459, 2024. 3, 20

  41. [41]

    Taichi 3D Gaussian Splatting, 2023

    Kuangyuan Sun. Taichi 3D Gaussian Splatting, 2023. 6, 12

  42. [42]

    NeuRAD: Neural rendering for autonomous driving

    Adam Tonderski, Carl Lindström, Georg Hess, William Ljungbergh, Lennart Svensson, and Christoffer Petersson. NeuRAD: Neural rendering for autonomous driving. In CVPR, 2024. 1, 3, 4, 6, 7, 15

  43. [43]

    Suds: Scalable urban dynamic scenes

    Haithem Turki, Jason Y Zhang, Francesco Ferroni, and Deva Ramanan. Suds: Scalable urban dynamic scenes. In CVPR,

  44. [44]

    Neural light field estimation for street scenes with differentiable virtual object insertion

    Zian Wang, Wenzheng Chen, David Acuna, Jan Kautz, and Sanja Fidler. Neural light field estimation for street scenes with differentiable virtual object insertion. ECCV, 2022. 2

  45. [45]

    Meet the 6th generation waymo driver, 2024

    Waymo. Meet the 6th generation waymo driver, 2024. 1

  46. [46]

    Dynamic lidar re-simulation using compositional neural fields

    Hanfeng Wu, Xingxing Zuo, Stefan Leutenegger, Or Litany, Konrad Schindler, and Shengyu Huang. Dynamic lidar re-simulation using compositional neural fields. In CVPR,

  47. [47]

    3dgut: Enabling distorted cameras and secondary rays in gaussian splatting

    Qi Wu, Janick Martinez Esturo, Ashkan Mirzaei, Nicolas Moenne-Loccoz, and Zan Gojcic. 3dgut: Enabling distorted cameras and secondary rays in gaussian splatting. arXiv preprint arXiv:2412.12507, 2024. 3

  48. [48]

    MARS: An instance-aware, modular and realistic simulator for autonomous driving

    Zirui Wu, Tianyu Liu, Liyi Luo, Zhide Zhong, Jianteng Chen, Hongmin Xiao, Chao Hou, Haozhe Lou, Yuantao Chen, Runyi Yang, et al. MARS: An instance-aware, modular and realistic simulator for autonomous driving. arXiv,

  49. [49]

    Pandaset: Advanced sensor suite dataset for autonomous driving

    Pengchuan Xiao, Zhenlei Shao, Steven Hao, Zishuo Zhang, Xiaolin Chai, Judy Jiao, Zesong Li, Jian Wu, Kai Sun, Kun Jiang, et al. Pandaset: Advanced sensor suite dataset for autonomous driving. In ITSC, 2021. 6

  50. [50]

    Street gaussians for modeling dynamic urban scenes

    Yunzhi Yan, Haotong Lin, Chenxu Zhou, Weijie Wang, Haiyang Sun, Kun Zhan, Xianpeng Lang, Xiaowei Zhou, and Sida Peng. Street gaussians for modeling dynamic urban scenes. arXiv, 2024. 2, 3, 7, 15

  51. [51]

    Unisim: A neural closed-loop sensor simulator

    Ze Yang, Yun Chen, Jingkang Wang, Sivabalan Manivasagam, Wei-Chiu Ma, Anqi Joyce Yang, and Raquel Urtasun. Unisim: A neural closed-loop sensor simulator. In CVPR, 2023. 1, 3, 4, 6, 7, 15

  52. [52]

    Unisim: A neural closed-loop sensor simulator

    Ze Yang, Yun Chen, Jingkang Wang, Sivabalan Manivasagam, Wei-Chiu Ma, Anqi Joyce Yang, and Raquel Urtasun. Unisim: A neural closed-loop sensor simulator. In CVPR, 2023. 6

  53. [53]

    Volume rendering of neural implicit surfaces

    Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. Volume rendering of neural implicit surfaces. NeurIPS, 2021. 4

  54. [54]

    BakedSDF: Meshing neural SDFs for real-time view synthesis

    Lior Yariv, Peter Hedman, Christian Reiser, Dor Verbin, Pratul P. Srinivasan, Richard Szeliski, Jonathan T. Barron, and Ben Mildenhall. BakedSDF: Meshing neural SDFs for real-time view synthesis. arXiv, 2023. 1, 3

  55. [55]

    GaussianDreamer: Fast generation from text to 3D gaussians by bridging 2D and 3D diffusion models

    Taoran Yi, Jiemin Fang, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Qi Tian, and Xinggang Wang. GaussianDreamer: Fast generation from text to 3D gaussians by bridging 2D and 3D diffusion models. In CVPR, 2024. 3

  56. [56]

    Plenoctrees for real-time rendering of neural radiance fields

    Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, and Angjoo Kanazawa. Plenoctrees for real-time rendering of neural radiance fields. ICCV, 2021. 2, 3

  57. [57]

    Plenoxels: Radiance fields without neural networks

    Alex Yu, Sara Fridovich-Keil, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. CVPR, 2022. 3, 5

  58. [58]

    GS-LRM: Large reconstruction model for 3D gaussian splatting

    Kai Zhang, Sai Bi, Hao Tan, Yuanbo Xiangli, Nanxuan Zhao, Kalyan Sunkavalli, and Zexiang Xu. GS-LRM: Large reconstruction model for 3D gaussian splatting. In ECCV, 2025. 3

  59. [59]

    The unreasonable effectiveness of deep features as a perceptual metric

    Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. CVPR, 2018. 6

  60. [60]

    Lidar-rt: Gaussian-based ray tracing for dynamic lidar re-simulation

    Chenxu Zhou, Lvchang Fu, Sida Peng, Yunzhi Yan, Zhanhua Zhang, Yong Chen, Jiazhi Xia, and Xiaowei Zhou. Lidar-rt: Gaussian-based ray tracing for dynamic lidar re-simulation. arXiv preprint arXiv:2412.15199, 2024. 3

  61. [61]

    DrivingGaussian: Composite gaussian splatting for surrounding dynamic autonomous driving scenes

    Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan, Yongtao Wang, Deqing Sun, and Ming-Hsuan Yang. DrivingGaussian: Composite gaussian splatting for surrounding dynamic autonomous driving scenes. In CVPR, 2024. 2, 3, 6

Parameters, 3D Gaussian Splatting vs. SaLF:
Center: µ ∈ R^3 (3D Gaussian Splatting); p ∈ R^3, not learnable (SaLF)
Rotation: q ∈ R^4, quaternion (3D Gaussian Splatting); q ∈ R^4, quaternion, not learnable (SaLF)
Scale: s ∈ ...