pith · machine review for the scientific record

arxiv: 2604.05908 · v1 · submitted 2026-04-07 · 💻 cs.CV

Recognition: 2 theorem links · Lean Theorem

Appearance Decomposition Gaussian Splatting for Multi-Traversal Reconstruction

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 19:07 UTC · model grok-4.3

classification 💻 cs.CV
keywords: multi-traversal reconstruction · Gaussian splatting · appearance decomposition · neural light field · illumination modeling · autonomous driving simulation · scene consistency · digital twins

The pith

Decomposing appearance into fixed material and variable illumination lets Gaussian splatting combine multiple traversals into one consistent 3D scene.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method to handle appearance changes across multiple passes over the same area by explicitly separating what stays the same from what changes. For the static parts of the scene, it models intrinsic material colors separately from the lighting conditions at each traversal. This separation uses a neural light field that encodes low-frequency diffuse light and high-frequency reflections differently, guided by surface normals and reflection directions. The result is more consistent renderings when combining sequences taken at different times, as shown on driving datasets. A reader would care because it supports building accurate simulations for autonomous vehicles and digital twins from real-world captures.
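
To make the decomposition concrete, the following is a minimal sketch of the kind of factorization the summary describes. It is not the authors' implementation; the class name, embedding size, and tiny MLP are illustrative assumptions, and a real system would learn the per-Gaussian material jointly with the splatting optimization.

    import torch
    import torch.nn as nn

    class ToyLightField(nn.Module):
        """Illustrative traversal-dependent light field: predicts RGB illumination
        from position, normal, reflection direction, and a per-traversal embedding."""
        def __init__(self, n_traversals, embed_dim=16, hidden=64):
            super().__init__()
            self.traversal_embed = nn.Embedding(n_traversals, embed_dim)
            self.mlp = nn.Sequential(
                nn.Linear(9 + embed_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 3), nn.Softplus(),  # keep illumination non-negative
            )

        def forward(self, xyz, normal, refl, traversal_id):
            z = self.traversal_embed(traversal_id)             # (N, embed_dim)
            feats = torch.cat([xyz, normal, refl, z], dim=-1)  # (N, 9 + embed_dim)
            return self.mlp(feats)                             # (N, 3)

    N = 1000                         # toy number of static-background Gaussians
    material = torch.rand(N, 3)      # traversal-invariant albedo (learned in practice)
    xyz = torch.randn(N, 3)          # Gaussian centers
    normal = torch.randn(N, 3)       # surface normals
    refl = torch.randn(N, 3)         # reflection directions

    light_field = ToyLightField(n_traversals=4)
    illum = light_field(xyz, normal, refl, torch.full((N,), 2))  # lighting of traversal 2

    color = material * illum         # per-Gaussian color handed to the rasterizer

Because the material term is shared by every traversal, any per-traversal change in the rendered color must be absorbed by the light field, which is what keeps the combined reconstruction consistent.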

Core claim

The central claim is that decomposing the static background's appearance in Gaussian splatting into traversal-invariant material and traversal-dependent illumination, realized by a neural light field with frequency-separated hybrid encoding and explicit normal and reflection cues, allows multiple traversals to be integrated into a single reconstruction with reduced appearance inconsistency.

What carries the argument

The appearance decomposition into material and illumination components, implemented through a neural light field using frequency-separated hybrid encoding that incorporates surface normals and reflection vectors.
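
A rough sketch of what a frequency-separated encoding with normal and reflection cues might look like follows. The octave counts, helper names, and the specific split are assumptions made for illustration, not the paper's exact design; the reflection direction r = 2(n · v)n - v is the standard mirror reflection of the view direction about the surface normal.

    import torch
    import torch.nn.functional as F

    def fourier_encode(x, n_freqs):
        """Fourier features: sin/cos of the input at octave frequencies 1, 2, 4, ..."""
        freqs = 2.0 ** torch.arange(n_freqs, dtype=x.dtype)
        ang = x[..., None] * freqs                                    # (..., 3, n_freqs)
        return torch.cat([torch.sin(ang), torch.cos(ang)], dim=-1).flatten(-2)

    def reflection_dir(normal, view_dir):
        """Mirror the view direction about the surface normal: r = 2(n.v)n - v."""
        n = F.normalize(normal, dim=-1)
        v = F.normalize(view_dir, dim=-1)
        return 2.0 * (n * v).sum(-1, keepdim=True) * n - v

    normal = torch.randn(8, 3)
    view_dir = torch.randn(8, 3)
    refl = reflection_dir(normal, view_dir)

    # Hypothetical split: few octaves on normals (smooth, diffuse-like lighting),
    # more octaves on reflection directions (sharp, view-dependent speculars).
    diffuse_feat = fourier_encode(normal, n_freqs=2)
    specular_feat = fourier_encode(refl, n_freqs=6)
    light_field_input = torch.cat([diffuse_feat, specular_feat], dim=-1)

The point of feeding the reflection direction rather than the raw view direction is that the specular lobe peaks around r, so a comparatively small network can represent sharp highlights as a function of r instead of a joint function of normal and view direction.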

Load-bearing premise

The underlying geometry of the static background stays identical across traversals and all appearance differences come only from changes in illumination.

What would settle it

A set of traversals in which the same physical location shows actual geometric changes such as new construction or persistent dynamic objects, which would make the decomposed renders inconsistent regardless of lighting adjustments.

Figures

Figures reproduced from arXiv: 2604.05908 by Baoquan Yang, Hesheng Wang, Siting Zhu, Tianchen Deng, Yangyi Xiao, Yongbo Chen.

Figure 1. Consistent Multi-Traversal Reconstruction via Appearance Decomposition. (Left) Input images from multi-traversal scenes exhibit appearance inconsistencies due to changes in illumination, weather, and time of day. (Middle) ADM-GS decomposes the static scene appearance into a traversal-invariant material field (M) and a traversal-dependent light field (Lm), enabling structured modeling of cross-traversal app… view at source ↗

Figure 2. Framework of Appearance Decomposition Gaussian Splatting for Multi-Traversal Reconstruction. ADM-GS represents the scene with a hybrid scene graph consisting of static, object, and sky nodes. Its core static node decomposes appearance into a traversal-invariant material field and a traversal-dependent light field, where normal and reflection-vector cues are introduced to improve illumination prediction. A … view at source ↗

Figure 3. Illustration of frequency-separated illumination cues. Surface normals are mainly associated with low-frequency diffuse illumination, while reflection vectors are more informative for high-frequency view-dependent specular effects. By feeding r into the light field, we align the input domain with the peak direction of the specular lobe. This effectively simplifies the network’s task from learning a spatial… view at source ↗

Figure 4. Visual comparison of novel view synthesis on dynamic scenes. We compare our method with recent approaches on the Argoverse 2 (top row) and Waymo Open (bottom row) datasets. Methods such as Bilateral-Driving [21] and OmniRe [18] show noticeable blur and ghosting artifacts in dynamic-object regions. In contrast, our method produces sharper novel-view renderings and is visually closer to the ground truth. bot… view at source ↗

Figure 5. Qualitative analysis of appearance decomposition on the Argoverse 2 multi-traversal dataset. Each row corresponds to a different traversal of the same scene. The estimated material remains largely consistent across traversals, while the illumination maps capture traversal-dependent illumination changes. The normal maps provide geometric cues that support illumination estimation. Source Scene (T1) Target Il… view at source ↗

Figure 6. Qualitative application: cross-traversal scene relighting. We demonstrate the controllability of the learned decomposition by transferring traversal-dependent illumination from Traversal T2 to a scene observed in Traversal T1. Column 1 shows the source image from T1. Column 2 shows the target traversal used to provide the illumination condition from T2. Column 3 shows the estimated traversal-invariant mate… view at source ↗
read the original abstract

Multi-traversal scene reconstruction is important for high-fidelity autonomous driving simulation and digital twin construction. This task involves integrating multiple sequences captured from the same geographical area at different times. In this context, a primary challenge is the significant appearance inconsistency across traversals caused by varying illumination and environmental conditions, despite the shared underlying geometry. This paper presents ADM-GS (Appearance Decomposition Gaussian Splatting for Multi-Traversal Reconstruction), a framework that applies an explicit appearance decomposition to the static background to alleviate appearance entanglement across traversals. For the static background, we decompose the appearance into traversal-invariant material, representing intrinsic material properties, and traversal-dependent illumination, capturing lighting variations. Specifically, we propose a neural light field that utilizes a frequency-separated hybrid encoding strategy. By incorporating surface normals and explicit reflection vectors, this design separately captures low-frequency diffuse illumination and high-frequency specular reflections. Quantitative evaluations on the Argoverse 2 and Waymo Open datasets demonstrate the effectiveness of ADM-GS. In multi-traversal experiments, our method achieves a +0.98 dB PSNR improvement over existing latent-based baselines while producing more consistent appearance across traversals. Code will be available at https://github.com/IRMVLab/ADM-GS.
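
For scale, the headline number can be read through the definition of PSNR. The short calculation below uses illustrative error values, not the paper's data; it only shows that a 0.98 dB gain at matched image range corresponds to roughly a 1.25x reduction in mean squared error.

    import math

    def psnr(mse, max_val=1.0):
        """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
        return 10.0 * math.log10(max_val ** 2 / mse)

    mse_method = 1e-3                                # hypothetical MSE of the proposed method
    mse_baseline = mse_method * 10 ** (0.98 / 10.0)  # baseline MSE implied by a 0.98 dB gap
    print(psnr(mse_method) - psnr(mse_baseline))     # ~0.98
    print(mse_baseline / mse_method)                 # ~1.25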

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes ADM-GS, an extension of Gaussian Splatting for multi-traversal reconstruction that decomposes static-background appearance into a traversal-invariant material component and a traversal-dependent illumination component. The decomposition is realized via a neural light field employing frequency-separated hybrid encoding together with explicit surface normals and reflection vectors to separately model low-frequency diffuse and high-frequency specular effects. On Argoverse 2 and Waymo Open multi-traversal sequences the method reports a +0.98 dB PSNR gain over latent-based baselines together with improved cross-traversal appearance consistency.

Significance. If the geometric-identity assumption holds and the reported gain is shown to be robust, the explicit material/illumination factorization would constitute a useful, interpretable advance for high-fidelity autonomous-driving simulation and digital-twin construction, where appearance drift across repeated traversals is a practical obstacle. The frequency-separated encoding with normals and reflections is a concrete, testable design choice that could be adopted by other explicit radiance-field pipelines.

major comments (2)
  1. [Abstract and Experiments] Abstract and §4 (Experiments): the central quantitative claim of a +0.98 dB PSNR improvement is presented without error bars, without ablation tables isolating the contribution of the frequency-separated encoding or the reflection-vector term, and without any description of how the neural-light-field hyperparameters were selected; these omissions make it impossible to determine whether the measured delta arises from the intended decomposition or from the added network capacity.
  2. [Method] §3 (Method, neural light field): the separation of diffuse and specular illumination presupposes that Gaussian positions and normals are exactly shared across traversals; the manuscript provides no geometric-alignment procedure, no dynamic-object mask, and no quantitative verification that residual drift or unmasked movers are negligible on the Argoverse 2 / Waymo sequences used for evaluation. Any such misalignment would couple geometry error directly into the learned illumination field and thereby undermine both the PSNR delta and the consistency claim.
minor comments (1)
  1. [Abstract] The abstract states that code will be released at https://github.com/IRMVLab/ADM-GS but supplies neither a commit hash nor a release tag, which hinders immediate reproducibility.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of presentation and methodological assumptions. We address each major comment below and commit to revisions that strengthen the clarity and robustness of the claims without altering the core technical contributions.

read point-by-point responses
  1. Referee: [Abstract and Experiments] Abstract and §4 (Experiments): the central quantitative claim of a +0.98 dB PSNR improvement is presented without error bars, without ablation tables isolating the contribution of the frequency-separated encoding or the reflection-vector term, and without any description of how the neural-light-field hyperparameters were selected; these omissions make it impossible to determine whether the measured delta arises from the intended decomposition or from the added network capacity.

    Authors: We agree that the absence of error bars, targeted ablations, and hyperparameter details weakens the interpretability of the reported +0.98 dB gain. In the revised version we will add (i) error bars computed as standard deviation over five independent training runs with different random seeds for all reported metrics, (ii) an ablation table that isolates the frequency-separated hybrid encoding and the explicit reflection-vector term while keeping total network capacity constant, and (iii) a supplementary section describing the hyperparameter selection procedure, including the frequency bands chosen for diffuse versus specular components and the grid-search protocol used on a held-out validation split. These additions will allow readers to attribute the observed improvement to the proposed decomposition rather than incidental capacity increases. revision: yes

  2. Referee: [Method] §3 (Method, neural light field): the separation of diffuse and specular illumination presupposes that Gaussian positions and normals are exactly shared across traversals; the manuscript provides no geometric-alignment procedure, no dynamic-object mask, and no quantitative verification that residual drift or unmasked movers are negligible on the Argoverse 2 / Waymo sequences used for evaluation. Any such misalignment would couple geometry error directly into the learned illumination field and thereby undermine both the PSNR delta and the consistency claim.

    Authors: The method indeed assumes that the static-background Gaussians (positions and normals) are shared across traversals. Alignment is performed by registering all sequences to a common world coordinate frame using the dataset-provided camera poses and COLMAP reconstructions; we select only sequences whose SfM point clouds overlap sufficiently and visually confirm static content. We acknowledge that the current manuscript does not describe this procedure in detail, does not apply an explicit dynamic-object mask, and provides no quantitative drift metric. In revision we will expand §3 with a dedicated preprocessing subsection that (a) states the alignment steps, (b) notes the static-scene assumption together with qualitative examples of minimal movers in the chosen Argoverse 2 and Waymo clips, and (c) discusses why residual drift is expected to be small given the dataset construction. A full quantitative drift analysis would require additional per-pixel annotations unavailable in the public releases; we will therefore treat this as an explicit modeling assumption rather than an empirically verified claim. revision: partial

Circularity Check

0 steps flagged

No circularity; empirical gains from explicit decomposition are measured, not derived by construction.

full rationale

The paper introduces ADM-GS as a modeling framework that decomposes static-background appearance via a neural light field with frequency-separated encoding, normals, and reflection vectors, then reports measured PSNR improvements (+0.98 dB) and consistency on Argoverse 2 / Waymo multi-traversal data. No derivation chain exists that reduces a claimed result to its inputs by definition, no fitted parameter is relabeled as an independent prediction, and no load-bearing self-citation or uniqueness theorem is invoked. The quantitative results are external evaluations of the proposed architecture rather than tautological outputs of the same fitting process.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that geometry is shared and that appearance variation is purely photometric. No explicit free parameters are named in the abstract, but the neural light field necessarily introduces learned weights. No new physical entities are postulated.

axioms (2)
  • domain assumption: Static background geometry is identical across all traversals.
    Stated in the abstract as the premise that allows material to be traversal-invariant.
  • domain assumption: All appearance change is caused by illumination only.
    The decomposition separates material from illumination; any other source of variation would invalidate the split.

pith-pipeline@v0.9.0 · 5525 in / 1447 out tokens · 33320 ms · 2026-05-10T19:07:04.474747+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

67 extracted references · 15 canonical work pages · 1 internal anchor

  1. [1] T. Fischer, J. Kulhanek, S. Rota Bulò, L. Porzi, M. Pollefeys, and P. Kontschieder, “Dynamic 3d gaussian fields for urban areas,” Advances in Neural Information Processing Systems, vol. 37, pp. 80466–80494, 2024.
  2. [2] T. Li, Y. Qiu, Z. Wu, C. Lindström, P. Su, M. Nießner, and H. Li, “Mtgs: Multi-traversal gaussian splatting,” arXiv preprint arXiv:2503.12552, 2025.
  3. [3] B. Kerbl, G. Kopanas, T. Leimkühler, and G. Drettakis, “3d gaussian splatting for real-time radiance field rendering,” ACM Trans. Graph., vol. 42, no. 4, pp. 139–1, 2023.
  4. [4] S. Zhu, G. Wang, H. Blum, J. Liu, L. Song, M. Pollefeys, and H. Wang, “Sni-slam: Semantic neural implicit slam,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 21167–21177.
  5. [5] S. Zhu, G. Wang, H. Blum, Z. Wang, G. Zhang, D. Cremers, M. Pollefeys, and H. Wang, “Sni-slam++: Tightly-coupled semantic neural implicit slam,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 48, no. 3, pp. 3399–3416, 2026.
  6. [6] T. Deng, Y. Pan, S. Yuan, D. Li, C. Wang, M. Li, L. Chen, L. Xie, D. Wang, J. Wang, J. Civera, H. Wang, and W. Chen, “What is the best 3d scene representation for robotics? from geometric to foundation models,” arXiv preprint arXiv:2512.03422, 2025.
  7. [7] S. Zhu, G. Wang, X. Kong, D. Kong, and H. Wang, “3d gaussian splatting in robotics: A survey,” arXiv preprint arXiv:2410.12262, 2024.
  8. [8] T. Deng, X. Chen, Y. Chen, Q. Chen, Y. Xu, L. Yang, L. Xu, Y. Zhang, B. Zhang, W. Huang, and H. Wang, “Gaussiandwm: 3d gaussian driving world model for unified scene understanding and multi-modal generation,” arXiv preprint arXiv:2512.23180, 2025.
  9. [9] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, and R. Ng, “Nerf: Representing scenes as neural radiance fields for view synthesis,” Communications of the ACM, vol. 65, no. 1, pp. 99–106, 2021.
  10. [10] J. Ost, F. Mannan, N. Thuerey, J. Knodt, and F. Heide, “Neural scene graphs for dynamic scenes,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 2856–2865.
  11. [11] Z. Xie, J. Zhang, W. Li, F. Zhang, and L. Zhang, “S-nerf: Neural radiance fields for street views,” in International Conference on Learning Representations (ICLR), 2023.
  12. [12] Z. Yang, Y. Chen, J. Wang, S. Manivasagam, W.-C. Ma, A. J. Yang, and R. Urtasun, “Unisim: A neural closed-loop sensor simulator,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1389–1399.
  13. [13] A. Tonderski, C. Lindström, G. Hess, W. Ljungbergh, L. Svensson, and C. Petersson, “Neurad: Neural rendering for autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 14895–14904.
  14. [14] X. Zhou, Z. Lin, X. Shan, Y. Wang, D. Sun, and M.-H. Yang, “Drivinggaussian: Composite gaussian splatting for surrounding dynamic autonomous driving scenes,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 21634–21643.
  15. [15] Y. Yan, H. Lin, C. Zhou, W. Wang, H. Sun, K. Zhan, X. Lang, X. Zhou, and S. Peng, “Street gaussians: Modeling dynamic urban scenes with gaussian splatting,” in European Conference on Computer Vision. Springer, 2024, pp. 156–173.
  16. [16] M. Khan, H. Fazlali, D. Sharma, T. Cao, D. Bai, Y. Ren, and B. Liu, “Autosplat: Constrained gaussian splatting for autonomous driving scene reconstruction,” in 2025 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2025, pp. 8315–8321.
  17. [17] H. Zhou, J. Shao, L. Xu, D. Bai, W. Qiu, B. Liu, Y. Wang, A. Geiger, and Y. Liao, “Hugs: Holistic urban 3d scene understanding via gaussian splatting,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 21336–21345.
  18. [18] Z. Chen, J. Yang, J. Huang, R. de Lutio, J. M. Esturo, B. Ivanovic, O. Litany, Z. Gojcic, S. Fidler, M. Pavone, L. Song, and Y. Wang, “Omnire: Omni urban scene reconstruction,” in The Thirteenth International Conference on Learning Representations, 2025.
  19. [19] G. Hess, C. Lindström, M. Fatemi, C. Petersson, and L. Svensson, “Splatad: Real-time lidar and camera rendering with 3d gaussian splatting for autonomous driving,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 11982–11992.
  20. [20] C. Wang, X. Guo, W. Xu, W. Tian, R. Song, C. Zhang, L. Li, and L. Chen, “Drivesplat: Decoupled driving scene reconstruction with geometry-enhanced partitioned neural gaussians,” arXiv preprint arXiv:2508.15376, 2025.
  21. [21] N. Wang, Y. Chen, L. Xiao, W. Xiao, B. Li, Z. Chen, C. Ye, S. Xu, S. Zhang, Z. Yan et al., “Unifying appearance codes and bilateral grids for driving scene gaussian splatting,” arXiv preprint arXiv:2506.05280, 2025.
  22. [22] W. Xu, Y. Qian, Y.-F. Liu, L. Tuo, H. Chen, and M. Yang, “Drivingeditor: 4d composite gaussian splatting for reconstruction and edition of dynamic autonomous driving scenes,” IEEE Transactions on Image Processing, 2026.
  23. [23] H. Turki, J. Y. Zhang, F. Ferroni, and D. Ramanan, “Suds: Scalable urban dynamic scenes,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 12375–12385.
  24. [24] J. Yang, B. Ivanovic, O. Litany, X. Weng, S. W. Kim, B. Li, T. Che, D. Xu, S. Fidler, M. Pavone et al., “Emernerf: Emergent spatial-temporal scene decomposition via self-supervision,” in The Twelfth International Conference on Learning Representations.
  25. [25] T. Deng, S. Liu, X. Wang, Y. Liu, D. Wang, and W. Chen, “Prosgnerf: Progressive dynamic neural scene graph with frequency modulated auto-encoder in urban scenes,” arXiv preprint arXiv:2312.09076, 2023.
  26. [26] Y. Wen, L. Song, Y. Liu, S. Zhu, Y. Miao, L. Han, and H. Wang, “Freedriverf: Monocular rgb dynamic nerf without poses for autonomous driving via point-level dynamic-static decoupling,” in 2025 IEEE International Conference on Robotics and Automation (ICRA), 2025, pp. 13950–13956.
  27. [27] N. Huang, X. Wei, W. Zheng, P. An, M. Lu, W. Zhan, M. Tomizuka, K. Keutzer, and S. Zhang, “S3gaussian: Self-supervised street gaussians for autonomous driving,” arXiv preprint arXiv:2405.20323, 2024.
  28. [28] Y. Chen, C. Gu, J. Jiang, X. Zhu, and L. Zhang, “Periodic vibration gaussian: Dynamic urban scene reconstruction and real-time rendering,” arXiv preprint arXiv:2311.18561, 2023.
  29. [29] A. Cao and J. Johnson, “Hexplane: A fast representation for dynamic scenes,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 130–141.
  30. [30] S. Sun, C. Zhao, Z. Sun, Y. V. Chen, and M. Chen, “Splatflow: Self-supervised dynamic gaussian splatting in neural motion flow field for autonomous driving,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 27487–27496.
  31. [31] R. Song, C. Liang, Y. Xia, W. Zimmer, H. Cao, H. Caesar, A. Festag, and A. Knoll, “Coda-4dgs: Dynamic gaussian splatting with context and deformation awareness for autonomous driving,” in IEEE/CVF International Conference on Computer Vision (ICCV). IEEE/CVF, 2025.
  32. [32] C. Peng, C. Zhang, Y. Wang, C. Xu, Y. Xie, W. Zheng, K. Keutzer, M. Tomizuka, and W. Zhan, “Desire-gs: 4d street gaussians for static-dynamic decomposition and surface reconstruction for urban driving scenes,” in Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 6782–6791.
  33. [33] J. Xu, K. Deng, Z. Fan, S. Wang, J. Xie, and J. Yang, “Ad-gs: Object-aware b-spline gaussian splatting for self-supervised autonomous driving,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 24770–24779.
  34. [34] T. Ren, S. Liu, A. Zeng, J. Lin, K. Li, H. Cao, J. Chen, X. Huang, Y. Chen, F. Yan et al., “Grounded sam: Assembling open-world models for diverse visual tasks,” arXiv preprint arXiv:2401.14159, 2024.
  35. [35] R. Zhang, C. Li, C. Zhang, X. Liu, H. Yuan, Y. Li, X. Ji, and G. H. Lee, “Street gaussians without 3d object tracker,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 25722–25734.
  36. [36] C. Lindström, M. Rafidashti, M. Fatemi, L. Hammarstrand, M. R. Oswald, and L. Svensson, “Idsplat: Instance-decomposed 3d gaussian splatting for driving scenes,” arXiv preprint arXiv:2511.19235, 2025.
  37. [37] R. Martin-Brualla, N. Radwan, M. S. Sajjadi, J. T. Barron, A. Dosovitskiy, and D. Duckworth, “Nerf in the wild: Neural radiance fields for unconstrained photo collections,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 7210–7219.
  38. [38] X. Chen, Q. Zhang, X. Li, Y. Chen, Y. Feng, X. Wang, and J. Wang, “Hallucinated neural radiance fields in the wild,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12943–12952.
  39. [39] P. Li, S. Wang, C. Yang, B. Liu, W. Qiu, and H. Wang, “Nerf-ms: Neural radiance fields with multi-sequence,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 18591–18600.
  40. [40] M. Tancik, V. Casser, X. Yan, S. Pradhan, B. Mildenhall, P. P. Srinivasan, J. T. Barron, and H. Kretzschmar, “Block-nerf: Scalable large scene neural view synthesis,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 8248–8258.
  41. [41] H. Turki, D. Ramanan, and M. Satyanarayanan, “Mega-nerf: Scalable construction of large-scale nerfs for virtual fly-throughs,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12922–12931.
  42. [42] D. Zhang, C. Wang, W. Wang, P. Li, M. Qin, and H. Wang, “Gaussian in the wild: 3d gaussian splatting for unconstrained image collections,” in European Conference on Computer Vision. Springer, 2024, pp. 341–359.
  43. [43] J. Kulhanek, S. Peng, Z. Kukelova, M. Pollefeys, and T. Sattler, “Wildgaussians: 3d gaussian splatting in the wild,” arXiv preprint arXiv:2407.08447, 2024.
  44. [44] H. Dahmani, M. Bennehar, N. Piasco, L. Roldao, and D. Tsishkou, “Swag: Splatting in the wild images with appearance-conditioned gaussians,” in European Conference on Computer Vision. Springer, 2024, pp. 325–340.
  45. [45] J. Lin, Z. Li, X. Tang, J. Liu, S. Liu, J. Liu, Y. Lu, X. Wu, S. Xu, Y. Yan et al., “Vastgaussian: Vast 3d gaussians for large scene reconstruction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 5166–5175.
  46. [46] X. Zhang, P. P. Srinivasan, B. Deng, P. Debevec, W. T. Freeman, and J. T. Barron, “Nerfactor: Neural factorization of shape and reflectance under an unknown illumination,” ACM Transactions on Graphics (ToG), vol. 40, no. 6, pp. 1–18, 2021.
  47. [47] K. Zhang, F. Luan, Q. Wang, K. Bala, and N. Snavely, “Physg: Inverse rendering with spherical gaussians for physics-based material editing and relighting,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 5453–5462.
  48. [48] H. Jin, I. Liu, P. Xu, X. Zhang, S. Han, S. Bi, X. Zhou, Z. Xu, and H. Su, “Tensoir: Tensorial inverse rendering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 165–174.
  49. [49] Y. Zhang, J. Sun, X. He, H. Fu, R. Jia, and X. Zhou, “Modeling indirect illumination for inverse rendering,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 18643–18652.
  50. [50] V. Rudnev, M. Elgharib, W. Smith, L. Liu, V. Golyanik, and C. Theobalt, “Nerf for outdoor scene relighting,” in European Conference on Computer Vision. Springer, 2022, pp. 615–631.
  51. [51] J.-M. Sun, T. Wu, Y.-L. Yang, Y.-K. Lai, and L. Gao, “Sol-nerf: Sunlight modeling for outdoor scene decomposition and relighting,” in SIGGRAPH Asia 2023 Conference Papers, 2023, pp. 1–11.
  52. [52] Y. Jiang, J. Tu, Y. Liu, X. Gao, X. Long, W. Wang, and Y. Ma, “Gaussianshader: 3d gaussian splatting with shading functions for reflective surfaces,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 5322–5332.
  53. [53] J. Gao, C. Gu, Y. Lin, Z. Li, H. Zhu, X. Cao, L. Zhang, and Y. Yao, “Relightable 3d gaussians: Realistic point cloud relighting with brdf decomposition and ray tracing,” in European Conference on Computer Vision. Springer, 2024, pp. 73–89.
  54. [54] C.-H. Lin, B. Liu, Y.-T. Chen, K.-S. Chen, D. Forsyth, J.-B. Huang, A. Bhattad, and S. Wang, “Urbanir: Large-scale urban scene inverse rendering from a single video,” in 2025 International Conference on 3D Vision (3DV). IEEE, 2025, pp. 512–523.
  55. [55] X. Chen, B. Chandaka, C.-H. Lin, Y.-Q. Zhang, D. Forsyth, H. Zhao, and S. Wang, “Invrgb+l: Inverse rendering of complex scenes with unified color and lidar reflectance modeling,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 27176–27186.
  56. [56] T. Qin, C. Li, H. Ye, S. Wan, M. Li, H. Liu, and M. Yang, “Crowdsourced nerf: Collecting data from production vehicles for 3d street view reconstruction,” IEEE Transactions on Intelligent Transportation Systems, vol. 25, no. 11, pp. 16145–16156, 2024.
  57. [57] Y. Li, Z. Wang, Y. Wang, Z. Yu, Z. Gojcic, M. Pavone, C. Feng, and J. M. Alvarez, “Memorize what matters: Emergent scene decomposition from multitraverse,” Advances in Neural Information Processing Systems, vol. 37, pp. 108389–108438, 2024.
  58. [58] L. Zeng, B. Zhao, J. Hu, X. Shen, Z. Dang, H. Bao, and Z. Cui, “Gaussianupdate: Continual 3d gaussian splatting update for changing environments,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 25800–25809.
  59. [59] X. Han, Z. Jia, B. Li, Y. Wang, B. Ivanovic, Y. You, L. Liu, Y. Wang, M. Pavone, C. Feng et al., “Extrapolated urban view synthesis benchmark,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 28718–28728.
  60. [60] T. Fischer, L. Porzi, S. Rota Bulò, M. Pollefeys, and P. Kontschieder, “Multi-level neural scene graphs for dynamic urban environments,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 21125–21135.
  61. [61] Q. Li, Y. Jiang, T. Li, D. Chen, X. Feng, Y. Ao, S. Liu, X. Yu, Y. Cai, Y. Liu et al., “Hybridworldsim: A scalable and controllable high-fidelity simulator for autonomous driving,” arXiv preprint arXiv:2511.22187, 2025.
  62. [62] J. T. Kajiya, “The rendering equation,” in Proceedings of the 13th annual conference on Computer graphics and interactive techniques, 1986, pp. 143–150.
  63. [63] D. Verbin, P. Hedman, B. Mildenhall, T. Zickler, J. T. Barron, and P. P. Srinivasan, “Ref-nerf: Structured view-dependent appearance for neural radiance fields,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024.
  64. [64] L. Piccinelli, C. Sakaridis, Y.-H. Yang, M. Segu, S. Li, W. Abbeloos, and L. Van Gool, “Unidepthv2: Universal monocular metric depth estimation made simpler,” arXiv preprint arXiv:2502.20110, 2025.
  65. [65] S. Sun, Y. Wang, H. Zhang, Y. Xiong, Q. Ren, R. Fang, X. Xie, and C. You, “Ouroboros: Single-step diffusion models for cycle-consistent forward and inverse rendering,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 10386–10397.
  66. [66] B. Wilson, W. Qi, T. Agarwal, J. Lambert, J. Singh, S. Khandelwal, B. Pan, R. Kumar, A. Hartnett, J. K. Pontes et al., “Argoverse 2: Next generation datasets for self-driving perception and forecasting,” arXiv preprint arXiv:2301.00493, 2023.
  67. [67] P. Sun, H. Kretzschmar, X. Dotiwalla, A. Chouard, V. Patnaik, P. Tsui, J. Guo, Y. Zhou, Y. Chai, B. Caine et al., “Scalability in perception for autonomous driving: Waymo open dataset,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 2446–2454.