pith. sign in

arxiv: 2605.23602 · v1 · pith:FVL5ADDDnew · submitted 2026-05-22 · 💻 cs.CV

GlowGS: Generative Semantic Feature Learning for 3D Gaussian Splatting in Nighttime Glow Scenes

Pith reviewed 2026-05-25 04:58 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D Gaussian SplattingNighttime scenesGlow regionsSemantic feature generationDiffusion modelsNovel view synthesisVision foundation modelsScene reconstruction
0
0 comments X

The pith

Semantic feature banks built from diffusion models let 3D Gaussian splatting reconstruct nighttime glow scenes without ground-truth images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to fix 3D Gaussian Splatting when it encounters night scenes full of glow, where the usual cues of texture and edges are absent and reconstruction breaks down. It compensates by generating synthetic novel views with a diffusion model, then using a vision foundation model to score those views and pull out semantic features that serve as substitute structure. These features populate a bank that later guides the splatting optimization: rendered views are compared against the bank and pulled toward semantic matches, enforcing consistency without any real target image. A reader would care because this removes a practical barrier to building 3D models from footage taken in low-light or glowing conditions.

Core claim

GlowGS uses semantic feature generation to synthesize novel views via diffusion, filter them with a vision foundation model, and store the extracted features in a bank; it then applies novel-view semantic learning so that 3D Gaussian Splatting optimizes rendered views by minimizing distance to the closest bank entries, thereby imposing implicit structural constraints that produce semantically coherent, artifact-free results in glow regions.

What carries the argument

The semantic feature bank, which holds vision-foundation-model features from high-quality diffusion-synthesized novel views and supplies the distance-minimization target that enforces structural consistency during 3D Gaussian Splatting training.

If this is right

  • 3D Gaussian Splatting can now produce usable novel views in nighttime glow regions that previously caused failure.
  • Optimization of rendered views can proceed without any ground-truth images by relying on semantic distance to a pre-built feature bank.
  • The same two-stage process of feature-bank creation followed by semantic matching can be applied whenever structural cues are weak.
  • Semantic accuracy of rendered night scenes improves measurably over standard 3DGS pipelines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same bank-and-match pattern could be tested on other low-cue settings such as fog or heavy rain where textures also disappear.
  • Improvements in diffusion or foundation models would directly raise the quality of the feature bank without changing the rest of the pipeline.
  • The approach suggests a general route for injecting external generative priors into splatting methods that currently depend only on observed pixels.

Load-bearing premise

A diffusion model can generate novel views with unknown camera poses whose quality a vision foundation model can reliably judge and whose semantic features remain stable enough to guide splatting in glow scenes.

What would settle it

Apply the method and a plain 3DGS baseline to the same set of nighttime glow scenes, then measure whether the GlowGS novel views show higher semantic similarity to real captured images and fewer glow-induced artifacts than the baseline.

Figures

Figures reproduced from arXiv: 2605.23602 by Beibei Lin, Jingyuan Guo, Robby T. Tan, Xiao Cao.

Figure 1
Figure 1. Figure 1: Qualitative results from 3DGS [26], CGS [29], MGS [64], and our method on nighttime scenes. All results are from novel views. The zoom-in insets highlight artifacts in glow regions produced by existing methods, including duplicated light sources and unnatural bright spots in dark areas. Our method reconstructs glow regions with smoother light diffusion, effectively reducing these artifacts. (a) Input (b) C… view at source ↗
Figure 2
Figure 2. Figure 2: Visualization of features extracted by different Vision [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Overview of our GlowGS, which centers around two core ideas: semantic feature generation and novel-view semantic learning. Given the training views, our semantic feature generation uses image-to-video diffusion models to create novel views with unknown poses, followed by a VFM-based module to assess their quality. We then extract robust semantic features from the high-quality views and build a semantic fea… view at source ↗
Figure 4
Figure 4. Figure 4: Qualitative results from 3DGS [26], CGS [29], MGS [64] and our method, on nighttime glow scenes. All results are from novel views. Our method not only preserves the details of night images but also effectively reconstructs glow regions. Zoom in for better visualization. Novel-View Semantic Learning During training, we generate new camera poses by perturbing the rotation and translation matrices of the trai… view at source ↗
Figure 5
Figure 5. Figure 5: Qualitative results from 3DGS [26], CGS [29], MGS [64] and our method, on nighttime glow scenes. All results are from novel views. Our method not only preserves the details of night images but also effectively reconstructs glow regions. Zoom in for better visualization. ture bank, enabling 3DGS models to optimize novel view semantics and boost performance. GlowGS consistently outperforms baseline 3DGS meth… view at source ↗
Figure 6
Figure 6. Figure 6: Visualization of features extracted by different Vision [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Glow masks and results from MGS [64] and ours [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Visualization of generated results. Given a training view, we use image-to-video diffusion models to synthesize novel views. The [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗
read the original abstract

Existing 3DGS methods effectively render high-quality novel views in clear-day scenes. However, they struggle with night scenes, particularly in glow regions, due to the lack of structural features such as textures and edges, which are key cues for splatting-based reconstruction. To address this problem, we leverage a diffusion model and a Vision Foundation Model (VFM) to compensate for missing structural cues. Our method consists of two key novel ideas: semantic feature generation and novel-view semantic learning. First, semantic feature generation produces high-quality semantic features as implicit structural cues for novel views. Specifically, a diffusion model synthesizes novel views with unknown camera poses from training views, while a VFM evaluates their quality. Once high-quality novel views are identified, the VFM extracts robust features to construct the semantic feature bank. Second, novel-view semantic learning enables 3DGS to optimize rendered novel views without requiring ground truth. It achieves this by extracting semantic features from a rendered novel view, searching the feature bank for the most similar features, and minimizing their distance. This process enforces implicit structural constraints, ensuring semantically coherent, artifact-free rendered views. Extensive experiments demonstrate the effectiveness of our GlowGS in generating semantically accurate 3D views, showing significant improvements over existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes GlowGS to extend 3D Gaussian Splatting to nighttime glow scenes, where standard methods fail due to missing textures and edges. It introduces two components: (1) semantic feature generation, in which a diffusion model synthesizes novel views for arbitrary camera poses from training views, a vision foundation model (VFM) filters for quality, and the VFM then extracts features into a semantic bank; (2) novel-view semantic learning, which renders a novel view, retrieves the closest bank entry, and minimizes feature distance to enforce implicit structural constraints without ground-truth images. Experiments are reported to show significant gains over prior 3DGS methods in semantic accuracy.

Significance. If the synthesis and filtering steps prove reliable, the approach offers a concrete mechanism for injecting semantic priors into 3DGS optimization, addressing a recognized failure mode in low-texture nighttime reconstruction. The two-stage pipeline (generative bank construction followed by distance-based regularization) is a plausible way to obtain supervision where photometric losses are uninformative.

major comments (2)
  1. [Semantic feature generation] Semantic feature generation (abstract and method description): the claim that a diffusion model can produce reliable novel-view geometry and semantics for unknown poses directly from training views is load-bearing for both novelties. Glow regions are defined precisely by the absence of edges and textures—the same cues diffusion models typically rely on—yet no quantitative validation (e.g., pose-consistency metrics, artifact rates on held-out glow patches, or comparison against ground-truth novel views) is supplied to show that the generated bank entries are structurally faithful rather than hallucinated. Without such evidence the subsequent distance-minimization objective risks enforcing consistency with synthesis artifacts.
  2. [Novel-view semantic learning] Novel-view semantic learning (abstract and method description): the optimization objective minimizes distance to the nearest bank feature without any mechanism to detect or reject inconsistent bank entries. If the diffusion step populates the bank with pose-inconsistent or glow-specific artifacts, the rendered outputs will be driven toward those artifacts; the manuscript provides no ablation that isolates the effect of bank quality on final rendering metrics.
minor comments (1)
  1. [Abstract] The abstract states “significant improvements over existing methods” but does not name the baselines or report numerical deltas; these details should appear in the experiments section with standard error bars.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and recommendation for major revision. The two major comments identify important gaps in validation that we will address directly.

read point-by-point responses
  1. Referee: [Semantic feature generation] Semantic feature generation (abstract and method description): the claim that a diffusion model can produce reliable novel-view geometry and semantics for unknown poses directly from training views is load-bearing for both novelties. Glow regions are defined precisely by the absence of edges and textures—the same cues diffusion models typically rely on—yet no quantitative validation (e.g., pose-consistency metrics, artifact rates on held-out glow patches, or comparison against ground-truth novel views) is supplied to show that the generated bank entries are structurally faithful rather than hallucinated. Without such evidence the subsequent distance-minimization objective risks enforcing consistency with synthesis artifacts.

    Authors: We agree that the reliability of diffusion-generated views is central to the method and that the current description relies on VFM-based filtering without supplying dedicated quantitative checks such as pose-consistency metrics or artifact rates. In the revised manuscript we will add these evaluations on held-out glow patches to demonstrate structural faithfulness of the bank entries. revision: yes

  2. Referee: [Novel-view semantic learning] Novel-view semantic learning (abstract and method description): the optimization objective minimizes distance to the nearest bank feature without any mechanism to detect or reject inconsistent bank entries. If the diffusion step populates the bank with pose-inconsistent or glow-specific artifacts, the rendered outputs will be driven toward those artifacts; the manuscript provides no ablation that isolates the effect of bank quality on final rendering metrics.

    Authors: The design currently depends on VFM quality filtering to ensure bank reliability, yet we acknowledge the absence of both an explicit rejection mechanism and an ablation isolating bank quality. We will add such an ablation study in the revision to quantify the impact of bank quality on final rendering metrics. revision: yes

Circularity Check

0 steps flagged

No circularity: method relies on external diffusion/VFM models without self-referential derivations

full rationale

The paper's two claimed novelties (semantic feature generation via diffusion synthesis + VFM bank construction, followed by distance minimization to the bank) are presented as procedural steps that invoke pre-trained external models. No equations, parameter-fitting steps, or self-citations appear in the abstract or described chain that would reduce a claimed prediction or result to the input data by construction. The central claims concern empirical effectiveness on nighttime scenes rather than a closed mathematical derivation; therefore the derivation chain is self-contained against external benchmarks and receives score 0.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review is abstract-only, so the ledger records only the explicit domain assumptions stated in the abstract.

axioms (2)
  • domain assumption Diffusion models can synthesize high-quality novel views with unknown camera poses from training views.
    Invoked in the semantic feature generation stage.
  • domain assumption Vision foundation models can reliably evaluate synthesized view quality and extract robust semantic features.
    Used to build the semantic feature bank and to drive the novel-view semantic learning loss.

pith-pipeline@v0.9.0 · 5768 in / 1272 out tokens · 25388 ms · 2026-05-25T04:58:14.774950+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

72 extracted references · 72 canonical work pages · 2 internal anchors

  1. [1]

    Qwen3-VL Technical Report

    Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025. 13

  2. [2]

    3dot: Texture transfer for 3dgs objects from a single reference image

    Xiao Cao, Beibei Lin, Bo Wang, Zhiyong Huang, and Robby T Tan. 3dot: Texture transfer for 3dgs objects from a single reference image. InThe Thirty-ninth Annual Confer- ence on Neural Information Processing Systems. 2

  3. [3]

    Emerg- ing properties in self-supervised vision transformers

    Mathilde Caron, Hugo Touvron, Ishan Misra, Herv ´e J´egou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerg- ing properties in self-supervised vision transformers. InPro- ceedings of the IEEE/CVF international conference on com- puter vision, pages 9650–9660, 2021. 1, 2, 3, 4, 5, 8, 12

  4. [4]

    Tensorf: Tensorial radiance fields

    Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Tensorial radiance fields. InEuropean Conference on Computer Vision, pages 333–350. Springer,

  5. [5]

    Dual-rain: Video rain removal using assertive and gentle teachers

    Tingting Chen, Beibei Lin, Yeying Jin, Wending Yan, Wei Ye, Yuan Yuan, and Robby T Tan. Dual-rain: Video rain removal using assertive and gentle teachers. InEuropean Conference on Computer Vision, pages 127–143. Springer,

  6. [6]

    Auto-bench: An automated benchmark for scientific discovery in llms.arXiv preprint arXiv:2502.15224, 2025

    Tingting Chen, Srinivas Anumasa, Beibei Lin, Vedant Shah, Anirudh Goyal, and Dianbo Liu. Auto-bench: An automated benchmark for scientific discovery in llms.arXiv preprint arXiv:2502.15224, 2025. 13

  7. [7]

    Hy- pospace: Evaluating llm creativity as set-valued hypoth- esis generators under underdetermination.arXiv preprint arXiv:2510.15614, 2025

    Tingting Chen, Beibei Lin, Zifeng Yuan, Qiran Zou, Hongyu He, Anirudh Goyal, Yew-Soon Ong, and Dianbo Liu. Hy- pospace: Evaluating llm creativity as set-valued hypoth- esis generators under underdetermination.arXiv preprint arXiv:2510.15614, 2025. 13

  8. [8]

    Control-a-video: Controllable text-to-video generation with diffusion models

    Weifeng Chen, Jie Wu, Pan Xie, Hefeng Wu, Jiashi Li, Xin Xia, Xuefeng Xiao, and Liang Lin. Control-a-video: Controllable text-to-video generation with diffusion models. arXiv preprint arXiv:2305.13840, 2023. 3

  9. [9]

    Learning implicit fields for generative shape modeling

    Zhiqin Chen and Hao Zhang. Learning implicit fields for generative shape modeling. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5939–5948, 2019. 2

  10. [10]

    Neurbf: A neural fields repre- sentation with adaptive radial basis functions

    Zhang Chen, Zhong Li, Liangchen Song, Lele Chen, Jingyi Yu, Junsong Yuan, and Yi Xu. Neurbf: A neural fields repre- sentation with adaptive radial basis functions. InProceed- ings of the IEEE/CVF International Conference on Com- puter Vision, pages 4182–4194, 2023. 2

  11. [11]

    Aleth-nerf: Illumination adaptive nerf with concealing field assumption

    Ziteng Cui, Lin Gu, Xiao Sun, Xianzheng Ma, Yu Qiao, and Tatsuya Harada. Aleth-nerf: Illumination adaptive nerf with concealing field assumption. InProceedings of the AAAI Conference on Artificial Intelligence, pages 1435– 1444, 2024. 5, 6

  12. [12]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, et al. An image is worth 16x16 words: Trans- formers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020. 5, 8

  13. [13]

    Weatherreasonseg: A benchmark for weather-aware reasoning segmentation in visual language models.arXiv preprint arXiv:2603.17680, 2026

    Wanjun Du, Zifeng Yuan, Tingting Chen, Fucai Ke, Beibei Lin, and Shunli Zhang. Weatherreasonseg: A benchmark for weather-aware reasoning segmentation in visual language models.arXiv preprint arXiv:2603.17680, 2026. 13

  14. [14]

    Plenoxels: Radiance fields without neural networks

    Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5501–5510, 2022. 2

  15. [15]

    Featup: A model- agnostic framework for features at any resolution.arXiv preprint arXiv:2403.10516, 2024

    Stephanie Fu, Mark Hamilton, Laura Brandt, Axel Feldman, Zhoutong Zhang, and William T Freeman. Featup: A model- agnostic framework for features at any resolution.arXiv preprint arXiv:2403.10516, 2024. 5, 8

  16. [16]

    Yuan Gao, Guanyu Chen, Luo Qi, Wujie Fu, Zifeng Yuan, and Aaron J. Danner. Photonic ising machines for combina- torial optimization problems.Applied Physics Reviews, 11 (4):041307, 2024. 13

  17. [17]

    The lumigraph

    Steven J Gortler, Radek Grzeszczuk, Richard Szeliski, and Michael F Cohen. The lumigraph. InSeminal Graphics Pa- pers: Pushing the Boundaries, Volume 2, pages 453–464

  18. [18]

    Masked autoencoders are scalable vision learners

    Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Doll´ar, and Ross Girshick. Masked autoencoders are scalable vision learners. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000– 16009, 2022. 3

  19. [19]

    Baking neural ra- diance fields for real-time view synthesis

    Peter Hedman, Pratul P Srinivasan, Ben Mildenhall, Jonathan T Barron, and Paul Debevec. Baking neural ra- diance fields for real-time view synthesis. InProceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages 5875–5884, 2021. 2

  20. [20]

    Video dif- fusion models.Advances in Neural Information Processing Systems, 35:8633–8646, 2022

    Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J Fleet. Video dif- fusion models.Advances in Neural Information Processing Systems, 35:8633–8646, 2022. 3

  21. [21]

    Enhancing visibility in nighttime haze images using guided apsf and gradient adaptive convolution

    Yeying Jin, Beibei Lin, Wending Yan, Yuan Yuan, Wei Ye, and Robby T Tan. Enhancing visibility in nighttime haze images using guided apsf and gradient adaptive convolution. InProceedings of the 31st ACM international conference on multimedia, pages 2446–2457, 2023. 1, 12

  22. [22]

    Hydra: A hy- per agent for dynamic compositional visual reasoning

    Fucai Ke, Zhixi Cai, Simindokht Jahangard, Weiqing Wang, Pari Delir Haghighi, and Hamid Rezatofighi. Hydra: A hy- per agent for dynamic compositional visual reasoning. In European Conference on Computer Vision, pages 132–149. Springer, 2024. 13

  23. [23]

    Dwim: Towards tool-aware visual reasoning via discrepancy-aware workflow generation & instruct-masking tuning

    Fucai Ke, Vijay Kumar B G, Xingjian Leng, Zhixi Cai, Zaid Khan, Weiqing Wang, Pari Delir Haghighi, Hamid Rezatofighi, and Manmohan Chandraker. Dwim: Towards tool-aware visual reasoning via discrepancy-aware workflow generation & instruct-masking tuning. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 3378–3389, 2025

  24. [24]

    Explain before you answer: A survey on compositional visual reasoning.arXiv preprint arXiv:2508.17298, 2025

    Fucai Ke, Joy Hsu, Zhixi Cai, Zixian Ma, Xin Zheng, Xindi Wu, Sukai Huang, Weiqing Wang, Pari Delir Haghighi, Gholamreza Haffari, et al. Explain before you answer: A survey on compositional visual reasoning.arXiv preprint arXiv:2508.17298, 2025

  25. [25]

    View2space: Studying multi-view visual reasoning from sparse observations.arXiv preprint arXiv:2603.16506, 2026

    Fucai Ke, Zhixi Cai, Boying Li, Long Chen, Beibei Lin, Weiqing Wang, Pari Delir Haghighi, Gholamreza Haffari, and Hamid Rezatofighi. View2space: Studying multi-view visual reasoning from sparse observations.arXiv preprint arXiv:2603.16506, 2026. 13

  26. [26]

    3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4):1–14, 2023

    Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4):1–14, 2023. 1, 2, 4, 5, 6, 7

  27. [27]

    Segment any- thing

    Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C Berg, Wan-Yen Lo, et al. Segment any- thing. InProceedings of the IEEE/CVF International Con- ference on Computer Vision, pages 4015–4026, 2023. 3

  28. [28]

    Tetra-nerf: Represent- ing neural radiance fields using tetrahedra

    Jonas Kulhanek and Torsten Sattler. Tetra-nerf: Represent- ing neural radiance fields using tetrahedra. InProceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages 18458–18469, 2023. 2

  29. [29]

    Compact 3d gaussian representation for radiance field.Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

    Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, and Eunbyung Park. Compact 3d gaussian representation for radiance field.Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024. 1, 2, 5, 6, 7

  30. [30]

    Light field rendering

    Marc Levoy and Pat Hanrahan. Light field rendering. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2, pages 441–452. 2023. 2

  31. [31]

    Nightcc: nighttime color con- stancy via adaptive channel masking

    Shuwei Li and Robby T Tan. Nightcc: nighttime color con- stancy via adaptive channel masking. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25522–25531, 2024. 13

  32. [32]

    Bridging day and night: Target-class hallucination suppression in unpaired im- age translation

    Shuwei Li, Lei Tan, and Robby T Tan. Bridging day and night: Target-class hallucination suppression in unpaired im- age translation. InProceedings of the AAAI Conference on Artificial Intelligence, pages 6424–6432, 2026. 13

  33. [33]

    Geocomplete: Geometry-aware diffusion for reference-driven image com- pletion

    Beibei Lin, Tingting Chen, and Robby T Tan. Geocomplete: Geometry-aware diffusion for reference-driven image com- pletion. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, . 3

  34. [34]

    Rgb-to- polarization estimation: A new task and benchmark study

    Beibei Lin, Zifeng Yuan, and Tingting Chen. Rgb-to- polarization estimation: A new task and benchmark study. InThe Thirty-ninth Annual Conference on Neural Informa- tion Processing Systems Datasets and Benchmarks Track,

  35. [35]

    Nightrain: Nighttime video deraining via adaptive-rain-removal and adaptive-correction

    Beibei Lin, Yeying Jin, Wending Yan, Wei Ye, Yuan Yuan, Shunli Zhang, and Robby T Tan. Nightrain: Nighttime video deraining via adaptive-rain-removal and adaptive-correction. InProceedings of the AAAI Conference on Artificial Intelli- gence, pages 3378–3385, 2024. 3

  36. [36]

    Nighthaze: Nighttime image dehazing via self-prior learning

    Beibei Lin, Yeying Jin, Yan Wending, Wei Ye, Yuan Yuan, and Robby T Tan. Nighthaze: Nighttime image dehazing via self-prior learning. InProceedings of the AAAI Conference on Artificial Intelligence, pages 5209–5217, 2025. 1

  37. [37]

    Seeing beyond haze: Generative nighttime image dehazing.arXiv preprint arXiv:2503.08073, 2025

    Beibei Lin, Stephen Lin, and Robby Tan. Seeing beyond haze: Generative nighttime image dehazing.arXiv preprint arXiv:2503.08073, 2025. 3

  38. [38]

    Optical models for direct volume rendering

    Nelson Max. Optical models for direct volume rendering. IEEE Transactions on Visualization and Computer Graphics, 1(2):99–108, 1995. 2

  39. [39]

    Local and global illumination in the volume rendering integral

    Nelson Max and Min Chen. Local and global illumination in the volume rendering integral. Technical report, Lawrence Livermore National Lab.(LLNL), Livermore, CA (United States), 2005. 2

  40. [40]

    Occupancy networks: Learning 3d reconstruction in function space

    Lars Mescheder, Michael Oechsle, Michael Niemeyer, Se- bastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3d reconstruction in function space. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4460–4470, 2019. 2

  41. [41]

    Nerf: Representing scenes as neural radiance fields for view syn- thesis.Communications of the ACM, 65(1):99–106, 2021

    Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis.Communications of the ACM, 65(1):99–106, 2021. 2

  42. [42]

    Nerf in the dark: High dynamic range view synthesis from noisy raw images

    Ben Mildenhall, Peter Hedman, Ricardo Martin-Brualla, Pratul P Srinivasan, and Jonathan T Barron. Nerf in the dark: High dynamic range view synthesis from noisy raw images. InProceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 16190–16199, 2022. 5, 6, 13

  43. [43]

    Instant neural graphics primitives with a mul- tiresolution hash encoding.ACM transactions on graphics (TOG), 41(4):1–15, 2022

    Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexan- der Keller. Instant neural graphics primitives with a mul- tiresolution hash encoding.ACM transactions on graphics (TOG), 41(4):1–15, 2022. 2

  44. [44]

    Deepsdf: Learning con- tinuous signed distance functions for shape representation

    Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning con- tinuous signed distance functions for shape representation. InProceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 165–174, 2019. 2

  45. [45]

    Scalable diffusion models with transformers

    William Peebles and Saining Xie. Scalable diffusion models with transformers. InProceedings of the IEEE/CVF Inter- national Conference on Computer Vision, pages 4195–4205,

  46. [46]

    Pika: Ai video creation platform, 2024

    Pika. Pika: Ai video creation platform, 2024. 1, 3, 5, 7, 12

  47. [47]

    Promeai: Professional ai solutions, 2024

    PromeAI. Promeai: Professional ai solutions, 2024. 1, 3, 5, 7, 12

  48. [48]

    Learning transferable visual models from natural language supervi- sion

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervi- sion. InInternational conference on machine learning, pages 8748–8763. PMLR, 2021. 1, 2, 3, 4, 5, 8, 12

  49. [49]

    Kilonerf: Speeding up neural radiance fields with thousands of tiny mlps

    Christian Reiser, Songyou Peng, Yiyi Liao, and Andreas Geiger. Kilonerf: Speeding up neural radiance fields with thousands of tiny mlps. InProceedings of the IEEE/CVF international conference on computer vision, pages 14335– 14345, 2021. 2

  50. [50]

    High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022. 3

  51. [51]

    Raindropgs: A benchmark for 3d gaussian splatting under raindrop condi- tions.arXiv preprint arXiv:2510.17719, 2025

    Zhiqiang Teng, Beibei Lin, Tingting Chen, Zifeng Yuan, Xu- anyi Li, Xuanyu Zhang, and Shunli Zhang. Raindropgs: A benchmark for 3d gaussian splatting under raindrop condi- tions.arXiv preprint arXiv:2510.17719, 2025. 2

  52. [52]

    Lighting up nerf via unsupervised decomposition and en- hancement

    Haoyuan Wang, Xiaogang Xu, Ke Xu, and Rynson WH Lau. Lighting up nerf via unsupervised decomposition and en- hancement. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 12632–12641, 2023. 5, 6

  53. [53]

    Imaginator: Conditional spatio-temporal gan for video generation

    Yaohui Wang, Piotr Bilinski, Francois Bremond, and Antitza Dantcheva. Imaginator: Conditional spatio-temporal gan for video generation. InProceedings of the IEEE/CVF Win- ter Conference on Applications of Computer Vision, pages 1160–1169, 2020. 3

  54. [54]

    Bilateral guided radiance field processing.ACM Trans- actions on Graphics (TOG), 43(4):1–13, 2024

    Yuehao Wang, Chaoyi Wang, Bingchen Gong, and Tianfan Xue. Bilateral guided radiance field processing.ACM Trans- actions on Graphics (TOG), 43(4):1–13, 2024. 5, 6, 13

  55. [55]

    Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation

    Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Stan Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, and Mike Zheng Shou. Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7623–7633, 2023. 3

  56. [56]

    Point- nerf: Point-based neural radiance fields

    Qiangeng Xu, Zexiang Xu, Julien Philip, Sai Bi, Zhixin Shu, Kalyan Sunkavalli, and Ulrich Neumann. Point- nerf: Point-based neural radiance fields. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5438–5448, 2022. 2

  57. [57]

    Tan, Bing Zeng, and Shuaicheng Liu

    Weilong Yan, Robby T. Tan, Bing Zeng, and Shuaicheng Liu. Deep homography mixture for single image rolling shutter correction. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9868–9877,

  58. [58]

    Weilong Yan, Ming Li, Haipeng Li, Shuwei Shao, and Robby T. Tan. Synthetic-to-real self-supervised robust depth estimation via learning with motion and structure priors. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), pages 21880–21890, 2025. 13

  59. [59]

    Er-lora: Effective-rank guided adaptation for weather-generalized depth estimation.arXiv preprint arXiv:2509.00665, 2025

    Weilong Yan, Xin Zhang, and Robby T Tan. Er-lora: Effective-rank guided adaptation for weather-generalized depth estimation.arXiv preprint arXiv:2509.00665, 2025. 13

  60. [60]

    LaS-Comp: Zero-shot 3D Completion with Latent-Spatial Consistency.arXiv preprint arXiv:2602.18735, 2026

    Weilong Yan, Haipeng Li, Hao Xu, Nianjin Ye, Yihao Ai, Shuaicheng Liu, and Jingyu Hu. LaS-Comp: Zero-shot 3D Completion with Latent-Spatial Consistency.arXiv preprint arXiv:2602.18735, 2026. 13

  61. [61]

    Erf: A benchmark dataset for robust semantic segmentation under extreme rain- fall conditions

    Xin Yang, Xin Zhang, and Xinchao Wang. Erf: A benchmark dataset for robust semantic segmentation under extreme rain- fall conditions. InProceedings of the AAAI Conference on Artificial Intelligence, pages 9301–9309, 2025. 13

  62. [62]

    Bakedsdf: Meshing neural sdfs for real- time view synthesis

    Lior Yariv, Peter Hedman, Christian Reiser, Dor Verbin, Pratul P Srinivasan, Richard Szeliski, Jonathan T Barron, and Ben Mildenhall. Bakedsdf: Meshing neural sdfs for real- time view synthesis. InACM SIGGRAPH 2023 Conference Proceedings, pages 1–9, 2023. 2

  63. [63]

    Plenoctrees for real-time rendering of neural radiance fields

    Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, and Angjoo Kanazawa. Plenoctrees for real-time rendering of neural radiance fields. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 5752– 5761, 2021. 2

  64. [64]

    Mip-splatting: Alias-free 3d gaussian splat- ting.Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

    Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-splatting: Alias-free 3d gaussian splat- ting.Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024. 1, 2, 3, 5, 6, 7, 8, 12, 13

  65. [65]

    Zifeng Yuan, Tingting Chen, Dewen Zhang, Yuan Gao, Wenkai Shan, Beibei Lin, and Aaron J. Danner. Tailored polarization-switchable VCSEL arrays for photonic ising computing.Applied Physics Letters, 127(22):221102, 2025. 13

  66. [66]

    Zifeng Yuan, Wenkai Shan, Tingting Chen, Beibei Lin, and Aaron J. Danner. Mesa orientation engineering for polariza- tion locking in VCSELs. In2025 IEEE Photonics Confer- ence (IPC), pages 1–2, Singapore, Singapore, 2025. 13

  67. [67]

    Zifeng Yuan, Dewen Zhang, Yuan Gao, Luo Qi, Wujie Fu, and Aaron J. Danner. Large-scale fabrication and analysis of polarization behavior in VCSELs with tailored apertures. Journal of Lightwave Technology, 43(14):6819–6827, 2025

  68. [68]

    Zifeng Yuan, Dewen Zhang, Hong-Lin Lin, and Aaron J. Danner. Engineering polarization switching in VCSELs with custom aperture shapes. InCLEO: Conference on Lasers and Electro-Optics, page JPS200 47. Optica Publishing Group,

  69. [69]

    Zifeng Yuan, Dewen Zhang, Lei Shi, Yutong Liu, and Aaron J. Danner. Enhanced polarization locking in VCSELs. Applied Physics Letters, 126(15):151101, 2025. 13

  70. [70]

    Dan- ner

    Dewen Zhang, Zifeng Yuan, Thanh Xuan Hoang, Wujie Fu, Ching Eng Png, Soon Thor Lim, and Aaron J. Dan- ner. All-optical scalable and programmable VCSEL-based ising annealer with parallel feedback.Optics Express, 33 (11):22119–22131, 2025. 13

  71. [71]

    Mamba as a bridge: Where vision foundation models meet vision language models for domain-generalized semantic segmentation

    Xin Zhang and Robby T Tan. Mamba as a bridge: Where vision foundation models meet vision language models for domain-generalized semantic segmentation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 14527–14537, 2025. 13

  72. [72]

    Slow camera movement, static scene, no new objects

    Xin Zhang, Jinheng Xie, Yuan Yuan, Michael Bi Mi, and Robby T Tan. Heap: unsupervised object discovery and localization with contrastive grouping. InProceedings of the AAAI Conference on Artificial Intelligence, pages 7323– 7331, 2024. 13 Table 3. (PSNR/SSIM) vs. Number of Training Views Method2 views 4 views 6 views 8 views 10 views MGS [64]21.0/0.64 25....