GlowGS: Generative Semantic Feature Learning for 3D Gaussian Splatting in Nighttime Glow Scenes

Beibei Lin; Jingyuan Guo; Robby T. Tan; Xiao Cao

arxiv: 2605.23602 · v1 · pith:FVL5ADDDnew · submitted 2026-05-22 · 💻 cs.CV

GlowGS: Generative Semantic Feature Learning for 3D Gaussian Splatting in Nighttime Glow Scenes

Beibei Lin , Xiao Cao , Jingyuan Guo , Robby T. Tan This is my paper

Pith reviewed 2026-05-25 04:58 UTC · model grok-4.3

classification 💻 cs.CV

keywords 3D Gaussian SplattingNighttime scenesGlow regionsSemantic feature generationDiffusion modelsNovel view synthesisVision foundation modelsScene reconstruction

0 comments

The pith

Semantic feature banks built from diffusion models let 3D Gaussian splatting reconstruct nighttime glow scenes without ground-truth images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to fix 3D Gaussian Splatting when it encounters night scenes full of glow, where the usual cues of texture and edges are absent and reconstruction breaks down. It compensates by generating synthetic novel views with a diffusion model, then using a vision foundation model to score those views and pull out semantic features that serve as substitute structure. These features populate a bank that later guides the splatting optimization: rendered views are compared against the bank and pulled toward semantic matches, enforcing consistency without any real target image. A reader would care because this removes a practical barrier to building 3D models from footage taken in low-light or glowing conditions.

Core claim

GlowGS uses semantic feature generation to synthesize novel views via diffusion, filter them with a vision foundation model, and store the extracted features in a bank; it then applies novel-view semantic learning so that 3D Gaussian Splatting optimizes rendered views by minimizing distance to the closest bank entries, thereby imposing implicit structural constraints that produce semantically coherent, artifact-free results in glow regions.

What carries the argument

The semantic feature bank, which holds vision-foundation-model features from high-quality diffusion-synthesized novel views and supplies the distance-minimization target that enforces structural consistency during 3D Gaussian Splatting training.

If this is right

3D Gaussian Splatting can now produce usable novel views in nighttime glow regions that previously caused failure.
Optimization of rendered views can proceed without any ground-truth images by relying on semantic distance to a pre-built feature bank.
The same two-stage process of feature-bank creation followed by semantic matching can be applied whenever structural cues are weak.
Semantic accuracy of rendered night scenes improves measurably over standard 3DGS pipelines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same bank-and-match pattern could be tested on other low-cue settings such as fog or heavy rain where textures also disappear.
Improvements in diffusion or foundation models would directly raise the quality of the feature bank without changing the rest of the pipeline.
The approach suggests a general route for injecting external generative priors into splatting methods that currently depend only on observed pixels.

Load-bearing premise

A diffusion model can generate novel views with unknown camera poses whose quality a vision foundation model can reliably judge and whose semantic features remain stable enough to guide splatting in glow scenes.

What would settle it

Apply the method and a plain 3DGS baseline to the same set of nighttime glow scenes, then measure whether the GlowGS novel views show higher semantic similarity to real captured images and fewer glow-induced artifacts than the baseline.

Figures

Figures reproduced from arXiv: 2605.23602 by Beibei Lin, Jingyuan Guo, Robby T. Tan, Xiao Cao.

**Figure 1.** Figure 1: Qualitative results from 3DGS [26], CGS [29], MGS [64], and our method on nighttime scenes. All results are from novel views. The zoom-in insets highlight artifacts in glow regions produced by existing methods, including duplicated light sources and unnatural bright spots in dark areas. Our method reconstructs glow regions with smoother light diffusion, effectively reducing these artifacts. (a) Input (b) C… view at source ↗

**Figure 2.** Figure 2: Visualization of features extracted by different Vision [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Overview of our GlowGS, which centers around two core ideas: semantic feature generation and novel-view semantic learning. Given the training views, our semantic feature generation uses image-to-video diffusion models to create novel views with unknown poses, followed by a VFM-based module to assess their quality. We then extract robust semantic features from the high-quality views and build a semantic fea… view at source ↗

**Figure 4.** Figure 4: Qualitative results from 3DGS [26], CGS [29], MGS [64] and our method, on nighttime glow scenes. All results are from novel views. Our method not only preserves the details of night images but also effectively reconstructs glow regions. Zoom in for better visualization. Novel-View Semantic Learning During training, we generate new camera poses by perturbing the rotation and translation matrices of the trai… view at source ↗

**Figure 5.** Figure 5: Qualitative results from 3DGS [26], CGS [29], MGS [64] and our method, on nighttime glow scenes. All results are from novel views. Our method not only preserves the details of night images but also effectively reconstructs glow regions. Zoom in for better visualization. ture bank, enabling 3DGS models to optimize novel view semantics and boost performance. GlowGS consistently outperforms baseline 3DGS meth… view at source ↗

**Figure 6.** Figure 6: Visualization of features extracted by different Vision [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗

**Figure 7.** Figure 7: Glow masks and results from MGS [64] and ours [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗

**Figure 8.** Figure 8: Visualization of generated results. Given a training view, we use image-to-video diffusion models to synthesize novel views. The [PITH_FULL_IMAGE:figures/full_fig_p014_8.png] view at source ↗

read the original abstract

Existing 3DGS methods effectively render high-quality novel views in clear-day scenes. However, they struggle with night scenes, particularly in glow regions, due to the lack of structural features such as textures and edges, which are key cues for splatting-based reconstruction. To address this problem, we leverage a diffusion model and a Vision Foundation Model (VFM) to compensate for missing structural cues. Our method consists of two key novel ideas: semantic feature generation and novel-view semantic learning. First, semantic feature generation produces high-quality semantic features as implicit structural cues for novel views. Specifically, a diffusion model synthesizes novel views with unknown camera poses from training views, while a VFM evaluates their quality. Once high-quality novel views are identified, the VFM extracts robust features to construct the semantic feature bank. Second, novel-view semantic learning enables 3DGS to optimize rendered novel views without requiring ground truth. It achieves this by extracting semantic features from a rendered novel view, searching the feature bank for the most similar features, and minimizing their distance. This process enforces implicit structural constraints, ensuring semantically coherent, artifact-free rendered views. Extensive experiments demonstrate the effectiveness of our GlowGS in generating semantically accurate 3D views, showing significant improvements over existing methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

GlowGS adds diffusion synthesis plus VFM features to 3DGS for glow scenes, but the synthesis step in textureless regions looks like the load-bearing risk.

read the letter

The main takeaway is that this paper tries to fix 3D Gaussian Splatting for nighttime glow by generating a semantic feature bank from diffusion-synthesized novel views and then pulling rendered outputs toward the closest bank entry via distance minimization. The two claimed novelties are semantic feature generation and novel-view semantic learning without ground truth. The motivation is straightforward: glow regions lack the edges and textures that 3DGS normally relies on, so the method injects implicit semantic constraints instead. That direction makes sense for practical low-light reconstruction. The paper states the problem clearly and outlines a concrete pipeline that combines existing diffusion and VFM tools inside the 3DGS loop. If the experiments hold up, the idea could be useful to people extending 3DGS beyond daytime scenes. The soft spot is exactly the one flagged in the stress test. Glow areas are defined by missing structural cues, so a diffusion model asked to synthesize views for unknown poses has almost nothing reliable to condition on. Any geometry or lighting artifacts it produces would populate the feature bank, and the minimization objective would then enforce consistency with those artifacts. The VFM quality filter is unlikely to catch this if the errors are semantically plausible. The abstract gives no detail on how unknown poses are estimated or validated, and no ground-truth novel views exist to measure the mismatch. This assumption sits at the center of both novelties. The work is aimed at researchers doing 3D reconstruction or novel-view synthesis under adverse lighting. A reader already working on generative priors for explicit 3D models might pick up the feature-bank idea even if the current version needs fixes. It deserves peer review because the problem is real and the method is specific enough to evaluate with targeted experiments, though the authors will have to demonstrate that the synthesis step improves rather than degrades the output.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes GlowGS to extend 3D Gaussian Splatting to nighttime glow scenes, where standard methods fail due to missing textures and edges. It introduces two components: (1) semantic feature generation, in which a diffusion model synthesizes novel views for arbitrary camera poses from training views, a vision foundation model (VFM) filters for quality, and the VFM then extracts features into a semantic bank; (2) novel-view semantic learning, which renders a novel view, retrieves the closest bank entry, and minimizes feature distance to enforce implicit structural constraints without ground-truth images. Experiments are reported to show significant gains over prior 3DGS methods in semantic accuracy.

Significance. If the synthesis and filtering steps prove reliable, the approach offers a concrete mechanism for injecting semantic priors into 3DGS optimization, addressing a recognized failure mode in low-texture nighttime reconstruction. The two-stage pipeline (generative bank construction followed by distance-based regularization) is a plausible way to obtain supervision where photometric losses are uninformative.

major comments (2)

[Semantic feature generation] Semantic feature generation (abstract and method description): the claim that a diffusion model can produce reliable novel-view geometry and semantics for unknown poses directly from training views is load-bearing for both novelties. Glow regions are defined precisely by the absence of edges and textures—the same cues diffusion models typically rely on—yet no quantitative validation (e.g., pose-consistency metrics, artifact rates on held-out glow patches, or comparison against ground-truth novel views) is supplied to show that the generated bank entries are structurally faithful rather than hallucinated. Without such evidence the subsequent distance-minimization objective risks enforcing consistency with synthesis artifacts.
[Novel-view semantic learning] Novel-view semantic learning (abstract and method description): the optimization objective minimizes distance to the nearest bank feature without any mechanism to detect or reject inconsistent bank entries. If the diffusion step populates the bank with pose-inconsistent or glow-specific artifacts, the rendered outputs will be driven toward those artifacts; the manuscript provides no ablation that isolates the effect of bank quality on final rendering metrics.

minor comments (1)

[Abstract] The abstract states “significant improvements over existing methods” but does not name the baselines or report numerical deltas; these details should appear in the experiments section with standard error bars.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and recommendation for major revision. The two major comments identify important gaps in validation that we will address directly.

read point-by-point responses

Referee: [Semantic feature generation] Semantic feature generation (abstract and method description): the claim that a diffusion model can produce reliable novel-view geometry and semantics for unknown poses directly from training views is load-bearing for both novelties. Glow regions are defined precisely by the absence of edges and textures—the same cues diffusion models typically rely on—yet no quantitative validation (e.g., pose-consistency metrics, artifact rates on held-out glow patches, or comparison against ground-truth novel views) is supplied to show that the generated bank entries are structurally faithful rather than hallucinated. Without such evidence the subsequent distance-minimization objective risks enforcing consistency with synthesis artifacts.

Authors: We agree that the reliability of diffusion-generated views is central to the method and that the current description relies on VFM-based filtering without supplying dedicated quantitative checks such as pose-consistency metrics or artifact rates. In the revised manuscript we will add these evaluations on held-out glow patches to demonstrate structural faithfulness of the bank entries. revision: yes
Referee: [Novel-view semantic learning] Novel-view semantic learning (abstract and method description): the optimization objective minimizes distance to the nearest bank feature without any mechanism to detect or reject inconsistent bank entries. If the diffusion step populates the bank with pose-inconsistent or glow-specific artifacts, the rendered outputs will be driven toward those artifacts; the manuscript provides no ablation that isolates the effect of bank quality on final rendering metrics.

Authors: The design currently depends on VFM quality filtering to ensure bank reliability, yet we acknowledge the absence of both an explicit rejection mechanism and an ablation isolating bank quality. We will add such an ablation study in the revision to quantify the impact of bank quality on final rendering metrics. revision: yes

Circularity Check

0 steps flagged

No circularity: method relies on external diffusion/VFM models without self-referential derivations

full rationale

The paper's two claimed novelties (semantic feature generation via diffusion synthesis + VFM bank construction, followed by distance minimization to the bank) are presented as procedural steps that invoke pre-trained external models. No equations, parameter-fitting steps, or self-citations appear in the abstract or described chain that would reduce a claimed prediction or result to the input data by construction. The central claims concern empirical effectiveness on nighttime scenes rather than a closed mathematical derivation; therefore the derivation chain is self-contained against external benchmarks and receives score 0.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review is abstract-only, so the ledger records only the explicit domain assumptions stated in the abstract.

axioms (2)

domain assumption Diffusion models can synthesize high-quality novel views with unknown camera poses from training views.
Invoked in the semantic feature generation stage.
domain assumption Vision foundation models can reliably evaluate synthesized view quality and extract robust semantic features.
Used to build the semantic feature bank and to drive the novel-view semantic learning loss.

pith-pipeline@v0.9.0 · 5768 in / 1272 out tokens · 25388 ms · 2026-05-25T04:58:14.774950+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

72 extracted references · 72 canonical work pages · 2 internal anchors

[1]

Qwen3-VL Technical Report

Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025. 13

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

3dot: Texture transfer for 3dgs objects from a single reference image

Xiao Cao, Beibei Lin, Bo Wang, Zhiyong Huang, and Robby T Tan. 3dot: Texture transfer for 3dgs objects from a single reference image. InThe Thirty-ninth Annual Confer- ence on Neural Information Processing Systems. 2

work page
[3]

Emerg- ing properties in self-supervised vision transformers

Mathilde Caron, Hugo Touvron, Ishan Misra, Herv ´e J´egou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerg- ing properties in self-supervised vision transformers. InPro- ceedings of the IEEE/CVF international conference on com- puter vision, pages 9650–9660, 2021. 1, 2, 3, 4, 5, 8, 12

work page 2021
[4]

Tensorf: Tensorial radiance fields

Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Tensorial radiance fields. InEuropean Conference on Computer Vision, pages 333–350. Springer,

work page
[5]

Dual-rain: Video rain removal using assertive and gentle teachers

Tingting Chen, Beibei Lin, Yeying Jin, Wending Yan, Wei Ye, Yuan Yuan, and Robby T Tan. Dual-rain: Video rain removal using assertive and gentle teachers. InEuropean Conference on Computer Vision, pages 127–143. Springer,

work page
[6]

Auto-bench: An automated benchmark for scientific discovery in llms.arXiv preprint arXiv:2502.15224, 2025

Tingting Chen, Srinivas Anumasa, Beibei Lin, Vedant Shah, Anirudh Goyal, and Dianbo Liu. Auto-bench: An automated benchmark for scientific discovery in llms.arXiv preprint arXiv:2502.15224, 2025. 13

work page arXiv 2025
[7]

Hy- pospace: Evaluating llm creativity as set-valued hypoth- esis generators under underdetermination.arXiv preprint arXiv:2510.15614, 2025

Tingting Chen, Beibei Lin, Zifeng Yuan, Qiran Zou, Hongyu He, Anirudh Goyal, Yew-Soon Ong, and Dianbo Liu. Hy- pospace: Evaluating llm creativity as set-valued hypoth- esis generators under underdetermination.arXiv preprint arXiv:2510.15614, 2025. 13

work page arXiv 2025
[8]

Control-a-video: Controllable text-to-video generation with diffusion models

Weifeng Chen, Jie Wu, Pan Xie, Hefeng Wu, Jiashi Li, Xin Xia, Xuefeng Xiao, and Liang Lin. Control-a-video: Controllable text-to-video generation with diffusion models. arXiv preprint arXiv:2305.13840, 2023. 3

work page arXiv 2023
[9]

Learning implicit fields for generative shape modeling

Zhiqin Chen and Hao Zhang. Learning implicit fields for generative shape modeling. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5939–5948, 2019. 2

work page 2019
[10]

Neurbf: A neural fields repre- sentation with adaptive radial basis functions

Zhang Chen, Zhong Li, Liangchen Song, Lele Chen, Jingyi Yu, Junsong Yuan, and Yi Xu. Neurbf: A neural fields repre- sentation with adaptive radial basis functions. InProceed- ings of the IEEE/CVF International Conference on Com- puter Vision, pages 4182–4194, 2023. 2

work page 2023
[11]

Aleth-nerf: Illumination adaptive nerf with concealing field assumption

Ziteng Cui, Lin Gu, Xiao Sun, Xianzheng Ma, Yu Qiao, and Tatsuya Harada. Aleth-nerf: Illumination adaptive nerf with concealing field assumption. InProceedings of the AAAI Conference on Artificial Intelligence, pages 1435– 1444, 2024. 5, 6

work page 2024
[12]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, et al. An image is worth 16x16 words: Trans- formers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020. 5, 8

work page internal anchor Pith review Pith/arXiv arXiv 2010
[13]

Weatherreasonseg: A benchmark for weather-aware reasoning segmentation in visual language models.arXiv preprint arXiv:2603.17680, 2026

Wanjun Du, Zifeng Yuan, Tingting Chen, Fucai Ke, Beibei Lin, and Shunli Zhang. Weatherreasonseg: A benchmark for weather-aware reasoning segmentation in visual language models.arXiv preprint arXiv:2603.17680, 2026. 13

work page arXiv 2026
[14]

Plenoxels: Radiance fields without neural networks

Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5501–5510, 2022. 2

work page 2022
[15]

Featup: A model- agnostic framework for features at any resolution.arXiv preprint arXiv:2403.10516, 2024

Stephanie Fu, Mark Hamilton, Laura Brandt, Axel Feldman, Zhoutong Zhang, and William T Freeman. Featup: A model- agnostic framework for features at any resolution.arXiv preprint arXiv:2403.10516, 2024. 5, 8

work page arXiv 2024
[16]

Yuan Gao, Guanyu Chen, Luo Qi, Wujie Fu, Zifeng Yuan, and Aaron J. Danner. Photonic ising machines for combina- torial optimization problems.Applied Physics Reviews, 11 (4):041307, 2024. 13

work page 2024
[17]

The lumigraph

Steven J Gortler, Radek Grzeszczuk, Richard Szeliski, and Michael F Cohen. The lumigraph. InSeminal Graphics Pa- pers: Pushing the Boundaries, Volume 2, pages 453–464

work page
[18]

Masked autoencoders are scalable vision learners

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Doll´ar, and Ross Girshick. Masked autoencoders are scalable vision learners. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000– 16009, 2022. 3

work page 2022
[19]

Baking neural ra- diance fields for real-time view synthesis

Peter Hedman, Pratul P Srinivasan, Ben Mildenhall, Jonathan T Barron, and Paul Debevec. Baking neural ra- diance fields for real-time view synthesis. InProceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages 5875–5884, 2021. 2

work page 2021
[20]

Video dif- fusion models.Advances in Neural Information Processing Systems, 35:8633–8646, 2022

Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J Fleet. Video dif- fusion models.Advances in Neural Information Processing Systems, 35:8633–8646, 2022. 3

work page 2022
[21]

Enhancing visibility in nighttime haze images using guided apsf and gradient adaptive convolution

Yeying Jin, Beibei Lin, Wending Yan, Yuan Yuan, Wei Ye, and Robby T Tan. Enhancing visibility in nighttime haze images using guided apsf and gradient adaptive convolution. InProceedings of the 31st ACM international conference on multimedia, pages 2446–2457, 2023. 1, 12

work page 2023
[22]

Hydra: A hy- per agent for dynamic compositional visual reasoning

Fucai Ke, Zhixi Cai, Simindokht Jahangard, Weiqing Wang, Pari Delir Haghighi, and Hamid Rezatofighi. Hydra: A hy- per agent for dynamic compositional visual reasoning. In European Conference on Computer Vision, pages 132–149. Springer, 2024. 13

work page 2024
[23]

Dwim: Towards tool-aware visual reasoning via discrepancy-aware workflow generation & instruct-masking tuning

Fucai Ke, Vijay Kumar B G, Xingjian Leng, Zhixi Cai, Zaid Khan, Weiqing Wang, Pari Delir Haghighi, Hamid Rezatofighi, and Manmohan Chandraker. Dwim: Towards tool-aware visual reasoning via discrepancy-aware workflow generation & instruct-masking tuning. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 3378–3389, 2025

work page 2025
[24]

Explain before you answer: A survey on compositional visual reasoning.arXiv preprint arXiv:2508.17298, 2025

Fucai Ke, Joy Hsu, Zhixi Cai, Zixian Ma, Xin Zheng, Xindi Wu, Sukai Huang, Weiqing Wang, Pari Delir Haghighi, Gholamreza Haffari, et al. Explain before you answer: A survey on compositional visual reasoning.arXiv preprint arXiv:2508.17298, 2025

work page arXiv 2025
[25]

View2space: Studying multi-view visual reasoning from sparse observations.arXiv preprint arXiv:2603.16506, 2026

Fucai Ke, Zhixi Cai, Boying Li, Long Chen, Beibei Lin, Weiqing Wang, Pari Delir Haghighi, Gholamreza Haffari, and Hamid Rezatofighi. View2space: Studying multi-view visual reasoning from sparse observations.arXiv preprint arXiv:2603.16506, 2026. 13

work page arXiv 2026
[26]

3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4):1–14, 2023

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4):1–14, 2023. 1, 2, 4, 5, 6, 7

work page 2023
[27]

Segment any- thing

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C Berg, Wan-Yen Lo, et al. Segment any- thing. InProceedings of the IEEE/CVF International Con- ference on Computer Vision, pages 4015–4026, 2023. 3

work page 2023
[28]

Tetra-nerf: Represent- ing neural radiance fields using tetrahedra

Jonas Kulhanek and Torsten Sattler. Tetra-nerf: Represent- ing neural radiance fields using tetrahedra. InProceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages 18458–18469, 2023. 2

work page 2023
[29]

Compact 3d gaussian representation for radiance field.Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, and Eunbyung Park. Compact 3d gaussian representation for radiance field.Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024. 1, 2, 5, 6, 7

work page 2024
[30]

Light field rendering

Marc Levoy and Pat Hanrahan. Light field rendering. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2, pages 441–452. 2023. 2

work page 2023
[31]

Nightcc: nighttime color con- stancy via adaptive channel masking

Shuwei Li and Robby T Tan. Nightcc: nighttime color con- stancy via adaptive channel masking. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25522–25531, 2024. 13

work page 2024
[32]

Bridging day and night: Target-class hallucination suppression in unpaired im- age translation

Shuwei Li, Lei Tan, and Robby T Tan. Bridging day and night: Target-class hallucination suppression in unpaired im- age translation. InProceedings of the AAAI Conference on Artificial Intelligence, pages 6424–6432, 2026. 13

work page 2026
[33]

Geocomplete: Geometry-aware diffusion for reference-driven image com- pletion

Beibei Lin, Tingting Chen, and Robby T Tan. Geocomplete: Geometry-aware diffusion for reference-driven image com- pletion. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, . 3

work page
[34]

Rgb-to- polarization estimation: A new task and benchmark study

Beibei Lin, Zifeng Yuan, and Tingting Chen. Rgb-to- polarization estimation: A new task and benchmark study. InThe Thirty-ninth Annual Conference on Neural Informa- tion Processing Systems Datasets and Benchmarks Track,

work page
[35]

Nightrain: Nighttime video deraining via adaptive-rain-removal and adaptive-correction

Beibei Lin, Yeying Jin, Wending Yan, Wei Ye, Yuan Yuan, Shunli Zhang, and Robby T Tan. Nightrain: Nighttime video deraining via adaptive-rain-removal and adaptive-correction. InProceedings of the AAAI Conference on Artificial Intelli- gence, pages 3378–3385, 2024. 3

work page 2024
[36]

Nighthaze: Nighttime image dehazing via self-prior learning

Beibei Lin, Yeying Jin, Yan Wending, Wei Ye, Yuan Yuan, and Robby T Tan. Nighthaze: Nighttime image dehazing via self-prior learning. InProceedings of the AAAI Conference on Artificial Intelligence, pages 5209–5217, 2025. 1

work page 2025
[37]

Seeing beyond haze: Generative nighttime image dehazing.arXiv preprint arXiv:2503.08073, 2025

Beibei Lin, Stephen Lin, and Robby Tan. Seeing beyond haze: Generative nighttime image dehazing.arXiv preprint arXiv:2503.08073, 2025. 3

work page arXiv 2025
[38]

Optical models for direct volume rendering

Nelson Max. Optical models for direct volume rendering. IEEE Transactions on Visualization and Computer Graphics, 1(2):99–108, 1995. 2

work page 1995
[39]

Local and global illumination in the volume rendering integral

Nelson Max and Min Chen. Local and global illumination in the volume rendering integral. Technical report, Lawrence Livermore National Lab.(LLNL), Livermore, CA (United States), 2005. 2

work page 2005
[40]

Occupancy networks: Learning 3d reconstruction in function space

Lars Mescheder, Michael Oechsle, Michael Niemeyer, Se- bastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3d reconstruction in function space. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4460–4470, 2019. 2

work page 2019
[41]

Nerf: Representing scenes as neural radiance fields for view syn- thesis.Communications of the ACM, 65(1):99–106, 2021

Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis.Communications of the ACM, 65(1):99–106, 2021. 2

work page 2021
[42]

Nerf in the dark: High dynamic range view synthesis from noisy raw images

Ben Mildenhall, Peter Hedman, Ricardo Martin-Brualla, Pratul P Srinivasan, and Jonathan T Barron. Nerf in the dark: High dynamic range view synthesis from noisy raw images. InProceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 16190–16199, 2022. 5, 6, 13

work page 2022
[43]

Instant neural graphics primitives with a mul- tiresolution hash encoding.ACM transactions on graphics (TOG), 41(4):1–15, 2022

Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexan- der Keller. Instant neural graphics primitives with a mul- tiresolution hash encoding.ACM transactions on graphics (TOG), 41(4):1–15, 2022. 2

work page 2022
[44]

Deepsdf: Learning con- tinuous signed distance functions for shape representation

Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning con- tinuous signed distance functions for shape representation. InProceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 165–174, 2019. 2

work page 2019
[45]

Scalable diffusion models with transformers

William Peebles and Saining Xie. Scalable diffusion models with transformers. InProceedings of the IEEE/CVF Inter- national Conference on Computer Vision, pages 4195–4205,

work page
[46]

Pika: Ai video creation platform, 2024

Pika. Pika: Ai video creation platform, 2024. 1, 3, 5, 7, 12

work page 2024
[47]

Promeai: Professional ai solutions, 2024

PromeAI. Promeai: Professional ai solutions, 2024. 1, 3, 5, 7, 12

work page 2024
[48]

Learning transferable visual models from natural language supervi- sion

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervi- sion. InInternational conference on machine learning, pages 8748–8763. PMLR, 2021. 1, 2, 3, 4, 5, 8, 12

work page 2021
[49]

Kilonerf: Speeding up neural radiance fields with thousands of tiny mlps

Christian Reiser, Songyou Peng, Yiyi Liao, and Andreas Geiger. Kilonerf: Speeding up neural radiance fields with thousands of tiny mlps. InProceedings of the IEEE/CVF international conference on computer vision, pages 14335– 14345, 2021. 2

work page 2021
[50]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022. 3

work page 2022
[51]

Raindropgs: A benchmark for 3d gaussian splatting under raindrop condi- tions.arXiv preprint arXiv:2510.17719, 2025

Zhiqiang Teng, Beibei Lin, Tingting Chen, Zifeng Yuan, Xu- anyi Li, Xuanyu Zhang, and Shunli Zhang. Raindropgs: A benchmark for 3d gaussian splatting under raindrop condi- tions.arXiv preprint arXiv:2510.17719, 2025. 2

work page arXiv 2025
[52]

Lighting up nerf via unsupervised decomposition and en- hancement

Haoyuan Wang, Xiaogang Xu, Ke Xu, and Rynson WH Lau. Lighting up nerf via unsupervised decomposition and en- hancement. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 12632–12641, 2023. 5, 6

work page 2023
[53]

Imaginator: Conditional spatio-temporal gan for video generation

Yaohui Wang, Piotr Bilinski, Francois Bremond, and Antitza Dantcheva. Imaginator: Conditional spatio-temporal gan for video generation. InProceedings of the IEEE/CVF Win- ter Conference on Applications of Computer Vision, pages 1160–1169, 2020. 3

work page 2020
[54]

Bilateral guided radiance field processing.ACM Trans- actions on Graphics (TOG), 43(4):1–13, 2024

Yuehao Wang, Chaoyi Wang, Bingchen Gong, and Tianfan Xue. Bilateral guided radiance field processing.ACM Trans- actions on Graphics (TOG), 43(4):1–13, 2024. 5, 6, 13

work page 2024
[55]

Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation

Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Stan Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, and Mike Zheng Shou. Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7623–7633, 2023. 3

work page 2023
[56]

Point- nerf: Point-based neural radiance fields

Qiangeng Xu, Zexiang Xu, Julien Philip, Sai Bi, Zhixin Shu, Kalyan Sunkavalli, and Ulrich Neumann. Point- nerf: Point-based neural radiance fields. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5438–5448, 2022. 2

work page 2022
[57]

Tan, Bing Zeng, and Shuaicheng Liu

Weilong Yan, Robby T. Tan, Bing Zeng, and Shuaicheng Liu. Deep homography mixture for single image rolling shutter correction. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9868–9877,

work page
[58]

Weilong Yan, Ming Li, Haipeng Li, Shuwei Shao, and Robby T. Tan. Synthetic-to-real self-supervised robust depth estimation via learning with motion and structure priors. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), pages 21880–21890, 2025. 13

work page 2025
[59]

Er-lora: Effective-rank guided adaptation for weather-generalized depth estimation.arXiv preprint arXiv:2509.00665, 2025

Weilong Yan, Xin Zhang, and Robby T Tan. Er-lora: Effective-rank guided adaptation for weather-generalized depth estimation.arXiv preprint arXiv:2509.00665, 2025. 13

work page arXiv 2025
[60]

LaS-Comp: Zero-shot 3D Completion with Latent-Spatial Consistency.arXiv preprint arXiv:2602.18735, 2026

Weilong Yan, Haipeng Li, Hao Xu, Nianjin Ye, Yihao Ai, Shuaicheng Liu, and Jingyu Hu. LaS-Comp: Zero-shot 3D Completion with Latent-Spatial Consistency.arXiv preprint arXiv:2602.18735, 2026. 13

work page arXiv 2026
[61]

Erf: A benchmark dataset for robust semantic segmentation under extreme rain- fall conditions

Xin Yang, Xin Zhang, and Xinchao Wang. Erf: A benchmark dataset for robust semantic segmentation under extreme rain- fall conditions. InProceedings of the AAAI Conference on Artificial Intelligence, pages 9301–9309, 2025. 13

work page 2025
[62]

Bakedsdf: Meshing neural sdfs for real- time view synthesis

Lior Yariv, Peter Hedman, Christian Reiser, Dor Verbin, Pratul P Srinivasan, Richard Szeliski, Jonathan T Barron, and Ben Mildenhall. Bakedsdf: Meshing neural sdfs for real- time view synthesis. InACM SIGGRAPH 2023 Conference Proceedings, pages 1–9, 2023. 2

work page 2023
[63]

Plenoctrees for real-time rendering of neural radiance fields

Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, and Angjoo Kanazawa. Plenoctrees for real-time rendering of neural radiance fields. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 5752– 5761, 2021. 2

work page 2021
[64]

Mip-splatting: Alias-free 3d gaussian splat- ting.Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-splatting: Alias-free 3d gaussian splat- ting.Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024. 1, 2, 3, 5, 6, 7, 8, 12, 13

work page 2024
[65]

Zifeng Yuan, Tingting Chen, Dewen Zhang, Yuan Gao, Wenkai Shan, Beibei Lin, and Aaron J. Danner. Tailored polarization-switchable VCSEL arrays for photonic ising computing.Applied Physics Letters, 127(22):221102, 2025. 13

work page 2025
[66]

Zifeng Yuan, Wenkai Shan, Tingting Chen, Beibei Lin, and Aaron J. Danner. Mesa orientation engineering for polariza- tion locking in VCSELs. In2025 IEEE Photonics Confer- ence (IPC), pages 1–2, Singapore, Singapore, 2025. 13

work page 2025
[67]

Zifeng Yuan, Dewen Zhang, Yuan Gao, Luo Qi, Wujie Fu, and Aaron J. Danner. Large-scale fabrication and analysis of polarization behavior in VCSELs with tailored apertures. Journal of Lightwave Technology, 43(14):6819–6827, 2025

work page 2025
[68]

Zifeng Yuan, Dewen Zhang, Hong-Lin Lin, and Aaron J. Danner. Engineering polarization switching in VCSELs with custom aperture shapes. InCLEO: Conference on Lasers and Electro-Optics, page JPS200 47. Optica Publishing Group,

work page
[69]

Zifeng Yuan, Dewen Zhang, Lei Shi, Yutong Liu, and Aaron J. Danner. Enhanced polarization locking in VCSELs. Applied Physics Letters, 126(15):151101, 2025. 13

work page 2025
[70]

Dan- ner

Dewen Zhang, Zifeng Yuan, Thanh Xuan Hoang, Wujie Fu, Ching Eng Png, Soon Thor Lim, and Aaron J. Dan- ner. All-optical scalable and programmable VCSEL-based ising annealer with parallel feedback.Optics Express, 33 (11):22119–22131, 2025. 13

work page 2025
[71]

Mamba as a bridge: Where vision foundation models meet vision language models for domain-generalized semantic segmentation

Xin Zhang and Robby T Tan. Mamba as a bridge: Where vision foundation models meet vision language models for domain-generalized semantic segmentation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 14527–14537, 2025. 13

work page 2025
[72]

Slow camera movement, static scene, no new objects

Xin Zhang, Jinheng Xie, Yuan Yuan, Michael Bi Mi, and Robby T Tan. Heap: unsupervised object discovery and localization with contrastive grouping. InProceedings of the AAAI Conference on Artificial Intelligence, pages 7323– 7331, 2024. 13 Table 3. (PSNR/SSIM) vs. Number of Training Views Method2 views 4 views 6 views 8 views 10 views MGS [64]21.0/0.64 25....

work page 2024

[1] [1]

Qwen3-VL Technical Report

Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025. 13

work page internal anchor Pith review Pith/arXiv arXiv 2025

[2] [2]

3dot: Texture transfer for 3dgs objects from a single reference image

Xiao Cao, Beibei Lin, Bo Wang, Zhiyong Huang, and Robby T Tan. 3dot: Texture transfer for 3dgs objects from a single reference image. InThe Thirty-ninth Annual Confer- ence on Neural Information Processing Systems. 2

work page

[3] [3]

Emerg- ing properties in self-supervised vision transformers

Mathilde Caron, Hugo Touvron, Ishan Misra, Herv ´e J´egou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerg- ing properties in self-supervised vision transformers. InPro- ceedings of the IEEE/CVF international conference on com- puter vision, pages 9650–9660, 2021. 1, 2, 3, 4, 5, 8, 12

work page 2021

[4] [4]

Tensorf: Tensorial radiance fields

Anpei Chen, Zexiang Xu, Andreas Geiger, Jingyi Yu, and Hao Su. Tensorf: Tensorial radiance fields. InEuropean Conference on Computer Vision, pages 333–350. Springer,

work page

[5] [5]

Dual-rain: Video rain removal using assertive and gentle teachers

Tingting Chen, Beibei Lin, Yeying Jin, Wending Yan, Wei Ye, Yuan Yuan, and Robby T Tan. Dual-rain: Video rain removal using assertive and gentle teachers. InEuropean Conference on Computer Vision, pages 127–143. Springer,

work page

[6] [6]

Auto-bench: An automated benchmark for scientific discovery in llms.arXiv preprint arXiv:2502.15224, 2025

Tingting Chen, Srinivas Anumasa, Beibei Lin, Vedant Shah, Anirudh Goyal, and Dianbo Liu. Auto-bench: An automated benchmark for scientific discovery in llms.arXiv preprint arXiv:2502.15224, 2025. 13

work page arXiv 2025

[7] [7]

Hy- pospace: Evaluating llm creativity as set-valued hypoth- esis generators under underdetermination.arXiv preprint arXiv:2510.15614, 2025

Tingting Chen, Beibei Lin, Zifeng Yuan, Qiran Zou, Hongyu He, Anirudh Goyal, Yew-Soon Ong, and Dianbo Liu. Hy- pospace: Evaluating llm creativity as set-valued hypoth- esis generators under underdetermination.arXiv preprint arXiv:2510.15614, 2025. 13

work page arXiv 2025

[8] [8]

Control-a-video: Controllable text-to-video generation with diffusion models

Weifeng Chen, Jie Wu, Pan Xie, Hefeng Wu, Jiashi Li, Xin Xia, Xuefeng Xiao, and Liang Lin. Control-a-video: Controllable text-to-video generation with diffusion models. arXiv preprint arXiv:2305.13840, 2023. 3

work page arXiv 2023

[9] [9]

Learning implicit fields for generative shape modeling

Zhiqin Chen and Hao Zhang. Learning implicit fields for generative shape modeling. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5939–5948, 2019. 2

work page 2019

[10] [10]

Neurbf: A neural fields repre- sentation with adaptive radial basis functions

Zhang Chen, Zhong Li, Liangchen Song, Lele Chen, Jingyi Yu, Junsong Yuan, and Yi Xu. Neurbf: A neural fields repre- sentation with adaptive radial basis functions. InProceed- ings of the IEEE/CVF International Conference on Com- puter Vision, pages 4182–4194, 2023. 2

work page 2023

[11] [11]

Aleth-nerf: Illumination adaptive nerf with concealing field assumption

Ziteng Cui, Lin Gu, Xiao Sun, Xianzheng Ma, Yu Qiao, and Tatsuya Harada. Aleth-nerf: Illumination adaptive nerf with concealing field assumption. InProceedings of the AAAI Conference on Artificial Intelligence, pages 1435– 1444, 2024. 5, 6

work page 2024

[12] [12]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, et al. An image is worth 16x16 words: Trans- formers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020. 5, 8

work page internal anchor Pith review Pith/arXiv arXiv 2010

[13] [13]

Weatherreasonseg: A benchmark for weather-aware reasoning segmentation in visual language models.arXiv preprint arXiv:2603.17680, 2026

Wanjun Du, Zifeng Yuan, Tingting Chen, Fucai Ke, Beibei Lin, and Shunli Zhang. Weatherreasonseg: A benchmark for weather-aware reasoning segmentation in visual language models.arXiv preprint arXiv:2603.17680, 2026. 13

work page arXiv 2026

[14] [14]

Plenoxels: Radiance fields without neural networks

Sara Fridovich-Keil, Alex Yu, Matthew Tancik, Qinhong Chen, Benjamin Recht, and Angjoo Kanazawa. Plenoxels: Radiance fields without neural networks. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5501–5510, 2022. 2

work page 2022

[15] [15]

Featup: A model- agnostic framework for features at any resolution.arXiv preprint arXiv:2403.10516, 2024

Stephanie Fu, Mark Hamilton, Laura Brandt, Axel Feldman, Zhoutong Zhang, and William T Freeman. Featup: A model- agnostic framework for features at any resolution.arXiv preprint arXiv:2403.10516, 2024. 5, 8

work page arXiv 2024

[16] [16]

Yuan Gao, Guanyu Chen, Luo Qi, Wujie Fu, Zifeng Yuan, and Aaron J. Danner. Photonic ising machines for combina- torial optimization problems.Applied Physics Reviews, 11 (4):041307, 2024. 13

work page 2024

[17] [17]

The lumigraph

Steven J Gortler, Radek Grzeszczuk, Richard Szeliski, and Michael F Cohen. The lumigraph. InSeminal Graphics Pa- pers: Pushing the Boundaries, Volume 2, pages 453–464

work page

[18] [18]

Masked autoencoders are scalable vision learners

Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Doll´ar, and Ross Girshick. Masked autoencoders are scalable vision learners. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16000– 16009, 2022. 3

work page 2022

[19] [19]

Baking neural ra- diance fields for real-time view synthesis

Peter Hedman, Pratul P Srinivasan, Ben Mildenhall, Jonathan T Barron, and Paul Debevec. Baking neural ra- diance fields for real-time view synthesis. InProceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages 5875–5884, 2021. 2

work page 2021

[20] [20]

Video dif- fusion models.Advances in Neural Information Processing Systems, 35:8633–8646, 2022

Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, and David J Fleet. Video dif- fusion models.Advances in Neural Information Processing Systems, 35:8633–8646, 2022. 3

work page 2022

[21] [21]

Enhancing visibility in nighttime haze images using guided apsf and gradient adaptive convolution

Yeying Jin, Beibei Lin, Wending Yan, Yuan Yuan, Wei Ye, and Robby T Tan. Enhancing visibility in nighttime haze images using guided apsf and gradient adaptive convolution. InProceedings of the 31st ACM international conference on multimedia, pages 2446–2457, 2023. 1, 12

work page 2023

[22] [22]

Hydra: A hy- per agent for dynamic compositional visual reasoning

Fucai Ke, Zhixi Cai, Simindokht Jahangard, Weiqing Wang, Pari Delir Haghighi, and Hamid Rezatofighi. Hydra: A hy- per agent for dynamic compositional visual reasoning. In European Conference on Computer Vision, pages 132–149. Springer, 2024. 13

work page 2024

[23] [23]

Dwim: Towards tool-aware visual reasoning via discrepancy-aware workflow generation & instruct-masking tuning

Fucai Ke, Vijay Kumar B G, Xingjian Leng, Zhixi Cai, Zaid Khan, Weiqing Wang, Pari Delir Haghighi, Hamid Rezatofighi, and Manmohan Chandraker. Dwim: Towards tool-aware visual reasoning via discrepancy-aware workflow generation & instruct-masking tuning. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 3378–3389, 2025

work page 2025

[24] [24]

Explain before you answer: A survey on compositional visual reasoning.arXiv preprint arXiv:2508.17298, 2025

Fucai Ke, Joy Hsu, Zhixi Cai, Zixian Ma, Xin Zheng, Xindi Wu, Sukai Huang, Weiqing Wang, Pari Delir Haghighi, Gholamreza Haffari, et al. Explain before you answer: A survey on compositional visual reasoning.arXiv preprint arXiv:2508.17298, 2025

work page arXiv 2025

[25] [25]

View2space: Studying multi-view visual reasoning from sparse observations.arXiv preprint arXiv:2603.16506, 2026

Fucai Ke, Zhixi Cai, Boying Li, Long Chen, Beibei Lin, Weiqing Wang, Pari Delir Haghighi, Gholamreza Haffari, and Hamid Rezatofighi. View2space: Studying multi-view visual reasoning from sparse observations.arXiv preprint arXiv:2603.16506, 2026. 13

work page arXiv 2026

[26] [26]

3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4):1–14, 2023

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4):1–14, 2023. 1, 2, 4, 5, 6, 7

work page 2023

[27] [27]

Segment any- thing

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C Berg, Wan-Yen Lo, et al. Segment any- thing. InProceedings of the IEEE/CVF International Con- ference on Computer Vision, pages 4015–4026, 2023. 3

work page 2023

[28] [28]

Tetra-nerf: Represent- ing neural radiance fields using tetrahedra

Jonas Kulhanek and Torsten Sattler. Tetra-nerf: Represent- ing neural radiance fields using tetrahedra. InProceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages 18458–18469, 2023. 2

work page 2023

[29] [29]

Compact 3d gaussian representation for radiance field.Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Joo Chan Lee, Daniel Rho, Xiangyu Sun, Jong Hwan Ko, and Eunbyung Park. Compact 3d gaussian representation for radiance field.Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024. 1, 2, 5, 6, 7

work page 2024

[30] [30]

Light field rendering

Marc Levoy and Pat Hanrahan. Light field rendering. In Seminal Graphics Papers: Pushing the Boundaries, Volume 2, pages 441–452. 2023. 2

work page 2023

[31] [31]

Nightcc: nighttime color con- stancy via adaptive channel masking

Shuwei Li and Robby T Tan. Nightcc: nighttime color con- stancy via adaptive channel masking. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 25522–25531, 2024. 13

work page 2024

[32] [32]

Bridging day and night: Target-class hallucination suppression in unpaired im- age translation

Shuwei Li, Lei Tan, and Robby T Tan. Bridging day and night: Target-class hallucination suppression in unpaired im- age translation. InProceedings of the AAAI Conference on Artificial Intelligence, pages 6424–6432, 2026. 13

work page 2026

[33] [33]

Geocomplete: Geometry-aware diffusion for reference-driven image com- pletion

Beibei Lin, Tingting Chen, and Robby T Tan. Geocomplete: Geometry-aware diffusion for reference-driven image com- pletion. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, . 3

work page

[34] [34]

Rgb-to- polarization estimation: A new task and benchmark study

Beibei Lin, Zifeng Yuan, and Tingting Chen. Rgb-to- polarization estimation: A new task and benchmark study. InThe Thirty-ninth Annual Conference on Neural Informa- tion Processing Systems Datasets and Benchmarks Track,

work page

[35] [35]

Nightrain: Nighttime video deraining via adaptive-rain-removal and adaptive-correction

Beibei Lin, Yeying Jin, Wending Yan, Wei Ye, Yuan Yuan, Shunli Zhang, and Robby T Tan. Nightrain: Nighttime video deraining via adaptive-rain-removal and adaptive-correction. InProceedings of the AAAI Conference on Artificial Intelli- gence, pages 3378–3385, 2024. 3

work page 2024

[36] [36]

Nighthaze: Nighttime image dehazing via self-prior learning

Beibei Lin, Yeying Jin, Yan Wending, Wei Ye, Yuan Yuan, and Robby T Tan. Nighthaze: Nighttime image dehazing via self-prior learning. InProceedings of the AAAI Conference on Artificial Intelligence, pages 5209–5217, 2025. 1

work page 2025

[37] [37]

Seeing beyond haze: Generative nighttime image dehazing.arXiv preprint arXiv:2503.08073, 2025

Beibei Lin, Stephen Lin, and Robby Tan. Seeing beyond haze: Generative nighttime image dehazing.arXiv preprint arXiv:2503.08073, 2025. 3

work page arXiv 2025

[38] [38]

Optical models for direct volume rendering

Nelson Max. Optical models for direct volume rendering. IEEE Transactions on Visualization and Computer Graphics, 1(2):99–108, 1995. 2

work page 1995

[39] [39]

Local and global illumination in the volume rendering integral

Nelson Max and Min Chen. Local and global illumination in the volume rendering integral. Technical report, Lawrence Livermore National Lab.(LLNL), Livermore, CA (United States), 2005. 2

work page 2005

[40] [40]

Occupancy networks: Learning 3d reconstruction in function space

Lars Mescheder, Michael Oechsle, Michael Niemeyer, Se- bastian Nowozin, and Andreas Geiger. Occupancy networks: Learning 3d reconstruction in function space. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4460–4470, 2019. 2

work page 2019

[41] [41]

Nerf: Representing scenes as neural radiance fields for view syn- thesis.Communications of the ACM, 65(1):99–106, 2021

Ben Mildenhall, Pratul P Srinivasan, Matthew Tancik, Jonathan T Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view syn- thesis.Communications of the ACM, 65(1):99–106, 2021. 2

work page 2021

[42] [42]

Nerf in the dark: High dynamic range view synthesis from noisy raw images

Ben Mildenhall, Peter Hedman, Ricardo Martin-Brualla, Pratul P Srinivasan, and Jonathan T Barron. Nerf in the dark: High dynamic range view synthesis from noisy raw images. InProceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 16190–16199, 2022. 5, 6, 13

work page 2022

[43] [43]

Instant neural graphics primitives with a mul- tiresolution hash encoding.ACM transactions on graphics (TOG), 41(4):1–15, 2022

Thomas M ¨uller, Alex Evans, Christoph Schied, and Alexan- der Keller. Instant neural graphics primitives with a mul- tiresolution hash encoding.ACM transactions on graphics (TOG), 41(4):1–15, 2022. 2

work page 2022

[44] [44]

Deepsdf: Learning con- tinuous signed distance functions for shape representation

Jeong Joon Park, Peter Florence, Julian Straub, Richard Newcombe, and Steven Lovegrove. Deepsdf: Learning con- tinuous signed distance functions for shape representation. InProceedings of the IEEE/CVF conference on computer vi- sion and pattern recognition, pages 165–174, 2019. 2

work page 2019

[45] [45]

Scalable diffusion models with transformers

William Peebles and Saining Xie. Scalable diffusion models with transformers. InProceedings of the IEEE/CVF Inter- national Conference on Computer Vision, pages 4195–4205,

work page

[46] [46]

Pika: Ai video creation platform, 2024

Pika. Pika: Ai video creation platform, 2024. 1, 3, 5, 7, 12

work page 2024

[47] [47]

Promeai: Professional ai solutions, 2024

PromeAI. Promeai: Professional ai solutions, 2024. 1, 3, 5, 7, 12

work page 2024

[48] [48]

Learning transferable visual models from natural language supervi- sion

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervi- sion. InInternational conference on machine learning, pages 8748–8763. PMLR, 2021. 1, 2, 3, 4, 5, 8, 12

work page 2021

[49] [49]

Kilonerf: Speeding up neural radiance fields with thousands of tiny mlps

Christian Reiser, Songyou Peng, Yiyi Liao, and Andreas Geiger. Kilonerf: Speeding up neural radiance fields with thousands of tiny mlps. InProceedings of the IEEE/CVF international conference on computer vision, pages 14335– 14345, 2021. 2

work page 2021

[50] [50]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022. 3

work page 2022

[51] [51]

Raindropgs: A benchmark for 3d gaussian splatting under raindrop condi- tions.arXiv preprint arXiv:2510.17719, 2025

Zhiqiang Teng, Beibei Lin, Tingting Chen, Zifeng Yuan, Xu- anyi Li, Xuanyu Zhang, and Shunli Zhang. Raindropgs: A benchmark for 3d gaussian splatting under raindrop condi- tions.arXiv preprint arXiv:2510.17719, 2025. 2

work page arXiv 2025

[52] [52]

Lighting up nerf via unsupervised decomposition and en- hancement

Haoyuan Wang, Xiaogang Xu, Ke Xu, and Rynson WH Lau. Lighting up nerf via unsupervised decomposition and en- hancement. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 12632–12641, 2023. 5, 6

work page 2023

[53] [53]

Imaginator: Conditional spatio-temporal gan for video generation

Yaohui Wang, Piotr Bilinski, Francois Bremond, and Antitza Dantcheva. Imaginator: Conditional spatio-temporal gan for video generation. InProceedings of the IEEE/CVF Win- ter Conference on Applications of Computer Vision, pages 1160–1169, 2020. 3

work page 2020

[54] [54]

Bilateral guided radiance field processing.ACM Trans- actions on Graphics (TOG), 43(4):1–13, 2024

Yuehao Wang, Chaoyi Wang, Bingchen Gong, and Tianfan Xue. Bilateral guided radiance field processing.ACM Trans- actions on Graphics (TOG), 43(4):1–13, 2024. 5, 6, 13

work page 2024

[55] [55]

Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation

Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Stan Weixian Lei, Yuchao Gu, Yufei Shi, Wynne Hsu, Ying Shan, Xiaohu Qie, and Mike Zheng Shou. Tune-a-video: One-shot tuning of image diffusion models for text-to-video generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7623–7633, 2023. 3

work page 2023

[56] [56]

Point- nerf: Point-based neural radiance fields

Qiangeng Xu, Zexiang Xu, Julien Philip, Sai Bi, Zhixin Shu, Kalyan Sunkavalli, and Ulrich Neumann. Point- nerf: Point-based neural radiance fields. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5438–5448, 2022. 2

work page 2022

[57] [57]

Tan, Bing Zeng, and Shuaicheng Liu

Weilong Yan, Robby T. Tan, Bing Zeng, and Shuaicheng Liu. Deep homography mixture for single image rolling shutter correction. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 9868–9877,

work page

[58] [58]

Weilong Yan, Ming Li, Haipeng Li, Shuwei Shao, and Robby T. Tan. Synthetic-to-real self-supervised robust depth estimation via learning with motion and structure priors. In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), pages 21880–21890, 2025. 13

work page 2025

[59] [59]

Er-lora: Effective-rank guided adaptation for weather-generalized depth estimation.arXiv preprint arXiv:2509.00665, 2025

Weilong Yan, Xin Zhang, and Robby T Tan. Er-lora: Effective-rank guided adaptation for weather-generalized depth estimation.arXiv preprint arXiv:2509.00665, 2025. 13

work page arXiv 2025

[60] [60]

LaS-Comp: Zero-shot 3D Completion with Latent-Spatial Consistency.arXiv preprint arXiv:2602.18735, 2026

Weilong Yan, Haipeng Li, Hao Xu, Nianjin Ye, Yihao Ai, Shuaicheng Liu, and Jingyu Hu. LaS-Comp: Zero-shot 3D Completion with Latent-Spatial Consistency.arXiv preprint arXiv:2602.18735, 2026. 13

work page arXiv 2026

[61] [61]

Erf: A benchmark dataset for robust semantic segmentation under extreme rain- fall conditions

Xin Yang, Xin Zhang, and Xinchao Wang. Erf: A benchmark dataset for robust semantic segmentation under extreme rain- fall conditions. InProceedings of the AAAI Conference on Artificial Intelligence, pages 9301–9309, 2025. 13

work page 2025

[62] [62]

Bakedsdf: Meshing neural sdfs for real- time view synthesis

Lior Yariv, Peter Hedman, Christian Reiser, Dor Verbin, Pratul P Srinivasan, Richard Szeliski, Jonathan T Barron, and Ben Mildenhall. Bakedsdf: Meshing neural sdfs for real- time view synthesis. InACM SIGGRAPH 2023 Conference Proceedings, pages 1–9, 2023. 2

work page 2023

[63] [63]

Plenoctrees for real-time rendering of neural radiance fields

Alex Yu, Ruilong Li, Matthew Tancik, Hao Li, Ren Ng, and Angjoo Kanazawa. Plenoctrees for real-time rendering of neural radiance fields. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 5752– 5761, 2021. 2

work page 2021

[64] [64]

Mip-splatting: Alias-free 3d gaussian splat- ting.Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Zehao Yu, Anpei Chen, Binbin Huang, Torsten Sattler, and Andreas Geiger. Mip-splatting: Alias-free 3d gaussian splat- ting.Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024. 1, 2, 3, 5, 6, 7, 8, 12, 13

work page 2024

[65] [65]

Zifeng Yuan, Tingting Chen, Dewen Zhang, Yuan Gao, Wenkai Shan, Beibei Lin, and Aaron J. Danner. Tailored polarization-switchable VCSEL arrays for photonic ising computing.Applied Physics Letters, 127(22):221102, 2025. 13

work page 2025

[66] [66]

Zifeng Yuan, Wenkai Shan, Tingting Chen, Beibei Lin, and Aaron J. Danner. Mesa orientation engineering for polariza- tion locking in VCSELs. In2025 IEEE Photonics Confer- ence (IPC), pages 1–2, Singapore, Singapore, 2025. 13

work page 2025

[67] [67]

Zifeng Yuan, Dewen Zhang, Yuan Gao, Luo Qi, Wujie Fu, and Aaron J. Danner. Large-scale fabrication and analysis of polarization behavior in VCSELs with tailored apertures. Journal of Lightwave Technology, 43(14):6819–6827, 2025

work page 2025

[68] [68]

Zifeng Yuan, Dewen Zhang, Hong-Lin Lin, and Aaron J. Danner. Engineering polarization switching in VCSELs with custom aperture shapes. InCLEO: Conference on Lasers and Electro-Optics, page JPS200 47. Optica Publishing Group,

work page

[69] [69]

Zifeng Yuan, Dewen Zhang, Lei Shi, Yutong Liu, and Aaron J. Danner. Enhanced polarization locking in VCSELs. Applied Physics Letters, 126(15):151101, 2025. 13

work page 2025

[70] [70]

Dan- ner

Dewen Zhang, Zifeng Yuan, Thanh Xuan Hoang, Wujie Fu, Ching Eng Png, Soon Thor Lim, and Aaron J. Dan- ner. All-optical scalable and programmable VCSEL-based ising annealer with parallel feedback.Optics Express, 33 (11):22119–22131, 2025. 13

work page 2025

[71] [71]

Mamba as a bridge: Where vision foundation models meet vision language models for domain-generalized semantic segmentation

Xin Zhang and Robby T Tan. Mamba as a bridge: Where vision foundation models meet vision language models for domain-generalized semantic segmentation. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 14527–14537, 2025. 13

work page 2025

[72] [72]

Slow camera movement, static scene, no new objects

Xin Zhang, Jinheng Xie, Yuan Yuan, Michael Bi Mi, and Robby T Tan. Heap: unsupervised object discovery and localization with contrastive grouping. InProceedings of the AAAI Conference on Artificial Intelligence, pages 7323– 7331, 2024. 13 Table 3. (PSNR/SSIM) vs. Number of Training Views Method2 views 4 views 6 views 8 views 10 views MGS [64]21.0/0.64 25....

work page 2024