Learning Spectral and Polarimetric Clues for One-to-Multimodal Novel View Synthesis

Federico Lincetto; Gianluca Agresti; Mattia Rossi; Piergiorgio Sartor; Pietro Zanuttigh

arxiv: 2607.02372 · v1 · pith:2AAF7Y3Snew · submitted 2026-07-02 · 💻 cs.CV

Pith reviewed 2026-07-03 15:18 UTC · model grok-4.3

classification 💻 cs.CV

keywords: neural rendering, novel view synthesis, multimodal, infrared, polarimetric, multispectral, implicit representation

The pith

Pre-training on multimodal scenes enables RGB-only fine-tuning to render infrared, polarimetric, and multispectral views.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method to synthesize novel views in non-RGB modalities without needing to capture those modalities for each new scene. It does this by first pre-training a neural renderer on scenes that supply multiple types of images so that the model learns how the modalities relate to one another. Those learned relations then allow the model, when adapted to a fresh scene using only its RGB photographs, to output consistent images in the other modalities as well. A reader would care because this removes the requirement for costly specialized cameras on every target scene.

Core claim

SPoILeR performs a multimodal pre-training phase in which the model learns the mutual correlation between different modalities. This correlation knowledge then supports accurate prediction of unconventional modalities during fine-tuning that is supervised solely by RGB images, yielding multi-view consistent renderings of infrared, polarimetric, and multispectral frames even when no samples from those sensors are available for the scene.
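The two-stage recipe can be illustrated with a deliberately tiny sketch. It assumes a linear toy world rather than the paper's network: each scene point has a hidden scalar latent, and RGB and NIR intensities are fixed multiples of it. The function names (`pretrain`, `finetune_rgb_only`, `render_nir`) and all numbers are hypothetical.

```python
# Toy stand-in for "multimodal pre-training, then RGB-only fine-tuning".
# Linear, scalar and noise-free on purpose; this is NOT the paper's architecture.

def pretrain(rgb, nir, iters=5):
    """Alternating least squares for rgb_i ~ a * z_i and nir_i ~ b * z_i.

    Returns the shared decoder gains (a, b) learned from a multimodal scene.
    """
    z = [1.0] * len(rgb)
    a = b = 1.0
    for _ in range(iters):
        zz = sum(v * v for v in z)
        a = sum(r * v for r, v in zip(rgb, z)) / zz   # best RGB gain for current latents
        b = sum(n * v for n, v in zip(nir, z)) / zz   # best NIR gain for current latents
        z = [(a * r + b * n) / (a * a + b * b) for r, n in zip(rgb, nir)]
    return a, b

def finetune_rgb_only(rgb, a):
    """Fit the latents of a new scene from RGB alone, with the RGB gain frozen."""
    return [r / a for r in rgb]

def render_nir(z, b):
    """Decode the never-observed modality with the frozen NIR gain."""
    return [b * v for v in z]

# Hypothetical ground-truth gains and latents, used only to generate data.
A_TRUE, B_TRUE = 2.0, 0.5
z_pre = [0.2, 0.7, 1.1, 1.6]
a, b = pretrain([A_TRUE * v for v in z_pre], [B_TRUE * v for v in z_pre])

z_new_true = [0.4, 0.9, 1.3]
rgb_new = [A_TRUE * v for v in z_new_true]          # the only supervision for the new scene
nir_pred = render_nir(finetune_rgb_only(rgb_new, a), b)
nir_true = [B_TRUE * v for v in z_new_true]
print(nir_pred, nir_true)
```

In this toy the latent scale is ambiguous, but the ratio between the two gains is not, which is why NIR can be recovered from RGB alone. Whether real cross-modal relations are this stable across scenes is exactly the premise the review questions.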

What carries the argument

The Spectral and Polarimetric Implicit Learned Representation (SPoILeR), which encodes learned correlations across imaging modalities to enable transfer from pre-training to RGB-supervised fine-tuning.

If this is right

The approach produces accurate renderings of infrared, polarimetric, and multispectral data without any input samples from those sensors.
Renderings remain multi-view consistent across the new scene.
Fine-tuning requires supervision from RGB frames only.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If modality correlations prove stable across scenes, the method could be applied to additional imaging types such as thermal or depth data.
Applications in fields that use multiple sensors might reduce their dependence on full multimodal capture setups.

Load-bearing premise

Correlations between modalities discovered during pre-training on some scenes will transfer reliably to new scenes that provide only RGB supervision.
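One way RGB can constrain the other modalities during fine-tuning is the modality-to-luma consistency described in the Figure 6 caption. The sketch below is only an illustration of that idea: the Rec. 709 luma weights, the dictionary layout, and the `to_luma` callables standing in for the module M are assumptions, not details taken from the paper.

```python
import random

# Assumed luma weights (Rec. 709); the paper's definition of luminance may differ.
REC709 = (0.2126, 0.7152, 0.0722)

def luma(rgb):
    """Weighted sum of the R, G, B channels of one pixel."""
    return sum(w * c for w, c in zip(REC709, rgb))

def luma_consistency_loss(rgb_pixels, modality_pixels, to_luma, rng=random):
    """Mean L1 gap between RGB luma and the luma predicted from one random modality.

    modality_pixels: dict mapping a modality name to per-pixel scalar values.
    to_luma: dict mapping a modality name to a callable (hypothetical stand-in for M).
    """
    name = rng.choice(sorted(modality_pixels))
    predict = to_luma[name]
    gaps = [abs(luma(rgb) - predict(v))
            for rgb, v in zip(rgb_pixels, modality_pixels[name])]
    return name, sum(gaps) / len(gaps)

# Hypothetical data: a grey ramp whose NIR response is half its luma.
rgb_pixels = [(0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]
modality_pixels = {"nir": [0.25, 0.5]}
to_luma = {"nir": lambda v: 2.0 * v}

chosen, loss = luma_consistency_loss(rgb_pixels, modality_pixels, to_luma,
                                     rng=random.Random(0))
print(chosen, loss)
```

A term like this only ties each modality to brightness, so it cannot by itself pin down spectral or polarimetric detail; the pre-trained correlations have to supply the rest.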

What would settle it

Acquire real infrared or polarimetric images of a previously unseen scene and measure whether the model's RGB-only renderings match those ground-truth captures within expected error bounds; systematic mismatch would falsify the transfer claim.
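A check of this kind could be scored with PSNR restricted to foreground pixels, which the Figure 7 excerpt names as the paper's main metric. The helper below is a minimal sketch that assumes flat lists of intensities in [0, 1] and a binary mask; `masked_psnr` and the sample values are hypothetical.

```python
import math

def masked_psnr(pred, target, mask, peak=1.0):
    """PSNR in dB computed only over pixels where mask is truthy."""
    errs = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    if not errs:
        raise ValueError("mask selects no pixels")
    mse = sum(errs) / len(errs)
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical rendered vs. captured NIR values; the last pixel is background.
rendered = [0.5, 0.5, 0.9]
captured = [0.4, 0.6, 0.0]
foreground = [1, 1, 0]
score = masked_psnr(rendered, captured, foreground)
print(round(score, 3))
```

Here the two foreground pixels are each off by 0.1, so the score is about 20 dB; the large background error is ignored by the mask.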

Figures

Figures reproduced from arXiv: 2607.02372 by Federico Lincetto, Gianluca Agresti, Mattia Rossi, Piergiorgio Sartor, Pietro Zanuttigh.

**Figure 1.** The proposed approach first learns the correlation across multiple modalities on a collection of scenes. Then, a single-scene fine-tuning on RGB data alone produces a model able to render views of arbitrary modalities for the considered scene. … tasks or enhance result quality. For example, there exist Multispectral (MS) sensors that are sensitive to different bands of visible light, Near-Infrared (NIR) se…

**Figure 2.** Radiance module architecture scheme: features from basis and coefficients are combined and decoded into different radiance modalities.

**Figure 5.** Scheme of the inverse function module I. It predicts the estimated latent vector $\hat{z}_e$ from multimodal radiance values $\{\hat{m}_1, \dots, \hat{m}_k\}$. Latent Space Geometry Loss: This regularization loss aims to promote the latent space explainability. Considering that the multimodal supervision is strongly limited in the FT, it is reasonable to observe the model estimating suboptimal latents. We observed that the dec…

**Figure 6.** Scheme of the modality-to-luma module M. It encourages consistency between the reference RGB luminance $g$ and the estimated luminance $g_e$ predicted from one random modality.

**Figure 7.** Qualitative samples from the “Fruits” scene FT performed with only RGB data. … MultimodalStudio [42], we follow the same strategy of training our model with MMS-DATA raw mosaicked frames and computing the metrics only on masked foreground regions. To evaluate the quality of all rendering results, we use Peak Signal-to-Noise Ratio (PSNR) as the main metric. In addition, we also show Structural Similarity Index…

**Figure 8.** Tests with an unbalanced combination of modalities. The X axis refers to the number of additional modality (MS, Pol, or NIR) frames. Comparison between SPoILeR (Ours) and MMS-FW. 4.4 Unbalanced Combination of Modalities: In this section, we investigate the results achieved by MMS-FW when trained with an unbalanced combination of modalities. This test is introduced by [42] and involves scenarios where many fram…

**Figure 9.** MS, Pol, and NIR renderings with different loss ablations. … DoP in terms of mean angle error (MAngE) and mean absolute error (MAbsE). In Tab. 3 we compare the results achieved by SPoILeR fine-tuning with those of MMS-FW using PolarAnything data. Our model outperforms the competitor in terms of both MAngE and MAbsE, by 14.08° and 0.044, respectively. In this case, PolarAnything cannot generate multi-view con…

**Figure 10.** Tests with an unbalanced combination of modalities. The X axis corresponds to the number of additional modality (MS or NIR) frames. Comparison between SPoILeR (Ours) and MMS-FW. Results averaged on all 16 X-NeRF scenes. … procedure cannot benefit from the multi-view consistency enforced by NeRF-like models during training, as the modality conversion is performed as a final step on each rendering independent…

**Figure 11.** Qualitative renderings of the “Teddybear” scene from the FT step supervised with only RGB data. All frames are mosaicked, except RGB frames. RGB is demosaicked only for visualization purposes.

**Figure 12.** Qualitative renderings of the “Toys” scene from the FT step supervised with only RGB data. All frames are mosaicked, except RGB frames. RGB is demosaicked only for visualization purposes.

**Figure 13.** Qualitative renderings of the “Birdhouse” scene from the FT step supervised with only RGB data. All frames are mosaicked, except RGB frames. RGB is demosaicked only for visualization purposes.

**Figure 14.** Qualitative renderings of the “Bouquet” scene from the FT step supervised with only RGB data. All frames are mosaicked, except RGB frames. RGB is demosaicked only for visualization purposes.

**Figure 15.** Qualitative comparison between SPoILeR and MMS-FW + MST++ in terms of recovered multispectral radiance on the “Bouquet” scene. All frames are mosaicked. “before” and “after” refer to whether the RGB-to-MS conversion is performed before or after the training of MMS-FW.

**Figure 16.** Qualitative comparison between SPoILeR and MMS-FW + MST++ in terms of recovered multispectral radiance on the “Teddybear” scene. All frames are mosaicked. “before” and “after” refer to whether the RGB-to-MS conversion is performed before or after the training of MMS-FW.

**Figure 17.** Qualitative comparison between SPoILeR and MMS-FW + MST++ in terms of recovered multispectral radiance on the “Toys” scene. All frames are mosaicked. “before” and “after” refer to whether the RGB-to-MS conversion is performed before or after the training of MMS-FW.

**Figure 18.** Qualitative comparison between SPoILeR and MMS-FW + MST++ in terms of recovered multispectral radiance on the “Birdhouse” scene. All frames are mosaicked. “before” and “after” refer to whether the RGB-to-MS conversion is performed before or after the training of MMS-FW.

**Figure 19.** Qualitative comparison between SPoILeR and MMS-FW + MST++ in terms of recovered multispectral radiance on the “Bouquet” scene. All frames are mosaicked. “before” and “after” refer to whether the RGB-to-MS conversion is performed before or after the training of MMS-FW.

**Figure 20.** Qualitative comparison between SPoILeR and MMS-FW + PolarAnything (PA) in terms of recovered polarization on the “Fruits” scene. “before” and “after” refer to whether the RGB-to-Pol conversion is performed before or after the training of MMS-FW.

**Figure 21.** Qualitative comparison between SPoILeR and MMS-FW + PolarAnything (PA) in terms of recovered polarization on the “Teddybear” scene. “before” and “after” refer to whether the RGB-to-Pol conversion is performed before or after the training of MMS-FW.

**Figure 22.** Qualitative comparison between SPoILeR and MMS-FW + PolarAnything (PA) in terms of recovered polarization on the “Toys” scene. “before” and “after” refer to whether the RGB-to-Pol conversion is performed before or after the training of MMS-FW.

**Figure 23.** Qualitative comparison between SPoILeR and MMS-FW + PolarAnything (PA) in terms of recovered polarization on the “Birdhouse” scene. “before” and “after” refer to whether the RGB-to-Pol conversion is performed before or after the training of MMS-FW.

**Figure 24.** Qualitative comparison between SPoILeR and MMS-FW + PolarAnything (PA) in terms of recovered polarization on the “Bouquet” scene. “before” and “after” refer to whether the RGB-to-Pol conversion is performed before or after the training of MMS-FW.

**Figure 25.** Qualitative renderings of the “Chess” scene from the pre-training step with all-modality supervision. All frames are mosaicked, except RGB frames. RGB is demosaicked only for visualization purposes.

**Figure 26.** Qualitative renderings of the “Forestgang 1” scene from the pre-training step with all-modality supervision. All frames are mosaicked, except RGB frames. RGB is demosaicked only for visualization purposes.

**Figure 27.** Qualitative renderings of the “Laurelwreath” scene from the pre-training step with all-modality supervision. All frames are mosaicked, except RGB frames. RGB is demosaicked only for visualization purposes.

**Figure 28.** Qualitative renderings of the “Truck” scene from the pre-training step with all-modality supervision. All frames are mosaicked, except RGB frames. RGB is demosaicked only for visualization purposes.

**Figure 29.** Qualitative renderings of the “Aloe” scene from the pre-training step with all-modality supervision. All frames are mosaicked, except RGB frames. RGB is demosaicked only for visualization purposes.
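The polarization comparisons in Figures 20 to 24 are scored with angle and magnitude errors (MAngE and MAbsE in the Figure 9 excerpt). The sketch below shows one common way to compute such errors from four polarizer-angle intensities. It assumes ideal linear polarizers at 0°, 45°, 90° and 135°, and it assumes the angle error is taken on the angle of linear polarization and the absolute error on the degree of linear polarization; the truncated excerpt does not confirm the paper's exact definitions, and all function names are hypothetical.

```python
import math

def stokes_from_intensities(i0, i45, i90, i135):
    """Linear Stokes components from four polarizer-angle intensities."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)
    s1 = i0 - i90
    s2 = i45 - i135
    return s0, s1, s2

def dolp(s0, s1, s2):
    """Degree of linear polarization, in [0, 1] for physical inputs."""
    return math.hypot(s1, s2) / s0 if s0 > 0 else 0.0

def aolp(s1, s2):
    """Angle of linear polarization in radians."""
    return 0.5 * math.atan2(s2, s1)

def angle_error_deg(a, b):
    """Smallest gap between two polarization angles, which repeat every pi."""
    d = abs(a - b) % math.pi
    return math.degrees(min(d, math.pi - d))

def mean_errors(pred, gt):
    """Mean angle error (degrees) and mean absolute DoLP error over pixel lists."""
    ang = [angle_error_deg(aolp(p[1], p[2]), aolp(g[1], g[2])) for p, g in zip(pred, gt)]
    mag = [abs(dolp(*p) - dolp(*g)) for p, g in zip(pred, gt)]
    return sum(ang) / len(ang), sum(mag) / len(mag)

# Hypothetical pixel: fully polarized light at 0 degrees vs. at 45 degrees.
pred = [stokes_from_intensities(1.0, 0.5, 0.0, 0.5)]
gt = [stokes_from_intensities(0.5, 1.0, 0.5, 0.0)]
mange, mabse = mean_errors(pred, gt)
print(mange, mabse)
```

The wrap-around in `angle_error_deg` matters because a polarization angle of 0° and one of 180° describe the same state, so a naive difference would overstate the error.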

Original abstract

Neural rendering techniques allow for accurate reconstruction of the geometry and color appearance of 3D scenes. Some methods have extended their use to additional imaging modalities, such as multispectral, infrared, or polarimetric data. However, all of these approaches require expensive sensors and calibrated setups to capture new multimodal frames for each new scene. We propose Spectral and Polarimetric Implicit Learned Representation (SPoILeR), a novel method to obtain multi-view consistent renderings of unconventional modalities for scenes where either only RGB frames or very few of the additional modalities are available. Thanks to a multimodal pre-training phase, the model learns the mutual correlation between different modalities. This step allows predicting accurate renderings of unconventional modalities during a fine-tuning phase supervised only by RGB images. Experimental results show that the approach can accurately render infrared, polarimetric, and multispectral frames for scenes where no input sample captured by these types of sensors is provided.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The pre-training plus RGB-only fine-tuning idea targets a real hardware cost problem, but the transfer of cross-modal correlations to new scenes without any target data remains the unproven step.

The paper's main move is to pre-train a neural renderer on scenes that have RGB plus IR, polarimetric, and multispectral captures so the model picks up statistical links between modalities, then fine-tune the same model on a new scene using only RGB supervision to render the other modalities. That pipeline is presented as new relative to existing neural rendering work that usually needs full multimodal captures per scene.

It does address a practical bottleneck: collecting calibrated multispectral or polarimetric data for every new environment is expensive, so anything that reduces that requirement has clear use in robotics and vision applications.

The weakest part is the generalization claim. The abstract and stress-test note both rest on the idea that correlations learned during pre-training will hold for entirely different scenes when no samples from the target modalities are available at all. Scene-specific factors like surface materials, geometry, and illumination drive those relationships, and nothing shown so far demonstrates that the pre-training distribution covers the test cases or that the mapping stays accurate without any anchor data from the new scene. Without numbers, baselines, or ablation on scene mismatch, it is hard to judge whether the results are robust or mostly interpolation within similar environments.

This is the kind of work that would interest people already running neural radiance fields or implicit representations and looking to extend them to non-RGB sensors. It is worth sending to review so the experiments and any quantitative validation can be checked directly; the idea is straightforward enough that a referee can assess the transfer assumption quickly once the full results are on the table.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces SPoILeR, a neural implicit representation method for one-to-multimodal novel view synthesis. It performs multimodal pre-training across RGB, infrared, polarimetric, and multispectral data to learn cross-modal correlations, followed by fine-tuning on RGB-only supervision for new scenes. This enables rendering of the unconventional modalities without any target-modality input samples for those scenes. The central claim is that the pre-trained correlations transfer sufficiently to produce accurate multi-view consistent renderings of IR, polarimetric, and multispectral frames.

Significance. If the transfer of pre-trained modality correlations holds for novel scenes, the approach would meaningfully lower the barrier to multimodal 3D reconstruction by eliminating the need for expensive calibrated sensors on every target scene. The paper reports experimental results demonstrating accurate renderings under RGB-only fine-tuning, which would be a practical advance if the generalization is robust.

major comments (1)

[Abstract and Experimental Results section] The load-bearing assumption that cross-modal correlations learned during pre-training are sufficiently scene-independent to enable accurate rendering of non-RGB modalities with zero multimodal supervision on entirely new scenes is not adequately secured by the reported experiments. The abstract and described pipeline provide no evidence (e.g., cross-dataset testing or ablation on material/illumination variation) that the pre-training distribution covers the statistics of the test scenes.

minor comments (1)

[Abstract] The abstract supplies no quantitative metrics, error analysis, or baseline comparisons, making it difficult to assess the strength of the reported results.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and the opportunity to clarify the generalization properties of SPoILeR. Below we address the major comment point by point.

Referee: [Abstract and Experimental Results section] The load-bearing assumption that cross-modal correlations learned during pre-training are sufficiently scene-independent to enable accurate rendering of non-RGB modalities with zero multimodal supervision on entirely new scenes is not adequately secured by the reported experiments. The abstract and described pipeline provide no evidence (e.g., cross-dataset testing or ablation on material/illumination variation) that the pre-training distribution covers the statistics of the test scenes.

Authors: We respectfully disagree that the reported experiments fail to secure the central assumption. The pre-training corpus comprises multiple distinct scenes spanning varied materials, surface properties, and illumination conditions across all four modalities. Fine-tuning and quantitative evaluation are performed exclusively on held-out scenes that were never observed during pre-training; these test scenes were captured under different viewpoints, lighting, and material configurations from the pre-training set. The consistent accuracy of the rendered IR, polarimetric, and multispectral outputs on these unseen scenes constitutes direct empirical evidence that the learned cross-modal correlations transfer beyond the exact training scenes. While we do not conduct an explicit cross-dataset evaluation (owing to the scarcity of publicly available calibrated multimodal 3D datasets), the intra-dataset scene diversity and the zero-shot transfer results already address the core concern of scene independence. We are prepared to expand the experimental section with additional qualitative examples highlighting material and illumination variation if the editor deems it necessary.

Revision: no

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper presents a standard two-stage neural rendering pipeline: multimodal pre-training to capture cross-modal correlations, followed by RGB-supervised fine-tuning on novel scenes. No equations, parameter-fitting steps, or self-citations are described that would make any claimed prediction equivalent to its inputs by construction. The transfer of learned correlations to unseen scenes is an empirical claim resting on the pre-training distribution, not a definitional or fitted tautology. The method is therefore self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Based solely on the abstract, the central claim rests on the domain assumption that modality correlations are stable across scenes and can be learned from pre-training data.

axioms (1)

domain assumption: Multimodal correlations learned on pre-training scenes generalize to new scenes without any multimodal input
This transfer is required for the fine-tuning phase to produce accurate unconventional modality renderings from RGB alone.

invented entities (1)

SPoILeR model (no independent evidence)
purpose: Implicit representation that encodes cross-modality correlations for novel view synthesis
New architecture introduced to perform the described pre-training and fine-tuning

pith-pipeline@v0.9.1-grok · 5704 in / 1270 out tokens · 20902 ms · 2026-07-03T15:18:30.231293+00:00 · methodology

Reference graph

Works this paper leans on

80 extracted references · 61 canonical work pages

[1] Arad, B., Timofte, R., Yahel, R., Morag, N., Bernat, A., Cai, Y., Lin, J., Lin, Z., Wang, H., Zhang, Y., et al.: Ntire 2022 spectral recovery challenge and data set. In: IEEE Conference on Computer Vision and Pattern Recognition Workshop. pp. 862–880. IEEE (2022). https://doi.org/10.1109/CVPRW56347.2022.00102
[3] Bachmann, R., Kar, O.F., Mizrahi, D., Garjani, A., Gao, M., Griffiths, D., Hu, J., Dehghan, A., Zamir, A.: 4m-21: An any-to-any vision model for tens of tasks and modalities. In: Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., Zhang, C. (eds.) Advances in Neural Information Processing Systems. vol. 37, pp. 61872–61911. Curran As... (2024)
[4] Bachmann, R., Mizrahi, D., Atanov, A., Zamir, A.: Multimae: Multi-modal multi-task masked autoencoders. In: European Conference on Computer Vision. pp. 348–. Springer (2022). https://doi.org/10.1007/978-3-031-19836-6_20
[6] Boss, M., Braun, R., Jampani, V., Barron, J.T., Liu, C., Lensch, H.: Nerd: Neural reflectance decomposition from image collections. In: International Conference on Computer Vision. pp. 12684–12694 (2021). https://doi.org/10.1109/ICCV48922.2021.01245
[7] Cai, Y., Lin, J., Lin, Z., Wang, H., Zhang, Y., Pfister, H., Timofte, R., Van Gool, L.: Mst++: Multi-stage spectral-wise transformer for efficient spectral reconstruction. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 745–755 (2022). https://doi.org/10.1109/CVPRW56347.2022.00090
[8] Camuffo, E., Barbato, F., Ozay, M., Milani, S., Michieli, U.: Mocha: Multi-modal objects-aware cross-architecture alignment. In: European Conference on Computer Vision (2026)
[9] Charatan, D., Li, S.L., Tagliasacchi, A., Sitzmann, V.: pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 19457–19467 (2024). https://doi.org/10.1109/CVPR52733.2024.01840
[10] Chen, A., Xu, Z., Wei, X., Tang, S., Su, H., Geiger, A.: Dictionary fields: Learning a neural basis decomposition. ACM Transactions on Graphics (2023). https://doi.org/10.1145/3592135
[11] Chen, Q., Shu, S., Bai, X.: Thermal3d-gs: Physics-induced 3d gaussians for thermal infrared novel-view synthesis. In: European Conference on Computer Vision. pp. 253–269. Springer (2024). https://doi.org/10.1007/978-3-031-73383-3_15
[13] Chou, Z.T., Huang, S.Y., Liu, I., Wang, Y.C.F., et al.: Gsnerf: Generalizable semantic neural radiance fields with enhanced 3d scene understanding. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 20806–20815 (2024). https://doi.org/10.1109/CVPR52733.2024.01966
[15] Darling, B.A., Ferwerda, J.A., Berns, R.S., Chen, T.: Real-time multispectral rendering with complex illumination. In: Color Imaging Conference. vol. 19, pp. 345–. Society of Imaging Science and Technology (2011). https://doi.org/10.1145/3721250.3743035
[17] Dave, A., Zhao, Y., Veeraraghavan, A.: Pandora: Polarization-aided neural decomposition of radiance. In: European Conference on Computer Vision. pp. 538–556. Springer (2022). https://doi.org/10.1007/978-3-031-20071-7_32
[18] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)
[19] Gibson, S.F.: Using distance maps for accurate surface representation in sampled volumes. In: IEEE Symposium on Volume Visualization and Graphics. pp. 23–30 (1998)
[20] Griffiths, R., Dansereau, D.G.: RoRE: Rotary ray embedding for generalised multi-modal scene understanding. In: International Conference on Learning Representations (2026). https://openreview.net/forum?id=BR2ItBcqOo
[21] Gropp, A., Yariv, L., Haim, N., Atzmon, M., Lipman, Y.: Implicit geometric regularization for learning shapes. In: International Conference on Machine Learning (2020). https://dl.acm.org/doi/abs/10.5555/3524938.3525293
[22] Großmann, W., Horn, H., Niggemann, O.: Improving remote material classification ability with thermal imagery. Scientific Reports 12(1), 17288 (2022). https://doi.org/10.1038/s41598-022-21588-4
[23] Guo, H., Liu, H., Wen, J., Li, J.: Cross-spectral gaussian splatting with spatial occupancy consistency. In: AAAI Conference on Artificial Intelligence. vol. 39, pp. 3229–3237 (2025). https://doi.org/10.1609/aaai.v39i3.32333
[24] Han, Y., Tie, B., Guo, H., Lyu, Y., Li, S., Shi, B., Jia, Y., Ma, Z.: Polgs: Polarimetric gaussian splatting for fast reflective surface reconstruction. In: International Conference on Computer Vision. pp. 28073–28082 (2025)
[25] Hashimoto, N., Murakami, Y., Bautista, P.A., Yamaguchi, M., Obi, T., Ohyama, N., Uto, K., Kosugi, Y.: Multispectral image enhancement for effective visualization. Optics Express 19(10), 9315–9329 (2011). https://doi.org/10.1364/OE.19.009315
[26] Hassan, M., Forest, F., Fink, O., Mielle, M.: Thermonerf: A multimodal neural radiance field for joint rgb-thermal novel view synthesis of building facades. Advanced Engineering Informatics 65, 103345 (2025). https://doi.org/10.1016/j.aei.2025.103345
[27] He, H., Liang, Y., Xiao, S., Chen, J., Chen, Y.: Cp-nerf: Conditionally parameterized neural radiance fields for cross-scene novel view synthesis. Computer Graphics Forum 42 (2023). https://doi.org/10.1111/cgf.14940
[28] Huang, B., Yu, Z., Chen, A., Geiger, A., Gao, S.: 2d gaussian splatting for geometrically accurate radiance fields. In: ACM International Conference on Computer Graphics and Interactive Techniques. pp. 1–11 (2024). https://doi.org/10.1145/3641519.3657428
[29] Huang, Y.H., He, Y., Yuan, Y.J., Lai, Y.K., Gao, L.: Stylizednerf: Consistent 3d scene stylization as stylized nerf via 2d-3d mutual learning. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 18321–18331 (2022). https://doi.org/10.1109/CVPR52688.2022.01780
[30] Jain, A., Tancik, M., Abbeel, P.: Putting nerf on a diet: Semantically consistent few-shot view synthesis. In: International Conference on Computer Vision. pp. 5885–5894 (2021). https://doi.org/10.1109/ICCV48922.2021.00583
[31] Jin, H., Liu, I., Xu, P., Zhang, X., Han, S., Bi, S., Zhou, X., Xu, Z., Su, H.: Tensoir: Tensorial inverse rendering. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 165–174 (2023). https://doi.org/10.1109/CVPR52729.2023.00024
[32] Kerbl, B., Kopanas, G., Leimkuehler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics 42(4), 1–14 (2023). https://doi.org/10.1145/3592433
[33] Kerr, J., Kim, C.M., Goldberg, K., Kanazawa, A., Tancik, M.: Lerf: Language embedded radiance fields. In: International Conference on Computer Vision. pp. 19729–19739 (2023). https://doi.org/10.1109/ICCV51070.2023.01807
[34] Kim, Y., Jin, W., Cho, S., Baek, S.H.: Neural spectro-polarimetric fields. In: ACM International Conference on Computer Graphics and Interactive Techniques - Asia. pp. 1–11 (2023). https://doi.org/10.1145/3610548.3618172
[35] Lei, C., Huang, X., Zhang, M., Yan, Q., Sun, W., Chen, Q.: Polarized reflection removal with perfect alignment in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 1750–1758 (2020). https://doi.org/10.1109/CVPR42600.2020.00182
[36]

Lei, C., Qi, C., Xie, J., Fan, N., Koltun, V., Chen, Q.: Shape from polarization for complex scenes in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 12632–12641 (2022). https://doi.org/10.1109/CVPR52688.2022.01230

[37]

Li, C., Ono, T., Uemori, T., Mihara, H., Gatto, A., Nagahara, H., Moriuchi, Y.: Neisf: Neural incident stokes field for geometry and material estimation. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 21434–21445 (2024). https://doi.org/10.1109/CVPR52733.2024.02025

[39]

Li, J., Li, Y., Sun, C., Wang, C., Xiang, J.: Spec-nerf: Multi-spectral neural radiance fields. In: International Conference on Acoustics, Speech, and Signal Processing. pp. 2485–2489. IEEE (2024). https://doi.org/10.1109/ICASSP48485.2024.10446015

[40]

Li, R., Liu, J., Liu, G., Zhang, S., Zeng, B., Liu, S.: Spectralnerf: Physically based spectral rendering with neural radiance field. In: AAAI Conference on Artificial Intelligence. vol. 38, pp. 3154–3162 (2024). https://doi.org/10.1609/aaai.v38i4.28099

[41]

Li, Z., Müller, T., Evans, A., Taylor, R.H., Unberath, M., Liu, M.Y., Lin, C.H.: Neuralangelo: High-fidelity neural surface reconstruction. In: IEEE Conference on Computer Vision and Pattern Recognition (2023). https://doi.org/10.1109/CVPR52729.2023.00817

[42]

Liang, Z., Zhang, Q., Feng, Y., Shan, Y., Jia, K.: Gs-ir: 3d gaussian splatting for inverse rendering. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 21644–21653 (2024). https://doi.org/10.1109/CVPR52733.2024.02045

[43]

Lincetto, F., Agresti, G., Rossi, M., Zanuttigh, P.: Exploiting multiple priors for neural 3d indoor reconstruction. In: British Machine Vision Conference. BMVA (2023)

[44]

Lincetto, F., Agresti, G., Rossi, M., Zanuttigh, P.: Multimodalstudio: A heterogeneous sensor dataset and framework for neural rendering across multiple imaging modalities. In: IEEE Conference on Computer Vision and Pattern Recognition (2025). https://doi.org/10.1109/CVPR52734.2025.01024

[45]

Liu, Y., Hu, B., Huang, J., Tai, Y.W., Tang, C.K.: Instance neural radiance field. In: International Conference on Computer Vision. pp. 787–796 (October 2023). https://doi.org/10.1109/ICCV51070.2023.00079

[46]

Lu, R., Chen, H., Zhu, Z., Qin, Y., Lu, M., Yan, C., et al.: Thermalgaussian: Thermal 3d gaussian splatting. In: International Conference on Learning Representations (2025)

[47]

Ma, R., Ma, T., Guo, D., He, S.: Novel view synthesis and dataset augmentation for hyperspectral data using nerf. IEEE Access 12, 45331–45341 (2024). https://doi.org/10.1109/ACCESS.2024.3381531

[48]

Meng, G., Cai, Z., Chen, R., Tu, J., Wang, Y., Huang, Y., Ding, X.: Frn: Fractal-based recursive spectral reconstruction network. In: Advances in Neural Information Processing Systems (2025)

[49]

Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. In: European Conference on Computer Vision (2020), https://doi.org/10.1007/978-3-030-58452-8_24

[50]

Mizrahi, D., Bachmann, R., Kar, O., Yeo, T., Gao, M., Dehghan, A., Zamir, A.: 4m: Massively multimodal masked modeling. In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. (eds.) Advances in Neural Information Processing Systems. vol. 36, pp. 58363–58408. Curran Associates, Inc. (2023), https://dl.acm.org/doi/10.5555/3666122.3668666

[51]

Osher, S., Fedkiw, R., Piechor, K.: Level set methods and dynamic implicit surfaces. Applied Mechanics Reviews 57(3), B15–B15 (2004), https://doi.org/10.1007/b98879

[52]

Özer, M., Weiherer, M., Hundhausen, M., Egger, B.: Exploring multi-modal neural scene representations with applications on thermal imaging. In: European Conference on Computer Vision. pp. 82–98. Springer (2024). https://doi.org/10.1007/978-3-031-92805-5_6

[53]

Perez, F., Rojas, S., Hinojosa, C., Rueda-Chacón, H., Ghanem, B.: Unmix-nerf: Spectral unmixing meets neural radiance fields. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 26284–26293 (2025)

[54]

Poggi, M., Ramirez, P.Z., Tosi, F., Salti, S., Mattoccia, S., Di Stefano, L.: Cross-spectral neural radiance fields. In: International Conference on 3D Vision. pp. 606–616. IEEE (2022). https://doi.org/10.1109/3DV57658.2022.00071

[55]

Qin, M., Li, W., Zhou, J., Wang, H., Pfister, H.: Langsplat: 3d language gaussian splatting. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 20051–20060 (2024). https://doi.org/10.1109/CVPR52733.2024.01895

[56]

Qu, Y., Dai, S., Li, X., Lin, J., Cao, L., Zhang, S., Ji, R.: Goi: Find 3d gaussians of interest with an optimizable open-vocabulary semantic-space hyperplane. In: ACM International Conference on Multimedia. pp. 5328–5337 (2024). https://doi.org/10.1145/3664647.3680852

[57]

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 10684–10695 (2022). https://doi.org/10.1109/CVPR52688.2022.01042

[58]

Saponaro, P., Sorensen, S., Kolagunda, A., Kambhamettu, C.: Material classification with thermal imagery. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 4649–4656 (2015). https://doi.org/10.1109/CVPR.2015.7299096

[59]

Shi, Z., Chen, C., Xiong, Z., Liu, D., Wu, F.: Hscnn+: Advanced cnn-based hyperspectral recovery from rgb images. In: IEEE Conference on Computer Vision and Pattern Recognition Workshop. pp. 939–947 (2018). https://doi.org/10.1109/CVPRW.2018.00139

[60]

Thirgood, C., Mendez, O., Ling, E., Storey, J., Hadfield, S.: Hypergs: Hyperspectral 3d gaussian splatting. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 5970–5979 (2025). https://doi.org/10.1109/CVPR52734.2025.00560

[61]

Tsagaris, V., Anastassopoulos, V.: Multispectral image fusion for improved rgb representation based on perceptual attributes. International Journal of Remote Sensing 26(15), 3241–3254 (2005). https://doi.org/10.1080/01431160500127609

[62]

Varma, M., Wang, P., Chen, X., Chen, T., Venugopalan, S., Wang, Z.: Is attention all that nerf needs? In: International Conference on Learning Representations (2023)

[63]

Wang, H., Wen, S., Guo, B.: Polarimetric monocular gaussian splatting slam for dense surface reconstruction. In: ACM International Conference on Multimedia. pp. 7519–7528 (2025). https://doi.org/10.1145/3746027.3754925

[64]

Wang, P., Liu, L., Liu, Y., Theobalt, C., Komura, T., Wang, W.: Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. Advances in Neural Information Processing Systems (2021), https://dl.acm.org/doi/10.5555/3540261.3542342

[65]

Wang, W., Zhang, J., Shen, C.: Improved human detection and classification in thermal images. In: IEEE International Conference on Image Processing. pp. 2313– IEEE (2010). https://doi.org/10.1109/ICIP.2010.5649946

[67]

Wang, Y., Huang, D., Ye, W., Zhang, G., Ouyang, W., He, T.: Neurodin: A two-stage framework for high-fidelity neural surface reconstruction. Advances in Neural Information Processing Systems 37, 103168–103197 (2024), https://dl.acm.org/doi/10.5555/3737916.3741194

[68]

Wang, Y., Han, Q., Habermann, M., Daniilidis, K., Theobalt, C., Liu, L.: Neus2: Fast learning of neural implicit surfaces for multi-view reconstruction. In: International Conference on Computer Vision. pp. 3295–3306 (2023). https://doi.org/10.1109/ICCV51070.2023.00305

[69]

Wang, Y., Fang, L., Zhu, H., Hu, F., Ye, L., Ma, Z.: Golf-nrt: Integrating global context and local geometry for few-shot view synthesis*. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 21349–21359 (2025). https://doi.org/10.1109/CVPR52734.2025.01989

[71]

Wu, Y., Dian, R., Li, S.: Multistage spatial-spectral fusion network for spectral super-resolution. IEEE Transactions on Neural Networks and Learning Systems 36(7), 12736–12746 (2024). https://doi.org/10.1109/TNNLS.2024.3460190

[72]

Xu, J., Liao, M., Kathirvel, R.P., Patel, V.M.: Leveraging thermal modality to enhance reconstruction in low-light conditions. In: European Conference on Computer Vision. pp. 321–339. Springer (2024). https://doi.org/10.1007/978-3-031-72913-3_18

[73]

Xu, Y., Zoss, G., Chandran, P., Gross, M., Bradley, D., Gotardo, P.: Renerf: Relightable neural radiance fields with nearfield lighting. In: International Conference on Computer Vision. pp. 22581–22591 (2023). https://doi.org/10.1109/ICCV51070.2023.02064

[74]

Yao, M., Wang, M., Tam, K.M., Li, L., Xue, T., Gu, J.: Polarfree: Polarization-based reflection-free imaging. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 10890–10899 (2025). https://doi.org/10.1109/CVPR52734.2025.01017

[75]

Yariv, L., Gu, J., Kasten, Y., Lipman, Y.: Volume rendering of neural implicit surfaces. In: Advances in Neural Information Processing Systems (2021), https://dl.acm.org/doi/10.5555/3540261.3540628

[76]

Ye, T., Wu, Q., Deng, J., Liu, G., Liu, L., Xia, S., Pang, L., Yu, W., Pei, L.: Thermal-nerf: Neural radiance fields from an infrared camera. In: International Conference on Intelligent Robots and Systems. pp. 1046–1053. IEEE (2024)

[77]

Yin, Q., Guo, P.: Multispectral remote sensing image classification with multiple features. In: International Conference on Machine Learning and Cybernetics. vol. 1, pp. 360–365. IEEE (2007). https://doi.org/10.1109/ICMLC.2007.4370170

[78]

Zhang, D., Yuan, Y.J., Chen, Z., Zhang, F.L., He, Z., Shan, S., Gao, L.: Stylizedgs: Controllable stylization for 3d gaussian splatting. IEEE Transactions on Pattern Analysis and Machine Intelligence 47(12), 11961–11973 (2025). https://doi.org/10.1109/TPAMI.2025.3604010

[79]

Zhang, K., Luan, F., Li, Z., Snavely, N.: Iron: Inverse rendering by optimizing neural sdfs and materials from photometric images. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 5565–5574 (2022). https://doi.org/10.1109/CVPR52688.2022.00548

[80]

Zhang, K., Lyu, Y., Guo, H., Li, S., Ma, Z., Shi, B.: Polaranything: Diffusion-based polarimetric image synthesis. In: International Conference on Computer Vision. pp. 26466–26476 (2025)

[81]

Zhang, Q., Wang, B.H., Yang, M.C., Zou, H.: Mmnerf: multi-modal and multi-view optimized cross-scene neural radiance fields. IEEE Access 11, 27401–27413 (2023). https://doi.org/10.1109/ACCESS.2023.3254548

[82]

Zhang, Y., Chen, A., Wan, Y., Song, Z., Yu, J., Luo, Y., Yang, W.: Ref-gs: Directional factorization for 2d gaussian splatting. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 26483–26492 (2025). https://doi.org/10.1109/CVPR52734.2025.02466

[83]

Zhao, C., Huang, X., Yang, K., Wang, X., Wang, Q.: Generalizable 3d gaussian splatting for novel view synthesis. Pattern Recognition 161, 111271 (2025)

[84]

Zhou, Z., Dong, M., Xie, X., Gao, Z.: Fusion of infrared and visible images for night-vision context enhancement. Applied Optics 55(23), 6480–6490 (2016). https://doi.org/10.1364/AO.55.006480

[85]

Zhu, H., Ding, T., Chen, T., Zharkov, I., Nevatia, R., Liang, L.: Caesarnerf: Calibrated semantic representation for few-shot generalizable neural rendering. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds.) European Conference on Computer Vision. pp. 71–89. Springer Nature Switzerland, Cham (2025). https://doi.org/...

work page doi:10.1007/978-3-031-72658-3_54 2025

[1] [1]

Arad, B., Timofte, R., Yahel, R., Morag, N., Bernat, A., Cai, Y., Lin, J., Lin, Z., Wang, H., Zhang, Y., et al.: Ntire 2022 spectral recovery challenge and data set. In: IEEE Conference on Computer Vision and Pattern Recognition Workshop. pp. 862–880. IEEE (2022). https://doi.org/10.1109/CVPRW56347.2022.00102

[2] [3]

Bachmann, R., Kar, O.F., Mizrahi, D., Garjani, A., Gao, M., Griffiths, D., Hu, J., Dehghan, A., Zamir, A.: 4m-21: An any-to-any vision model for tens of tasks and modalities. In: Globerson, A., Mackey, L., Belgrave, D., Fan, A., Paquet, U., Tomczak, J., Zhang, C. (eds.) Advances in Neural Information Processing Systems. vol. 37, pp. 61872–61911. Curran As...

2024

[3] [4]

Bachmann, R., Mizrahi, D., Atanov, A., Zamir, A.: Multimae: Multi-modal multi-task masked autoencoders. In: European Conference on Computer Vision. pp. 348– Springer (2022). https://doi.org/10.1007/978-3-031-19836-6_20

[5] [6]

Boss, M., Braun, R., Jampani, V., Barron, J.T., Liu, C., Lensch, H.: Nerd: Neural reflectance decomposition from image collections. In: International Conference on Computer Vision. pp. 12684–12694 (2021). https://doi.org/10.1109/ICCV48922.2021.01245

[6] [7]

Cai, Y., Lin, J., Lin, Z., Wang, H., Zhang, Y., Pfister, H., Timofte, R., Van Gool, L.: Mst++: Multi-stage spectral-wise transformer for efficient spectral reconstruction. In: IEEE Conference on Computer Vision and Pattern Recognition Workshop. pp. 745–755 (2022). https://doi.org/10.1109/CVPRW56347.2022.00090

[7] [8]

Camuffo, E., Barbato, F., Ozay, M., Milani, S., Michieli, U.: Mocha: Multi-modal objects-aware cross-architecture alignment. In: European Conference on Computer Vision (2026)

[8] [9]

Charatan, D., Li, S.L., Tagliasacchi, A., Sitzmann, V.: pixelsplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 19457–19467 (2024). https://doi.org/10.1109/CVPR52733.2024.01840

[9] [10]

Chen, A., Xu, Z., Wei, X., Tang, S., Su, H., Geiger, A.: Dictionary fields: Learning a neural basis decomposition. ACM Transactions on Graphics (2023), https://doi.org/10.1145/3592135

[10] [11]

Chen, Q., Shu, S., Bai, X.: Thermal3d-gs: Physics-induced 3d gaussians for thermal infrared novel-view synthesis. In: European Conference on Computer Vision. pp. 253–269. Springer (2024). https://doi.org/10.1007/978-3-031-73383-3_15

[11] [13]

Chou, Z.T., Huang, S.Y., Liu, I., Wang, Y.C.F., et al.: Gsnerf: Generalizable semantic neural radiance fields with enhanced 3d scene understanding. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 20806–20815 (2024). https://doi.org/10.1109/CVPR52733.2024.01966

[12] [15]

Darling, B.A., Ferwerda, J.A., Berns, R.S., Chen, T.: Real-time multispectral rendering with complex illumination. In: Color Imaging Conference. vol. 19, pp. 345– Society of Imaging Science and Technology (2011). https://doi.org/10.1145/3721250.3743035

[14] [17]

Dave, A., Zhao, Y., Veeraraghavan, A.: Pandora: Polarization-aided neural decomposition of radiance. In: European Conference on Computer Vision. pp. 538–556. Springer (2022). https://doi.org/10.1007/978-3-031-20071-7_32

[15] [18]

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. In: International Conference on Learning Representations (2021)

[16] [19]

Gibson, S.F.: Using distance maps for accurate surface representation in sampled volumes. In: IEEE Symposium on Volume Visualization and Graphics. pp. 23–30 (1998)

[17] [20]

Griffiths, R., Dansereau, D.G.: RoRE: Rotary ray embedding for generalised multi-modal scene understanding. In: International Conference on Learning Representations (2026), https://openreview.net/forum?id=BR2ItBcqOo

[18] [21]

Gropp, A., Yariv, L., Haim, N., Atzmon, M., Lipman, Y.: Implicit geometric regularization for learning shapes. In: International Conference on Machine Learning (2020), https://dl.acm.org/doi/abs/10.5555/3524938.3525293

[19] [22]

Großmann, W., Horn, H., Niggemann, O.: Improving remote material classification ability with thermal imagery. Scientific Reports 12(1), 17288 (2022). https://doi.org/10.1038/s41598-022-21588-4

[20] [23]

Guo, H., Liu, H., Wen, J., Li, J.: Cross-spectral gaussian splatting with spatial occupancy consistency. In: AAAI Conference on Artificial Intelligence. vol. 39, pp. 3229–3237 (2025). https://doi.org/10.1609/aaai.v39i3.32333

[21] [24]

Han, Y., Tie, B., Guo, H., Lyu, Y., Li, S., Shi, B., Jia, Y., Ma, Z.: Polgs: Polarimetric gaussian splatting for fast reflective surface reconstruction. In: International Conference on Computer Vision. pp. 28073–28082 (2025)

[22] [25]

Hashimoto, N., Murakami, Y., Bautista, P.A., Yamaguchi, M., Obi, T., Ohyama, N., Uto, K., Kosugi, Y.: Multispectral image enhancement for effective visualization. Optics Express 19(10), 9315–9329 (2011). https://doi.org/10.1364/OE.19.009315

[23] [26]

Hassan, M., Forest, F., Fink, O., Mielle, M.: Thermonerf: A multimodal neural radiance field for joint rgb-thermal novel view synthesis of building facades. Advanced Engineering Informatics 65, 103345 (2025). https://doi.org/10.1016/j.aei.2025.103345

[24] [27]

He, H., Liang, Y., Xiao, S., Chen, J., Chen, Y.: Cp-nerf: Conditionally parameterized neural radiance fields for cross-scene novel view synthesis. Computer Graphics Forum 42 (2023). https://doi.org/10.1111/cgf.14940

[25] [28]

Huang, B., Yu, Z., Chen, A., Geiger, A., Gao, S.: 2d gaussian splatting for geometrically accurate radiance fields. In: ACM International Conference on Computer Graphics and Interactive Techniques. pp. 1–11 (2024). https://doi.org/10.1145/3641519.3657428

[26] [29]

Huang, Y.H., He, Y., Yuan, Y.J., Lai, Y.K., Gao, L.: Stylizednerf: Consistent 3d scene stylization as stylized nerf via 2d-3d mutual learning. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 18321–18331 (2022). https://doi.org/10.1109/CVPR52688.2022.01780

[27] [30]

Swin transformer: Hierarchical vision transformer using shifted windows,

Jain, A., Tancik, M., Abbeel, P.: Putting nerf on a diet: Semantically consistent few-shot view synthesis. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 5885–5894 (2021).https://doi.org/10.1109/ICCV48922.2021. 005834

work page doi:10.1109/iccv48922.2021 2021

[28] [31]

Castro, Benedikt Boecking, Harshita Sharma, Kenza Bouzid, Anja Thieme, Anton Schwaighofer, Maria Wetscherek, Matthew P

Jin, H., Liu, I., Xu, P., Zhang, X., Han, S., Bi, S., Zhou, X., Xu, Z., Su, H.: Tensoir: Tensorial inverse rendering. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 165–174 (2023).https://doi.org/10.1109/CVPR52729.2023. 000241

work page doi:10.1109/cvpr52729.2023 2023

[29] [32]

ACM Transactions on Graphics42(4), 1–14 (2023).https://doi.org/10.1145/35924331

Kerbl, B., Kopanas, G., Leimkuehler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics42(4), 1–14 (2023).https://doi.org/10.1145/35924331

work page doi:10.1145/35924331 2023

[30] [33]

In: Pro- ceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)

Kerr, J., Kim, C.M., Goldberg, K., Kanazawa, A., Tancik, M.: Lerf: Language embedded radiance fields. In: International Conference on Computer Vision. pp. 19729–19739 (2023).https://doi.org/10.1109/ICCV51070.2023.018071

work page doi:10.1109/iccv51070.2023.018071 2023

[31] [34]

In: ACM International Conference on Computer Graphics and Interactive Techniques - Asia

Kim, Y., Jin, W., Cho, S., Baek, S.H.: Neural spectro-polarimetric fields. In: ACM International Conference on Computer Graphics and Interactive Techniques - Asia. pp. 1–11 (2023).https://doi.org/10.1145/3610548.36181722, 3

work page doi:10.1145/3610548.36181722 2023

[32] [35]

In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recog- nition (CVPR)

Lei, C., Huang, X., Zhang, M., Yan, Q., Sun, W., Chen, Q.: Polarized reflection removal with perfect alignment in the wild. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 1750–1758 (2020).https://doi.org/10. 1109/CVPR42600.2020.001822

work page arXiv 2020

[33] [36]

In: IEEE Conference on Computer Vision and Pat- ternRecognition.pp.12632–12641(2022).https://doi.org/10.1109/CVPR52688

Lei, C., Qi, C., Xie, J., Fan, N., Koltun, V., Chen, Q.: Shape from polarization for complex scenes in the wild. In: IEEE Conference on Computer Vision and Pat- ternRecognition.pp.12632–12641(2022).https://doi.org/10.1109/CVPR52688. 2022.012302

work page doi:10.1109/cvpr52688 2022

[34] [37]

In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Li, C., Ono, T., Uemori, T., Mihara, H., Gatto, A., Nagahara, H., Moriuchi, Y.: Neisf: Neural incident stokes field for geometry and material estimation. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 21434–21445 (2024). https://doi.org/10.1109/CVPR52733.2024.020252

work page doi:10.1109/cvpr52733.2024.020252 2024

[35] [39]

In: International Conference on Acoustics, Speech, and Signal Process- ing

Li, J., Li, Y., Sun, C., Wang, C., Xiang, J.: Spec-nerf: Multi-spectral neural radi- ance fields. In: International Conference on Acoustics, Speech, and Signal Process- ing. pp. 2485–2489. IEEE (2024).https://doi.org/10.1109/ICASSP48485.2024. 104460152

work page doi:10.1109/icassp48485.2024 2024

[36] [40]

In: AAAI Conference on Artificial In- telligence

Li, R., Liu, J., Liu, G., Zhang, S., Zeng, B., Liu, S.: Spectralnerf: Physically based spectral rendering with neural radiance field. In: AAAI Conference on Artificial In- telligence. vol. 38, pp. 3154–3162 (2024).https://doi.org/10.1609/aaai.v38i4. 280992

work page doi:10.1609/aaai.v38i4 2024

[37] [41]

Castro, Benedikt Boecking, Harshita Sharma, Kenza Bouzid, Anja Thieme, Anton Schwaighofer, Maria Wetscherek, Matthew P

Li, Z., Müller, T., Evans, A., Taylor, R.H., Unberath, M., Liu, M.Y., Lin, C.H.: Neuralangelo: High-fidelity neural surface reconstruction. In: IEEE Conference on Computer Vision and Pattern Recognition (2023).https://doi.org/10.1109/ CVPR52729.2023.008171, 6, 10

work page arXiv 2023

[38] [42]

In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Liang, Z., Zhang, Q., Feng, Y., Shan, Y., Jia, K.: Gs-ir: 3d gaussian splatting for inverse rendering. In: IEEE Conference on Computer Vision and Pattern Recogni- tion. pp. 21644–21653 (2024).https://doi.org/10.1109/CVPR52733.2024.02045 1 Learning Spect. and Pol. Clues for One-to-Multimodal Novel View Synthesis 19

work page doi:10.1109/cvpr52733.2024.02045 2024

[39] [43]

In: British Machine Vision Conference

Lincetto, F., Agresti, G., Rossi, M., Zanuttigh, P.: Exploiting multiple priors for neural 3d indoor reconstruction. In: British Machine Vision Conference. BMVA (2023) 1

2023

[40] [44]

In: 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Lincetto, F., Agresti, G., Rossi, M., Zanuttigh, P.: Multimodalstudio: A heteroge- neous sensor dataset and framework for neural rendering across multiple imaging modalities. In: IEEE Conference on Computer Vision and Pattern Recognition (2025).https://doi.org/10.1109/CVPR52734.2025.010242, 3, 5, 10, 11, 12, 22

work page doi:10.1109/cvpr52734.2025.010242 2025

[41] [45]

In: Pro- ceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)

Liu, Y., Hu, B., Huang, J., Tai, Y.W., Tang, C.K.: Instance neural radiance field. In: International Conference on Computer Vision. pp. 787–796 (October 2023). https://doi.org/10.1109/ICCV51070.2023.000791

work page doi:10.1109/iccv51070.2023.000791 2023

[42] [46]

In: International Conference on Learning Repre- sentations (2025) 2

Lu, R., Chen, H., Zhu, Z., Qin, Y., Lu, M., Yan, C., et al.: Thermalgaussian: Thermal 3d gaussian splatting. In: International Conference on Learning Repre- sentations (2025) 2

2025

[43] [47]

IEEE Access12, 45331–45341 (2024).https: //doi.org/10.1109/ACCESS.2024.33815312

Ma, R., Ma, T., Guo, D., He, S.: Novel view synthesis and dataset augmentation for hyperspectral data using nerf. IEEE Access12, 45331–45341 (2024).https: //doi.org/10.1109/ACCESS.2024.33815312

work page doi:10.1109/access.2024.33815312 2024

[44] [48]

Meng, G., Cai, Z., Chen, R., Tu, J., Wang, Y., Huang, Y., Ding, X.: Frn: Fractal-based recursive spectral reconstruction network. In: Advances in Neural Information Processing Systems (2025)

[45] [49]

Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. In: European Conference on Computer Vision (2020), https://doi.org/10.1007/978-3-030-58452-8_24

[46] [50]

Mizrahi, D., Bachmann, R., Kar, O., Yeo, T., Gao, M., Dehghan, A., Zamir, A.: 4m: Massively multimodal masked modeling. In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S. (eds.) Advances in Neural Information Processing Systems. vol. 36, pp. 58363–58408. Curran Associates, Inc. (2023), https://dl.acm.org/doi/10.5555/3666122.3668666

[47] [51]

Osher, S., Fedkiw, R., Piechor, K.: Level set methods and dynamic implicit surfaces. Applied Mechanics Reviews 57(3), B15–B15 (2004), https://doi.org/10.1007/b98879

[48] [52]

Özer, M., Weiherer, M., Hundhausen, M., Egger, B.: Exploring multi-modal neural scene representations with applications on thermal imaging. In: European Conference on Computer Vision. pp. 82–98. Springer (2024). https://doi.org/10.1007/978-3-031-92805-5_6

[49] [53]

Perez, F., Rojas, S., Hinojosa, C., Rueda-Chacón, H., Ghanem, B.: Unmix-nerf: Spectral unmixing meets neural radiance fields. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 26284–26293 (2025)

[50] [54]

Poggi, M., Ramirez, P.Z., Tosi, F., Salti, S., Mattoccia, S., Di Stefano, L.: Cross-spectral neural radiance fields. In: International Conference on 3D Vision. pp. 606–616. IEEE (2022). https://doi.org/10.1109/3DV57658.2022.00071

[51] [55]

Qin, M., Li, W., Zhou, J., Wang, H., Pfister, H.: Langsplat: 3d language gaussian splatting. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 20051–20060 (2024). https://doi.org/10.1109/CVPR52733.2024.01895

[52] [56]

Qu, Y., Dai, S., Li, X., Lin, J., Cao, L., Zhang, S., Ji, R.: Goi: Find 3d gaussians of interest with an optimizable open-vocabulary semantic-space hyperplane. In: ACM International Conference on Multimedia. pp. 5328–5337 (2024). https://doi.org/10.1145/3664647.3680852

[53] [57]

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 10684–10695 (2022). https://doi.org/10.1109/CVPR52688.2022.01042

[54] [58]

Saponaro, P., Sorensen, S., Kolagunda, A., Kambhamettu, C.: Material classification with thermal imagery. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 4649–4656 (2015). https://doi.org/10.1109/CVPR.2015.7299096

[55] [59]

Shi, Z., Chen, C., Xiong, Z., Liu, D., Wu, F.: Hscnn+: Advanced cnn-based hyperspectral recovery from rgb images. In: IEEE Conference on Computer Vision and Pattern Recognition Workshop. pp. 939–947 (2018). https://doi.org/10.1109/CVPRW.2018.00139

[56] [60]

Thirgood, C., Mendez, O., Ling, E., Storey, J., Hadfield, S.: Hypergs: Hyperspectral 3d gaussian splatting. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 5970–5979 (2025). https://doi.org/10.1109/CVPR52734.2025.00560

[57] [61]

Tsagaris, V., Anastassopoulos, V.: Multispectral image fusion for improved rgb representation based on perceptual attributes. International Journal of Remote Sensing 26(15), 3241–3254 (2005). https://doi.org/10.1080/01431160500127609

[58] [62]

Varma, M., Wang, P., Chen, X., Chen, T., Venugopalan, S., Wang, Z.: Is attention all that nerf needs? In: International Conference on Learning Representations (2023)

[59] [63]

Wang, H., Wen, S., Guo, B.: Polarimetric monocular gaussian splatting slam for dense surface reconstruction. In: ACM International Conference on Multimedia. pp. 7519–7528 (2025). https://doi.org/10.1145/3746027.3754925

[60] [64]

Wang, P., Liu, L., Liu, Y., Theobalt, C., Komura, T., Wang, W.: Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. Advances in Neural Information Processing Systems (2021), https://dl.acm.org/doi/10.5555/3540261.3542342

[61] [65]

Wang, W., Zhang, J., Shen, C.: Improved human detection and classification in thermal images. In: IEEE International Conference on Image Processing. pp. 2313–. IEEE (2010). https://doi.org/10.1109/ICIP.2010.5649946

[63] [67]

Wang, Y., Huang, D., Ye, W., Zhang, G., Ouyang, W., He, T.: Neurodin: A two-stage framework for high-fidelity neural surface reconstruction. Advances in Neural Information Processing Systems 37, 103168–103197 (2024), https://dl.acm.org/doi/10.5555/3737916.3741194

[64] [68]

Wang, Y., Han, Q., Habermann, M., Daniilidis, K., Theobalt, C., Liu, L.: Neus2: Fast learning of neural implicit surfaces for multi-view reconstruction. In: International Conference on Computer Vision. pp. 3295–3306 (2023). https://doi.org/10.1109/ICCV51070.2023.00305

[65] [69]

Wang, Y., Fang, L., Zhu, H., Hu, F., Ye, L., Ma, Z.: Golf-nrt: Integrating global context and local geometry for few-shot view synthesis*. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 21349–21359 (2025). https://doi.org/10.1109/CVPR52734.2025.01989

[66] [71]

Wu, Y., Dian, R., Li, S.: Multistage spatial-spectral fusion network for spectral super-resolution. IEEE Transactions on Neural Networks and Learning Systems 36(7), 12736–12746 (2024). https://doi.org/10.1109/TNNLS.2024.3460190

[67] [72]

Xu, J., Liao, M., Kathirvel, R.P., Patel, V.M.: Leveraging thermal modality to enhance reconstruction in low-light conditions. In: European Conference on Computer Vision. pp. 321–339. Springer (2024). https://doi.org/10.1007/978-3-031-72913-3_18

[68] [73]

Xu, Y., Zoss, G., Chandran, P., Gross, M., Bradley, D., Gotardo, P.: Renerf: Relightable neural radiance fields with nearfield lighting. In: International Conference on Computer Vision. pp. 22581–22591 (2023). https://doi.org/10.1109/ICCV51070.2023.02064

[69] [74]

Yao, M., Wang, M., Tam, K.M., Li, L., Xue, T., Gu, J.: Polarfree: Polarization-based reflection-free imaging. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 10890–10899 (2025). https://doi.org/10.1109/CVPR52734.2025.01017

[70] [75]

Yariv, L., Gu, J., Kasten, Y., Lipman, Y.: Volume rendering of neural implicit surfaces. In: Advances in Neural Information Processing Systems (2021), https://dl.acm.org/doi/10.5555/3540261.3540628

[71] [76]

Ye, T., Wu, Q., Deng, J., Liu, G., Liu, L., Xia, S., Pang, L., Yu, W., Pei, L.: Thermal-nerf: Neural radiance fields from an infrared camera. In: International Conference on Intelligent Robots and Systems. pp. 1046–1053. IEEE (2024)

[72] [77]

Yin, Q., Guo, P.: Multispectral remote sensing image classification with multiple features. In: International Conference on Machine Learning and Cybernetics. vol. 1, pp. 360–365. IEEE (2007). https://doi.org/10.1109/ICMLC.2007.4370170

[73] [78]

Zhang, D., Yuan, Y.J., Chen, Z., Zhang, F.L., He, Z., Shan, S., Gao, L.: Stylizedgs: Controllable stylization for 3d gaussian splatting. IEEE Transactions on Pattern Analysis and Machine Intelligence 47(12), 11961–11973 (2025). https://doi.org/10.1109/TPAMI.2025.3604010

[74] [79]

Zhang, K., Luan, F., Li, Z., Snavely, N.: Iron: Inverse rendering by optimizing neural sdfs and materials from photometric images. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 5565–5574 (2022). https://doi.org/10.1109/CVPR52688.2022.00548

[75] [80]

Zhang, K., Lyu, Y., Guo, H., Li, S., Ma, Z., Shi, B.: Polaranything: Diffusion-based polarimetric image synthesis. In: International Conference on Computer Vision. pp. 26466–26476 (2025)

[76] [81]

Zhang, Q., Wang, B.H., Yang, M.C., Zou, H.: Mmnerf: multi-modal and multi-view optimized cross-scene neural radiance fields. IEEE Access 11, 27401–27413 (2023). https://doi.org/10.1109/ACCESS.2023.3254548

[77] [82]

Zhang, Y., Chen, A., Wan, Y., Song, Z., Yu, J., Luo, Y., Yang, W.: Ref-gs: Directional factorization for 2d gaussian splatting. In: IEEE Conference on Computer Vision and Pattern Recognition. pp. 26483–26492 (2025). https://doi.org/10.1109/CVPR52734.2025.02466

[78] [83]

Zhao, C., Huang, X., Yang, K., Wang, X., Wang, Q.: Generalizable 3d gaussian splatting for novel view synthesis. Pattern Recognition 161, 111271 (2025)

[79] [84]

Zhou, Z., Dong, M., Xie, X., Gao, Z.: Fusion of infrared and visible images for night-vision context enhancement. Applied Optics 55(23), 6480–6490 (2016). https://doi.org/10.1364/AO.55.006480

[80] [85]

Zhu, H., Ding, T., Chen, T., Zharkov, I., Nevatia, R., Liang, L.: Caesarnerf: Calibrated semantic representation for few-shot generalizable neural rendering. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds.) European Conference on Computer Vision. pp. 71–89. Springer Nature Switzerland, Cham (2025). https://doi.org/...
