pith. sign in

arxiv: 2606.28321 · v1 · pith:N7CN4YICnew · submitted 2026-06-26 · 💻 cs.CV

StructSplat: Generalizable 3D Gaussian Splatting from Uncalibrated Sparse Views

Pith reviewed 2026-06-29 03:56 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D Gaussian SplattingGeneralizable ReconstructionUncalibrated ImagesSparse ViewsFeed-forward ModelSemantic PriorsCamera Alignment
0
0 comments X

The pith

StructSplat reconstructs 3D Gaussians from uncalibrated sparse views by assigning explicit roles to geometry, semantic and texture cues.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces StructSplat as a feed-forward framework that builds 3D Gaussian models directly from uncalibrated images. It avoids per-scene optimization and known camera parameters by using a structured representation that keeps geometry, semantic, and texture information in separate, explicit roles during reconstruction. Pixel-aligned feature injection handles texture details from 2D inputs, semantic-aware priors enforce global consistency, and a camera alignment step blocks information leakage. This design produces higher reconstruction quality and stronger cross-dataset generalization than prior methods. A reader would care because the approach simplifies turning casual photos into usable 3D models without specialized equipment or lengthy tuning.

Core claim

We present StructSplat, a feed-forward and generalizable 3D Gaussian reconstruction framework that operates directly on uncalibrated images without requiring camera parameters. Our key idea is to adopt a structured representation that organizes geometry, semantic, and texture cues with explicit roles in the reconstruction process. Specifically, we introduce a pixel-aligned feature injection mechanism to enable accurate texture modeling from 2D observations, incorporate semantic-aware priors to improve global consistency, and design a camera alignment strategy to prevent information leakage and improve generalization.

What carries the argument

Structured representation that assigns explicit roles to geometry, semantic, and texture cues through pixel-aligned feature injection, semantic-aware priors, and a camera alignment strategy.

If this is right

  • The method reaches 28.045 PSNR on DL3DV, exceeding AnySplat by 5.67 dB.
  • Cross-dataset tests show gains of 1.94 dB on ACID and 1.72 dB on RealEstate10K over AnySplat.
  • Reconstruction succeeds without input camera parameters or per-scene optimization.
  • Explicit cue separation improves both fidelity and generalization compared with entangled backbones.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same cue-separation pattern could be tested on other 3D representations such as NeRF variants.
  • If the alignment strategy scales, casual multi-view capture on phones might become sufficient for high-quality 3D output.
  • Limits may appear when scenes contain strong lighting changes or repetitive textures that weaken semantic priors.

Load-bearing premise

Separating geometry, semantic, and texture cues into explicit roles via pixel-aligned feature injection, semantic-aware priors, and a camera alignment strategy prevents information leakage and enables generalization from uncalibrated sparse views without per-scene optimization.

What would settle it

Performance on a held-out set of uncalibrated sparse-view scenes that falls below baselines using known camera poses would falsify the claim that the structured separation enables reliable generalization.

Figures

Figures reproduced from arXiv: 2606.28321 by Beiqi Chen, Guangcong Wang, Jia-Chen Zhao, Liqiang Nie, Xinyang Chen.

Figure 1
Figure 1. Figure 1: Qualitative comparison for novel-view synthesis from uncalibrated [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Overview of our proposed StructSplat. Given uncalibrated source images, we perform feed-forward 3D reconstruction without camera parameters by adopting a structured representation that organizes texture, semantic, and geometric cues. En￾coders extract multi-level features, which drive two decoding pathways: a Gaussian decoder predicts camera-centric Gaussians, while a camera decoder estimates camera parame… view at source ↗
Figure 3
Figure 3. Figure 3: Geometry-oriented features fail to capture appearance details. [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Detailed architecture of the proposed Gaussian decoder head. [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Rendering pipeline with or without camera alignment. [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Visual comparison of novel view synthesis on DL3DV [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Visual comparison of novel view synthesis on RealEstate10K [PITH_FULL_IMAGE:figures/full_fig_p013_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Qualitative visualization of rendered novel views and depth maps. [PITH_FULL_IMAGE:figures/full_fig_p021_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Qualitative comparison of L1 error maps. [PITH_FULL_IMAGE:figures/full_fig_p022_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Qualitative comparison of ablation. 1) Using only geometric features causes severe structural collapse and large color artifacts. 2) Adding semantic features restores coherent structures and object layouts, while 3) further introducing texture features enables sharper images with accurate high-frequency details. supervision, our proposed Structured Representation intrinsically captures the underlying 3D g… view at source ↗
read the original abstract

We present StructSplat, a feed-forward and generalizable 3D Gaussian reconstruction framework that operates directly on uncalibrated images without requiring camera parameters. Existing methods either rely on per-scene optimization or assume known camera poses, and often entangle geometry and appearance within a unified backbone, limiting reconstruction fidelity and generalization. Our key idea is to adopt a structured representation that organizes geometry, semantic, and texture cues with explicit roles in the reconstruction process. Specifically, we introduce a pixel-aligned feature injection mechanism to enable accurate texture modeling from 2D observations, incorporate semantic-aware priors to improve global consistency, and design a camera alignment strategy to prevent information leakage and improve generalization. Experiments show that our method significantly outperforms prior approaches on challenging benchmarks. On DL3DV, our method achieves 28.045 PSNR, surpassing AnySplat (22.377) by +5.67 dB. In cross-dataset evaluation, our method achieves +1.94 dB over AnySplat on ACID and +1.72 dB on RealEstate10K. Project page: https://structsplat.github.io Code: https://github.com/J-C-Zhao/StructSplat

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces StructSplat, a feed-forward 3D Gaussian Splatting framework that reconstructs from uncalibrated sparse views without per-scene optimization or known camera poses. It organizes geometry, semantic, and texture cues via three mechanisms: pixel-aligned feature injection for texture modeling, semantic-aware priors for global consistency, and a camera alignment strategy to avoid information leakage. The central empirical claim is large performance gains, including 28.045 PSNR on DL3DV (+5.67 dB over AnySplat), +1.94 dB on ACID, and +1.72 dB on RealEstate10K in cross-dataset tests.

Significance. If the reported gains are supported by detailed ablations and analysis, the work would represent a meaningful step toward practical generalizable 3D reconstruction from casual image collections. The public release of code and project page is a clear strength that supports reproducibility and follow-on research.

major comments (2)
  1. [Experiments] Experiments section: The abstract reports specific PSNR deltas (e.g., +5.67 dB on DL3DV) but the provided source contains no ablation tables, variance across runs, or component-wise breakdowns isolating the contribution of pixel-aligned injection versus semantic priors versus the alignment module; without these, attribution of the gains to the structured representation remains unverified.
  2. [Method] Method section: The camera alignment strategy is described at a high level as preventing leakage, yet no equations, loss terms, or pseudocode detail how alignment is enforced during feature injection or how it interacts with the uncalibrated input assumption; this is load-bearing for the generalization claim.
minor comments (1)
  1. [Abstract] The abstract and introduction would benefit from a brief statement of the input resolution and number of views used in the reported benchmarks to allow direct comparison with prior feed-forward methods.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that the current version would benefit from expanded experimental analysis and methodological details. We will revise the manuscript to address both major comments.

read point-by-point responses
  1. Referee: [Experiments] Experiments section: The abstract reports specific PSNR deltas (e.g., +5.67 dB on DL3DV) but the provided source contains no ablation tables, variance across runs, or component-wise breakdowns isolating the contribution of pixel-aligned injection versus semantic priors versus the alignment module; without these, attribution of the gains to the structured representation remains unverified.

    Authors: We agree that the manuscript lacks the requested ablation studies and statistical details. In the revision we will add (i) a dedicated ablation table isolating the contribution of pixel-aligned feature injection, semantic-aware priors, and the camera alignment module, (ii) standard deviation across multiple random seeds for the reported PSNR values, and (iii) component-wise breakdowns on DL3DV, ACID, and RealEstate10K to substantiate attribution of the observed gains. revision: yes

  2. Referee: [Method] Method section: The camera alignment strategy is described at a high level as preventing leakage, yet no equations, loss terms, or pseudocode detail how alignment is enforced during feature injection or how it interacts with the uncalibrated input assumption; this is load-bearing for the generalization claim.

    Authors: We acknowledge that the camera alignment module is currently presented at a high level. In the revised manuscript we will expand Section 3 with the explicit formulation (including the alignment loss term and its weighting), the precise interaction between the alignment module and the pixel-aligned feature injection pathway, and pseudocode that shows how alignment is maintained under the uncalibrated-input assumption. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The abstract and available description present a high-level architectural description (pixel-aligned feature injection, semantic-aware priors, camera alignment) and empirical benchmark results without any equations, parameter-fitting steps, self-citations, or derivation chains. No load-bearing claim reduces to a fitted input, self-definition, or author-prior ansatz by construction. The reported PSNR gains are presented as experimental outcomes rather than derived predictions, leaving the method self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on free parameters, axioms, or invented entities; ledger left empty.

pith-pipeline@v0.9.1-grok · 5755 in / 1069 out tokens · 33514 ms · 2026-06-29T03:56:46.467636+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

61 extracted references · 7 linked inside Pith

  1. [1]

    In: ICCV

    Barron, J.T., Mildenhall, B., Tancik, M., Hedman, P., Martin-Brualla, R., Srini- vasan, P.P.: Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. In: ICCV. pp. 5855–5864 (2021)

  2. [2]

    In: ICCV

    Barron, J.T., Mildenhall, B., Verbin, D., Srinivasan, P.P., Hedman, P.: Zip-nerf: Anti-aliased grid-based neural radiance fields. In: ICCV. pp. 19697–19705 (2023)

  3. [3]

    In: CVPR

    Bian, W., Wang, Z., Li, K., Bian, J.W.: NoPe-NeRF: Optimising neural radiance field with no pose prior. In: CVPR. pp. 4160–4169 (2023)

  4. [4]

    In: CVPR

    Bourigault, E., Bourigault, P.: MVDiff: Scalable and flexible multi-view diffusion for 3d object reconstruction from single-view. In: CVPR. pp. 7579–7586 (2024)

  5. [5]

    In: CVPR

    Charatan, D., Li, S.L., Tagliasacchi, A., Sitzmann, V.: PixelSplat: 3d gaussian splats from image pairs for scalable generalizable 3d reconstruction. In: CVPR. pp. 19457–19467 (2024)

  6. [6]

    In: ICCV

    Chen, A., Xu, Z., Zhao, F., Zhang, X., Xiang, F., Yu, J., Su, H.: MVSNerf: Fast generalizable radiance field reconstruction from multi-view stereo. In: ICCV. pp. 14124–14133 (2021)

  7. [7]

    In: ECCV

    Chen, S., Li, X., Wang, Z., Prisacariu, V.A.: DFNet: Enhance absolute pose re- gression with direct feature matching. In: ECCV. pp. 1–17 (2022)

  8. [8]

    In: ECCV

    Chen, Y., Xu, H., Zheng, C., Zhuang, B., Pollefeys, M., Geiger, A., Cham, T.J., Cai, J.: MVSplat: Efficient 3d gaussian splatting from sparse multi-view images. In: ECCV. pp. 370–386 (2024)

  9. [9]

    arXiv preprint arXiv:2010.11929 (2020)

    Dosovitskiy, A., Beyer, L., Kolesnikov, A., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)

  10. [10]

    In: ECCV

    Fang,G.,Wang,B.:Mini-splatting:Representingsceneswithaconstrainednumber of gaussians. In: ECCV. pp. 165–181. Springer (2024) 16 J.-C. Zhao et al

  11. [11]

    In: CVPR

    Fu, Y., Wang, X., Liu, S., Kulkarni, A., Kautz, J., Efros, A.A.: COLMAP-free 3d gaussian splatting. In: CVPR. pp. 20796–20805 (2024)

  12. [12]

    In: CVPR

    He, Y., Yan, R., Fragkiadaki, K., Yu, S.I.: Epipolar transformers. In: CVPR. pp. 7776–7785 (2020)

  13. [13]

    In: ICML (2025)

    Hong, S., Jung, J., Shin, H., Han, J., Yang, J., Luo, C., Kim, S.: PF3plat: Pose-free feed-forward 3d gaussian splatting for novel view synthesis. In: ICML (2025)

  14. [14]

    In: CVPR

    Hong, S., Jung, J., Shin, H., Yang, J., Kim, S., Luo, C.: Unifying correspondence, pose and nerf for generalized pose-free novel view synthesis. In: CVPR. pp. 20196– 20206 (2024)

  15. [15]

    In: ICLR (2024)

    Hong, Y., Zhang, K., Gu, J., Bi, S., Zhou, Y., Liu, D., Liu, F., Sunkavalli, K., Bui, T., Tan, H.: LRM: Large reconstruction model for single image to 3d. In: ICLR (2024)

  16. [16]

    arXiv preprint arXiv:2404.06395 (2024)

    Hu, S., Tu, Y., Han, X., et al.: Minicpm: Unveiling the potential of small language models with scalable training strategies. arXiv preprint arXiv:2404.06395 (2024)

  17. [17]

    In: ICCV

    Jia,H.,Zhu,L.,Zhao,N.:H3R:Hybridmulti-viewcorrespondenceforgeneralizable 3d reconstruction. In: ICCV. pp. 7655–7665 (2025)

  18. [18]

    In: ICCV

    Jiang, H., Tan, H., Wang, P., et al.: Rayzer: A self-supervised large view synthesis model. In: ICCV. pp. 4918–4929 (2025)

  19. [19]

    arXiv preprint arXiv:2505.23716 (2025)

    Jiang, L., Mao, Y., Xu, L., Lu, T., Ren, K., Jin, Y., Xu, X., Yu, M., Pang, J., Zhao, F., et al.: AnySplat: Feed-forward 3d gaussian splatting from unconstrained views. arXiv preprint arXiv:2505.23716 (2025)

  20. [20]

    In: ICLR

    Jin, H., Jiang, H., Tan, H., Zhang, K., Bi, S., Zhang, T., Luan, F., Snavely, N., Xu, Z.: LVSM: A large view synthesis model with minimal 3d inductive bias. In: ICLR. vol. 2025, pp. 60001–60021 (2025)

  21. [21]

    In: CVPR

    Johari, M.M., Lepoittevin, Y., Fleuret, F.: Geonerf: Generalizing nerf with geom- etry priors. In: CVPR. pp. 18365–18375 (2022)

  22. [22]

    arXiv preprint arXiv:1905.12322 (2019)

    Kalamkar, D., Mudigere, D., Mellempudi, N., Das, D., Banerjee, K., Avancha, S., Vooturi, D.T., Jammalamadaka, N., Huang, J., Yuen, H., et al.: A study of bfloat16 for deep learning training. arXiv preprint arXiv:1905.12322 (2019)

  23. [23]

    In: CVPR

    Kang, G., Yoo, J., Park, J., Nam, S., Im, H., Shin, S., Kim, S., Park, E.: Selfsplat: Pose-free and 3d prior-free generalizable 3d gaussian splatting. In: CVPR. pp. 22012–22022 (2025)

  24. [24]

    ACM Trans

    Kerbl, B., Kopanas, G., Leimkühler, T., Drettakis, G.: 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph.42(4), 139–1 (2023)

  25. [25]

    In: ECCV

    Leroy, V., Cabon, Y., Revaud, J.: Grounding image matching in 3d with mast3r. In: ECCV. pp. 71–91 (2024)

  26. [26]

    ICLR (2024)

    Li, J., Tan, H., Zhang, K., Xu, Z., Luan, F., Xu, Y., Hong, Y., Sunkavalli, K., Shakhnarovich, G., Bi, S.: Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model. ICLR (2024)

  27. [27]

    arXiv preprint arXiv:2511.10647 (2025)

    Lin, H., Chen, S., Liew, J., Chen, D.Y., Li, Z., Shi, G., Feng, J., Kang, B.: Depth anything 3: Recovering the visual space from any views. arXiv preprint arXiv:2511.10647 (2025)

  28. [28]

    In: CVPR

    Ling, L., Sheng, Y., Tu, Z., Zhao, W., Xin, C., Wan, K., Yu, L., Guo, Q., Yu, Z., Lu, Y., et al.: Dl3dv-10k: A large-scale scene dataset for deep learning-based 3d vision. In: CVPR. pp. 22160–22169 (2024)

  29. [29]

    In: ICCV

    Liu, A., Tucker, R., Jampani, V., Makadia, A., Snavely, N., Kanazawa, A.: Infinite nature: Perpetual view generation of natural scenes from a single image. In: ICCV. pp. 14458–14467 (2021)

  30. [30]

    In: ECCV

    Liu, T., Wang, G., Hu, S., Shen, L., Ye, X., Zang, Y., Cao, Z., Li, W., Liu, Z.: MVSGaussian: Fast generalizable gaussian splatting reconstruction from multi- view stereo. In: ECCV. pp. 37–53 (2024) StructSplat 17

  31. [31]

    Mildenhall, B., Srinivasan, P.P., Tancik, M., Barron, J.T., Ramamoorthi, R., Ng, R.: Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM65(1), 99–106 (2021)

  32. [32]

    In: ECCV

    Pan, L., Baráth, D., Pollefeys, M., Schönberger, J.L.: Global structure-from-motion revisited. In: ECCV. pp. 58–77 (2024)

  33. [33]

    Rajbhandari, S., Rasley, J., Ruwase, O., He, Y.: Zero: Memory optimizations to- ward training trillion parameter models. In: Int. Conf. High Perform. Comput. Netw. Storage Anal. pp. 1–16 (2020)

  34. [34]

    Rajbhandari, S., Ruwase, O., Rasley, J., Smith, S., He, Y.: Zero-infinity: Breaking the gpu memory wall for extreme scale deep learning. In: Int. Conf. High Perform. Comput. Netw. Storage Anal. pp. 1–14 (2021)

  35. [35]

    In: ICCV

    Ranftl, R., Bochkovskiy, A., Koltun, V.: Vision transformers for dense prediction. In: ICCV. pp. 12179–12188 (2021)

  36. [36]

    In: USENIX Annu

    Ren, J., Rajbhandari, S., Aminabadi, R.Y., Ruwase, O., Yang, S., Zhang, M., Li, D., He, Y.: Zero-offload: Democratizing billion-scale model training. In: USENIX Annu. Tech. Conf. pp. 551–564 (2021)

  37. [37]

    In: CVPR

    Schönberger, J.L., Frahm, J.M.: Structure-from-motion revisited. In: CVPR. pp. 4104–4113 (2016)

  38. [38]

    In: ICCV

    Sheng, Y., Deng, J., Zhang, X., Zhang, Y., Hua, B., Zhang, Y., Ji, J.: Spatialsplat: Efficient semantic 3d from sparse unposed images. In: ICCV. pp. 26404–26414 (2025)

  39. [39]

    arXiv preprint arXiv:2508.10104 (2025)

    Siméoni, O., Vo, H.V., Seitzer, M., Baldassarre, F., Oquab, M., et al.: Dinov3. arXiv preprint arXiv:2508.10104 (2025)

  40. [40]

    arXiv preprint arXiv:2408.13912 (2024)

    Smart, B., Zheng, C., Laina, I., Prisacariu, V.A.: Splatt3r: Zero-shot gaussian splatting from uncalibrated image pairs. arXiv preprint arXiv:2408.13912 (2024)

  41. [41]

    In: ECCV

    Suhail, M., Esteves, C., Sigal, L., Makadia, A.: Generalizable patch-based neural rendering. In: ECCV. pp. 156–174 (2022)

  42. [42]

    In: CVPR

    Sun, X., Jiang, H., Liu, L., et al.: Uni3r: Unified 3d reconstruction and semantic un- derstanding via generalizable gaussian splatting from unposed multi-view images. In: CVPR. pp. 33280–33290 (2026)

  43. [43]

    In: ECCV

    Tang, J., Chen, Z., Chen, X., Wang, T., Zeng, G., Liu, Z.: Lgm: Large multi-view gaussian model for high-resolution 3d content creation. In: ECCV. pp. 1–18 (2024)

  44. [44]

    In: ICCV

    Wang, G., Chen, Z., Loy, C.C., Liu, Z.: Sparsenerf: Distilling depth ranking for few-shot novel view synthesis. In: ICCV. pp. 9065–9076 (2023)

  45. [45]

    In: CVPR

    Wang, J., Chen, M., Karaev, N., Vedaldi, A., Rupprecht, C., Novotny, D.: VGGT: Visual geometry grounded transformer. In: CVPR. pp. 5294–5306 (2025)

  46. [46]

    In: CVPR

    Wang, J., Karaev, N., Rupprecht, C., Novotny, D.: VGGSfM: Visual geometry grounded deep structure from motion. In: CVPR. pp. 21686–21697 (2024)

  47. [47]

    ICLR (2024)

    Wang, P., Tan, H., Bi, S., Xu, Y., Luan, F., Sunkavalli, K., Wang, W., Xu, Z., Zhang, K.: Pf-lrm: Pose-free large reconstruction model for joint pose and shape prediction. ICLR (2024)

  48. [48]

    IEEE TIP13(4), 600–612 (2004)

    Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment: from error visibility to structural similarity. IEEE TIP13(4), 600–612 (2004)

  49. [49]

    arXiv preprint arXiv:2102.07064 (2022)

    Wang, Z., Wu, S., Xie, W., Chen, M., Prisacariu, V.A.: NeRF–: Neural radiance fields without known camera parameters. arXiv preprint arXiv:2102.07064 (2022)

  50. [50]

    In: ICCV

    Xu, G., Yin, W., Chen, H., Shen, C., Cheng, K., Zhao, F.: FrozenRecon: Pose-free 3d scene reconstruction with frozen depth models. In: ICCV. pp. 9276–9286 (2023)

  51. [51]

    Zhao et al

    Xu, H., Peng, S., Wang, F., Blum, H., Barath, D., Geiger, A., Pollefeys, M.: Depth- Splat:Connectinggaussiansplattinganddepth.In:CVPR.pp.16453–16463(2025) 18 J.-C. Zhao et al

  52. [52]

    In: AAAI

    Yan, Q., Wang, Q., Zhao, K., Chen, J., Li, B., Chu, X., Deng, F.: CF-NeRF: Camera parameter free neural radiance fields with incremental learning. In: AAAI. vol. 38, pp. 6440–6448 (2024)

  53. [53]

    arXiv preprint arXiv:2410.24207 (2024)

    Ye, B., Liu, S., Xu, H., Li, X., Pollefeys, M., Yang, M.H., Peng, S.: No pose, no problem: Surprisingly simple 3d gaussian splats from sparse unposed images. arXiv preprint arXiv:2410.24207 (2024)

  54. [54]

    In: IROS

    Yen-Chen,L.,Florence,P.,Barron,J.T.,Rodriguez,A.,Isola,P.,Lin,T.Y.:iNeRF: Inverting neural radiance fields for pose estimation. In: IROS. pp. 1323–1330 (2021)

  55. [55]

    In: CVPR

    Yu, Z., Chen, A., Huang, B., Sattler, T., Geiger, A.: Mip-splatting: Alias-free 3d gaussian splatting. In: CVPR. pp. 19447–19456 (2024)

  56. [56]

    In: ECCV

    Zhang, K., Bi, S., Tan, H., Xiangli, Y., Zhao, N., Sunkavalli, K., Xu, Z.: GS-LRM: Large reconstruction model for 3d gaussian splatting. In: ECCV. pp. 1–19 (2024)

  57. [57]

    In: CVPR

    Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR. pp. 586–595 (2018)

  58. [58]

    In: CVPR

    Zhang, S., Wang, J., Xu, Y., Xue, N., Rupprecht, C., Zhou, X., Shen, Y., Wet- zstein, G.: Flare: Feed-forward geometry, appearance and camera estimation from uncalibrated sparse views. In: CVPR. pp. 21936–21947 (2025)

  59. [59]

    In: IROS

    Zhang, Z., Scaramuzza, D.: A tutorial on quantitative trajectory evaluation for visual (-inertial) odometry. In: IROS. pp. 7244–7251 (2018)

  60. [60]

    Zhou, T., Tucker, R., Flynn, J., Fyffe, G., Snavely, N.: Stereo magnification: Learn- ingviewsynthesisusingmultiplaneimages.arXivpreprintarXiv:1805.09817(2018)

  61. [61]

    In: ICCV

    Ziwen, C., Tan, H., Zhang, K., Bi, S., Luan, F., Hong, Y., Fuxin, L., Xu, Z.: Long- LRM: Long-sequence large reconstruction model for wide-coverage gaussian splats. In: ICCV. pp. 4349–4359 (2025) StructSplat 19 A Implementation and Architecture Details Gaussian Activations.As summarized in Table 7, the output activations of different attribute heads in th...