FastPano3D: Feed-Forward Indoor Panoramic 3D Reconstruction from a Single Image

Di Lu; Hanchi Ren; Jianqiang Li; Jingjing Deng; Liumei Zhang; Tianlong Feng; Wenjia Guo; Yongzhi Liao

arXiv: 2606.30352 · v1 · pith:FGNMZ7JVnew · submitted 2026-06-29 · cs.CV

Jianqiang Li, Liumei Zhang, Wenjia Guo, Tianlong Feng, Yongzhi Liao, Di Lu, Hanchi Ren, Jingjing Deng

Pith reviewed 2026-06-30 06:24 UTC · model grok-4.3

keywords: panoramic image, 3D reconstruction, Gaussian representation, feed-forward network, indoor scene, single image, fast inference, renderable model

The pith

A single panoramic image can produce a renderable 3D Gaussian scene model in seconds using only feed-forward processing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents FastPano3D as an end-to-end network that converts one indoor panoramic image into a set of 3D Gaussians ready for rendering. It compensates for the characteristic distortions and uneven sampling of equirectangular projections through a lightweight encoder, adaptive sampling of Gaussians, and refinement driven by an initial point cloud. This design removes the need for multi-view inputs or any per-scene optimization at test time. The result is reconstruction that runs in seconds while using roughly half the parameters of earlier approaches and delivering rendering quality on par with slower, optimization-based techniques.
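To make the "uneven sampling" of equirectangular projections concrete, the sketch below computes the solid angle covered by a single pixel in each image row. It illustrates the projection geometry only and is not code from the paper; the function name `row_solid_angles` and the 512×1024 resolution are assumptions chosen for the example.

```python
import math


def row_solid_angles(height: int, width: int) -> list:
    """Solid angle (steradians) of one pixel in each row of an
    equirectangular image with the given height and width."""
    d_phi = 2.0 * math.pi / width   # longitude covered by one pixel
    d_theta = math.pi / height      # latitude covered by one pixel
    weights = []
    for row in range(height):
        # Latitude band of this row, from +pi/2 (top) down to -pi/2 (bottom).
        lat_top = math.pi / 2.0 - row * d_theta
        lat_bottom = lat_top - d_theta
        # Exact integral of cos(latitude) over the band, times the longitude step.
        weights.append(d_phi * (math.sin(lat_top) - math.sin(lat_bottom)))
    return weights


# Hypothetical 512x1024 panorama.
weights = row_solid_angles(height=512, width=1024)
total = sum(w * 1024 for w in weights)   # should be close to 4*pi (full sphere)
ratio = weights[256] / weights[0]        # equator-row pixel vs. top-row pixel
print(f"total solid angle: {total:.6f}, equator/pole ratio: {ratio:.1f}")
```

Under these assumptions a pixel in the equator row covers a few hundred times more solid angle than a pixel in the top row, which is the kind of imbalance a per-pixel Gaussian generator has to compensate for.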

Core claim

FastPano3D directly generates renderable 3D Gaussian representations from a single panoramic image by means of a lightweight feature encoder, adaptive Gaussian sampling, and a point-cloud-guided refinement strategy, achieving high-fidelity indoor scene reconstruction without test-time optimization.

What carries the argument

Lightweight feature encoder with adaptive Gaussian sampling and point-cloud-guided refinement that produces 3D Gaussians directly from one equirectangular image.

If this is right

Indoor 3D models become available from ordinary single-shot panoramic captures without extra views or computation at inference.
Deployment on resource-limited devices becomes practical because model size and run time are both reduced.
Real-time or near-real-time 3D scene generation from live panoramic video feeds becomes feasible.
Data-capture requirements for 3D reconstruction drop because only a single panoramic image is needed at inference.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same compensation strategy might allow feed-forward reconstruction from other wide-field or distorted sensors such as fisheye cameras.
If the adaptive sampling proves robust, similar single-image pipelines could be tested on outdoor or dynamic scenes where multi-view capture is costly.
The absence of test-time optimization opens the possibility of embedding the model inside graphics pipelines that expect immediate 3D output.

Load-bearing premise

The distortions and spatially varying feature densities of panoramic images can be corrected well enough by the lightweight encoder and adaptive sampling so that no multi-view data or scene-specific optimization is required.

What would settle it

Run the model on a panoramic image whose equirectangular distortion is artificially increased beyond the training distribution and measure whether rendering quality collapses relative to a multi-view baseline on the same scene.
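One way to quantify "rendering quality collapses" in such a test is to compare a standard image metric between the two methods on the same held-out view. The sketch below uses PSNR as an example; this page does not state which metrics the paper reports, and the function name `psnr` and the tiny pixel lists are hypothetical stand-ins for real rendered images.

```python
import math


def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio (dB) between two equally sized
    flat sequences of pixel intensities in [0, max_value]."""
    if len(rendered) != len(reference) or len(rendered) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(rendered)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical pixel values for one held-out view.
reference = [0.2, 0.5, 0.8, 1.0]
feed_forward_render = [0.2, 0.5, 0.8, 0.9]   # stand-in for the single-image model
baseline_render = [0.2, 0.5, 0.8, 0.95]      # stand-in for a multi-view baseline

psnr_model = psnr(feed_forward_render, reference)
psnr_baseline = psnr(baseline_render, reference)
gap_db = psnr_baseline - psnr_model  # a gap that grows with added distortion would signal collapse
print(f"model: {psnr_model:.2f} dB, baseline: {psnr_baseline:.2f} dB, gap: {gap_db:.2f} dB")
```

Tracking `gap_db` as the synthetic distortion increases would show whether quality degrades gradually or falls off sharply outside the training distribution.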

Figures

Figures reproduced from arXiv: 2606.30352 by Di Lu, Hanchi Ren, Jianqiang Li, Jingjing Deng, Liumei Zhang, Tianlong Feng, Wenjia Guo, Yongzhi Liao.

**Figure 1.** In this paper, we present FastPano3D, an ultra-fast end-to-end generative model for Gaussian Splatting that can reconstruct high-fidelity scenes from a single panoramic image in just a few seconds (achieving up to 156× speed-up). In the figure, we showcase the qualitative performance using various indoor scenes, such as (a) “Bedroom”, (b) “Dining Room” and (c) “Study Room”.

**Figure 2.** Overview of FastPano3D. Given a single panoramic image, FastPano3D first employs EGformer to predict a dense depth map, which is then lifted into a point cloud to extract geometric keypoints as guidance. A lightweight Feature Encoder extracts multi-scale features from the panorama, which are decoded by the Gaussian Generator into per-Gaussian attributes. The Scale & Texture Analysis module estimates the t…

**Figure 3.** Architecture of CamPosNet. Given a panoramic image, a fine-tuned ResNet50 extracts backbone …

**Figure 4.** Candidate Keypoints. Keypoints are selected based on texture and geometric edges to provide …

**Figure 5.** Cubemap rendering. The scene Gaussians are rasterized onto six cube faces via fixed perspective …

**Figure 6.** Qualitative panoramic comparison with other methods. For each method, we show the panoramic …

**Figure 7.** Qualitative perspective comparison with other methods. Consistent with the panoramic results, …

**Figure 8.** Novel-view rendering results across additional indoor scenes. The top row presents ground-truth …

**Figure 9.** Comparison of Gaussian sampling strategies. Without the Gauss-Generator, per-pixel sampling …
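The Figure 2 caption mentions lifting a predicted panoramic depth map into a point cloud. A minimal sketch of that generic operation follows. It is not the authors' implementation: the function name `equirect_depth_to_points`, the axis convention (y up, z pointing at the image center), and the reading of depth as radial distance from the camera center are all assumptions made for illustration.

```python
import math


def equirect_depth_to_points(depth):
    """Lift an equirectangular depth map (list of rows, H x W) to 3D points.

    Assumes each depth value is the radial distance from the camera center
    and that non-positive values mark invalid pixels.
    """
    height = len(depth)
    width = len(depth[0])
    points = []
    for v, row in enumerate(depth):
        # Latitude of the pixel center: +pi/2 at the top, -pi/2 at the bottom.
        lat = math.pi / 2.0 - (v + 0.5) * math.pi / height
        for u, r in enumerate(row):
            if r <= 0.0:
                continue  # skip invalid depth
            # Longitude of the pixel center in [-pi, pi), zero at the image center.
            lon = (u + 0.5) * 2.0 * math.pi / width - math.pi
            x = r * math.cos(lat) * math.sin(lon)
            y = r * math.sin(lat)
            z = r * math.cos(lat) * math.cos(lon)
            points.append((x, y, z))
    return points


# Hypothetical 2x4 depth map with one invalid pixel.
depth_map = [
    [2.0, 2.0, 2.0, 2.0],
    [2.0, 0.0, 2.0, 2.0],
]
points = equirect_depth_to_points(depth_map)
print(len(points), "points lifted")
```

Every lifted point lies at its depth value from the origin, so a constant-depth map produces points on a sphere; keypoint selection and any refinement step described in the paper would operate on a cloud like this.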

Original abstract

Recent advances in 3D scene reconstruction have highlighted the intricate trade-offs among rendering quality, inference efficiency, and data dependency. To address the challenge of rapidly reconstructing detailed 3D indoor scenes from minimal input, we introduce FastPano3D, an end-to-end framework that directly generates renderable 3D Gaussian representations from a single panoramic image. Unlike perspective-based methods, panoramic images inherently suffer from equirectangular projection distortions and spatially non-uniform feature distributions, making direct feed-forward Gaussian generation particularly challenging. In contrast to existing Gaussian Splatting based methods that rely on multi-view supervision or per-scene optimization, FastPano3D employs a lightweight feature encoder, adaptive Gaussian sampling, and a point-cloud-guided refinement strategy to achieve efficient and accurate scene generation without any test-time optimization. Our approach reconstructs high-fidelity 3D scenes within seconds, achieving up to 156 times faster inference than prior state-of-the-art methods such as Pano2Room, while using only half the parameters. Extensive experiments demonstrate that FastPano3D delivers rendering quality comparable to NeRF- and 3DGS-based reconstructions, establishing a new benchmark for rapid, single-view 3D scene inference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

FastPano3D claims a fast single-panorama feed-forward Gaussian pipeline, but the abstract supplies no metrics or experimental details against which to check the 156× speedup or the quality claims.

The core idea is a lightweight encoder plus adaptive Gaussian sampling plus point-cloud refinement that turns one equirectangular indoor image straight into renderable 3D Gaussians with no test-time optimization. That combination is presented as the new piece relative to multi-view or per-scene 3DGS baselines.

The paper does a clear job naming the practical pain point: equirectangular distortion and uneven feature density make direct feed-forward generation harder than perspective views, and most existing Gaussian methods still lean on optimization or multiple inputs. Targeting seconds-scale inference for robotics or AR use cases is a reasonable engineering goal.

The obvious soft spot is that the abstract states large speed and parameter wins plus comparable rendering quality but supplies zero quantitative results, error bars, dataset names, or ablation tables. Without those, the central claim that the proposed components actually solve the distortion problem cannot be checked. The reader's soundness score of 3.0 matches what is visible.

This is aimed at practitioners who need quick single-view indoor reconstructions rather than researchers chasing new theoretical bounds. If the full paper contains solid experiments and comparisons, it could be worth a look for applied work; right now the evidence is too thin to cite or build on. I would send it out for review so the experiments can be examined, but it is not yet a finished contribution on the strength of the abstract alone.

Referee Report

0 major / 2 minor

Summary. The paper introduces FastPano3D, an end-to-end feed-forward framework that generates renderable 3D Gaussian representations directly from a single equirectangular panoramic image for indoor scenes. It employs a lightweight feature encoder, adaptive Gaussian sampling, and point-cloud-guided refinement to handle projection distortions and non-uniform features without multi-view supervision or test-time optimization, claiming up to 156x faster inference than Pano2Room (with half the parameters) and rendering quality comparable to NeRF- and 3DGS-based methods.

Significance. If the quantitative claims hold, the work would be significant for enabling practical, rapid single-view panoramic 3D reconstruction, substantially improving inference speed over optimization-heavy baselines while maintaining competitive fidelity. This could support real-time applications in AR/VR and robotics; the engineering focus on panoramic-specific challenges via adaptive sampling represents a useful incremental advance.

minor comments (2)

[Abstract] Abstract: the claim of 'extensive experiments' and specific performance numbers (156x speedup, half the parameters, comparable quality) is stated without any supporting metrics, dataset names, error bars, or ablation results in the provided text; this weakens immediate verifiability of the central performance claims even though the method description itself is internally consistent.
[Method (inferred from abstract)] The description of the adaptive Gaussian sampling strategy would benefit from an explicit equation or pseudocode showing how sampling density is adjusted for equirectangular distortion; without it, the compensation mechanism remains somewhat opaque.
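As a point of reference for the second comment, the sketch below shows one generic way sampling density can be tied to equirectangular geometry: rows are drawn with probability proportional to the cosine of their latitude, so samples are roughly uniform on the sphere rather than on the image grid. This is not the paper's adaptive Gaussian sampling, which the abstract does not specify; the function name `sample_pixels_area_weighted` and all parameters are hypothetical.

```python
import math
import random


def sample_pixels_area_weighted(height, width, count, seed=0):
    """Draw `count` pixel coordinates (row, col) from an equirectangular grid,
    with row probability proportional to cos(latitude) of the row center."""
    rng = random.Random(seed)
    # cos(latitude) shrinks toward the poles, matching the shrinking solid angle.
    row_weights = [
        math.cos(math.pi / 2.0 - (v + 0.5) * math.pi / height)
        for v in range(height)
    ]
    rows = rng.choices(range(height), weights=row_weights, k=count)
    # Columns are uniform because longitude is sampled evenly within a row.
    return [(v, rng.randrange(width)) for v in rows]


# Hypothetical 64x128 grid with 2000 samples.
samples = sample_pixels_area_weighted(height=64, width=128, count=2000)
equatorial = sum(1 for v, _ in samples if 16 <= v < 48)  # middle half of the rows
polar = len(samples) - equatorial
print(f"equatorial samples: {equatorial}, polar samples: {polar}")
```

A content-adaptive scheme would presumably replace or modulate these geometric weights with learned or texture-driven ones; an explicit formula in the manuscript would make clear which is the case.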

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work and the recommendation of minor revision. The provided referee report contains no specific major comments to address.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The manuscript describes an end-to-end neural architecture (lightweight encoder + adaptive sampling + refinement) whose performance claims rest on empirical benchmarks rather than any closed-form derivation or self-referential definition. No equations appear that equate a claimed output to a fitted input by construction, and no load-bearing self-citations or uniqueness theorems are invoked. The central engineering claim therefore remains independent of its own fitted parameters.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly relies on standard assumptions of 3D Gaussian Splatting and neural feature extraction.

axioms (1)

Domain assumption: A single panoramic image contains sufficient geometric information for high-fidelity 3D reconstruction when processed by the described encoder and sampler.
This premise underpins the entire feed-forward claim and is stated as the motivation for handling equirectangular distortions.

pith-pipeline@v0.9.1-grok · 5773 in / 1159 out tokens · 31590 ms · 2026-06-30T06:24:55.940164+00:00 · methodology

Reference graph

Works this paper leans on

38 extracted references · 17 canonical work pages · 2 internal anchors

[1] B. Mildenhall, P. P. Srinivasan, M. Tancik, J. T. Barron, R. Ramamoorthi, R. Ng, NeRF: Representing scenes as neural radiance fields for view synthesis, in: Eur. Conf. Comput. Vis., 2020, pp. 405–421.
[2] B. Kerbl, G. Kopanas, T. Leimkühler, G. Drettakis, 3D Gaussian splatting for real-time radiance field rendering, ACM Trans. Graph. 42 (4) (2023) 1–14.
[3] D. Malarz, J. Tabor, S. Tadeja, P. Spurek, Gaussian splatting with NeRF-based color and opacity (2024). arXiv:2312.13729.
[4] P. Guo, Y. Zhao, J. Hu, Pano2room: Novel view synthesis from a single indoor panorama, in: SIGGRAPH Asia 2024 Conference Papers, 2024, pp. 1–12.
[5] S. Szymanowicz, C. Rupprecht, A. Vedaldi, Splatter image: Ultra-fast single-view 3D reconstruction, in: IEEE Conf. Comput. Vis. Pattern Recog., 2024, pp. 10208–10217.
[6] D. Charatan, S. L. Li, A. Tagliasacchi, V. Sitzmann, pixelSplat: 3D Gaussian splats from image pairs for scalable generalizable 3D reconstruction, in: IEEE Conf. Comput. Vis. Pattern Recog., 2024, pp. 19457–19467.
[7] M. Tatarchenko, A. Dosovitskiy, T. Brox, Multi-view 3D models from single images with a convolutional network, in: Eur. Conf. Comput. Vis., 2016, pp. 322–337.
[8] H. Xie, H. Yao, X. Sun, S. Zhou, S. Zhang, Pix2vox: Context-aware 3D reconstruction from single and multi-view images, in: Int. Conf. Comput. Vis., 2019, pp. 2690–2698.
[9] O. Wiles, G. Gkioxari, R. Szeliski, J. Johnson, SynSin: End-to-end view synthesis from a single image, in: IEEE Conf. Comput. Vis. Pattern Recog., 2020, pp. 7467–7477.
[10] R. Ranftl, K. Lasinger, D. Hafner, K. Schindler, V. Koltun, Towards robust monocular depth estimation: Mixing datasets for zero-shot cross-dataset transfer, IEEE Trans. Pattern Anal. Mach. Intell. 44 (3) (2022) 1623–1637.
[11] R. Ranftl, A. Bochkovskiy, V. Koltun, Vision transformers for dense prediction, in: Int. Conf. Comput. Vis., 2021, pp. 12179–12188.
[12] C. Godard, O. Mac Aodha, G. J. Brostow, Unsupervised monocular depth estimation with left-right consistency, in: IEEE Conf. Comput. Vis. Pattern Recog., 2017, pp. 270–279.
[13] C. Godard, O. Mac Aodha, M. Firman, G. J. Brostow, Digging into self-supervised monocular depth estimation, in: Int. Conf. Comput. Vis., 2019, pp. 3828–3838.
[14] Z. Chen, C. Wang, Y.-C. Guo, S.-H. Zhang, StructNeRF: Neural radiance fields for indoor scenes with structural hints, IEEE Trans. Pattern Anal. Mach. Intell. 45 (12) (2023) 15694–15705. doi:10.1109/TPAMI.2023.3305295.
[15] C. Zhao, X. Huang, K. Yang, X. Wang, Q. Wang, Generalizable 3D Gaussian splatting for novel view synthesis, Pattern Recognition 161 (2025) 111271. doi:10.1016/j.patcog.2024.111271.
[16] J. Xu, B. Stenger, T. Kerola, T. Tung, Pano2CAD: Room layout from a single panorama image (2016). arXiv:1609.09270.
[17] C. Zou, A. Colburn, Q. Shan, D. Hoiem, LayoutNet: Reconstructing the 3D room layout from a single RGB image, in: IEEE Conf. Comput. Vis. Pattern Recog., 2018, pp. 2051–2059.
[18] C. Sun, C.-W. Hsiao, M. Sun, H.-T. Chen, HorizonNet: Learning room layout with 1D representation and pano stretch data augmentation, in: IEEE Conf. Comput. Vis. Pattern Recog., 2019, pp. 1047–1056.
[19] Y. Zhang, S. Song, P. Tan, J. Xiao, PanoContext: A whole-room 3D context model for panoramic scene understanding, in: Eur. Conf. Comput. Vis., 2014, pp. 668–686.
[20] F.-E. Wang, Y.-H. Yeh, M. Sun, W.-C. Chiu, Y.-H. Tsai, BiFuse: Monocular 360 depth estimation via bi-projection fusion, in: IEEE Conf. Comput. Vis. Pattern Recog., 2020, pp. 459–468.
[21] N. Zioulis, A. Karakottas, D. Zarpalas, P. Daras, OmniDepth: Dense depth estimation for indoors spherical panoramas, in: Eur. Conf. Comput. Vis., 2018, pp. 448–465.
[22] G. Wang, P. Wang, Z. Chen, W. Wang, C. C. Loy, Z. Liu, PERF: Panoramic neural radiance field from a single panorama, IEEE Trans. Pattern Anal. Mach. Intell. 46 (10) (2024) 6905–6918. doi:10.1109/TPAMI.2024.3387307.
[23] Z. Lu, Q. Zheng, B. Shi, X. Jiang, Pano-NeRF: Synthesizing high dynamic range novel views with geometry from sparse low dynamic range panoramic images (2024). arXiv:2312.15942.
[24] X. Sun, A. Dai, Y.-C. Guo, PanoGRF: Generalizable spherical radiance fields for wide-baseline panoramas (2023). arXiv:2306.01531.
[25] S. Lee, J. Chung, J. Huh, K. M. Lee, ODGS: 3D scene reconstruction from omnidirectional images with 3D Gaussian splattings (2024). arXiv:2410.20686.
[26] L. Li, H. Huang, S.-K. Yeung, H. Cheng, OmniGS: Fast radiance field reconstruction using omnidirectional Gaussian splatting (2024). arXiv:2404.03202.
[27] C. Zhang, H. Xu, Q. Li, et al., PanSplat: 4K panorama synthesis with feed-forward Gaussian splatting (2024). arXiv:2412.12096.
[28] Y. Ma, D. Zhan, Z. Jin, FastScene: Text-driven fast 3D indoor scene generation via panoramic Gaussian splatting, in: Proc. Thirty-Third Int. Joint Conf. Artificial Intelligence (IJCAI-24), 2024, pp. 1173–1181. doi:10.24963/ijcai.2024/130.
[29] W. Li, F. Cai, Y. Mi, et al., SceneDreamer360: Text-driven 3D-consistent scene generation with panoramic Gaussian splatting (2024). arXiv:2408.13711.
[30] Z. Huang, J. He, J. Ye, et al., Scene4U: Hierarchical layered 3D scene reconstruction from single panoramic image (2025). arXiv:2504.00387.
[31] I. Yun, C. Shin, H. Lee, H.-J. Lee, C. E. Rhee, EGformer: Equirectangular geometry-biased transformer for 360 depth estimation, in: Int. Conf. Comput. Vis., 2023, pp. 3738–3748.
[32] J. Zheng, J. Zhang, J. Li, R. Tang, S. Gao, Z. Zhou, Structured3D: A large photo-realistic dataset for structured 3D modeling, in: Eur. Conf. Comput. Vis., 2020, pp. 519–535.
[33] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: IEEE Conf. Comput. Vis. Pattern Recog., 2016, pp. 770–778.
[34] J. L. Schönberger, J.-M. Frahm, Structure-from-motion revisited, in: IEEE Conf. Comput. Vis. Pattern Recog., 2016, pp. 4104–4113.
[35] Y. Wan, M. Shao, Y. Cheng, W. Zuo, S2Gaussian: Sparse-view super-resolution 3D Gaussian splatting (2025). arXiv:2503.04314.
[36] J. Straub, T. Whelan, L. Ma, et al., The Replica dataset: A digital replica of indoor spaces (2019). arXiv:1906.05797.
[37] J. Chung, S. Lee, H. Nam, J. Lee, K. M. Lee, LucidDreamer: Domain-free generation of 3D Gaussian splatting scenes (2023). arXiv:2311.13384.
[38] J. Bai, L. Huang, J. Guo, W. Gong, Y. Li, Y. Guo, 360-GS: Layout-guided panoramic Gaussian splatting for indoor roaming (2024). arXiv:2402.00763.
