PanoWorld: A Generative Spatial World Model for Consistent Whole-House Panorama Synthesis
Pith reviewed 2026-05-20 11:31 UTC · model grok-4.3
The pith
PanoWorld generates consistent whole-house panoramas by decoupling shell-based geometry from visual memory cache.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PanoWorld treats whole-house synthesis as autoregressive generation of node-based 360-degree panoramas. It uses a floorplan-derived 3D shell as a global geometric proxy and a dynamic 3D Gaussian Splatting cache as renderable spatial memory. A feed-forward panoramic LRM lifts panoramas into local 3DGS updates, Room-aware Group Attention suppresses cross-room interference, and topology-aware progressive caching fuses updates without full reconstruction. By decoupling shell-based geometry guidance from cache-rendered visual memory, the model preserves high-frequency 2D synthesis quality while improving cross-node layout and material consistency.
What carries the argument
Decoupling of a floorplan-derived 3D shell for geometry guidance from a dynamic 3D Gaussian Splatting cache for visual memory, combined with room-aware attention and progressive caching.
Load-bearing premise
The floorplan-derived 3D shell provides a sufficient global geometric proxy to maintain spatial coherence across multiple rooms without requiring full metric 3D reconstruction or additional depth sensors.
What would settle it
Generating panoramas for a complex multi-room floorplan and measuring if adjacent room views show matching layouts, door positions, and material properties when the viewpoint shifts.
Figures
read the original abstract
Generating a consistent whole-house VR tour from a floorplan and style reference requires both photorealistic panoramas and cross-view spatial coherence. Pure 2D generators produce appealing single panoramas but re-imagine geometry and materials when the viewpoint changes, whereas monolithic 3D generation becomes expensive and loses fine texture at multi-room scale. We introduce PanoWorld, a generative spatial world model that treats whole-house synthesis as autoregressive generation of node-based 360-degree panoramas, matching the discrete navigation used by real VR tour products. PanoWorld uses a floorplan-derived 3D shell as a global geometric proxy and a dynamic 3D Gaussian Splatting cache as renderable spatial memory. A feed-forward panoramic LRM designed for metric-scale multi-room 360-degree inputs lifts generated panoramas into local 3DGS updates, while Room-aware Group Attention suppresses cross-room feature interference. A topology-aware progressive caching strategy fuses these local updates without repeatedly reconstructing the full history. By decoupling shell-based geometry guidance from cache-rendered visual memory, PanoWorld preserves high-frequency 2D synthesis quality while improving cross-node layout and material consistency. The project link is https://jjrcn.github.io/PanoWorld-project-home/
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PanoWorld, a generative spatial world model for synthesizing consistent whole-house 360-degree panoramas from a floorplan and style reference. It frames the task as autoregressive node-based panorama generation, using a floorplan-derived 3D shell as global geometric proxy, a dynamic 3D Gaussian Splatting cache as renderable spatial memory, a feed-forward panoramic LRM for local metric-scale updates, Room-aware Group Attention to suppress cross-room interference, and a topology-aware progressive caching strategy to fuse updates without full history reconstruction. The central contribution is decoupling shell-based geometry guidance from cache-rendered visual memory to preserve high-frequency 2D synthesis quality while improving cross-node layout and material consistency.
Significance. If the central claims hold under quantitative validation, PanoWorld offers a promising scalable alternative to monolithic 3D generation or pure 2D synthesis for VR tour content, balancing photorealism with spatial coherence at house scale. The decoupling strategy and use of 3DGS cache with LRM updates represent a thoughtful architectural choice that could influence future work on multi-view generative consistency; the introduction of Room-aware Group Attention and topology-aware caching are specific technical contributions worth further exploration if supported by evidence.
major comments (3)
- Abstract and Experiments: The manuscript describes the architecture and intended benefits but supplies no quantitative results, ablation studies, error metrics, or baseline comparisons. This absence is load-bearing for the central claim of improved cross-node consistency, as the benefits of the proposed decoupling and attention mechanisms remain unverified.
- Methods (floorplan-derived 3D shell): The assumption that a standard floorplan-derived 3D shell (typically 2D polygons extruded to uniform height) provides a sufficient global geometric proxy for cross-room coherence is not accompanied by validation or discussion of limitations. This is critical because mismatches in non-Manhattan layouts, multi-height rooms, sloped roofs, or varying wall/door heights could propagate through the autoregressive generation and topology-aware cache, undermining the consistency gains.
- Methods (Room-aware Group Attention and LRM updates): The claim that local LRM updates combined with Room-aware Group Attention maintain layout and material coherence without full metric reconstruction lacks concrete analysis of how these components interact with the proxy shell under partial mismatches, which is necessary to establish the decoupling's effectiveness.
minor comments (2)
- Abstract: The project link is given, but the text would benefit from a brief statement on the scope of the evaluation (e.g., number of houses or room types tested) to set reader expectations.
- Notation and terminology: The novel terms 'Room-aware Group Attention' and 'topology-aware progressive caching strategy' are introduced without immediate contrast to standard multi-head attention or FIFO caching; a short clarifying paragraph or diagram in the methods would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We agree that quantitative validation and expanded methodological analysis are important to substantiate the claims. We address each major comment below and will incorporate the suggested revisions in the next version of the manuscript.
read point-by-point responses
-
Referee: Abstract and Experiments: The manuscript describes the architecture and intended benefits but supplies no quantitative results, ablation studies, error metrics, or baseline comparisons. This absence is load-bearing for the central claim of improved cross-node consistency, as the benefits of the proposed decoupling and attention mechanisms remain unverified.
Authors: We acknowledge that the current manuscript does not include quantitative results, ablations, or baseline comparisons, which limits the strength of the consistency claims. The initial submission prioritized describing the novel architecture and its components. In the revision we will add a dedicated Experiments section containing quantitative metrics for cross-node layout and material consistency, ablation studies on the Room-aware Group Attention and dynamic 3DGS cache, and comparisons against 2D autoregressive panorama generators as well as monolithic 3D scene synthesis baselines. These additions will directly support the central claims. revision: yes
-
Referee: Methods (floorplan-derived 3D shell): The assumption that a standard floorplan-derived 3D shell (typically 2D polygons extruded to uniform height) provides a sufficient global geometric proxy for cross-room coherence is not accompanied by validation or discussion of limitations. This is critical because mismatches in non-Manhattan layouts, multi-height rooms, sloped roofs, or varying wall/door heights could propagate through the autoregressive generation and topology-aware cache, undermining the consistency gains.
Authors: We agree that the limitations of the floorplan-derived 3D shell require explicit discussion. The shell is used as a lightweight global proxy rather than an exact reconstruction. In the revised Methods section we will add a subsection addressing its assumptions and potential shortcomings for non-Manhattan layouts, multi-height rooms, and sloped structures. We will also explain how the local LRM updates and progressive 3DGS caching limit error propagation by focusing on visual and local geometric corrections. Additional qualitative examples on diverse floorplan types will be included. revision: yes
-
Referee: Methods (Room-aware Group Attention and LRM updates): The claim that local LRM updates combined with Room-aware Group Attention maintain layout and material coherence without full metric reconstruction lacks concrete analysis of how these components interact with the proxy shell under partial mismatches, which is necessary to establish the decoupling's effectiveness.
Authors: We recognize the need for more detailed analysis of component interactions. In the revision we will expand the Methods section with additional diagrams and explanations of the information flow between the Room-aware Group Attention, LRM updates, and the 3D shell proxy. This will include discussion of how attention suppresses cross-room interference and how topology-aware caching integrates local updates under partial geometric mismatches. The expanded analysis will better demonstrate the robustness of the decoupling strategy. revision: yes
Circularity Check
No circularity: method integrates external components without self-referential reduction
full rationale
The paper describes an architectural pipeline that combines a floorplan-derived 3D shell (external geometric proxy), dynamic 3D Gaussian Splatting cache, feed-forward panoramic LRM, Room-aware Group Attention, and topology-aware caching. These are presented as standard integrations of existing techniques (3DGS, LRM) applied to autoregressive node generation, with no equations or steps where a claimed prediction or consistency metric is fitted to itself or defined circularly in terms of the target output. No self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or renaming reduces the central claim to prior author work by construction. The derivation remains self-contained, relying on design choices whose validity can be assessed against external benchmarks rather than internal parameter fitting.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Floorplan-derived 3D shell serves as adequate global geometric proxy for multi-room coherence
invented entities (2)
-
Room-aware Group Attention
no independent evidence
-
topology-aware progressive caching strategy
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
PanoWorld uses a floorplan-derived 3D shell as a global geometric proxy and a dynamic 3D Gaussian Splatting cache as renderable spatial memory... Room-aware Group Attention suppresses cross-room feature interference.
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.leanLogicNat recovery theorem unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
topology-aware progressive 3DGS caching... autoregressive generation of node-based 360-degree panoramas
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
pixelsplat: 3d gaus- sian splats from image pairs for scalable generalizable 3d reconstruction
David Charatan, Sizhe Li, Andrea Sun, Jonathon Luiten, Gordon Wetzstein, and Leonidas Smith. pixelsplat: 3d gaus- sian splats from image pairs for scalable generalizable 3d reconstruction. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 25828–25838, 2024. 3
work page 2024
-
[2]
Dreamhome-pano: Design-aware and conflict-free panoramic interior generation, 2026
Lulu Chen, Yijiang Hu, Yuanqing Liu, Yulong Li, and Yue Yang. Dreamhome-pano: Design-aware and conflict-free panoramic interior generation, 2026. 6, 7
work page 2026
-
[3]
Zhaoxi Chen, Guangcong Wang, and Ziwei Liu. Text2light: Zero-shot text-driven hdr panorama generation.ACM Trans- actions on Graphics (TOG), 41(6):1–16, 2022. 2
work page 2022
-
[4]
Graph-to-3d: End-to-end generation and ma- nipulation of 3d scenes using scene graphs
Helisa Dhamo, Fabian Bobrovsky, Nassir Navab, and Fed- erico Tombari. Graph-to-3d: End-to-end generation and ma- nipulation of 3d scenes using scene graphs. InProceedings of the IEEE/CVF International Conference on Computer Vi- sion (ICCV), pages 16352–16361, 2021. 3
work page 2021
-
[5]
Diffusion360: Seamless 360 degree panoramic image generation based on diffusion models, 2023
Mengyang Feng, Jinlin Liu, Miaomiao Cui, and Xuansong Xie. Diffusion360: Seamless 360 degree panoramic image generation based on diffusion models, 2023. 2
work page 2023
-
[6]
Scenescape: Text-driven consistent scene gener- ation
Rafail Fridman, Amit Carmeli, Tali Dekel, and Tomer Michaeli. Scenescape: Text-driven consistent scene gener- ation. InSIGGRAPH Asia 2023 Conference Papers, pages 1–10, 2023. 2
work page 2023
-
[7]
3d-front: 3d furnished rooms with layouts and semantics, 2021
Huan Fu, Bowen Cai, Lin Gao, Lingxiao Zhang, Jiaming Wang Cao Li, Zengqi Xun, Chengyue Sun, Rongfei Jia, Bin- qiang Zhao, and Hao Zhang. 3d-front: 3d furnished rooms with layouts and semantics, 2021. 6, 3
work page 2021
- [8]
-
[9]
Gs-lrm: Large reconstruc- tion model for 3d gaussian splatting
Zhenxing He, Zhisheng Wang, Yuhui Kuang, Min Zhao, Menglei Wang, Hao Chen, Fujun Luan, Thomas M ¨uller, Ji- aqi Wang, Chunhua Shen, et al. Gs-lrm: Large reconstruc- tion model for 3d gaussian splatting. InEuropean Confer- ence on Computer Vision (ECCV). Springer, 2024. 3
work page 2024
-
[10]
CLIPScore: A reference-free evaluation metric for image captioning
Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. CLIPScore: A reference-free evaluation metric for image captioning. InProceedings of the 2021 Conference on Empirical Methods in Natural Language Pro- cessing, pages 7514–7528, Online and Punta Cana, Domini- can Republic, 2021. Association for Computational Linguis- tics. 7
work page 2021
-
[11]
Text2room: Extracting textured 3d meshes from 2d text-to-image models
Lukas H ¨ollein, Ang Cao, Andrew Owens, Justin Johnson, and Matthias Nießner. Text2room: Extracting textured 3d meshes from 2d text-to-image models. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 7909–7920, 2023. 2
work page 2023
-
[12]
Lrm: Large reconstruction model for single image to 3d
Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, and Hao Tan. Lrm: Large reconstruction model for single image to 3d. InICLR, 2024. 2
work page 2024
-
[13]
Mixed diffusion for 3d indoor scene synthesis.arXiv preprint arXiv:2405.21066, 2024
Siyi Hu, Diego Martin Arroyo, Stephanie Debats, Fabian Manhardt, Luca Carlone, and Federico Tombari. Mixed diffusion for 3d indoor scene synthesis.arXiv preprint arXiv:2405.21066, 2024. 3
-
[14]
Hy-world 2.0: A multi-modal world model for reconstructing, generating, and simulating 3d worlds
Team HY-World. Hy-world 2.0: A multi-modal world model for reconstructing, generating, and simulating 3d worlds. arXiv preprint, 2026. 8
work page 2026
-
[15]
You only gaussian once: Controllable 3d gaussian splatting for ultra-densely sampled scenes, 2026
Jinrang Jia, Zhenjia Li, and Yifeng Shi. You only gaussian once: Controllable 3d gaussian splatting for ultra-densely sampled scenes, 2026. 2
work page 2026
-
[16]
Multi- view pyramid transformer: Look coarser to see broader
Gyeongjin Kang, Seungkwon Yang, Seungtae Nam, Younggeun Lee, Jungwoo Kim, and Eunbyung Park. Multi- view pyramid transformer: Look coarser to see broader. arXiv preprint arXiv:2512.07806, 2025. 4, 8
-
[17]
Bernhard Kerbl, Georgios Kopanas, Thomas Leimk ¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42 (4), 2023. 2
work page 2023
-
[18]
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C Berg, Wan-Yen Lo, Piotr Doll ´ar, and Ross Girshick. Segment anything. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 4015–4026, 2023. 6
work page 2023
-
[19]
Panogen: Text-conditioned panoramic environment generation for vision-and-language navigation
Jialu Li and Mohit Bansal. Panogen: Text-conditioned panoramic environment generation for vision-and-language navigation. InNeurIPS, 2023. 2
work page 2023
-
[20]
Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model
Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, and Sai Bi. Instant3d: Fast text-to-3d with sparse-view generation and large reconstruction model. In ICLR, 2024. 3
work page 2024
-
[21]
Realsee3d: A large- scale multi-view rgb-d dataset of indoor scenes (version 1.0),
Linyuan Li, Yan Wu, Xi Li, Lingli Wang, Tong Rao, Jie Zhou, Cihui Pan, and Xinchen Hui. Realsee3d: A large- scale multi-view rgb-d dataset of indoor scenes (version 1.0),
-
[22]
M-lrm: Multi-view large re- construction model.arXiv preprint arXiv:2406.07648, 2024
Mengfei Li, Xiaoxiao Long, Yixun Liang, Weiyu Li, Yuan Liu, Peng Li, Yatian Wang, Xingqun Qi, Wei Xue, Wenhan Luo, Qifeng Liu, and Yike Guo. M-lrm: Multi-view large re- construction model.arXiv preprint arXiv:2406.07648, 2024. 3
-
[23]
Cameras as relative positional encod- ing
Ruilong Li, Brent Yi, Junchen Liu, Hang Gao, Yi Ma, and Angjoo Kanazawa. Cameras as relative positional encod- ing. InAdvances in Neural Information Processing Systems,
-
[24]
Depth any panoramas: A foundation model for panoramic depth estimation
Xin Lin, Meixi Song, Dizhe Zhang, Wenxuan Lu, Haodong Li, Bo Du, Ming-Hsuan Yang, Truong Nguyen, and Lu Qi. Depth any panoramas: A foundation model for panoramic depth estimation. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition, 2026. 6
work page 2026
-
[25]
Omniroam: World wandering via long-horizon panoramic video genera- tion.SIGGRAPH, 2026
Yuheng Liu, Xin Lin, Xinke Li, Baihan Yang, Chen Wang, Kalyan Sunkavalli, Yannick Hold-Geoffroy, Hao Tan, Kai Zhang, Xiaohui Xie, Zifan Shi, and Yiwei Hu. Omniroam: World wandering via long-horizon panoramic video genera- tion.SIGGRAPH, 2026. 7, 2
work page 2026
-
[26]
Hpsv3: Towards wide-spectrum human preference score
Yuhang Ma, Xiaoshi Wu, Keqiang Sun, and Hongsheng Li. Hpsv3: Towards wide-spectrum human preference score. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 15086–15095, 2025. 7
work page 2025
-
[27]
Srinivasan, Matthew Tancik, Jonathan T
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. InEuropean Conference on Computer Vision (ECCV), pages 405–421. Springer, 2020. 2
work page 2020
-
[28]
Jinhong Ni, Chang-Bin Zhang, Qiang Zhang, and Jing Zhang. What makes for text to 360-degree panorama gener- ation with stable diffusion? InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025. 2
work page 2025
-
[29]
Atiss: Autoregres- sive transformers for indoor scene synthesis
Despoina Paschalidou, Amlan Kar, Maria Shugrina, Karsten Kreis, Andreas Geiger, and Sanja Fidler. Atiss: Autoregres- sive transformers for indoor scene synthesis. InNeurIPS,
-
[30]
Pano2room: Novel view synthesis from a single indoor panorama
Guo Pu, Yiming Zhao, and Zhouhui Lian. Pano2room: Novel view synthesis from a single indoor panorama. In SIGGRAPH Asia 2024 Conference Papers, New York, NY , USA, 2024. Association for Computing Machinery. 7
work page 2024
-
[31]
High-resolution image synthesis with latent diffusion models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. 1
work page 2022
-
[32]
Housediffusion: Vector floorplan generation via a diffusion model
Amin Shabani, Sepideh Hosseini, and Yasutaka Furukawa. Housediffusion: Vector floorplan generation via a diffusion model. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5466–5475, 2023. 3
work page 2023
-
[33]
Lgm: Large multi-view gaus- sian model for high-resolution 3d content creation
Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen, Tengfei Wang, Gang Zeng, and Ziwei Liu. Lgm: Large multi-view gaus- sian model for high-resolution 3d content creation. InECCV,
-
[34]
Diffuscene: Denoising dif- fusion probabilistic models for generative indoor scene syn- thesis
Jiapeng Tang, Yinyu Nie, Lev Markhasin, Angela Dai, Jus- tus Thies, and Matthias Nießner. Diffuscene: Denoising dif- fusion probabilistic models for generative indoor scene syn- thesis. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition (CVPR), 2024. 3
work page 2024
-
[35]
Mvdiffusion: Enabling holistic multi- view image generation with correspondence-aware diffusion
Shitao Tang, Fuyang Zhang, Jiacheng Chen, Peng Wang, and Yasutaka Furukawa. Mvdiffusion: Enabling holistic multi- view image generation with correspondence-aware diffusion. InNeurIPS, 2023. 2
work page 2023
-
[36]
Seedream 4.0: Toward Next-generation Multimodal Image Generation
Team Seedream, Yunpeng Chen, Yu Gao, Lixue Gong, Meng Guo, Qiushan Guo, et al. Seedream 4.0: Toward next- generation multimodal image generation.arXiv preprint arXiv:2509.20427, 2025. 7
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[37]
TripoSR: Fast 3D Object Reconstruction from a Single Image
Dmitry Tochilkin, David Pankratz, Zexiang Liu, Zixuan Huang, Adam Letts, Yangguang Li, Ding Liang, Christian Laforte, Varun Jampani, and Yan-Pei Cao. Triposr: Fast 3d object reconstruction from a single image.arXiv preprint arXiv:2403.02151, 2024. 3
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[38]
Plan2scene: Convert- ing floorplans to 3d scenes
Madhawa Vidanapathirana, Qirui Wu, Yasutaka Furukawa, Angel X Chang, and Manolis Savva. Plan2scene: Convert- ing floorplans to 3d scenes. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 10733–10742, 2021. 2, 3
work page 2021
-
[39]
Moge-2: Accurate monocular geometry with metric scale and sharp details
Ruicheng Wang, Sicheng Xu, Yue Dong, Yu Deng, Jianfeng Xiang, Zelong Lv, Guangzhong Sun, Xin Tong, and Jiaolong Yang. Moge-2: Accurate monocular geometry with metric scale and sharp details. InAdvances in Neural Information Processing Systems, 2025. 6
work page 2025
-
[40]
Sceneformer: Indoor scene generation with transformers
Xin-Yang Wang, Yu-An Yeh, Che-Wei Tang, Anton Rob- bins, and Yu-Chiang Frank Wang. Sceneformer: Indoor scene generation with transformers. In2021 International Conference on 3D Vision (3DV), pages 106–115. IEEE,
-
[41]
Chenfei Wu, Jiahao Li, Jingren Zhou, Junyang Lin, Kaiyuan Gao, Kun Yan, Sheng ming Yin, Shuai Bai, Xiao Xu, Yilei Chen, Yuxiang Chen, Zecheng Tang, Zekai Zhang, Zhengyi Wang, An Yang, Bowen Yu, Chen Cheng, Dayiheng Liu, De- qing Li, Hang Zhang, Hao Meng, Hu Wei, Jingyuan Ni, Kai Chen, Kuan Cao, Liang Peng, Lin Qu, Minggang Wu, Peng Wang, Shuting Yu, Tingk...
-
[42]
Pan- odiffusion: 360-degree panorama outpainting via diffusion
Tianhao Wu, Chuanxia Zheng, and Tat-Jen Cham. Pan- odiffusion: 360-degree panorama outpainting via diffusion. arXiv preprint arXiv:2307.03177, 2023. 2
-
[43]
Adapt- splat: Adapting vision foundation models for feed-forward 3d gaussian splatting, 2026
Mingwei Xing, Xinliang Wang, and Yifeng Shi. Adapt- splat: Adapting vision foundation models for feed-forward 3d gaussian splatting, 2026. 8
work page 2026
-
[44]
Taming stable diffusion for text to 360 degree panorama im- age generation
Cheng Zhang, Qianyi Wu, Camilo Cruz Gambardella, Xi- aoshui Huang, Dinh Phung, Wanli Ouyang, and Jianfei Cai. Taming stable diffusion for text to 360 degree panorama im- age generation. InCVPR, pages 6347–6357, 2024. 2
work page 2024
-
[45]
Pansplat: 4k panorama synthesis with feed-forward gaussian splatting
Cheng Zhang, Haofei Xu, Qianyi Wu, Camilo Cruz Gam- bardella, Dinh Phung, and Jianfei Cai. Pansplat: 4k panorama synthesis with feed-forward gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, 2025. 2
work page 2025
-
[46]
The unreasonable effectiveness of deep features as a perceptual metric
Richard Zhang, Phillip Isola, Alexei A Efros, Eli Shecht- man, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. InProceedings of the IEEE conference on computer vision and pattern recogni- tion, pages 586–595, 2018. 7 PanoWorld: A Generative Spatial World Model for Consistent Whole-House Panorama Synthesis Supplemen...
work page 2018
-
[47]
Visualization without Panoramic Position Encoding We include qualitative failure cases for the variant without Panoramic Position Encoding. Without circular horizontal encoding, the generator treats the left and right panorama boundaries as distant image regions rather than adjacent rays. This often produces inconsistent structures or textures across the ...
-
[48]
Baseline Adaptation Details 8.1. Pano2room Pano2room shares a broadly similar pipeline with our method, relying on monocular depth estimation to ob- tain a point cloud that is subsequently converted into a mesh, followed by iterative refinement through a render- then-estimate loop to progressively extend the scene to more distant regions. However, Pano2ro...
-
[49]
Figure 11 shows representative examples from 3D-FRONT
Training Data Visualization We further visualize the training data used by PanoWorld. Figure 11 shows representative examples from 3D-FRONT
-
[50]
and RealSee3D [21], including rendered panoramas, depth or shell-proxy images, and room-level BEV maps. The BEV maps illustrate the floorplan topology, room par- titions, doorway connectivity, sampled camera nodes, and local room groups used to construct the training views. These visualizations clarify the difference between syn- thetic CAD-derived scenes...
-
[51]
Additional Experimental Details We include implementation details that are useful for repro- ducing the evaluation but too specific for the main paper, including panorama resolution, node sampling rules, and overlap-mask construction for cross-view PSNR. 10.1. Cross-Node Consistency Evaluation We evaluate cross-node consistency on manually selected co-vis...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.