Monocular Open Vocabulary Occupancy Prediction for Indoor Scenes
Pith reviewed 2026-05-21 11:46 UTC · model grok-4.3
The pith
A monocular method predicts open-vocabulary 3D occupancy in indoor scenes from single images using only binary occupancy labels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors argue that 3D Language-Embedded Gaussians can serve as a unified intermediate representation for open-vocabulary occupancy prediction from monocular images under binary supervision alone. Two components are said to make this work: an opacity-aware, Poisson-based volumetric aggregation operator, and a Progressive Temperature Decay schedule that gradually sharpens opacities during splatting so that geometry-language alignment stays stable.
What carries the argument
The argument rests on 3D Language-Embedded Gaussians, a unified intermediate representation that couples fine-grained 3D geometry with language-aligned semantic embeddings. Training is stabilized by an opacity-aware, Poisson-based aggregation operator and a Progressive Temperature Decay schedule.
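The exact form of the operator is not reproduced on this page. As a minimal sketch of what a Poisson-style, opacity-aware aggregation could look like, assume that each Gaussian contributes an intensity equal to its opacity times its unnormalized density at the query point, and that a voxel counts as occupied when at least one event occurs under a Poisson process with the summed intensity. The names `Gaussian`, `intensity`, `poisson_occupancy`, and `bce`, as well as the isotropic-covariance simplification, are illustrative assumptions and not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Gaussian:
    # Hypothetical, simplified primitive: isotropic covariance only.
    mean: Sequence[float]
    sigma: float    # standard deviation, same unit as the coordinates
    opacity: float  # expected to lie in [0, 1]


def intensity(g: Gaussian, x: Sequence[float]) -> float:
    """Opacity-weighted, unnormalized Gaussian density at point x."""
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, g.mean))
    return g.opacity * math.exp(-0.5 * sq_dist / (g.sigma ** 2))


def poisson_occupancy(gaussians: List[Gaussian], x: Sequence[float]) -> float:
    """P(at least one event) = 1 - exp(-sum of intensities)."""
    total = sum(intensity(g, x) for g in gaussians)
    return 1.0 - math.exp(-total)


def bce(p: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy against an occupied (1) / free (0) label."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
```

Under these assumptions the output stays in [0, 1] however many Gaussians overlap, so it can be passed to a binary cross-entropy loss without clipping the aggregate first. Whether this matches the paper's operator in detail is not something this page confirms.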
If this is right
- Open-vocabulary occupancy becomes feasible indoors without requiring dense semantic ground truth for every object category.
- Embodied agents can maintain consistent 3D semantic maps as new object categories appear over time.
- The same intermediate Gaussian representation can support both geometric reconstruction and language-based queries in a single forward pass.
- Binary supervision reduces the annotation burden for training occupancy models in complex indoor layouts.
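The second and third consequences above depend on being able to label occupied voxels by comparing their embeddings with text embeddings at query time. A minimal sketch of that lookup follows, assuming each occupied voxel already carries a language-aligned embedding and that category names have been encoded by a matching text encoder. The function names and the toy two-dimensional vectors are hypothetical; real embeddings would be high-dimensional and produced by a trained model.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_voxels(
    voxel_embeddings: List[Sequence[float]],
    text_embeddings: Dict[str, Sequence[float]],
) -> List[str]:
    """Assign each voxel the category whose text embedding is most similar."""
    labels = []
    for v in voxel_embeddings:
        best = max(text_embeddings, key=lambda name: cosine(v, text_embeddings[name]))
        labels.append(best)
    return labels


# Toy vocabulary (illustrative vectors, not real encoder outputs).
vocabulary = {"chair": [1.0, 0.0], "table": [0.0, 1.0]}
print(label_voxels([[0.9, 0.1], [0.2, 0.8]], vocabulary))

# A new category can be added at query time without touching the voxels.
vocabulary["lamp"] = [-1.0, 0.0]
print(label_voxels([[-0.7, 0.1]], vocabulary))
```

The point of the sketch is that the vocabulary is only an input to the lookup, so extending it requires no retraining of the voxel embeddings.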
Where Pith is reading between the lines
- The same stabilization techniques might transfer to outdoor settings if adjusted for sparser point distributions and different scale ranges.
- Integration with online mapping systems could allow robots to incrementally update open-vocabulary occupancy without full scene re-training.
- The approach suggests a broader pattern where language embeddings are anchored to geometry through differentiable rendering rather than direct feature matching.
Load-bearing premise
Existing Gaussian-to-Occupancy operators fail to converge under binary occupancy supervision, and the proposed opacity-aware Poisson replacement together with the temperature decay schedule will produce stable alignment without new artifacts or extra dense labels.
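The functional form of the temperature schedule is not given on this page. To make the idea of "gradually sharpening opacities" concrete, the sketch below assumes that opacity is a sigmoid of a learned logit divided by a temperature, and that the temperature decays geometrically from a soft starting value to a sharp final one. The function names, the geometric interpolation, and the default values `t_start=1.0` and `t_end=0.1` are all hypothetical choices for illustration.

```python
import math


def temperature(step: int, total_steps: int,
                t_start: float = 1.0, t_end: float = 0.1) -> float:
    """Geometric interpolation from t_start (soft) to t_end (sharp)."""
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return t_start * (t_end / t_start) ** frac


def tempered_opacity(logit: float, temp: float) -> float:
    """Sigmoid with temperature; a lower temp pushes outputs toward 0 or 1."""
    z = logit / temp
    # Branch on the sign of z so that exp() never overflows.
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# The same logit yields a sharper opacity as training progresses.
for step in (0, 50, 100):
    t = temperature(step, total_steps=100)
    print(step, round(t, 3), round(tempered_opacity(1.0, t), 4))
```

With a high temperature early in training, many Gaussians keep intermediate opacities and gradients reach all of them; as the temperature falls, each opacity commits toward 0 or 1, which is one plausible way to reduce the feature mixing the abstract describes.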
What would settle it
A direct test would replace the Poisson operator with a standard Gaussian-to-occupancy conversion while keeping all other components fixed and measure whether convergence and alignment quality collapse on the same indoor dataset under binary supervision.
Abstract
Open-vocabulary 3D occupancy is vital for embodied agents, which need to understand complex indoor environments where semantic categories are abundant and evolve beyond fixed taxonomies. While recent work has explored open-vocabulary occupancy in outdoor driving scenarios, such methods transfer poorly indoors, where geometry is denser, layouts are more intricate, and semantics are far more fine-grained. To address these challenges, we adopt a geometry-only supervision paradigm that uses only binary occupancy labels (occupied vs free). Our framework builds upon 3D Language-Embedded Gaussians, which serve as a unified intermediate representation coupling fine-grained 3D geometry with a language-aligned semantic embedding. On the geometry side, we find that existing Gaussian-to-Occupancy operators fail to converge under such weak supervision, and we introduce an opacity-aware, Poisson-based approach that stabilizes volumetric aggregation. On the semantic side, direct alignment between rendered features and open-vocabulary segmentation features suffers from feature mixing; we therefore propose a Progressive Temperature Decay schedule that gradually sharpens opacities during splatting, strengthening Gaussian-language alignment. On Occ-ScanNet, our framework achieves 59.50 IoU and 21.05 mIoU in the open-vocabulary setting, surpassing all existing occupancy methods in IoU and outperforming prior open-vocabulary approaches by a large margin in mIoU. Code will be released at https://github.com/JuIvyy/LegoOcc.
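The abstract reports two metrics, IoU and mIoU. A minimal sketch of how they are commonly computed on a voxel grid is shown below, assuming label 0 denotes free space, IoU measures occupied-versus-free agreement, and mIoU averages per-class IoU over the semantic classes that appear in either the prediction or the ground truth. Which voxels and classes a benchmark includes in the average varies, so the conventions here (including the names `FREE`, `geometric_iou`, and `mean_iou`) are assumptions rather than the Occ-ScanNet protocol.

```python
from typing import List, Sequence

FREE = 0  # assumed label id for free space


def geometric_iou(pred: Sequence[int], gt: Sequence[int]) -> float:
    """IoU of the occupied set, ignoring which semantic class is predicted."""
    inter = sum(1 for p, g in zip(pred, gt) if p != FREE and g != FREE)
    union = sum(1 for p, g in zip(pred, gt) if p != FREE or g != FREE)
    return inter / union if union else float("nan")


def mean_iou(pred: Sequence[int], gt: Sequence[int],
             class_ids: Sequence[int]) -> float:
    """Mean of per-class IoU over classes present in pred or gt."""
    ious: List[float] = []
    for c in class_ids:
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy flattened grids (illustrative only).
pred = [0, 1, 1, 2, 0, 2]
gt = [0, 1, 2, 2, 1, 0]
print(geometric_iou(pred, gt))     # 3 shared occupied voxels out of 5
print(mean_iou(pred, gt, [1, 2]))  # each class scores 1/3
```

The functions return fractions, whereas the figures quoted in the abstract (59.50 and 21.05) are percentages.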
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a monocular framework for open-vocabulary 3D occupancy prediction in indoor scenes. It builds on 3D Language-Embedded Gaussians as a unified representation for geometry and semantics. Under a geometry-only supervision paradigm that uses only binary occupied/free labels, the authors replace standard Gaussian-to-occupancy operators with an opacity-aware, Poisson-based aggregation because, they report, existing operators fail to converge. They also introduce a Progressive Temperature Decay schedule to mitigate feature mixing in semantic alignment. On Occ-ScanNet the method reports 59.50 IoU and 21.05 mIoU, and the authors claim it surpasses prior occupancy and open-vocabulary approaches.
Significance. If the central claims hold, the work is significant for advancing open-vocabulary indoor occupancy prediction under weak supervision, which is relevant for embodied agents operating in dense, semantically rich environments. The geometry-only paradigm and the use of Gaussians as an intermediate representation are promising directions. The commitment to release code deserves explicit credit because it supports reproducibility.
major comments (2)
- [Abstract and §3] (geometry supervision): The claim that 'existing Gaussian-to-Occupancy operators fail to converge under such weak supervision' is load-bearing for the introduction of the opacity-aware Poisson replacement, yet the manuscript provides no training curves, loss plots, or controlled ablation demonstrating divergence or collapse of standard splatting operators under binary labels. Without this evidence the necessity of the new operator remains unproven and any performance lift could be attributable to other factors.
- [§4] (experiments): The reported 59.50 IoU and 21.05 mIoU on Occ-ScanNet are presented without error bars, multiple-run statistics, or explicit dataset-split details; in addition, no ablation isolates the contribution of the Progressive Temperature Decay schedule. These omissions make it impossible to verify the robustness of the superiority claims over baselines.
minor comments (2)
- [Abstract] The abstract states that the method 'surpasses all existing occupancy methods in IoU' but does not name the specific baselines or reference the corresponding table/figure for this comparison.
- [§3] Notation for the parameters of the Progressive Temperature Decay schedule could be introduced more explicitly when first defined to aid readability.
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and constructive suggestions. We address each of the major comments below and outline the revisions we will make to the manuscript.
- Referee: [Abstract and §3] (geometry supervision): The claim that 'existing Gaussian-to-Occupancy operators fail to converge under such weak supervision' is load-bearing for the introduction of the opacity-aware Poisson replacement, yet the manuscript provides no training curves, loss plots, or controlled ablation demonstrating divergence or collapse of standard splatting operators under binary labels. Without this evidence the necessity of the new operator remains unproven and any performance lift could be attributable to other factors.
  Authors: We agree that providing empirical evidence for the convergence failure of existing operators under geometry-only supervision would strengthen our justification for the opacity-aware Poisson aggregation. In the revised manuscript, we will add training curves and loss plots comparing standard splatting operators with our proposed method under binary occupancy labels. This will clearly demonstrate the divergence issue and support the necessity of the new operator.
  Revision: yes
- Referee: [§4] (experiments): The reported 59.50 IoU and 21.05 mIoU on Occ-ScanNet are presented without error bars, multiple-run statistics, or explicit dataset-split details; in addition, no ablation isolates the contribution of the Progressive Temperature Decay schedule. These omissions make it impossible to verify the robustness of the superiority claims over baselines.
  Authors: We acknowledge the importance of statistical robustness in the experimental results. We will conduct multiple runs with different random seeds and report the mean and standard deviation for the IoU and mIoU metrics, including error bars in the tables. We will also provide explicit details on the dataset splits used for Occ-ScanNet. Furthermore, we will include an ablation study that isolates the effect of the Progressive Temperature Decay schedule to demonstrate its contribution to the performance.
  Revision: yes
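The multi-seed reporting promised in the second response can be sketched in a few lines. The helper below formats a mean and sample standard deviation for a list of per-seed scores; the name `summarize` is hypothetical, and the numbers are placeholders chosen for easy arithmetic, not results from the paper.

```python
import statistics
from typing import Sequence


def summarize(scores: Sequence[float]) -> str:
    """Format per-seed scores as 'mean +/- sample standard deviation'."""
    mean = statistics.mean(scores)
    # stdev() needs at least two samples; report 0.00 for a single run.
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return f"{mean:.2f} +/- {std:.2f}"


# Placeholder values for illustration only; these are NOT results from the paper.
toy_iou_runs = [50.0, 52.0, 54.0]
print(summarize(toy_iou_runs))  # prints "52.00 +/- 2.00"
```

Reporting the sample standard deviation (dividing by n - 1) is the usual choice when only a handful of seeds are available.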
Circularity Check
No significant circularity; the results are empirical measurements on an external benchmark.
The paper reports measured performance (59.50 IoU, 21.05 mIoU on Occ-ScanNet) as outcomes of an experimental framework rather than quantities algebraically derived from its own fitted parameters or equations. The claim that existing Gaussian-to-Occupancy operators fail to converge is presented as an empirical observation motivating the new opacity-aware Poisson operator and Progressive Temperature Decay schedule. The provided text exhibits no self-citation, self-definitional loop, or fitted input renamed as a prediction that would reduce the reported metrics to the inputs by construction. The geometry-language alignment and the supervision paradigm are evaluated against an external benchmark, so the reported metrics do not depend on quantities the paper defines for itself.
Axiom & Free-Parameter Ledger
free parameters (1)
- Progressive Temperature Decay schedule parameters
axioms (1)
- Domain assumption: 3D Language-Embedded Gaussians serve as a unified intermediate representation coupling fine-grained 3D geometry with a language-aligned semantic embedding.
Forward citations
Cited by 1 Pith paper
- FreeOcc: Training-Free Embodied Open-Vocabulary Occupancy Prediction
  FreeOcc enables training-free open-vocabulary 3D occupancy prediction from RGB-D sequences by combining SLAM, dense Gaussian maps, off-the-shelf vision-language models, and probabilistic projection, achieving over 2x ...