pith.

arxiv: 2606.19733 · v1 · pith:FJZQ4OYS · submitted 2026-06-18 · 💻 cs.CV · cs.AI

QueryGaussian: Scalable and Training-Free Open-Vocabulary 3D Instance Retrieval

Pith reviewed 2026-06-26 17:50 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords 3D instance retrieval · open-vocabulary · Gaussian splatting · training-free · scalable 3D search · semantic lifting · temporal fusion · city-scale scenes

The pith

QueryGaussian retrieves open-vocabulary 3D instances from city-scale scenes by lifting 2D masks into 3D without training or scene-wide embeddings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing approaches embed semantic features into every 3D primitive, so memory and compute costs grow linearly with scene size and trigger out-of-memory failures in large environments. QueryGaussian instead uses pre-trained 2D models to interpret text prompts and projects the resulting segmentation masks into 3D through concurrent maximum-weight association. A temporal fusion module with multi-stage adaptive density clustering resolves projection ambiguities across views. The result matches prior accuracy while cutting GPU memory by over 70 percent and speeding up inference 180-fold, allowing retrieval on scenes with tens of millions of Gaussians on consumer-grade hardware.
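To make the scaling argument concrete, the snippet below estimates the storage needed when every Gaussian carries its own semantic vector. It is a back-of-envelope sketch, not a measurement from the paper: the 512-dimensional float32 feature (the embedding width of CLIP ViT-B models) and the scene sizes are illustrative assumptions, and the estimate ignores the gradients and optimizer state that distillation training would add on top.

```python
# Back-of-envelope storage for the "scene-level embedding" paradigm:
# one dense semantic vector stored on every Gaussian primitive.
# Assumption: float32 features; the sizes below are illustrative only.

BYTES_PER_FLOAT32 = 4


def feature_memory_gb(num_gaussians: int, feature_dim: int,
                      bytes_per_value: int = BYTES_PER_FLOAT32) -> float:
    """Per-primitive feature storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_gaussians * feature_dim * bytes_per_value / 1e9


if __name__ == "__main__":
    # 512 is the embedding width of CLIP ViT-B models (assumed feature size).
    for n in (1_000_000, 10_000_000, 50_000_000):
        print(f"{n:>12,} Gaussians -> {feature_memory_gb(n, 512):8.2f} GB")
```

The three sizes come out at roughly 2, 20 and 100 GB. The point is the shape of the curve rather than the constants: compressing features to a low-dimensional latent shrinks the per-Gaussian cost, but the total still grows linearly with the number of primitives, which is the bottleneck QueryGaussian is designed to sidestep.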

Core claim

QueryGaussian is a training-free framework for open-vocabulary 3D instance retrieval that decouples semantic understanding from geometric representation by lifting 2D segmentation masks into 3D via concurrent maximum-weight association and a temporal fusion module with multi-stage adaptive density clustering, thereby avoiding the linear scaling of memory and compute that occurs when semantic features are distilled into every primitive.
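The abstract does not spell out "maximum-weight association". One plausible reading, consistent with the overview figure caption ("records per-pixel maximum …"), builds on standard 3DGS alpha compositing: each pixel is assigned to the Gaussian with the largest blending weight w_i = alpha_i * T_i, where T_i is the transmittance in front of it, and a 2D mask then selects Gaussians by pixel votes. The sketch below assumes that reading; the function names are hypothetical, and the "concurrent" aspect (presumably parallel processing across pixels or views) is not modeled.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Depth-sorted (nearest first) fragments covering one pixel: (gaussian_id, alpha).
Fragments = Sequence[Tuple[int, float]]


def max_weight_gaussian(fragments: Fragments) -> int:
    """ID of the Gaussian with the largest alpha-blending weight at a pixel.

    Uses the 3DGS compositing weight w_i = alpha_i * T_i with
    T_i = prod_{j<i} (1 - alpha_j). Returns -1 if no Gaussian covers the pixel.
    """
    best_id, best_w = -1, 0.0
    transmittance = 1.0
    for gid, alpha in fragments:
        w = alpha * transmittance
        if w > best_w:
            best_id, best_w = gid, w
        transmittance *= 1.0 - alpha
    return best_id


def render_id_map(pixels: List[List[Fragments]]) -> List[List[int]]:
    """Per-pixel map of max-weight Gaussian IDs for one rendered view."""
    return [[max_weight_gaussian(f) for f in row] for row in pixels]


def lift_mask(id_map: List[List[int]], mask: List[List[bool]]) -> Counter:
    """Lift a 2D mask to 3D: count the masked pixels owned by each Gaussian."""
    votes: Counter = Counter()
    for id_row, mask_row in zip(id_map, mask):
        for gid, inside in zip(id_row, mask_row):
            if inside and gid >= 0:
                votes[gid] += 1
    return votes


if __name__ == "__main__":
    # Toy 2x2 view: Gaussian 1 dominates the top row, Gaussian 2 the bottom-left.
    view = [
        [[(1, 0.9)], [(1, 0.5), (2, 0.9)]],
        [[(2, 0.9)], []],
    ]
    ids = render_id_map(view)  # [[1, 1], [2, -1]]
    print(lift_mask(ids, [[True, True], [False, True]]))  # Counter({1: 2})
```

In the top-right pixel, Gaussian 2 is more opaque (0.9 vs 0.5) but sits behind Gaussian 1, so its weight is 0.9 * 0.5 = 0.45 and the pixel is credited to Gaussian 1. Storing a single ID per pixel, rather than a feature vector per Gaussian, is what keeps this kind of lifting cheap.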

What carries the argument

Instance-level query mechanism that lifts 2D segmentation masks into 3D via concurrent maximum-weight association plus temporal fusion with multi-stage adaptive density clustering.
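The paper's exact "multi-stage adaptive density clustering" is not described in the abstract. As a hedged illustration of the general idea, the sketch below runs plain DBSCAN (Ester et al., 1996) on the 3D centers of candidate Gaussians. Each stage derives the neighborhood radius from the median k-nearest-neighbor distance of the surviving points and keeps only the largest cluster, so floaters are trimmed at a scale that adapts to local density. The parameters `k`, `scales` and `min_pts`, and the largest-cluster rule, are assumptions.

```python
import math
from collections import Counter
from statistics import median
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def kth_nn_distance(points: Sequence[Point], k: int) -> List[float]:
    """Brute-force distance from each point to its k-th nearest neighbour."""
    out = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(dists[k - 1])
    return out


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)
    labels: List[Optional[int]] = [None] * n

    def region(i: int) -> List[int]:
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:  # j is a core point: keep expanding
                queue.extend(neighbours)
    return [label if label is not None else -1 for label in labels]


def largest_cluster(points: Sequence[Point], eps: float, min_pts: int) -> List[Point]:
    """Keep only the points of the most populous DBSCAN cluster."""
    labels = dbscan(points, eps, min_pts)
    sizes = Counter(label for label in labels if label >= 0)
    if not sizes:
        return []
    best = sizes.most_common(1)[0][0]
    return [p for p, label in zip(points, labels) if label == best]


def multi_stage_filter(points: Sequence[Point], k: int = 4,
                       scales: Tuple[float, ...] = (3.0, 1.5),
                       min_pts: int = 4) -> List[Point]:
    """Hypothetical multi-stage variant: coarse pass first, then finer passes."""
    kept = list(points)
    for scale in scales:
        if len(kept) <= k:  # too few points to estimate a k-NN distance
            break
        eps = scale * median(kth_nn_distance(kept, k))
        kept = largest_cluster(kept, eps, min_pts)
    return kept


if __name__ == "__main__":
    grid = [(x * 0.1, y * 0.1, z * 0.1) for x in range(3) for y in range(3) for z in range(3)]
    floaters = [(10.0, 10.0, 10.0), (10.1, 10.0, 10.0), (10.0, 10.1, 10.0), (-5.0, 0.0, 0.0)]
    print(len(multi_stage_filter(grid + floaters)))  # the 27 grid points survive
```

The brute-force neighbor search is quadratic and only suitable for a toy; at tens of millions of Gaussians a spatial index would be needed, and a hierarchical method such as HDBSCAN is another way to avoid a hand-picked radius.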

If this is right

  • GPU memory usage drops by more than 70 percent relative to scene-embedding baselines.
  • Inference accelerates by a factor of 180 while accuracy remains comparable to state-of-the-art methods.
  • Retrieval becomes feasible on city-scale scenes holding tens of millions of Gaussians using only consumer-grade hardware.
  • No per-scene training or tuning is required because the method relies on off-the-shelf 2D vision models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same decoupling of semantics from geometry could be tested on other 3D representations such as point clouds or meshes to check whether the efficiency gain generalizes.
  • The temporal fusion module might extend naturally to video sequences, allowing retrieval in dynamic rather than static scenes.
  • Because the method avoids storing semantic features per primitive, it opens the possibility of on-the-fly retrieval during interactive navigation of very large environments.

Load-bearing premise

Lifting 2D segmentation masks into consistent 3D instances through maximum-weight association and adaptive density clustering will preserve semantic-visual alignment across views without any training or scene-specific tuning.
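One generic way to test this premise is a cross-view consistency vote: a Gaussian is kept only if the prompt's mask selects it in enough of the views where it is actually visible. The sketch below illustrates that idea under stated assumptions; it is not the paper's temporal fusion module, and `fuse_views`, `min_ratio` and `min_hits` are hypothetical names and thresholds.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def fuse_views(per_view_hits: Iterable[Set[int]],
               visible_count: Dict[int, int],
               min_ratio: float = 0.5,
               min_hits: int = 2) -> Set[int]:
    """Keep Gaussians that are selected consistently across views.

    per_view_hits: for each rendered view, the Gaussian IDs lifted from
        the prompt's 2D mask in that view.
    visible_count: number of views in which each Gaussian is visible at all
        (if missing, the Gaussian is assumed visible only where it was hit).
    A Gaussian survives if it was hit in at least `min_hits` views and in at
    least `min_ratio` of the views where it was visible.
    """
    hits: Counter = Counter()
    for ids in per_view_hits:
        hits.update(ids)  # each ID counted once per view
    return {
        gid for gid, h in hits.items()
        if h >= min_hits and h / max(visible_count.get(gid, h), 1) >= min_ratio
    }


if __name__ == "__main__":
    views = [{1, 2, 3}, {1, 2}, {1, 4}]
    visible = {1: 3, 2: 3, 3: 3, 4: 1}
    print(fuse_views(views, visible))  # {1, 2}: 3 and 4 were hit only once
```

The ratio test is what makes the premise checkable: a Gaussian that is visible in many views but inside the mask in only a few is likely a projection error and is dropped. Whether a rule of this kind holds up without per-scene threshold tuning is exactly the open question.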

What would settle it

Apply the method to a city-scale scene containing at least ten million Gaussians, run a set of text-prompt retrievals on consumer hardware, and check whether it finishes without out-of-memory errors and matches ground-truth instance labels at rates comparable to prior methods.
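The accuracy half of this check needs a concrete score. The sketch below assumes each retrieved instance and its ground truth are represented as sets of element IDs (Gaussian indices or mask pixel indices) and reports mean IoU plus the fraction of queries above an IoU threshold. The metric choice and the 0.5 threshold are assumptions for illustration, not the paper's evaluation protocol.

```python
from typing import List, Sequence, Set, Tuple


def iou(pred: Set[int], gt: Set[int]) -> float:
    """Intersection-over-union of two sets of element IDs."""
    union = pred | gt
    return len(pred & gt) / len(union) if union else 1.0


def evaluate(preds: Sequence[Set[int]], gts: Sequence[Set[int]],
             threshold: float = 0.5) -> Tuple[float, float]:
    """Return (mean IoU, fraction of queries with IoU >= threshold)."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("need exactly one prediction per ground-truth query")
    scores: List[float] = [iou(p, g) for p, g in zip(preds, gts)]
    miou = sum(scores) / len(scores)
    hit_rate = sum(s >= threshold for s in scores) / len(scores)
    return miou, hit_rate


if __name__ == "__main__":
    preds = [{1, 2, 3}, {5}, set()]
    gts = [{2, 3, 4}, {5}, {9}]
    print(evaluate(preds, gts))  # mean IoU 0.5, hit rate 2/3
```

Running the same scorer on the baselines (where they do not run out of memory) gives the "comparable rates" comparison; the out-of-memory half of the check is a matter of recording peak GPU memory during each retrieval.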

Figures

Figures reproduced from arXiv: 2606.19733 by Chao Yue, Dongming Zhang, Jian Xue, Ke Lu, Xiuyuan Zhu, Zijie Yang.

Figure 1: Given a pre-trained 3DGS scene and a natural …
Figure 1: Overview of QueryGaussian. Given a 3DGS scene and a text query, the framework first renders multi-view images and records per-pixel maximum …
Figure 2: Qualitative comparisons on the small-scale indoor scene dataset. QueryGaussian produces cleaner segmentation with fewer floaters and noise …
Figure 3: Qualitative results on large-scale outdoor scenes. Existing scene-level methods fail due to OOM, while QueryGaussian successfully localizes …
Figure 4: Overview of the 3D spatial reasoning agent. The LLM decomposes a user query into retrieval instructions, QueryGaussian returns masks and 3D …
Original abstract

Efficiently retrieving specific 3D instances from large-scale scenes via natural language prompts remains a formidable challenge in multimedia analysis. Existing approaches predominantly follow a "scene-level embedding" paradigm, which requires distilling high-dimensional semantic features into every 3D primitive. This strategy suffers from a fundamental architectural bottleneck: memory and computational costs scale linearly with scene complexity, inevitably triggering out-of-memory (OOM) failures in city-scale environments. To address this barrier, we propose QueryGaussian, a training-free framework for expeditious and scalable open-vocabulary 3D instance retrieval. Unlike holistic semantic distillation, QueryGaussian employs an instance-level query mechanism that decouples semantic understanding from geometric representation. Specifically, we leverage pre-trained 2D vision models to interpret user prompts and lift segmentation masks into 3D via a concurrent maximum-weight association strategy, ensuring semantic-visual consistency. To mitigate projection ambiguity, we introduce a temporal fusion module with multi-stage adaptive density clustering. Experimental results demonstrate that QueryGaussian not only matches the accuracy of state-of-the-art methods but also delivers a decisive efficiency leap, reducing GPU memory usage by over 70% and accelerating inference by 180x. Crucially, QueryGaussian enables expeditious instance retrieval on city-scale scenes containing tens of millions of Gaussians using consumer-grade hardware.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes QueryGaussian, a training-free open-vocabulary 3D instance retrieval framework for large-scale Gaussian scenes. It decouples semantics from geometry by using pre-trained 2D vision models to process natural language prompts, lifting 2D segmentation masks into 3D via concurrent maximum-weight association, and applying a temporal fusion module with multi-stage adaptive density clustering to address projection ambiguities. The central claims are that this achieves accuracy parity with state-of-the-art methods while reducing GPU memory by over 70% and accelerating inference by 180x, enabling city-scale retrieval on scenes with tens of millions of Gaussians using consumer hardware.

Significance. If the accuracy and efficiency claims hold under rigorous evaluation, the work would be significant for enabling scalable 3D instance retrieval without scene-specific training or linear memory scaling. The training-free design leveraging existing 2D models and the explicit focus on city-scale feasibility are clear strengths that could impact multimedia analysis and 3D scene understanding applications.

major comments (2)
  1. [Abstract] The central efficiency claims (70% memory reduction, 180x speedup, city-scale operation) rest on the instance-level query path preserving semantic-visual consistency, yet the abstract provides no information on evaluation datasets, baselines, error bars, or data exclusions, preventing verification of whether the concurrent maximum-weight association and multi-stage adaptive density clustering actually deliver the claimed accuracy parity.
  2. [Method] Lifting and fusion: the assumption that concurrent max-weight mask lifting plus temporal fusion with multi-stage adaptive density clustering maintains cross-view consistency without any training or scene-specific tuning is load-bearing for the scalability claim, but no analysis of robustness to projection ambiguities, view-dependent appearance, or density variations in tens-of-millions-Gaussian scenes is referenced.
minor comments (1)
  1. [Abstract] The abstract would benefit from explicit mention of the datasets and quantitative metrics used to support the accuracy parity claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and propose targeted revisions to enhance the manuscript.

Point-by-point responses
  1. Referee: [Abstract] The central efficiency claims (70% memory reduction, 180x speedup, city-scale operation) rest on the instance-level query path preserving semantic-visual consistency, yet the abstract provides no information on evaluation datasets, baselines, error bars, or data exclusions, preventing verification of whether the concurrent maximum-weight association and multi-stage adaptive density clustering actually deliver the claimed accuracy parity.

    Authors: We agree that the abstract would benefit from additional context to support verifiability of the claims. We will revise the abstract to reference the main evaluation datasets (including city-scale scenes with tens of millions of Gaussians), the primary baselines, and indicate that accuracy results with error bars appear in the experiments. This change directly addresses the concern while preserving conciseness. revision: yes

  2. Referee: [Method] Lifting and fusion: the assumption that concurrent max-weight mask lifting plus temporal fusion with multi-stage adaptive density clustering maintains cross-view consistency without any training or scene-specific tuning is load-bearing for the scalability claim, but no analysis of robustness to projection ambiguities, view-dependent appearance, or density variations in tens-of-millions-Gaussian scenes is referenced.

    Authors: The experiments section validates performance on large scenes via ablations of the fusion module. To strengthen the presentation of the load-bearing assumption, we will add a dedicated robustness analysis subsection (drawing on existing results and qualitative examples) that explicitly discusses handling of projection ambiguities, view-dependent effects, and density variations. This addition will reference the design elements without requiring new experiments. revision: yes

Circularity Check

0 steps flagged

No circularity detected; derivation relies on pre-trained models and heuristics without self-referential reduction

Full rationale

The paper presents QueryGaussian as a training-free method that decouples semantic understanding from geometry by leveraging existing pre-trained 2D vision models, a concurrent maximum-weight association for mask lifting, and a temporal fusion module with multi-stage adaptive density clustering. No equations, fitted parameters, predictions that reduce to inputs by construction, or load-bearing self-citations appear in the provided text. The efficiency and scalability claims derive directly from this architectural choice rather than any circular step. The derivation chain is self-contained against external pre-trained components.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities can be extracted. The method relies on pre-trained 2D vision models and geometric lifting assumptions whose details are not provided.

pith-pipeline@v0.9.1-grok · 5773 in / 1209 out tokens · 16297 ms · 2026-06-26T17:50:19.058572+00:00 · methodology
