Recognition: 2 theorem links
· Lean Theorem
TreeGaussian: Tree-Guided Cascaded Contrastive Learning for Hierarchical Consistent 3D Gaussian Scene Segmentation and Understanding
Pith reviewed 2026-05-14 00:13 UTC · model grok-4.3
The pith
TreeGaussian builds a multi-level object tree to guide cascaded contrastive learning for hierarchical consistent segmentation in 3D Gaussian scenes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TreeGaussian constructs a multi-level object tree from 2D priors to explicitly model hierarchical semantic relationships, then applies a two-stage cascaded contrastive learning strategy that progressively refines features from global to local. A Consistent Segmentation Detection mechanism and graph-based denoising align segmentation modes across views and suppress unstable Gaussians, yielding improved hierarchical consistency and segmentation quality.
What carries the argument
The multi-level object tree, which structures contrastive supervision across object-part hierarchies, together with the two-stage cascaded contrastive learning strategy, which reduces redundancy and mitigates feature saturation.
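To make the cascaded design concrete, here is a minimal sketch of what a two-stage, global-to-local contrastive objective over per-Gaussian features could look like, with object-level and part-level group IDs standing in for two tree levels. The function names, the supervised-contrastive form, and the local-loss weighting are illustrative assumptions, not the paper's implementation.

```python
# Sketch of a two-stage (global -> local) contrastive loss over per-Gaussian
# features, guided by object- and part-level IDs from a hypothetical tree.
# Illustrative only; not TreeGaussian's actual code.
import torch
import torch.nn.functional as F

def supcon_loss(feats, labels, temperature=0.1):
    """Supervised contrastive loss: pull same-label features together."""
    feats = F.normalize(feats, dim=-1)
    sim = feats @ feats.t() / temperature                  # (N, N) similarities
    mask = (labels[:, None] == labels[None, :]).float()    # positives share a label
    mask.fill_diagonal_(0.0)
    logits = sim - sim.max(dim=1, keepdim=True).values.detach()
    exp = torch.exp(logits)
    exp.fill_diagonal_(0.0)                                # exclude self from denominator
    log_prob = logits - torch.log(exp.sum(dim=1, keepdim=True) + 1e-8)
    pos_count = mask.sum(dim=1).clamp(min=1.0)
    return -((mask * log_prob).sum(dim=1) / pos_count).mean()

def cascaded_contrastive_loss(feats, object_ids, part_ids, w_local=1.0):
    """Stage 1: object-level (global) loss over all Gaussians.
    Stage 2: part-level (local) loss computed within each object only,
    so part-level contrasts never cross object boundaries."""
    loss = supcon_loss(feats, object_ids)
    for obj in object_ids.unique():
        idx = (object_ids == obj).nonzero(as_tuple=True)[0]
        if idx.numel() > 2 and part_ids[idx].unique().numel() > 1:
            loss = loss + w_local * supcon_loss(feats[idx], part_ids[idx])
    return loss
```

Restricting the second stage to within-object pairs is one reading of how a cascade could reduce redundant pairwise comparisons and avoid saturating part-level features against unrelated objects.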
If this is right
- Structured learning across object-part hierarchies becomes feasible in real-time 3D Gaussian representations.
- Redundancy in contrastive supervision is reduced through progressive global-to-local refinement.
- Segmentation modes align across different views via the CSD mechanism.
- Unstable Gaussian points are suppressed by the graph-based denoising module (one plausible scheme is sketched after this list).
- Performance improves on open-vocabulary 3D object selection and 3D point cloud understanding tasks.
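The text available here does not specify the graph-based denoising module, so the sketch below (referenced in the list above) shows one plausible scheme under that caveat: build a kNN graph over Gaussian centers and suppress Gaussians whose segment label disagrees with most of their neighbors. The function name, neighborhood size, and agreement threshold are all assumptions.

```python
# Assumed graph-based label denoising over Gaussian centers: kNN graph on
# positions, then keep only Gaussians whose label agrees with the majority
# of their neighbors. Not the paper's actual module.
import numpy as np
from scipy.spatial import cKDTree

def denoise_labels(centers, labels, k=8, agree_thresh=0.5):
    """centers: (N, 3) Gaussian means; labels: (N,) integer segment IDs.
    Returns a boolean mask marking stable Gaussians to keep."""
    k = min(k, len(centers) - 1)
    tree = cKDTree(centers)
    _, nbr_idx = tree.query(centers, k=k + 1)   # each point plus k neighbors
    nbr_idx = nbr_idx[:, 1:]                    # drop the point itself
    nbr_labels = labels[nbr_idx]                # (N, k) neighbor labels
    agreement = (nbr_labels == labels[:, None]).mean(axis=1)
    return agreement >= agree_thresh            # keep if neighbors mostly agree
```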
Where Pith is reading between the lines
- The tree construction step could be extended to incorporate temporal consistency if applied to dynamic scenes.
- Part-level features learned this way may support downstream robotic tasks that require grasping or manipulation at the object-component level.
- The cascaded contrastive pattern might transfer to other hierarchical 3D representations such as neural radiance fields with added spatial partitioning.
Load-bearing premise
Inconsistent hierarchical labels from 2D priors can be turned into a stable multi-level object tree that guides learning without propagating errors into the final 3D segmentations.
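As one concrete reading of this premise (an assumed scheme, not the paper's algorithm), a two-level object tree can be formed per view by attaching each part mask to the object mask that contains most of its pixels; disagreements across views then show up as parts that attach to different parents, which is exactly the instability the tree construction must resolve.

```python
# Assumed per-view tree construction: attach each 2D part mask to the object
# mask covering most of its area. Illustrative only.
import numpy as np

def build_object_tree(object_masks, part_masks, min_overlap=0.5):
    """object_masks / part_masks: dicts {id: (H, W) bool mask} from one view.
    Returns {object_id: [part_id, ...]} for parts mostly inside that object."""
    tree = {oid: [] for oid in object_masks}
    if not object_masks:
        return tree
    for pid, pmask in part_masks.items():
        area = pmask.sum()
        if area == 0:
            continue
        overlap = {oid: np.logical_and(pmask, omask).sum() / area
                   for oid, omask in object_masks.items()}
        best = max(overlap, key=overlap.get)
        if overlap[best] >= min_overlap:
            tree[best].append(pid)
    return tree
```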
What would settle it
On scenes where 2D priors yield highly conflicting part labels, compare the cross-view consistency of the resulting 3D Gaussian segmentations with and without the tree-guided cascaded training; large drops in consistency would falsify the central claim.
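One way to operationalize the cross-view consistency such a test would measure (an assumption, not the paper's metric): for each Gaussian, collect the 2D segment label it receives from every view in which it is visible, then score how often those per-view labels agree with the Gaussian's majority label.

```python
# Assumed cross-view consistency score: per-Gaussian agreement between the
# labels assigned by each view and the Gaussian's majority (mode) label.
import numpy as np

def cross_view_consistency(per_view_labels, visible):
    """per_view_labels: (V, N) int array, label given to Gaussian n by view v
    (e.g., by projecting its center into that view's 2D segmentation).
    visible: (V, N) bool array marking visibility.
    Returns mean agreement with the per-Gaussian mode over >=2-view Gaussians."""
    V, N = per_view_labels.shape
    scores = []
    for n in range(N):
        labs = per_view_labels[:, n][visible[:, n]]
        if labs.size < 2:
            continue                                 # need at least two views
        _, counts = np.unique(labs, return_counts=True)
        scores.append(counts.max() / labs.size)      # agreement with the mode
    return float(np.mean(scores)) if scores else 0.0
```

A large gap between this score with and without tree-guided cascaded training, on scenes with deliberately conflicting 2D part labels, would be the decisive comparison.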
original abstract
3D Gaussian Splatting (3DGS) has emerged as a real-time, differentiable representation for neural scene understanding. However, existing 3DGS-based methods struggle to represent hierarchical 3D semantic structures and capture whole-part relationships in complex scenes. Moreover, dense pairwise comparisons and inconsistent hierarchical labels from 2D priors hinder feature learning, resulting in suboptimal segmentation. To address these limitations, we introduce TreeGaussian, a tree-guided cascaded contrastive learning framework that explicitly models hierarchical semantic relationships and reduces redundancy in contrastive supervision. By constructing a multi-level object tree, TreeGaussian enables structured learning across object-part hierarchies. In addition, we propose a two-stage cascaded contrastive learning strategy that progressively refines feature representations from global to local, mitigating saturation and stabilizing training. A Consistent Segmentation Detection (CSD) mechanism and a graph-based denoising module are further introduced to align segmentation modes across views while suppressing unstable Gaussian points, enhancing segmentation consistency and quality. Extensive experiments, including open-vocabulary 3D object selection, 3D point cloud understanding, and ablation studies, demonstrate the effectiveness and robustness of our approach.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TreeGaussian, a tree-guided cascaded contrastive learning framework for 3D Gaussian Splatting that constructs a multi-level object tree from 2D hierarchical labels, applies a two-stage global-to-local contrastive strategy, and incorporates a Consistent Segmentation Detection (CSD) mechanism plus graph-based denoising to improve hierarchical consistency and reduce redundancy in segmentation supervision.
Significance. If the central claims hold, the approach could advance structured 3D scene understanding by explicitly modeling object-part hierarchies in real-time Gaussian representations, with potential benefits for open-vocabulary selection and point-cloud tasks. The cascaded contrastive design and CSD module represent targeted innovations over standard contrastive baselines in 3DGS, but the absence of quantitative metrics, ablation tables, or error-propagation analysis in the provided description makes it difficult to gauge the magnitude of improvement or robustness.
major comments (2)
- [Method (tree construction and cascaded contrastive strategy)] The central claim depends on reliable construction of a multi-level object tree from inconsistent 2D priors, yet the manuscript provides no quantitative sensitivity analysis, ablation on label noise levels, or bounds on error propagation across views. If tree edges misalign, the global-to-local contrastive losses risk reinforcing rather than correcting inconsistencies, directly undermining the hierarchical consistency benefit.
- [Experiments and results] No numerical results, ablation tables, or error analysis are referenced to support the claims of enhanced segmentation consistency and quality. The abstract and description assert effectiveness from experiments on open-vocabulary selection and point-cloud understanding, but without reported metrics (e.g., mIoU, consistency scores) or baseline comparisons, the support for the central claims cannot be verified.
minor comments (2)
- [Method] Clarify the precise definition and implementation details of the Consistent Segmentation Detection (CSD) mechanism and graph-based denoising module, including how they interact with the contrastive losses.
- [Figures and tables] Ensure all figures and tables include clear captions, axis labels, and statistical significance indicators to aid interpretation of any ablation or comparison results.
Simulated Author's Rebuttal
We sincerely thank the referee for the constructive feedback on TreeGaussian. We address each major comment below and will revise the manuscript to incorporate additional analysis and clearer experimental reporting as suggested.
point-by-point responses
Referee: [Method (tree construction and cascaded contrastive strategy)] The central claim depends on reliable construction of a multi-level object tree from inconsistent 2D priors, yet the manuscript provides no quantitative sensitivity analysis, ablation on label noise levels, or bounds on error propagation across views. If tree edges misalign, the global-to-local contrastive losses risk reinforcing rather than correcting inconsistencies, directly undermining the hierarchical consistency benefit.
Authors: We appreciate this concern regarding robustness to inconsistent 2D priors. The multi-level object tree is built by aggregating hierarchical labels from multiple views via a graph structure that identifies consistent nodes, with the CSD mechanism explicitly detecting and enforcing segmentation consistency across views while the graph-based denoising removes unstable Gaussians. The cascaded global-to-local contrastive losses are intended to progressively correct rather than propagate errors. We acknowledge the absence of explicit sensitivity analysis in the current version and will add a new ablation subsection quantifying performance under varying label noise levels (simulated by random label flips) and providing empirical bounds on error propagation (measured via consistency scores before/after each stage). This revision will directly demonstrate that the framework mitigates misalignment. revision: yes
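A minimal sketch of the label-noise ablation described above, assuming noise is injected by flipping a fraction of 2D part labels to other labels drawn from the same pool; the function name and flipping scheme are illustrative, not the authors' protocol.

```python
# Assumed label-noise injection for the robustness ablation: flip a fraction
# of part labels to a different label from the same pool.
import numpy as np

def flip_labels(labels, flip_rate, seed=None):
    """labels: (N,) int array of part labels. Returns a noisy copy in which
    roughly flip_rate of the entries are replaced by a different label."""
    rng = np.random.default_rng(seed)
    labels = labels.copy()
    pool = np.unique(labels)
    if pool.size < 2:
        return labels                                # nothing to flip to
    flip = rng.random(labels.shape[0]) < flip_rate
    new = rng.choice(pool, size=int(flip.sum()))
    same = new == labels[flip]
    while same.any():                                # re-draw flips that kept the label
        new[same] = rng.choice(pool, size=int(same.sum()))
        same = new == labels[flip]
    labels[flip] = new
    return labels
```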
Referee: [Experiments and results] No numerical results, ablation tables, or error analysis are referenced to support the claims of enhanced segmentation consistency and quality. The abstract and description assert effectiveness from experiments on open-vocabulary selection and point-cloud understanding, but without reported metrics (e.g., mIoU, consistency scores) or baseline comparisons, the support for the central claims cannot be verified.
Authors: We apologize that the quantitative results were not sufficiently highlighted or cross-referenced in the version reviewed. The full manuscript contains ablation tables comparing the cascaded strategy and CSD module, along with numerical metrics including mIoU on 3D segmentation, view-consistency scores, and baseline comparisons for open-vocabulary object selection and point-cloud understanding tasks. In the revision we will add explicit in-text references to these tables/figures, include error bars and propagation analysis, and expand the results section to make all supporting numbers immediately verifiable. revision: yes
Circularity Check
No circularity detected; the method is an algorithmic extension of existing 3DGS and contrastive learning.
full rationale
The paper presents TreeGaussian as a new tree-guided cascaded contrastive framework built on 3D Gaussian Splatting and standard contrastive learning. The abstract and description outline the construction of a multi-level object tree from 2D priors, a two-stage cascaded contrastive strategy, a CSD mechanism, and graph-based denoising, with no fitted parameters renamed as predictions and no load-bearing self-citations that would reduce the central claims to inputs by construction. No self-definitional loops, uniqueness theorems imported from the authors' own prior work, or ansatzes smuggled in via citation appear in the provided text. The derivation chain consists of independent structural additions whose effectiveness is claimed to be demonstrated experimentally, leaving the central claims to be judged against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Hierarchical semantic structures in 3D scenes can be represented by a multi-level object tree derived from 2D priors
- domain assumption Cascaded global-to-local contrastive learning mitigates saturation and stabilizes training
invented entities (2)
- Consistent Segmentation Detection (CSD) mechanism · no independent evidence
- Graph-based denoising module · no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem is unclear.
  "By constructing a multi-level object tree, TreeGaussian enables structured learning across object-part hierarchies... two-stage cascaded contrastive learning strategy that progressively refines feature representations from global to local"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.