arxiv: 2604.26262 · v2 · submitted 2026-04-29 · 💻 cs.CV

Recognition: unknown

Semantic Foam: Unifying Spatial and Semantic Scene Decomposition

Amr Sharafeldin , Shrisudhan Govindarajan , Thomas Walker , Aryan Mikaeili , Daniel Rebain , Kwang Moo Yi , Andrea Tagliasacchi

Authors on Pith no claims yet

Pith reviewed 2026-05-07 13:37 UTC · model grok-4.3

classification 💻 cs.CV

keywords semantic scene decompositionVoronoi meshscene reconstructionobject segmentationspatial regularizationnovel view synthesis3D representation

0 comments

The pith

Semantic Foam attaches semantic features to Voronoi cells for consistent object segmentation in reconstructed scenes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Scene reconstruction methods deliver detailed 3D visuals but struggle to add reliable semantic labels without creating artifacts or inconsistencies. Semantic Foam extends a Voronoi-based decomposition by assigning semantic features directly to each cell. This explicit cell-level structure allows straightforward spatial regularization that counters occlusion and view-to-view supervision mismatches. Experiments demonstrate stronger object-level segmentation than prior techniques. If the approach holds, reconstructed models become more suitable for applications that require both photorealistic rendering and usable object understanding.

Core claim

The paper claims that integrating Radiant Foam's natural spatial volumetric Voronoi mesh with an explicit semantic feature field parameterized at the cell level enables direct spatial regularization. This prevents artifacts caused by occlusion or inconsistent supervision across views, which are common in other point-based representations, and yields superior object-level segmentation performance.

What carries the argument

Cell-level semantic feature field attached to the Voronoi mesh cells.

If this is right

Superior object-level segmentation compared to methods such as Gaussian Grouping.
Reduced artifacts from occlusion and inconsistent multi-view supervision.
Direct spatial regularization becomes feasible because features live on the volumetric cells.
The base real-time rendering speed and quality remain available alongside the new semantic output.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The cell-wise structure could simplify post-hoc editing of semantic labels without retraining the geometry.
Combined spatial-semantic output may support downstream tasks such as 3D object manipulation or scene editing in interactive graphics.
The same cell parameterization might transfer to other volumetric decompositions beyond the original foam representation.

Load-bearing premise

Attaching semantic features to the Voronoi cells will preserve the original rendering quality while delivering consistent segmentation without new artifacts or loss of detail.

What would settle it

A multi-view dataset with known occlusions where either novel-view PSNR drops below the non-semantic baseline or cross-view segmentation labels show visible inconsistencies after training.

Figures

Figures reproduced from arXiv: 2604.26262 by Amr Sharafeldin, Andrea Tagliasacchi, Aryan Mikaeili, Daniel Rebain, Kwang Moo Yi, Shrisudhan Govindarajan, Thomas Walker.

**Figure 1.** Figure 1: Teaser – We propose Semantic Foam, a semantically decomposed 3D representation for scenes. Based on the Radiant Foam [12] model, our method extends its spatial Voronoi decomposition to also separate space into semantically distinct regions. This decomposition is regularized to extend into the empty space immediately surrounding objects (top), which enables clean extraction (bottom) and insertion edits th… view at source ↗

**Figure 2.** Figure 2: Overview – Our method builds on Radiant Foam, adding an extra supervision channel in the form of segmentation masks predicted by image segmentation models. Using these masks alongside the original images (left), Semantic Foam constructs a volumetric mesh-based radiance field along with per-point semantic identity features (center). Using these semantic features we can perform editing operations like object… view at source ↗

**Figure 3.** Figure 3: Semantic Foam Training – The training pipeline of Semantic Foam consists of two primary stages: (left) we begin by preparing the inputs using DEVA in everything mode to automatically generate masks for all training views; (middle) given these masked multi-view images, we jointly optimize all properties of the 3D Voronoi cells—including their identity encodings – via differentiable rendering, supervised by … view at source ↗

**Figure 4.** Figure 4: Qualitative results – We present qualitative comparisons of our object-extraction results against Gaussian Grouping [44] and SAGA [4]. As illustrated by the extracted pot and leaves, Gaussian-based approaches frequently over- or under-segment object regions, whereas our method produces precise, well-bounded object masks that more faithfully capture true object structure view at source ↗

**Figure 5.** Figure 5: Scene editing – We demonstrate our method’s ability to edit scenes by insertion (middle), and deletion (right) of objects, with the (left) view showing the unedited reference image. Here, the original scene is the Figurines sequence from LERF-Masked [20], while the inserted object is from the Kitchen scene in Mip-NeRF 360 [2]. This is achieved using the learned identity features, and allows for moving obje… view at source ↗

**Figure 6.** Figure 6: Object extraction comparison – We demonstrate the extraction of the table and pot from the Garden scene [2] through a corresponding point-cloud visualization. Unlike Gaussian Grouping – which is restricted by convex-hull–based extraction – our representation successfully handles a broad range of object geometries, including highly non-convex shapes, where Gaussian Grouping consistently fails. W/O LTV W LTV view at source ↗

**Figure 7.** Figure 7: Ablation – We qualitatively assess the influence of the Total Variation loss on our method by examining the object mask of the pot from the Garden scene [2]. Without this regularization, the model exhibits a substantial decline in segmentation quality, producing object masks that fail to capture the complete object. the TV loss yields noticeably cleaner and more structured identity encodings, which in turn… view at source ↗

**Figure 8.** Figure 8: Extra Qualitative Results. We present additional qualitative comparisons of our object - extraction results against Gaussian Grouping [44] and SAGA [4]. As demonstrated by the leaves and teatime extractions, Gaussian-based baselines often exhibit inconsistent segmentation boundaries; conversely, our approach generates sharp, accurately bounded masks that more faithfully preserve the integrity of the object… view at source ↗

**Figure 9.** Figure 9: Scene editing (insertion) – Comparison of object insertion between our semantic foam representation (middle) and Gaussian Grouping (right), with the (left) view showing the unedited reference image. Leveraging Radiant Foam’s implicit surface formulation, our method defines accurate non-convex 3D object masks without requiring convex-hull post-processing. As shown, our approach cleanly inserts the toy and l… view at source ↗

**Figure 10.** Figure 10: Scene editing (deletion) – Comparison of object deletion between our semantic foam representation (middle) and Gaussian Grouping (right), with the (left) view showing the unedited reference image (blue star denotes the object selected for deletion). Leveraging Radiant Foam’s implicit surface formulation, our method defines accurate non-convex 3D object masks without requiring convex-hull post-processing. … view at source ↗

read the original abstract

Modern scene reconstruction methods, such as 3D Gaussian Splatting, deliver photo-realistic novel view synthesis at real-time speeds, yet their adoption in interactive graphics applications has been limited. A major bottleneck is the difficulty of interacting with these representations compared to traditional, human-authored 3D assets. While previous research has attempted to impose semantic decomposition on these models, significant challenges remain regarding segmentation quality and consistency. To address this, we introduce Semantic Foam, extending the recently proposed Radiant Foam representations to semantic decomposition tasks. Our approach integrates the natural spatial volumetric decomposition of Radiant Foam's Voronoi mesh with an explicit semantic feature field parameterized at the cell level. This explicit structure enables direct spatial regularization, which prevents artifacts caused by occlusion or inconsistent supervision across views - common pitfalls for other point-based representations. Experimental results show that our method achieves superior object-level segmentation performance compared to state-of-the-art methods like Gaussian Grouping and SAGA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Semantic Foam layers per-cell semantic features onto Radiant Foam's Voronoi mesh to add spatial regularization for consistent segmentation.

read the letter

Semantic Foam layers per-cell semantic features onto Radiant Foam's Voronoi mesh to add spatial regularization for consistent segmentation. The move uses the existing volumetric decomposition from the prior work and attaches an explicit feature field at the cell level so that a regularization term can directly penalize view-to-view inconsistencies. That structure is a straightforward extension and gives a clean way to enforce spatial smoothness without inventing new primitives on top of the radiance fit. If the full results hold, it should reduce the bleeding and flickering that show up in point-based methods when supervision is incomplete across views. The paper does a decent job framing the practical bottleneck in interactive use of these representations. The explicit parameterization is the part that feels new and worth testing. The main soft spot is the unexamined fit between radiance-driven cells and semantic boundaries. Voronoi cells are shaped by the radiance optimization, so a single cell can easily straddle two objects or swallow fine details. Regularization on the features can enforce consistency but cannot fix misalignment after the fact; it may just blur labels or push the mesh in directions that hurt novel-view quality. The abstract claims better object-level segmentation than Gaussian Grouping and SAGA, yet supplies no numbers, ablations, or setup details, so the size of the gain is still unclear. This is for readers already working with neural scene representations who need editable outputs for AR or robotics. The thinking is direct and the method stays grounded in the prior representation rather than overclaiming a new framework. I would send it for peer review. The extension is concrete enough that referees can check the alignment assumption and the missing quantitative comparisons.

Referee Report

2 major / 0 minor

Summary. The paper claims to introduce Semantic Foam by extending Radiant Foam's Voronoi-based spatial decomposition with a semantic feature field at the cell level. This explicit parameterization allows spatial regularization to achieve consistent semantic segmentation without artifacts from occlusion or inconsistent multi-view supervision, and experimental results purportedly demonstrate superior performance compared to Gaussian Grouping and SAGA.

Significance. If substantiated, this could provide a valuable unification of spatial and semantic decomposition in real-time scene representations, facilitating better interactivity in graphics applications. The explicit structure is a strength that could avoid common pitfalls in point-based semantic methods.

major comments (2)

[Abstract] The abstract asserts superior segmentation performance over Gaussian Grouping and SAGA, but the manuscript provides no metrics, experimental setup, ablation studies, or quantitative results to support this claim.
[Proposed Approach] The central assumption that attaching semantic features to radiance-optimized Voronoi cells will preserve rendering quality and yield consistent segmentation is unexamined. The Voronoi tessellation may not align with semantic boundaries, potentially causing detail loss or inconsistent segments when cells overlap multiple objects, which directly impacts the claim that spatial regularization prevents artifacts.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major comment point by point below, indicating planned revisions where appropriate.

read point-by-point responses

Referee: [Abstract] The abstract asserts superior segmentation performance over Gaussian Grouping and SAGA, but the manuscript provides no metrics, experimental setup, ablation studies, or quantitative results to support this claim.

Authors: We agree that the abstract summarizes a claim of superior performance that requires full substantiation in the manuscript. The current version references experimental results but does not present the supporting quantitative metrics, experimental setups, or ablation studies. In the revised manuscript, we will expand the Experiments section to include these elements, such as mIoU scores, segmentation consistency measures, detailed comparisons against Gaussian Grouping and SAGA, and ablations on the spatial regularization, to directly support the abstract claims. revision: yes
Referee: [Proposed Approach] The central assumption that attaching semantic features to radiance-optimized Voronoi cells will preserve rendering quality and yield consistent segmentation is unexamined. The Voronoi tessellation may not align with semantic boundaries, potentially causing detail loss or inconsistent segments when cells overlap multiple objects, which directly impacts the claim that spatial regularization prevents artifacts.

Authors: The manuscript explains that the explicit cell-level semantic features combined with spatial regularization avoid occlusion and view-inconsistency artifacts common in point-based methods. We acknowledge, however, that the potential for Voronoi cells to span semantic boundaries and the resulting effects on detail or consistency were not explicitly examined or analyzed. In the revision, we will add to the Proposed Approach section a dedicated analysis of cell-semantic alignment, including boundary visualizations and overlap metrics, plus targeted experiments showing that regularization still delivers consistent segmentation in multi-object cell cases. This will strengthen the justification for the approach. revision: partial

Circularity Check

0 steps flagged

No circularity: extension of external prior with independent regularization and experimental validation

full rationale

The derivation chain begins with the external Radiant Foam Voronoi mesh (cited as recently proposed prior work) and adds a new per-cell semantic feature field plus spatial regularization term. Neither the feature attachment nor the regularization is defined in terms of the target segmentation outputs; the mesh geometry remains fixed from the radiance stage while semantics are optimized separately. Performance claims rest on comparative experiments against Gaussian Grouping and SAGA rather than any fitted parameter being relabeled as a prediction or any uniqueness theorem imported from self-citation. No equation reduces the claimed consistency or artifact prevention to a tautology of the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Based solely on the abstract; full paper may contain additional details on parameters or assumptions.

axioms (1)

domain assumption Radiant Foam representations provide a natural spatial volumetric decomposition via Voronoi mesh suitable for extension to semantics
Invoked as the foundation for the new semantic integration in the abstract.

invented entities (1)

Semantic feature field parameterized at the cell level no independent evidence
purpose: To enable explicit semantic decomposition and direct spatial regularization within the Voronoi structure
Newly introduced component to address limitations of point-based semantic methods.

pith-pipeline@v0.9.0 · 5480 in / 1291 out tokens · 59880 ms · 2026-05-07T13:37:11.835822+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

47 extracted references · 7 canonical work pages · 2 internal anchors

[1]

Zero-shot 3d shape correspon- dence

Ahmed Abdelreheem, Abdelrahman Eldesokey, Maks Ovs- janikov, and Peter Wonka. Zero-shot 3d shape correspon- dence. InSIGGRAPH Asia, 2023. 2

2023
[2]

Barron, Ben Mildenhall, Dor Verbin, Pratul P

Jonathan T. Barron, Ben Mildenhall, Dor Verbin, Pratul P. Srinivasan, and Peter Hedman. Mip-nerf 360: Unbounded anti-aliased neural radiance fields.CVPR, 2022. 5, 6, 7, 8, 11

2022
[3]

Emerg- ing properties in self-supervised vision transformers

Mathilde Caron, Hugo Touvron, Ishan Misra, Herv´e J´egou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerg- ing properties in self-supervised vision transformers. InPro- ceedings of the International Conference on Computer Vision (ICCV), 2021. 2, 3

2021
[4]

Segment any 3d gaussians

Jiazhong Cen, Jiemin Fang, Chen Yang, Lingxi Xie, Xiaopeng Zhang, Wei Shen, and Qi Tian. Segment any 3d gaussians. arXiv preprint arXiv:2312.00860, 2023. 2, 3, 6, 7, 11, 13

work page arXiv 2023
[5]

Segment anything in 3d with nerfs

Jiazhong Cen, Zanwei Zhou, Jiemin Fang, Chen Yang, Wei Shen, Lingxi Xie, Xiaopeng Zhang, and Qi Tian. Segment anything in 3d with nerfs. InNeurIPS, 2023. 2, 3

2023
[6]

Bridging the domain gap: Self- supervised 3d scene understanding with foundation models

Zhimin Chen and Bing Li. Bridging the domain gap: Self- supervised 3d scene understanding with foundation models. arXiv preprint arXiv:2305.08776, 2023. 2

work page arXiv 2023
[7]

Tracking anything with de- coupled video segmentation

Ho Kei Cheng, Seoung Wug Oh, Brian Price, Alexander Schwing, and Joon-Young Lee. Tracking anything with de- coupled video segmentation. InICCV, 2023. 2, 4, 5, 6

2023
[8]

4d spatio-temporal convnets: Minkowski convolutional neural networks

Christopher Choy, JunYoung Gwak, and Silvio Savarese. 4d spatio-temporal convnets: Minkowski convolutional neural networks. InProceedings of the IEEE Conference on Com- puter Vision and Pattern Recognition, pages 3075–3084, 2019. 2

2019
[9]

Open- NeRF: Open Set 3D Neural Scene Segmentation with Pixel- Wise Features and Rendered Novel Views

Francis Engelmann, Fabian Manhardt, Michael Niemeyer, Keisuke Tateno, Marc Pollefeys, and Federico Tombari. Open- NeRF: Open Set 3D Neural Scene Segmentation with Pixel- Wise Features and Rendered Novel Views. InInternational Conference on Learning Representations, 2024. 3

2024
[10]

Scaling open-vocabulary image segmentation with image-level labels

Golnaz Ghiasi, Xiuye Gu, Yin Cui, and Tsung-Yi Lin. Scaling open-vocabulary image segmentation with image-level labels. InECCV, 2022. 3

2022
[11]

Narayanan

Rahul Goel, Dhawal Sirikonda, Saurabh Saini, and P.J. Narayanan. Interactive Segmentation of Radiance Fields. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023. 2

2023
[12]

Radiant foam: Real-time differen- tiable ray tracing.arXiv:2502.01157, 2025

Shrisudhan Govindarajan, Daniel Rebain, Kwang Moo Yi, and Andrea Tagliasacchi. Radiant foam: Real-time differen- tiable ray tracing.arXiv:2502.01157, 2025. 1, 2, 3, 11

work page arXiv 2025
[13]

3d semantic segmentation with submanifold sparse convolutional networks.CVPR, 2018

Benjamin Graham, Martin Engelcke, and Laurens van der Maaten. 3d semantic segmentation with submanifold sparse convolutional networks.CVPR, 2018. 2

2018
[14]

Semantic abstraction: Open-world 3D scene understanding from 2D vision-language models

Huy Ha and Shuran Song. Semantic abstraction: Open-world 3D scene understanding from 2D vision-language models. In Proceedings of the 2022 Conference on Robot Learning, 2022. 2

2022
[15]

Meshcnn: A network with an edge.ACM Transactions on Graphics (TOG), 38(4):90:1– 90:12, 2019

Rana Hanocka, Amir Hertz, Noa Fish, Raja Giryes, Shachar Fleishman, and Daniel Cohen-Or. Meshcnn: A network with an edge.ACM Transactions on Graphics (TOG), 38(4):90:1– 90:12, 2019. 2

2019
[16]

Gaus- siancut: Interactive segmentation via graph cut for 3d gaus- sian splatting, 2024

Umangi Jain, Ashkan Mirzaei, and Igor Gilitschenski. Gaus- siancut: Interactive segmentation via graph cut for 3d gaus- sian splatting, 2024. 2, 3

2024
[17]

Learning 3d mesh segmentation and labeling.ACM Trans

Evangelos Kalogerakis, Aaron Hertzmann, and Karan Singh. Learning 3d mesh segmentation and labeling.ACM Trans. Graph., 29(4), 2010. 2

2010
[18]

Hierarchical mesh decomposition using fuzzy clustering and cuts.ACM Trans

Sagi Katz and Ayellet Tal. Hierarchical mesh decomposition using fuzzy clustering and cuts.ACM Trans. Graph., 22(3): 954–961, 2003. 2

2003
[19]

3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42(4), 2023

Bernhard Kerbl, Georgios Kopanas, Thomas Leimk¨uhler, and George Drettakis. 3d gaussian splatting for real-time radiance field rendering.ACM Transactions on Graphics, 42(4), 2023. 2, 3

2023
[20]

Lerf: Language embed- ded radiance fields

Justin* Kerr, Chung Min* Kim, Ken Goldberg, Angjoo Kanazawa, and Matthew Tancik. Lerf: Language embed- ded radiance fields. InInternational Conference on Computer Vision (ICCV), 2023. 6, 7, 13

2023
[21]

Lerf: Language embed- ded radiance fields

Justin* Kerr, Chung Min* Kim, Ken Goldberg, Angjoo Kanazawa, and Matthew Tancik. Lerf: Language embed- ded radiance fields. InInternational Conference on Computer Vision (ICCV), 2023. 2, 3, 7

2023
[22]

Garfield: Group anything with radiance fields

Chung Min* Kim, Mingxuan* Wu, Justin* Kerr, Matthew Tancik, Ken Goldberg, and Angjoo Kanazawa. Garfield: Group anything with radiance fields. InConference on Com- puter Vision and Pattern Recognition (CVPR), 2024. 3

2024
[23]

Kingma and Jimmy Ba

Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. InICLR, 2015. 11

2015
[24]

Segment Anything

Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C. Berg, Wan-Yen Lo, Piotr Doll´ar, and Ross Girshick. Segment anything.arXiv:2304.02643, 2023. 2

work page internal anchor Pith review arXiv 2023
[25]

Decomposing nerf for editing via feature field distillation

Sosuke Kobayashi, Eiichi Matsumoto, and Vincent Sitzmann. Decomposing nerf for editing via feature field distillation. In Advances in Neural Information Processing Systems, 2022. 2

2022
[26]

iSeg: Interactive 3D Segmentation via Interac- tive Attention

Itai Lang, Fei Xu, Dale Decatur, Sudarshan Babu, and Rana Hanocka. iSeg: Interactive 3D Segmentation via Interac- tive Attention. InSIGGRAPH Asia 2024 Conference Papers. Association for Computing Machinery, 2024. 2

2024
[27]

Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari, Ravi Ramamoorthi, Ren Ng, and Abhishek Kar

Ben Mildenhall, Pratul P. Srinivasan, Rodrigo Ortiz-Cayon, Nima Khademi Kalantari, Ravi Ramamoorthi, Ren Ng, and Abhishek Kar. Local light field fusion: Practical view synthe- sis with prescriptive sampling guidelines.ACM Transactions on Graphics (TOG), 2019. 5, 6, 11, 13

2019
[28]

Srinivasan, Matthew Tancik, Jonathan T

Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthe- sis. InECCV, 2020. 2

2020
[29]

LaTeRF: Label and text driven object radi- ance fields

Ashkan Mirzaei, Yash Kant, Jonathan Kelly, and Igor Gilitschenski. LaTeRF: Label and text driven object radi- ance fields. InProceedings of the European Conference on Computer Vision (ECCV), 2022. 2

2022
[30]

Maxime Oquab, Timoth´ee Darcet, Theo Moutakanni, Huy V . V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Russell Howes, Po-Yao Huang, Hu Xu, Vasu Sharma, Shang-Wen Li, Wojciech Galuba, Mike Rabbat, Mido Assran, Nicolas Ballas, Gabriel Synnaeve, Ishan Misra, Herve Jegou, Julien Mairal, Patrick Lab...
[31]

ASIA: Adaptive 3d seg- mentation using few image annotations.SIGGRAPH Asia Conference Papers, 2025

Sai Raj Kishore Perla, Aditya V ora, Sauradip Nag, Ali Mahdavi-Amiri, and Hao Zhang. ASIA: Adaptive 3d seg- mentation using few image annotations.SIGGRAPH Asia Conference Papers, 2025. 2

2025
[32]

Pointnet: Deep learning on point sets for 3d classification and segmentation.Proc

Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation.Proc. Computer Vision and Pattern Recogni- tion (CVPR), IEEE, 2017. 2

2017
[33]

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

Charles R Qi, Li Yi, Hao Su, and Leonidas J Guibas. Point- net++: Deep hierarchical feature learning on point sets in a metric space.arXiv preprint arXiv:1706.02413, 2017. 2

work page Pith review arXiv 2017
[34]

Langsplat: 3d language gaussian splatting,

Minghan Qin, Wanhua Li, Jiawei Zhou, Haoqian Wang, and Hanspeter Pfister. Langsplat: 3d language gaussian splatting. arXiv preprint arXiv:2312.16084, 2023. 3

work page arXiv 2023
[35]

Language-driven physics-based scene synthesis and editing via feature splatting

Ri-Zhao Qiu, Ge Yang, Weijia Zeng, and Xiaolong Wang. Language-driven physics-based scene synthesis and editing via feature splatting. InEuropean Conference on Computer Vision (ECCV), 2024. 3

2024
[36]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. InProceedings of the International Conference on Machine Learning (ICML). PMLR, 2021. 2

2021
[37]

SAM 2: Segment Anything in Images and Videos

Nikhila Ravi, Valentin Gabeur, Yuan-Ting Hu, Ronghang Hu, Chaitanya Ryali, Tengyu Ma, Haitham Khedr, Roman R¨adle, Chloe Rolland, Laura Gustafson, et al. Sam 2: Segment any- thing in images and videos.arXiv preprint arXiv:2408.00714,

work page internal anchor Pith review arXiv
[38]

Schwing†, and Oliver Wang†

Zhongzheng Ren, Aseem Agarwala†, Bryan Russell†, Alexan- der G. Schwing†, and Oliver Wang†. Neural volumetric object selection. InIEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. († alphabetic ordering). 2

2022
[39]

High-resolution image synthesis with latent diffusion models, 2021

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models, 2021. 2

2021
[40]

Structure-from-motion revisited

Johannes Lutz Sch ¨onberger and Jan-Michael Frahm. Structure-from-motion revisited. InConference on Computer Vision and Pattern Recognition (CVPR), 2016. 5

2016
[41]

Segmentation and Shape Extraction of 3D Boundary Meshes

Ariel Shamir. Segmentation and Shape Extraction of 3D Boundary Meshes. InEurographics 2006 - State of the Art Reports. The Eurographics Association, 2006. 2

2006
[42]

Easy3d: A simple yet effective method for 3d interactive segmentation.ICCV, 2025

Andrea Simonelli, Norman M¨uller, and Peter Kontschieder. Easy3d: A simple yet effective method for 3d interactive segmentation.ICCV, 2025. 2

2025
[43]

Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, Fran c ¸ois Goulette, and Leonidas J

Hugues Thomas, Charles R. Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, Fran c ¸ois Goulette, and Leonidas J. Guibas. Kpconv: Flexible and deformable convolution for point clouds.Proceedings of the IEEE International Confer- ence on Computer Vision, 2019. 2

2019
[44]

Gaus- sian grouping: Segment and edit anything in 3d scenes

Mingqiao Ye, Martin Danelljan, Fisher Yu, and Lei Ke. Gaus- sian grouping: Segment and edit anything in 3d scenes. In ECCV, 2024. 2, 3, 5, 6, 7, 8, 11, 13

2024
[45]

Labelgs: Label-aware 3d gaussian splatting for 3d scene segmentation

Yupeng Zhang, Dezhi Zheng, Ping Lu, Han Zhang, Lei Wang, Liping Xiang, Cheng Luo, Xiaowen Fu Kaijun Deng, Linlin Shen, and Jinbao Wang. Labelgs: Label-aware 3d gaussian splatting for 3d scene segmentation. 2025. 2, 3, 5, 6, 11

2025
[46]

isegman: Interactive segment-and- manipulate 3d gaussians, 2025

Yian Zhao, Wanshi Xu, Ruochong Zheng, Pengchong Qiao, Chang Liu, and Jie Chen. isegman: Interactive segment-and- manipulate 3d gaussians, 2025. 2, 3

2025
[47]

Shuaifeng Zhi, Tristan Laidlow, Stefan Leutenegger, and An- drew J. Davison. In-place scene labelling and understanding with implicit scene representation. InICCV, 2021. 2 MipNeRF 360 [2]LERF-Masked [44]LLFF [27]PSNR↑SSIM↑LPIPS↓PSNR↑SSIM↑LPIPS↓PSNR↑SSIM↑LPIPS↓ Radiant Foam29.92 0.83 0.2122.73 0.79 0.3824.60 0.74 0.34Semantic Foam29.79 0.90 0.1722.72 0.79 ...

2021