Scene-Agnostic Object-Centric Representation Learning for 3D Gaussian Splatting

2026 · cs.CV · arXiv 2604.09045

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Recent works on 3D scene understanding leverage 2D masks from visual foundation models (VFMs) to supervise radiance fields, enabling instance-level 3D segmentation. However, the supervision signals from foundation models are not fundamentally object-centric and often require additional mask pre-/post-processing or specialized training and loss design to resolve mask identity conflicts across views. The learned identity of the 3D scene is scene-dependent, limiting generalizability across scenes. Therefore, we propose a dataset-level, object-centric supervision scheme to learn object representations in 3D Gaussian Splatting (3DGS). Building on a pre-trained slot attention-based Global Object Centric Learning (GOCL) module, we learn a scene-agnostic object codebook that provides consistent, identity-anchored representations across views and scenes. By coupling the codebook with the module's unsupervised object masks, we can directly supervise the identity features of 3D Gaussians without additional mask pre-/post-processing or explicit multi-view alignment. The learned scene-agnostic codebook enables object supervision and identification without per-scene fine-tuning or retraining. Our method thus introduces unsupervised object-centric learning (OCL) into 3DGS, yielding more structured representations and better generalization for downstream tasks such as robotic interaction, scene understanding, and cross-scene generalization.

representative citing papers

Scenes as Objects, Not Primitives: Instance-Structured 3D Tokenization from Unposed Views

cs.CV · 2026-06-28 · unverdicted · novelty 6.0

A feed-forward framework learns instance-structured 3D token groups from unposed multi-view images via differentiable rendering, enabling native object-level segmentation, editing, and retrieval without 3D supervision.

citing papers explorer

Showing 1 of 1 citing paper.

Scenes as Objects, Not Primitives: Instance-Structured 3D Tokenization from Unposed Views
cs.CV · 2026-06-28 · unverdicted · none · ref 10 · internal anchor
