Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

While 3DGS has emerged as a high-fidelity scene representation, encoding rich, general-purpose features directly from its primitives remains under-explored. We address this gap by introducing Chorus, a multi-teacher pretraining framework that learns a holistic feed-forward 3D Gaussian Splatting (3DGS) scene encoder by distilling complementary signals from 2D foundation models. Chorus employs a shared 3D encoder and teacher-specific projectors to learn from language-aligned, generalist, and object-aware teachers, encouraging a shared embedding space that captures signals from high-level semantics to fine-grained structure. We evaluate Chorus on a wide range of tasks: open-vocabulary semantic and instance segmentation, linear and decoder probing, data-efficient supervision, as well as LLM-based Q&A. Besides 3DGS, we also test Chorus on several benchmarks that only support point clouds by pretraining a variant using only Gaussian centers, colors, and estimated normals. Surprisingly, this encoder shows strong transfer and outperforms the point-cloud baseline while using 39.9 times fewer training scenes. Finally, we propose a render-and-distill adaptation that facilitates out-of-domain finetuning.
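
The shared-encoder, teacher-specific-projector design described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the encoder is a single linear map standing in for a real 3D encoder, the teacher names (`language`, `generalist`, `object`) and feature sizes are hypothetical, the teacher features are random placeholders, and the cosine distance is one common choice of distillation loss rather than the loss the authors necessarily use.

```python
import math
import random


def linear(x, w):
    """Apply a weight matrix w (out_dim x in_dim) to a vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def cosine_distance(a, b):
    """Return 1 - cosine similarity; close to 0 when a and b are aligned."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return 1.0 - dot / (norm_a * norm_b + 1e-8)


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of uniform random weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def multi_teacher_loss(gaussians, encoder_w, projectors, teacher_feats):
    """Sum the distillation loss over teachers, averaged over Gaussians."""
    total = 0.0
    for i, g in enumerate(gaussians):
        shared = linear(g, encoder_w)  # one shared embedding per Gaussian
        for name, proj_w in projectors.items():
            pred = linear(shared, proj_w)  # teacher-specific projection
            total += cosine_distance(pred, teacher_feats[name][i])
    return total / len(gaussians)


rng = random.Random(0)
num_gaussians, in_dim, embed_dim = 4, 8, 16

# Hypothetical teacher feature sizes (illustrative only).
teacher_dims = {"language": 6, "generalist": 5, "object": 4}

# Placeholder per-Gaussian attributes (e.g. position, color, scale).
gaussians = random_matrix(num_gaussians, in_dim, rng)
encoder_w = random_matrix(embed_dim, in_dim, rng)
projectors = {n: random_matrix(d, embed_dim, rng) for n, d in teacher_dims.items()}

# Placeholder teacher targets; in practice these would come from 2D models.
teacher_feats = {n: random_matrix(num_gaussians, d, rng) for n, d in teacher_dims.items()}

loss = multi_teacher_loss(gaussians, encoder_w, projectors, teacher_feats)
print(f"multi-teacher distillation loss: {loss:.4f}")
```

In this sketch only the projectors differ per teacher, so every teacher's gradient would flow into the same encoder weights. That is the mechanism by which a shared embedding space could absorb complementary signals.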

fields

cs.CV 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Streaming Gaussian Encoding for 4D Panoptic Occupancy Tracking

cs.CV · 2026-06-29 · unverdicted · novelty 7.0

Introduces a streaming Gaussian encoder maintaining persistent volumetric representations via ego-motion compensation and confidence-guided updates for improved 4D panoptic occupancy tracking from cameras.

citing papers explorer

  • Streaming Gaussian Encoding for 4D Panoptic Occupancy Tracking cs.CV · 2026-06-29 · unverdicted · none · ref 30 · internal anchor