Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images
7 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (7)
verdicts: UNVERDICTED (7)
roles: background (3)
polarities: background (3)
citing papers explorer
- 3AM: 3egment Anything with Geometric Consistency in Videos
  3AM integrates MUSt3R 3D features into SAM2 via a Feature Merger and FOV-aware sampling to deliver geometry-consistent video object segmentation from RGB alone, with large gains on wide-baseline datasets.
- Bridging 3D Gaussians and Semantic Occupancy for Comprehensive Open-Vocabulary Scene Understanding from Unposed Images
  COVScene is a pose-free framework that lifts semantic Gaussians into a volumetric occupancy field during training to jointly support novel view synthesis, open-vocabulary segmentation, and semantic occupancy prediction.
- EPS3D: End-to-End Feed-Forward 3D Panoptic Segmentation
  EPS3D is an end-to-end architecture for 3D panoptic segmentation from multi-view images that uses distillation and semantic-instance mutual enhancement to achieve higher benchmark performance and speed than prior methods.
- Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
  The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.
- FLEG: Feed-Forward Language Embedded Gaussian Splatting from Any Views via Compact Semantic Representation
  FLEG reconstructs language-embedded 3D Gaussians from arbitrary input views using a dual-branch distillation framework and a sparse set of semantic Gaussians that requires only 5% of prior embeddings.
- Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images
  UniSplat learns consistent 3D geometry, appearance, and semantics from unposed images using dual masking, progressive Gaussian splatting, and recalibration to align predictions across tasks.
- FF3R: Feedforward Feature 3D Reconstruction from Unconstrained views
  FF3R unifies geometric and semantic 3D reconstruction in a single annotation-free feed-forward network trained solely via RGB and feature rendering supervision.