Uni3R: Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images
5 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (5)
verdicts: UNVERDICTED (5)
roles: background (3)
polarities: background (3)
representative citing papers
The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.
FLEG reconstructs language-embedded 3D Gaussians from arbitrary input views using a dual-branch distillation framework and a sparse set of semantic Gaussians that requires only 5% of prior embeddings.
UniSplat learns consistent 3D geometry, appearance, and semantics from unposed images using dual masking, progressive Gaussian splatting, and recalibration to align predictions across tasks.
FF3R unifies geometric and semantic 3D reconstruction in a single annotation-free feed-forward network trained solely via RGB and feature rendering supervision.
citing papers explorer
- 3AM: 3egment Anything with Geometric Consistency in Videos
3AM integrates MUSt3R 3D features into SAM2 via a Feature Merger and FOV-aware sampling to deliver geometry-consistent video object segmentation from RGB alone, with large gains on wide-baseline datasets.
- Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
The paper proposes a problem-driven taxonomy for feed-forward 3D scene modeling that groups methods by five core challenges: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware modeling.
- FLEG: Feed-Forward Language Embedded Gaussian Splatting from Any Views via Compact Semantic Representation
FLEG reconstructs language-embedded 3D Gaussians from arbitrary input views using a dual-branch distillation framework and a sparse set of semantic Gaussians that requires only 5% of prior embeddings.
- Learning 3D Representations for Spatial Intelligence from Unposed Multi-View Images
UniSplat learns consistent 3D geometry, appearance, and semantics from unposed images using dual masking, progressive Gaussian splatting, and recalibration to align predictions across tasks.
- FF3R: Feedforward Feature 3D Reconstruction from Unconstrained Views
FF3R unifies geometric and semantic 3D reconstruction in a single annotation-free feed-forward network trained solely via RGB and feature rendering supervision.