Fast-SAM3D: 3Dfy Anything in Images but Faster

Chuanguang Yang; Guoxin Fan; Haotong Qin; Libo Huang; Michele Magno; Mingqiang Wu; Weilun Feng; Xiaokun Liu; Yongjun Xu; Yulun Zhang

arxiv: 2602.05293 · v2 · pith:G4VUWEEKnew · submitted 2026-02-05 · 💻 cs.CV

Weilun Feng, Mingqiang Wu, Zhiliang Chen, Chuanguang Yang, Haotong Qin, Yuqi Li, Xiaokun Liu, Guoxin Fan, Libo Huang, Yulun Zhang, Michele Magno, Yongjun Xu, Zhulin An

classification 💻 cs.CV

keywords fast-sam3d · demonstrate · generation · inference · layout · refinement

SAM3D enables scalable, open-world 3D reconstruction from complex scenes, yet its deployment is hindered by prohibitive inference latency. In this work, we conduct the first systematic investigation into its inference dynamics, revealing that generic acceleration strategies are brittle in this context. We demonstrate that these failures stem from neglecting the pipeline's inherent multi-level heterogeneity: the kinematic distinctiveness between shape and layout, the intrinsic sparsity of texture refinement, and the spectral variance across geometries. To address this, we present Fast-SAM3D, a training-free framework that dynamically aligns computation with instantaneous generation complexity. Our approach integrates three heterogeneity-aware mechanisms: (1) Modality-Aware Step Caching to decouple structural evolution from sensitive layout updates; (2) Joint Spatiotemporal Token Carving to concentrate refinement on high-entropy regions; and (3) Spectral-Aware Token Aggregation to adapt decoding resolution. Extensive experiments demonstrate that Fast-SAM3D delivers up to 2.67× end-to-end speedup with negligible fidelity loss, establishing a new Pareto frontier for efficient single-view 3D generation. Our code is released at https://github.com/wlfeng0509/Fast-SAM3D.
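
The abstract does not spell out the caching rule, so the following is only a toy sketch of the general idea behind step caching in an iterative sampler, not the authors' algorithm. It assumes one reading of "decouple structural evolution from sensitive layout updates": the shape update is recomputed every few steps and reused in between, while the layout update is recomputed at every step. The function `run_with_step_caching`, its parameters, and the fixed `cache_interval` are all hypothetical names introduced for illustration.

```python
from typing import Callable, Tuple


def run_with_step_caching(
    shape_update: Callable[[float, int], float],
    layout_update: Callable[[float, int], float],
    num_steps: int,
    cache_interval: int,
) -> Tuple[float, float, int]:
    """Toy two-stream sampler (hypothetical, not the paper's method).

    The shape update is recomputed only every `cache_interval` steps and
    the cached value is reused in between. The layout update is treated
    as sensitive and recomputed at every step.
    Returns (final shape state, final layout state, number of update calls).
    """
    shape, layout = 0.0, 0.0
    cached_shape_delta = 0.0
    update_calls = 0
    for step in range(num_steps):
        if step % cache_interval == 0:
            # Refresh the cache: one call on the shape stream.
            cached_shape_delta = shape_update(shape, step)
            update_calls += 1
        # Reuse the cached shape update on the skipped steps.
        shape += cached_shape_delta
        # The layout stream is never cached in this sketch.
        layout += layout_update(layout, step)
        update_calls += 1
    return shape, layout, update_calls


# Stand-in update functions; a real pipeline would call a network here.
cached = run_with_step_caching(lambda s, t: 1.0, lambda l, t: 0.5, 8, 4)
full = run_with_step_caching(lambda s, t: 1.0, lambda l, t: 0.5, 8, 1)
print(cached[2], full[2])
```

With constant updates the cached and uncached runs reach the same final state while the cached run makes fewer calls. With a real model the reused update would only approximate the fresh one, which is presumably why the choice of what to cache, and when, matters.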

This paper has not been read by Pith yet.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MV-SAM3D: Adaptive Multi-View Fusion for Layout-Aware 3D Generation
cs.CV · 2026-03 · unverdicted · novelty 6.0

MV-SAM3D adds multi-view fusion via multi-diffusion with attention-entropy and visibility weighting plus physics-aware optimization to improve fidelity and physical plausibility in layout-aware 3D generation.