FusionBERT uses cross-attention to fuse multi-view images and a normal-aware encoder for 3D models, achieving higher image-3D retrieval accuracy than prior multimodal models in both single- and multi-view settings.
Title resolution pending
1 Pith paper cites this work. Polarity classification is still indexing.
citation-role summary
background 1
citation-polarity summary
fields
cs.CV 1
years
2026 1
verdicts
UNVERDICTED 1
roles
background 1
polarities
background 1
representative citing papers
- FusionBERT: Multi-View Image-3D Retrieval via Cross-Attention Visual Fusion and Normal-Aware 3D Encoder