In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
4 Pith papers cite this work.
Fields: cs.CV (4)
Citing papers
- Multimodal LLMs under Pairwise Modalities
  A two-stage framework enables multimodal LLMs to learn shared latent representations from pairwise modality data and achieve cross-modal generation when incorporating new modalities.
- Efficient Semantic Scene Completion Network with Spatial Group Convolution
  Proposes Spatial Group Convolution to accelerate 3D semantic scene completion networks via grouped sparse operations, reporting state-of-the-art accuracy and speed on SUNCG.
- Hierarchical Feature Learning for Medical Point Clouds via State Space Model
  Presents an SSM-based hierarchical feature learning method for medical point clouds that reports superior performance on classification, completion, and segmentation using MedPointS, a new dataset.
- Can Vision Models Truly Forget? Mirage: Representation-Level Certification of Visual Unlearning