Masked autoencoders are scalable vision learners
2 Pith papers cite this work.
Fields: cs.CV (2)
Years: 2025 (2)
Citing papers
- AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting
  AuralSAM2 fuses audio-visual features via a pyramid-based AuralFuser module and an audio-guided contrastive loss to improve promptable segmentation accuracy in SAM2 with minimal efficiency impact.
- Primus: Enforcing Attention Usage for 3D Medical Image Segmentation
  Primus and PrimusV2 are Transformer-centric models that match or exceed nnU-Net and top CNNs on nine 3D medical segmentation datasets by enforcing attention usage.