A multimodal training pipeline with phonological bounding-box priors and cross-modal contrastive alignment transfers speech supervision to single-modality rtMRI vocal tract segmentation and outperforms prior methods on two datasets.
Interspeech (2024)
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Speech-Guided Multimodal Learning for Vocal Tract Segmentation in Real-Time MRI