Deep latent space learning for cross-modal mapping of audio and visual signals

Shah Nawaz, Muhammad Kamran Janjua, Ignazio Gallo, Arif Mahmood, Alessandro Calefati, “Deep latent space learning for cross-modal mapping of audio and visual signals,” in DICTA · 2019

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

XM-ALIGN: Unified Cross-Modal Embedding Alignment for Face-Voice Association

cs.SD · 2025-12-07 · unverdicted · novelty 3.0

XM-ALIGN improves face-voice association performance by jointly optimizing embeddings from separate encoders with MSE alignment loss and data augmentation on the MAV-Celeb dataset.

citing papers explorer

Showing 1 of 1 citing paper.

XM-ALIGN: Unified Cross-Modal Embedding Alignment for Face-Voice Association
cs.SD · 2025-12-07 · unverdicted · none · ref 7
XM-ALIGN improves face-voice association performance by jointly optimizing embeddings from separate encoders with MSE alignment loss and data augmentation on the MAV-Celeb dataset.
