Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

Karthik Nandakumar; Markus Schedl; Marta Moscati; Muhammad Haris Khan; Muhammad Haroon Yousaf; Muhammad Saad Saeed; Muhammad Salman Tahir; Muhammad Zaigham Zaheer; Rohan Kumar Das; Shah Nawaz

arXiv: 2404.09342 · v3 · pith:AYIGX3DTnew · submitted 2024-04-14 · 💻 cs.CV · cs.SD · eess.AS

classification 💻 cs.CV · cs.SD · eess.AS

keywords: multilingual, challenge, association, face-voice, environments, fame, systems, audio-visual

Advances in technology have led to the use of multimodal systems in various real-world applications. Among them, audio-visual systems are some of the most widely used. In recent years, associating the face and voice of a person has gained attention due to the presence of a unique correlation between them. The Face-voice Association in Multilingual Environments (FAME) Challenge 2024 focuses on exploring face-voice association under the unique condition of a multilingual scenario. This condition is inspired by the fact that half of the world's population is bilingual and people often communicate in multilingual scenarios. The challenge uses the Multilingual Audio-Visual (MAV-Celeb) dataset to explore face-voice association in multilingual environments. This report describes the challenge, dataset, baselines, and tasks of the FAME Challenge.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

AMR: Adaptive Modality Routing for Multimodal Polyglot Speaker Identification
cs.LG 2026-06 unverdicted novelty 5.0

AMR dynamically routes audio (W2V-BERT 2.0) and face (IResNet-18) embeddings via adapters and a KL-supervised router, reaching 99.07% average accuracy on POLY-SIM 2026 protocols and beating the FOP baseline by 32.73%.