Modality Invariant Multimodal Learning to Handle Missing Modalities: A Single-Branch Approach

Hassan Sajjad; Karthik Nandakumar; Markus Schedl; Muhammad Haris Khan; Muhammad Haroon Yousaf; Muhammad Saad Saeed; Muhammad Zaigham Zaheer; Shah Nawaz; Tom De Schepper

arXiv: 2408.07445 · v1 · pith:O4LRIQTE · submitted 2024-08-14 · cs.CV


Muhammad Saad Saeed, Shah Nawaz, Muhammad Zaigham Zaheer, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf, Hassan Sajjad, Tom De Schepper, Markus Schedl


classification: cs.CV

keywords: modalities, missing, multimodal, performance, existing, invariant, learning, method


abstract

Multimodal networks have demonstrated remarkable performance improvements over their unimodal counterparts. Existing multimodal networks are designed in a multi-branch fashion and, because they rely on fusion strategies, exhibit degraded performance if one or more modalities are missing. In this work, we propose a modality invariant multimodal learning method that is less susceptible to the impact of missing modalities. It consists of a single-branch network that shares weights across multiple modalities to learn inter-modality representations, maximizing performance as well as robustness to missing modalities. Extensive experiments are performed on four challenging datasets covering textual-visual (UPMC Food-101, Hateful Memes, Ferramenta) and audio-visual (VoxCeleb1) modalities. Compared to existing state-of-the-art methods, our proposed method achieves superior performance both when all modalities are present and when modalities are missing during training or testing.
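To make the single-branch idea concrete, here is a minimal pure-Python sketch. It is not the authors' architecture or training procedure. It assumes each modality has already been turned into a feature vector of the same length (for example by frozen, modality-specific pretrained extractors), and `SingleBranchNet` and its helper functions are hypothetical names. One set of weights embeds every available modality, and the embeddings are averaged over whichever modalities are present, so a missing input is simply skipped instead of leaving a hole in a fixed-size fusion layer. Mean pooling is an illustrative choice here; the abstract does not specify how embeddings are combined.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def init_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a stand-in for trained parameters (assumption)."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product on plain Python lists."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class SingleBranchNet:
    """Hypothetical single-branch model: one set of weights for every modality."""

    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = init_matrix(hidden_dim, in_dim, rng)          # shared embedding layer
        self.w_out = init_matrix(num_classes, hidden_dim, rng)  # shared classifier head

    def embed(self, x: Vector) -> Vector:
        # The same weights (self.w1) are applied no matter which modality x came from.
        return relu(matvec(self.w1, x))

    def forward(self, modalities: Sequence[Optional[Vector]]) -> Vector:
        # None marks a missing modality; it is dropped rather than zero-filled.
        present = [m for m in modalities if m is not None]
        if not present:
            raise ValueError("at least one modality must be present")
        embeddings = [self.embed(m) for m in present]
        # Mean-pool over the available modalities, so the pooled size never changes.
        pooled = [sum(vals) / len(embeddings) for vals in zip(*embeddings)]
        return matvec(self.w_out, pooled)


# Usage: toy 4-dim "text" and "image" features (made-up values for illustration).
net = SingleBranchNet(in_dim=4, hidden_dim=8, num_classes=3)
text_feat = [0.2, -0.1, 0.5, 0.3]
image_feat = [0.0, 0.4, -0.2, 0.1]
both = net.forward([text_feat, image_feat])   # all modalities present
text_only = net.forward([text_feat, None])    # image modality missing
```

Because the pooled embedding has the same size whether one or both modalities are supplied, the same classifier head serves both cases. In a multi-branch model with concatenation-based fusion, a missing branch would instead need zero-filling or some form of imputation.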


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SB-BEVFusion: Enhancing the Robustness against Sensor Malfunction and Corruptions
cs.CV · 2026-05 · unverdicted · novelty 5.0

SB-BEVFusion introduces a framework-agnostic module that improves 3D object detection robustness when camera or LiDAR inputs are missing or corrupted, outperforming prior unified BEV approaches on the MultiCorrupt dataset.