Deep Multimodal Learning with Missing Modality: A Survey
Pith reviewed 2026-05-17 22:22 UTC · model grok-4.3
The pith
Multimodal deep learning models can maintain performance when some input types are missing by using dedicated robustness techniques.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that Multimodal Learning with Missing Modality (MLMM) is a research area distinct from standard multimodal learning, and that the survey supplies the first comprehensive review of it, covering motivations, distinctions, current deep learning methods, applications, datasets, challenges, and future research directions.
What carries the argument
The taxonomy and detailed breakdown of methods that specifically address missing modalities to preserve model robustness when one or more data types are unavailable.
Load-bearing premise
The body of literature selected for the survey is sufficiently complete and representative of current work on deep multimodal learning with missing modalities.
What would settle it
A search that identifies several recent or important deep learning papers on missing-modality multimodal learning that were omitted from the survey's analysis would undermine its claim to comprehensiveness.
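One way to make this test concrete is to treat an independent literature search as a reference set and measure what fraction of it the survey covers. This is a sketch under our own assumptions, not a procedure the survey or the review specifies; the paper identifiers are hypothetical placeholders.

```python
def coverage(surveyed: set[str], independent_search: set[str]) -> tuple[float, set[str]]:
    """Return the fraction of independently found papers that the survey covers,
    plus the set of papers it omits (identifiers are hypothetical, e.g. arXiv IDs)."""
    if not independent_search:
        raise ValueError("independent search returned no papers")
    omitted = independent_search - surveyed
    recall = 1.0 - len(omitted) / len(independent_search)
    return recall, omitted


# Hypothetical identifiers, for illustration only.
surveyed = {"paper-A", "paper-B", "paper-C", "paper-D"}
independent = {"paper-A", "paper-B", "paper-E", "paper-F"}

recall, omitted = coverage(surveyed, independent)
print(f"coverage = {recall:.2f}, omitted = {sorted(omitted)}")
```

Here the survey covers half of the independent set. A low value driven by recent, widely cited omissions is the kind of evidence that would weaken the "first comprehensive survey" claim; how many omissions count as "several" remains a judgment call.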
During multimodal model training and testing, certain data modalities may be absent due to sensor limitations, cost constraints, privacy concerns, or data loss, negatively affecting performance. Multimodal learning techniques designed to handle missing modalities can mitigate this by ensuring model robustness even when some modalities are unavailable. This survey reviews recent progress in Multimodal Learning with Missing Modality (MLMM), focusing on deep learning methods. It provides the first comprehensive survey that covers the motivation and distinctions between MLMM and standard multimodal learning setups, followed by a detailed analysis of current methods, applications, and datasets, concluding with challenges and future directions.
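As a minimal illustration of one generic way a model can tolerate absent inputs, the sketch below fuses per-modality embeddings by averaging only the modalities that are present. A missing input therefore changes the denominator rather than injecting zeros. This is not a method taken from the survey; the modality names and the plain-list embeddings are assumptions made for the example.

```python
from typing import Optional, Sequence


def masked_mean_fusion(embeddings: dict[str, Optional[Sequence[float]]]) -> list[float]:
    """Average the embeddings of the modalities that are present.

    A value of None marks a missing modality. It is skipped instead of being
    replaced with zeros, so the fused vector does not shrink toward zero
    as modalities drop out.
    """
    present = [e for e in embeddings.values() if e is not None]
    if not present:
        raise ValueError("at least one modality must be available")
    dim = len(present[0])
    if any(len(e) != dim for e in present):
        raise ValueError("all embeddings must share one dimension")
    return [sum(e[i] for e in present) / len(present) for i in range(dim)]


# Hypothetical 3-d embeddings; the audio stream is missing for this sample.
sample = {"image": [1.0, 2.0, 3.0], "text": [3.0, 2.0, 1.0], "audio": None}
print(masked_mean_fusion(sample))  # [2.0, 2.0, 2.0]
```

Zero-filling the audio slot and averaging over all three slots would instead give 4/3 in every dimension, so the fused vector shrinks whenever a modality drops out. This sensitivity is one reason robustness techniques treat absence explicitly.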
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper is a survey on deep multimodal learning with missing modalities (MLMM). It distinguishes MLMM from standard multimodal setups, reviews motivations arising from sensor limitations, cost, privacy, and data loss, analyzes deep learning methods for handling missing modalities, surveys applications and datasets, and outlines challenges and future directions. The authors claim it as the first comprehensive survey focused specifically on this topic.
Significance. If the coverage proves complete and the taxonomy of methods accurate, the survey would provide a useful reference point for researchers working on robust multimodal models. It aggregates practical considerations around incomplete data that arise frequently in deployed vision and multimodal systems, potentially helping to consolidate scattered prior work and highlight open problems.
major comments (1)
- [Introduction] Introduction (or a dedicated survey-methodology subsection): the assertion of being the 'first comprehensive survey' is load-bearing for the paper's motivation and for the synthesized challenges/future directions. The manuscript must explicitly document the search protocol (databases queried, exact keywords and Boolean strings, date range, inclusion/exclusion criteria, and handling of preprints versus peer-reviewed work) so that readers can evaluate completeness and potential systematic omissions.
minor comments (2)
- [Methods taxonomy] Ensure every cited work in the methods taxonomy table or section is accompanied by a one-sentence characterization of how it addresses missing modalities, so that readers do not need to consult the original papers for basic distinctions.
- [Figures] Figure captions for any overview diagrams should explicitly state the criteria used to group methods (e.g., imputation-based vs. modality-robust vs. generative), as current phrasing leaves some boundary cases ambiguous.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of the paper's significance and for the constructive comment on improving the methodological transparency. We will address this point in the revision.
-
Referee: [Introduction] Introduction (or dedicated survey methodology subsection): the assertion of being the 'first comprehensive survey' is load-bearing for the paper's motivation and for the synthesized challenges/future directions. The manuscript must explicitly document the search protocol (databases queried, exact keywords and Boolean strings, date range, inclusion/exclusion criteria, and handling of preprints versus peer-reviewed work) so that readers can evaluate completeness and potential systematic omissions.
Authors: We agree that explicitly documenting the survey methodology is crucial for validating the comprehensiveness of our review and supporting the synthesized insights. In the revised manuscript, we will add a dedicated subsection in the Introduction that describes the search protocol employed. This will include the databases and repositories queried, the keywords and search strategies utilized, the date range of the literature considered, the inclusion and exclusion criteria, and how preprints were handled relative to peer-reviewed publications. By providing this information, readers will be better positioned to assess the scope and any potential gaps in our survey. revision: yes
Circularity Check
No significant circularity in survey paper lacking derivations
This is a literature survey paper that reviews and synthesizes existing external publications on MLMM without presenting original derivations, equations, fitted parameters, or predictive models. No load-bearing steps reduce by construction to the paper's own inputs, self-citations, or ansatzes. The claim of providing the 'first comprehensive survey' rests on literature selection completeness, which is an external validity and coverage issue rather than a circular reduction per the enumerated patterns. The paper is self-contained as an aggregation of independent prior results and receives a non-finding per guidelines for honest surveys without derivation chains.
Forward citations
Cited by 17 Pith papers
-
SGC-RML: A reliable and interpretable longitudinal assessment for PD in real-world DNS
SGC-RML creates an 8D symptom atlas from multimodal PD data and integrates conformal calibration to deliver reliable, rejectable longitudinal assessments.
-
Retrieving to Recover: Towards Incomplete Audio-Visual Question Answering via Semantic-consistent Purification
R²ScP recovers missing audio-visual data in question answering by retrieving semantically consistent examples and purifying noise, outperforming generative imputation in incomplete scenarios.
-
Inference-Time Dynamic Modality Selection for Incomplete Multimodal Classification
DyMo dynamically selects reliable recovered modalities at inference by using task loss as a proxy for task-relevant information, outperforming prior discard-or-impute methods on image datasets.
-
Resilient Vision-Tabular Multimodal Learning under Modality Missingness
A vision-tabular multimodal transformer uses modality tokens, masked self-attention, and stochastic modality dropout to maintain performance under pervasive missing data on MIMIC-CXR and MIMIC-IV for 14-label diagnost...
-
LARGO: Low-Rank Hypernetwork for Handling Missing Modalities
LARGO uses a low-rank hypernetwork with CP decomposition to unify 2^N-1 missing-modality models into one, ranking first in 47 of 52 configurations on BraTS and ISLES with small Dice gains over baselines.
-
Robust Multimodal Recommendation via Graph Retrieval-Enhanced Modality Completion
GRE-MC retrieves relevant subgraphs and uses a graph transformer plus sparse codebook to complete missing modalities, outperforming prior methods on recommendation benchmarks.
-
Federated Cross-Modal Retrieval with Missing Modalities via Semantic Routing and Adapter Personalization
RCSR is a personalization-friendly federated framework that improves cross-modal retrieval accuracy and stability under missing modalities via semantic routing and adapters.
-
Multimodal Diffusion to Mutually Enhance Polarized Light and Low Resolution EBSD Data
A multimodal diffusion model trained on synthetic data enhances low-resolution EBSD and corrupted polarized light data, achieving near full-resolution performance with only 25% EBSD data.
-
Conditional Evidence Reconstruction and Decomposition for Interpretable Multimodal Diagnosis
CERD reconstructs missing modalities conditioned on observed inputs and decomposes diagnostic evidence via logit attribution, outperforming baselines on incomplete ADNI data while providing interpretable attributions.
-
Purify-then-Align: Towards Robust Human Sensing under Modality Missing with Knowledge Distillation from Noisy Multimodal Teacher
PTA framework purifies noisy multimodal data via meta-learning and distills cross-modal knowledge through diffusion to create robust single-modality models under missing modalities.
-
Evaluation Before Generation: A Paradigm for Robust Multimodal Sentiment Analysis with Missing Modalities
The ProMMA framework evaluates missing modalities at input using a dedicated evaluator, then applies modality-invariant prompt disentanglement, mutual-information dynamic weighting, and multi-level residual prompt con...
-
Emotion Collider: Dual Hyperbolic Mirror Manifolds for Sentiment Recovery via Anti Emotion Reflection
EC-Net combines Poincare-ball hyperbolic embeddings, hypergraph fusion, and decoupled radial-angular contrastive learning to improve accuracy on multimodal emotion benchmarks especially under partial or noisy modalities.
-
Fusion or Confusion? Multimodal Complexity Is Not All You Need
Complex multimodal architectures do not reliably outperform unimodal baselines or a simple multimodal baseline under standardized evaluation.
-
Calibrated Multimodal Representation Learning with Missing Modalities
CalMRL mitigates anchor shift in multimodal representation learning by calibrating incomplete alignments through representation-level imputation of missing modalities using priors and a bi-step optimization with close...
-
Multi-Perspective Evidence Synthesis and Reasoning for Unsupervised Multimodal Entity Linking
MSR-MEL synthesizes instance-centric, group-level, lexical, and statistical evidence with LLMs and asymmetric teacher-student GNNs to outperform prior unsupervised methods on multimodal entity linking benchmarks.
-
Head-wise Modality Specialization within MLLMs for Robust Fake News Detection under Missing Modality
Head-wise modality specialization via attention constraints and unimodal knowledge retention in MLLMs improves robustness to missing modalities in fake news detection while preserving full multimodal performance.
-
ModalImmune: Immunity Driven Unlearning via Self Destructive Training
ModalImmune enforces modality immunity in multimodal models by controlled collapse of input channels during training using adaptive regularizers and meta-optimization.
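Two of the entries above frame missingness as availability patterns. LARGO refers to unifying 2^N-1 missing-modality models, one for each non-empty subset of N modalities, and the vision-tabular transformer trains with stochastic modality dropout. The sketch below, written under our own assumptions rather than taken from either paper, enumerates those subsets and samples one dropout mask per training example while always keeping at least one modality. The four modality names follow the BraTS MRI sequences; the drop probability is an arbitrary placeholder.

```python
import itertools
import random


def availability_patterns(modalities: list[str]) -> list[tuple[str, ...]]:
    """All non-empty subsets of the modalities: 2**N - 1 patterns for N inputs."""
    return [
        combo
        for r in range(1, len(modalities) + 1)
        for combo in itertools.combinations(modalities, r)
    ]


def modality_dropout_mask(modalities: list[str], p_drop: float, rng: random.Random) -> dict[str, bool]:
    """Independently drop each modality with probability p_drop during training.

    If every modality is dropped, one is restored uniformly at random, so the
    model always sees at least one input (True = kept, False = dropped).
    """
    mask = {m: rng.random() >= p_drop for m in modalities}
    if not any(mask.values()):
        mask[rng.choice(modalities)] = True
    return mask


mri = ["T1", "T1ce", "T2", "FLAIR"]
print(len(availability_patterns(mri)))  # 15 == 2**4 - 1
rng = random.Random(0)
print(modality_dropout_mask(mri, p_drop=0.5, rng=rng))
```

In a training loop, the mask would be applied before fusion, for example by passing dropped inputs to a fusion step that skips missing modalities. At test time, the same code path then handles genuinely absent inputs.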
Reference graph
Works this paper leans on
-
[1]
Reza Azad, Nika Khosravi, Mohammad Dehghanmanshadi, Julien Cohen-Adad, and Dorit Merhof. Medical image segmentation on mri images with missing modalities: A review. arXiv preprint arXiv:2203.06217,
-
[2]
30 Published in Transactions on Machine Learning Research (02/2026) Oresti Banos, Mate Attila Toth, Miguel Damas, Hector Pomares, and Ignacio Rojas. Dealing with the effects of sensor displacement in wearable activity recognition. Sensors, 14(6):9995–10023,
-
[3]
DOI: https://doi.org/10.24432/C5C59F. Rohan Bavishi, Erich Elsen, Curtis Hawthorne, Maxwell Nye, Augustus Odena, Arushi Somani, and Sağnak Taşırlar. Introducing our multimodal models,
-
[4]
Benjamin Bischke, Patrick Helber, Florian Koenig, Damian Borth, and Andreas Dengel. Overcoming missing and incomplete modalities with generative adversarial networks for building footprint segmentation. In 2018 International Conference on Content-Based Multimedia Indexing (CBMI), pp. 1–6. IEEE, 2018.
-
[5]
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712,
-
[6]
Maximilian Burzer, Tobias King, Till Riedel, Michael Beigl, and Tobias Röddiger. Whar datasets: An open source library for wearable human activity recognition. arXiv preprint arXiv:2508.16604,
-
[7]
Sergio Campos, Luis Pizarro, Carlos Valle, Katherine R Gray, Daniel Rueckert, and Héctor Allende. Evaluating imputation techniques for missing data in adni: a patient classification study. In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: 20th Iberoamerican Congress, CIARP 2015, Montevideo, Uruguay, November 9-12, 201...
-
[8]
URL https://arxiv.org/abs/2407.19156. Guoqing Chao, Shiliang Sun, and Jinbo Bi. A survey on multiview clustering. IEEE Transactions on Artificial Intelligence, 2(2):146–168,
-
[9]
doi: 10.1109/TAI.2021.3065894. Hava Chaptoukaev, Valeriya Strizhkova, Michele Panariello, Bianca Dalpaos, Aglind Reka, Valeria Manera, Susanne Thümmler, Esma Ismailova, Massimiliano Todisco, Maria A Zuluaga, et al. Stressid: a multimodal dataset for stress identification. Advances in Neural Information Processing Systems, 36,
-
[10]
Agisilaos Chartsias, Thomas Joyce, Mario Valerio Giuffrida, and Sotirios A Tsaftaris. Multimodal mr synthesis via modality-invariant latent representation. IEEE transactions on medical imaging, 37(3):803–814,
-
[11]
Cheng Chen, Qi Dou, Yueming Jin, Hao Chen, Jing Qin, and Pheng-Ann Heng. Robust multimodal brain tumor segmentation via feature disentanglement and gated fusion. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2019: 22nd International Conference, Shenzhen, China, October 13–17, 2019, Proceedings, Part III 22, pp. 447–456. Springer, 2019.
-
[12]
Ling Chen, Yingsong Luo, Liangying Peng, Rong Hu, Yi Zhang, and Shenghuan Miao. A multi-graph convolutional network based wearable human activity recognition method using multi-sensors. Applied Intelligence, 53(23):28169–28185, 2023a. Qianqian Chen, Jiadong Zhang, Runqi Meng, Lei Zhou, Zhenhui Li, Qianjin Feng, and Dinggang Shen. Modality-specific informat...
-
[13]
Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Evangelos Kazakos, Jian Ma, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, et al. Rescaling egocentric vision: Collection, pipeline and challenges for epic-kitchens-100. International Journal of Computer...
-
[14]
Christian Debes, Andreas Merentitis, Roel Heremans, Jürgen Hahn, Nikolaos Frangiadakis, Tim van Kasteren, Wenzhi Liao, Rik Bellens, Aleksandra Pižurica, Sidharta Gautama, et al. Hyperspectral and lidar data fusion: Outcome of the 2013 grss data fusion contest. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7(6):2405–2418,
- [15]
-
[16]
URL https://www.kaggle.com/dsv/7745331. Robert Duin. Multiple Features. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5HC70. Aiman Farooq, Deepak Mishra, and Santanu Chaudhury. Survival prediction in lung cancer through multi-modal representation learning. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 3...
-
[17]
Kausic Gunasekar, Qiang Qiu, and Yezhou Yang. Low to high dimensional modality hallucination using aggregated fields of view. IEEE Robotics and Automation Letters, 5(2):1983–1990,
-
[18]
URL https://arxiv.org/abs/2407.05374. Qishen Ha, Kohei Watanabe, Takumi Karasawa, Yoshitaka Ushiku, and Tatsuya Harada. Mfnet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS),
-
[19]
Sebastian Hafner and Yifang Ban. Multi-modal deep learning for multi-temporal urban mapping with a partly missing optical modality. In IGARSS 2023-2023 IEEE International Geoscience and Remote Sensing Symposium, pp. 6843–6846. IEEE, 2023.
-
[20]
Mohammad Hamghalam, Alejandro F Frangi, Baiying Lei, and Amber L Simpson. Modality completion via gaussian process prior variational autoencoders for multi-modal glioma segmentation. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part V...
-
[21]
Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531,
-
[22]
Minhao Hu, Matthis Maillard, Ya Zhang, Tommaso Ciceri, Giammarco La Barbera, Isabelle Bloch, and Pietro Gori. Knowledge distillation from multi-modal to mono-modal segmentation networks. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part I 23, pp. 772–78...
-
[23]
Zhongyi Huang, Yao Ding, Guoli Song, Lin Wang, Ruizhe Geng, Hongliang He, Shan Du, Xia Liu, Yonghong Tian, Yongsheng Liang, et al. Bcdata: A large-scale dataset and benchmark for cell detection and counting. In International Conference on Medical Image Computing and Computer-Assisted Inter...
-
[24]
Jaesung Huh, Jacob Chalk, Evangelos Kazakos, Dima Damen, and Andrew Zisserman. Epic-sounds: A large-scale dataset of actions that sound. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. IEEE, 2023.
-
[25]
Jaehyuk Jang, Yooseung Wang, and Changick Kim. Towards robust multimodal prompting with missing modalities. In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8070–8074. IEEE, 2024.
-
[26]
kaggle. kaggle flir thermal,
-
[27]
Bo Li, Peiyuan Zhang, Jingkang Yang, Yuanhan Zhang, Fanyi Pu, and Ziwei Liu. Otterhd: A high-resolution multi-modality model. arXiv preprint arXiv:2311.04219, 2023a. Haitao Li, Ziyu Li, Yiheng Mao, Zhengyao Ding, and Zhengxing Huang. Dc-seg: Disentangled contrastive learning for brain tumor segmentation with missing modalities. arXiv preprint arXiv:2505.119...
-
[28]
Rongjian Li, Wenlu Zhang, Heung-Il Suk, Li Wang, Jiang Li, Dinggang Shen, and Shuiwang Ji. Deep learning based imaging data completion for improved brain disease diagnosis. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2014: 17th International Conference, Boston, MA, USA, September 14-18, 2014, Proceedings, Part III 17, pp. 305–312. ...
-
[29]
Sijie Li, Chen Chen, and Jungong Han. Simmlm: A simple framework for multi-modal learning with missing modality. arXiv preprint arXiv:2507.19264, 2025b. Siting Li, Chenzhuang Du, Yue Zhao, Yu Huang, and Hang Zhao. What makes for robust multi-modal models in the face of missing modalities? arXiv preprint arXiv:2310.06383, 2023c. Xue Li, Guo Zhang, Hao Cui, S...
-
[30]
Paul Pu Liang, Zhun Liu, Yao-Hung Hubert Tsai, Qibin Zhao, Ruslan Salakhutdinov, and Louis-Philippe Morency. Learning representations from imperfect time series data via tensor rank regularization. arXiv preprint arXiv:1907.01011,
-
[31]
Paul Pu Liang, Yiwei Lyu, Xiang Fan, Zetian Wu, Yun Cheng, Jason Wu, Leslie Chen, Peter Wu, Michelle A Lee, Yuke Zhu, et al. Multibench: Multiscale benchmarks for multimodal representation learning. Advances in neural information processing systems, 2021(DB1):1, 2021.
-
[32]
Zihan Liang, Ziwen Pan, and Ruoxuan Xiong. Causal representation learning from multimodal clinical records under non-random modality missingness. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pp. 28779–28796, 2025.
-
[33]
Xun Lin, Shuai Wang, Rizhao Cai, Yizhong Liu, Ying Fu, Zitong Yu, Wenzhong Tang, and Alex Kot. Suppress and rebalance: Towards generalized multi-modal face anti-spoofing. arXiv preprint arXiv:2402.19298,
-
[34]
Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. Advances in neural information processing systems, 36, 2024a. Hong Liu, Dong Wei, Donghuan Lu, Jinghan Sun, Liansheng Wang, and Yefeng Zheng. M3ae: multimodal representation learning for brain tumor segmentation with missing modalities. In Proceedings of the AAAI Conference ...
-
[35]
doi: 10.1145/3411818. URL https://doi.org/10.1145/3411818. Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, et al. Llava-plus: Learning to use tools for creating multimodal agents. arXiv preprint arXiv:2311.05437, 2023c. Yanbei Liu, Lianxi Fan, Changqing Zhang, Tao Zhou, Zhitao Xiao, Lei Geng, an...
-
[36]
Yi Liu, Cong Wang, and Xingliang Yuan. Fedmobile: Enabling knowledge contribution-aware multi-modal federated learning with incomplete modalities. In Proceedings of the ACM on Web Conference 2025, pp. 2775–2786, 2025a. Yuhang Liu, Quan Zou, Ran Su, and Leyi Wei. scmomer: A modality-aware pretraining framework for single-cell multi-omics modeling under miss...
-
[37]
Zihong Luo, Haochen Xue, Mingyu Jin, Chengzhi Liu, Zile Huang, Chong Zhang, and Shuliang Zhao. Mc-dbn: A deep belief network-based model for modality completion. arXiv preprint arXiv:2402.09782,
-
[38]
Fei Ma, Shao-Lun Huang, and Lin Zhang. An efficient approach for audio-visual emotion recognition with missing labels and missing modalities. In 2021 IEEE international conference on multimedia and Expo (ICME), pp. 1–6. IEEE, 2021a. Fei Ma, Xiangxiang Xu, Shao-Lun Huang, and Lin Zhang. Maximum likelihood estimation for multimodal learning with missing moda...
-
[39]
Daniele Malitesta, Emanuele Rossi, Claudio Pomo, Fragkiskos D Malliaros, and Tommaso Di Noia. Dealing with missing modalities in multimodal recommendation: a feature propagation-based approach. arXiv preprint arXiv:2403.19841,
-
[40]
C Mario Christoudias, Raquel Urtasun, Mathieu Salzmann, and Trevor Darrell. Learning to recognize objects from unseen modalities. In Computer Vision–ECCV 2010, pp. 677–691. Springer, 2010.
-
[41]
Bjoern H Menze, Andras Jakab, Stefan Bauer, Jayashree Kalpathy-Cramer, Keyvan Farahani, Justin Kirby, Yuliya Burren, Nicole Porz, Johannes Slotboom, Roland Wiest, et al. The multimodal brain tumor image segmentation benchmark (brats). IEEE transactions on medical imaging, 34(10):1993–2024,
-
[42]
Antoine Miech, Ivan Laptev, and Josef Sivic. Learning a text-video embedding from incomplete and heterogeneous data. arXiv preprint arXiv:1804.02516,
-
[43]
George B Moody and Roger G Mark. The impact of the mit-bih arrhythmia database. IEEE engineering in medicine and biology magazine, 20(3):45–50,
-
[44]
Andriy Myronenko. 3d mri brain tumor segmentation using autoencoder regularization. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries: 4th International Workshop, BrainLes 2018, Held in Conjunction with MICCAI 2018, Granada, Spain, September 16, 2018, Revised Selected Papers, Part II 4, pp. 311–320. Springer,
-
[45]
Se Won Oh, Hyuntae Jeong, Seungeun Chung, Jeong Mook Lim, Kyoung Ju Noh, Sunkyung Lee, and Gyuwon Jung. Understanding human daily experience through continuous sensing: Etri lifelog dataset 2024. arXiv preprint arXiv:2508.03698,
-
[46]
Alessandro Palma, Till Richter, Hanyi Zhang, Manuel Lubetzki, Alexander Tong, Andrea Dittadi, and Fabian Theis. Multi-modal and multi-attribute generation of single cells with cfgen. arXiv preprint arXiv:2407.11734,
-
[47]
URL https://arxiv.org/abs/2407.16171. Srinivas Parthasarathy and Shiva Sundaram. Training strategies to handle missing modalities for audio-visual expression recognition. In Companion Publication of the 2020 International Conference on Multimodal Interaction, pp. 400–404,
-
[48]
Yuanzhe Peng, Jieming Bian, and Jie Xu. Fedmm: Federated multi-modal learning with modality heterogeneity in computational pathology. In ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1696–1700. IEEE, 2024.
-
[49]
Claudio Persello; Saurabh Prasad; Gemine Vivone; Vincent Lonjou; Frédéric Bretar; Raquel Rodriguez-Suquet; Pauline Guntzburger; Vincent Poulain; Jacqueline Le Moigne; Benjamin Smith; Sujay Kumar; Thomas Huang; Sophie Ricci; Thanh Huy Nguyen; Andrea Piacentini. 2024 ieee grss data fusion contest - flood rapid mapping. IEEE Dataport, 2023a. doi: 1...
-
[50]
Soujanya Poria, Devamanyu Hazarika, Navonil Majumder, Gautam Naik, Erik Cambria, and Rada Mihalcea. Meld: A multimodal multi-party dataset for emotion recognition in conversations. arXiv preprint arXiv:1810.02508,
-
[51]
Ilija Radosavovic, Bike Zhang, Baifeng Shi, Jathushan Rajasegaran, Sarthak Kamat, Trevor Darrell, Koushil Sreenath, and Jitendra Malik. Humanoid locomotion as next token prediction. arXiv preprint arXiv:2402.19469,
-
[52]
Merey Ramazanova, Alejandro Pardo, Bernard Ghanem, and Motasem Alfarra. Combating missing modalities in egocentric videos at test time. arXiv preprint arXiv:2404.15161,
-
[53]
DOI: https://doi.org/10.24432/C5NW2H. Md Kaykobad Reza, Ashley Prater-Bennette, and M Salman Asif. Robust multimodal learning with missing modalities via parameter-efficient adaptation. arXiv preprint arXiv:2310.03986,
-
[54]
Daniel Roggen, Alberto Calatroni, Mirco Rossi, Thomas Holleczek, Kilian Förster, Gerhard Tröster, Paul Lukowicz, David Bannach, Gerald Pirkl, Alois Ferscha, Jakob Doppler, Clemens Holzmann, Marc Kurz, Gerald Holl, Ricardo Chavarriaga, Hesam Sagha, Hamidreza Bayati, Marco Creatura, and José del R. Millán. Collecting complex activity datasets in highly rich...
-
[55]
URL https://api.semanticscholar.org/CorpusID:953131. Pramit Saha, Divyanshu Mishra, Felix Wagner, Konstantinos Kamnitsas, and J Alison Noble. Examining modality incongruity in multimodal federated learning for medical vision and language-based disease detection. arXiv preprint arXiv:2402.05294,
-
[56]
Gerwin Schalk, Dennis J McFarland, Thilo Hinterberger, Niels Birbaumer, and Jonathan R Wolpaw. Bci2000: a general-purpose brain-computer interface (bci) system. IEEE Transactions on biomedical engineering, 51(6):1034–1043,
-
[57]
Philip Schmidt, Attila Reiss, Robert Duerichen, Claus Marberger, and Kristof Van Laerhoven. Introducing wesad, a multimodal dataset for wearable stress and affect detection. In Proceedings of the 20th ACM International Conference on Multimodal Interaction, ICMI ’18, pp. 400–408, New York, NY, USA, 2018a. Association for Computing Machinery. ISBN 9781450356...
-
[58]
Yan Shen and Mingchen Gao. Brain tumor segmentation on mri with missing modalities. In Information Processing in Medical Imaging: 26th International Conference, IPMI 2019, Hong Kong, China, June 2–7, 2019, Proceedings 26, pp. 417–428. Springer, 2019.
-
[59]
URL https://arxiv.org/abs/2407.14796. Muhammad Shoaib, Stephan Bosch, Ozlem Durmaz Incel, Hans Scholten, and Paul JM Havinga. Fusion of smartphone motion sensors for physical activity recognition. Sensors, 14(6):10146–10176,
-
[60]
Aniruddh Sikdar, Jayant Teotia, and Suresh Sundaram. Contrastive learning-based spectral knowledge distillation for multi-modality and missing modality scenarios in semantic segmentation. arXiv preprint arXiv:2312.02240,
-
[61]
Nathan Silberman, Derek Hoiem, Pushmeet Kohli, and Rob Fergus. Indoor segmentation and support inference from rgbd images. In ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part V 12, pp. 746–760. Springer, 2012.
-
[62]
Lukas Stappen, Alice Baird, Lukas Christ, Lea Schumann, Benjamin Sertolli, Eva-Maria Messner, Erik Cambria, Guoying Zhao, and Björn W Schuller. The muse 2021 multimodal sentiment analysis challenge: sentiment, emotion, physiological-emotion, and stress. In Proceedings of the 2nd on Multimo...
-
[63]
Karasawa Takumi, Kohei Watanabe, Qishen Ha, Antonio Tejero-De-Pablos, Yoshitaka Ushiku, and Tatsuya Harada. Multispectral object detection for autonomous vehicles. In Proceedings of the on Thematic Workshops of ACM Multimedia 2017, pp. 35–43, 2017.
-
[64]
NASA Science Editorial Team. Keeping Our Sense of Direction: Dealing With a Dead Sensor - NASA Science — science.nasa.gov. https://science.nasa.gov/missions/mars-2020-perseverance/ingenuity-helicopter/keeping-our-sense-of-direction-dealing-with-a-dead-sensor/, JUN
-
[65]
DOI: https://doi.org/10.24432/C53W49. Katarzyna Tomczak, Patrycja Czerwińska, and Maciej Wiznerowicz. Review the cancer genome atlas (tcga): an immeasurable source of knowledge. Contemporary Oncology/Współczesna Onkologia, 2015(1):68–77,
Luan Tran, Xiaoming Liu, Jiayu Zhou, and Rong Jin. Missing modalities imputation via cascaded residual autoencoder. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1405–1414.
Yao-Hung Hubert Tsai, Paul Pu Liang, Amir Zadeh, Louis-Philippe Morency, and Ruslan Salakhutdinov. Learning factorized multimodal representations. arXiv preprint arXiv:1806.06176.
Miguel Vasco, Hang Yin, Francisco S Melo, and Ana Paiva. How to sense the world: Leveraging hierarchy in multimodal perception for robust reinforcement learning agents. arXiv preprint arXiv:2110.03608.
Hu Wang, Yuanhong Chen, Congbo Ma, Jodie Avery, Louise Hull, and Gustavo Carneiro. Multi-modal learning with missing modality via shared-specific feature modelling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15878–15887, 2023a.
Hu Wang, Congbo Ma, Jianpeng Zhang, Yuan Zhang, Jodie Avery, Louise Hull, and Gusta...
Shuai Wang, Zipei Yan, Daoan Zhang, Haining Wei, Zhongsen Li, and Rui Li. Prototype knowledge distillation for medical segmentation with missing modality. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5. IEEE, 2023c.
Tianyi W...
Yiyu Wang, Haifang Jian, Jian Zhuang, Huimin Guo, and Yan Leng. SSLMM: Semi-supervised learning with missing modalities for multimodal sentiment analysis. Information Fusion, 120:103058, 2025b.
Yuanyi Wang, Haifeng Sun, Jiabo Wang, Jingyu Wang, Wei Tang, Qi Qi, Shaoling Sun, and Jianxin Liao. Towards semantic consistency: Dirichlet energy driven robust multi-mod...
Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, and Nan Duan. Visual ChatGPT: Talking, drawing and editing with visual foundation models. arXiv preprint arXiv:2303.04671, 2023a.
Renjie Wu, Hu Wang, Feras Dayoub, and Hsiang-Ting Chen. Segment beyond view: handling partially missing modality for audio-visual semantic segmentation. In Proce...
Zhengyuan Yang, Linjie Li, Jianfeng Wang, Kevin Lin, Ehsan Azarnasab, Faisal Ahmed, Zicheng Liu, Ce Liu, Michael Zeng, and Lijuan Wang. MM-REACT: Prompting ChatGPT for multimodal reasoning and action. arXiv preprint arXiv:2303.11381, 2023b.
Wenfang Yao, Kejing Yin, William K Cheung, Jia Liu, and Jing Qin. DrFuse: Learning disentangled representation for ...
Naoto Yokoya, Pedram Ghamisi, Ronny Haensch, and Michael Schmitt. 2020 IEEE GRSS data fusion contest: Global land cover mapping with weak supervision [technical committees]. IEEE Geoscience and Remote Sensing Magazine, 8(1):154–157, 2020.
Amir Zadeh, Rowan Zellers, Eli Pincus, and Louis-Philippe Morency. MOSI: multimodal corpus of sentiment intensity and subjectivity analysis in online opinion videos. arXiv preprint arXiv:1606.06259.
Jiandian Zeng, Jiantao Zhou, and Tianyi Liu. Mitigating inconsistencies in multimodal sentiment analysis under uncertain missing modalities. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 2924–2934, 2022.
Jun Zhan, Junqi Dai, Jiasheng Ye, Yunhua Zhou, Dong Zhang, Zhigeng Liu, Xin Zhang, Ruibin Yuan, Ge Zhang, Linyang Li, et al. AnyGPT: Unified multimodal LLM with discrete sequence modeling. arXiv preprint arXiv:2402.12226.
Yuan Zhang, Hu Wang, David Butler, Minh-Son To, Jodie Avery, M Louise Hull, and Gustavo Carneiro. Distilling missing modality knowledge from ultrasound for endometriosis diagnosis with magnetic resonance images. In 2023 IEEE 20th International Symposium on Biomedical Imaging, 2023b.
Yue Zhang, Chengtao Peng, Qiuli Wang, Dan Song, Kaiyan Li, and S Kevin Zhou. Unified ...
Xu Zheng, Yuanhuiyi Lyu, and Lin Wang. Learning modality-agnostic representation for semantic segmentation from any modalities. URL https://arxiv.org/abs/2407.11351.
Yu Zheng, Xiuwen Yi, Ming Li, Ruiyuan Li, Zhangqing Shan, Eric Chang, and Tianrui Li. Forecasting fine-grained air quality based on big data. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 2267–2276.