MuteBench: Modality Unavailability Tolerance Evaluation for Incomplete Multimodal Fusion
Pith reviewed 2026-05-19 16:29 UTC · model grok-4.3
The pith
Architecture family predicts robustness to missing modalities better than model size in clinical fusion.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MuteBench systematically applies controlled modality missing and within-modality missing to six fusion architectures on nine clinical datasets. The central finding is that architecture family is the strongest predictor of robustness, outweighing parameter count. Channel-independent models tolerate modality missing well but remain sensitive to within-modality missing on short sequences. Curriculum modality dropout provides protection only up to the highest dropout rate used during training. Channel count, sequence length, and modality alignment together determine which missing-data mode creates the larger threat. In a PTB-XL case study, diffusion-based imputation improves downstream classification under within-modality missing.
What carries the argument
MuteBench benchmark that tests fusion architectures under controlled levels of modality missing and within-modality missing across multiple clinical datasets.
If this is right
- Channel-independent architectures provide reliable tolerance when an entire sensor channel disappears.
- Modality dropout during training only guarantees protection up to the maximum rate applied in that training.
- Short sequences make within-modality missing more damaging than full channel loss.
- Imputation helps most for models whose internal routing depends heavily on clean inputs.
Where Pith is reading between the lines
- Designers could combine channel-independent processing with explicit temporal-gap handling to address both failure modes at once.
- Robustness claims should be re-checked on datasets that vary sequence length and channel alignment independently.
- Deployment decisions may benefit from matching model type to expected failure statistics of the target clinical setting.
Load-bearing premise
The nine clinical datasets and six fusion architectures represent the typical range of real-world multimodal physiological signals and sensor-failure patterns.
What would settle it
A new dataset with different channel counts or sequence lengths where the robustness ranking by architecture family reverses or disappears.
Original abstract
Multimodal physiological data powers clinical AI systems from intensive care units to wearable devices, but sensors routinely fail in practice. Two failure modes are common: modality missing, where an entire channel is absent, and within-modality missing, where a contiguous time segment is lost. No existing benchmark evaluates multiple fusion architectures under both failure modes at controlled severity levels across diverse clinical datasets. We present MuteBench, a benchmark covering 9 datasets from 7 clinical domains, 6 fusion architectures, and 2 missing-data modes over 125,000 samples. Through this benchmark, we find that architecture family is the strongest predictor of robustness, outweighing parameter count. Channel-independent models tolerate modality missing well but can be sensitive to within-modality missing, especially on short sequences. Curriculum modality dropout protects reliably only up to the maximum dropout rate used in training. We also find that channel count, sequence length, and modality alignment jointly determine which failure mode poses the greater threat. Finally, a PTB-XL case study suggests that diffusion-based imputation can improve downstream classification under within-modality missing, with the largest gains for models whose expert routing is most sensitive to corrupted inputs, though broader validation across datasets remains an open direction. MuteBench provides practitioners with concrete guidance for both selecting existing architectures and informing the design of future robust multimodal fusion methods.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces MuteBench, a benchmark for evaluating multimodal fusion architectures under modality missing and within-modality missing conditions. It covers 9 clinical datasets from 7 domains, 6 fusion architectures, and over 125,000 samples across two failure modes. The central empirical finding is that architecture family is the strongest predictor of robustness to these failures, outweighing parameter count; additional results address curriculum modality dropout limits and the potential benefits of diffusion imputation for sensitive routing models.
Significance. If the robustness rankings hold after controlling for design choices, the benchmark supplies concrete, practitioner-oriented guidance for selecting fusion methods in clinical settings where sensor dropouts are routine. The scale of the evaluation and the explicit comparison of failure modes across domains represent a useful contribution to reproducible multimodal robustness research.
major comments (2)
- [Abstract] Abstract and results: the claim that architecture family is the strongest predictor of robustness (outweighing parameter count) is not isolated from confounding differences in missing-data handling. The abstract itself notes that curriculum modality dropout protects only up to the training rate and that diffusion imputation helps models with sensitive routing; without an ablation that equalizes these mechanisms across families, variance attributed to 'family' may instead reflect built-in masking, expert routing, or imputation strategies.
- [Abstract] Abstract: the assertion that the nine datasets and six architectures 'sufficiently represent' real-world multimodal physiological signals and sensor-failure patterns is stated without supporting evidence or sensitivity analysis. This assumption is load-bearing for the generalization of the robustness rankings.
minor comments (2)
- [Abstract] The abstract reports '125,000 samples' but does not break down the distribution across datasets, architectures, or missing-data severity levels.
- Clarify the precise statistical procedure used to rank predictors (architecture family vs. parameter count) and report effect sizes or confidence intervals for the ranking.
Simulated Author's Rebuttal
Thank you for the constructive feedback on our manuscript introducing MuteBench. We appreciate the referee's focus on potential confounders in our robustness analysis and the generalizability of the benchmark. We address each major comment below, indicating planned revisions where appropriate.
- Referee: [Abstract] Abstract and results: the claim that architecture family is the strongest predictor of robustness (outweighing parameter count) is not isolated from confounding differences in missing-data handling. The abstract itself notes that curriculum modality dropout protects only up to the training rate and that diffusion imputation helps models with sensitive routing; without an ablation that equalizes these mechanisms across families, variance attributed to 'family' may instead reflect built-in masking, expert routing, or imputation strategies.
Authors: We acknowledge that differences in missing-data handling mechanisms (such as built-in masking, expert routing, or imputation) are inherent to the architecture families evaluated and could contribute to the observed robustness patterns. These mechanisms form part of what distinguishes the families in practical deployments, and our experiments compared representative implementations as they are commonly used. Parameter counts were varied within families where possible to support the family-level finding. To address the concern directly, we will revise the discussion section to explicitly note this potential confounding and highlight the need for future controlled ablations that equalize handling strategies across families. revision: partial
- Referee: [Abstract] Abstract: the assertion that the nine datasets and six architectures 'sufficiently represent' real-world multimodal physiological signals and sensor-failure patterns is stated without supporting evidence or sensitivity analysis. This assumption is load-bearing for the generalization of the robustness rankings.
Authors: The nine datasets were chosen to cover seven distinct clinical domains with differences in channel counts, sequence lengths, sampling rates, and modality alignments, aiming to reflect common physiological signal characteristics and sensor failure scenarios. The total of over 125,000 samples provides scale for the comparisons. We do not claim the selection is exhaustive or perfectly representative of all possible real-world cases. In the revision, we will add a dedicated subsection in the datasets description justifying the selection criteria with a summary table of key characteristics and include a brief sensitivity check by reporting robustness rankings on dataset subsets to assess stability. revision: yes
Circularity Check
No circularity: empirical benchmark derives claims from external dataset comparisons
The paper introduces MuteBench as an empirical benchmark evaluating 6 fusion architectures across 9 clinical datasets under controlled missing-data conditions. The central claim that architecture family is the strongest predictor of robustness is obtained directly from experimental results on these external datasets rather than from any self-referential equations, fitted parameters renamed as predictions, or load-bearing self-citations. No derivation chain reduces to its own inputs by construction; the findings remain falsifiable via replication on the benchmark. This is the expected outcome for a purely empirical evaluation paper.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The selected clinical datasets and fusion architectures are representative of broader multimodal physiological data scenarios.
Reference graph
Works this paper leans on
-
[1]
doi: 10.1038/s41598-026-39035-z
Benchmarking imputation strategies for missing time-series data in critical care using real-world-inspired scenarios. Scientific Reports, 2026. doi: 10.1038/s41598-026-39035-z
-
[2]
Nat Med28, 1773–1784 (2022) https://doi.org/10.1038/s41591-022-01981-2
Julián N Acosta, Guido J Falcone, Pranav Rajpurkar, and Eric J Topol. Multimodal biomedical AI. Nature Medicine, 28(9):1773–1784, 2022. doi: 10.1038/s41591-022-01981-2. URL https://www.nature.com/articles/s41591-022-01981-2
-
[3]
Seongsu Bae, Daeun Kyung, Jaehee Ryu, Eunbyeol Cho, Gyubok Lee, Sunjun Kweon, Jungwoo Oh, Lei Ji, Eric Chang, Tackeun Kim, and Edward Choi. Ehrxqa: A multi-modal question answering dataset for electronic health records with chest x-ray images. In A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, editors, Advances in Neural Information ...
-
[4]
Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, pages 41–48, New York, NY, USA, 2009. Association for Computing Machinery. ISBN 9781605585161. doi: 10.1145/1553374.1553380
-
[5]
Zhengping Che, Sanjay Purushotham, Kyunghyun Cho, David Sontag, and Yan Liu. Recurrent neural networks for multivariate time series with missing values. Scientific Reports, 8(1):6085,
-
[6]
doi: 10.1038/s41598-018-24271-9. URL https://www.nature.com/articles/s41598-018-24271-9
-
[7]
Emma Chen, Aman Kansal, Julie Chen, Boyang Tom Jin, Julia Rachel Reisler, David A Kim, and Pranav Rajpurkar. Multimodal clinical benchmark for emergency care (mc-bec): A comprehensive benchmark for evaluating foundation models in emergency medicine. 2023. URL https://arxiv.org/abs/2311.04937
-
[8]
Jonas Chromik, S A I Klopfenstein, Bjarne Pfitzner, Zeynab C Sinno, Bert Arnrich, Felix Balzer, and Akira-Sebastian Poncette. Computational approaches to alleviate alarm fatigue in intensive care medicine: a systematic literature review. Frontiers in Digital Health, 4:843747, 2022. doi: 10.3389/fdgth.2022.843747. URL https://doi.org/10.3389/fdgth.2022.843747
-
[9]
CLIMB: Data foundations for large scale multimodal clinical foundation models
Wei Dai, Peilin Chen, Malinda Lu, Daniel A Li, Haowen Wei, Hejie Cui, and Paul Pu Liang. CLIMB: Data foundations for large scale multimodal clinical foundation models. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=TcvjOSePic
-
[10]
Wearable sensors enable personalized predictions of clinical laboratory measurements
Jessilyn Dunn, Lukasz Kidzinski, Ryan Runge, Daniel Witt, Jennifer L Hicks, Sophia Miryam Schüssler-Fiorenza Rose, Xiao Li, Amir Bahmani, Scott L Delp, Trevor Hastie, and Michael P Snyder. Wearable sensors enable personalized predictions of clinical laboratory measurements. Nature Medicine, 27(6):1105–1112, 2021. doi: 10.1038/s41591-021-01339-0. URL https...
-
[11]
Dennis Fast, Lisa C Adams, Felix Busch, Conor Fallon, Marc Huppertz, Robert Siepmann, Philipp Prucker, Nadine Bayerl, Daniel Truhn, Marcus Makowski, et al. Autonomous medical evaluation for guideline adherence of large language models. NPJ Digital Medicine, 7(1):358, 2024.
-
[12]
Samuel G. Finlayson, Adarsh Subbaswamy, Karandeep Singh, John Bowers, Annabel Kupke, Jonathan Zittrain, Isaac S. Kohane, and Suchi Saria. The clinician and dataset shift in artificial intelligence. New England Journal of Medicine, 385(3):283–286, 2021. doi: 10.1056/NEJMc2104626
-
[13]
Ary L. Goldberger, Luis A. N. Amaral, Leon Glass, Jeffrey M. Hausdorff, Plamen Ch. Ivanov, Roger G. Mark, Joseph E. Mietus, George B. Moody, Chung-Kang Peng, and H. Eugene Stanley. Physiobank, physiotoolkit, and physionet. Circulation, 101(23):e215–e220, 2000. doi: 10.1161/01.CIR.101.23.e215
-
[14]
Xing Han, Huy Nguyen, Carl Harris, Nhat Ho, and Suchi Saria. Fusemoe: Mixture-of-experts transformers for fleximodal fusion. arXiv preprint arXiv:2402.03226, 2024. URL https://arxiv.org/abs/arXiv:2402.03226
-
[15]
Denoising diffusion probabilistic models
Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, pages 6840–6851. Curran Associates, Inc., 2020
-
[16]
Yutao Hu, Tianbin Li, Quanfeng Lu, Wenqi Shao, Junjun He, Yu Qiao, and Ping Luo. Omnimedvqa: A new large-scale comprehensive evaluation benchmark for medical lvlm. In 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22170–22183, 2024. doi: 10.1109/CVPR52733.2024.02093
-
[17]
Yu Huang, Junyang Lin, Chang Zhou, Hongxia Yang, and Longbo Huang. Modality competition: What makes joint training of multi-modal network fail in deep learning? (Provably). In Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 9226–9259. PMLR, 17–23 Jul 2022. URL https://pro...
-
[18]
Juntao Jiang, Jiangning Zhang, Yali Bi, Jinsheng Bai, Weixuan Liu, Weiwei Jin, Zhucun Xue, Yong Liu, Xiaobin Hu, and Shuicheng Yan. M3CoTBench: Benchmark chain-of-thought of MLLMs in medical image understanding. arXiv preprint arXiv:2601.08758, 2026
-
[19]
Di Jin, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang, and Peter Szolovits. What disease does this patient have? a large-scale open domain question answering dataset from medical exams. Applied Sciences, 11(14), 2021. ISSN 2076-3417. doi: 10.3390/app11146421. URL https://www.mdpi.com/2076-3417/11/14/6421
-
[20]
Mimic-cxr-jpg: Chest radiographs with structured labels. PhysioNet, 2019
Alistair Johnson, Matthew Lungren, Yifan Peng, Zhiyong Lu, Roger Mark, Seth Berkowitz, and Steven Horng. Mimic-cxr-jpg: Chest radiographs with structured labels. PhysioNet, 2019. doi: 10.13026/8360-t248
-
[21]
MIMIC-IV. PhysioNet, October 2024
Alistair Johnson, Lucas Bulgarelli, Tom Pollard, Brian Gow, Benjamin Moody, Steven Horng, Leo Anthony Celi, and Roger Mark. MIMIC-IV. PhysioNet, October 2024. doi: 10.13026/kpb9-mt58. URL https://doi.org/10.13026/kpb9-mt58. Version 3.1
-
[22]
Analysis of a sleep-dependent neuronal feedback loop: the slow-wave microcontinuity of the eeg,
B. Kemp, A.H. Zwinderman, B. Tuk, H.A.C. Kamphuisen, and J.J.L. Oberye. Analysis of a sleep-dependent neuronal feedback loop: the slow-wave microcontinuity of the eeg. IEEE Transactions on Biomedical Engineering, 47(9):1185–1194, 2000. doi: 10.1109/10.867928. URL https://physionet.org/content/sleep-edfx/1.0.0/
-
[23]
Multimodal prompting with missing modalities for visual recognition
Yi-Lun Lee, Yi-Hsuan Tsai, Wei-Chen Chiu, and Chen-Yu Lee. Multimodal prompting with missing modalities for visual recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2023
-
[24]
Hao Li, Bowen Deng, Chang Xu, Zhiyuan Feng, Viktor Schlegel, Yu-Hao Huang, Yizheng Sun, Jingyuan Sun, Kailai Yang, Yiyao Yu, et al. MIRA: Medical time series foundation model for real-world health data. arXiv preprint arXiv:2506.07584, 2025. URL https://arxiv.org/abs/2506.07584
-
[25]
Xiaomin Li, Mingye Gao, Yuexing Hao, Taoran Li, Guangya Wan, Zihan Wang, and Yijun Wang. MedGUIDE: Benchmarking clinical decision-making in large language models. arXiv preprint arXiv:2505.11613, 2025.
-
[26]
Paul Pu Liang, Yiwei Lyu, Xiang Fan, Arav Agarwal, Yun Cheng, Louis-Philippe Morency, and Ruslan Salakhutdinov. MULTIZOO & MULTIBENCH: A standardized toolkit for multimodal deep learning. Journal of Machine Learning Research, 24:1–7, 2023
-
[27]
SMIL: Multimodal learning with severely missing modality
Mengmeng Ma, Jian Ren, Long Zhao, Sergey Tulyakov, Cathy Wu, and Xi Peng. SMIL: Multimodal learning with severely missing modality. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 2302–2310, 2021. doi: 10.1609/aaai.v35i3.16330. URL https://ojs.aaai.org/index.php/AAAI/article/view/16330
-
[28]
Up-fall detection dataset: A multimodal approach
Lourdes Martínez-Villaseñor, Hiram Ponce, Jorge Brieva, Ernesto Moya-Albor, José Núñez-Martínez, and Carlos Peñafort-Asturiano. Up-fall detection dataset: A multimodal approach. Sensors, 19(9), 2019. ISSN 1424-8220. doi: 10.3390/s19091988. URL https://www.mdpi.com/1424-8220/19/9/1988
-
[29]
The CirCor DigiScope Phonocardiogram Dataset. PhysioNet, May 2022
Jorge Oliveira, Francesco Renna, Paulo Costa, Marcelo Nogueira, Ana Cristina Oliveira, Andoni Elola, Carlos Ferreira, Alipio Jorge, Ali Bahrami Rad, Matthew Reyna, Reza Sameni, Gari Clifford, and Miguel Coimbra. The CirCor DigiScope Phonocardiogram Dataset. PhysioNet, May 2022. doi: 10.13026/tshs-mw03. URL https://doi.org/10.13026/tshs-mw03. Version 1.0.3
-
[30]
Maestro: Adaptive sparse attention and robust learning for multimodal dynamic time series
Akash Pandey Payal Mohapatra, Yueyuan Sui, Stephen Xia, and Qi Zhu. Maestro: Adaptive sparse attention and robust learning for multimodal dynamic time series. In NeurIPS, 2025
-
[31]
Attila Reiss, Ina Indlekofer, and Philip Schmidt. PPG-DaLiA. 2019. URL https://archive.ics.uci.edu/dataset/495/ppg+dalia. DOI: https://doi.org/10.24432/C53890
-
[32]
Introducing wesad, a multimodal dataset for wearable stress and affect detection,
Philip Schmidt, Attila Reiss, Robert Duerichen, Claus Marberger, and Kristof Van Laerhoven. Introducing wesad, a multimodal dataset for wearable stress and affect detection. In Proceedings of the 20th ACM International Conference on Multimodal Interaction, ICMI ’18, page 400–408, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 978145035...
-
[33]
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017
-
[34]
Benjamin Shickel, Tyler J Loftus, Lasith Adhikari, Tezcan Ozrazgat-Baslanti, Azra Bihorac, and Parisa Rashidi. DeepSOFA: a continuous acuity score for critically ill patients using clinically interpretable deep learning. Scientific Reports, 9(1):1879, 2019. doi: 10.1038/s41598-019-38491-0. URL https://pmc.ncbi.nlm.nih.gov/articles/PMC6372608/
-
[35]
Multi-time attention networks for irregularly sampled time series
Satya Narayan Shukla and Benjamin M Marlin. Multi-time attention networks for irregularly sampled time series. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=4c0J6lwQ4_
-
[36]
Ikaro Silva, George Moody, Daniel J Scott, Leo A Celi, and Roger G Mark. Predicting in-hospital mortality of icu patients: The physionet/computing in cardiology challenge 2012. In 2012 Computing in Cardiology, pages 245–248, 2012. URL https://physionet.org/content/challenge-2012/1.0.0/
-
[37]
Large language models encode clinical knowledge. Nature, 2023
Karan Singhal, Shekoofeh Azizi, Tao Tu, Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Perry Payne, Stephen Pfohl, Martin Seneviratne, Paul Gamble, Christopher Kelly, Abubakr Abdelrazig Hassan Babiker, Nathanael Schaerli, Aakanksha Chowdhery, Philip Mansfield, Dina Demner-Fushman, Blaise Aguera-Arcas, Dale Webst...
-
[38]
Luis R Soenksen, Yu Ma, Cynthia Zeng, Leonard Boussioux, Kimberly Villalobos Carballo, Liangyuan Na, Holly M Wiberg, Michael L Li, Ignacio Fuentes, and Dimitris Bertsimas. Integrated multimodal artificial intelligence framework for healthcare applications. NPJ Digital Medicine, 5(1):149, 2022. doi: 10.1038/s41746-022-00689-4. URL https://www.nature.com/ar...
-
[39]
Available: https://doi.org/10.1109/JBHI.2020.3022989
Nils Strodthoff, Patrick Wagner, Tobias Schaeffter, and Wojciech Samek. Deep learning for ecg analysis: Benchmarks and insights from ptb-xl. IEEE Journal of Biomedical and Health Informatics, 25(5):1519–1528, 2021. doi: 10.1109/JBHI.2020.3022989
-
[40]
Yusuke Tashiro, Jiaming Song, Yang Song, and Stefano Ermon. Csdi: Conditional score-based diffusion models for probabilistic time series imputation. Advances in neural information processing systems, 34:24804–24816, 2021
-
[41]
Tao Tu, Shekoofeh Azizi, Danny Driess, Mike Schaekermann, Mohamed Amin, Pi-Chuan Chang, Andrew Carroll, Charles Lau, Ryutaro Tanno, Ira Ktena, Anil Palepu, Basil Mustafa, Aakanksha Chowdhery, Yun Liu, Simon Kornblith, David Fleet, Philip Mansfield, Sushant Prakash, Renee Wong, Sunny Virmani, Christopher Semturs, S. Sara Mahdavi, Bradley Green, Ewa Dominow...
-
[42]
Guillaume Vila, Clément Godin, Sylvie Charbonnier, and Aurélie Campagne. Real-time quality index to control data loss in real-life cardiac monitoring applications. Sensors, 21(16):5357,
-
[43]
doi: 10.3390/s21165357. URL https://doi.org/10.3390/s21165357
-
[44]
Patrick Wagner, Nils Strodthoff, Ralf-Dieter Bousseljot, Wojciech Samek, and Tobias Schaeffter. PTB-XL, a large publicly available electrocardiography dataset. PhysioNet, April 2020. doi: 10.13026/x4td-x982. URL https://doi.org/10.13026/x4td-x982. Version 1.0.1
-
[45]
Multi-modal learning with missing modality via shared-specific feature modelling
Hu Wang, Yuanhong Chen, Congbo Ma, Jodie Avery, Louise Hull, and Gustavo Carneiro. Multi-modal learning with missing modality via shared-specific feature modelling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15878–15887, 2023
-
[46]
Weiyao Wang, Du Tran, and Matt Feiszli. What makes training multi-modal classification networks hard? In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12692–12702, 2020. doi: 10.1109/CVPR42600.2020.01271
-
[47]
Yuanlong Wang, Changchang Yin, and Ping Zhang. Multimodal risk prediction with physiological signals, medical images and clinical notes. Heliyon, 10(5):e26772, 2024. ISSN 2405-8440. doi: https://doi.org/10.1016/j.heliyon.2024.e26772
-
[48]
Kathryn Wantlin, Chenwei Wu, Shih-Cheng Huang, Oishi Banerjee, Farah Dadabhoy, Veeral Vipin Mehta, Ryan Wonhee Han, Fang Cao, Raja R. Narayan, Errol Colak, Adewole Adamson, Laura Heacock, Geoffrey H. Tison, Alex Tamkin, and Pranav Rajpurkar. Benchmd: A benchmark for modality-agnostic learning on medical images and sensors, 2023
-
[49]
Elisa Warner, Joonsang Lee, William Hsu, Tanveer Syeda-Mahmood, Charles E Kahn Jr, Olivier Gevaert, and Arvind Rao. Multimodal machine learning in image-based and clinical biomedicine: survey and prospects. International Journal of Computer Vision, 132(9): 3753–3769, 2024. doi: 10.1007/s11263-024-02032-8. URL https://link.springer.com/article/10.1007/s1...
-
[50]
Wenfang Yao, Kejing Yin, William K Cheung, Jia Liu, and Jing Qin. DrFuse: Learning disentangled representation for clinical multi-modal fusion with missing modality and modal inconsistency. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 16416–16424, 2024. doi: 10.1609/aaai.v38i15.29578. URL https://ojs.aaai.org/index.ph...
-
[51]
Flex-moe: Modeling arbitrary modality combination via the flexible mixture-of-experts, 2024
Sukwon Yun, Inyoung Choi, Jie Peng, Yangfan Wu, Jingxuan Bao, Qiyiwen Zhang, Jiayi Xin, Qi Long, and Tianlong Chen. Flex-moe: Modeling arbitrary modality combination via the flexible mixture-of-experts, 2024. URL https://arxiv.org/abs/2410.08245
-
[52]
M3Care: Learning with missing modalities in multimodal healthcare data
Chaohe Zhang, Xu Chu, Liantao Ma, Yinghao Zhu, Yasha Wang, Jiangtao Wang, and Junfeng Zhao. M3Care: Learning with missing modalities in multimodal healthcare data. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’22, pages 2418–2428, 2022. doi: 10.1145/3534678.3539388.
-
[53]
Graph-guided network for irregularly sampled multivariate time series
Xiang Zhang, Marko Zeman, Theodoros Tsiligkaridis, and Marinka Zitnik. Graph-guided network for irregularly sampled multivariate time series. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=Kwm8I7dU-l5
-
[54]
Jianwei Zheng, Hangyuan Guo, and Huimin Chu. A large scale 12-lead electrocardiogram database for arrhythmia study. PhysioNet, August 2022. doi: 10.13026/wgex-er52. URL https://doi.org/10.13026/wgex-er52. Version 1.0.0.
A Broader Impacts
MuteBench provides practitioners with concrete, dataset-aware guidance for selecting multimodal fusion architectures tha...
-
[55]
Clinical time series (C = 30, T = 48): Vital signs and laboratory values aggregated into 1-hour bins over the first 48 hours of ICU admission. The 30 channels include heart rate, systolic/diastolic/mean arterial blood pressure, respiratory rate, body temperature, SpO2, and key biochemical markers such as glucose, creatinine, potassium, sodium, and bicarbonate.
-
[56]
Chest X-ray features (1024-D static vector): Visual embeddings pre-extracted from the most recent chest radiograph sourced from MIMIC-CXR-JPG [19], following the multimodal configuration of Han et al. [13], using a pretrained thoracic image encoder, capturing structural lung and cardiac pathology including effusions, cardiomegaly, and consolidations
-
[57]
ECG features (256-D static vector): Temporal embeddings pre-extracted from the 12-lead ECG recording closest to ICU admission time, encoding arrhythmia and ischaemia patterns in a compact representation
-
[58]
Clinical text features (768-D static vector): Semantic embeddings pre-extracted from clinical notes (nursing notes, discharge summaries) using a pretrained BERT-based clinical language model, encoding free-text observations not captured by structured variables. Modalities 2–4 are static vectors with no time axis and cannot be aligned with the hourly time s...
-
[59]
Compute the block length range: ℓ_min = ⌈block_n · T⌉, ℓ_max = ⌈block_n_max · T⌉. We use block_n = 0.05 and block_n_max = 0.10, so each block covers 5–10% of T.
-
[60]
Estimate the number of blocks required to cover fraction block_m of the sequence: k = (block_m · T) / ((ℓ_min + ℓ_max) / 2).
-
[61]
For each block, uniformly sample a start position and check for overlap with already-placed blocks. If an overlap is found, resample up to 64 times; if no valid position is found, stop placing further blocks for this channel.
-
[62]
Set mask[c, start:end] ← 0 for each placed block. Each channel uses an independent sub-generator: before iterating over channels, the shared rng draws one 64-bit seed per channel upfront, and each channel’s block placement proceeds from its own np.random.default_rng. This ensures that different channels miss different time windows while the entire per-sample p...
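The block-masking steps above can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name block_mask and its signature are hypothetical, and the excerpt does not say how k is rounded, how each block's length is drawn within [ℓ_min, ℓ_max], or whether the 64 resamples include the first draw. Here k is rounded to the nearest integer, the length is drawn uniformly once per block, and 64 total attempts are allowed.

```python
import math
import numpy as np


def block_mask(C, T, block_m, seed, block_n=0.05, block_n_max=0.10, max_tries=64):
    """Return a (C, T) mask where 1 = observed and 0 = missing (hypothetical helper)."""
    # Step 1: block length range, as a fraction of the sequence length T.
    l_min = math.ceil(block_n * T)
    l_max = math.ceil(block_n_max * T)

    # Step 2: number of blocks needed to cover roughly block_m of the sequence.
    # Assumption: rounding to the nearest integer (not specified in the excerpt).
    k = int(round(block_m * T / ((l_min + l_max) / 2)))

    # Shared generator draws one 64-bit seed per channel upfront.
    rng = np.random.default_rng(seed)
    channel_seeds = rng.integers(
        0, np.iinfo(np.uint64).max, size=C, dtype=np.uint64, endpoint=True
    )

    mask = np.ones((C, T), dtype=np.int8)
    for c in range(C):
        # Each channel places its blocks from its own independent generator.
        ch_rng = np.random.default_rng(int(channel_seeds[c]))
        placed = []  # list of (start, end) half-open intervals
        for _ in range(k):
            # Assumption: block length is uniform on [l_min, l_max], drawn once.
            length = int(ch_rng.integers(l_min, l_max + 1))
            for _attempt in range(max_tries):
                # Step 3: uniformly sample a start and reject overlapping positions.
                start = int(ch_rng.integers(0, T - length + 1))
                end = start + length
                if all(end <= s or start >= e for s, e in placed):
                    placed.append((start, end))
                    # Step 4: zero out the placed block.
                    mask[c, start:end] = 0
                    break
            else:
                # No valid position found: stop placing blocks for this channel.
                break
    return mask


if __name__ == "__main__":
    m = block_mask(C=30, T=48, block_m=0.3, seed=0)
    print(m.shape, (m == 0).mean())
```

For T = 48 with the stated defaults, ℓ_min = ⌈2.4⌉ = 3 and ℓ_max = ⌈4.8⌉ = 5, so a target fraction block_m = 0.3 gives k = 14.4 / 4 = 3.6, which this sketch rounds to 4 blocks per channel. Because the per-channel seeds come from a single seeded generator, calling the function twice with the same seed reproduces the same mask.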
-
[63]
Institutional review board (IRB) approvals or equivalent for research with human subjects Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or ...