MedMIX: Modality-Internal Expert Fusion for Multimodal Medical Diagnosis

Anqi Li; Seungik Cho; Wei Qiu

arXiv: 2605.16639 · v1 · pith:6N5F62JR · submitted 2026-05-15 · cs.LG

Pith reviewed 2026-05-20 19:34 UTC · model grok-4.3

classification: cs.LG

keywords: multimodal medical diagnosis, missing modalities, expert fusion, foundation models, clinical prediction, robustness, intra-modality aggregation

The pith

MedMIX combines intra-modality expert fusion with learned inter-modality fusion to enable robust medical predictions under missing modalities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces MedMIX to address three challenges in multimodal clinical prediction: several candidate models per modality, missing modalities, and sample-specific variation in how much each modality contributes. The framework aggregates expert embeddings within each modality, fuses whichever modalities are available in a sample-specific way, and uses large models only during training. If successful, it would let medical systems deliver reliable results even when patient records are incomplete, across different hospitals and conditions.

Core claim

The central claim is that, by aggregating complementary embeddings from multiple small expert models within each modality, performing learned fusion over the available modalities, and collaborating with large teacher models only during training, MedMIX achieves consistently strong performance on the OpenI, MIMIC-IV-MM, and MMIST-ccRCC benchmarks, remains robust to missing-modality perturbations, and holds up under cross-cohort shift on MIMIC-III.

What carries the argument

Intra-modality expert fusion that aggregates embeddings from multiple small models per data type, combined with learned inter-modality fusion that adapts to whichever inputs are available, plus large-small model collaboration that runs only during training.
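The paper's text describes "a learned MoE router [that] computes soft aggregation weights over available experts, adapting to heterogeneous experts while masking missing ones" (quoted in the Lean theorems section). A minimal sketch of that intra-modality step follows. It is not the authors' code: the function names, the per-expert linear scoring vectors standing in for the router, and the assumption that every expert embedding has already been projected to a shared dimension are all hypothetical.

```python
import math
from typing import List, Optional

Vec = List[float]


def softmax_masked(scores: List[float], mask: List[bool]) -> List[float]:
    """Softmax over entries where mask is True; masked entries get weight 0."""
    kept = [s for s, m in zip(scores, mask) if m]
    if not kept:
        raise ValueError("at least one expert must be available")
    peak = max(kept)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_experts(expert_embs: List[Optional[Vec]], router_w: List[Vec]) -> Vec:
    """Hypothetical MoE-style aggregation of one modality's expert embeddings.

    expert_embs[k] is expert k's embedding, assumed already projected to a
    shared dimension, or None if that expert is unavailable for this sample.
    router_w[k] is a hypothetical linear scoring vector for expert k.
    """
    mask = [e is not None for e in expert_embs]
    scores = [
        sum(w * x for w, x in zip(router_w[k], e)) if e is not None else 0.0
        for k, e in enumerate(expert_embs)
    ]
    alpha = softmax_masked(scores, mask)  # missing experts receive weight 0
    dim = len(next(e for e in expert_embs if e is not None))
    fused = [0.0] * dim
    for a, e in zip(alpha, expert_embs):
        if e is not None:
            for i in range(dim):
                fused[i] += a * e[i]
    return fused


if __name__ == "__main__":
    # Expert 1 is missing; equal scores split the weight evenly over experts 0 and 2.
    print(aggregate_experts([[1.0, 0.0], None, [0.0, 1.0]], [[0.0, 0.0]] * 3))  # [0.5, 0.5]
```

Masking inside the softmax, rather than filling the missing expert with zeros, keeps the weights normalized over the experts that actually produced an embedding.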

If this is right

The framework remains effective when modalities are missing during both training and testing.
It generalizes across different medical datasets and patient cohorts without major performance loss.
Large models contribute to better representations without incurring extra cost when the system is deployed.
Sample-specific fusion allows varying modality importance depending on the individual case.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This design suggests a path for other domains with incomplete multimodal inputs, such as environmental monitoring with faulty sensors.
One could explore whether the learned fusion provides insights into modality importance for particular diagnoses.
Extending the expert pool with domain-specific models might further improve results on rare conditions.
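One way to probe the second extension, assuming per-sample fusion weights can be logged at inference: average them within each diagnosis label, skipping modalities that were absent instead of counting them as zero weight. The function name and label strings are hypothetical, and gating weights describe what the model relies on, not causal modality importance.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def mean_weights_by_label(
    weights: List[List[Optional[float]]], labels: List[str]
) -> Dict[str, List[Optional[float]]]:
    """Average per-sample modality fusion weights within each diagnosis label.

    weights[i][m] is sample i's fusion weight on modality m, or None when
    modality m was missing for that sample; missing entries are skipped
    rather than counted as zero. A label/modality pair with no observed
    weights maps to None.
    """
    n_mod = len(weights[0])
    sums = defaultdict(lambda: [0.0] * n_mod)
    counts = defaultdict(lambda: [0] * n_mod)
    for w, y in zip(weights, labels):
        s, c = sums[y], counts[y]
        for m, v in enumerate(w):
            if v is not None:
                s[m] += v
                c[m] += 1
    return {
        y: [total / n if n else None for total, n in zip(sums[y], counts[y])]
        for y in sums
    }
```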

Load-bearing premise

That the learned inter-modality fusion and intra-modality expert aggregation can compensate for missing modalities without introducing systematic biases or depending on data distributions that match the three chosen benchmarks.

What would settle it

If MedMIX underperforms simple concatenation or unimodal approaches on a new benchmark where missingness follows a different pattern, for example when specific modalities are always missing together, that would challenge the claim of general robustness.
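A sketch of such a structured perturbation, assuming modality availability is tracked as named flags: a fixed group of modalities drops jointly with probability `p_block`. The function and modality names are illustrative only; a real test would take the grouping from the target cohort's observed missingness.

```python
import random
from typing import Dict, List, Sequence


def block_missing_masks(
    n_samples: int,
    modalities: Sequence[str],
    block: Sequence[str],
    p_block: float,
    seed: int = 0,
) -> List[Dict[str, bool]]:
    """Per-sample availability masks (True = observed) with joint missingness.

    With probability p_block, every modality in `block` is dropped together;
    modalities outside the block stay observed, so each sample keeps at
    least one input as long as the block is not all of `modalities`.
    """
    rng = random.Random(seed)  # seeded so perturbation runs are reproducible
    masks = []
    for _ in range(n_samples):
        drop = rng.random() < p_block
        masks.append({m: not (drop and m in block) for m in modalities})
    return masks
```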

Figures

Figures reproduced from arXiv: 2605.16639 by Anqi Li, Seungik Cho, Wei Qiu.

**Figure 2.** MedMIX remains relatively stable under train-time one-modality drop and shows larger, …

**Figure 3.** MedMIX is robust to train-time multi-random drop and degrades gracefully as test-time …

**Figure 4.** Structural ablation results across OpenI, MIMIC-IV-MM, and MMIST-ccRCC.

**Figure 5.** Macro-averaged efficiency comparison across OpenI, MIMIC-IV-MM, and MMIST-ccRCC.

Abstract

Multimodal clinical prediction faces three challenges: multiple foundation models (FMs) with complementary strengths per modality, pervasive missing modalities at training and test time, and sample-specific variation in modality contributions. We introduce MedMIX, a multimodal framework that combines intra-modality expert fusion, learned inter-modality fusion, and training-only large–small model collaboration for robust medical prediction under incomplete modalities. Within each modality, MedMIX aggregates complementary embeddings from multiple small expert models; across modalities, it performs learned fusion over available modalities; and during training, it leverages large teacher models to improve deployed representations without additional inference cost. Across three heterogeneous benchmarks (OpenI, MIMIC-IV-MM, and MMIST-ccRCC), MedMIX achieves consistently strong performance while remaining robust under controlled missing-modality perturbations, and further demonstrates sustained robustness under cross-cohort shift on MIMIC-III. These results highlight MedMIX as a practical framework that unifies within-modality expert collaboration, sample-specific cross-modality fusion, and efficient large–small model collaboration while remaining robust to incomplete modalities.
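The abstract does not say how the large teachers shape the deployed representations. A minimal sketch under one common assumption, representation alignment in the spirit of knowledge distillation (Hinton et al., [11] in the reference graph): the student's training loss adds a cosine-alignment term toward a teacher embedding, and at inference only the student runs. The function names, the weight `lam`, and the assumption that the teacher embedding is already projected to the student's dimension are hypothetical.

```python
import math
from typing import List, Optional


def cross_entropy(logits: List[float], label: int) -> float:
    """Softmax cross-entropy for one sample, using log-sum-exp for stability."""
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_z - logits[label]


def alignment_loss(student_emb: List[float], teacher_emb: List[float]) -> float:
    """1 - cosine similarity; 0 when the two embeddings point the same way."""
    dot = sum(s * t for s, t in zip(student_emb, teacher_emb))
    ns = math.sqrt(sum(s * s for s in student_emb))
    nt = math.sqrt(sum(t * t for t in teacher_emb))
    if ns == 0.0 or nt == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / (ns * nt)


def training_loss(
    student_logits: List[float],
    label: int,
    student_emb: List[float],
    teacher_emb: Optional[List[float]],
    lam: float = 0.5,
) -> float:
    """Task loss plus a teacher-alignment term that exists only during training.

    teacher_emb is None when no teacher output is available for a sample;
    then only the task loss is used. Inference never calls the teacher.
    """
    loss = cross_entropy(student_logits, label)
    if teacher_emb is not None:
        loss += lam * alignment_loss(student_emb, teacher_emb)
    return loss
```

Because the teacher appears only inside the loss, dropping it after training leaves the student's forward pass, and therefore its inference cost, unchanged.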

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

MedMIX fuses intra-modality experts with learned cross-modality combination to handle missing data, but the robustness story hinges on whether its perturbations reflect real clinical patterns.

The core idea is straightforward: inside each modality, run several small expert models and aggregate their embeddings; then learn a sample-specific way to combine whichever modalities are available at test time, using large teachers only during training to sharpen the small models. The combination is presented as new for the medical multimodal setting, and the abstract claims it delivers strong performance plus robustness on OpenI, MIMIC-IV-MM, and MMIST-ccRCC, with an additional cross-cohort check on MIMIC-III. The practical angle is real: missing modalities are routine in clinical data, so a method that avoids simple imputation and keeps inference cheap could matter for deploying foundation models. The paper does a decent job of naming the three pieces it brings together and tying them to the missing-modality problem without overclaiming theoretical novelty.

The main soft spot is the one flagged in the stress-test note: the missingness protocol. The abstract says only "controlled missing-modality perturbations," with no indication of whether those are MCAR, MAR, or MNAR, and no mention of whether missingness correlates with severity or outcome, as it often does in EHRs. Without that, it is hard to know whether the learned fusion actually compensates for missing inputs or just works under artificial conditions. There are also no numbers, error bars, or ablation tables visible here, so it is impossible to judge how much the expert aggregation adds over straightforward baselines. If the full paper supplies those details and shows the fusion is not just fitting the benchmark distributions, the claim strengthens.

This is the kind of work that belongs in a reading group focused on applied multimodal medical ML; readers who care about incomplete data in foundation-model pipelines will find the framework description useful even if they end up tweaking the missingness model. It is worth sending to peer review so the methods, ablations, and missingness protocol can be checked properly.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces MedMIX, a multimodal framework for clinical prediction that performs intra-modality expert fusion by aggregating complementary embeddings from multiple small expert models per modality, learned inter-modality fusion over available modalities at inference time, and training-only collaboration with large teacher models to improve representations without added inference cost. It evaluates the approach on three heterogeneous benchmarks (OpenI, MIMIC-IV-MM, MMIST-ccRCC), claiming consistently strong performance, robustness under controlled missing-modality perturbations, and sustained robustness under cross-cohort shift on MIMIC-III.

Significance. If the empirical claims are supported by detailed results, MedMIX would address practically important challenges in multimodal medical AI: pervasive missing modalities at train and test time, sample-specific variation in modality utility, and the cost of large foundation models. The combination of within-modality expert aggregation, sample-adaptive cross-modality fusion, and efficient large-small distillation is a coherent design that could reduce reliance on complete multimodal inputs while maintaining performance. The reported cross-cohort robustness on MIMIC-III is a positive indicator of generalization if the experimental controls are appropriate.

major comments (2)

[Abstract and §4 (Experiments)] The central robustness claim rests on 'controlled missing-modality perturbations' whose distribution is not specified (MCAR, MAR, or MNAR) and for which no ablation against standard imputation baselines or severity-correlated missingness is reported. If the perturbations are uniform or random rather than reflecting real clinical patterns (e.g., missingness correlated with patient severity or outcome), the intra-modality expert aggregation plus learned inter-modality fusion may compensate only under artificial conditions, undermining the claim that the mechanism works 'without introducing systematic biases'.
[§3 (Method)] The description of how learned inter-modality fusion weights are obtained when modalities are absent at test time is insufficient to determine whether the mechanism is truly sample-specific or reduces to a fixed imputation strategy. Without an explicit equation or pseudocode showing the fusion operation under partial modality availability, it is impossible to verify that the approach avoids circular dependence on the training distribution.
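To make the first major comment concrete, here is a sketch of two mask generators for a single modality: MCAR, where every sample loses the modality with the same probability, and a severity-correlated pattern where the drop probability rises with a per-sample severity score. Whether the second counts as MAR or MNAR depends on whether severity is among the variables the model observes. The function names and the linear drop schedule are illustrative assumptions, not the paper's protocol.

```python
import random
from typing import List


def mcar_mask(n: int, p_drop: float, rng: random.Random) -> List[bool]:
    """MCAR: each sample loses the modality with the same probability (True = observed)."""
    return [rng.random() >= p_drop for _ in range(n)]


def severity_correlated_mask(
    severity: List[float], p_low: float, p_high: float, rng: random.Random
) -> List[bool]:
    """Drop probability rises linearly from p_low (severity 0) to p_high (severity 1).

    Hypothetical pattern in which sicker patients are more often missing
    the modality. True = observed.
    """
    return [rng.random() >= p_low + (p_high - p_low) * s for s in severity]
```

Evaluating the same trained model under both generators, at matched overall drop rates, would separate robustness to random loss from robustness to informative loss.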

minor comments (2)

[Abstract] The abstract uses the term 'consistently strong performance' without reference to specific metrics, baselines, or statistical tests; adding a one-sentence summary of the key quantitative gains would improve clarity.
[§3] Notation for the intra-modality expert aggregation and inter-modality fusion operators should be introduced with explicit equations rather than prose descriptions to aid reproducibility.
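One possible explicit notation, offered as an assumption rather than the paper's definitions and consistent with the w = softmax(G(E_avail)) form given in the simulated rebuttal. For modality m, let \(\mathcal{K}_m\) be the set of available experts with embeddings \(e_{m,k}\) in a shared space, \(r_m\) a router score, \(g\) a gating score, and \(\mathcal{A}\) the set of available modalities; absent experts and modalities simply do not appear in the index sets.

```latex
\begin{aligned}
\alpha_{m,k} &= \frac{\exp\big(r_m(e_{m,k})\big)}{\sum_{j \in \mathcal{K}_m} \exp\big(r_m(e_{m,j})\big)}, \qquad
h_m = \sum_{k \in \mathcal{K}_m} \alpha_{m,k}\, e_{m,k}, \\
w_m &= \frac{\exp\big(g(h_m)\big)}{\sum_{n \in \mathcal{A}} \exp\big(g(h_n)\big)}, \qquad
z = \sum_{m \in \mathcal{A}} w_m\, h_m, \qquad m \in \mathcal{A}.
\end{aligned}
```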

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving clarity around missing-modality mechanisms and the inter-modality fusion process. We address each major comment below and outline the revisions we will make to strengthen the paper.

Referee: [Abstract and §4 (Experiments)] The central robustness claim rests on 'controlled missing-modality perturbations' whose distribution is not specified (MCAR, MAR, or MNAR) and for which no ablation against standard imputation baselines or severity-correlated missingness is reported. If the perturbations are uniform or random rather than reflecting real clinical patterns (e.g., missingness correlated with patient severity or outcome), the intra-modality expert aggregation plus learned inter-modality fusion may compensate only under artificial conditions, undermining the claim that the mechanism works 'without introducing systematic biases'.

Authors: We agree that the missingness distribution must be stated explicitly. Our experiments applied independent uniform random modality drops at rates of 20–60% (MCAR) to create controlled test conditions. We will revise the abstract and §4 to specify this mechanism and add ablations comparing MedMIX against standard imputation baselines (mean imputation, zero imputation, and modality-specific forward filling). While we lack the clinical metadata to simulate severity-correlated MNAR missingness on these benchmarks, the cross-cohort evaluation on MIMIC-III already demonstrates robustness under real distributional shift; we will add a limitations paragraph discussing the gap between MCAR and clinical MNAR patterns. (Revision: partial.)
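A sketch of the three imputation baselines this response promises, applied at the embedding level (an assumption; the response does not say where imputation happens). Unlike masking, each baseline fills the missing slot with a vector that downstream fusion then treats as if it were observed. All function names are hypothetical, and the mean should be computed on training data only to avoid leakage.

```python
from typing import List, Optional

Vec = List[float]


def zero_impute(embs: List[Optional[Vec]], dim: int) -> List[Vec]:
    """Replace each missing embedding with a zero vector."""
    return [e if e is not None else [0.0] * dim for e in embs]


def column_mean(train_embs: List[Optional[Vec]]) -> Vec:
    """Mean of the observed training embeddings, dimension by dimension."""
    seen = [e for e in train_embs if e is not None]
    return [sum(col) / len(seen) for col in zip(*seen)]


def mean_impute(embs: List[Optional[Vec]], train_mean: Vec) -> List[Vec]:
    """Replace each missing embedding with the training-set mean embedding."""
    return [e if e is not None else list(train_mean) for e in embs]


def forward_fill(seq: List[Optional[Vec]], dim: int) -> List[Vec]:
    """Carry the last observed embedding forward in time; zeros before the first one."""
    last = [0.0] * dim
    out = []
    for e in seq:
        if e is not None:
            last = e
        out.append(list(last))
    return out
```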
Referee: [§3 (Method)] The description of how learned inter-modality fusion weights are obtained when modalities are absent at test time is insufficient to determine whether the mechanism is truly sample-specific or reduces to a fixed imputation strategy. Without an explicit equation or pseudocode showing the fusion operation under partial modality availability, it is impossible to verify that the approach avoids circular dependence on the training distribution.

Authors: We apologize for the lack of detail. The learned inter-modality fusion is sample-specific: a small gating network produces normalized weights exclusively over the embeddings of modalities present at inference time, with absent modalities simply omitted from the softmax (no imputation occurs). We will insert an explicit equation in §3 of the form w = softmax(G(E_avail)) followed by the weighted sum, together with pseudocode that shows the masking logic for partial availability. This formulation depends only on observed embeddings and therefore introduces no circular dependence on the training distribution of missing patterns. (Revision: yes.)
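A minimal sketch of the masked fusion described in this response, w = softmax(G(E_avail)) followed by a weighted sum. It assumes G scores each available modality embedding with a per-modality linear vector, a hypothetical stand-in for the gating network; `fuse_available` and the modality names are illustrative. Absent modalities are not keys in the input, so they never enter the softmax.

```python
import math
from typing import Dict, List, Tuple


def fuse_available(
    embs: Dict[str, List[float]], gate: Dict[str, List[float]]
) -> Tuple[List[float], Dict[str, float]]:
    """w = softmax(G(E_avail)), then z = sum_m w_m * e_m over available m.

    embs holds only the modalities present for this sample. gate[m] is a
    hypothetical linear scoring vector standing in for the gating network G.
    """
    if not embs:
        raise ValueError("need at least one available modality")
    names = sorted(embs)
    scores = [sum(g * x for g, x in zip(gate[m], embs[m])) for m in names]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = {m: e / total for m, e in zip(names, exps)}
    dim = len(embs[names[0]])
    fused = [sum(weights[m] * embs[m][i] for m in names) for i in range(dim)]
    return fused, weights
```

Because the scores depend on each sample's embeddings, two samples with the same modalities available can still receive different weights; whether a trained G produces meaningfully different weights is an empirical question the sketch does not answer.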

Circularity Check

0 steps flagged

No circularity detected; framework is empirically validated without self-referential derivations

The paper presents MedMIX as an architectural framework for multimodal medical prediction that aggregates intra-modality experts, performs learned inter-modality fusion, and uses training-only teacher collaboration. No equations, uniqueness theorems, ansatzes, or derivation chains appear in the provided abstract or description. Performance claims rest on empirical results across OpenI, MIMIC-IV-MM, MMIST-ccRCC, and cross-cohort MIMIC-III evaluations under controlled perturbations, not on any fitted parameter renamed as a prediction or on self-citation that bears the central load. The description is self-contained against external benchmarks and does not reduce any claim to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no free parameters, axioms, or invented entities are explicitly stated or derivable from the provided text.

pith-pipeline@v0.9.0 · 5719 in / 1089 out tokens · 38504 ms · 2026-05-20T19:34:42.266350+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Within each modality, MedMIX aggregates complementary embeddings from multiple small expert models; across modalities, it performs learned fusion over available modalities"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "a learned MoE router then computes soft aggregation weights over available experts, adapting to heterogeneous experts while masking missing ones"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
Paper passage: "controlled missing-modality perturbations"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 6 internal anchors

[1] An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, et al. Qwen2 technical report. arXiv preprint arXiv:2407.10671.

[2] Sheng Zhang, Yanbo Xu, Naoto Usuyama, Hanwen Xu, Jaspreet Bagga, Robert Tinn, Sam Preston, Rajesh Rao, Mu Wei, Nazanin Zhao, et al. BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs. arXiv preprint arXiv:2303.00915, 2023a. Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir K...

[3] Qiao Jin, Won Kim, Qingyu Chen, Donald C Comeau, Wo-Ting Yim, W John Wilbur, and Zhiyong Lu. MedCPT: contrastive pre-trained transformers with large-scale PubMed search logs for zero-shot biomedical information retrieval. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 13906–13921, 2023.

[4] Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, DDL Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training compute-optimal large language models. arXiv preprint arXiv:2203.15556.

[5] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361.

[6] Hanwen Xu, Naoto Usuyama, Jaspreet Bagga, Sheng Zhang, Rajesh Rao, Tristan Naumann, Cliff Wong, Zelalem Gero, Javier González, Yu Gu, et al. A whole-slide foundation model for digital pathology from real-world data. Nature, pages 1–8, 2024a. Louis Blankemeier, Joseph Paul Cohen, Ashwin Kumar, Dave Van Veen, Syed Jamal Safdar Gardezi, Magdalini Paschali, Zh...

[7] Yanis Labrak, Adrien Bazoge, Emmanuel Morin, Pierre-Antoine Gourraud, Mickael Rouvier, and Richard Dufour. BioMistral: a collection of open-source pretrained large language models for medical domains. arXiv preprint arXiv:2402.10373.

[8] Qianqian Xie, Qingyu Chen, Aokun Chen, Cheng Peng, Yan Hu, Fongci Lin, Xueqing Peng, Jimin Huang, Jeffrey Zhang, Vipina Keloth, et al. Me LLaMA: foundation large language models for medical applications. arXiv preprint arXiv:2402.12749.

[9] Parishad BehnamGhader, Vaibhav Adlakha, Marius Mosbach, Dzmitry Bahdanau, Nicolas Chapados, and Siva Reddy. LLM2Vec: large language models are secretly powerful text encoders. arXiv preprint arXiv:2404.05961.

[10] Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, and Xiangyu Yue. MetaTransformer: a unified framework for multimodal learning. arXiv preprint arXiv:2307.10802, 2023b. Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, and Xiangyu Yue. OneLLM: one framework to align all moda...

[11] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531.

[12] Canwen Xu, Yichong Xu, Shuohang Wang, Yang Liu, Chenguang Zhu, and Julian McAuley. Small models are valuable plug-ins for large language models. In Findings of the Association for Computational Linguistics: ACL 2024, pages 283–294, Bangkok, Thailand, 2024b. Association for Computational Linguistics. Ying Zhang, Tao Xiang, Timothy M Hospedales, and Huchuan ...

[13] Chenwei Wu, Zitao Shuai, and Liyue Shen. REMIND: Rethinking medical high-modality learning under missingness—a long-tailed distribution perspective. arXiv preprint arXiv:2603.00046.

[14] Yu Liu, Preeti Agrawal, Elan Papanichalaou, Mengdi Gao, Paul Pu Liang, and Louis-Philippe Morency. Distilling large language models for biomedical knowledge extraction: a case study on named entity recognition. arXiv preprint arXiv:2307.01217.

[15] Alistair E W Johnson, Lucas Bulgarelli, Lu Shen, Alvin Gayles, Ayad Shammout, Steven Horng, Tom J Pollard, Sicheng Hao, Benjamin Moody, Brian Gow, et al. MIMIC-IV, a freely accessible electronic health record dataset. Scientific Data, 10(1):1, 2023a. Alistair E W Johnson, Tom J Pollard, Nathaniel R Greenbaum, Matthew P Lungren, Chih-ying Deng, Yifan Peng, ...

[16] Alistair Johnson, Tom Pollard, Steven Horng, Leo Anthony Celi, and Roger Mark. MIMIC-IV-Note: Deidentified free-text clinical notes. PhysioNet, 2023b. doi: 10.13026/1n74-ne17. Version 2.2. Tiago Mota, M Rita Verdelho, Diogo J Araújo, Alceu Bissoto, Carlos Santiago, and Catarina Barata. MMIST-ccRCC: A real world medical dataset for the development of multi-...

[17] Stephanie L Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C Castro, Francesca Dalla Serra, Mercy Innæs, Aditya Nori, Hoifung Poon, Valentina Salvatelli, Amit Sharma, et al. MAIRA-2: grounded radiology report generation. arXiv preprint arXiv:2406.04449.

[18] Zhengrui Xu, Jiabo Zhang, Siyuan Liang, Xinhao Wang, Guang Luo, Yang Song, Anjia Han, Yuh-Show Sung, Xiao Han, Jing Yao, et al. HistGen: histopathology report generation via local-global feature encoding and cross-modal context interaction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11500–11510, 2024c. Appe...
