Learning Emergent Modular Representations in Multi-modality Medical Vision Foundation Models
Pith reviewed 2026-05-22 07:59 UTC · model grok-4.3
The pith
Director-Experts (DEX) produces emergent modular representations that resolve gradient conflicts across heterogeneous medical imaging modalities.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
This work reframes the failure of monolithic self-supervised optimization on multi-modality medical data as an imbalance between specialization and coordination in emergent modularity. It proposes Director-Experts (DEX), a modular network that explicitly regulates these dynamics in stacked modules. Each DEX module comprises a pool of experts and a director. The experts are dynamically adapted by an image-wise activation strategy and autonomously specialize in modality-dominant statistics. The director is updated via group exponential moving average and distills multi-expert knowledge into a shared space for semantic integration across modalities, thus driving the emergence of modular representations.
What carries the argument
The Director-Experts (DEX) module carries the argument: it pairs a pool of experts, activated image-wise for modality specialization, with a director updated by group exponential moving average for cross-modality knowledge distillation and coordination.
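As a rough illustration of the mechanism described above, the sketch below assumes one possible reading: a gate scores every expert from an image-level feature and activates the single top-scoring expert for the whole image, and the director's weights follow an exponential moving average of the mean expert weights. The abstract specifies none of these details, so the linear gate, the top-1 choice, the averaging over the expert group, the `momentum` value, and all function names are hypothetical.

```python
from typing import List


def image_wise_activation(image_feature: List[float],
                          gate_weights: List[List[float]]) -> int:
    """Pick ONE expert for the whole image (assumed linear gate, top-1)."""
    scores = [sum(w * x for w, x in zip(row, image_feature))
              for row in gate_weights]
    return max(range(len(scores)), key=scores.__getitem__)


def group_ema_update(director: List[float],
                     experts: List[List[float]],
                     momentum: float = 0.99) -> List[float]:
    """Move the director toward the mean of the expert group (assumed form)."""
    n = len(experts)
    group_mean = [sum(e[i] for e in experts) / n for i in range(len(director))]
    return [momentum * d + (1.0 - momentum) * g
            for d, g in zip(director, group_mean)]


# Toy example: 2 experts, 2-dimensional image-level feature.
gate = [[1.0, 0.0],   # expert 0 responds to the first feature
        [0.0, 1.0]]   # expert 1 responds to the second feature
chosen = image_wise_activation([0.2, 0.9], gate)

# Toy expert weights; the director starts at zero.
experts = [[1.0, 3.0], [3.0, 5.0]]
director = group_ema_update([0.0, 0.0], experts, momentum=0.5)
```

Under this reading the director receives no gradient of its own, which is one way it could avoid adding gradient conflicts; whether the paper's director works like this is not stated in the abstract.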
If this is right
- Improved optimization behavior during pre-training on data with pronounced non-IID statistics across modalities.
- Higher transferability to a wide range of downstream medical vision tasks.
- Representations that avoid collapse toward modality-dominant shortcuts.
- A step toward general-purpose multi-modality medical AI systems.
Where Pith is reading between the lines
- The same explicit separation of specialization and coordination could be tested on non-medical multi-modal data such as satellite imagery combined with ground sensors.
- Selective expert activation may allow lower inference cost by routing only relevant experts for a given input modality.
- The director mechanism might be combined with existing contrastive or masked-autoencoder objectives to further stabilize training on even larger modality sets.
Load-bearing premise
The image-wise activation strategy combined with group exponential moving average will autonomously produce useful modality specialization and semantic integration without introducing new gradient conflicts or requiring extensive hyper-parameter tuning.
What would settle it
Train DEX on the Medical Vision Universe benchmark and compare performance against monolithic baselines on the 26 downstream tasks; absence of consistent gains or failure of expert activations to align with distinct modalities would falsify the emergence of beneficial modular representations.
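The alignment check in this test could be made concrete along the following lines. This is a minimal sketch that assumes each pre-training image can be logged as a (modality, activated expert) pair; the modality labels, the `routing_purity` score, and the function names are illustrative choices, not quantities defined in the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (modality label, index of the activated expert)


def _count(records: List[Record]) -> Dict[str, Counter]:
    """Count, per modality, how often each expert was activated."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for modality, expert in records:
        counts[modality][expert] += 1
    return counts


def expert_utilization(records: List[Record]) -> Dict[str, Dict[int, float]]:
    """Fraction of each modality's images routed to each expert."""
    return {m: {e: c / sum(cnt.values()) for e, c in cnt.items()}
            for m, cnt in _count(records).items()}


def routing_purity(records: List[Record]) -> float:
    """Share of images sent to the most common expert of their modality."""
    hits = sum(cnt.most_common(1)[0][1] for cnt in _count(records).values())
    return hits / len(records)


# Made-up routing log, for illustration only.
records = [("CT", 0), ("CT", 0), ("CT", 1),
           ("MRI", 1), ("MRI", 1), ("X-ray", 2)]
util = expert_utilization(records)
purity = routing_purity(records)
```

A purity near 1 only shows that each modality concentrates on one expert. It does not show that different modalities use different experts, so the utilization table should be inspected as well.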
Original abstract
Multi-modality medical vision (MV) foundation models (FM) are fundamentally challenged by pronounced Non-IID feature statistics across heterogeneous imaging modalities. Monolithic self-supervised optimization on such data induces conflicting gradients, driving representations to collapse toward modality-dominant shortcuts. This work reframes this failure as an imbalance between specialization and coordination in emergent modularity, and proposes Director-Experts (DEX), a modular network that explicitly regulates these dynamics in stacked modules. Each DEX module comprises a pool of experts, dynamically adapted by our image-wise activation strategy, autonomously specializing in modality-dominant statistics, together with a director, updated via our group exponential moving average, which distills multi-expert knowledge into a shared space for semantic integration across modalities, thus driving the emergence of modular representations. We curate a new benchmark, Medical Vision Universe, over 4 million images across 10 modalities, which provides a FM-level pre-training with the broadest coverage of distinct imaging modalities to our DEX. Extensive evaluations on 26 downstream tasks demonstrate improved optimization behavior and transferability, indicating DEX as a principled step toward general-purpose multi-modality medical AI. Our code and dataset will be opened at https://github.com/YutingHe-list/DEX.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that monolithic self-supervised training on Non-IID multi-modality medical images induces gradient conflicts and modality shortcuts; it reframes this as an imbalance in emergent modularity and introduces Director-Experts (DEX) modules. Each DEX module uses an image-wise expert activation strategy to promote modality-dominant specialization and a director updated by group exponential moving average to distill integrated representations. The authors curate the Medical Vision Universe benchmark (4M+ images, 10 modalities) and report improved optimization behavior and transfer performance on 26 downstream tasks.
Significance. If the mechanisms are shown to produce the claimed specialization and integration without hidden conflicts or extensive tuning, the work would offer a concrete architectural route to more robust multi-modality medical foundation models. The new benchmark itself constitutes a useful community resource for broad-modality pre-training.
major comments (2)
- [Abstract and §3] Abstract and §3 (DEX module description): the central claim that image-wise activation plus group EMA autonomously yields modality-dominant specialization and semantic integration without new gradient conflicts is load-bearing, yet the manuscript provides no activation histograms, per-modality expert utilization curves, or gradient-norm comparisons on the Non-IID medical data to substantiate this.
- [§4] §4 (experiments): the reported gains on 26 tasks are presented without ablations isolating the contribution of image-wise activation versus group EMA, without error bars, and without comparison to standard MoE baselines under identical pre-training budgets, making it impossible to assess whether the improvements exceed what generic MoE scaling would deliver.
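One way the requested evidence on gradient conflicts might be gathered is sketched below. It uses the sign of the cosine similarity between per-modality gradients, a common conflict diagnostic that is swapped in here and is not the gradient-norm comparison the report names. The flattened toy gradients, the modality labels, and the `conflict_rate` helper are assumptions for illustration.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two flattened gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def conflict_rate(grads: Dict[str, List[float]]) -> float:
    """Fraction of modality pairs whose gradients point in opposing directions."""
    names = sorted(grads)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    if not pairs:
        raise ValueError("need gradients from at least two modalities")
    conflicts = sum(1 for a, b in pairs if cosine(grads[a], grads[b]) < 0.0)
    return conflicts / len(pairs)


# Made-up per-modality gradients of a shared parameter block.
grads = {"CT": [1.0, 0.0], "MRI": [-1.0, 0.5], "X-ray": [0.5, 0.5]}
rate = conflict_rate(grads)
```

Tracking this rate over training for DEX and for a monolithic baseline, on the same batches, would give a direct comparison of how often modalities pull shared parameters in opposing directions.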
minor comments (2)
- [§3] Notation for the group EMA update rule and the exact form of the image-wise gating function should be written explicitly with equations rather than prose descriptions.
- The paper states that code and dataset will be released; confirming the exact release timeline and license in the camera-ready version would strengthen reproducibility.
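To indicate the level of explicitness the first minor comment asks for, one hypothetical form of the two rules is given below. The notation is an assumption and is not taken from the paper: $z(x)$ is an image-level feature of image $x$, $w_k$ is the gate vector of expert $k$ out of $K$, $\theta_{E_k}$ and $\theta_D$ are expert and director parameters, and $m \in [0,1)$ is a momentum coefficient. The authors' actual gating function and grouping scheme may differ.

```latex
% Hypothetical notation, not taken from the paper.
k^{*}(x) = \arg\max_{k \in \{1,\dots,K\}} \; w_k^{\top} z(x),
\qquad
\theta_{D} \leftarrow m\,\theta_{D} + (1-m)\,\frac{1}{K}\sum_{k=1}^{K}\theta_{E_k}
```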
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights important opportunities to strengthen the empirical support for DEX. We will revise the manuscript to incorporate the requested analyses and ablations while preserving the core contributions of the Medical Vision Universe benchmark and the 26-task evaluation.
- Referee: [Abstract and §3] Abstract and §3 (DEX module description): the central claim that image-wise activation plus group EMA autonomously yields modality-dominant specialization and semantic integration without new gradient conflicts is load-bearing, yet the manuscript provides no activation histograms, per-modality expert utilization curves, or gradient-norm comparisons on the Non-IID medical data to substantiate this.
  Authors: We agree that explicit visualizations are needed to substantiate the claimed specialization and integration dynamics. In the revision we will add activation histograms across modalities, per-modality expert utilization curves over training, and gradient-norm comparisons between DEX and monolithic baselines on the Non-IID medical data. These additions will directly illustrate that image-wise activation promotes modality-dominant expert specialization while the group-EMA director maintains semantic integration without introducing additional gradient conflicts.
  Revision: yes
- Referee: [§4] §4 (experiments): the reported gains on 26 tasks are presented without ablations isolating the contribution of image-wise activation versus group EMA, without error bars, and without comparison to standard MoE baselines under identical pre-training budgets, making it impossible to assess whether the improvements exceed what generic MoE scaling would deliver.
  Authors: We acknowledge that component-wise ablations and controlled baselines are required to isolate the benefit of our design choices. The revised manuscript will include (i) ablations that separately disable image-wise activation and group EMA, (ii) error bars computed over at least three independent pre-training runs, and (iii) direct comparisons against standard MoE architectures trained under identical data, compute budget, and optimization settings. These results will clarify whether DEX delivers gains beyond generic MoE scaling on the 26 downstream tasks.
  Revision: yes
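The error bars promised in (ii) could be computed as in the sketch below. It assumes one scalar score per independent pre-training run for a given downstream task; the three scores are placeholders chosen for easy arithmetic, not results from the paper.

```python
import statistics
from typing import List, Tuple


def mean_and_std(scores: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator) across runs."""
    return statistics.mean(scores), statistics.stdev(scores)


# Placeholder scores from three hypothetical pre-training runs.
run_scores = [0.80, 0.82, 0.84]
mean, std = mean_and_std(run_scores)
```

With only three runs the sample standard deviation is itself a noisy estimate, so small differences between DEX and a baseline should be read with caution.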
Circularity Check
No circularity: DEX is an explicit architectural proposal, not a derivation reducing to its inputs.
The paper reframes Non-IID challenges in multi-modality medical vision as an imbalance in specialization/coordination and directly proposes the DEX module (image-wise expert activation plus group EMA director) as a design intervention to drive emergent modularity. This is presented as a new network structure with downstream evaluations on 26 tasks and a new benchmark, without any equations, fitted parameters, or self-citations that reduce the claimed emergence or transferability back to the inputs by construction. No self-definitional, fitted-input-called-prediction, or ansatz-smuggled patterns appear in the provided text; the central claim remains an independent modeling choice rather than a tautological restatement.
Axiom & Free-Parameter Ledger
invented entities (1)
- Director-Experts (DEX) module: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "Each DEX module comprises a pool of experts, dynamically adapted by our image-wise activation strategy, autonomously specializing in modality-dominant statistics, together with a director, updated via our group exponential moving average, which distills multi-expert knowledge into a shared space"
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "reframes this failure as an imbalance between specialization and coordination in emergent modularity"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.