Uni-Encoder Meets Multi-Encoders: Representation Before Fusion for Brain Tumor Segmentation with Missing Modalities
Pith reviewed 2026-05-08 12:51 UTC · model grok-4.3
The pith
A two-stage architecture pretrains one ViT for representations robust to missing MRI modalities, then fuses that representation with CNN features for brain tumor segmentation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that pretraining a single ViT Uni-Encoder via masked image modeling creates a modality-robust unified representation, which when fused in a second stage with features from modality-specific CNN Multi-Encoders produces more precise segmentations than prior methods under missing-modality conditions on BraTS 2023 and 2024.
What carries the argument
The two-stage heterogeneous architecture in which a pretrained single ViT Uni-Encoder supplies a global representation robust to missing modalities before fusion with fine-grained, multi-scale features from modality-specific CNN Multi-Encoders.
If this is right
- Segmentation accuracy improves when one or more MRI modalities are absent compared with previous single- or multi-encoder approaches.
- The method can exploit any available subset of modalities without retraining or imputation.
- Decoupling the pretrained global representation from local CNN features reduces the trade-off between context modeling and detail preservation.
- The same two-stage pattern could be applied to other multi-modal medical segmentation tasks that suffer from incomplete inputs.
Where Pith is reading between the lines
- The approach might transfer to non-MRI modalities such as CT or PET where data dropout is common in clinical workflows.
- Further gains could come from testing the pretrained Uni-Encoder on larger unlabeled medical image collections before fine-tuning.
- Real-world deployment would require validation on datasets with natural missingness patterns rather than simulated dropout.
Load-bearing premise
That pretraining a single ViT with masked image modeling will produce a representation robust enough to missing modalities that fusing it with CNN-extracted fine-grained features yields precise segmentations without new failure modes.
What would settle it
A controlled test on BraTS data showing that UniME fails to exceed the strongest baseline's segmentation scores for even one specific pattern of missing modalities would falsify the performance claim.
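As a concrete harness for that test, here is a minimal Python sketch, with all function and dictionary names assumed rather than taken from the paper, that enumerates the 15 non-empty subsets of the four standard BraTS modalities and reports any pattern on which UniME fails to beat the strongest baseline:

```python
from itertools import combinations

MODALITIES = ("t1", "t1ce", "t2", "flair")   # the four standard BraTS inputs

def all_subsets(modalities=MODALITIES):
    """Yield every non-empty modality combination (15 for four modalities)."""
    for r in range(1, len(modalities) + 1):
        yield from combinations(modalities, r)

def falsifying_patterns(dice_unime, dice_baseline):
    """dice_*: dict mapping a modality subset (tuple) to its mean Dice score.
    A non-empty return falsifies the per-pattern performance claim."""
    return [s for s in all_subsets()
            if dice_unime[s] <= dice_baseline[s]]
```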
Original abstract
Multimodal MRI offers complementary information for brain tumor segmentation, but clinical scans often lack one or more modalities, which degrades segmentation performance. In this paper, we propose UniME (Uni-Encoder Meets Multi-Encoders), a two-stage heterogeneous method for brain tumor segmentation with missing modalities that reconciles the trade-offs among fine-grained structure capture, cross-modal complementarity modeling, and exploitation of available modalities. The idea is to decouple representation learning from segmentation via a two-stage heterogeneous architecture. Stage 1 pretrains a single ViT Uni-Encoder with masked image modeling to establish a unified representation robust to missing modalities. Stage 2 adds modality-specific CNN Multi-Encoders to extract high-resolution, multi-scale, fine-grained features. We fuse these features with the global representation to produce precise segmentations. Experiments on BraTS 2023 and BraTS 2024 show that UniME outperforms previous methods under incomplete multi-modal scenarios. The code is available at https://github.com/Hooorace-S/UniME
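To make the two-stage description concrete, here is a minimal PyTorch sketch of the pipeline as the abstract describes it; every module name, dimension, and the zero-filling of missing modalities is an illustrative assumption, not the released UniME implementation:

```python
import torch
import torch.nn as nn

class UniEncoderViT(nn.Module):
    """Stage 1 stand-in: a single ViT over patch tokens from all four MRI
    modalities stacked as channels; the MIM pretraining loop is omitted."""
    def __init__(self, vit_dim=256, depth=4, heads=8, patch=8):
        super().__init__()
        self.embed = nn.Conv3d(4, vit_dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(vit_dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)

    def forward(self, x):                         # x: (B, 4, D, H, W)
        tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, vit_dim)
        return self.encoder(tokens)               # unified global representation

class ModalityCNN(nn.Module):
    """Stage 2 stand-in: one small CNN per modality for fine-grained features."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, dim, 3, padding=1), nn.ReLU(),
            nn.Conv3d(dim, dim, 3, padding=1), nn.ReLU())

    def forward(self, x):                         # x: (B, 1, D, H, W)
        return self.net(x)

class UniMESketch(nn.Module):
    def __init__(self, dim=16, vit_dim=256, n_classes=3):
        super().__init__()
        self.uni = UniEncoderViT(vit_dim)
        self.multi = nn.ModuleList(ModalityCNN(dim) for _ in range(4))
        self.fuse = nn.Conv3d(4 * dim + vit_dim, n_classes, kernel_size=1)

    def forward(self, x, present):                # present: 4-element bool list
        # Zero-fill missing modalities so both branches see a fixed layout.
        mask = torch.tensor(present, dtype=x.dtype, device=x.device)
        x = x * mask.view(1, 4, 1, 1, 1)
        B, _, D, H, W = x.shape
        g = self.uni(x).mean(1)                   # (B, vit_dim) pooled context
        g = g.view(B, -1, 1, 1, 1).expand(-1, -1, D, H, W)
        f = torch.cat([enc(x[:, i:i + 1]) for i, enc in enumerate(self.multi)], 1)
        return self.fuse(torch.cat([f, g], 1))    # (B, n_classes, D, H, W)
```

A forward pass on a (1, 4, 32, 32, 32) volume with, say, present=[True, False, True, True] yields per-voxel logits for three tumor classes while the second modality is absent.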
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes UniME, a two-stage heterogeneous architecture for brain tumor segmentation under missing MRI modalities. Stage 1 pretrains a single ViT Uni-Encoder via masked image modeling to learn a unified global representation. Stage 2 introduces modality-specific CNN Multi-Encoders to extract high-resolution fine-grained features, which are fused with the ViT representation to produce segmentations. Experiments on BraTS 2023 and BraTS 2024 are reported to show superior performance relative to prior methods in incomplete multi-modal scenarios, with code released at https://github.com/Hooorace-S/UniME.
Significance. If the performance gains are shown to stem from the claimed unified representation rather than the CNN branch alone, the two-stage decoupling of representation learning from fine-grained feature extraction could provide a practical template for handling missing data in clinical multimodal imaging. The public code release supports reproducibility.
major comments (2)
- [Abstract and Section 3.1] The statement that masked image modeling pretraining produces a representation 'robust to missing modalities' is not supported by a description of whether entire modalities (as opposed to random patches within available images) are dropped during pretraining. Standard MIM does not simulate full-modality absence, so the robustness claim requires explicit justification or an ablation that isolates the ViT under complete modality dropout at inference time.
- [Section 4] The reported outperformance on BraTS 2023/2024 lacks error bars, component-wise ablations (e.g., ViT-only vs. full UniME under modality dropout), and statistical tests. Without these, it is impossible to attribute gains to the two-stage design rather than the CNN encoders alone, weakening the central empirical claim.
minor comments (1)
- [Abstract] The abstract would benefit from specifying the exact missing-modality combinations evaluated and the number of modalities involved.
Simulated Author's Rebuttal
We thank the referee for the insightful comments and the recommendation for major revision. We address each major comment below and outline the revisions we will make to the manuscript.
Point-by-point responses
- Referee [Abstract and Section 3.1]: The statement that masked image modeling pretraining produces a representation 'robust to missing modalities' is not supported by a description of whether entire modalities (as opposed to random patches within available images) are dropped during pretraining. Standard MIM does not simulate full-modality absence, so the robustness claim requires explicit justification or an ablation that isolates the ViT under complete modality dropout at inference time.
Authors: We acknowledge that Stage 1 pretraining relies on standard masked image modeling applied to the input images, without explicitly dropping entire modalities. The claimed robustness stems from the unified ViT encoder learning global representations from diverse multimodal data. To strengthen it, we will revise the abstract and Section 3.1 to clarify the pretraining details and add an ablation that evaluates the ViT encoder in isolation under complete modality dropout at inference time, providing the requested justification. Revision: yes.
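For the promised ablation, here is a minimal sketch, assumed rather than drawn from the authors' code, contrasting standard random-patch MIM masking with the full-modality dropout that would actually simulate inference-time absence during pretraining:

```python
import torch

def mim_keep_mask(x, patch=8, mask_ratio=0.75):
    """Standard MIM: hide a random subset of patches within the volume;
    every modality still contributes to the visible patches."""
    B, C, D, H, W = x.shape
    n = (D // patch) * (H // patch) * (W // patch)
    noise = torch.rand(B, n, device=x.device)
    ranks = noise.argsort(1).argsort(1)       # random per-sample ranks 0..n-1
    return ranks >= int(n * mask_ratio)       # True = patch token kept

def modality_dropout(x, p=0.25):
    """Full-modality dropout: zero entire channels so the encoder sees the
    same kind of absence it will face at inference; always keep one."""
    B, C = x.shape[:2]
    drop = torch.rand(B, C, device=x.device) < p
    drop[drop.all(dim=1), 0] = False          # guarantee one surviving modality
    return x * (~drop).to(x.dtype).view(B, C, 1, 1, 1)
```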
- Referee [Section 4]: The reported outperformance on BraTS 2023/2024 lacks error bars, component-wise ablations (e.g., ViT-only vs. full UniME under modality dropout), and statistical tests. Without these, it is impossible to attribute gains to the two-stage design rather than the CNN encoders alone, weakening the central empirical claim.
Authors: We agree that additional experimental detail is needed to support our claims. In the revised manuscript, we will add error bars to all reported results, report component-wise ablations (ViT-only and full UniME under the various missing-modality patterns), and include statistical tests against prior methods. These additions will demonstrate that the improvements arise from the proposed two-stage heterogeneous architecture. Revision: yes.
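A minimal sketch of the promised statistics, assuming per-case Dice arrays for the two methods on the same cases; all names and the synthetic numbers in the comments are illustrative only:

```python
import numpy as np
from scipy.stats import wilcoxon

def bootstrap_ci(per_case_dice, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean per-case Dice score."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(per_case_dice, dtype=float)
    means = [rng.choice(scores, size=scores.size, replace=True).mean()
             for _ in range(n_boot)]
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

def paired_test(dice_a, dice_b):
    """Paired non-parametric comparison of two methods on the same cases."""
    return wilcoxon(dice_a, dice_b).pvalue

# Illustrative call with synthetic scores:
# lo, hi = bootstrap_ci([0.81, 0.77, 0.85, 0.90, 0.73])
# p = paired_test([0.81, 0.77, 0.85, 0.90], [0.78, 0.75, 0.84, 0.88])
```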
Circularity Check
No significant circularity
Full rationale
The paper describes a two-stage heterogeneous architecture (a ViT Uni-Encoder pretrained via masked image modeling, then fused with modality-specific CNN Multi-Encoders) and reports empirical outperformance on external BraTS 2023/2024 benchmarks. No equations, derivations, or mathematical claims appear in the provided text. Performance assertions rest on dataset experiments rather than on internal definitions or fitted parameters renamed as predictions. No load-bearing self-citations, uniqueness theorems, or ansatzes are invoked. The argument therefore rests on external validation rather than on a self-referential derivation chain.
Axiom & Free-Parameter Ledger
free parameters (1)
- training hyperparameters and fusion weights
axioms (1)
- Domain assumption: masked image modeling on a single ViT produces representations robust to arbitrary modality absence.