M⁴Fuse: Lightweight State-Space MoE with a Cross-Scale Gating Bridge for Brain Tumor Segmentation
Pith reviewed 2026-05-08 18:35 UTC · model grok-4.3
The pith
M⁴Fuse reports slightly higher brain tumor segmentation accuracy (+0.09%) with about 63% fewer parameters than competing lightweight models, using state-space mixing, a gated skip bridge, and sample-level experts, even at half the usual input resolution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
M⁴Fuse prioritizes discriminative brain tumor cues over exhaustive appearance reconstruction by balancing encoder and decoder capacity, propagating long-range context with linear complexity via a grouped state-space mixer, denoising and aligning skip features with a cross-scale dual-stage gating bridge, and absorbing cross-site acquisition shifts with a sample-level mixture-of-experts, achieving superior parameter-to-accuracy efficiency on BraTS2019 and BraTS2021 even at the reduced input resolution of 64×128×128.
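To make the "linear complexity" part of the claim concrete, here is a minimal sketch of a scalar state-space recurrence shared within channel groups. It is not the paper's grouped state-space mixer, whose internals are not described here; the function name `grouped_ssm_scan`, the per-group scalars `a`, `b`, `c`, and the grouping scheme are all assumptions made for illustration. The only point is that one pass over the token sequence costs time proportional to its length.

```python
def grouped_ssm_scan(x, groups, a, b, c):
    """Run a scalar linear recurrence over each channel, with parameters shared per group.

    x: list of tokens, each a list of channel values (L tokens, C channels).
    groups: number of channel groups; channels in a group share (a, b, c).
    a, b, c: per-group scalars (hypothetical parameters, not from the paper).
    Returns a list with the same shape as x.
    """
    channels = len(x[0])
    assert channels % groups == 0, "channels must divide evenly into groups"
    per_group = channels // groups
    h = [0.0] * channels              # hidden state, one scalar per channel
    y = []
    for token in x:                   # single pass over the sequence: O(L * C) work
        out = []
        for ch, value in enumerate(token):
            g = ch // per_group
            h[ch] = a[g] * h[ch] + b[g] * value   # state update
            out.append(c[g] * h[ch])              # readout
        y.append(out)
    return y


# Tiny example: 3 tokens, 2 channels, 2 groups (one channel per group).
x = [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
y = grouped_ssm_scan(x, groups=2, a=[0.5, 0.0], b=[1.0, 1.0], c=[1.0, 2.0])
# y == [[1.0, 2.0], [0.5, 0.0], [0.25, 0.0]]
```

The first channel keeps a decaying memory of the initial token (1.0, 0.5, 0.25), which is how a recurrence of this kind carries context across long sequences without comparing every token to every other token.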
What carries the argument
The synergistic combination of grouped state-space mixer, cross-scale dual-stage gating bridge, and sample-level mixture-of-experts that together replace depth expansion while preserving long-range context and shift robustness.
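As a rough illustration of what "sample-level" routing can mean, the sketch below pools one descriptor per input sample, turns it into softmax weights, and applies the same expert mixture to every token of that sample. This is a generic soft mixture-of-experts written under assumptions: the names `sample_level_moe` and `gate_weights`, the use of global average pooling, and the toy experts are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_level_moe(tokens, gate_weights, experts):
    """Mix expert outputs with ONE set of gate weights per sample.

    tokens: list of feature vectors for a single sample (L tokens, C channels).
    gate_weights: one row of C gate coefficients per expert (hypothetical).
    experts: list of callables mapping a feature vector to a feature vector.
    """
    channels = len(tokens[0])
    # Global average pooling yields a single descriptor for the whole sample.
    descriptor = [sum(t[ch] for t in tokens) / len(tokens) for ch in range(channels)]
    logits = [sum(w * d for w, d in zip(row, descriptor)) for row in gate_weights]
    weights = softmax(logits)         # shared by every token in this sample
    mixed = []
    for t in tokens:
        outs = [expert(t) for expert in experts]
        mixed.append([sum(w * o[ch] for w, o in zip(weights, outs))
                      for ch in range(channels)])
    return mixed, weights


# Toy experts: identity and doubling. Zero gate coefficients give equal weights.
experts = [lambda v: list(v), lambda v: [2.0 * e for e in v]]
tokens = [[1.0, 3.0], [3.0, 1.0]]
mixed, weights = sample_level_moe(tokens, [[0.0, 0.0], [0.0, 0.0]], experts)
# weights == [0.5, 0.5]; mixed == [[1.5, 4.5], [4.5, 1.5]]
```

Routing per sample rather than per token means the gate can react to acquisition-wide properties (for example, scanner contrast) while adding only one small gating computation per volume.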
If this is right
- Accurate segmentation remains possible at input volumes half the size used by prior lightweight models.
- Parameter counts drop by more than 60 percent relative to other high-performing lightweight networks on the same benchmarks.
- Average segmentation performance improves by 0.09 percent despite the reduced model size.
- Component ablations confirm that each of the three core modules contributes measurably to the observed efficiency.
Where Pith is reading between the lines
- The architecture could be adapted to other 3D medical segmentation tasks that suffer from scanner variability, such as liver or prostate imaging.
- Lower memory footprint opens the possibility of running full 3D inference on edge devices in operating rooms or portable scanners.
- The linear-complexity mixer may scale to higher-resolution volumes without the quadratic cost growth typical of attention-based alternatives.
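The scaling argument in the last bullet can be checked with simple arithmetic. The sketch below assumes the "usual" input is 128×128×128, which is inferred from the abstract's statement that 64×128×128 is half the size used by existing models; it also counts raw voxels, whereas real networks operate on downsampled tokens, so only the ratios are meaningful.

```python
def voxel_count(depth, height, width):
    """Number of voxels in a 3D volume."""
    return depth * height * width


full = voxel_count(128, 128, 128)   # assumed baseline input size
half = voxel_count(64, 128, 128)    # reduced input size reported for M4Fuse

# Cost growth when moving from the reduced input back to the full input.
linear_ratio = full / half                  # cost proportional to N
quadratic_ratio = (full ** 2) / (half ** 2) # cost proportional to N^2

print(full, half)                    # 2097152 1048576
print(linear_ratio, quadratic_ratio) # 2.0 4.0
```

Doubling the number of voxels doubles the cost of a linear-time mixer but quadruples the cost of a mechanism that compares every position with every other position.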
Load-bearing premise
That the specific grouping of state-space mixing, cross-scale gating, and sample-level expert routing will continue to produce efficiency and accuracy gains on data from unseen scanners or acquisition protocols.
What would settle it
A head-to-head comparison on a new multi-center brain tumor dataset acquired with different scanners where M⁴Fuse requires more parameters than the next-best lightweight model to reach equal Dice scores.
Original abstract
Encoder-decoder imbalance and the reliance on large input volumes make many 3D brain tumor segmentation models both compute-heavy and brittle. We present M⁴Fuse, a lightweight network that prioritizes discriminative brain tumor cues over exhaustive appearance reconstruction. Our method balances encoder and decoder capacity and replaces depth expansion with a synergistic design: it propagates long-range context with linear complexity via a grouped state space mixer, denoises and aligns skip features using a cross-scale dual-stage gating bridge, and absorbs cross-site acquisition shifts with a sample-level mixture-of-experts. On the BraTS2019 and BraTS2021 benchmarks, M⁴Fuse outperforms other lightweight excellent methods in both parameter count and performance. Even at a challenging input resolution of 64×128×128 (half that of existing excellent models), M⁴Fuse reduces parameters by 62.63% and improves average performance by 0.09%. Ablations of key components validate the method's exceptional parameter-to-accuracy efficiency and robustness across diverse data centers.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes M⁴Fuse, a lightweight encoder-decoder architecture for 3D brain tumor segmentation. It combines a grouped state-space mixer for long-range context with linear complexity, a cross-scale dual-stage gating bridge for denoising and aligning skip connections, and a sample-level mixture-of-experts to manage cross-site variations. The central claims are that it outperforms other lightweight methods on BraTS2019 and BraTS2021 in both accuracy and parameter count, and that even at a reduced input resolution of 64×128×128 it achieves a 62.63% parameter reduction while improving average performance by 0.09%.
Significance. Should the efficiency and accuracy claims be substantiated with rigorous statistical evidence and reproducible experiments, this approach could meaningfully advance the development of computationally efficient models for volumetric medical image segmentation, particularly in settings with limited computational resources or variable data acquisition protocols. The integration of state-space models with MoE and gating mechanisms offers a promising direction for balancing model capacity in 3D tasks.
major comments (2)
- [Abstract] The reported 0.09% improvement in average performance at the challenging 64×128×128 resolution lacks any mention of error bars, standard deviations from multiple runs, or statistical significance testing. Since BraTS segmentation metrics typically exhibit run-to-run or cross-validation variances of 0.5–2%, this small gain cannot be confidently distinguished from noise without additional analysis, directly impacting the validity of the outperformance claim.
- [Abstract] The comparison at halved resolution does not specify whether the competing lightweight models were evaluated under identical input conditions or if their architectures were adapted accordingly. Without such matched baselines or a dedicated table detailing per-model performance at 64×128×128, the 62.63% parameter reduction and performance delta may not be directly comparable.
minor comments (2)
- The abstract uses the phrase 'lightweight excellent methods,' which is imprecise; rephrasing to 'other lightweight state-of-the-art methods' would improve clarity.
- While ablations are mentioned as validating the components, the abstract does not summarize the key quantitative findings from these ablations, which would strengthen the presentation of the synergistic design.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below and will revise the manuscript to strengthen the statistical rigor and experimental clarity of our claims.
-
Referee: [Abstract] The reported 0.09% improvement in average performance at the challenging 64×128×128 resolution lacks any mention of error bars, standard deviations from multiple runs, or statistical significance testing. Since BraTS segmentation metrics typically exhibit run-to-run or cross-validation variances of 0.5–2%, this small gain cannot be confidently distinguished from noise without additional analysis, directly impacting the validity of the outperformance claim.
Authors: We agree that the reported 0.09% average improvement requires supporting statistical evidence to substantiate the outperformance claim. In the revised manuscript we will add mean and standard deviation values computed over multiple independent training runs (with different random seeds), along with the results of paired statistical significance tests (e.g., Wilcoxon signed-rank or paired t-test) against the strongest baseline. These additions will appear both in the abstract and in an expanded results table, allowing readers to evaluate whether the observed delta exceeds typical BraTS variance. revision: yes
-
Referee: [Abstract] The comparison at halved resolution does not specify whether the competing lightweight models were evaluated under identical input conditions or if their architectures were adapted accordingly. Without such matched baselines or a dedicated table detailing per-model performance at 64×128×128, the 62.63% parameter reduction and performance delta may not be directly comparable.
Authors: The referee correctly identifies an ambiguity in the abstract. All competing lightweight models were evaluated at the identical 64×128×128 input resolution without architectural modifications, ensuring a matched comparison. We will revise the abstract to state this explicitly and insert a new supplementary table that reports Dice scores, parameter counts, and FLOPs for every baseline at this resolution. This will make both the 62.63% parameter reduction and the 0.09% performance delta directly interpretable. revision: yes
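The multi-seed comparison promised in the first response could be summarized along the following lines. This is a minimal sketch of a paired t statistic using only the Python standard library; the helper name `paired_t_statistic` is hypothetical, and the per-seed Dice values are made up purely for illustration and are not results from the paper.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (mean difference, t statistic, degrees of freedom) for paired samples."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)        # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, t, len(diffs) - 1


# Made-up per-seed mean Dice (%) for illustration only; NOT results from the paper.
proposed = [85.2, 84.6, 85.9, 84.9, 85.4]
baseline = [85.0, 84.9, 85.3, 85.1, 85.2]

mean_diff, t, dof = paired_t_statistic(proposed, baseline)
print(f"mean difference = {mean_diff:.2f}, t = {t:.2f}, dof = {dof}")
```

With four degrees of freedom the two-sided 5% critical value of the t distribution is about 2.776, so a statistic well below that would not support an outperformance claim. For p-values, or for the Wilcoxon signed-rank alternative the authors mention, `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` are the usual tools.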
Circularity Check
No circularity; empirical claims rest on external BraTS benchmarks without self-referential derivations
The paper introduces an architectural design (grouped state-space mixer, cross-scale gating bridge, sample-level MoE) for 3D segmentation and validates it via direct comparison to external BraTS2019/BraTS2021 leaderboards and parameter counts at fixed resolutions. No equations, uniqueness theorems, or first-principles derivations appear that reduce claimed performance deltas to quantities defined inside the paper by construction. Ablation results and benchmark scores are presented as independent measurements rather than tautological predictions. Any internal self-citations (if present in the full text) are not load-bearing for the headline numeric claims, which remain falsifiable against public datasets.