
arxiv: 2607.00385 · v1 · pith:KQIKWFFCnew · submitted 2026-07-01 · 📡 eess.IV · cs.AI · cs.CV

MalariAI: A Label-Resilient Decoupled Framework for Universal Cell Segmentation and Explainable Stage Classification in Dense Malaria Blood Smears

Pith reviewed 2026-07-02 04:48 UTC · model grok-4.3

classification 📡 eess.IV · cs.AI · cs.CV
keywords malaria diagnosis · blood smear microscopy · cell segmentation · stage classification · watershed algorithm · EfficientNet · explainable AI · Grad-CAM

The pith

A decoupled framework isolates cells in malaria smears without annotations and classifies stages at 98.36 percent accuracy with per-cell explanations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing end-to-end detectors for malaria diagnosis fail in dense smears because they treat unannotated cells as background, suppress valid detections via non-maximum suppression, and lack per-cell evidence for audit. MalariAI addresses these issues with a two-stage decoupled pipeline. Stage 1 applies an annotation-agnostic distance-transform guided watershed algorithm to isolate every cell in a full 1600x1200 image, recovering 75.95 percent of ground-truth cells by centroid localisation on the NIH BBBC041 test set without any ground-truth input. Stage 2 fine-tunes EfficientNet-B0 with focal loss on 64x64 crops to classify stages, reaching 98.36 percent overall accuracy and higher per-class accuracy on rare schizont and gametocyte stages than a Faster R-CNN baseline. Grad-CAM++ heatmaps then supply instance-level spatial evidence for each cell. This separation supports reliable outputs in resource-limited settings where expert microscopists are scarce.
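The hand-off between the two stages can be pictured as cutting a fixed-size window around each isolated cell. The sketch below is only an illustration: the summary states the 1600x1200 image size and the 64x64 crop size, but centring the window on the cell centroid and shifting it to stay inside the image are assumptions made here, and `crop_window` is a hypothetical helper name.

```python
def crop_window(cx, cy, size=64, width=1600, height=1200):
    """Return (x0, y0, x1, y1) for a size x size window around a centroid.

    Assumption: the window is centred on (cx, cy) and shifted, not shrunk,
    when it would otherwise cross the image border.
    """
    half = size // 2
    x0 = min(max(int(round(cx)) - half, 0), width - size)
    y0 = min(max(int(round(cy)) - half, 0), height - size)
    return x0, y0, x0 + size, y0 + size
```

A cell near a corner still produces a full 64x64 crop under this rule, which keeps the classifier input shape constant.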

Core claim

MalariAI is a two-stage decoupled framework for universal cell segmentation and explainable stage classification in dense malaria blood smears. Stage 1 uses an annotation-agnostic distance-transform guided watershed algorithm to isolate every cell in a 1600x1200 image, recovering 75.95 percent of ground-truth cells by centroid localisation across the 120-image NIH BBBC041 test set without any ground-truth input. Stage 2 fine-tunes EfficientNet-B0 with focal loss (gamma 2.0, per-class inverse-frequency weights) on 64x64 crops, achieving 98.36 percent overall classification accuracy with 87.5 percent and 75.0 percent per-class accuracy on the rare schizont and gametocyte stages, compared to on

What carries the argument

The annotation-agnostic distance-transform guided watershed algorithm for cell isolation, paired with per-cell Grad-CAM++ explainability on the EfficientNet-B0 classifier.
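To make the general idea of a distance-transform guided watershed concrete, here is a minimal pure-Python sketch. It is not the authors' algorithm and uses none of their parameters. Assumptions made for illustration: a binary foreground mask is already available, distance is city-block distance, markers are connected groups of pixels whose distance is at least a hypothetical `min_dist` (1 or more), and pixels are 4-connected.

```python
import heapq
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity (assumption)


def distance_transform(mask):
    """City-block distance from each foreground pixel to the nearest background pixel.

    Assumes the mask contains at least one background (0) pixel.
    """
    h, w = len(mask), len(mask[0])
    dist = [[None if mask[y][x] else 0 for x in range(w)] for y in range(h)]
    queue = deque((y, x) for y in range(h) for x in range(w) if not mask[y][x])
    while queue:  # multi-source breadth-first search outward from the background
        y, x = queue.popleft()
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def label_seeds(dist, min_dist):
    """Label connected groups of pixels with dist >= min_dist as markers 1..n."""
    h, w = len(dist), len(dist[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if dist[y][x] < min_dist or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            stack = [(y, x)]
            while stack:
                cy, cx = stack.pop()
                for dy, dx in NEIGHBOURS:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w and not labels[ny][nx]
                            and dist[ny][nx] >= min_dist):
                        labels[ny][nx] = count
                        stack.append((ny, nx))
    return labels, count


def watershed(mask, min_dist):
    """Grow each marker through the foreground, visiting deepest pixels first."""
    h, w = len(mask), len(mask[0])
    dist = distance_transform(mask)
    labels, count = label_seeds(dist, min_dist)
    heap = [(-dist[y][x], y, x) for y in range(h) for x in range(w) if labels[y][x]]
    heapq.heapify(heap)
    while heap:
        _, y, x = heapq.heappop(heap)
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                labels[ny][nx] = labels[y][x]
                heapq.heappush(heap, (-dist[ny][nx], ny, nx))
    return labels, count
```

On a toy mask of two 5x5 squares joined by a one-pixel bridge, `watershed(mask, min_dist=3)` returns two labels, one per square, because the distance map dips at the bridge and no marker forms there. A production pipeline would more likely rely on a library such as scikit-image than on a hand-rolled flood.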

If this is right

  • Recovers 75.95 percent of ground-truth cells by centroid localisation without any ground-truth input during segmentation.
  • Classifies infection stages at 98.36 percent overall accuracy on the NIH BBBC041 test set.
  • Achieves 87.5 percent and 75.0 percent per-class accuracy on schizont and gametocyte stages, exceeding Faster R-CNN average precision on those classes.
  • Supplies instance-level spatial evidence via Grad-CAM++ heatmaps for clinical audit without sacrificing classification performance.
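The recovery figure in the first bullet depends on how "centroid localisation" is scored, which this summary does not spell out. The sketch below assumes one plausible rule, labelled here as an assumption: a ground-truth box counts as recovered when at least one predicted segment centroid falls inside it. The function name and box format are hypothetical.

```python
def centroid_recovery(gt_boxes, pred_centroids):
    """Fraction of ground-truth boxes that contain at least one predicted centroid.

    gt_boxes: iterable of (x_min, y_min, x_max, y_max).
    pred_centroids: iterable of (x, y).
    Assumption: a box is 'recovered' if any centroid lies inside it (inclusive).
    """
    gt_boxes = list(gt_boxes)
    pred_centroids = list(pred_centroids)
    if not gt_boxes:
        return 0.0
    recovered = sum(
        1
        for x0, y0, x1, y1 in gt_boxes
        if any(x0 <= cx <= x1 and y0 <= cy <= y1 for cx, cy in pred_centroids)
    )
    return recovered / len(gt_boxes)
```

Under this rule, extra segments that match no annotated box do not lower the score, so the metric measures recall of annotated cells only and says nothing about over-segmentation.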

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The decoupled design may reduce the need for fully annotated training data in other dense cell microscopy tasks.
  • Instance-level explanations could support clinical adoption by letting microscopists verify individual predictions.
  • The focal loss weighting appears to help with class imbalance, which could be tested on additional imbalanced medical imaging datasets.
  • Performance on the specific NIH BBBC041 split would need confirmation on smears from varied sources and staining protocols.

Load-bearing premise

The distance-transform guided watershed algorithm can reliably isolate every individual cell in dense smear regions without any annotation input or post-processing tuned to the test set.

What would settle it

Applying the watershed algorithm to a fresh collection of dense blood smear images from a different preparation or microscope and measuring whether cell recovery by centroid localisation falls below 70 percent.

Figures

Figures reproduced from arXiv: 2607.00385 by Kaysarul Anas Apurba, Md Hasibul Hasan, Mohammed Ali, Tanzilur Rahman.

Figure 1. Class distribution of NIH BBBC041 training annotations. Left: log-scale view
Figure 2. Representative ground-truth crops of each parasitic stage in NIH BBBC041 at
Figure 3. Distribution of annotated bounding boxes per image in the BBBC041 training
Figure 4. MP-IDB species distribution. Left: annotation counts per
Figure 5. MP-IDB lifecycle stage distribution. Left: number of images containing each
Figure 6. Bounding-box area distributions for BBBC041 (left) and MP-IDB (right). Left:
Figure 7. Geometric illustration of the scale gap between BBBC041 (left) and MP-IDB
Figure 8. Representative MP-IDB images with ground-truth bounding boxes overlaid. Red
Figure 9. MalariAI two-stage decoupled architecture. Stage 1 performs universal cell segmentation via distance-transform guided watershed on the raw 1600×1200 blood smear image with no annotation input. Stage 2 classifies each 64×64 crop with EfficientNet-B0 and Focal Loss, followed by Grad-CAM++ for per-cell spatial explainability.
Figure 10. Stage 1 annotation-agnostic watershed segmentation (Algorithm 1). Parameters:
Figure 11. Faster R-CNN (Baseline A) training and validation loss over 80 epochs. Val
Figure 12. Ground truth (top) versus Faster R-CNN Baseline A predictions (bottom) on
Figure 13. Pipeline B Stage 2 (EfficientNet-B0) training and validation loss and accuracy
Figure 14. Stage 1 watershed output on a representative test image (64985a1e, 42 valid
Figure 15. Grad-CAM++ crop gallery for the representative test image. Each panel shows a detected parasitic cell at 160×160 pixels with the heatmap overlaid (jet colormap, α = 0.5). Red-yellow regions indicate pixels that most strongly contributed to the model’s classification decision. Labels show class name and softmax confidence. Image courtesy NIH BBBC041.
Figure 16. Full-image Grad-CAM++ overlay. The heatmap is spliced back onto the original 1600×1200 smear only at parasite-detected bounding box locations using a soft spatial mask (Gaussian-blurred boundary). The rest of the image is shown unchanged. Parasite bounding boxes are drawn in class color. Image courtesy NIH BBBC041.
Original abstract

Automated malaria diagnosis from blood smear microscopy is a critical challenge in global health AI; in resource-limited settings, the scarcity of expert microscopists remains the primary bottleneck to timely and accurate diagnosis. Three compounding failure modes prevent reliable clinical deployment of existing deep learning systems. First, end-to-end detectors treat unannotated cells as background during training, producing recall figures that are strongly influenced by annotation completeness rather than reflecting true cell recovery. Second, Non-Maximum Suppression tends to suppress valid detections in dense smear regions where infection counts matter most. Third, existing whole-slide detection pipelines lack per-cell spatial evidence for clinical audit, despite image-level explainability methods such as Grad-CAM having been applied to malaria image classification tasks. We present MalariAI, a two-stage decoupled framework that addresses all three failure modes in a unified pipeline. Stage 1 applies an annotation-agnostic distance-transform guided watershed algorithm to isolate every cell in a full 1600x1200 blood smear image, recovering 75.95% of ground-truth cells by centroid localisation across the 120-image NIH BBBC041 test set without any ground-truth input. Stage 2 fine-tunes EfficientNet-B0 with Focal Loss (gamma = 2.0, per-class inverse-frequency weights) on 64x64 crops, achieving 98.36% overall classification accuracy with 87.5% and 75.0% per-class accuracy on the rare schizont and gametocyte stages, compared to only 24.57% and 25.95% AP for a Faster R-CNN baseline on the same classes. Grad-CAM++ heatmaps generated per detected cell provide instance-level spatial evidence for clinical audit, enabling microscopists to verify model predictions at the individual parasite level without sacrificing classification performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces MalariAI, a decoupled two-stage framework for malaria parasite detection and classification in blood smears. Stage 1 employs an annotation-agnostic distance-transform guided watershed algorithm to segment individual cells from full 1600x1200 images, reporting 75.95% centroid recovery on the NIH BBBC041 test set without using ground-truth annotations. Stage 2 fine-tunes an EfficientNet-B0 model using focal loss on 64x64 cell crops for five-stage classification, achieving 98.36% overall accuracy and superior performance on rare classes compared to Faster R-CNN, with Grad-CAM++ providing per-cell explainability.

Significance. If the central claims regarding parameter-independent cell isolation and split-independent classification hold, this work would be significant for global health AI by mitigating annotation bias, improving detection in dense regions, and providing instance-level explainability for clinical validation in malaria diagnosis.

major comments (2)
  1. [Abstract] The watershed-based cell isolation claim (75.95% recovery without ground-truth input) is load-bearing for the 'label-resilient' and 'annotation-agnostic' framing, but the abstract supplies no details on the specific fixed parameters of the distance-transform and watershed steps (e.g., Gaussian sigma, local maxima thresholds), contrary to the requirement that they be dataset-independent and untuned on the test set.
  2. [Abstract] The classification results (98.36% accuracy, 75.0% on gametocytes) are evaluated on the same 120-image test set as the segmentation stage; the manuscript does not state whether the training crops for EfficientNet-B0 fine-tuning are drawn from a completely disjoint set of images, which is necessary to establish that the reported metrics reflect generalization rather than leakage.
minor comments (1)
  1. [Abstract] The abstract reports headline metrics without error bars or statistical tests; this presentation issue should be addressed with confidence intervals in the results.
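The minor comment's request for confidence intervals can be illustrated with a Wilson score interval for a single proportion. This is a sketch of one standard choice, not the analysis the authors ran. The summary does not give per-class test counts, so any counts passed in (for example 6 correct out of 8) are hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half
```

With the hypothetical 6 of 8 above, a 75 percent point estimate comes with an interval of roughly 0.41 to 0.93. That width is why per-class accuracies on rare stages are hard to compare without the underlying counts.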

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on improving the transparency of our claims. We address the two major comments point-by-point below.

  1. Referee: [Abstract] The watershed-based cell isolation claim (75.95% recovery without ground-truth input) is load-bearing for the 'label-resilient' and 'annotation-agnostic' framing, but the abstract supplies no details on the specific fixed parameters of the distance-transform and watershed steps (e.g., Gaussian sigma, local maxima thresholds), contrary to the requirement that they be dataset-independent and untuned on the test set.

    Authors: We agree the abstract should be self-contained on this point. The full manuscript describes the fixed parameters in the Methods section; these were selected a priori based on typical cell morphology and held constant without any tuning on the NIH BBBC041 test set (or any evaluation data). We will revise the abstract to explicitly list the parameters and restate that they are dataset-independent. revision: yes

  2. Referee: [Abstract] The classification results (98.36% accuracy, 75.0% on gametocytes) are evaluated on the same 120-image test set as the segmentation stage; the manuscript does not state whether the training crops for EfficientNet-B0 fine-tuning are drawn from a completely disjoint set of images, which is necessary to establish that the reported metrics reflect generalization rather than leakage.

    Authors: The training crops are drawn exclusively from images disjoint from the 120-image test set. We will add an explicit statement clarifying the image-level split in the revised manuscript (and abstract where space permits) to confirm the absence of leakage. revision: yes
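An image-level split of the kind described in this response can be checked mechanically. The sketch below assumes, purely for illustration, that every crop carries the identifier of its source image as a `(image_id, crop_index)` pair. The function name and record format are hypothetical and not taken from the paper.

```python
def leaked_images(train_crops, test_crops):
    """Return sorted source-image IDs that occur in both crop collections.

    Each crop is an (image_id, crop_index) pair (assumed format).
    An empty result means the split is disjoint at the image level.
    """
    train_ids = {image_id for image_id, _ in train_crops}
    test_ids = {image_id for image_id, _ in test_crops}
    return sorted(train_ids & test_ids)
```

Checking at the image level rather than the crop level matters because two different crops from the same smear share staining, focus, and illumination, so a crop-level split alone could still leak.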

Circularity Check

0 steps flagged

No circularity: empirical measurements on named public dataset


The paper reports direct empirical results from applying a fixed distance-transform watershed pipeline (Stage 1) and fine-tuning EfficientNet-B0 with Focal Loss (Stage 2) to the NIH BBBC041 test set. No equations, derivations, or self-citations are presented that reduce any claimed output to the inputs by construction. The 75.95% recovery and 98.36% accuracy figures are framed as measurements on an external benchmark without parameter fitting to the evaluation split or renaming of known results. The derivation chain is therefore self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

No mathematical derivations, new physical constants, or postulated entities appear in the abstract; the work rests on standard image-processing algorithms and a pre-trained CNN whose training details are not supplied.

free parameters (1)
  • gamma = 2.0
    Focal-loss focusing parameter set to 2.0; value is stated but its selection process is not described.
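For reference, the role of gamma can be seen in the standard per-sample focal loss, FL = -w_t (1 - p_t)^gamma log(p_t), where p_t is the predicted probability of the true class and w_t its class weight. The sketch below is a plain-Python illustration, not the authors' training code. The weight normalisation total / (num_classes * count) is an assumption, since the summary says only "per-class inverse-frequency weights".

```python
import math


def inverse_frequency_weights(class_counts):
    """Weight each class by total / (num_classes * count); rarer classes weigh more.

    Assumption: this particular normalisation; the paper's exact formula is not given here.
    """
    total = sum(class_counts)
    k = len(class_counts)
    return [total / (k * count) for count in class_counts]


def focal_loss(probs, target, weights, gamma=2.0):
    """Focal loss for one sample: -w_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = probs[target]
    return -weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma set to 0 the expression reduces to weighted cross-entropy. With gamma at 2.0, a confidently correct prediction (p_t near 1) contributes very little, so training effort shifts toward hard and rare examples.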

pith-pipeline@v0.9.1-grok · 5886 in / 1350 out tokens · 37570 ms · 2026-07-02T04:48:35.735838+00:00 · methodology

