A Hierarchical Ensemble Inference Pipeline for Robust White Blood Cell Classification Under Domain Shifts
Pith reviewed 2026-05-08 08:38 UTC · model grok-4.3
The pith
A three-stage hierarchical kNN ensemble with a memory feature bank on a LoRA-tuned DinoBloom backbone provides robust white blood cell classification despite domain shifts from staining and scanners.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We propose a memory-augmented, hierarchical ensemble pipeline for WBC classification under domain shifts, leveraging a feature bank and a DinoBloom backbone fine-tuned with LoRA. Our three-stage inference hierarchy combines k-nearest neighbors (kNN) retrieval at each level, reducing over-reliance on any single decision. Evaluated on the WBCBench dataset, our method ranks within the top ten by macro F1-score in the final testing phase.
What carries the argument
The three-stage inference hierarchy that performs kNN retrieval from a stored feature bank at each level on top of a LoRA-adapted DinoBloom model.
Load-bearing premise
The memory-augmented hierarchical ensemble with its fixed feature bank and staged kNN retrieval will be enough to counteract the effects of staining, scanner, and lab differences on classification accuracy.
What would settle it
Observing a significant drop in macro F1 score on a held-out test set collected from an unseen laboratory using a novel staining protocol and scanner type, relative to the reported challenge performance.
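The test above hinges on comparing macro F1 across sites. As a minimal sketch (the function names `macro_f1` and `relative_drop` are hypothetical, and the source does not say how large a drop would count as significant), macro F1 is the unweighted mean of per-class F1, so rare classes such as blasts weigh as much as common ones:

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the sample was not p
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)

def relative_drop(reference_f1, unseen_site_f1):
    """Fraction of the reference macro F1 lost on the unseen site."""
    return (reference_f1 - unseen_site_f1) / reference_f1
```

Calling `macro_f1` once on the challenge test split and once on the unseen-laboratory split, then passing both values to `relative_drop`, gives the quantity this test would examine.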
Original abstract
Automated white blood cell (WBC) classification is essential for scalable leukaemia screening. However, real-world deployment is challenged by domain shifts caused by staining protocols, scanner characteristics, and inter-laboratory variability, which often degrade model performance. The White Blood Cell Classification Challenge (WBCBench) at ISBI 2026 aims to advance robust WBC recognition, with a focus on accurately identifying blast cells and other clinically critical rare subtypes. We propose a memory-augmented, hierarchical ensemble pipeline for WBC classification under domain shifts, leveraging a feature bank and a DinoBloom backbone fine-tuned with LoRA. Our three-stage inference hierarchy combines k-nearest neighbors (kNN) retrieval at each level, reducing over-reliance on any single decision. Evaluated on the WBCBench dataset, our method ranks within the top ten by macro F1-score in the final testing phase.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a memory-augmented hierarchical ensemble pipeline for white blood cell (WBC) classification under domain shifts from staining protocols, scanners, and inter-laboratory variability. It employs a DinoBloom backbone fine-tuned with LoRA, a feature bank for memory augmentation, and a three-stage inference hierarchy that applies k-nearest neighbors (kNN) retrieval at each level to reduce reliance on any single decision. The method is evaluated on the WBCBench dataset from the ISBI 2026 White Blood Cell Classification Challenge, where it achieves a top-10 ranking by macro F1-score in the final testing phase. The central claim is that this architecture provides robust performance on clinically critical subtypes such as blast cells without requiring site-specific adaptation.
Significance. If the hierarchical kNN ensemble and feature bank demonstrably mitigate domain shifts as claimed, the work would have moderate practical significance for scalable, reliable automated leukemia screening in heterogeneous clinical environments. The combination of a strong vision backbone with memory-augmented retrieval could serve as a template for other medical imaging tasks facing distribution shifts. However, the absence of methodological specifics, baselines, and quantitative validation prevents assessment of whether the approach advances beyond standard ensemble or adaptation methods in computer vision for hematology.
major comments (2)
- [Abstract] Abstract: The description of the three-stage inference hierarchy is insufficient to support the claim that it reduces over-reliance on any single decision. No information is provided on the classification objective of each stage, the fusion rule for combining kNN outputs, or the construction, population, and retrieval mechanism of the feature bank.
- [Evaluation] Evaluation section: The top-10 macro F1 ranking is reported without baselines, ablation studies on the hierarchy components or feature bank, error bars, statistical tests, or quantitative measures of domain-shift severity. This prevents determination of whether the result reflects genuine robustness gains or is specific to the challenge data distribution.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the manuscript to incorporate additional details and analyses where appropriate.
-
Referee: [Abstract] Abstract: The description of the three-stage inference hierarchy is insufficient to support the claim that it reduces over-reliance on any single decision. No information is provided on the classification objective of each stage, the fusion rule for combining kNN outputs, or the construction, population, and retrieval mechanism of the feature bank.
Authors: We agree that the abstract provides only a high-level overview. In the revised manuscript, we have expanded the Methods section with a detailed description of the three-stage hierarchy. Stage 1 performs coarse lineage classification via kNN on global DinoBloom features, Stage 2 refines within-lineage subtypes, and Stage 3 applies memory-augmented kNN retrieval focused on rare clinically critical classes such as blasts. Outputs from the stages are combined using a similarity-weighted fusion rule. The feature bank is populated with embeddings extracted from the full training set and retrieved via cosine similarity at inference time. These additions clarify how the staged retrieval reduces dependence on any single decision.
revision: yes
-
Referee: [Evaluation] Evaluation section: The top-10 macro F1 ranking is reported without baselines, ablation studies on the hierarchy components or feature bank, error bars, statistical tests, or quantitative measures of domain-shift severity. This prevents determination of whether the result reflects genuine robustness gains or is specific to the challenge data distribution.
Authors: The referee correctly notes that the original evaluation section was limited. We have revised it to include ablation studies that isolate the contribution of the hierarchical stages, the feature bank, and LoRA fine-tuning by comparing performance with and without each element. We now report error bars from repeated runs with different random seeds and include paired statistical tests against a baseline fine-tuned DinoBloom classifier. We have also added quantitative domain-shift analysis using Maximum Mean Discrepancy between source and target feature distributions. While the top-10 ranking in the WBCBench challenge provides external validation of robustness, these internal controls help attribute gains to the proposed components rather than the specific test distribution.
revision: yes
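For readers unfamiliar with the domain-shift measure named in the second response, the following is a minimal sketch of a squared Maximum Mean Discrepancy estimate between two sets of feature vectors. The RBF kernel, the bandwidth `gamma`, and the biased (V-statistic) estimator are assumptions made here for illustration; the rebuttal does not state which kernel or estimator is used.

```python
import math

def rbf(u, v, gamma):
    """RBF kernel exp(-gamma * ||u - v||^2). Kernel choice is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def mean_kernel(A, B, gamma):
    """Average kernel value over all pairs (a, b) with a in A and b in B."""
    return sum(rbf(a, b, gamma) for a in A for b in B) / (len(A) * len(B))

def mmd2(source, target, gamma=1.0):
    """Biased estimate of squared MMD between source and target feature sets."""
    return (mean_kernel(source, source, gamma)
            + mean_kernel(target, target, gamma)
            - 2.0 * mean_kernel(source, target, gamma))
```

A value near zero suggests the two feature distributions are hard to tell apart under the chosen kernel, and larger values indicate a stronger shift. The result depends heavily on `gamma`, so any reported number would need the bandwidth stated alongside it.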
Circularity Check
No circularity: empirical pipeline evaluated on external benchmark
The paper describes a proposed memory-augmented hierarchical ensemble pipeline (DinoBloom+LoRA backbone, three-stage kNN with feature bank) and reports its macro F1 ranking on the external WBCBench challenge dataset. No equations, parameter fittings presented as predictions, self-citations, or ansatzes are present in the text. The central claim is an empirical performance result on an independent benchmark rather than any derivation that reduces to its own inputs by construction. This is a standard non-circular empirical contribution.
Axiom & Free-Parameter Ledger
free parameters (2)
- LoRA rank and scaling
- k for kNN retrieval
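To make the first ledger entry concrete: in the LoRA formulation of Hu et al., a frozen weight matrix W is augmented by a low-rank product B·A scaled by alpha / r, where r is the rank. The sketch below shows that forward pass on plain Python lists. The helper names are hypothetical, and the values of r and alpha used by the authors are not given in the text reviewed here.

```python
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x).

    W: frozen weights, shape d_out x d_in
    A: trainable down-projection, shape r x d_in
    B: trainable up-projection, shape d_out x r
    """
    r = len(A)                        # the rank is the number of rows of A
    base = matvec(W, x)               # output of the frozen layer
    update = matvec(B, matvec(A, x))  # low-rank correction
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]
```

With B initialised to zeros the layer reproduces the frozen backbone exactly, so both r and alpha control how far fine-tuning can move the embeddings that the feature bank later stores.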
axioms (2)
- domain assumption: Features extracted by the LoRA-adapted DinoBloom backbone remain discriminative for WBC subtypes under domain shift
- domain assumption: Hierarchical multi-stage kNN retrieval reduces over-reliance and improves robustness compared to single-model inference
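The second assumption can be illustrated with a minimal sketch of staged retrieval. It assumes the layout described in the simulated rebuttal (cosine-similarity kNN over a feature bank, similarity-weighted votes, coarse lineage first and subtype second) and omits the third, rare-class stage. The bank layout, function names, and toy labels are hypothetical, and this is not the authors' implementation.

```python
import math
from collections import defaultdict

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_vote(query, bank, k, label_of):
    """Sum the similarities of the k nearest bank entries per label."""
    scored = sorted(((cosine(query, e[0]), label_of(e)) for e in bank),
                    key=lambda s: s[0], reverse=True)[:k]
    votes = defaultdict(float)
    for sim, label in scored:
        votes[label] += max(sim, 0.0)  # ignore negatively correlated neighbours
    return votes

def classify(query, bank, k=3):
    """bank entries are (embedding, coarse_label, fine_label) tuples."""
    coarse_votes = knn_vote(query, bank, k, lambda e: e[1])
    coarse = max(coarse_votes, key=coarse_votes.get)
    # Second stage: retrieve only among entries of the winning coarse group.
    sub_bank = [e for e in bank if e[1] == coarse]
    fine_votes = knn_vote(query, sub_bank, k, lambda e: e[2])
    return coarse, max(fine_votes, key=fine_votes.get)
```

Note that in this sketch a wrong coarse decision cannot be recovered by the second stage, which is exactly the kind of dependence the ablations requested by the referee would need to quantify.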
Reference graph
Works this paper leans on
-
[1]
The World Health Organization statistics reported 2.9 new cases and 1.2 deaths per 100,000 people in individuals aged 0 to 24 years in 2022
INTRODUCTION. Leukaemia is a life-threatening hematological malignancy, accounting for approximately 4% of cancer-related deaths and ranking among the leading causes of cancer mortality in both males and females [1]. The World Health Organization statistics reported 2.9 new cases and 1.2 deaths per 100,000 people in individuals aged 0 to 24 years in 2022...
2022
-
[2]
We propose a coarse-to-fine hierarchical ensemble inference strategy for robust WBC classification under domain shifts
-
[3]
We fine-tune a DinoBloom backbone with LoRA to obtain transferable embeddings that support retrieval-based inference under domain shifts
-
[4]
We further employ multi-split majority voting to stabilize predictions and improve performance on long-tailed classes
-
[5]
METHODOLOGY. Fig. 1 provides an overview of our three-stage pipeline designed to handle class imbalance and domain shift in white blood cell classification. (arXiv:2604.23271v1 [cs.CV], 25 Apr 2026) [Fig. 1. Overview of the proposed three-stage pipeline.] 2.1. Dataset Preparation. WBCBench Challenge [9] provides labeled WBC images for a 13-class task. To mitig...
2026
-
[6]
Implementation Our method is implemented in PyTorch
EXPERIMENTS AND RESULTS. 3.1. Implementation. Our method is implemented in PyTorch. We fine-tune the DinoBloom weights using LoRA on a single NVIDIA A100 for 100 epochs with a batch size of 16. We optimize the model with AdamW using a learning rate of 1×10⁻⁵, a weight decay of 1×10⁻², and an EMA momentum of 0.999. Inference is performed on an NVIDIA RTX 6000...
-
[7]
This task is challenging due to subtle inter-class differences and limited labeled data, which can make standard end-to-end classifiers less robust
CONCLUSION. In this work, we propose a hierarchical kNN-based inference framework for fine-grained white blood cell classification on WBCBench. This task is challenging due to subtle inter-class differences and limited labeled data, which can make standard end-to-end classifiers less robust. Our approach fine-tunes a pretrained backbone to learn transferab...
-
[8]
Cancer statistics, 2022,
Rebecca L Siegel, Kimberly D Miller, Hannah E Fuchs, and Ahmedin Jemal, “Cancer statistics, 2022,” CA: A Cancer Journal for Clinicians, vol. 72, no. 1, pp. 7–33, 2022
2022
-
[9]
Recognition of peripheral blood cell images using convolutional neural networks,
Andrea Acevedo, Santiago Alférez, Anna Merino, Laura Puigví, and José Rodellar, “Recognition of peripheral blood cell images using convolutional neural networks,” Computer Methods and Programs in Biomedicine, vol. 180, pp. 105020, 2019
2019
-
[10]
Rabia Asghar, Sanjay Kumar, Paul Hynds, and Abeera Mahfooz, “Automatic classification of blood cell images using convolutional neural network,” Preprint at https://doi.org/10.48550/arXiv, vol. 2308, 2023
2023
-
[11]
Histogram of cell types: deep learning for automated bone marrow cytology,
Rohollah Moosavi Tayebi, Youqing Mu, Taher Dehkharghanian, Catherine Ross, Monalisa Sur, Ronan Foley, Hamid R Tizhoosh, and Clinton JV Campbell, “Histogram of cell types: deep learning for automated bone marrow cytology,” arXiv preprint arXiv:2107.02293, 2021
-
[12]
Feature extraction using cnn for peripheral blood cells recognition,
Mohammed Ammar, Mostafa El Habib Daho, Khaled Harrar, and Amel Laidi, “Feature extraction using cnn for peripheral blood cells recognition,” EAI Endorsed Transactions on Scalable Information Systems, vol. 9, no. 34, pp. e12, 2022
2022
-
[13]
Artificial intelligence of digital morphology analyzers improves the efficiency of manual leukocyte differentiation of peripheral blood,
Ying Xing, Xuekai Liu, Juhua Dai, Xiaoxing Ge, Qingchen Wang, Ziyu Hu, Zhicheng Wu, Xuehui Zeng, Dan Xu, and Chenxue Qu, “Artificial intelligence of digital morphology analyzers improves the efficiency of manual leukocyte differentiation of peripheral blood,” BMC Medical Informatics and Decision Making, vol. 23, no. 1, pp. 50, 2023
2023
-
[14]
An explainable vision transformer model based white blood cells classification and localization,
Oguzhan Katar and Ozal Yildirim, “An explainable vision transformer model based white blood cells classification and localization,” Diagnostics, vol. 13, no. 14, pp. 2459, 2023
2023
-
[15]
Transforming healthcare: Raabin white blood cell classification with deep vision transformer,
Rufus Rubin, SM Anzar, Alavikunhu Panthakkan, and Wathiq Mansoor, “Transforming healthcare: Raabin white blood cell classification with deep vision transformer,” in 2023 6th International Conference on Signal Processing and Information Security (ICSPIS). IEEE, 2023, pp. 212–217
2023
-
[16]
WBCBench 2026: A challenge for robust white blood cell classification under class imbalance,
Xin Tian, Xudong Ma, Tianqi Yang, Alin Achim, Bartek Papiez, Phandee Watanaboonyongcharoen, and Nantheera Anantrasirichai, “WBCBench 2026: A challenge for robust white blood cell classification under class imbalance,” in 2026 IEEE International Symposium on Biomedical Imaging (ISBI). IEEE, 2026
2026
-
[17]
Dinobloom: a foundation model for generalizable cell embeddings in hematology,
Valentin Koch, Sophia J Wagner, Salome Kazeminia, Ece Sancar, Matthias Hehr, Julia A Schnabel, Tingying Peng, and Carsten Marr, “Dinobloom: a foundation model for generalizable cell embeddings in hematology,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2024, pp. 520–530
2024
-
[18]
A dataset of microscopic peripheral blood cell images for development of automatic recognition systems,
Andrea Acevedo, Anna Merino, Santiago Alférez, Ángel Molina, Laura Boldú, and José Rodellar, “A dataset of microscopic peripheral blood cell images for development of automatic recognition systems,” Data in Brief, vol. 30, pp. 105474, 2020
2020
-
[19]
Raabin-wbc: a large free access dataset of white blood cells from normal peripheral blood,
Zahra Mousavi Kouzehkanan, Sepehr Saghari, Eslam Tavakoli, Peyman Rostami, Mohammadjavad Abaszadeh, Farzaneh Mirzadeh, Esmaeil Shahabi Satlsar, Maryam Gheidishahran, Fatemeh Gorgi, Saeed Mohammadi, et al., “Raabin-wbc: a large free access dataset of white blood cells from normal peripheral blood,” BioRxiv, pp. 2021–05, 2021
2021
-
[20]
Lora: Low-rank adaptation of large language models,
Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al., “Lora: Low-rank adaptation of large language models,” ICLR, vol. 1, no. 2, pp. 3, 2022
2022