A Hierarchical Ensemble Inference Pipeline for Robust White Blood Cell Classification Under Domain Shifts

arxiv: 2604.23271 · v1 · submitted 2026-04-25 · 💻 cs.CV

Ruyi Dai, Tingkwong Ng, Hao Chen

Pith reviewed 2026-05-08 08:38 UTC · model grok-4.3

classification 💻 cs.CV

keywords: white blood cell classification, domain shifts, hierarchical ensemble, k-nearest neighbors, feature bank, LoRA fine-tuning, leukemia screening, medical image analysis

The pith

A three-stage hierarchical kNN ensemble with a memory feature bank on a LoRA-tuned DinoBloom backbone provides robust white blood cell classification despite domain shifts from staining and scanners.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a memory-augmented hierarchical ensemble pipeline designed to classify white blood cells accurately even when images come from varied laboratories with different staining methods and scanners. It builds this system around a backbone model that is lightly adapted, then adds a feature bank for retrieval. At inference time the pipeline runs three successive stages of k-nearest neighbor lookup to combine decisions and avoid depending too much on any one prediction. This setup is tested on the WBCBench challenge dataset for leukemia screening, where it achieves a competitive ranking. If the approach works as intended, it could support more reliable automated screening across different clinical settings without needing to retrain for each new site.

Core claim

We propose a memory-augmented, hierarchical ensemble pipeline for WBC classification under domain shifts, leveraging a feature bank and a DinoBloom backbone fine-tuned with LoRA. Our three-stage inference hierarchy combines k-nearest neighbors (kNN) retrieval at each level, reducing over-reliance on any single decision. Evaluated on the WBCBench dataset, our method ranks within the top ten by macro F1-score in the final testing phase.

What carries the argument

The three-stage inference hierarchy that performs kNN retrieval from a stored feature bank at each level on top of a LoRA-adapted DinoBloom model.

Load-bearing premise

The memory-augmented hierarchical ensemble with its fixed feature bank and staged kNN retrieval will be enough to counteract the effects of staining, scanner, and lab differences on classification accuracy.

What would settle it

Observing a significant drop in macro F1 score on a held-out test set collected from an unseen laboratory using a novel staining protocol and scanner type, relative to the reported challenge performance.
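
Macro F1 is the unweighted mean of per-class F1 scores, so a rare class such as blasts counts as much as a common one. The sketch below shows this in plain Python. The function name `macro_f1` and the toy labels are illustrative assumptions, not taken from the paper, and the challenge's own scoring script may differ in details such as how absent classes are handled.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy example (hypothetical labels): the single rare "blast" cell is missed.
y_true = ["neutrophil"] * 8 + ["lymphocyte"] * 3 + ["blast"]
y_pred = ["neutrophil"] * 8 + ["lymphocyte"] * 3 + ["lymphocyte"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f} macro_f1={score:.3f}")
```

In this toy case accuracy stays above 0.9 while macro F1 falls to roughly 0.62, which is why a drop in macro F1 on an unseen site is a sensitive signal for failures on rare subtypes.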

Original abstract

Automated white blood cell (WBC) classification is essential for scalable leukaemia screening. However, real-world deployment is challenged by domain shifts caused by staining protocols, scanner characteristics, and inter-laboratory variability, which often degrade model performance. The White Blood Cell Classification Challenge (WBCBench) at ISBI 2026 aims to advance robust WBC recognition, with a focus on accurately identifying blast cells and other clinically critical rare subtypes. We propose a memory-augmented, hierarchical ensemble pipeline for WBC classification under domain shifts, leveraging a feature bank and a DinoBloom backbone fine-tuned with LoRA. Our three-stage inference hierarchy combines k-nearest neighbors (kNN) retrieval at each level, reducing over-reliance on any single decision. Evaluated on the WBCBench dataset, our method ranks within the top ten by macro F1-score in the final testing phase.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes a memory-augmented hierarchical ensemble pipeline for white blood cell (WBC) classification under domain shifts from staining protocols, scanners, and inter-laboratory variability. It employs a DinoBloom backbone fine-tuned with LoRA, a feature bank for memory augmentation, and a three-stage inference hierarchy that applies k-nearest neighbors (kNN) retrieval at each level to reduce reliance on any single decision. The method is evaluated on the WBCBench dataset from the ISBI 2026 White Blood Cell Classification Challenge, where it achieves a top-10 ranking by macro F1-score in the final testing phase. The central claim is that this architecture provides robust performance on clinically critical subtypes such as blast cells without requiring site-specific adaptation.

Significance. If the hierarchical kNN ensemble and feature bank demonstrably mitigate domain shifts as claimed, the work would have moderate practical significance for scalable, reliable automated leukemia screening in heterogeneous clinical environments. The combination of a strong vision backbone with memory-augmented retrieval could serve as a template for other medical imaging tasks facing distribution shifts. However, the absence of methodological specifics, baselines, and quantitative validation prevents assessment of whether the approach advances beyond standard ensemble or adaptation methods in computer vision for hematology.

major comments (2)

[Abstract] Abstract: The description of the three-stage inference hierarchy is insufficient to support the claim that it reduces over-reliance on any single decision. No information is provided on the classification objective of each stage, the fusion rule for combining kNN outputs, or the construction, population, and retrieval mechanism of the feature bank.
[Evaluation] Evaluation section: The top-10 macro F1 ranking is reported without baselines, ablation studies on the hierarchy components or feature bank, error bars, statistical tests, or quantitative measures of domain-shift severity. This prevents determination of whether the result reflects genuine robustness gains or is specific to the challenge data distribution.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and have revised the manuscript to incorporate additional details and analyses where appropriate.

Point-by-point responses

Referee: [Abstract] Abstract: The description of the three-stage inference hierarchy is insufficient to support the claim that it reduces over-reliance on any single decision. No information is provided on the classification objective of each stage, the fusion rule for combining kNN outputs, or the construction, population, and retrieval mechanism of the feature bank.

Authors: We agree that the abstract provides only a high-level overview. In the revised manuscript, we have expanded the Methods section with a detailed description of the three-stage hierarchy. Stage 1 performs coarse lineage classification via kNN on global DinoBloom features, Stage 2 refines within-lineage subtypes, and Stage 3 applies memory-augmented kNN retrieval focused on rare clinically critical classes such as blasts. Outputs from the stages are combined using a similarity-weighted fusion rule. The feature bank is populated with embeddings extracted from the full training set and retrieved via cosine similarity at inference time. These additions clarify how the staged retrieval reduces dependence on any single decision.
Revision: yes

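
The coarse-to-fine retrieval described in this simulated response can be illustrated with a small sketch. It is a hedged illustration only: it covers cosine-similarity kNN over a feature bank with similarity-weighted voting, first on a coarse lineage label and then on subtypes within that lineage. The bank layout (`feat`, `lineage`, `subtype`), the lineage names, the 2-D toy embeddings, and `k=3` are all assumptions made for the example. The third stage and the exact fusion rule are not specified in the text above and are left out.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def weighted_knn(query, bank, k, label_key):
    """Similarity-weighted vote over the k most similar bank entries.
    Returns (winning label, its share of the total vote weight)."""
    scored = sorted(((cosine(query, e["feat"]), e) for e in bank),
                    key=lambda pair: pair[0], reverse=True)[:k]
    votes = defaultdict(float)
    for sim, entry in scored:
        votes[entry[label_key]] += max(sim, 0.0)  # ignore negative similarity
    total = sum(votes.values())
    label = max(votes, key=votes.get)
    return label, (votes[label] / total if total else 0.0)

def coarse_to_fine(query, bank, k=3):
    """Stage A: vote on lineage. Stage B: vote on subtype within that lineage."""
    lineage, _ = weighted_knn(query, bank, k, "lineage")
    subset = [e for e in bank if e["lineage"] == lineage]
    subtype, share = weighted_knn(query, subset, k, "subtype")
    return lineage, subtype, share

# Hypothetical feature bank with toy 2-D embeddings.
bank = [
    {"feat": [1.0, 0.1], "lineage": "myeloid", "subtype": "neutrophil"},
    {"feat": [0.9, 0.2], "lineage": "myeloid", "subtype": "neutrophil"},
    {"feat": [0.8, 0.5], "lineage": "myeloid", "subtype": "monocyte"},
    {"feat": [0.1, 1.0], "lineage": "lymphoid", "subtype": "lymphocyte"},
    {"feat": [0.2, 0.9], "lineage": "lymphoid", "subtype": "lymphocyte"},
]

result_a = coarse_to_fine([0.95, 0.15], bank)
result_b = coarse_to_fine([0.15, 0.95], bank)
print(result_a, result_b)
```

Restricting the second vote to the lineage chosen by the first keeps the fine-grained decision among plausible candidates, but it also means a wrong coarse decision cannot be recovered later unless a fusion step revisits it.
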
Referee: [Evaluation] Evaluation section: The top-10 macro F1 ranking is reported without baselines, ablation studies on the hierarchy components or feature bank, error bars, statistical tests, or quantitative measures of domain-shift severity. This prevents determination of whether the result reflects genuine robustness gains or is specific to the challenge data distribution.

Authors: The referee correctly notes that the original evaluation section was limited. We have revised it to include ablation studies that isolate the contribution of the hierarchical stages, the feature bank, and LoRA fine-tuning by comparing performance with and without each element. We now report error bars from repeated runs with different random seeds and include paired statistical tests against a baseline fine-tuned DinoBloom classifier. We have also added quantitative domain-shift analysis using Maximum Mean Discrepancy between source and target feature distributions. While the top-10 ranking in the WBCBench challenge provides external validation of robustness, these internal controls help attribute gains to the proposed components rather than the specific test distribution.
Revision: yes
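
Maximum Mean Discrepancy (MMD), mentioned in this response as a measure of domain shift, compares two sets of feature vectors through a kernel. The sketch below computes the biased estimate of squared MMD, i.e. mean k(x, x') + mean k(y, y') - 2 mean k(x, y). The RBF kernel, the bandwidth `gamma=1.0`, and the toy 2-D features are assumptions for illustration; the text does not say which kernel or estimator would be used.

```python
import math

def rbf(a, b, gamma):
    """RBF kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def mmd2(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between samples xs and ys."""
    def mean_k(ps, qs):
        return sum(rbf(p, q, gamma) for p in ps for q in qs) / (len(ps) * len(qs))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)

# Toy features (hypothetical): a source site, a nearby site, a strongly shifted site.
source = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]]
near = [[0.05, 0.05], [0.15, 0.05], [0.05, 0.15]]
shifted = [[1.0, 1.0], [1.1, 1.0], [1.0, 1.1]]

mmd_same = mmd2(source, source)
mmd_near = mmd2(source, near)
mmd_shifted = mmd2(source, shifted)
print(mmd_same, mmd_near, mmd_shifted)
```

The value is zero for identical samples and grows as the two feature distributions move apart, so it can rank target sites by how far they sit from the training data. The result depends on the kernel bandwidth, which would need to be reported alongside any such number.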

Circularity Check

0 steps flagged

No circularity: empirical pipeline evaluated on external benchmark

Full rationale

The paper describes a proposed memory-augmented hierarchical ensemble pipeline (DinoBloom+LoRA backbone, three-stage kNN with feature bank) and reports its macro F1 ranking on the external WBCBench challenge dataset. No equations, parameter fittings presented as predictions, self-citations, or ansatzes are present in the text. The central claim is an empirical performance result on an independent benchmark rather than any derivation that reduces to its own inputs by construction. This is a standard non-circular empirical contribution.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The method depends on standard transfer learning and ensemble assumptions without new entities or many explicitly fitted parameters beyond typical hyperparameters.

free parameters (2)

LoRA rank and scaling
Hyperparameters controlling the adaptation of the DinoBloom backbone, chosen during fine-tuning.
k for kNN retrieval
Number of neighbors used at each hierarchy level, selected on validation data.
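
To make the two LoRA hyperparameters concrete: in the standard LoRA formulation a frozen weight matrix W is augmented with a trainable low-rank product, giving y = W x + (alpha / r) B A x, where r is the rank and alpha the scaling. The sketch below writes this out in plain Python. The matrix sizes and values are toy assumptions, and nothing here reflects the rank or scaling the authors actually chose.

```python
def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha):
    """y = W x + (alpha / r) * B (A x), with W frozen and A, B trainable.
    A has shape (r, d_in) and B has shape (d_out, r)."""
    r = len(A)                        # LoRA rank
    base = matvec(W, x)               # frozen backbone path
    update = matvec(B, matvec(A, x))  # low-rank trainable path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy setup (hypothetical): d_in = 3, d_out = 2, rank r = 1.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 1.0]]
x = [1.0, 2.0, 3.0]

# B starts at zero in LoRA, so the adapted layer initially matches the frozen one.
y_init = lora_forward(W, A, [[0.0], [0.0]], x, alpha=2.0)

# After training, B is non-zero and shifts the output.
y_tuned = lora_forward(W, A, [[0.5], [-0.5]], x, alpha=2.0)
print(y_init, y_tuned)
```

Only A and B are updated during fine-tuning, which adds r * (d_in + d_out) parameters per adapted matrix instead of d_in * d_out.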

axioms (2)

domain assumption: Features extracted by the LoRA-adapted DinoBloom backbone remain discriminative for WBC subtypes under domain shift.
Invoked as the basis for the feature bank and retrieval stages.
domain assumption: Hierarchical multi-stage kNN retrieval reduces over-reliance and improves robustness compared to single-model inference.
Central premise of the three-stage inference hierarchy.

pith-pipeline@v0.9.0 · 5447 in / 1611 out tokens · 69995 ms · 2026-05-08T08:38:53.238479+00:00 · methodology

Reference graph

Works this paper leans on

20 extracted references · 3 canonical work pages · 2 internal anchors

[1]

The World Health Organization statistics reported 2.9 new cases and 1.2 deaths per 100,000 people in individuals aged 0 to 24 years in 2022

INTRODUCTION Leukaemia is a life-threatening hematological malignancy, accounting for approximately 4% of cancer-related deaths and ranking among the leading causes of cancer mortality in both males and females [1]. The World Health Organization statistics reported 2.9 new cases and 1.2 deaths per 100,000 people in individuals aged 0 to 24 years in 2022...

2022
[2]

We propose a coarse-to-fine hierarchical ensemble inference strategy for robust WBC classification under domain shifts
[3]

We fine-tune a DinoBloom backbone with LoRA to obtain transferable embeddings that support retrieval-based inference under domain shifts
[4]

We further employ multi-split majority voting to stabilize predictions and improve performance on long-tailed classes
[5]

A Hierarchical Ensemble Inference Pipeline for Robust White Blood Cell Classification Under Domain Shifts

METHODOLOGY Fig. 1 provides an overview of our three-stage pipeline designed to handle class imbalance and domain shift in white blood cell classification. [Fig. 1. Overview of the proposed three-stage pipeline.] 2.1. Dataset Preparation WBCBench Challenge [9] provides labeled WBC images for a 13-class task. To mitig...

[6]

Implementation Our method is implemented in PyTorch

EXPERIMENTS AND RESULTS 3.1. Implementation Our method is implemented in PyTorch. We fine-tune the DinoBloom weights using LoRA on a single NVIDIA A100 for 100 epochs with a batch size of 16. We optimize the model with AdamW using a learning rate of 1×10⁻⁵, a weight decay of 1×10⁻², and an EMA momentum of 0.999. Inference is performed on an NVIDIA RTX 6000...
[7]

This task is challenging due to subtle inter-class differences and limited labeled data, which can make standard end-to-end classifiers less robust

CONCLUSION In this work, we propose a hierarchical kNN-based inference framework for fine-grained white blood cell classification on WBCBench. This task is challenging due to subtle inter-class differences and limited labeled data, which can make standard end-to-end classifiers less robust. Our approach fine-tunes a pretrained backbone to learn transferab...
[8]

Cancer statistics, 2022,

Rebecca L Siegel, Kimberly D Miller, Hannah E Fuchs, and Ahmedin Jemal, “Cancer statistics, 2022,” CA: a cancer journal for clinicians, vol. 72, no. 1, pp. 7–33, 2022

2022
[9]

Recognition of peripheral blood cell images using convolutional neural networks,

Andrea Acevedo, Santiago Alférez, Anna Merino, Laura Puigví, and José Rodellar, “Recognition of peripheral blood cell images using convolutional neural networks,” Computer methods and programs in biomedicine, vol. 180, pp. 105020, 2019

2019
[10]

Automatic classification of blood cell images using convolutional neural network

Rabia Asghar, Sanjay Kumar, Paul Hynds, and Abeera Mahfooz, “Automatic classification of blood cell images using convolutional neural network,” Preprint at https://doi.org/10.48550/arXiv, vol. 2308, 2023

[11]

Histogram of cell types: deep learning for automated bone marrow cytology,

Rohollah Moosavi Tayebi, Youqing Mu, Taher Dehkharghanian, Catherine Ross, Monalisa Sur, Ronan Foley, Hamid R Tizhoosh, and Clinton JV Campbell, “Histogram of cell types: deep learning for automated bone marrow cytology,” arXiv preprint arXiv:2107.02293, 2021

[12]

Feature extraction using cnn for peripheral blood cells recognition,

Mohammed Ammar, Mostafa El Habib Daho, Khaled Harrar, and Amel Laidi, “Feature extraction using cnn for peripheral blood cells recognition,” EAI Endorsed Transactions on Scalable Information Systems, vol. 9, no. 34, pp. e12, 2022

2022
[13]

Artificial intelligence of digital morphology analyzers improves the efficiency of manual leukocyte differentiation of peripheral blood,

Ying Xing, Xuekai Liu, Juhua Dai, Xiaoxing Ge, Qingchen Wang, Ziyu Hu, Zhicheng Wu, Xuehui Zeng, Dan Xu, and Chenxue Qu, “Artificial intelligence of digital morphology analyzers improves the efficiency of manual leukocyte differentiation of peripheral blood,” BMC Medical Informatics and Decision Making, vol. 23, no. 1, pp. 50, 2023

2023
[14]

An explainable vision transformer model based white blood cells classification and localization,

Oguzhan Katar and Ozal Yildirim, “An explainable vision transformer model based white blood cells classification and localization,” Diagnostics, vol. 13, no. 14, pp. 2459, 2023

2023
[15]

Transforming healthcare: Raabin white blood cell classification with deep vision transformer,

Rufus Rubin, SM Anzar, Alavikunhu Panthakkan, and Wathiq Mansoor, “Transforming healthcare: Raabin white blood cell classification with deep vision transformer,” in 2023 6th International Conference on Signal Processing and Information Security (ICSPIS). IEEE, 2023, pp. 212–217

2023
[16]

WBCBench 2026: A challenge for robust white blood cell classification under class imbalance,

Xin Tian, Xudong Ma, Tianqi Yang, Alin Achim, Bartek Papiez, Phandee Watanaboonyongcharoen, and Nantheera Anantrasirichai, “WBCBench 2026: A challenge for robust white blood cell classification under class imbalance,” in 2026 IEEE International Symposium on Biomedical Imaging (ISBI). IEEE, 2026

2026
[17]

Dinobloom: a foundation model for generalizable cell embeddings in hematology,

Valentin Koch, Sophia J Wagner, Salome Kazeminia, Ece Sancar, Matthias Hehr, Julia A Schnabel, Tingying Peng, and Carsten Marr, “Dinobloom: a foundation model for generalizable cell embeddings in hematology,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2024, pp. 520–530

2024
[18]

A dataset of microscopic peripheral blood cell images for development of automatic recognition systems,

Andrea Acevedo, Anna Merino, Santiago Alférez, Ángel Molina, Laura Boldú, and José Rodellar, “A dataset of microscopic peripheral blood cell images for development of automatic recognition systems,” Data in brief, vol. 30, pp. 105474, 2020

2020
[19]

Raabin-wbc: a large free access dataset of white blood cells from normal peripheral blood,

Zahra Mousavi Kouzehkanan, Sepehr Saghari, Eslam Tavakoli, Peyman Rostami, Mohammadjavad Abaszadeh, Farzaneh Mirzadeh, Esmaeil Shahabi Satlsar, Maryam Gheidishahran, Fatemeh Gorgi, Saeed Mohammadi, et al., “Raabin-wbc: a large free access dataset of white blood cells from normal peripheral blood,” bioRxiv, pp. 2021–05, 2021

2021
[20]

LoRA: Low-rank adaptation of large language models,

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al., “LoRA: Low-rank adaptation of large language models,” ICLR, vol. 1, no. 2, pp. 3, 2022

2022
