Patient-Level Diagnosis of Acute Myeloid Leukemia via Deep Learning Analysis of Bone Marrow Smear

Fajin Tao; Gen Yang; Hongru Chen; Lin An; Qunxian Lu; Tianyi Wang; Weihua Meng; Xiaodong Mo; Yuqi Ma

arxiv: 2606.10735 · v1 · pith:6ETSGMYMnew · submitted 2026-06-09 · 💻 cs.CV · physics.med-ph

Yuqi Ma, Tianyi Wang, Weihua Meng, Hongru Chen, Fajin Tao, Qunxian Lu, Lin An, Xiaodong Mo, Gen Yang

Pith reviewed 2026-06-27 13:33 UTC · model grok-4.3

keywords: acute myeloid leukemia · bone marrow smear · deep learning · cell classification · patient-level diagnosis · YOLO · EfficientNet · composite blast-like cells

The pith

A deep learning pipeline aggregates cell classifications into CBLC ratios to support patient-level AML diagnosis from bone marrow smears.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a cell-to-patient pipeline that detects individual cells in bone marrow smear images and classifies them to estimate the ratio of an expert-defined composite blast-like cell category. This ratio is then used to assist in diagnosing acute myeloid leukemia at the patient level rather than relying on manual single-cell review. The approach trains a YOLO detector followed by an EfficientNet classifier using a two-stage strategy that corrects for class imbalance and incorporates morphology supervision. Validation on an external cohort from three additional centers shows the method maintains performance when moving beyond the training data. The work focuses on turning many cellular observations into a single diagnostic ratio that pathologists could use for AML assessment.

Core claim

By defining a composite category of blast-like cells (CBLC) from eight specific morphological types and training a YOLO segmentation plus EfficientNet classification pipeline on that target, cell predictions can be aggregated into patient-level CBLC ratios that support AML diagnosis. The pipeline produces stable results internally and generalizes externally, reaching ensemble weighted F1-scores of 0.9076, 0.8696, and 0.9124 on the three held-out centers.

What carries the argument

YOLO-based cell detection matched to expert contours followed by EfficientNet-B0 classification of the expert-defined CBLC composite category, with patient-level aggregation of the resulting cell ratios.
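The aggregation step can be illustrated with a minimal sketch. It assumes cell-level predictions arrive as `(patient_id, predicted_label)` pairs, which is a hypothetical input format, and that the ratio is taken over all predicted cells of a patient; the page does not state which denominator the paper uses. The CBLC member labels are those listed in the abstract, while `E` and `L` below are placeholder non-CBLC labels.

```python
from collections import defaultdict

# CBLC member labels as listed in the abstract.
CBLC_LABELS = frozenset({"N", "N1", "M", "M1", "R", "R1", "J", "J1"})


def cblc_ratio_per_patient(cell_predictions):
    """Return {patient_id: fraction of predicted cells whose label is in CBLC}.

    cell_predictions: iterable of (patient_id, predicted_label) pairs.
    The denominator (all predicted cells) is an assumption of this sketch.
    """
    total = defaultdict(int)
    cblc = defaultdict(int)
    for patient_id, label in cell_predictions:
        total[patient_id] += 1
        if label in CBLC_LABELS:
            cblc[patient_id] += 1
    return {pid: cblc[pid] / total[pid] for pid in total}


# Made-up predictions; "E" and "L" are placeholder non-CBLC labels.
predictions = [
    ("P001", "N"), ("P001", "M1"), ("P001", "E"), ("P001", "L"),
    ("P002", "E"), ("P002", "L"), ("P002", "J"), ("P002", "L"), ("P002", "E"),
]
ratios = cblc_ratio_per_patient(predictions)
```

Here patient `P001` has 2 CBLC cells out of 4 and `P002` has 1 out of 5, so each patient is reduced to a single number that a downstream rule or a pathologist can inspect.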

If this is right

The same pipeline can be applied to new centers without retraining while retaining F1 performance above 0.86.
Patient diagnosis can be derived from the ratio of one composite cell category instead of exhaustive manual counting of all cell types.
Two-stage training with contour matching and morphology supervision produces consistent single-cell crops across variable smear preparations.
Ensembling the classifier outputs yields the reported weighted F1-scores on external data.
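This page does not say how the ensemble is formed. One common construction, assumed here purely for illustration, averages the class-probability vectors produced by the models from each cross-validation fold and takes the arg-max. The function name, class names, and probabilities are all hypothetical.

```python
def ensemble_predict(prob_vectors, class_names):
    """Average per-model class probabilities; return (label, mean_probs).

    prob_vectors: one probability list per model, all for the same cell.
    Probability averaging is an assumed ensembling rule, not the paper's.
    """
    n_models = len(prob_vectors)
    n_classes = len(class_names)
    mean_probs = [
        sum(p[c] for p in prob_vectors) / n_models for c in range(n_classes)
    ]
    best = max(range(n_classes), key=lambda c: mean_probs[c])
    return class_names[best], mean_probs


# Made-up outputs of three fold models for one cell crop.
class_names = ["N", "M", "other"]
fold_probs = [
    [0.50, 0.30, 0.20],
    [0.20, 0.60, 0.20],
    [0.30, 0.50, 0.20],
]
label, mean_probs = ensemble_predict(fold_probs, class_names)
```

In this toy case the first model prefers `N`, but the averaged vector favors `M`, showing how a single dissenting fold model is outvoted.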

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The CBLC ratio could be tracked over serial smears to monitor disease progression or treatment response without new model training.
Similar composite-category targeting might reduce annotation burden when adapting the pipeline to related blood disorders.
Embedding the ratio output into existing laboratory software could allow pathologists to flag cases for review rather than replace their judgment.

Load-bearing premise

The expert grouping of eight specific cell types into a single CBLC composite accurately serves as a morphological proxy for AML diagnosis.

What would settle it

A blinded comparison in which pathologists diagnose AML from the same smears without seeing the model's CBLC ratios. If their diagnoses and the ratio-based calls disagree on a substantial fraction of patients, the premise fails.

Figures

Figures reproduced from arXiv: 2606.10735 by Fajin Tao, Gen Yang, Hongru Chen, Lin An, Qunxian Lu, Tianyi Wang, Weihua Meng, Xiaodong Mo, Yuqi Ma.

**Figure 2.** Global distribution of the 16 annotated cell categories and CBLC definition. The upper panel shows center-

**Figure 3.** Overall cell-to-patient pipeline for AML-assisted diagnosis using CBLC. The workflow links de-identified bone marrow smear images, fixed cell segmentation, IoU-based contour matching, single-cell crop generation, patient-level five-fold splitting, two-stage EfficientNet-B0 training, cell-level prediction, and patient-level aggregation
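The patient-level five-fold splitting named in the Figure 3 caption means that all cells from one patient fall in the same fold, so no patient contributes to both training and validation. A minimal sketch follows, assuming a hypothetical `(patient_id, cell_id)` record format and a simple shuffled round-robin assignment; the paper's actual procedure (for example, any stratification by center or diagnosis) is not described on this page.

```python
import random
from collections import defaultdict


def patient_level_folds(cell_records, n_folds=5, seed=0):
    """Assign each patient to exactly one fold.

    cell_records: list of (patient_id, cell_id) pairs (hypothetical format).
    Returns (folds, fold_of) where folds maps fold index -> list of cell ids
    and fold_of maps patient id -> fold index.
    """
    patients = sorted({pid for pid, _ in cell_records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    # Round-robin over the shuffled patients keeps fold sizes balanced.
    fold_of = {pid: i % n_folds for i, pid in enumerate(patients)}
    folds = defaultdict(list)
    for pid, cell_id in cell_records:
        folds[fold_of[pid]].append(cell_id)
    return dict(folds), fold_of


# Made-up data: 10 patients with 3 cells each.
records = [(f"P{p:03d}", f"P{p:03d}_c{c}") for p in range(10) for c in range(3)]
folds, fold_of = patient_level_folds(records)
```

Splitting by cell instead of by patient would let near-identical cells from one smear appear on both sides of the split and inflate validation scores.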

Original abstract

Bone marrow smear review remains important for acute myeloid leukemia (AML) assessment, but manual single-cell interpretation is labor-intensive and patient-level diagnosis requires aggregation of many cellular observations. We present a cell-to-patient deep learning pipeline for AML-assisted diagnosis from bone marrow smear images. The study included 258 patients from six anonymized centers, including a main cohort of 169 patients from Centers 1-3 and an external validation cohort of 89 patients from Centers 4-6. A 16-category cell annotation vocabulary was used to describe the global cellular composition, including granulocytic, monocytic, erythroid, lymphoid, eosinophilic, and other cells. Rather than identifying strict AML blasts or leukemic blasts, the model targets an expert-defined composite category termed Composite Blast-like Cells (CBLC), comprising N, N1, M, M1, R, R1, J, and J1 according to the project-wide morphological standard. A fixed YOLO-based segmentation module detected cells, predicted contours were matched to expert polygon annotations by contour IoU, and standardized single-cell crops were generated. An EfficientNet-B0 classifier was trained through a two-stage GT-to-YOLO and YOLO-to-YOLO strategy with class-imbalance correction, center-border regularization, and morphology-assisted supervision. Cell-level predictions were aggregated into patient-level CBLC ratios for AML-oriented diagnostic support. The pipeline achieved stable internal validation and maintained external generalization, with ensemble weighted F1-scores of 0.9076, 0.8696, and 0.9124 on Centers 4, 5, and 6, respectively.
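The contour-matching step in the abstract can be sketched as follows. This is a simplified illustration, not the authors' implementation: each contour's filled region is represented as a set of pixel coordinates, pairs are matched greedily from highest IoU downward, and the 0.5 acceptance threshold is an assumed value that the page does not confirm.

```python
def iou(a, b):
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def match_contours(predicted, expert, threshold=0.5):
    """Greedily pair each predicted region with at most one expert region.

    Returns a list of (pred_index, expert_index, iou), highest IoU first.
    The threshold is a hypothetical parameter.
    """
    candidates = [
        (iou(p, e), i, j)
        for i, p in enumerate(predicted)
        for j, e in enumerate(expert)
    ]
    candidates.sort(reverse=True)
    used_pred, used_expert, matches = set(), set(), []
    for score, i, j in candidates:
        if score < threshold:
            break  # remaining candidates are all below the threshold
        if i in used_pred or j in used_expert:
            continue
        used_pred.add(i)
        used_expert.add(j)
        matches.append((i, j, score))
    return matches


def square(r0, c0, size):
    """Filled square region, used here as a stand-in for a cell mask."""
    return {(r, c) for r in range(r0, r0 + size) for c in range(c0, c0 + size)}


predicted = [square(0, 0, 4), square(10, 10, 4)]
expert = [square(0, 1, 4), square(20, 20, 4)]
matches = match_contours(predicted, expert)
```

The first predicted square overlaps its expert counterpart in 12 of 20 union pixels (IoU 0.6) and is matched; the second has no overlapping annotation and stays unmatched, so it would not inherit an expert label.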

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper shows decent external cell-level F1 on bone marrow smears but reports no patient-level AML diagnostic results despite the title and abstract claims.

Colleague,

The headline result is external cell classification performance on three centers with ensemble weighted F1 scores of 0.9076, 0.8696, and 0.9124. That part is competent engineering. The title and abstract, however, frame the work as a cell-to-patient pipeline that supports AML diagnosis via CBLC ratios, and no numbers on patient-level separation, accuracy, or correlation with expert blast counts are supplied.

They collected smears from 258 patients across six centers, used YOLO for detection and EfficientNet-B0 for 16-class classification, and trained with a two-stage schedule plus class-imbalance correction and center-border regularization. Defining a composite CBLC category from expert input instead of chasing rare true blasts is a reasonable practical choice. The multi-center external validation is the clearest strength and better than most single-site studies in this area.
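The page does not say which class-imbalance correction the paper applies. As one standard possibility, offered only as an assumption, inverse-frequency class weights can be computed from the training labels and passed to a weighted loss. The labels and counts below are made up.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by N / (K * count), so rare classes weigh more.

    N is the number of samples and K the number of distinct classes.
    This weighting scheme is an illustrative assumption.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}


# Made-up, imbalanced training labels.
labels = ["L"] * 6 + ["E"] * 3 + ["N1"] * 1
weights = inverse_frequency_weights(labels)
```

With this normalization the weights average to 1 over the samples, so the overall loss scale stays comparable to the unweighted case while errors on the rare `N1` class cost more.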

The main gap is the missing patient-level link. The abstract states that cell predictions are aggregated into CBLC ratios for diagnostic support, yet the only metrics stay at the cell level. Without any table or figure showing how those ratios perform against AML labels or manual counts, the central claim cannot be evaluated. The assumption that the grouped cells form a reliable proxy is stated but not tested at the patient level in the reported results.

This is for readers who build or evaluate computational tools for hematology labs and want a worked example of detection-plus-classification on real multi-center smear data. It will not interest people seeking new algorithms or formally verified methods.

I would send it for peer review with a clear request for the patient-level diagnostic metrics. Without those, the work does not yet match its own framing.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a deep learning pipeline for bone marrow smear analysis to support AML diagnosis. It employs YOLO-based detection followed by EfficientNet-B0 classification of cells into 16 morphological categories, with aggregation of an expert-defined Composite Blast-like Cells (CBLC) group (N, N1, M, M1, R, R1, J, J1) into patient-level ratios. The study uses 258 patients across six centers (169 in internal cohort from Centers 1-3; 89 in external from Centers 4-6), reporting ensemble weighted F1 scores of 0.9076, 0.8696, and 0.9124 on the external centers.
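For readers unfamiliar with the headline metric, a weighted F1-score is the mean of per-class F1 scores weighted by each class's number of true samples (the same definition scikit-learn uses for `average="weighted"`). A minimal pure-Python sketch with made-up labels follows; it is not the authors' evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        score += f1 * n_true / total
    return score


# Made-up cell labels: one "N" cell is misclassified as "E".
y_true = ["N", "N", "N", "E", "E", "L"]
y_pred = ["N", "N", "E", "E", "E", "L"]
score = weighted_f1(y_true, y_pred)
```

Because the weights follow class support, frequent cell types dominate the score, which is one reason a high weighted F1 at the cell level does not by itself establish patient-level diagnostic accuracy.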

Significance. If the patient-level CBLC ratio aggregation were shown to reliably separate AML from non-AML cases and correlate with expert blast counts, the work would address a clinically relevant task in hematopathology by reducing manual review burden. The multi-center external validation setup and two-stage training strategy with imbalance correction are positive elements that strengthen potential generalizability.

major comments (2)

[Abstract] Abstract and Results: The central claim of a 'cell-to-patient' pipeline supporting AML diagnosis via aggregated CBLC ratios is not supported by any reported patient-level metrics. No table, figure, or text provides AML vs. non-AML classification accuracy, AUC, sensitivity/specificity at diagnostic thresholds, or correlation between automated CBLC percentages and expert blast counts. Only cell-level weighted F1 scores are supplied, which is load-bearing for the title, abstract, and stated purpose.
[Abstract] Abstract: No information is given on per-center patient counts within the external cohort, exact train/validation/test splits, staining normalization procedures, or any baseline comparisons (e.g., against standard blast counting or other classifiers). These omissions prevent evaluation of whether the reported F1 scores reflect robust generalization.

minor comments (1)

[Methods] The definition and morphological criteria for the CBLC composite category could be clarified with an explicit table or figure showing example cells from each subclass to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive feedback on our manuscript describing a cell-to-patient pipeline for bone marrow smear analysis in AML. We address each major comment below and commit to revisions that will strengthen the presentation of patient-level aspects and methodological details.

Referee: [Abstract] Abstract and Results: The central claim of a 'cell-to-patient' pipeline supporting AML diagnosis via aggregated CBLC ratios is not supported by any reported patient-level metrics. No table, figure, or text provides AML vs. non-AML classification accuracy, AUC, sensitivity/specificity at diagnostic thresholds, or correlation between automated CBLC percentages and expert blast counts. Only cell-level weighted F1 scores are supplied, which is load-bearing for the title, abstract, and stated purpose.

Authors: We acknowledge that the manuscript reports detailed cell-level weighted F1 scores as the primary quantitative result and describes the aggregation into patient-level CBLC ratios for diagnostic support without providing explicit AML vs. non-AML classification metrics, AUC, sensitivity/specificity, or direct correlations with expert blast counts. The cell-level performance is presented as the enabling step for the pipeline. In revision we will add a dedicated results subsection and table reporting patient-level CBLC ratio statistics, Pearson/Spearman correlations with expert blast percentages, and binary diagnostic performance (e.g., sensitivity/specificity at standard blast thresholds) on the external cohorts to directly support the title and abstract claims. revision: yes
Referee: [Abstract] Abstract: No information is given on per-center patient counts within the external cohort, exact train/validation/test splits, staining normalization procedures, or any baseline comparisons (e.g., against standard blast counting or other classifiers). These omissions prevent evaluation of whether the reported F1 scores reflect robust generalization.

Authors: We will revise the abstract, methods, and supplementary material to specify the per-center breakdown of the 89 external patients across Centers 4-6, the exact patient-level train/validation/test splits used in the two-stage training, the staining normalization procedures applied, and baseline comparisons (including agreement with manual blast counting and at least one alternative classifier). These additions will allow readers to assess generalization more rigorously. revision: yes
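The patient-level metrics promised in the first response could be computed along the following lines. This is a sketch under stated assumptions: every number is made up, the 0.20 cut-off is a placeholder (CBLC is not a strict blast count, so a blast threshold may not transfer directly), and the function names are hypothetical.

```python
import math


def sensitivity_specificity(ratios, is_aml, threshold):
    """Call a patient positive when the CBLC ratio is >= threshold."""
    tp = fn = tn = fp = 0
    for ratio, aml in zip(ratios, is_aml):
        positive = ratio >= threshold
        if aml and positive:
            tp += 1
        elif aml:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up per-patient values, for illustration only.
model_ratio = [0.05, 0.10, 0.30, 0.45, 0.60, 0.15]
expert_pct = [0.04, 0.12, 0.28, 0.50, 0.55, 0.25]
is_aml = [False, False, True, True, True, True]

sens, spec = sensitivity_specificity(model_ratio, is_aml, threshold=0.20)
r = pearson(model_ratio, expert_pct)
```

In this toy data the last AML patient has a model ratio of 0.15 and is missed at the 0.20 cut-off, giving a sensitivity of 0.75 with a specificity of 1.0. A table of this kind, computed on the external cohort, is what the referee's first major comment asks for.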

Circularity Check

0 steps flagged

No circularity; empirical cell classification with external validation

The manuscript presents an empirical ML pipeline: a YOLO segmenter and EfficientNet-B0 classifier are trained on annotated cells from centers 1-3 and evaluated via weighted F1 on held-out centers 4-6. No equations, parameter-fitting steps, or derivations are described that would reduce any output to its inputs by construction. The CBLC category is introduced as an expert-defined grouping whose patient-level ratio is asserted to support diagnosis, but this assertion is not derived from any fitted quantity or self-citation chain within the paper. External-center testing supplies independent evidence, satisfying the self-contained criterion.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The claim rests on the domain assumption that the CBLC grouping is clinically meaningful and that cell composition ratios generalize across centers; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: The 16-category cell annotation vocabulary and the CBLC composite definition accurately capture morphological features relevant to AML assessment.
Invoked when the model is trained to predict CBLC rather than strict blasts and when patient-level ratios are used for diagnosis support.

Reference graph

Works this paper leans on

18 extracted references · 10 canonical work pages

[1]

Diagnosis and management of AML in adults: 2022 recommendations from an international expert panel on behalf of the ELN

Döhner H, Wei AH, Appelbaum FR, et al. Diagnosis and management of AML in adults: 2022 recommendations from an international expert panel on behalf of the ELN. Blood. 2022;140(12):1345-1377. doi:10.1182/blood.2022016867

work page doi:10.1182/blood.2022016867 2022
[2]

The 5th edition of the World Health Organization Classification of Haematolymphoid Tumours: Myeloid and Histiocytic/Dendritic Neoplasms

Khoury JD, Solary E, Abla O, et al. The 5th edition of the World Health Organization Classification of Haematolymphoid Tumours: Myeloid and Histiocytic/Dendritic Neoplasms. Leukemia. 2022;36:1703-1719. doi:10.1038/s41375-022-01613-1

work page doi:10.1038/s41375-022-01613-1 2022
[3]

International Consensus Classification of Myeloid Neoplasms and Acute Leukemias: integrating morphologic, clinical, and genomic data

Arber DA, Orazi A, Hasserjian RP, et al. International Consensus Classification of Myeloid Neoplasms and Acute Leukemias: integrating morphologic, clinical, and genomic data. Blood. 2022;140(11):1200-1228. doi:10.1182/blood.2022015850

work page doi:10.1182/blood.2022015850 2022
[4]

2021 Update on MRD in acute myeloid leukemia: a consensus document from the European LeukemiaNet MRD Working Party

Heuser M, Freeman SD, Ossenkoppele GJ, et al. 2021 Update on MRD in acute myeloid leukemia: a consensus document from the European LeukemiaNet MRD Working Party. Blood. 2021;138(26):2753-2767. doi:10.1182/blood.2021013626

work page doi:10.1182/blood.2021013626 2021
[5]

A survey on deep learning in medical image analysis

Litjens G, Kooi T, Bejnordi BE, et al. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60-88. doi:10.1016/j.media.2017.07.005

work page doi:10.1016/j.media.2017.07.005 2017
[6]

Dermatologist-level classification of skin cancer with deep neural networks

Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542:115-118. doi:10.1038/nature21056

work page doi:10.1038/nature21056 2017
[7]

U-Net: convolutional networks for biomedical image segmentation

Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation. In: Medical Image Computing and Computer-Assisted Intervention. 2015:234-241. doi:10.1007/978-3-319-24574-4_28

work page doi:10.1007/978-3-319-24574-4_28 2015
[8]

Cellpose: a generalist algorithm for cellular segmentation

Stringer C, Wang T, Michaelos M, Pachitariu M. Cellpose: a generalist algorithm for cellular segmentation. Nat Methods. 2021;18:100-106. doi:10.1038/s41592-020-01018-x

work page doi:10.1038/s41592-020-01018-x 2021
[9]

Segment Anything

Kirillov A, Mintun E, Ravi N, et al. Segment Anything. Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023:4015-4026

2023
[10]

You Only Look Once: unified, real-time object detection

Redmon J, Divvala S, Girshick R, Farhadi A. You Only Look Once: unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:779-788

2016
[11]

Ultralytics YOLO

Jocher G, Chaurasia A, Qiu J. Ultralytics YOLO. 2023. Available from: https://github.com/ultralytics/ultralytics

2023
[12]

EfficientNet: rethinking model scaling for convolutional neural networks

Tan M, Le QV. EfficientNet: rethinking model scaling for convolutional neural networks. Proceedings of the 36th International Conference on Machine Learning. 2019;97:6105-6114

2019
[13]

Focal loss for dense object detection

Lin TY, Goyal P, Girshick R, He K, Dollár P. Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision. 2017:2980-2988

2017
[14]

Decoupled weight decay regularization

Loshchilov I, Hutter F. Decoupled weight decay regularization. International Conference on Learning Representations. 2019

2019
[15]

PyTorch: an imperative style, high-performance deep learning library

Paszke A, Gross S, Massa F, et al. PyTorch: an imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems. 2019;32

2019
[16]

PyTorch Lightning

Falcon WA, The PyTorch Lightning team. PyTorch Lightning. 2019. Available from: https://github.com/Lightning-AI/pytorch-lightning

2019
[17]

Proposals for the classification of the acute leukaemias

Bennett JM, Catovsky D, Daniel MT, et al. Proposals for the classification of the acute leukaemias. French-American-British Cooperative Group. Br J Haematol. 1976;33(4):451-458. doi:10.1111/j.1365-2141.1976.tb03563.x

work page doi:10.1111/j.1365-2141.1976.tb03563.x 1976
[18]

Management of acute promyelocytic leukemia: updated recommendations from an expert panel of the European LeukemiaNet

Sanz MA, Fenaux P, Tallman MS, et al. Management of acute promyelocytic leukemia: updated recommendations from an expert panel of the European LeukemiaNet. Blood. 2019;133(15):1630-1643. doi:10.1182/blood-2019-01-894980

work page doi:10.1182/blood-2019-01-894980 2019
