Patient-Level Diagnosis of Acute Myeloid Leukemia via Deep Learning Analysis of Bone Marrow Smear
Pith reviewed 2026-06-27 13:33 UTC · model grok-4.3
The pith
A deep learning pipeline aggregates cell classifications into CBLC ratios to support patient-level AML diagnosis from bone marrow smears.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By defining a composite category of blast-like cells (CBLC) from eight specific morphological types and training a YOLO segmentation plus EfficientNet classification pipeline on that target, cell predictions can be aggregated into patient-level CBLC ratios that support AML diagnosis. The pipeline produces stable results internally and generalizes externally, reaching ensemble weighted F1-scores of 0.9076, 0.8696, and 0.9124 on the three held-out centers.
What carries the argument
YOLO-based cell detection matched to expert contours followed by EfficientNet-B0 classification of the expert-defined CBLC composite category, with patient-level aggregation of the resulting cell ratios.
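The aggregation step at the end of this chain can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the eight CBLC labels come from the abstract, while the function names, the `(patient_id, label)` input format, and the placeholder label `OTHER` are assumptions made for the example.

```python
from collections import Counter

# The eight CBLC member labels are taken from the abstract; everything else
# here (function names, input format) is a hypothetical illustration.
CBLC_LABELS = frozenset({"N", "N1", "M", "M1", "R", "R1", "J", "J1"})


def cblc_ratio(cell_labels):
    """Fraction of a patient's classified cells that fall in the CBLC group."""
    counts = Counter(cell_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no classified cells for this patient")
    cblc = sum(n for label, n in counts.items() if label in CBLC_LABELS)
    return cblc / total


def patient_ratios(predictions):
    """Group (patient_id, label) pairs by patient and compute each ratio."""
    by_patient = {}
    for patient_id, label in predictions:
        by_patient.setdefault(patient_id, []).append(label)
    return {pid: cblc_ratio(labels) for pid, labels in by_patient.items()}


# "OTHER" stands in for any non-CBLC category (assumed placeholder label).
demo = [("p1", "N"), ("p1", "OTHER"), ("p1", "M1"), ("p1", "OTHER"), ("p2", "OTHER")]
ratios = patient_ratios(demo)
```

How the resulting ratio is turned into a diagnostic call (threshold, minimum cell count per patient) is not specified in the material summarized here.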
If this is right
- The same pipeline can be applied to new centers without retraining while retaining F1 performance above 0.86.
- Patient diagnosis can be derived from the ratio of one composite cell category instead of exhaustive manual counting of all cell types.
- Two-stage training with contour matching and morphology supervision produces consistent single-cell crops across variable smear preparations.
- Ensembling the classifier outputs improves the final weighted F1 scores on external data.
Where Pith is reading between the lines
- The CBLC ratio could be tracked over serial smears to monitor disease progression or treatment response without new model training.
- Similar composite-category targeting might reduce annotation burden when adapting the pipeline to related blood disorders.
- Embedding the ratio output into existing laboratory software could allow pathologists to flag cases for review rather than replace their judgment.
Load-bearing premise
The expert grouping of eight specific cell types into a single CBLC composite accurately serves as a morphological proxy for AML diagnosis.
What would settle it
A blinded comparison in which pathologists diagnose AML from the same smears without seeing the model's CBLC ratios. If the pathologists' diagnoses and the ratio-based diagnoses disagree on a substantial fraction of patients, the claim does not hold.
Original abstract
Bone marrow smear review remains important for acute myeloid leukemia (AML) assessment, but manual single-cell interpretation is labor-intensive and patient-level diagnosis requires aggregation of many cellular observations. We present a cell-to-patient deep learning pipeline for AML-assisted diagnosis from bone marrow smear images. The study included 258 patients from six anonymized centers, including a main cohort of 169 patients from Centers 1-3 and an external validation cohort of 89 patients from Centers 4-6. A 16-category cell annotation vocabulary was used to describe the global cellular composition, including granulocytic, monocytic, erythroid, lymphoid, eosinophilic, and other cells. Rather than identifying strict AML blasts or leukemic blasts, the model targets an expert-defined composite category termed Composite Blast-like Cells (CBLC), comprising N, N1, M, M1, R, R1, J, and J1 according to the project-wide morphological standard. A fixed YOLO-based segmentation module detected cells, predicted contours were matched to expert polygon annotations by contour IoU, and standardized single-cell crops were generated. An EfficientNet-B0 classifier was trained through a two-stage GT-to-YOLO and YOLO-to-YOLO strategy with class-imbalance correction, center-border regularization, and morphology-assisted supervision. Cell-level predictions were aggregated into patient-level CBLC ratios for AML-oriented diagnostic support. The pipeline achieved stable internal validation and maintained external generalization, with ensemble weighted F1-scores of 0.9076, 0.8696, and 0.9124 on Centers 4, 5, and 6, respectively.
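The abstract says predicted contours were matched to expert polygon annotations by contour IoU but gives no further detail. The sketch below shows one way such a matching could work, under stated assumptions: polygons are treated as convex (so Sutherland-Hodgman clipping gives an exact intersection), matching is greedy and one-to-one, and the 0.5 IoU threshold is a placeholder. All function names are hypothetical.

```python
def polygon_area(poly):
    """Signed shoelace area: positive for counter-clockwise vertex order."""
    area = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        area += x1 * y2 - x2 * y1
    return area / 2.0


def _ccw(poly):
    """Return the vertices in counter-clockwise order."""
    return list(poly) if polygon_area(poly) >= 0 else list(reversed(poly))


def clip_convex(subject, clip):
    """Sutherland-Hodgman clipping; assumes `clip` is a convex polygon."""
    output = _ccw(subject)
    clip = _ccw(clip)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]

        def side(p):
            # Positive when p lies to the left of edge a->b (inside for CCW).
            return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

        current, output = output, []
        for j in range(len(current)):
            p, q = current[j], current[(j + 1) % len(current)]
            sp, sq = side(p), side(q)
            if sp >= 0:
                output.append(p)
            if (sp > 0 and sq < 0) or (sp < 0 and sq > 0):
                t = sp / (sp - sq)  # where the segment p->q crosses the edge
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
        if not output:
            return []
    return output


def contour_iou(pred, expert):
    """Intersection over union of two convex polygons."""
    inter_poly = clip_convex(pred, expert)
    inter = abs(polygon_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = abs(polygon_area(pred)) + abs(polygon_area(expert)) - inter
    return inter / union if union > 0 else 0.0


def match_contours(preds, experts, iou_threshold=0.5):
    """Greedy one-to-one matching; returns {pred_index: expert_index}."""
    pairs = sorted(
        ((contour_iou(p, e), i, j)
         for i, p in enumerate(preds) for j, e in enumerate(experts)),
        reverse=True,
    )
    matches, used_experts = {}, set()
    for iou, i, j in pairs:
        if iou < iou_threshold:
            break  # the threshold value is an assumption, not from the paper
        if i in matches or j in used_experts:
            continue
        matches[i] = j
        used_experts.add(j)
    return matches
```

Real cell contours are often non-convex, in which case a rasterized mask IoU or a general polygon library would be the safer choice; the convex case is used here only to keep the example dependency-free.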
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a deep learning pipeline for bone marrow smear analysis to support AML diagnosis. It employs YOLO-based detection followed by EfficientNet-B0 classification of cells into 16 morphological categories, with aggregation of an expert-defined Composite Blast-like Cells (CBLC) group (N, N1, M, M1, R, R1, J, J1) into patient-level ratios. The study uses 258 patients across six centers (169 in internal cohort from Centers 1-3; 89 in external from Centers 4-6), reporting ensemble weighted F1 scores of 0.9076, 0.8696, and 0.9124 on the external centers.
Significance. If the patient-level CBLC ratio aggregation were shown to reliably separate AML from non-AML cases and correlate with expert blast counts, the work would address a clinically relevant task in hematopathology by reducing manual review burden. The multi-center external validation setup and two-stage training strategy with imbalance correction are positive elements that strengthen potential generalizability.
major comments (2)
- [Abstract, Results] The central claim of a 'cell-to-patient' pipeline supporting AML diagnosis via aggregated CBLC ratios is not supported by any reported patient-level metrics. No table, figure, or text provides AML vs. non-AML classification accuracy, AUC, sensitivity/specificity at diagnostic thresholds, or correlation between automated CBLC percentages and expert blast counts. Only cell-level weighted F1 scores are supplied, even though the patient-level claim is load-bearing for the title, abstract, and stated purpose.
- [Abstract] No information is given on per-center patient counts within the external cohort, exact train/validation/test splits, staining normalization procedures, or any baseline comparisons (e.g., against standard blast counting or other classifiers). These omissions prevent evaluation of whether the reported F1 scores reflect robust generalization.
minor comments (1)
- [Methods] The definition and morphological criteria for the CBLC composite category could be clarified with an explicit table or figure showing example cells from each subclass to aid reproducibility.
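The first major comment rests on what a weighted F1 score does and does not measure. As a reference point, the sketch below computes a support-weighted F1 over cell-level labels, which is the common definition of the term; whether the paper's "ensemble weighted F1" uses exactly this averaging is an assumption. The function name and toy labels are illustrative.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores, weighted by each class's true support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Toy cell-level labels (hypothetical): three cells of class "a", one of "b".
toy_score = weighted_f1(["a", "a", "a", "b"], ["a", "a", "b", "b"])
```

Because every cell counts equally, a score like this says nothing about how errors are distributed across patients, which is why it cannot stand in for patient-level diagnostic accuracy.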
Simulated Author's Rebuttal
We thank the referee for the detailed review and constructive feedback on our manuscript describing a cell-to-patient pipeline for bone marrow smear analysis in AML. We address each major comment below and commit to revisions that will strengthen the presentation of patient-level aspects and methodological details.
- Referee: [Abstract, Results] The central claim of a 'cell-to-patient' pipeline supporting AML diagnosis via aggregated CBLC ratios is not supported by any reported patient-level metrics. No table, figure, or text provides AML vs. non-AML classification accuracy, AUC, sensitivity/specificity at diagnostic thresholds, or correlation between automated CBLC percentages and expert blast counts. Only cell-level weighted F1 scores are supplied, even though the patient-level claim is load-bearing for the title, abstract, and stated purpose.
  Authors: We acknowledge that the manuscript reports detailed cell-level weighted F1 scores as the primary quantitative result and describes the aggregation into patient-level CBLC ratios for diagnostic support without providing explicit AML vs. non-AML classification metrics, AUC, sensitivity/specificity, or direct correlations with expert blast counts. The cell-level performance is presented as the enabling step for the pipeline. In revision we will add a dedicated results subsection and table reporting patient-level CBLC ratio statistics, Pearson/Spearman correlations with expert blast percentages, and binary diagnostic performance (e.g., sensitivity/specificity at standard blast thresholds) on the external cohorts to directly support the title and abstract claims.
  Revision: yes
- Referee: [Abstract] No information is given on per-center patient counts within the external cohort, exact train/validation/test splits, staining normalization procedures, or any baseline comparisons (e.g., against standard blast counting or other classifiers). These omissions prevent evaluation of whether the reported F1 scores reflect robust generalization.
  Authors: We will revise the abstract, methods, and supplementary material to specify the per-center breakdown of the 89 external patients across Centers 4-6, the exact patient-level train/validation/test splits used in the two-stage training, the staining normalization procedures applied, and baseline comparisons (including agreement with manual blast counting and at least one alternative classifier). These additions will allow readers to assess generalization more rigorously.
  Revision: yes
Circularity Check
No circularity; empirical cell classification with external validation
The manuscript presents an empirical ML pipeline: a YOLO segmenter and EfficientNet-B0 classifier are trained on annotated cells from Centers 1-3 and evaluated via weighted F1 on held-out Centers 4-6. No equations, parameter-fitting steps, or derivations are described that would reduce any output to its inputs by construction. The CBLC category is introduced as an expert-defined grouping whose patient-level ratio is asserted to support diagnosis, but this assertion is not derived from any fitted quantity or self-citation chain within the paper. External-center testing supplies independent evidence, satisfying the self-contained criterion.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The 16-category cell annotation vocabulary and the CBLC composite definition accurately capture morphological features relevant to AML assessment.