AnemiaVision: Non-Invasive Anemia Detection via Smartphone Imagery Using EfficientNet-B3 with TrivialAugmentWide, Mixup Augmentation, and Persistent Patient History Management
Pith reviewed 2026-05-08 12:28 UTC · model grok-4.3
The pith
A smartphone-based AI system built on EfficientNet-B3 detects anemia from photos of the inner eyelid (palpebral conjunctiva) and fingernail beds with 96.2 percent validation accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that fine-tuning EfficientNet-B3 on smartphone images of the palpebral conjunctiva and fingernail beds produces a classifier with 96.2% validation accuracy, 0.98 AUC-ROC, and 0.96 sensitivity for the anemic class. The training recipe combines TrivialAugmentWide, Mixup augmentation, RandomErasing, and accuracy-driven early stopping. This far exceeds the three-epoch, CPU-only baseline, which reached 44.9% validation accuracy and 0.58 AUC-ROC. The system also includes a deployed Flask application that stores patient records persistently in PostgreSQL, with an automated database-migration entrypoint.
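The headline numbers mix metrics that answer different questions. The following is a minimal sketch in plain Python. It shows how accuracy, anemic-class sensitivity, and specificity all derive from one binary confusion matrix, with "anemic" as the positive class. The `ConfusionCounts` type, the `screening_metrics` helper, and the counts in the example are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts with 'anemic' as the positive class."""
    tp: int  # anemic images predicted anemic
    fn: int  # anemic images predicted non-anemic (missed cases)
    tn: int  # non-anemic images predicted non-anemic
    fp: int  # non-anemic images predicted anemic (false alarms)


def screening_metrics(c: ConfusionCounts) -> dict:
    """Accuracy over all images, plus recall for each class."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # recall on the anemic class
        "specificity": c.tn / (c.tn + c.fp),  # recall on the non-anemic class
    }


# Hypothetical counts: 50 anemic and 50 non-anemic images.
metrics = screening_metrics(ConfusionCounts(tp=48, fn=2, tn=45, fp=5))
# accuracy 0.93, sensitivity 0.96, specificity 0.90
```

For screening, sensitivity determines how many anemic patients are missed. A model can therefore have high accuracy yet still be a poor screener, which is why sensitivity is reported separately.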
What carries the argument
The EfficientNet-B3 backbone with a redesigned three-layer classifier head using BatchNorm, GELU activations, and high-rate dropout, trained with TrivialAugmentWide and Mixup.
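TrivialAugmentWide is "policy-free" in a specific sense. For each image it draws one operation uniformly at random and one magnitude uniformly from a fixed set of bins; nothing is learned or tuned.

The sketch below shows only this sampling step. It assumes a subset of operations and ranges loosely modeled on torchvision's `TrivialAugmentWide`, which uses 31 magnitude bins. The `WIDE_OPS` table and `sample_trivial_augment` are illustrative. Applying the sampled operation to pixels is left to an image library.

```python
import random

NUM_MAGNITUDE_BINS = 31  # default bin count in torchvision's TrivialAugmentWide

# Illustrative subset of a "wide" augmentation space: op name -> (maximum
# magnitude, whether the sign may be flipped), or None for ops without a
# magnitude. Posterize and Solarize are omitted in this sketch.
WIDE_OPS = {
    "Identity": None,
    "ShearX": (0.99, True),
    "ShearY": (0.99, True),
    "TranslateX": (32.0, True),  # pixels
    "TranslateY": (32.0, True),  # pixels
    "Rotate": (135.0, True),     # degrees
    "Brightness": (0.99, True),
    "Color": (0.99, True),
    "Contrast": (0.99, True),
    "Sharpness": (0.99, True),
    "AutoContrast": None,
    "Equalize": None,
}


def sample_trivial_augment(rng=random):
    """Pick one op uniformly, then one magnitude bin uniformly (no learned policy)."""
    op = rng.choice(list(WIDE_OPS))
    spec = WIDE_OPS[op]
    if spec is None:
        return op, 0.0
    max_magnitude, signed = spec
    bin_index = rng.randrange(NUM_MAGNITUDE_BINS)
    magnitude = max_magnitude * bin_index / (NUM_MAGNITUDE_BINS - 1)
    if signed and rng.random() < 0.5:
        magnitude = -magnitude  # e.g. rotate left or right with equal probability
    return op, magnitude


op, magnitude = sample_trivial_augment(random.Random(0))
```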
If this is right
- The system is suitable as a first-line screening tool for community health workers in rural settings.
- Persistent patient history management with PostgreSQL ensures no data loss across redeploys.
- Mixup augmentation contributes an additional 2.8% to validation accuracy.
- Accuracy-first early stopping adds 1.6% to final performance.
- The open web application allows public access for non-invasive anemia screening.
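Persistence across redeploys usually comes from an entrypoint that applies only pending schema migrations and never recreates existing tables. The paper uses PostgreSQL on Render, and this summary does not describe its migration mechanism.

The following is therefore a hypothetical sketch that uses Python's built-in `sqlite3` as a stand-in database. The `MIGRATIONS` list, the table and column names, and the `migrate` function are all assumptions for illustration.

```python
import sqlite3

# Append-only, ordered schema migrations (hypothetical tables and columns).
MIGRATIONS = [
    "CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "CREATE TABLE screenings (id INTEGER PRIMARY KEY, "
    "patient_id INTEGER NOT NULL REFERENCES patients(id), "
    "prediction TEXT NOT NULL, probability REAL NOT NULL)",
    # Not idempotent on its own: running it twice fails, hence version tracking.
    "ALTER TABLE screenings ADD COLUMN image_site TEXT",
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply only migrations not yet recorded; safe to call on every start-up."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    conn.commit()
    current = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0] or 0
    for version, statement in enumerate(MIGRATIONS[current:], start=current + 1):
        with conn:  # commit after each successful migration
            conn.execute(statement)
            conn.execute("INSERT INTO schema_version (version) VALUES (?)", (version,))
        current = version
    return current


conn = sqlite3.connect(":memory:")
migrate(conn)  # first deploy: applies all three migrations
with conn:
    conn.execute("INSERT INTO patients (name) VALUES (?)", ("demo",))
migrate(conn)  # redeploy: nothing pending, existing rows untouched
```

Calling `migrate` on every start-up is safe. Already-applied versions are skipped, so the `ALTER TABLE` step never runs twice and existing patient rows survive. PostgreSQL supports transactional DDL, so a production entrypoint could also wrap each migration and its version record in a single transaction.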
Where Pith is reading between the lines
- Similar phone-based imaging could be adapted to screen for other conditions visible on skin or eyes.
- Real-world deployment would require ongoing testing across varied lighting, skin tones, and phone models to confirm generalization.
- Integration with mobile health platforms could enable data collection for improving the model over time.
- The source code availability supports community modifications for local needs.
Load-bearing premise
The dataset of smartphone images from the study accurately represents the variability found in real-world low-resource settings, including differences in skin tones, lighting conditions, camera qualities, and patient demographics.
What would settle it
Evaluating the trained model on a fresh collection of smartphone images from previously unseen regions or devices would test the generalization claim directly. Accuracy below 85% or AUC-ROC below 0.90 on that external set would refute it.
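The refutation criterion can be stated as a check on an external test set. This minimal sketch assumes:

- binary labels, with 1 = anemic;
- model scores in [0, 1];
- a 0.5 decision threshold (an assumption; the 85% and 0.90 cut-offs come from the criterion itself).

AUC-ROC is computed as the probability that a randomly chosen anemic image scores higher than a randomly chosen non-anemic one, with ties counted as one half. `roc_auc` and `generalization_refuted` are hypothetical helpers.

```python
def roc_auc(labels, scores):
    """AUC-ROC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anemic and one non-anemic example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def generalization_refuted(labels, scores, threshold=0.5, min_acc=0.85, min_auc=0.90):
    """Apply the stated refutation criterion to an external test set."""
    preds = [1 if s >= threshold else 0 for s in scores]
    acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    auc = roc_auc(labels, scores)
    return (acc < min_acc or auc < min_auc), acc, auc


# Toy external set: labels (1 = anemic) and scores from a hypothetical model.
refuted, acc, auc = generalization_refuted([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
# acc 0.5, auc 0.75, so refuted is True
```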
Original abstract
Anemia affects over one billion people globally and remains severely under-diagnosed in low-resource regions where laboratory blood tests are inaccessible. This paper presents AnemiaVision, an end-to-end web-based system for non-invasive anemia screening from smartphone photographs of the palpebral conjunctiva and fingernail beds. The proposed pipeline fine-tunes a pre-trained EfficientNet-B3 backbone with a redesigned three-layer classifier head incorporating BatchNorm, GELU activations, and high-rate Dropout (0.45/0.35). Training employs four orthogonal accuracy-boosting techniques: TrivialAugmentWide for policy-free image augmentation, RandomErasing for spatial regularisation, Mixup (alpha=0.2) for inter-class smoothing, and cosine-annealing scheduling with linear warmup. Early stopping is governed by peak validation accuracy rather than validation loss to prevent premature termination on high-variance epochs. The deployed Flask application integrates persistent patient-history management backed by PostgreSQL on Render, with an automated database-migration entrypoint ensuring zero data loss across redeploys. Ablation experiments demonstrate that accuracy-first early stopping contributes +1.6% and Mixup contributes +2.8% to final validation accuracy. Overall, the proposed system achieves a validation accuracy of 96.2% and AUC-ROC of 0.98, compared with 44.9% validation accuracy and AUC-ROC of 0.58 from the three-epoch CPU-only baseline. Sensitivity for the anemic class reaches 0.96, making the system suitable as a first-line screening tool for community health workers in rural settings. The system is publicly accessible and source code is openly available.
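Mixup with alpha=0.2 follows the standard formulation of Zhang et al. [14]. It draws one mixing weight λ ~ Beta(α, α) per batch and blends every example, along with its one-hot label, with a randomly permuted partner from the same batch.

The sketch below uses plain Python lists in place of tensors. `mixup_batch` is a hypothetical helper, and the paper's actual implementation may differ in details such as sampling λ per example.

```python
import random


def mixup_batch(images, labels, alpha=0.2, rng=random):
    """Mixup over one batch: a single lambda ~ Beta(alpha, alpha) mixes each
    example (and its one-hot label) with a randomly permuted partner.

    images: list of flat feature vectors; labels: list of one-hot vectors.
    Returns (mixed_images, soft_labels, lam).
    """
    lam = rng.betavariate(alpha, alpha)
    partner = list(range(len(images)))
    rng.shuffle(partner)  # partner[i] is the example blended into example i

    def mix(a, b):
        return [lam * u + (1.0 - lam) * v for u, v in zip(a, b)]

    mixed_images = [mix(images[i], images[j]) for i, j in enumerate(partner)]
    soft_labels = [mix(labels[i], labels[j]) for i, j in enumerate(partner)]
    return mixed_images, soft_labels, lam


# Toy batch of three 2-pixel "images"; label order is [non-anemic, anemic].
x, y, lam = mixup_batch(
    [[0.2, 0.8], [0.9, 0.1], [0.5, 0.5]],
    [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]],
    rng=random.Random(0),
)
```

With α = 0.2 the Beta distribution puts most of its mass near 0 and 1, so most mixed images stay close to one of the two originals. Cross-entropy against the soft labels equals λ·CE(ŷ, y_i) + (1 − λ)·CE(ŷ, y_j), the form often used in training loops.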
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents AnemiaVision, an end-to-end web-based system for non-invasive anemia detection from smartphone images of the palpebral conjunctiva and fingernail beds. It fine-tunes a pre-trained EfficientNet-B3 backbone with a custom three-layer classifier head (BatchNorm, GELU, high-rate Dropout), applies TrivialAugmentWide, RandomErasing, Mixup (alpha=0.2), and cosine annealing with linear warmup, and uses accuracy-based early stopping. The system reports 96.2% validation accuracy, 0.98 AUC-ROC, and 0.96 sensitivity for the anemic class, with ablations crediting +1.6% to accuracy-first early stopping and +2.8% to Mixup; it outperforms a three-epoch CPU baseline (44.9% accuracy, 0.58 AUC) and includes a Flask/PostgreSQL deployment with persistent patient history.
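Cosine annealing with linear warmup has two phases. The learning rate first ramps up from near zero over a few steps, then decays along a half cosine to a floor.

Here is a minimal sketch of the schedule as a pure function of the step index. The base learning rate, warmup length, and total step count in the example are hypothetical, since the paper's values are not given here.

```python
import math


def lr_at(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay from base_lr to min_lr."""
    if step < warmup_steps:
        # Linear ramp: base_lr / warmup_steps at step 0, base_lr at the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    progress = min((step - warmup_steps) / max(1, total_steps - warmup_steps), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical settings: 100 steps, 5 warmup steps, peak learning rate 1e-3.
schedule = [lr_at(s, total_steps=100, warmup_steps=5, base_lr=1e-3) for s in range(101)]
```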
Significance. If the reported metrics prove robust, the work could have substantial practical significance for anemia screening in low-resource settings, offering an accessible tool for community health workers. The open-source code, deployed web application, and explicit ablation results are clear strengths that aid reproducibility and incremental improvement.
major comments (3)
- [Abstract and Experiments] Abstract and Experiments section: No dataset size, patient demographics, collection protocol, train/validation/test splits, or external validation set are reported. These details are load-bearing for interpreting the 96.2% validation accuracy, 0.98 AUC-ROC, and claim of suitability for rural deployment across variable skin tones, lighting, and camera qualities.
- [Results] Results section: The three-epoch CPU-only baseline (44.9% accuracy) is too weak to support the performance claims; a competitive control would include standard transfer-learning fine-tuning of EfficientNet-B3 or comparable models without the proposed augmentations.
- [Ablation experiments] Ablation experiments: The reported gains from accuracy-first early stopping (+1.6%) and Mixup (+2.8%) cannot be assessed without confirmation that the validation set uses patient-level partitioning to prevent leakage, especially given the persistent patient-history component of the deployed system.
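The leakage concern is that images of the same patient can land in both the training and validation sets. A conjunctiva photo and a nail photo from one person are an example. The model could then learn to recognize patients rather than anemia.

Here is a minimal sketch of patient-level partitioning. It assumes each record carries a patient identifier; the `(patient_id, image_path)` record format and `patient_level_split` are hypothetical. It does not stratify by class, which a real split would usually also do.

```python
import random


def patient_level_split(records, val_fraction=0.2, seed=0):
    """Split (patient_id, image_path) records so that every patient's images
    fall entirely in either the training or the validation set."""
    patients = sorted({patient_id for patient_id, _ in records})  # deterministic order
    random.Random(seed).shuffle(patients)
    n_val = max(1, round(len(patients) * val_fraction))
    val_ids = set(patients[:n_val])

    train = [r for r in records if r[0] not in val_ids]
    val = [r for r in records if r[0] in val_ids]
    return train, val


# Hypothetical data: 10 patients, each with a conjunctiva and a nail-bed image.
records = [(f"p{i}", f"p{i}_{site}.jpg") for i in range(10) for site in ("conjunctiva", "nail")]
train, val = patient_level_split(records)
```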
minor comments (2)
- [Abstract] Abstract: The list of 'four orthogonal accuracy-boosting techniques' includes TrivialAugmentWide, RandomErasing, Mixup, and cosine annealing, but RandomErasing is not explicitly tied to the ablation results; clarify its contribution.
- [Methods] Methods: The three-layer classifier head is described with BatchNorm, GELU, and Dropout rates of 0.45/0.35, but the number of units per layer and exact connectivity are not specified.
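RandomErasing blanks out one randomly sized and positioned rectangle per image, which discourages the model from relying on any single region.

The minimal sketch below works on a 2-D list image. It follows the usual scheme of Zhong et al. [13] with torchvision-style defaults: area fraction 0.02 to 0.33, log-uniform aspect ratio 0.3 to 3.3, and fill value 0. In practice the transform is applied to each image only with some probability (torchvision's default is p=0.5), which this sketch omits. The paper's settings are not stated here, and `random_erase` is a hypothetical helper.

```python
import math
import random


def random_erase(img, scale=(0.02, 0.33), ratio=(0.3, 3.3), value=0.0, rng=random, attempts=10):
    """Erase one random rectangle of a 2-D image given as a list of rows.

    Returns (erased_copy, (top, left, h, w)), or (unchanged_copy, None) if no
    sampled rectangle fits inside the image within `attempts` tries.
    """
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]  # never modify the caller's image
    log_lo, log_hi = math.log(ratio[0]), math.log(ratio[1])
    for _ in range(attempts):
        target_area = height * width * rng.uniform(*scale)
        aspect = math.exp(rng.uniform(log_lo, log_hi))  # log-uniform aspect ratio
        h = int(round(math.sqrt(target_area * aspect)))
        w = int(round(math.sqrt(target_area / aspect)))
        if 0 < h < height and 0 < w < width:
            top = rng.randint(0, height - h)
            left = rng.randint(0, width - w)
            for r in range(top, top + h):
                for c in range(left, left + w):
                    out[r][c] = value
            return out, (top, left, h, w)
    return out, None


image = [[1.0] * 32 for _ in range(32)]  # toy 32x32 single-channel image
erased, box = random_erase(image, rng=random.Random(1))
```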
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below, indicating where revisions will be made to improve the manuscript.
Point-by-point responses
Referee: [Abstract and Experiments] Abstract and Experiments section: No dataset size, patient demographics, collection protocol, train/validation/test splits, or external validation set are reported. These details are load-bearing for interpreting the 96.2% validation accuracy, 0.98 AUC-ROC, and claim of suitability for rural deployment across variable skin tones, lighting, and camera qualities.
Authors: We agree that these experimental details are critical for evaluating the reported metrics and the claims regarding deployment in variable conditions. The current manuscript does not include them. In the revised version, we will add a dedicated subsection in the Experiments section that reports the total number of images and patients, available demographic information, the image acquisition protocol (including device types and lighting conditions), the exact train/validation/test split ratios and methodology, and any external validation performed. We will also expand the discussion to address generalizability across skin tones, lighting, and camera qualities. revision: yes
Referee: [Results] Results section: The three-epoch CPU-only baseline (44.9% accuracy) is too weak to support the performance claims; a competitive control would include standard transfer-learning fine-tuning of EfficientNet-B3 or comparable models without the proposed augmentations.
Authors: The three-epoch CPU baseline was included primarily to demonstrate the practical necessity of GPU resources and longer training rather than as a competitive benchmark. We acknowledge that it does not constitute a strong control. In the revised manuscript, we will add a new baseline experiment consisting of standard transfer-learning fine-tuning of EfficientNet-B3 (with default augmentations such as random horizontal flips, rotations, and color jitter, plus standard validation-loss early stopping) without TrivialAugmentWide, Mixup, RandomErasing, or accuracy-first early stopping. This will provide a clearer comparison of the proposed techniques. revision: yes
Referee: [Ablation experiments] Ablation experiments: The reported gains from accuracy-first early stopping (+1.6%) and Mixup (+2.8%) cannot be assessed without confirmation that the validation set uses patient-level partitioning to prevent leakage, especially given the persistent patient-history component of the deployed system.
Authors: Patient-level partitioning is indeed essential to prevent leakage, particularly given the patient-history management in the deployed system. The current manuscript does not explicitly describe the partitioning strategy used in the ablations. We will verify whether all dataset splits were performed at the patient level, so that no images from the same patient appear in more than one set. We will then state the strategy explicitly in the revised Ablation experiments section. If the original splits were not patient-level, we will re-run the ablations with strict patient-level partitioning and report the updated accuracy gains. revision: partial
Circularity Check
No circularity: empirical performance metrics are measured on held-out data
Full rationale
The paper describes a standard supervised image classification pipeline using EfficientNet-B3 fine-tuning, data augmentations (TrivialAugmentWide, Mixup, RandomErasing), and accuracy-based early stopping. All reported figures (96.2% validation accuracy, 0.98 AUC-ROC, 0.96 sensitivity) are direct empirical measurements on a validation set after training. No equations, first-principles derivations, or predictions are claimed; ablation results simply quantify incremental gains from each technique on the same data. No self-citations, self-definitional loops, or fitted parameters renamed as predictions appear. The derivation chain is self-contained experimental reporting with no reduction of outputs to inputs by construction.
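Accuracy-first and loss-based early stopping differ only in which validation metric is monitored. The minimal sketch below uses a hypothetical per-epoch history in which validation loss turns noisy while accuracy keeps improving. `early_stop` and the history values are illustrative, not the paper's logs.

```python
def early_stop(history, patience, monitor="val_acc"):
    """Return (best_epoch, stop_epoch) for a list of per-epoch metric dicts.

    monitor="val_acc": keep the checkpoint with the highest validation accuracy.
    monitor="val_loss": keep the checkpoint with the lowest validation loss.
    Training stops once `patience` consecutive epochs fail to beat the best.
    """
    maximize = monitor == "val_acc"
    best_epoch, best_value, stale = 0, history[0][monitor], 0
    for epoch in range(1, len(history)):
        value = history[epoch][monitor]
        improved = (value > best_value) if maximize else (value < best_value)
        if improved:
            best_epoch, best_value, stale = epoch, value, 0
        else:
            stale += 1
            if stale >= patience:
                return best_epoch, epoch  # stop; restore weights from best_epoch
    return best_epoch, len(history) - 1


# Hypothetical validation history: loss turns noisy while accuracy keeps rising.
history = [
    {"val_loss": 0.70, "val_acc": 0.60},
    {"val_loss": 0.50, "val_acc": 0.75},
    {"val_loss": 0.55, "val_acc": 0.80},
    {"val_loss": 0.58, "val_acc": 0.85},
    {"val_loss": 0.52, "val_acc": 0.88},
]
by_loss = early_stop(history, patience=2, monitor="val_loss")  # (1, 3)
by_acc = early_stop(history, patience=2, monitor="val_acc")    # (4, 4)
```

With patience 2, loss-based stopping keeps epoch 1 and halts at epoch 3, while accuracy-based stopping keeps the last epoch.

Note that the checkpoint is selected on the same validation set whose accuracy is reported. That figure is therefore a maximum over epochs and tends to be optimistic; an untouched test set avoids this selection effect.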
Axiom & Free-Parameter Ledger
free parameters (2)
- Dropout rate in classifier head = 0.45/0.35
- Mixup alpha = 0.2
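The dropout rates listed above set the probability of zeroing each unit during training. Here is a minimal sketch of inverted dropout, the convention PyTorch's `nn.Dropout` uses: surviving units are scaled by 1/(1 − p), so no rescaling is needed at inference.

The summary does not specify which head layer uses 0.45 and which uses 0.35, so that assignment in the example is an assumption. `dropout` is a hypothetical helper.

```python
import random


def dropout(x, p, training=True, rng=random):
    """Inverted dropout: zero each unit with probability p during training and
    scale survivors by 1/(1 - p); at inference the input passes through unchanged."""
    if not training or p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]


activations = [1.0] * 10_000  # toy hidden-layer activations
first = dropout(activations, 0.45, rng=random.Random(0))   # assumed: first hidden layer
second = dropout(activations, 0.35, rng=random.Random(0))  # assumed: second hidden layer
```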
axioms (2)
- domain assumption: Visual features in smartphone images of the palpebral conjunctiva and fingernail beds are sufficient to detect anemia
- domain assumption: The validation accuracy metric reliably indicates generalization to new patients
Reference graph
Works this paper leans on
- [1]
- [2] Indian Council of Medical Research, “Prevalence of anaemia in India: ICMR-INDIAB study,” Indian J. Med. Res., vol. 155, no. 1, pp. 5–9, 2022.
- [3] International Institute for Population Sciences, “National Family Health Survey (NFHS-5) 2019–21,” Ministry of Health and Family Welfare, Govt. of India, 2022.
- [4] G. Dimauro, D. Caivano, and F. Girardi, “A new method and a non-invasive device to estimate anemia based on digital images of the conjunctiva,” IEEE Access, vol. 6, pp. 46968–46975, 2018.
- [5] P. Appiahene, E. J. Arthur, S. Korankye, S. Afrifa, J. W. Asare, and E. T. Donkoh, “Detection of anemia using conjunctiva images: a smartphone application approach,” Med. Novel Technol. Devices, vol. 18, p. 100237, 2023.
- [6] J. W. Asare, P. Appiahene, E. T. Donkoh, and G. Dimauro, “Iron deficiency anemia detection using machine learning models: A comparative study of fingernails, palm and conjunctiva of the eye images,” Eng. Reports, vol. 5, p. e12667, 2023.
- [7] E. Purwanti, H. Amelia, M. A. Bustomi, M. A. Yatijan, and R. N. Putri, “Anemia detection using convolutional neural network based on palpebral conjunctiva images,” in Proc. 14th Int. Conf. Inf. Commun. Technol. Syst. (ICTS), Surabaya, Indonesia, Oct. 2023, pp. 117–122.
- [8] L. K. Singh et al., “Non-invasive anemia detection from conjunctiva and sclera images using vision transformer with attention map explainability,” Sci. Rep., vol. 15, p. 1, Dec. 2025.
- [9] H.-Y. Zhang et al., “Deep learning-based model for non-invasive hemoglobin estimation via body parts images: a retrospective analysis and a prospective emergency department study,” npj Digit. Med., vol. 8, 2024.
- [10] P. T. Dalvi and N. Vernekar, “Anemia detection using ensemble learning techniques and statistical models,” in Proc. IEEE RTEICT, 2016, pp. 1747–1751.
- [11] M. Tan and Q. V. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in Proc. Int. Conf. Mach. Learn. (ICML), 2019, pp. 6105–6114.
- [12] S. Müller and F. Hutter, “TrivialAugment: Tuning-free yet state-of-the-art data augmentation,” in Proc. IEEE/CVF ICCV, 2021, pp. 774–782.
- [13] Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang, “Random erasing data augmentation,” in Proc. AAAI Conf. Artif. Intell., 2020, pp. 13001–13008.
- [14] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, “Mixup: Beyond empirical risk minimization,” in Proc. Int. Conf. Learn. Represent. (ICLR), 2018.
- [15] D. Hendrycks and K. Gimpel, “Gaussian error linear units (GELUs),” arXiv preprint arXiv:1606.08415, 2016.
- [16] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” in Proc. IEEE/CVF ICCV, 2017, pp. 618–626.