
arxiv: 2605.18900 · v1 · pith:FMIM65OBnew · submitted 2026-05-17 · 🧬 q-bio.OT · cs.LG

A Logistic Regression Model to Predict Malaria Severity in Children

Pith reviewed 2026-05-20 13:11 UTC · model grok-4.3

classification 🧬 q-bio.OT cs.LG
keywords logistic regression · malaria · severity · children · Ghana · predictive model · environmental factors · sample representation

The pith

A logistic regression model predicts malaria severity in children using environmental and biological factors with 83.3 percent accuracy

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a logistic regression model to predict the severity of malaria in children based on factors such as sickle cell disease, stagnant water, garbage dump, wet lawns, and the use of treated mosquito nets. It achieves an accuracy of 83.3 percent using data from 417 respondents in the Bosomtwe District of Ghana. The findings indicate that children there are highly prone to malaria infection but that severity levels are very low. The authors conclude that good sample representation of different class labels is as important as sample size when developing machine learning models.

Core claim

A logistic regression model was developed to predict malaria severity from factors including sickle cell disease, stagnant water, garbage dump, wet lawns, and use of treated mosquito nets. Applied to 417 respondents in the Bosomtwe District, the model attains 83.3 percent accuracy. The study deduces that although children in the District are highly prone to malaria infection, the severity is very low.

What carries the argument

A logistic regression model that classifies malaria cases as severe or non-severe based on the presence of sickle cell disease and local environmental conditions such as stagnant water and wet lawns.

Load-bearing premise

The 417 respondents in the Bosomtwe District provide a representative sample of both severe and non-severe malaria cases that allows the model to generalize.

What would settle it

Gathering a new dataset of malaria cases and factors from the same district and verifying whether the logistic regression model maintains approximately 83 percent accuracy on the unseen data.
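
How tightly a new sample could confirm or refute the 83 percent figure depends on its size. The following is a minimal sketch, assuming a hypothetical follow-up sample (the counts are illustrative and do not come from the paper) and using the Wilson score interval for a binomial proportion; the function name `wilson_interval` is introduced here for illustration only.

```python
import math

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half

# Hypothetical follow-up: 83 of 100 unseen cases classified correctly.
low, high = wilson_interval(83, 100)
print(f"observed accuracy 0.830, approx. 95% interval ({low:.3f}, {high:.3f})")

# The same observed rate on ten times as many cases gives a narrower interval.
low_big, high_big = wilson_interval(830, 1000)
```

With only 100 new cases the interval spans well over ten percentage points, so a small follow-up sample could be consistent with 83 percent and with noticeably lower accuracy at the same time.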

Figures

Figures reproduced from arXiv: 2605.18900 by Asare Yaw Obeng, Mary Opokua Ansong, Samuel King Opoku.

Figure 1: Transmission Cycle of Malaria Parasite (Source [3])

Original abstract

One of the main causes of death around the globe is malaria. Researchers have sought to develop predictive models for malaria outbreaks based on meteorological data, climate data and the breeding cycle of Plasmodium, the causative agent of malaria. This study predicts the severity of malaria based on environmental and biological factors. A logistic regression model was developed in this study to predict the severity of malaria based on such factors as sickle cell disease, stagnant water, garbage dump, wet lawns, and the use of treated mosquito nets, with an 83.3% accuracy rate. The study was carried out in the Bosomtwe District of Ghana with 417 respondents. It was deduced that although children in the District are highly prone to malaria infection, the severity is very low. The study recommends that not just having a good sample size alone is important during machine learning model development, but also having a good sample representation of the various class labels is equally important.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript describes the development of a logistic regression model to predict malaria severity in children based on biological factors like sickle cell disease and environmental factors such as stagnant water, garbage dumps, wet lawns, and use of treated mosquito nets. Conducted in the Bosomtwe District of Ghana with a sample of 417 respondents, the model is reported to achieve 83.3% accuracy. The authors conclude that malaria infection is common but severity is low in the district and stress the importance of representative sampling for class labels in model development.

Significance. Should the accuracy claim be validated with proper out-of-sample testing, this work could contribute to identifying modifiable risk factors for severe malaria in children in similar settings, supporting public health efforts in malaria-endemic areas. The emphasis on sample representation is a useful reminder for applied ML in epidemiology.

major comments (2)
  1. Abstract: The central performance claim of 83.3% accuracy lacks any description of the train-test split, cross-validation, or class imbalance handling. Given that logistic regression coefficients are fitted directly to the 417 records, this accuracy likely measures in-sample fit rather than generalization, which is load-bearing for the predictive utility asserted in the title and abstract.
  2. Abstract: The deduction that 'the severity is very low' is stated without reference to specific model outputs, odds ratios, or statistical tests from the logistic regression, making it unclear how this conclusion follows from the analysis.
minor comments (2)
  1. Abstract: The recommendation regarding sample representation in machine learning is presented as a deduction from the study but would benefit from more explicit linkage to the observed class distribution in the 417 respondents.
  2. Consider adding standard references for logistic regression assumptions and validation practices in biomedical prediction models.
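
The class-imbalance concern in major comment 1 can be made concrete. The sketch below assumes a hypothetical split of 417 records into 360 non-severe and 57 severe cases (these counts are invented for illustration; the paper's class distribution is not reported here) and scores a trivial classifier that always predicts "non-severe". The helper names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) with 1 = severe, 0 = non-severe."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def summarize(y_true, y_pred):
    """Accuracy alongside metrics that are sensitive to the minority class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# Hypothetical class split for 417 records (NOT the paper's actual distribution).
y_true = [0] * 360 + [1] * 57
# A trivial classifier that always predicts "non-severe".
y_majority = [0] * len(y_true)
metrics = summarize(y_true, y_majority)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under this assumed split the do-nothing classifier reaches about 86 percent accuracy while detecting no severe case at all, which is why accuracy alone cannot establish predictive value when one class dominates.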

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their detailed review and constructive feedback on our manuscript. Below, we respond to each of the major comments raised.

point-by-point responses
  1. Referee: Abstract: The central performance claim of 83.3% accuracy lacks any description of the train-test split, cross-validation, or class imbalance handling. Given that logistic regression coefficients are fitted directly to the 417 records, this accuracy likely measures in-sample fit rather than generalization, which is load-bearing for the predictive utility asserted in the title and abstract.

    Authors: We agree with the referee that the abstract should provide more details on how the accuracy was computed. In our analysis, the logistic regression model was fitted to the entire sample of 417 records, and the reported accuracy of 83.3% is the in-sample classification accuracy. We did not use a separate test set or cross-validation for the primary reported metric. This is a valid concern for assessing the model's predictive performance. In the revised manuscript, we will clarify this in the abstract and methods, and we will add results from a 5-fold cross-validation to better demonstrate generalization. We will also address class imbalance if present in the data. revision: yes

  2. Referee: Abstract: The deduction that 'the severity is very low' is stated without reference to specific model outputs, odds ratios, or statistical tests from the logistic regression, making it unclear how this conclusion follows from the analysis.

    Authors: The conclusion that severity is very low is based on the empirical observation in our dataset that the majority of malaria cases among the children were mild, as determined by clinical assessment. The logistic regression model was used to identify factors associated with severity (e.g., presence of stagnant water increasing odds of severe malaria), but the overall statement reflects the low proportion of severe cases in the sample. To make this clearer, we will revise the abstract to reference the descriptive statistics or specific odds ratios from the model that indicate low risk for severity in this population. revision: yes
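
The difference between the in-sample accuracy and the 5-fold cross-validation promised in response 1 can be sketched as follows. This is an illustration only: the data are synthetic, the coefficients in `true_w` are invented, and the plain gradient-descent fit stands in for whatever software the authors used. It is not a reconstruction of their analysis.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.5, epochs=200):
    """Batch gradient descent on the mean log-loss; returns [intercept, slopes...]."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def accuracy(w, X, y):
    """Share of rows where thresholding the fitted probability at 0.5 matches y."""
    hits = 0
    for xi, yi in zip(X, y):
        p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
        hits += int((p >= 0.5) == (yi == 1))
    return hits / len(y)

def kfold_accuracy(X, y, k=5, seed=0):
    """Mean held-out accuracy over k shuffled folds; each fold is scored by a
    model fitted only on the other k-1 folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(w, [X[i] for i in held_out], [y[i] for i in held_out]))
    return sum(scores) / k

# Synthetic stand-in data: 417 rows, five binary predictors, made-up effect sizes.
rng = random.Random(42)
X = [[rng.randint(0, 1) for _ in range(5)] for _ in range(417)]
true_w = [-2.0, 1.5, 1.0, 0.5, 0.5, -1.0]  # hypothetical, for simulation only
y = [int(rng.random() < sigmoid(true_w[0] + sum(b * x for b, x in zip(true_w[1:], row))))
     for row in X]

in_sample = accuracy(fit_logistic(X, y), X, y)
cross_validated = kfold_accuracy(X, y)
print(f"in-sample: {in_sample:.3f}  5-fold: {cross_validated:.3f}")
```

The in-sample number scores the model on the rows that determined its coefficients, whereas each fold in the cross-validated number is scored by a model that never saw it. A stratified split, which keeps the severe/non-severe ratio similar in every fold, would be a natural refinement when severe cases are rare.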

Circularity Check

1 step flagged

Reported 83.3% accuracy reduces to in-sample training fit by construction

specific steps
  1. fitted input called prediction [Abstract]
    "A logistic regression model was developed in this study to predict the severity of malaria based on such factors as sickle cell disease, stagnant water, garbage dump, wet lawns, and the use of treated mosquito nets, with an 83.3% accuracy rate."

    The 417 respondents constitute the sole dataset on which the logistic regression coefficients are estimated. The reported accuracy is then presented as the model's predictive performance, but no partitioning or hold-out procedure is described. Absent one, the figure is most plausibly the training-set fit: a quantity computed on the same records that determined the coefficients, not an independent test of generalization.

full rationale

The paper's central claim rests on a logistic regression model that 'predicts' malaria severity with 83.3% accuracy using the listed environmental and biological factors. The abstract states the model was developed on the 417 respondents and directly reports this accuracy figure, with no description of any train/test split, cross-validation, or external validation cohort. This makes the accuracy metric equivalent to the in-sample goodness-of-fit on the exact data used to estimate the coefficients, satisfying the fitted-input-called-prediction pattern. No self-citations, self-definitional steps, or imported uniqueness theorems appear in the text; the remainder of the derivation (factor selection and sample description) is independent of the performance claim.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The model rests on the standard logistic regression assumptions plus the unverified claim that the collected sample mirrors the underlying population distribution of severity.

free parameters (1)
  • logistic regression coefficients
    The intercept and slope parameters for each predictor (sickle cell, stagnant water, etc.) are estimated from the 417 records.
axioms (2)
  • domain assumption: The logit of the probability of severe malaria is a linear function of the listed predictors.
    Standard logistic regression modeling assumption invoked to justify the chosen classifier.
  • ad hoc to paper: The 417 respondents constitute an unbiased sample of malaria cases in the district.
    Required for the accuracy figure to be interpreted as a population-level performance estimate.
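
The linear-logit assumption says that log(p / (1 - p)) equals an intercept plus one slope per predictor, where p is the probability of severe malaria. A minimal sketch of what that implies, using invented coefficients (the values and the variable names below are hypothetical and are not estimates from the paper):

```python
import math

# Hypothetical coefficients on the log-odds scale (NOT estimates from the paper).
coefficients = {
    "intercept": -2.0,
    "sickle_cell": 1.2,
    "stagnant_water": 0.8,
    "garbage_dump": 0.5,
    "wet_lawns": 0.3,
    "treated_net": -0.9,
}

def severe_probability(child, coef=coefficients):
    """P(severe) under the linear-logit assumption; predictors are 0/1 indicators."""
    logit = coef["intercept"] + sum(coef[name] * value for name, value in child.items())
    return 1.0 / (1.0 + math.exp(-logit))

def odds_ratios(coef=coefficients):
    """exp(slope) is the multiplicative change in odds when a predictor goes 0 -> 1."""
    return {name: math.exp(b) for name, b in coef.items() if name != "intercept"}

# An illustrative child: stagnant water and wet lawns nearby, sleeps under a treated net.
child = {"sickle_cell": 0, "stagnant_water": 1, "garbage_dump": 0,
         "wet_lawns": 1, "treated_net": 1}
p = severe_probability(child)
ratios = odds_ratios()
print(f"P(severe) = {p:.3f}")
```

An odds ratio above 1 marks a predictor that raises the odds of severe malaria and a value below 1 marks a protective one, which is the kind of model output the referee asks the authors to report.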

pith-pipeline@v0.9.0 · 5695 in / 1464 out tokens · 31425 ms · 2026-05-20T13:11:55.194112+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    A logistic regression model was developed in this study to predict the severity of malaria based on such factors as sickle cell disease, stagnant water, garbage dump, wet lawns, and the use of treated mosquito nets, with an 83.3% accuracy rate.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages

  1. [1]

    Development of miRNA-Based Approaches to Explore the Interruption of Mosquito-Borne Disease Transmission

    Xu TL, Sun YW, Feng XY, Zhou XN, Zheng B. Development of miRNA-Based Approaches to Explore the Interruption of Mosquito-Borne Disease Transmission. Frontiers in Cellular and Infection Microbiology. 2021; 11:665444 9 [1] Baba E, Hamade P, Kivumbi H, Marasciulo M, Maxwell K, Moroso D, Milligan P. Effectiveness of seasonal malaria chemoprevention at scale in...

  2. [2]

    Malaria [Internet] 2021 [August 16; cited 2023 August 13] Available from: https://eu.biogents.com/malaria/

    Biogents.com. Malaria [Internet] 2021 [August 16; cited 2023 August 13] Available from: https://eu.biogents.com/malaria/

  3. [3]

    Malaria Vaccine: Prospects and Challenges

    Hassan AO, Oso OV, Obeagu EI, Adeyemo AT. Malaria Vaccine: Prospects and Challenges. Madonna University Journal of Medicine and Health Sciences. 2022; 2(2): 22-40

  4. [4]

    Molecular mechanisms of Plasmodium development in male and female Anopheles mosquitoes

    Haraguchi A, Takano M, Hakozaki J, Nakayama K, Nakamura S, Yoshikawa Y, Ikadai H. Molecular mechanisms of Plasmodium development in male and female Anopheles mosquitoes. bioRxiv. 2022; 2022-01

  5. [5]

    Transfusion-transmitted malaria and mitigation strategies in nonendemic regions

    Niederhauser C, Galel SA. Transfusion-transmitted malaria and mitigation strategies in nonendemic regions. Transfusion medicine and hemotherapy. 2022; 49(4): 205-217

  6. [6]

    A model for predicting malaria outbreak using machine learning technique

    Stephen A, Akomolafe PO, Ogundoyin KI. A model for predicting malaria outbreak using machine learning technique. Annals. Computer Science Series. 2020; 9(1):9-15

  7. [7]

    A Deep Learning Approach for Segmentation of Red Blood Cell Images and Malaria Detection

    Delgado-Ortet M, Molina A, Alférez S, Rodellar J, Merino A. A Deep Learning Approach for Segmentation of Red Blood Cell Images and Malaria Detection. Entropy. 2020; 22(6):657

  8. [8]

    Spatial and spatio-temporal methods for mapping malaria risk: a systematic review

    Odhiambo JN, Kalinda C, Macharia PM, Snow RW, Sartorius B. Spatial and spatio-temporal methods for mapping malaria risk: a systematic review. BMJ Global Health. 2020; 5(10):e002919

  9. [9]

    Determining suitable machine learning classifier technique for prediction of malaria incidents attributed to climate of Odisha

    Mohapatra P, Tripathi NK, Pal I, Shrestha S. Determining suitable machine learning classifier technique for prediction of malaria incidents attributed to climate of Odisha. International Journal of Environmental Health Research. 2021; 32(8):1716-1732

  10. [10]

    Machine learning based malaria prediction using clinical findings

    Yadav SS, Kadam VJ, Jadhav SM, Jagtap S, Pathak PR. Machine learning based malaria prediction using clinical findings. International Conference on Emerging Smart Computing and Informatics. pp. 216-222, March 2021

  11. [11]

    Using Biological Variables and Social Determinants to Predict Malaria and Anemia among Children in Senegal

    Sow B, Suguri H, Mukhtar H, Ahmad HF. Using Biological Variables and Social Determinants to Predict Malaria and Anemia among Children in Senegal. IEICE Technical Report; IEICE Tech. Report. 2017; 117(336):3-20

  12. [12]

    Africa’s Malaria Epidemic Predictor: Application of Machine Learning on Malaria Incidence and Climate Data

    Masinde M. Africa’s Malaria Epidemic Predictor: Application of Machine Learning on Malaria Incidence and Climate Data. ACM International Conference Proceeding Series. pp. 29-37, 2020.

  13. [13]

    Artificial intelligence approaches using natural language processing to advance EHR-based clinical research

    Juhn YH. Artificial intelligence approaches using natural language processing to advance EHR-based clinical research. Journal of Allergy and Clinical Immunology. 2020; 145(2):463-469

  14. [14]

    Portable Automated Surveillance of Surgical Site Infections Using Natural Language Processing: Development and Validation

    Bucher BT, Shi J, Ferraro JP, Skarda DE, Samore MH, Hurdle JF, Finlayson SR. Portable Automated Surveillance of Surgical Site Infections Using Natural Language Processing: Development and Validation. Annals of Surgery, 2020; 272(4):629

  15. [15]

    Compliance with the WHO strategy of test, treat and track for malaria control at Bosomtwi district in Ghana

    Oteng G, Kenu E, Bandoh D, Nortey P, Afari E. Compliance with the WHO strategy of test, treat and track for malaria control at Bosomtwi district in Ghana. Ghana Medical Journal. 2020; 54(2):40-44

  16. [16]

    Modeling CO2 emissions in South Africa: empirical evidence from ARDL based bounds and wavelet coherence techniques

    Adebayo TS, Odugbesan JA. Modeling CO2 emissions in South Africa: empirical evidence from ARDL based bounds and wavelet coherence techniques. Environmental Science and Pollution Research. 202; 28(8):9377-9389

  17. [17]

    Scikit-learn: Machine Learning in Python

    Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Duchesnay É. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2021; 12:2825-2830.

  18. [18]

    Gene expression data classification with robust sparse logistic regression using fused regularisation

    Lavanya K, Rambabu P, Suresh GV, Bhandari R. Gene expression data classification with robust sparse logistic regression using fused regularisation. International Journal of Ad Hoc and Ubiquitous Computing, 2023; 42(4):281-291

  19. [19]

    Feature Space Sketching for Logistic Regression

    Dexter G, Khanna R, Raheel J, Drineas P. Feature Space Sketching for Logistic Regression. arXiv preprint, 2023; arXiv:2303.14284.