arxiv: 1907.11325 · v1 · pith:EMIKERXUnew · submitted 2019-07-25 · 📊 stat.AP

Decision Tree Learning for Uncertain Clinical Measurements

Pith reviewed 2026-05-24 15:31 UTC · model grok-4.3

classification 📊 stat.AP
keywords decision trees · uncertain data · clinical measurements · probabilistic thresholds · soft training · noise robustness · regularization · medical diagnosis

The pith

Modeling measurement uncertainty as noise during decision tree training, rather than at prediction time, produces smaller trees that retain accuracy as noise increases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper separates the use of probabilistic thresholds into three distinct phases of decision tree construction and use: locating split values, assigning training examples to branches, and issuing predictions on new cases. It tests each phase independently on data where measurement error is represented by independent noise distributions. The training phases produce a regularizing effect that shrinks the resulting trees while accuracy holds steady or improves slightly with rising noise; the prediction phase alone yields no such gain. This separation clarifies that the benefit comes from how the tree is grown rather than how it is later applied.

Core claim

Soft training approaches that realize noise distributions when searching for split thresholds and when splitting training instances achieve a regularizing effect, leading to significant reductions in decision tree size while maintaining accuracy for increased noise; soft evaluation during prediction shows no benefit in handling noise.

What carries the argument

A probabilistic decision tree that independently realizes noise distributions in three phases: (1) searching for split thresholds, (2) splitting the training instances, and (3) generating predictions for unseen data.
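
A minimal sketch of phase (1), assuming the noise on a single measurement is zero-mean Gaussian with a known standard deviation sigma. Under that assumption, the probability that a measurement x lands on the left of a threshold is Φ((threshold − x)/σ), and class counts become fractional. The function names (`p_left`, `soft_information_gain`) and the toy data are illustrative and not taken from the paper; the paper's actual split criterion may differ.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_left(x, threshold, sigma):
    """P(x + eps <= threshold) for eps ~ N(0, sigma^2); hard split when sigma == 0."""
    if sigma == 0:
        return 1.0 if x <= threshold else 0.0
    return normal_cdf((threshold - x) / sigma)

def entropy(weights):
    """Shannon entropy in bits of a dict mapping class label -> (fractional) count."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in weights.values() if w > 0)

def soft_information_gain(xs, ys, threshold, sigma):
    """Information gain of a split, computed with probability-weighted class counts."""
    parent, left, right = {}, {}, {}
    for x, y in zip(xs, ys):
        p = p_left(x, threshold, sigma)
        parent[y] = parent.get(y, 0.0) + 1.0
        left[y] = left.get(y, 0.0) + p
        right[y] = right.get(y, 0.0) + (1.0 - p)
    n = float(len(xs))
    n_left, n_right = sum(left.values()), sum(right.values())
    children = (n_left / n) * entropy(left) + (n_right / n) * entropy(right)
    return entropy(parent) - children

# Toy data (assumed for illustration): one feature, two classes.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0, 0, 1, 1]
hard_gain = soft_information_gain(xs, ys, threshold=2.5, sigma=0.0)
soft_gain = soft_information_gain(xs, ys, threshold=2.5, sigma=1.0)
```

With sigma = 0 the computation reduces to the ordinary hard-threshold information gain (1 bit on this toy data); with sigma = 1 the gain of the same split drops to roughly 0.30 bits, because every instance contributes to both children.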

If this is right

  • Decision trees trained with soft thresholds can be smaller yet equally accurate when input measurements contain noise.
  • The regularization benefit arises specifically from the training phases rather than from probabilistic prediction.
  • In the reported experiments, accuracy is maintained as noise levels increase when the soft training steps are used.
  • Interpretability is preserved because the final tree structure remains a standard decision tree.
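
The second training phase, splitting the training instances, can be sketched in the same spirit: each instance is sent down both branches with a fractional weight instead of being assigned to exactly one. The Gaussian noise model, the `(x, label, weight)` tuple layout, and the `min_weight` cutoff are assumptions made for this illustration and are not described in the abstract.

```python
import math

def p_left(x, threshold, sigma):
    """P(x + eps <= threshold) for eps ~ N(0, sigma^2); hard split when sigma == 0."""
    if sigma == 0:
        return 1.0 if x <= threshold else 0.0
    return 0.5 * (1.0 + math.erf((threshold - x) / (sigma * math.sqrt(2.0))))

def soft_partition(instances, threshold, sigma, min_weight=1e-3):
    """Split weighted instances (x, label, weight) into left and right children.

    Each instance goes to both children, weighted by the probability that its
    noisy value falls on that side. Fractions below min_weight are dropped
    (an assumed pruning rule, used here only to keep the children small).
    """
    left, right = [], []
    for x, label, weight in instances:
        p = p_left(x, threshold, sigma)
        if weight * p >= min_weight:
            left.append((x, label, weight * p))
        if weight * (1.0 - p) >= min_weight:
            right.append((x, label, weight * (1.0 - p)))
    return left, right

# Toy data (assumed): two instances sit close to the threshold 2.5.
instances = [(1.0, 0, 1.0), (2.4, 0, 1.0), (2.6, 1, 1.0), (4.0, 1, 1.0)]
soft_left, soft_right = soft_partition(instances, threshold=2.5, sigma=0.5, min_weight=0.0)
hard_left, hard_right = soft_partition(instances, threshold=2.5, sigma=0.0)
```

With `min_weight=0.0` the total weight is conserved across the two children, so downstream impurity computations can treat the weights as fractional counts.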

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same three-phase separation could be applied to other tree-based methods such as random forests to test whether the regularization generalizes.
  • If clinical measurements exhibit systematic bias rather than zero-mean noise, the observed size reduction may not hold.
  • The approach could be extended by learning the noise distribution parameters jointly with the tree rather than assuming them known.

Load-bearing premise

That modeling measurement uncertainty as independently realized noise distributions across the three phases is enough to capture the relevant uncertainty structure in clinical data.

What would settle it

A dataset in which measurement errors are correlated across features or across patients, tested under the same three-phase protocol, where the size-reduction effect disappears or reverses.
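
One way to set up such a test is to perturb the features with noise that shares a common factor across features of the same patient. The sketch below uses only the Python standard library; the shared-factor construction, the function name `add_noise`, and the parameter `rho` are hypothetical choices, not part of the paper's protocol. Setting `rho = 0` recovers independent noise.

```python
import math
import random

def add_noise(rows, sigma, rho, rng):
    """Return a noisy copy of rows (a list of feature lists).

    Each feature receives zero-mean Gaussian noise with standard deviation
    sigma; any two features of the same row have noise correlation rho.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1] for this construction")
    noisy_rows = []
    for row in rows:
        shared = rng.gauss(0.0, 1.0)  # factor common to all features of this row
        noisy_row = []
        for value in row:
            own = rng.gauss(0.0, 1.0)  # feature-specific factor
            eps = sigma * (math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * own)
            noisy_row.append(value + eps)
        noisy_rows.append(noisy_row)
    return noisy_rows

def sample_correlation(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Noise-only demonstration on all-zero features, with fixed seeds.
clean = [[0.0, 0.0] for _ in range(20000)]
independent = add_noise(clean, sigma=1.0, rho=0.0, rng=random.Random(0))
correlated = add_noise(clean, sigma=1.0, rho=0.8, rng=random.Random(0))

corr_independent = sample_correlation([r[0] for r in independent], [r[1] for r in independent])
corr_correlated = sample_correlation([r[0] for r in correlated], [r[1] for r in correlated])
```

Running the same three-phase protocol on data perturbed with `rho = 0` and with `rho > 0` would then show whether the size reduction depends on the independence assumption.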

Figures

Figures reproduced from arXiv: 1907.11325 by Anders Jonsson, Bart Bijnens, Cecília Nunes, Hélène Langet, Mathieu De Craene, Oscar Camara.

Figure 1: Motivating example to show the effect of employing th…
Figure 2: Probability of misclassifying (x(t), 1) as a function of the standard deviation σ of the normal uncertainty model. In 2a, the uncertainty model is considered only for the training instances x(1) and x(4), simulating soft training propagation (STP), while x(t) is certain. In 2b, x(t) has normally distributed noise, as in soft evaluation (SE). […] factor and x̄ the training subset mean of X. The same n is used for…
Figure 3: (a) Ejection fraction (EF) data of the Data Science Bo…
Figure 4: Information gain computation the ejection fraction…
Figure 5: Results of the experiments displayed as boxplots of t…
Original abstract

Clinical decision requires reasoning in the presence of imperfect data. DTs are a well-known decision support tool, owing to their interpretability, fundamental in safety-critical contexts such as medical diagnosis. However, learning DTs from uncertain data leads to poor generalization, and generating predictions for uncertain data hinders prediction accuracy. Several methods have suggested the potential of probabilistic decisions at the internal nodes in making DTs robust to uncertainty. Some approaches only employ probabilistic thresholds during evaluation. Others also consider the uncertainty in the learning phase, at the expense of increased computational complexity or reduced interpretability. The existing methods have not clarified the merit of a probabilistic approach in the distinct phases of DT learning, nor when the uncertainty is present in the training or the test data. We present a probabilistic DT approach that models measurement uncertainty as a noise distribution, independently realized: (1) when searching for the split thresholds, (2) when splitting the training instances, and (3) when generating predictions for unseen data. The soft training approaches (1, 2) achieved a regularizing effect, leading to significant reductions in DT size, while maintaining accuracy, for increased noise. Soft evaluation (3) showed no benefit in handling noise.
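
Phase (3), soft evaluation, can be sketched as a prediction that follows both branches of every internal node and mixes the leaf distributions by branch probability. This is an illustration under stated assumptions: Gaussian noise with a known per-feature sigma, noise treated as independently realized at each node, and a hypothetical tuple encoding of the tree. The threshold and class labels below are invented for the example.

```python
import math

def p_left(x, threshold, sigma):
    """P(x + eps <= threshold) for eps ~ N(0, sigma^2); hard split when sigma == 0."""
    if sigma == 0:
        return 1.0 if x <= threshold else 0.0
    return 0.5 * (1.0 + math.erf((threshold - x) / (sigma * math.sqrt(2.0))))

# Assumed encoding:
#   leaf:  ("leaf", {label: probability})
#   split: ("split", feature_index, threshold, left_subtree, right_subtree)
def soft_predict(node, x, sigmas):
    """Return a class distribution for x by weighting every root-to-leaf path."""
    if node[0] == "leaf":
        return dict(node[1])
    _, j, threshold, left, right = node
    p = p_left(x[j], threshold, sigmas[j])
    out = {}
    for branch, weight in ((left, p), (right, 1.0 - p)):
        if weight == 0.0:
            continue  # skip branches that cannot be reached
        for label, prob in soft_predict(branch, x, sigmas).items():
            out[label] = out.get(label, 0.0) + weight * prob
    return out

tree = (
    "split", 0, 50.0,
    ("leaf", {"positive": 0.9, "negative": 0.1}),
    ("leaf", {"positive": 0.2, "negative": 0.8}),
)
hard = soft_predict(tree, [48.0], sigmas=[0.0])  # ordinary hard evaluation
soft = soft_predict(tree, [48.0], sigmas=[4.0])  # measurement with sigma = 4
```

For the measurement 48.0 the hard prediction returns the left leaf unchanged, while the soft prediction pulls the "positive" probability from 0.9 down to about 0.68 because part of the probability mass crosses the threshold.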

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces a probabilistic decision tree framework that models measurement uncertainty via independent noise distributions realized separately in three phases: (1) split threshold search, (2) training instance splitting, and (3) prediction on unseen data. It reports that soft training in phases (1) and (2) produces a regularizing effect with significantly smaller trees at maintained accuracy under increased noise, while soft evaluation in phase (3) yields no benefit.

Significance. If the empirical results hold under realistic conditions, the phase-specific analysis offers a clear way to isolate where probabilistic handling of uncertainty improves DT practicality in clinical settings, particularly by reducing model size (and thus improving interpretability) without accuracy loss. The explicit separation of training versus evaluation phases is a methodological strength that could guide future work on robust DTs.

major comments (2)
  1. [Abstract] Abstract (noise model): The central claim that soft training yields smaller trees at maintained accuracy rests on treating uncertainty as independently realized noise distributions across the three phases. Clinical measurements commonly exhibit correlated errors (e.g., shared instrument drift or patient physiology across features), which independent per-phase sampling does not reproduce. This independence assumption is load-bearing for the practical conclusion; without experiments using multivariate or correlated noise, the reported regularization benefit may not transfer to clinical data.
  2. [Abstract] Abstract (empirical support): The abstract asserts 'significant reductions in DT size' and 'maintaining accuracy' but supplies no information on datasets, noise distribution families, baseline comparators, number of replicates, or statistical tests. These details are required to evaluate whether the regularization effect is robust or an artifact of the chosen experimental conditions.
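
To make the second comment concrete, the sketch below shows one simple paired test that could back a claim such as "significant reductions in DT size": an exact two-sided sign test over replicates. This is a swapped-in illustration, not the test the authors used (the abstract does not say which test was applied), and the tree sizes are made-up placeholder values, not results from the paper.

```python
from math import comb

def sign_test_p_value(baseline, candidate):
    """Exact two-sided sign test on paired values; ties are discarded."""
    diffs = [c - b for b, c in zip(baseline, candidate) if c != b]
    n = len(diffs)
    if n == 0:
        return 1.0  # no informative pairs
    smaller = sum(1 for d in diffs if d < 0)  # pairs where candidate < baseline
    tail = min(smaller, n - smaller)
    # Under the null hypothesis each sign is a fair coin flip.
    p = 2.0 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)

# Placeholder node counts for 8 hypothetical replicates (NOT data from the paper).
hard_tree_sizes = [31, 27, 45, 39, 33, 29, 41, 37]
soft_tree_sizes = [19, 21, 30, 25, 22, 20, 28, 24]
p_value = sign_test_p_value(hard_tree_sizes, soft_tree_sizes)
```

Reporting the number of replicates alongside such a p-value is what lets a reader judge whether a size reduction is more than run-to-run variation.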

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below. We agree that the abstract requires additional empirical details and will revise it to include them. For the noise model, we will add discussion of the independence assumption as a modeling choice while noting its implications.

Point-by-point responses
  1. Referee: [Abstract] Abstract (noise model): The central claim that soft training yields smaller trees at maintained accuracy rests on treating uncertainty as independently realized noise distributions across the three phases. Clinical measurements commonly exhibit correlated errors (e.g., shared instrument drift or patient physiology across features), which independent per-phase sampling does not reproduce. This independence assumption is load-bearing for the practical conclusion; without experiments using multivariate or correlated noise, the reported regularization benefit may not transfer to clinical data.

    Authors: Our framework deliberately models uncertainty via independent noise distributions realized separately in each phase precisely to isolate the effects of soft decisions during threshold search, instance splitting, and prediction. This separation is central to the phase-specific analysis. While we recognize that correlated errors occur in clinical measurements, the regularization benefit of soft training is shown under the independent model. We will revise the manuscript to explicitly state this modeling assumption and discuss its potential limitations for direct applicability to correlated clinical data. revision: partial

  2. Referee: [Abstract] Abstract (empirical support): The abstract asserts 'significant reductions in DT size' and 'maintaining accuracy' but supplies no information on datasets, noise distribution families, baseline comparators, number of replicates, or statistical tests. These details are required to evaluate whether the regularization effect is robust or an artifact of the chosen experimental conditions.

    Authors: We agree that the abstract would be strengthened by including these details. In the revised manuscript we will expand the abstract to specify the datasets, noise distribution families, baseline comparators, number of replicates, and statistical tests used to support the reported reductions in tree size and maintained accuracy. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical claims rest on experimental comparisons, not derivations or self-referential reductions.

full rationale

The paper describes probabilistic decision tree methods that model measurement uncertainty as independent noise distributions applied in three phases (threshold search, instance splitting, prediction). It reports empirical results showing regularization effects from soft training phases. No equations, derivations, or first-principles claims are present that reduce outputs to inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claims are statistical outcomes from experiments, which are externally falsifiable and do not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review provides no equations, so free parameters, axioms, and invented entities cannot be enumerated in detail; the core modeling choice of independent noise realizations is treated as a domain assumption.

axioms (1)
  • domain assumption: Measurement uncertainty in clinical data can be adequately represented as independent noise distributions realized separately during split search, instance assignment, and prediction.
    This modeling choice underpins the three-phase approach described in the abstract.

pith-pipeline@v0.9.0 · 5755 in / 1147 out tokens · 18700 ms · 2026-05-24T15:31:22.982218+00:00 · methodology

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

  1. [1]

    The coming of age of artificial intelligence in medicine,

    V. L. Patel, E. H. Shortliffe, M. Stefanelli, P. Szolovits, M. R. Berthold, R. Bellazzi, and A. Abu-Hanna, “The coming of age of artificial intelligence in medicine,” Artificial Intelligence in Medicine, vol. 46, no. 1, pp. 5–17, 2009

  2. [2]

    Health Informatics via Machine Learning for the Clinical Management of Patients,

    D. A. Clifton, K. E. Niehaus, P. Charlton, and G. W. Colopy, “Health Informatics via Machine Learning for the Clinical Management of Patients,” Yearb Med Inform, vol. 10, no. 1, pp. 38–43, 2015

  3. [3]

    Exploratory medical knowledge discovery: Experiences and issues,

    J. Roddick, P. Fule, and W. Graco, “Exploratory medical knowledge discovery: Experiences and issues,” ACM SIGKDD Explorations Newsletter, pp. 2–7, 2003

  4. [4]

    Intelligent data analysis for medical diagnosis: Using machine learning and temporal abstraction,

    N. Lavrač, I. Kononenko, E. Keravnou, M. Kukar, and B. Zupan, “Intelligent data analysis for medical diagnosis: Using machine learning and temporal abstraction,” AI Communications, vol. 11, no. 3, pp. 191–218, 1998

  5. [5]

    Data quality: A statistical perspective,

    A. F. Karr, A. P. Sanil, and D. L. Banks, “Data quality: A statistical perspective,” Statistical Methodology, vol. 3, no. 2, pp. 137–173, 2006

  6. [6]

    Evaluation of measurement data - guide to the expression of uncertainty in measurement,

    W. G. . Joint Committee for Guides in Metrology, “Evaluation of measurement data - guide to the expression of uncertainty in measurement,” in Tech. Rep. JCGM 100: 2008 (BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP and OIML, 2008

  7. [7]

    Uniqueness of medical data mining,

    K. J. Cios and G. William Moore, “Uniqueness of medical data mining,” Artificial Intelligence in Medicine, vol. 26, no. 1-2, pp. 1–24, 2002

  8. [8]

    Intra- and interobserver variability in the measurements of abdominal aortic and common iliac artery diameter with computed tomography. The Tromsø study,

    K. Singh, B. K. Jacobsen, S. Solberg, K. H. Bønaa, S. Kumar, R. Bajic, and E. Arnesen, “Intra- and interobserver variability in the measurements of abdominal aortic and common iliac artery diameter with computed tomography. The Tromsø study,” European Journal of Vascular and Endovascular Surgery, vol. 25, no. 5, pp. 399–407, 2003

  9. [9]

    Measuring left ventricular ejection fraction-techniques and potential pitfalls,

    T. Foley, S. Mankad, N. Anavekar, C. Bonnichsen, M. Morris, T. Miller, and P. Araoz, “Measuring left ventricular ejection fraction-techniques and potential pitfalls,” European Cardiology, vol. 8, no. 2, pp. 108–114, 2012

  10. [10]

    Comparison of imaging techniques to assess appendage anatomy and measurements for left atrial appendage closure device selection

    J. R. Lopez-Minguez, R. Gonzalez-Fernandez, C. Fernandez-Vegas, V. Millan-Nunez, M. E. Fuentes-Canamero, J. M. Nogales-Asensio, J. Doncel-Vecino, M. Yuste Dominguez, L. Garcia Serrano, and D. Sanchez Quintana, “Comparison of imaging techniques to assess appendage anatomy and measurements for left atrial appendage closure device selection.” The Jou...

  11. [11]

    The quantitative science of evaluating imaging evidence,

    T. S. Genders, B. S. Ferket, and M. M. Hunink, “The quantitative science of evaluating imaging evidence,” JACC: Cardiovascular Imaging, vol. 10, no. 3, pp. 264–275, 2017

  12. [12]

    Assessment of left ventricular ejection fraction in patients eligible for ICD therapy: Discrepancy between cardiac magnetic resonance imaging and 2D echocardiography,

    S. de Haan, K. de Boer, J. Commandeur, A. M. Beek, A. C. van Rossum, and C. P. Allaart, “Assessment of left ventricular ejection fraction in patients eligible for ICD therapy: Discrepancy between cardiac magnetic resonance imaging and 2D echocardiography,” Netherlands Heart Journal, vol. 22, no. 10, pp. 449–455, 2014

  13. [13]

    Closing the chasm between research and practice: evidence of and for change,

    L. W. Green, “Closing the chasm between research and practice: evidence of and for change,” Health Promotion Journal of Australia, vol. 25, no. 1, pp. 25–29, 2014. (PREPRINT) IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, SUBMITTED FOR REVIEW, AUGUST 2019

  14. [14]

    Interactive dichotomizer, id3,

    J. Quinlan et al., “Interactive dichotomizer, id3,” Eds. Morgan Kauffmann, Springer-Verlag, 1979

  15. [15]

    C4.5: Programs for Machine Learning

    R. Quinlan, C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann Publishers, 1993

  16. [16]

    Classification and Regression Trees

    L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Belmont, CA: Wadsworth International Group, 1984

  17. [17]

    An exploratory technique for investigating large quantities of categorical data,

    G. V. Kass, “An exploratory technique for investigating large quantities of categorical data,” Applied statistics, pp. 119–127, 1980

  18. [18]

    Can machine-learning improve cardiovascular risk prediction using routine clinical data?

    S. F. Weng, J. Reps, J. Kai, J. M. Garibaldi, and N. Qureshi, “Can machine-learning improve cardiovascular risk prediction using routine clinical data?” PLOS ONE, vol. 12, no. 4, 2017

  19. [19]

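    Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC,

    “Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC,” 2016 O.J. L 119, 4.5.:1–88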

  20. [20]

    Decision trees as probabilistic classifiers,

    J. R. Quinlan, “Decision trees as probabilistic classifiers,” in Proceedings of the 4th International Workshop on Machine Learning. Morgan Kauffman, 1987, pp. 31–37

  21. [21]

    Softening splits in decision trees using simulated annealing,

    J. Dvořák and P. Savický, “Softening splits in decision trees using simulated annealing,” in Adaptive and Natural Computing Algorithms, 8th International Conference, ICANNGA 2007, Warsaw, Poland, April 11-14, 2007, Proceedings, Part I, 2007, pp. 721–729

  22. [22]

    Decision trees for uncertain data,

    S. Tsang, B. Kao, K. Y. Yip, W.-S. Ho, and S. D. Lee, “Decision trees for uncertain data,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, no. 1, pp. 64–78, 2011

  23. [23]

    Soft decision trees,

    O. Irsoy, O. T. Yıldız, and E. Alpaydın, “Soft decision trees,” in Pattern Recognition (ICPR), 2012 21st International Conference on. IEEE, 2012, pp. 1819–1822

  24. [24]

    Induction of fuzzy decision trees,

    Y. Yuan, “Induction of fuzzy decision trees,” Fuzzy Sets and Systems, vol. 69, no. 2, pp. 125–139, 1995

  25. [25]

    On the optimization of fuzzy decision trees,

    X. Wang, B. Chen, G. Qian, and F. Ye, “On the optimization of fuzzy decision trees,” Fuzzy Sets and Systems, vol. 112, no. 1, pp. 117–125, May 2000

  26. [26]

    On Distributed Fuzzy Decision Trees for Big Data,

    A. Segatori, F. Marcelloni, and W. Pedrycz, “On Distributed Fuzzy Decision Trees for Big Data,” IEEE Transactions on Fuzzy Systems, pp. 1–1, 2017

  27. [27]

    Probabilistic decision trees,

    J. R. Quinlan, “Probabilistic decision trees,” Machine learning: an artificial intelligence approach, vol. 3, pp. 140–152, 1990

  28. [28]

    Hierarchical mixtures of experts and the em algorithm,

    M. I. Jordan and R. A. Jacobs, “Hierarchical mixtures of experts and the em algorithm,” Neural computation, vol. 6, no. 2, pp. 181–214, 1994

  29. [29]

    Constructing optimal binary decision trees is NP-complete,

    L. Hyafil and R. L. Rivest, “Constructing optimal binary decision trees is NP-complete,” Information Processing Letters, vol. 5, no. 1, pp. 15–17, 1976

  30. [30]

    Induction of decision trees,

    J. R. Quinlan, “Induction of decision trees,” Machine Learning, vol. 1, no. 1, pp. 81–106, 1986

  31. [31]

    Top-down induction of decision trees classifiers - A survey,

    L. Rokach and O. Maimon, “Top-down induction of decision trees classifiers - A survey,” IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, vol. 35, no. 4, pp. 476–487, 2005

  32. [32]

    Ross Quinlan’s personal homepage

    Quinlan, Ross. Ross Quinlan’s personal homepage. Accessed: 2018-06-03. [Online]. Available: www.rulequest.com/Personal/

  33. [33]

    Bayesian model averaging: a tutorial,

    J. A. Hoeting, D. Madigan, A. E. Raftery, and C. T. Volinsky, “Bayesian model averaging: a tutorial,” Statistical science, pp. 382–401, 1999

  34. [34]

    Two-dimensional speckle tracking echocardiography: standardization efforts based on synthetic ultrasound data,

    J. D’Hooge, D. Barbosa, H. Gao, P. Claus, D. Prater, J. Hamilton, P. Lysyansky, Y. Abe, Y. Ito, H. Houle et al., “Two-dimensional speckle tracking echocardiography: standardization efforts based on synthetic ultrasound data,” Eur Heart J Cardiovasc Imaging, vol. 17, no. 6, pp. 693–701, 2016

  35. [35]

    An experimental and theoretical comparison of model selection methods,

    M. Kearns, Y. Mansour, A. Y. Ng, and D. Ron, “An experimental and theoretical comparison of model selection methods,” Machine Learning, vol. 50, pp. 7–50, 1997

  36. [36]

    Learning decision rules in noisy domains,

    T. Niblett and I. Bratko, “Learning decision rules in noisy domains,” in Proceedings of Expert Systems ’86, The 6th Annual Technical Conference on Research and development in expert systems III. Cambridge University Press, 1986, pp. 25–34

  37. [37]

    UCI Machine Learning Repository,

    M. Lichman, “UCI Machine Learning Repository,” 2013. [Online]. Available: http://archive.ics.uci.edu/ml

  38. [38]

    KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework,

    J. Alcalá-Fdez, A. Fernández, J. Luengo, J. Derrac, S. García, L. Sánchez, and F. Herrera, “KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework,” Journal of Multiple-Valued Logic and Soft Computing, vol. 17, no. 2-3, pp. 255–287, 2011

  39. [39]

    Design of experiments for the nips 2003 variable selection benchmark,

    I. Guyon, “Design of experiments for the nips 2003 variable selection benchmark,” 2003. [Online]. Available: clopinet.com/isabelle/Projects/NIPS2003

  40. [40]

    Scikit-learn: Machine learning in Python,

    F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011

  41. [41]

    The effects of training set size on decision tree complexity,

    D. Jensen and T. Oates, “The effects of training set size on decision tree complexity,” in Proceedings of the 14th International Conference on Machine Learning, 1999, pp. 254–262

  42. [42]

    Data Science Bowl Cardiac Challenge Data,

    National Heart, Lung, and Blood Institute, “Data Science Bowl Cardiac Challenge Data,” 2015. [Online]. Available: www.kaggle.com/c/second-annual-data-science-bowl

  43. [43]

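    2016 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure: The Task Force for the diagnosis and treatment of acute and chronic heart failure of the European Society of Cardiology (ESC) Developed with the special contribution of the Heart Failure Association (HFA) of the ESC,

    P. Ponikowski et al., “2016 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure: The Task Force for the diagnosis and treatment of acute and chronic heart failure of the European Society of Cardiology (ESC) Developed with the special contribution of the Heart Failure Association (HFA) of the ESC,” European heart journal ...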

  44. [44]

    Statistical Comparison of Classifiers over Multiple Data Sets,

    J. Demsar, “Statistical Comparison of Classifiers over Multiple Data Sets,” Journal of Machine Learning Research, vol. 7, no. 7, pp. 1–30, 2006

  45. [45]

    Individual Comparisons by Ranking Methods,

    F. Wilcoxon, “Individual Comparisons by Ranking Methods,” Biometrics Bulletin, vol. 1, no. 6, pp. 80–83, 1945

  46. [46]

    The use of confidence or fiducial limits illustrated in the case of the binomial,

    C. Clopper and E. Pearson, “The use of confidence or fiducial limits illustrated in the case of the binomial,” Biometrika, vol. 26, no. 4, p. 404, 1934.

    APPENDIX A PARAMETER TUNING Figures A.1 and A.2 display the average value of the parameters that con...