Decision Tree Learning for Uncertain Clinical Measurements
Pith reviewed 2026-05-24 15:31 UTC · model grok-4.3
The pith
Modeling measurement uncertainty as noise during decision tree training, rather than at prediction time, produces smaller trees that retain accuracy as noise increases.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Soft training, which realizes noise distributions both when searching for split thresholds and when splitting training instances, has a regularizing effect: decision trees become significantly smaller while accuracy is maintained as noise increases. Soft evaluation during prediction shows no benefit in handling noise.
What carries the argument
A probabilistic decision tree that independently realizes noise distributions in three phases: (1) searching for split thresholds, (2) splitting the training instances, and (3) generating predictions for unseen data.
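To make phase (2) concrete, here is a minimal sketch of soft instance splitting. It assumes zero-mean Gaussian noise with a known standard deviation `sigma`; this summary does not say which noise family or realization scheme the paper uses, so the closed-form probability and the names `prob_left` and `soft_split` are illustrative assumptions, not the authors' implementation.

```python
import math


def prob_left(x, threshold, sigma):
    """P(x + eps <= threshold) for eps ~ N(0, sigma^2) (assumed noise model)."""
    if sigma == 0:
        # No noise: fall back to the usual hard decision.
        return 1.0 if x <= threshold else 0.0
    z = (threshold - x) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def soft_split(values, weights, threshold, sigma):
    """Send a fraction of each instance's weight to each child node.

    Hypothetical helper: returns (left_weights, right_weights), where every
    instance keeps its total weight but shares it between the two branches.
    """
    left = [w * prob_left(x, threshold, sigma) for x, w in zip(values, weights)]
    right = [w - l for w, l in zip(weights, left)]
    return left, right


if __name__ == "__main__":
    left, right = soft_split([0.0, 1.0, 2.0], [1.0, 1.0, 1.0], threshold=1.0, sigma=0.5)
    print(left, right)
```

An instance measured exactly at the threshold is shared equally between the children, while instances far from the threshold behave almost as in a hard split. With `sigma=0` the sketch reduces to a standard split.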
If this is right
- Decision trees trained with soft thresholds can be smaller yet equally accurate when input measurements contain noise.
- The regularization benefit arises specifically from the training phases rather than from probabilistic prediction.
- Larger noise levels do not degrade accuracy when the soft training steps are used.
- Interpretability is preserved because the final tree structure remains a standard decision tree.
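The distinction between soft training and probabilistic prediction is easier to see with phase (3) written out. The sketch below shows one way soft evaluation could work on an already-built tree: each internal node passes the query down both branches and mixes the leaf distributions by the probability of going left. The Gaussian noise model, the nested-dict tree layout, and the names `prob_left` and `soft_predict` are assumptions made for illustration only.

```python
import math


def prob_left(x, threshold, sigma):
    """P(x + eps <= threshold) for eps ~ N(0, sigma^2) (assumed noise model)."""
    if sigma == 0:
        return 1.0 if x <= threshold else 0.0
    z = (threshold - x) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def soft_predict(node, x, sigma):
    """Return a class -> probability dict for feature vector x.

    Hypothetical tree layout: a leaf is {"leaf": {class: prob}}; an internal
    node is {"feature": i, "threshold": t, "left": node, "right": node}.
    """
    if "leaf" in node:
        return node["leaf"]
    p = prob_left(x[node["feature"]], node["threshold"], sigma)
    left = soft_predict(node["left"], x, sigma)
    right = soft_predict(node["right"], x, sigma)
    classes = set(left) | set(right)
    return {c: p * left.get(c, 0.0) + (1.0 - p) * right.get(c, 0.0) for c in classes}


if __name__ == "__main__":
    stump = {
        "feature": 0,
        "threshold": 1.0,
        "left": {"leaf": {"a": 1.0}},
        "right": {"leaf": {"b": 1.0}},
    }
    print(soft_predict(stump, [0.8], sigma=0.5))
```

The tree structure is untouched by this step; only the way a query traverses it changes, which is consistent with the interpretability point above.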
Where Pith is reading between the lines
- The same three-phase separation could be applied to other tree-based methods such as random forests to test whether the regularization generalizes.
- If clinical measurements exhibit systematic bias rather than zero-mean noise, the observed size reduction may not hold.
- The approach could be extended by learning the noise distribution parameters jointly with the tree rather than assuming them known.
Load-bearing premise
That modeling measurement uncertainty as independently realized noise distributions across the three phases is enough to capture the relevant uncertainty structure in clinical data.
What would settle it
A dataset in which measurement errors are correlated across features or across patients, tested under the same three-phase protocol, where the size-reduction effect disappears or reverses.
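One way such a test could be set up is sketched below: noise that is correlated across features is built from a shared per-row component (standing in for something like instrument drift) plus an independent per-feature component. This construction, the parameter `rho`, and the function names are assumptions for illustration; the paper's own protocol, as summarized here, uses independently realized noise.

```python
import math
import random


def correlated_noise(n_rows, n_features, sigma, rho, seed=0):
    """Zero-mean Gaussian noise with pairwise feature correlation rho.

    Each row mixes one shared draw with independent per-feature draws, so
    every feature has standard deviation sigma and any two features in the
    same row have correlation rho (0 <= rho <= 1).
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    rng = random.Random(seed)
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    rows = []
    for _ in range(n_rows):
        shared = rng.gauss(0.0, 1.0)
        rows.append([sigma * (a * shared + b * rng.gauss(0.0, 1.0))
                     for _ in range(n_features)])
    return rows


def pearson(xs, ys):
    """Sample Pearson correlation, used here only to check the construction."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    noise = correlated_noise(5000, 2, sigma=1.0, rho=0.8, seed=1)
    print(pearson([r[0] for r in noise], [r[1] for r in noise]))
```

Setting `rho=0` recovers independent noise, so the same harness could sweep from the independent case to strongly correlated errors and track tree size along the way.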
Original abstract
Clinical decision-making requires reasoning in the presence of imperfect data. Decision trees (DTs) are a well-known decision support tool, owing to their interpretability, which is fundamental in safety-critical contexts such as medical diagnosis. However, learning DTs from uncertain data leads to poor generalization, and generating predictions for uncertain data hinders prediction accuracy. Several methods have suggested the potential of probabilistic decisions at the internal nodes in making DTs robust to uncertainty. Some approaches only employ probabilistic thresholds during evaluation. Others also consider the uncertainty in the learning phase, at the expense of increased computational complexity or reduced interpretability. The existing methods have not clarified the merit of a probabilistic approach in the distinct phases of DT learning, nor when the uncertainty is present in the training or the test data. We present a probabilistic DT approach that models measurement uncertainty as a noise distribution, independently realized: (1) when searching for the split thresholds, (2) when splitting the training instances, and (3) when generating predictions for unseen data. The soft training approaches (1, 2) achieved a regularizing effect, leading to significant reductions in DT size, while maintaining accuracy, for increased noise. Soft evaluation (3) showed no benefit in handling noise.
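Phase (1) of the abstract, searching for split thresholds under noise, can be sketched as follows. The example scores each candidate threshold by a Gini impurity computed from soft (fractional) class counts. The Gaussian noise model, the choice of Gini impurity, the midpoint candidates, and all function names are assumptions for illustration; the abstract does not specify the split criterion or how the noise is realized.

```python
import math


def prob_left(x, threshold, sigma):
    """P(x + eps <= threshold) for eps ~ N(0, sigma^2) (assumed noise model)."""
    if sigma == 0:
        return 1.0 if x <= threshold else 0.0
    z = (threshold - x) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def gini(class_weights):
    """Gini impurity of a dict mapping class label -> (fractional) count."""
    total = sum(class_weights.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in class_weights.values())


def soft_split_impurity(xs, ys, threshold, sigma):
    """Weighted child impurity when each instance is shared between children."""
    left, right = {}, {}
    for x, y in zip(xs, ys):
        p = prob_left(x, threshold, sigma)
        left[y] = left.get(y, 0.0) + p
        right[y] = right.get(y, 0.0) + (1.0 - p)
    n_left, n_right = sum(left.values()), sum(right.values())
    return (n_left * gini(left) + n_right * gini(right)) / (n_left + n_right)


def best_threshold(xs, ys, sigma):
    """Pick the midpoint between sorted distinct values with lowest impurity."""
    pts = sorted(set(xs))
    candidates = [(a + b) / 2.0 for a, b in zip(pts, pts[1:])]
    return min(candidates, key=lambda t: soft_split_impurity(xs, ys, t, sigma))


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1]
    print(best_threshold(xs, ys, sigma=0.5))
```

Under this sketch, a split that looks perfectly pure with hard counts has nonzero impurity once noise is accounted for, which gives some intuition for why soft training could discourage splits that separate instances lying close together.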
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces a probabilistic decision tree framework that models measurement uncertainty via independent noise distributions realized separately in three phases: (1) split threshold search, (2) training instance splitting, and (3) prediction on unseen data. It reports that soft training in phases (1) and (2) produces a regularizing effect with significantly smaller trees at maintained accuracy under increased noise, while soft evaluation in phase (3) yields no benefit.
Significance. If the empirical results hold under realistic conditions, the phase-specific analysis offers a clear way to isolate where probabilistic handling of uncertainty improves DT practicality in clinical settings, particularly by reducing model size (and thus improving interpretability) without accuracy loss. The explicit separation of training versus evaluation phases is a methodological strength that could guide future work on robust DTs.
Major comments (2)
- [Abstract] Abstract (noise model): The central claim that soft training yields smaller trees at maintained accuracy rests on treating uncertainty as independently realized noise distributions across the three phases. Clinical measurements commonly exhibit correlated errors (e.g., shared instrument drift or patient physiology across features), which independent per-phase sampling does not reproduce. This independence assumption is load-bearing for the practical conclusion; without experiments using multivariate or correlated noise, the reported regularization benefit may not transfer to clinical data.
- [Abstract] Abstract (empirical support): The abstract asserts 'significant reductions in DT size' and 'maintaining accuracy' but supplies no information on datasets, noise distribution families, baseline comparators, number of replicates, or statistical tests. These details are required to evaluate whether the regularization effect is robust or an artifact of the chosen experimental conditions.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below. We agree that the abstract requires additional empirical details and will revise it to include them. For the noise model, we will add discussion of the independence assumption as a modeling choice while noting its implications.
Point-by-point responses
-
Referee: [Abstract] Abstract (noise model): The central claim that soft training yields smaller trees at maintained accuracy rests on treating uncertainty as independently realized noise distributions across the three phases. Clinical measurements commonly exhibit correlated errors (e.g., shared instrument drift or patient physiology across features), which independent per-phase sampling does not reproduce. This independence assumption is load-bearing for the practical conclusion; without experiments using multivariate or correlated noise, the reported regularization benefit may not transfer to clinical data.
Authors: Our framework deliberately models uncertainty via independent noise distributions, realized separately in each phase, to isolate the effects of soft decisions during threshold search, instance splitting, and prediction. This separation is central to the phase-specific analysis. While we recognize that correlated errors occur in clinical measurements, the regularization benefit of soft training is shown under the independent model. We will revise the manuscript to state this modeling assumption explicitly and to discuss its potential limitations for direct applicability to correlated clinical data.
Revision: partial
-
Referee: [Abstract] Abstract (empirical support): The abstract asserts 'significant reductions in DT size' and 'maintaining accuracy' but supplies no information on datasets, noise distribution families, baseline comparators, number of replicates, or statistical tests. These details are required to evaluate whether the regularization effect is robust or an artifact of the chosen experimental conditions.
Authors: We agree that the abstract would be strengthened by including these details. In the revised manuscript we will expand the abstract to specify the datasets, noise distribution families, baseline comparators, number of replicates, and statistical tests used to support the reported reductions in tree size and maintained accuracy.
Revision: yes
Circularity Check
No circularity: empirical claims rest on experimental comparisons, not derivations or self-referential reductions.
full rationale
The paper describes probabilistic decision tree methods that model measurement uncertainty as independent noise distributions applied in three phases (threshold search, instance splitting, prediction). It reports empirical results showing regularization effects from soft training phases. No equations, derivations, or first-principles claims are present that reduce outputs to inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claims are statistical outcomes from experiments, which are externally falsifiable and do not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: measurement uncertainty in clinical data can be adequately represented as independent noise distributions realized separately during split search, instance assignment, and prediction.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
models measurement uncertainty as a noise distribution, independently realized: (1) when searching for the split thresholds, (2) when splitting the training instances, and (3) when generating predictions
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
The soft training approaches (1, 2) achieved a regularizing effect, leading to significant reductions in DT size, while maintaining accuracy, for increased noise.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
(PREPRINT) IEEE Transactions on Knowledge and Data Engineering, submitted for review, August 2019