
arxiv: 1906.10179 · v1 · pith:KN2KHGDJnew · submitted 2019-06-24 · 📊 stat.ME

The Power of Unbiased Recursive Partitioning: A Unifying View of CTree, MOB, and GUIDE

Pith reviewed 2026-05-25 16:55 UTC · model grok-4.3

classification 📊 stat.ME
keywords unbiased recursive partitioning, CTree, MOB, GUIDE, model trees, goodness-of-fit measure, variable selection, splitting criteria

The pith

A unifying framework for CTree, MOB, and GUIDE shows that model scores without dichotomization provide higher power for selecting split variables than other building blocks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper embeds three popular unbiased recursive partitioning algorithms into a single parametric inference framework for model trees. This unification lets the authors isolate the effects of different components: the choice of goodness-of-fit measure, whether to dichotomize it, and how to bin the covariates. A sympathetic reader cares because the analysis reveals that the goodness-of-fit measure matters most, with model scores used directly (no dichotomization) yielding better detection of relevant covariates in many cases. This helps explain performance differences and guides improvements to tree-learning procedures.

Core claim

By placing CTree, MOB, and GUIDE inside one common framework for parametric model trees, the relative advantages of their inference approaches become comparable. The assessment of building blocks shows that the observation-wise goodness-of-fit measure is the key driver of power, with model scores without dichotomization performing much better than residuals or dichotomized versions in many scenarios.
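
To make the distinction between residuals and model scores concrete, the following is a minimal Python sketch. It assumes a simple linear model y = b0 + b1 * x fitted by ordinary least squares, for which the observation-wise scores of the regression coefficients are, up to a constant factor, the pairs (r_i, r_i * x_i) with residual r_i. The function names and the toy data-generating process (a slope that flips sign at z = 0) are hypothetical illustrations, not taken from the paper.

```python
import random

def ols_fit(x, y):
    """Closed-form least squares for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def residuals_and_scores(x, y):
    """Residuals r_i and scores (r_i, r_i * x_i), up to constant scaling."""
    b0, b1 = ols_fit(x, y)
    residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    scores = [(r, r * xi) for r, xi in zip(residuals, x)]
    return residuals, scores

def slope_change_demo(n=400, seed=1):
    """Toy data (an assumption): only the slope depends on the covariate z."""
    rng = random.Random(seed)
    x = [rng.uniform(-1, 1) for _ in range(n)]
    z = [rng.uniform(-1, 1) for _ in range(n)]
    y = [(1.0 if zi > 0 else -1.0) * xi + rng.gauss(0, 0.1)
         for xi, zi in zip(x, z)]
    residuals, scores = residuals_and_scores(x, y)
    slope_scores = [s[1] for s in scores]

    def group_mean(values, keep):
        selected = [v for v, k in zip(values, keep) if k]
        return sum(selected) / len(selected)

    upper = [zi > 0 for zi in z]
    lower = [not u for u in upper]
    return {
        "residual_mean_upper": group_mean(residuals, upper),
        "residual_mean_lower": group_mean(residuals, lower),
        "slope_score_mean_upper": group_mean(slope_scores, upper),
        "slope_score_mean_lower": group_mean(slope_scores, lower),
    }

if __name__ == "__main__":
    for name, value in slope_change_demo().items():
        print(f"{name}: {value:.3f}")
```

In this toy setup the residual means in the two z-groups tend to stay close to zero, while the slope-score means separate with opposite signs. That is one way to see why a residual-only measure can miss a change that the full score vector picks up.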

What carries the argument

The common parametric inference framework that embeds the three algorithms and isolates the effects of goodness-of-fit measures, dichotomization, and binning on split-variable selection.

If this is right

  • Model scores without dichotomization increase the power to select appropriate splitting covariates compared to residuals or dichotomized scores.
  • The choice of goodness-of-fit measure has a larger effect on performance than binning of possible split variables.
  • Classical association tests, conditional inference, and parameter instability tests can be compared directly once placed in the shared framework.
  • Linear model trees benefit from using full model scores rather than dichotomized residuals for split selection.
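
The two other building blocks, dichotomization at zero and binning of a split variable, can be sketched as simple data transformations followed by a classical categorical association test. The Python sketch below is an illustration under stated assumptions: binning at the sample quartiles and the plain Pearson chi-squared statistic are choices made here for concreteness, and the function names are hypothetical rather than taken from any of the three implementations.

```python
import statistics

def dichotomize(values):
    """Dichotomize a goodness-of-fit measure at zero: 1 if positive, else 0."""
    return [1 if v > 0 else 0 for v in values]

def bin_at_quartiles(z):
    """Bin a numeric covariate into four groups (quartile cut points are an assumption)."""
    cuts = statistics.quantiles(z, n=4)
    return [sum(v > c for c in cuts) for v in z]

def chi_square_statistic(rows, cols):
    """Pearson chi-squared statistic for the contingency table of two categorical vectors."""
    n = len(rows)
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    counts = {(r, c): 0 for r in row_levels for c in col_levels}
    for r, c in zip(rows, cols):
        counts[(r, c)] += 1
    row_tot = {r: sum(counts[(r, c)] for c in col_levels) for r in row_levels}
    col_tot = {c: sum(counts[(r, c)] for r in row_levels) for c in col_levels}
    stat = 0.0
    for r in row_levels:
        for c in col_levels:
            expected = row_tot[r] * col_tot[c] / n
            stat += (counts[(r, c)] - expected) ** 2 / expected
    return stat

if __name__ == "__main__":
    residuals = [-2.0, -1.0, -0.5, -0.1, 0.1, 0.5, 1.0, 2.0]
    z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print(chi_square_statistic(dichotomize(residuals), bin_at_quartiles(z)))
```

Both transformations discard information: dichotomization keeps only the sign of the measure, and binning keeps only the group membership of the covariate. The sketch makes that loss explicit without claiming how large its effect on power is.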

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar unification might be applied to other tree algorithms outside these three to compare their power sources.
  • Practitioners could prioritize model-score based selection when implementing new unbiased partitioning methods.
  • Extensions to non-linear models or survival trees might show the same dominance of the goodness-of-fit choice.

Load-bearing premise

The three algorithms can be embedded into one parametric inference framework without changing their core variable-selection behavior or introducing artifacts that favor one over the others.

What would settle it

A simulation study in which dichotomized residuals select the correct covariate more often than undichotomized model scores across multiple scenarios would falsify the central claim about the importance of the goodness-of-fit measure.
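
A study of this kind can be prototyped in a few lines. The Python sketch below is a toy version under strong assumptions: a single scenario in which only the slope changes at Z1 = 0, two pure-noise covariates, and a simplified correlation-based statistic (the sum of n times the squared correlation over the components of the measure) standing in for the actual tests of CTree, MOB, and GUIDE. All names and settings are hypothetical and none of the numbers it prints come from the paper.

```python
import random

def ols_fit(x, y):
    """Closed-form least squares for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def corr(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    saa = sum((u - mean_a) ** 2 for u in a)
    sbb = sum((v - mean_b) ** 2 for v in b)
    if saa == 0 or sbb == 0:
        return 0.0
    sab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    return sab / (saa * sbb) ** 0.5

def select_covariate(measures, covariates):
    """Index of the covariate with the largest stand-in statistic.

    All covariates are numeric and share the same number of measure
    components, so ranking by the statistic needs no further adjustment here.
    """
    n = len(covariates[0])
    stats = [sum(n * corr(m, z) ** 2 for m in measures) for z in covariates]
    return max(range(len(stats)), key=stats.__getitem__)

def simulate(reps=200, n=200, seed=1):
    """Share of replications in which the true covariate (index 0) is selected."""
    rng = random.Random(seed)
    hits = {"scores": 0, "dichotomized residuals": 0}
    for _ in range(reps):
        x = [rng.uniform(-1, 1) for _ in range(n)]
        zs = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(3)]
        # Toy scenario (an assumption): the slope flips sign at Z1 = 0.
        y = [(1.0 if z1 > 0 else -1.0) * xi + rng.gauss(0, 0.5)
             for xi, z1 in zip(x, zs[0])]
        b0, b1 = ols_fit(x, y)
        res = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        slope_score = [r * xi for r, xi in zip(res, x)]
        signs = [1.0 if r > 0 else 0.0 for r in res]
        hits["scores"] += int(select_covariate([res, slope_score], zs) == 0)
        hits["dichotomized residuals"] += int(select_covariate([signs], zs) == 0)
    return {name: count / reps for name, count in hits.items()}

if __name__ == "__main__":
    print(simulate())
```

A falsifying result would need dichotomized residuals to win across several scenarios, including ones with intercept changes and split points away from the median. This single slope-only scenario is deliberately favorable to scores and cannot settle the question on its own.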

Figures

Figures reproduced from arXiv: 1906.10179 by Achim Zeileis, Lisa Schlosser, Torsten Hothorn.

Figure 1: Top left panel: True stump structure applied in the data generating process for the …
Figure 2: Left panel: True tree structure applied in the data generating process for the “tree” …
Figure 3: Selection probability of the true split variable …
Figure 4: Effects of the three building blocks residuals vs. scores, dichotomization, and categorization in the “stump” scenario with the intercept and the slope parameter both varying. All possible combinations have been evaluated in two different settings and their performances are compared based on the mean p-values corresponding to the true split variable Z1. Left panel: true split point at the median 0 and wit…
Figure 5: Adjusted Rand index (ARI) for the testing strategies CTree, MOB, GUIDE, and …
Figure 6: Proportion of replications where the true split variable …
Figure 7: Selection probability of the true split variable …
Figure 8: Selection probability of the true split variable …
Figure 9: Selection probability of the true split variable …

Original abstract

A core step of every algorithm for learning regression trees is the selection of the best splitting variable from the available covariates and the corresponding split point. Early tree algorithms (e.g., AID, CART) employed greedy search strategies, directly comparing all possible split points in all available covariates. However, subsequent research showed that this is biased towards selecting covariates with more potential split points. Therefore, unbiased recursive partitioning algorithms have been suggested (e.g., QUEST, GUIDE, CTree, MOB) that first select the covariate based on statistical inference using p-values that are adjusted for the possible split points. In a second step a split point optimizing some objective function is selected in the chosen split variable. However, different unbiased tree algorithms obtain these p-values from different inference frameworks and their relative advantages or disadvantages are not well understood, yet. Therefore, three different popular approaches are considered here: classical categorical association tests (as in GUIDE), conditional inference (as in CTree), and parameter instability tests (as in MOB). First, these are embedded into a common inference framework encompassing parametric model trees, in particular linear model trees. Second, it is assessed how different building blocks from this common framework affect the power of the algorithms to select the appropriate covariates for splitting: observation-wise goodness-of-fit measure (residuals vs. model scores), dichotomization of residuals/scores at zero, and binning of possible split variables. This shows that specifically the goodness-of-fit measure is crucial for the power of the procedures, with model scores without dichotomization performing much better in many scenarios.
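
The selection bias of exhaustive greedy search described in the abstract can be reproduced with a small null simulation. The Python sketch below assumes a CART-style criterion (reduction in the residual sum of squares) and two covariates that are both unrelated to the response: a binary one with a single possible split and a continuous one with up to n - 1 possible splits. The setup and the function names are hypothetical illustrations, not the designs used in the paper.

```python
import random

def best_sse_reduction(z, y):
    """Largest reduction in the sum of squared errors over all binary splits in z."""
    n = len(y)
    total = sum(y)
    order = sorted(range(n), key=lambda i: z[i])
    best = 0.0
    left_sum = 0.0
    for k in range(1, n):
        i = order[k - 1]
        left_sum += y[i]
        if z[order[k]] == z[i]:
            continue  # tied values cannot be separated by a split
        right_sum = total - left_sum
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
        best = max(best, gain)
    return best

def greedy_selection_rate(reps=300, n=100, seed=1):
    """Share of null replications in which greedy search picks the continuous covariate."""
    rng = random.Random(seed)
    picks_continuous = 0
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(n)]           # response: pure noise
        z_binary = [rng.randint(0, 1) for _ in range(n)]  # one possible split
        z_cont = [rng.random() for _ in range(n)]         # up to n - 1 splits
        if best_sse_reduction(z_cont, y) > best_sse_reduction(z_binary, y):
            picks_continuous += 1
    return picks_continuous / reps

if __name__ == "__main__":
    print(greedy_selection_rate())
```

An unbiased procedure would select each of the two noise covariates about half of the time. Because the continuous covariate offers many more candidate splits, its maximally selected gain tends to be larger, which is the bias that the p-value-based selection step is meant to remove.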

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript embeds the variable-selection procedures of GUIDE (categorical association tests), CTree (conditional inference), and MOB (parameter instability tests) into a single parametric inference framework for linear model trees. Within this framework it isolates the effects of three building blocks—choice of observation-wise goodness-of-fit measure (residuals versus model scores), dichotomization at zero, and binning of split variables—on the power to recover the true splitting covariate, concluding that model scores without dichotomization are the dominant factor and yield substantially higher power in many simulation scenarios.

Significance. If the embedding faithfully reproduces the original selection behavior of each algorithm and the simulation design isolates the claimed effects, the work supplies a useful decomposition of why these unbiased tree methods differ in practice and identifies a concrete, easily implemented improvement (model scores, no dichotomization). The paper also supplies reproducible code and a clear parametric unification, both of which strengthen its contribution to the literature on recursive partitioning.

major comments (2)
  1. [§3] §3 (Embedding): The central claim that differences in power can be attributed to the goodness-of-fit measure rests on the assumption that the three embedded procedures reproduce the original covariate-selection frequencies of GUIDE, CTree, and MOB. No calibration experiment or table comparing selection rates on benchmark data is reported; any mismatch in multiplicity correction, asymptotic approximation, or categorical-variable handling would confound the power comparisons.
  2. [§4.2] §4.2, simulation design: The power results are obtained exclusively inside the unified framework. Without a side-by-side comparison of the embedded versions against the published implementations of GUIDE, CTree, and MOB on the same data-generating processes, it is impossible to verify that the reported superiority of model scores is not an artifact of the embedding.
minor comments (2)
  1. [§2] Notation for the score function and the test statistic is introduced in §2 but not consistently reused in the simulation tables; a single equation reference would improve readability.
  2. [Figure 3] Figure 3 caption does not state the number of Monte Carlo replications or the exact data-generating process used for the power curves.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments highlight important aspects regarding the validation of our embedding approach. We address each major comment below and will make revisions to incorporate additional calibration and comparison results.

Point-by-point responses
  1. Referee: [§3] §3 (Embedding): The central claim that differences in power can be attributed to the goodness-of-fit measure rests on the assumption that the three embedded procedures reproduce the original covariate-selection frequencies of GUIDE, CTree, and MOB. No calibration experiment or table comparing selection rates on benchmark data is reported; any mismatch in multiplicity correction, asymptotic approximation, or categorical-variable handling would confound the power comparisons.

    Authors: We agree that a calibration study would provide stronger support for the claim. The embedding in §3 is designed to replicate the inference procedures of the original algorithms as closely as possible within the parametric linear model tree framework. For example, the score-based tests for MOB and the conditional inference for CTree are implemented using the same test statistics. To address the concern, we will add a new subsection with a calibration experiment comparing selection frequencies of the embedded methods to the original implementations on benchmark datasets. revision: yes

  2. Referee: [§4.2] §4.2, simulation design: The power results are obtained exclusively inside the unified framework. Without a side-by-side comparison of the embedded versions against the published implementations of GUIDE, CTree, and MOB on the same data-generating processes, it is impossible to verify that the reported superiority of model scores is not an artifact of the embedding.

    Authors: The goal of the simulation design is to isolate the effects of the individual building blocks (goodness-of-fit measure, dichotomization, and binning) by varying them within the common framework. This allows us to attribute power differences specifically to these components rather than to other algorithmic differences. We recognize that direct comparisons with the original software would further validate the results. In the revision, we will include such side-by-side comparisons for the data-generating processes used in the simulations. revision: yes

Circularity Check

0 steps flagged

No circularity: unifying framework and power comparisons are independent of inputs.

full rationale

The paper constructs a common parametric inference framework to embed the three algorithms (categorical association, conditional inference, parameter instability) and then evaluates the effect of building blocks such as observation-wise goodness-of-fit measures and dichotomization on selection power. No equation or result is shown to reduce by construction to a fitted quantity from the same data, nor does any central claim rest on a self-citation chain that itself assumes the target result. The embedding and subsequent assessment are presented as new analytical steps whose outputs are not definitionally equivalent to the inputs; external citations to the original algorithms serve only as reference points rather than load-bearing justifications for the power findings.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the framework is described only at the level of existing inference procedures.



Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages · 1 internal anchor

  1. [1]

    Classification and Regression Trees

    Breiman L, Friedman JH, Olshen RA, Stone CJ (1984). Classification and Regression Trees. Wadsworth, California

  2. [2]

    Nonparametric Estimation of Conditional Quantiles Using Quantile Regression Trees

    Chaudhuri P, Loh WY (2002). Nonparametric Estimation of Conditional Quantiles Using Quantile Regression Trees. Bernoulli, 8(5), 561--576. ://projecteuclid.org:443/euclid.bj/1078435218

  3. [3]

    The Use of Automatic Interaction Detector and Similar Search Procedures

    Doyle P (1973). The Use of Automatic Interaction Detector and Similar Search Procedures. Operational Research Quarterly (1970-1977), 24(3), 465--467. ://www.jstor.org/stable/3008131

  4. [4]

    Detecting Treatment-Subgroup Interactions in Clustered Data with Generalized Linear Mixed-Effects Model Trees

    Fokkema M, Smits N, Zeileis A, Hothorn T, Kelderman H (2018). Detecting Treatment-Subgroup Interactions in Clustered Data with Generalized Linear Mixed-Effects Model Trees. Behavior Research Methods, 50(5), 2016--2034. doi:10.3758/s13428-017-0971-x

  5. [5]

    Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned

    Grün B, Kosmidis I, Zeileis A (2012). Extended Beta Regression in R: Shaken, Stirred, Mixed, and Partitioned. Journal of Statistical Software, Articles, 48(11), 1--25. ISSN 1548-7660. doi:10.18637/jss.v048.i11

  6. [6]

    A Lego System for Conditional Inference

    Hothorn T, Hornik K, Van de Wiel MA, Zeileis A (2006a). A Lego System for Conditional Inference. The American Statistician, 60(3), 257--263. doi:10.1198/000313006X118430

  7. [7]

    Unbiased Recursive Partitioning: A Conditional Inference Framework

    Hothorn T, Hornik K, Zeileis A (2006b). Unbiased Recursive Partitioning: A Conditional Inference Framework. Journal of Computational and Graphical Statistics, 15(3), 651--674. doi:10.1198/106186006x133933

  8. [8]

    Generalized Maximally Selected Statistics

    Hothorn T, Zeileis A (2008). Generalized Maximally Selected Statistics. Biometrics, 64(4), 1263--1269. doi:10.1111/j.1541-0420.2008.00995.x

  9. [9]

    Transformation Forests

    Hothorn T, Zeileis A (2017). Transformation Forests. arXiv 1701.02110, arXiv.org E-Print Archive. ://arxiv.org/abs/1701.02110

  10. [10]

    Classification Trees with Unbiased Multiway Splits

    Kim H, Loh WY (2001). Classification Trees with Unbiased Multiway Splits. Journal of the American Statistical Association, 96(454), 589--604. doi:10.1198/016214501753168271

  11. [11]

    Regression Trees with Unbiased Variable Selection and Interaction Detection

    Loh WY (2002). Regression Trees with Unbiased Variable Selection and Interaction Detection. Statistica Sinica, 12(2), 361--386. ://www.jstor.org/stable/24306967

  12. [12]

    A Regression Tree Approach to Identifying Subgroups with Differential Treatment Effects

    Loh WY, He X, Man M (2015). A Regression Tree Approach to Identifying Subgroups with Differential Treatment Effects. Statistics in Medicine, 34(11), 1818--1833. doi:10.1002/sim.6454

  13. [13]

    Split Selection Methods for Classification Trees

    Loh WY, Shih YS (1997). Split Selection Methods for Classification Trees. Statistica Sinica, 7(4), 815--840. ://www.jstor.org/stable/24306157

  14. [14]

    Tree-Structured Classification via Generalized Discriminant Analysis

    Loh WY, Vanichsetakul N (1988). Tree-Structured Classification via Generalized Discriminant Analysis. Journal of the American Statistical Association, 83(403), 715--725. doi:10.1080/01621459.1988.10478652

  15. [15]

    Regression Trees for Longitudinal and Multiresponse Data

    Loh WY, Zheng W (2013). Regression Trees for Longitudinal and Multiresponse Data. The Annals of Applied Statistics, 7(1), 495--522. doi:10.1214/12-AOAS596

  16. [16]

    Problems in the Analysis of Survey Data, and a Proposal

    Morgan JN, Sonquist JA (1963). Problems in the Analysis of Survey Data, and a Proposal. Journal of the American Statistical Association, 58(302), 415--434. doi:10.1080/01621459.1963.10500855

  17. [17]

    The CUSUM Test with OLS Residuals

    Ploberger W, Krämer W (1992). The CUSUM Test with OLS Residuals. Econometrica, 60(2), 271--285. doi:10.2307/2951597

  18. [18]

    Distributional Regression Forests for Probabilistic Precipitation Forecasting in Complex Terrain

    Schlosser L, Hothorn T, Stauffer R, Zeileis A (2019). Distributional Regression Forests for Probabilistic Precipitation Forecasting in Complex Terrain. arXiv 1804.02921, arXiv.org E-Print Archive. ://arxiv.org/abs/1804.02921

  19. [19]

    Model-Based Recursive Partitioning for Subgroup Analyses

    Seibold H, Zeileis A, Hothorn T (2016). Model-Based Recursive Partitioning for Subgroup Analyses. The International Journal of Biostatistics, 12(1), 45--63. doi:10.1515/ijb-2015-0032

  20. [20]

    On the Asymptotic Theory of Permutation Statistics

    Strasser H, Weber C (1999). On the Asymptotic Theory of Permutation Statistics. Mathematical Methods of Statistics, 8, 220--250

  21. [21]

    Rasch Trees: A New Method for Detecting Differential Item Functioning in the Rasch Model

    Strobl C, Kopf J, Zeileis A (2015). Rasch Trees: A New Method for Detecting Differential Item Functioning in the Rasch Model. Psychometrika, 80(2), 289--316. doi:10.1007/s11336-013-9388-3

  22. [22]

    Maximum Likelihood Regression Trees

    Su X, Wang M, Fan J (2004). Maximum Likelihood Regression Trees. Journal of Computational and Graphical Statistics, 13(3), 586--598. doi:10.1198/106186004X2165

  23. [23]

    Generalized M-Fluctuation Tests for Parameter Instability

    Zeileis A, Hornik K (2007). Generalized M-Fluctuation Tests for Parameter Instability. Statistica Neerlandica, 61(4), 488--508. doi:10.1111/j.1467-9574.2007.00371.x

  24. [24]

    A Toolbox of Permutation Tests for Structural Change

    Zeileis A, Hothorn T (2013). A Toolbox of Permutation Tests for Structural Change. Statistical Papers, 54(4), 931--954. doi:10.1007/s00362-013-0503-4

  25. [25]

    Model-Based Recursive Partitioning

    Zeileis A, Hothorn T, Hornik K (2008). Model-Based Recursive Partitioning. Journal of Computational and Graphical Statistics, 17(2), 492--514. doi:10.1198/106186008x319331
