A Bregman Perspective on Classification and Regression Trees

Mathias Bourel


Classification and Regression Trees (CART) constitute one of the most influential paradigms in statistical learning. Although a variety of impurity measures have been proposed for different statistical models, these criteria are typically introduced on a case-by-case basis and analyzed separately. In this paper, we study CART through the lens of Bregman divergences. This perspective places the classical least-squares criterion, Poisson deviance, Kullback-Leibler-type losses, and other impurity measures associated with exponential-family models within a common framework. As a result, key ingredients of the CART methodology (node representatives, impurity measures, and split selection rules) can be expressed and analyzed through general properties of convex functions rather than through separate model-specific constructions. Beyond the algorithmic formulation, we investigate theoretical properties of Bregman-based CART procedures. In particular, we analyze how geometric properties of the generating convex function influence impurity reductions and the stability of recursive partitions. We also establish consistency results within the proposed framework, providing a unified theoretical treatment for a broad family of CART-type procedures. Our results provide a geometric interpretation of impurity-based tree construction and show that many classical CART impurity criteria admit a common interpretation within a Bregman framework.
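To make the unifying idea concrete, here is a minimal sketch of a Bregman-based impurity and split gain. It is not the paper's implementation: the function names (`bregman`, `node_impurity`, `split_gain`) and the toy data are hypothetical, and the sketch assumes a scalar response with the node mean as the node representative (the mean is the standard minimizer of the average Bregman divergence in its second argument). Swapping the generating convex function changes the criterion from least squares to a Poisson-deviance-type loss without changing the surrounding code.

```python
import math

def bregman(phi, dphi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - phi'(y) * (x - y)."""
    return phi(x) - phi(y) - dphi(y) * (x - y)

def node_impurity(phi, dphi, ys):
    """Average divergence from each response to the node mean (assumed representative)."""
    mean = sum(ys) / len(ys)
    return sum(bregman(phi, dphi, y, mean) for y in ys) / len(ys)

def split_gain(phi, dphi, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n_l, n_r = len(left), len(right)
    parent = node_impurity(phi, dphi, left + right)
    children = (n_l * node_impurity(phi, dphi, left)
                + n_r * node_impurity(phi, dphi, right)) / (n_l + n_r)
    return parent - children

# Generator phi(x) = x^2 gives D(x, y) = (x - y)^2, i.e. the least-squares criterion.
SQUARED = (lambda x: x * x, lambda x: 2.0 * x)

# Generator phi(x) = x log x - x gives D(x, y) = x log(x / y) - x + y,
# which is half the Poisson unit deviance. This sketch assumes strictly
# positive responses; zero counts would need the convention 0 log 0 = 0.
POISSON = (lambda x: x * math.log(x) - x, lambda x: math.log(x))

# Hypothetical toy split of four responses into two children.
left, right = [1.0, 2.0], [5.0, 6.0]
gain_sq = split_gain(*SQUARED, left, right)
gain_po = split_gain(*POISSON, left, right)
print(gain_sq, gain_po)
```

With the squared generator, `node_impurity` reduces to the within-node variance, so `split_gain` is the usual CART variance reduction; with the Poisson generator the same two functions score splits by a deviance reduction instead.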

