Feature quantization for parsimonious and interpretable predictive models

Adrien Ehrhardt; Christophe Biernacki; Philippe Heinrich; Vincent Vandewalle

arxiv: 1903.08920 · v1 · pith:DAUBWWL5new · submitted 2019-03-21 · 📊 stat.ME · econ.EM



keywords predictive · optimization · quantization · step · accuracy · categorical · consumer · continuous


For regulatory and interpretability reasons, logistic regression is still widely used. To improve prediction accuracy and interpretability, a preprocessing step quantizing both continuous and categorical data is usually performed: continuous features are discretized and, if numerous, levels of categorical features are grouped. Even better predictive accuracy can be reached by embedding this quantization estimation step directly into the predictive estimation step itself. Doing so, however, requires optimizing the predictive loss over a huge set of candidate quantizations. To overcome this difficulty, we introduce a specific two-step optimization strategy: first, the optimization problem is relaxed by approximating discontinuous quantization functions by smooth functions; second, the resulting relaxed optimization problem is solved via a particular neural network. The good performance of this approach, which we call glmdisc, is illustrated on simulated and real data from the UCI library and Crédit Agricole Consumer Finance (a major European historic player in the consumer credit market).
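The relaxation step described in the abstract can be illustrated with a minimal sketch. It assumes the smooth approximation is a softmax over affine scores of the feature, which the abstract does not confirm. The function names (`hard_quantize`, `soft_quantize`, `affine_scores_from_cutpoints`) and the `temperature` parameter are hypothetical and are not part of the glmdisc package.

```python
import numpy as np

def hard_quantize(x, cutpoints):
    """One-hot encode each value by the interval it falls in (discontinuous in x)."""
    bins = np.digitize(x, cutpoints)            # interval index per sample
    return np.eye(len(cutpoints) + 1)[bins]     # shape (n, m)

def soft_quantize(x, intercepts, slopes, temperature=1.0):
    """Assumed smooth surrogate: softmax over m affine scores of x."""
    scores = (intercepts + np.outer(x, slopes)) / temperature   # shape (n, m)
    scores -= scores.max(axis=1, keepdims=True)                 # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

def affine_scores_from_cutpoints(cutpoints):
    """Illustrative parameters whose argmax boundaries sit at the sorted cutpoints.

    With score_h(x) = h * x - sum(cutpoints[:h]), consecutive scores differ by
    x - cutpoints[h - 1], so level h wins exactly between its two cutpoints.
    """
    cutpoints = np.asarray(cutpoints, dtype=float)
    slopes = np.arange(len(cutpoints) + 1, dtype=float)
    intercepts = -np.concatenate(([0.0], np.cumsum(cutpoints)))
    return intercepts, slopes

x = np.array([-1.0, 0.5, 2.0])
cutpoints = [0.0, 1.0]
intercepts, slopes = affine_scores_from_cutpoints(cutpoints)

hard = hard_quantize(x, cutpoints)
soft = soft_quantize(x, intercepts, slopes, temperature=1.0)
sharp = soft_quantize(x, intercepts, slopes, temperature=0.01)
```

Because `soft_quantize` is differentiable in `intercepts` and `slopes`, these parameters could in principle be fitted by gradient descent jointly with the logistic regression coefficients, which is the kind of problem a small neural network can handle. Lowering `temperature` pushes the soft encoding toward the one-hot encoding it approximates.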


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Calibrating Model-Based Evaluation Metrics for Summarization
cs.CL 2026-04 unverdicted novelty 5.0

A reference-free proxy scoring framework combined with GIRB calibration produces better-aligned evaluation metrics for summarization and outperforms baselines across seven datasets.