State-of-the-art in selection of variables and functional forms in multivariable analysis -- outstanding issues

Aris Perperoglou; Daniela Dunkler; Frank E. Harrell Jr; Georg Heinze (for TG2 of the STRATOS initiative); Harald Binder; Heiko Becher; Matthias Schmid; Michal Abrahamowicz; Patrick Royston; Willi Sauerbrei

arxiv: 1907.00786 · v1 · pith:7BIMU7YUnew · submitted 2019-07-01 · 📊 stat.ME

Willi Sauerbrei, Aris Perperoglou, Matthias Schmid, Michal Abrahamowicz, Heiko Becher, Harald Binder, Daniela Dunkler, Frank E. Harrell Jr, Patrick Royston, Georg Heinze (for TG2 of the STRATOS initiative)

Pith reviewed 2026-05-25 12:06 UTC · model grok-4.3

classification 📊 stat.ME

keywords variable selection, functional form selection, multivariable regression, model building, statistical modelling, method comparison, evidence review

The pith

No sufficient evidence exists yet to recommend methods for selecting variables and functional forms in multivariable models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This review examines long-used approaches to choosing variables and modeling continuous predictors in regression analysis. Traditional ad hoc techniques and more recent alternatives have both been available for decades, yet the authors find that direct comparisons of their operating characteristics remain rare. Without such comparisons, no evidence-based guidance can be offered to researchers who have only basic statistical training. Two medical examples illustrate the practical difficulties, and the paper identifies seven specific topics that need further study to build a foundation for recommendations.

Core claim

An overview of the literature shows that there is not yet enough evidence on which to base recommendations for the selection of variables and functional forms in multivariable analysis. Such evidence may come from comparisons between alternative methods. In particular, seven important topics require further investigation.

What carries the argument

Overview of general issues in descriptive regression modeling together with strategies for variable selection, approaches to choosing functional forms for continuous variables, and techniques that combine both selections, illustrated by medical examples.

If this is right

Recommendations for practitioners cannot yet be supported by evidence.
Direct comparisons between methods are required before guidance can be offered.
Seven priority topics have been identified for future research on multivariable modeling.
Fields that rely on regression models, such as medicine, will continue to use ad hoc choices until better comparisons exist.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Standardized simulation designs and performance metrics would make future method comparisons more cumulative.
The same evidence gaps likely affect automated model-building procedures used in larger data settings.

Load-bearing premise

The authors' judgment that knowledge of the properties of alternative methods and meaningful comparisons between them remain scarce is accurate and complete.

What would settle it

A large-scale simulation study or set of real-data analyses that systematically compares several methods across varied scenarios and shows consistent performance differences would supply the missing evidence.

Original abstract

How to select variables and identify functional forms for continuous variables is a key concern when creating a multivariable model. Ad hoc 'traditional' approaches to variable selection have been in use for at least 50 years. Similarly, methods for determining functional forms for continuous variables were first suggested many years ago. More recently, many alternative approaches to address these two challenges have been proposed, but knowledge of their properties and meaningful comparisons between them are scarce. To define a state-of-the-art and to provide evidence-supported guidance to researchers who have only a basic level of statistical knowledge many outstanding issues in multivariable modelling remain. Our main aims are to identify and illustrate such gaps in the literature and present them at a moderate technical level to the wide community of practitioners, researchers and students of statistics. We briefly discuss general issues in building descriptive regression models, strategies for variable selection, different ways of choosing functional forms for continuous variables, and methods for combining the selection of variables and functions. We discuss two examples, taken from the medical literature, to illustrate problems in the practice of modelling. Our overview revealed that there is not yet enough evidence on which to base recommendations for the selection of variables and functional forms in multivariable analysis. Such evidence may come from comparisons between alternative methods. In particular, we highlight seven important topics that require further investigation and make suggestions for the direction of further research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This STRATOS review flags seven specific gaps in comparative evidence for variable selection and functional forms but supplies no new comparisons or data itself.

The main point is that Sauerbrei et al. have compiled an expert overview of longstanding problems in multivariable modeling and listed seven topics where head-to-head evidence is still thin. The paper does not test any methods or run simulations; it points out where such work is needed. It does a reasonable job laying out the usual issues with stepwise selection, describing alternatives like fractional polynomials and splines, and showing two medical examples where bad modeling choices affect interpretation. The moderate technical tone matches their stated goal of reaching practitioners and students. The authors are a credible group with long experience in this area, so their judgment on what counts as a meaningful comparison carries some weight. The soft spots are that the claim of scarce evidence rests on their narrative synthesis rather than a systematic literature search with explicit criteria, which leaves room for missed studies or differing views on what has already been compared. The paper is from 2019, so some of the listed gaps may have narrowed since. No equations or derivations appear, so there are no technical inconsistencies to check. This is mainly for applied statisticians and medical researchers who routinely fit regression models and want a structured list of open questions rather than a new technique. A reader already familiar with the basic methods would get the most from the gap agenda. It deserves peer review because it can help set priorities for methodological work, though referees would likely press for more detail on how the seven topics were selected and whether recent simulation studies were considered.

Referee Report

1 major / 1 minor

Summary. The paper provides an overview of methods for variable selection and determining functional forms in multivariable regression analysis. It discusses general issues in model building, strategies for variable selection, approaches to functional forms for continuous variables, and methods combining both. Using two examples from the medical literature, it illustrates practical problems. The main conclusion is that there is not yet enough evidence to base recommendations, and seven important topics are identified as requiring further investigation through comparisons of alternative methods.

Significance. If the synthesis of the literature holds, the paper's identification of specific gaps and suggestions for future research could significantly influence the direction of methodological work in statistics, especially in applied fields like medicine. It promotes the idea that evidence from method comparisons is needed, which aligns with good scientific practice. The moderate technical level makes it accessible to a wide audience.

major comments (1)

The claim that 'knowledge of the properties of alternative methods and meaningful comparisons between them are scarce' underpins the identification of outstanding issues; however, the manuscript does not detail the search strategy or inclusion criteria used for the overview, raising a risk that the assessment of evidence scarcity may not be comprehensive (see paragraph on main aims).

minor comments (1)

The sentence 'To define a state-of-the-art and to provide evidence-supported guidance to researchers who have only a basic level of statistical knowledge many outstanding issues in multivariable modelling remain' appears incomplete or awkwardly phrased; consider revising for clarity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive review and positive assessment of the paper's potential impact. We address the single major comment below.

Referee: The claim that 'knowledge of the properties of alternative methods and meaningful comparisons between them are scarce' underpins the identification of outstanding issues; however, the manuscript does not detail the search strategy or inclusion criteria used for the overview, raising a risk that the assessment of evidence scarcity may not be comprehensive (see paragraph on main aims).

Authors: We agree that greater transparency would strengthen the manuscript. Our overview is not presented as a systematic review but draws on the authors' long-standing expertise in the field, familiarity with key methodological papers, and awareness of existing (mostly limited) comparison studies. To address the concern, we will add a short paragraph in the section describing the main aims that explains the basis for our assessment, including reference to major reviews and the process by which the seven topics were identified. This revision will clarify the scope without overstating the comprehensiveness of the literature synthesis.

revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper is a literature review and gap analysis with no mathematical derivations, parameter fitting, equations, or model predictions. Its central claim—that comparative evidence on variable selection and functional form methods remains insufficient—rests on a synthesis of external studies rather than any self-referential construction, fitted input renamed as prediction, or load-bearing self-citation chain. No step reduces by construction to the paper's own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a review paper; it introduces no free parameters, new axioms, or invented entities.

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · 1 internal anchor

[1]

have been proposed in an attempt to reduce overestimation bias due to selection of variables and functional forms. It is still unclear how a correction for the variance, e.g., by using a sandwich variance estimator that is robust to model misspecification (White, 1980a; White, 1980b), may help in achieving approximately valid confidence intervals
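For readers unfamiliar with the estimator named in this excerpt, its basic heteroskedasticity-robust form (often labelled HC0) for a linear model can be written as below. The notation is an assumption made here for illustration and is not taken from the paper: X is the design matrix with rows x_i, and ê_i are the least-squares residuals. The formula treats the fitted model as fixed, so it does not by itself account for the selection step, which is the part the excerpt describes as unclear.

```latex
\widehat{\operatorname{Var}}(\hat\beta)
  = (X^\top X)^{-1}
    \Bigl( \sum_{i=1}^{n} \hat e_i^{\,2}\, x_i x_i^\top \Bigr)
    (X^\top X)^{-1},
\qquad
\hat e_i = y_i - x_i^\top \hat\beta .
```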

[2]

logarithmic or quadratic)

Continuous variables – to categorize or to model? The effects of continuous predictors are typically modelled by either categorizing them (which raises issues such as the number of categories, cutpoint values, implausibility of the resulting step-function relationships, local biases, power loss, or invalidity of inference due to data-dependent cutpoints) ...

[3]

Traditional

Combining variable and function selection Variable selection in the presence of non-linear relationships of covariates with the outcome is an even more complicated exercise. In fact, decisions regarding the inclusion/exclusion of specific variables and modelling of the functional forms of both these variables and potential confounders may depend on each ...

[4]

Examples illustrating the problems In this section, we illustrate by way of selected published analyses of observational studies that guidance on many aspects of multivariable modelling outlined above is urgently needed. 6.1 A case of popular but highly problematic variable selection Ramaiola et al (2015) conducted linear regression analyses to identify p...

[5]

Unfortunately, for all of them knowledge of their properties and the number of informative comparisons are limited

Towards state of the art – research required! In the earlier sections, we have illustrated that many variable selection procedures are available. Unfortunately, for all of them knowledge of their properties and the number of informative comparisons are limited. In all sections we raised many issues requiring further research. We identified several papers...

[6]

conditional model

Discussion In a joint effort, members of TG2 identified seven issues which we consider key to building a multivariable model with continuous variables. The list could be extended, as other experts in the field may have different experiences and preferences and may consider other issues more important. We welcome discussion and critique as this would tr...

[7]

We thank Tim Haeussler, Andreas Ott and Christine Wallisch for administrative assistance

Acknowledgment This work was supported by the Deutsche Forschungsgemeinschaft [SA580/10-1] to Willi Sauerbrei, by the European Commission’s programme Erasmus+ Staff Mobility for Training during the fellowship of Georg Heinze in Freiburg in November 2018 and by UK Medical Research Council programmes MC to Patrick Royston [UU_12023/21, MC_UU_12023/29]. We t...

[8]

Generalized Additive Model Selection

References ABRAHAMOWICZ, M., DU BERGER, R. and GROVER, S. A. (1997). Flexible modeling of the effects of serum cholesterol on coronary heart disease mortality. American Journal of Epidemiology. 145 714–729. ALTMAN, D. G. and ANDERSEN, P. K. (1989). Bootstrap investigation of the stability of a Cox regression model. Statistics in Medicine. 8 771–783. ALTMA...

[9]

Methods based on spline functions Splines (piecewise polynomial functions) come in many shapes and forms. A basic dichotomy is between regression splines, which focus on the polynomial choice and finding a set of knot locations, and smoothing splines, which focus on minimization of a penalty function. Regression splines are particularly attractive due t...

[10]

Smoothing splines extend cubic splines by placing a knot at each observation and adding a roughness penalty to control the smoothness of the fit

emphasize the benefit of unequally spaced knots, to avoid placement in empty regions of the domain of the continuous variable. Smoothing splines extend cubic splines by placing a knot at each observation and adding a roughness penalty to control the smoothness of the fit. Eilers and Marx (1996) introduced P-splines by extending and simplifying ideas firs...
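The roughness penalty mentioned in this excerpt is usually written as a penalized least-squares criterion. The display below is a standard textbook form given as a sketch; the symbols (f for the fitted function, λ for the smoothing parameter, a_j for B-spline coefficients, Δ^d for the d-th order difference) are assumptions introduced here and do not come from the paper.

```latex
% Smoothing spline: fit versus roughness, traded off by \lambda \ge 0
\hat f = \arg\min_{f} \; \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2
         + \lambda \int f''(x)^2 \, dx

% P-spline variant: the integral is replaced by a difference penalty
% on the coefficients a_j of a B-spline basis
\hat a = \arg\min_{a} \; \sum_{i=1}^{n} \Bigl( y_i - \sum_{j} a_j B_j(x_i) \Bigr)^2
         + \lambda \sum_{j} \bigl( \Delta^{d} a_j \bigr)^2
```

As λ approaches 0 the smoothing spline interpolates the data, and as λ grows it approaches a straight-line fit, so the choice of λ plays the role that knot selection plays for regression splines.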

[11]

The selected model and functional forms should be interpretable from a subject-matter perspective

Fractional polynomials With fractional polynomials (FPs), the aim is to extract full information from continuous variables in univariable and multivariable settings, resulting in models with simple and plausible functional forms. The selected model and functional forms should be interpretable from a subject-matter perspective. Interpretability, transport...

[12]

not significant

1. Test the best-fitting FP2 model for X at significance level α against the null model using 4 d.f. If the test is not significant, stop, concluding that the effect of X is “not significant” at the α level. Otherwise continue.

[13]

If the test is not significant, stop, the final model being a straight line

2. Test the best-fitting FP2 for X against a straight line at the α level using 3 d.f. If the test is not significant, stop, the final model being a straight line. Otherwise continue.

[14]

If the test is not significant, the final model is FP1, otherwise the final model is FP2

3. Test the best-fitting FP2 for X against the best FP1 for X at the α level using 2 d.f. If the test is not significant, the final model is FP1, otherwise the final model is FP2. End of procedure. The test at step 1 is of overall association of the outcome with X. The test at step 2 examines the evidence for nonlinearity. The test at step 3 chooses betwee...
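The three tests quoted in the excerpts above can be sketched as a short closed test procedure. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a Gaussian linear model with a single strictly positive predictor, the conventional FP power set {-2, -1, -0.5, 0, 0.5, 1, 2, 3} (power 0 meaning log x, a repeated power adding a log x factor), and chi-square tests on deviance differences with 4, 3 and 2 d.f. The names `fsp`, `design`, `rss` and `chi2_sf` are hypothetical helpers introduced for illustration only.

```python
import math
from itertools import combinations_with_replacement

# Conventional FP power set (an assumption here); power 0 stands for log(x).
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def design(xs, powers):
    """Rows [1, t_1, ...]; a repeated power gets an extra log(x) factor."""
    rows = []
    for x in xs:
        row = [1.0]
        for j, p in enumerate(powers):
            term = math.log(x) if p == 0 else x ** p
            if j > 0 and powers[j - 1] == p:
                term *= math.log(x)
            row.append(term)
        rows.append(row)
    return rows


def rss(rows, ys):
    """Residual sum of squares of the least-squares fit (normal equations)."""
    k = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(col + 1, k):
            f = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= f * a[col][c]
    beta = [0.0] * k
    for i in reversed(range(k)):  # back substitution
        tail = sum(a[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (a[i][k] - tail) / a[i][i]
    return sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, y in zip(rows, ys))


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution for df in {2, 3, 4}."""
    x = max(x, 0.0)
    if df == 2:
        return math.exp(-x / 2)
    if df == 3:
        return (math.erfc(math.sqrt(x / 2))
                + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))
    if df == 4:
        return math.exp(-x / 2) * (1 + x / 2)
    raise ValueError("only 2, 3 or 4 d.f. are needed here")


def fsp(xs, ys, alpha=0.05):
    """Closed test: FP2 vs null (4 d.f.), vs linear (3 d.f.), vs FP1 (2 d.f.)."""
    n = len(xs)

    def dev(powers):
        # Gaussian deviance up to an additive constant
        return n * math.log(rss(design(xs, powers), ys) / n)

    best_fp1 = min(((p,) for p in POWERS), key=dev)
    best_fp2 = min(combinations_with_replacement(POWERS, 2), key=dev)
    d_fp2 = dev(best_fp2)
    if chi2_sf(dev(()) - d_fp2, 4) >= alpha:
        return ("excluded", ())
    if chi2_sf(dev((1,)) - d_fp2, 3) >= alpha:
        return ("linear", (1,))
    if chi2_sf(dev(best_fp1) - d_fp2, 2) >= alpha:
        return ("FP1", best_fp1)
    return ("FP2", best_fp2)


if __name__ == "__main__":
    # Illustrative data only: a U-shaped curve with a small deterministic wiggle.
    xs = [0.5 + 3.5 * i / 99 for i in range(100)]
    ys = [(x - 2) ** 2 + 0.001 * math.sin(37 * i) for i, x in enumerate(xs)]
    print(fsp(xs, ys))
```

The sketch uses chi-square approximations throughout for simplicity; software for normal-errors models may use F-tests instead, and a multivariable version would cycle this procedure over all continuous predictors while adjusting for the others.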

