Statistical significance in choice modelling: computation, usage and reporting

Andrew Daly; Angelo Guevara; Michiel Bliemer; Ricardo Daziano; Stephane Hess; Thijs Dekker

arXiv: 2506.05996 · v4 · pith:BV2KR42Fnew · submitted 2025-06-06 · econ.EM

Pith reviewed 2026-05-22 01:18 UTC · model grok-4.3

classification: econ.EM

keywords: statistical significance, choice modelling, confidence intervals, p-values, willingness to pay, discrete choice, reporting standards, random heterogeneity

The pith

Choice modelling papers over-rely on 95% confidence levels, often misinterpret what statistical significance means, and report measures of uncertainty imprecisely.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reviews the sources of uncertainty in choice model parameters, explains how to compute confidence intervals and apply statistical tests, and argues for moving past automatic use of 95% thresholds. It documents imprecise reporting practices, especially with p-values and star symbols, and stresses that statistical significance must be weighed against behavioural or policy relevance. The authors highlight choice-modelling specifics such as uncertainty in willingness-to-pay measures, random heterogeneity, and repeated choice data. A sympathetic reader would care because these habits directly shape the credibility of policy conclusions drawn from discrete choice studies in transport, marketing, and economics.

Core claim

The paper claims that choice modelling exhibits the same over-reliance on 95% confidence levels seen elsewhere in science, along with widespread misunderstandings of significance and imprecise reporting of uncertainty measures, particularly p-values and star indicators. It argues that behavioural or policy significance should receive equal attention, and that derived measures such as willingness-to-pay, random heterogeneity parameters, and results from repeated choice data require special handling in uncertainty calculations and reporting.

What carries the argument

The distinction between statistical significance and behavioural or policy significance, together with explicit computation of confidence intervals for both parameters and derived quantities.

If this is right

Reporting should shift toward precise confidence intervals rather than reliance on p-value stars or binary significance declarations.
Analyses must separately evaluate whether statistically significant effects are large enough to matter for policy or behaviour.
Uncertainty propagation for willingness-to-pay and other derived measures requires explicit treatment in every study.
Models with random heterogeneity and repeated choices need tailored approaches to testing and reporting statistical significance.
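
To make the point about derived measures concrete, here is a minimal sketch of one common way to propagate uncertainty into a willingness-to-pay ratio: the first-order delta method. It is an illustration only and not necessarily the procedure the authors recommend. The function name `wtp_delta_method` and all numerical inputs are hypothetical; the variances and covariance are assumed to come from the estimated covariance matrix of the model.

```python
import math
from statistics import NormalDist


def wtp_delta_method(b_attr, b_cost, var_attr, var_cost, cov_attr_cost,
                     conf_level=0.95):
    """Point estimate, delta-method standard error and CI for b_attr / b_cost."""
    if b_cost == 0:
        raise ValueError("b_cost must be non-zero for the ratio to be defined")
    wtp = b_attr / b_cost
    # First-order (delta method) variance of a ratio of two estimates.
    variance = (var_attr - 2.0 * wtp * cov_attr_cost
                + wtp ** 2 * var_cost) / b_cost ** 2
    std_err = math.sqrt(variance)
    z_crit = NormalDist().inv_cdf(0.5 + conf_level / 2.0)
    return wtp, std_err, (wtp - z_crit * std_err, wtp + z_crit * std_err)


# Hypothetical estimates (illustration only): time and cost coefficients,
# with variances and covariance taken from an estimated covariance matrix.
wtp, std_err, (ci_low, ci_high) = wtp_delta_method(
    b_attr=-0.06, b_cost=-0.20,
    var_attr=0.0001, var_cost=0.0016, cov_attr_cost=0.0001,
)
print(f"WTP = {wtp:.3f}, s.e. = {std_err:.3f}, "
      f"95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

The covariance term matters: leaving it out, or reading significance of the ratio off the significance of the two coefficients, can give a misleading interval.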

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Widespread adoption of these reporting norms could reduce publication bias toward statistically significant but substantively small effects.
Journals in economics and transport could adopt checklists that require both statistical and behavioural significance statements.
Software packages for choice modelling might add automated routines that compute and display policy-relevant effect sizes alongside p-values.

Load-bearing premise

The authors' observation of imprecise reporting practices in many studies is representative enough of the broader literature to justify general recommendations without systematic quantification of the problem.

What would settle it

A quantitative audit that samples several hundred published choice modelling papers and counts the share using star symbols, vague p-value statements, or missing confidence intervals for willingness-to-pay and random parameters.
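
As a rough sketch of how such an audit could report its headline number, the share of sampled papers showing a given practice can be given with a Wilson score interval rather than a bare percentage. The counts below are invented for illustration, and `wilson_interval` is a hypothetical helper, not something taken from the paper.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, conf_level=0.95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + conf_level / 2.0)
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n
                               + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


# Hypothetical audit: 120 of 300 sampled papers use star symbols only.
share_low, share_high = wilson_interval(successes=120, n=300)
print(f"share = {120 / 300:.1%}, 95% interval = [{share_low:.1%}, {share_high:.1%}]")
```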

Figures

Figures reproduced from arXiv: 2506.05996 by Andrew Daly, Angelo Guevara, Michiel Bliemer, Ricardo Daziano, Stephane Hess, Thijs Dekker.

**Figure 2.** Relationship between estimates and log-likelihood values used for likelihood ratio, Wald, …
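
The caption refers to likelihood ratio and Wald tests. As a hedged illustration of how the two are computed for a single restriction (one degree of freedom), the sketch below uses only the Python standard library. The log-likelihood values, the estimate and the standard error are hypothetical, and the helper names are not from the paper.

```python
import math


def likelihood_ratio_test_1df(ll_restricted, ll_unrestricted):
    """LR statistic and p-value for a single restriction (chi-square, 1 df)."""
    lr_stat = 2.0 * (ll_unrestricted - ll_restricted)
    if lr_stat < 0:
        raise ValueError("unrestricted model cannot have a lower log-likelihood")
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return lr_stat, math.erfc(math.sqrt(lr_stat / 2.0))


def wald_test(estimate, std_err, null_value=0.0):
    """Wald statistic (squared t-ratio) and its chi-square(1) p-value."""
    wald_stat = ((estimate - null_value) / std_err) ** 2
    return wald_stat, math.erfc(math.sqrt(wald_stat / 2.0))


# Hypothetical values, for illustration only.
lr_stat, lr_p = likelihood_ratio_test_1df(ll_restricted=-1005.2,
                                          ll_unrestricted=-1002.1)
wald_stat, wald_p = wald_test(estimate=-0.05, std_err=0.02)
print(f"LR   = {lr_stat:.2f}, p = {lr_p:.4f}")
print(f"Wald = {wald_stat:.2f}, p = {wald_p:.4f}")
```

The two tests are asymptotically equivalent but need not agree in finite samples, which is one reason to report the exact statistic and p-value rather than a star.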

This paper offers a commentary on the use of notions of statistical significance in choice modelling. We review the reasons for uncertainty in parameter estimates, provide a precise discussion on the computation of measures of uncertainty and confidence intervals, and discuss the use of statistical tests. We argue that, as in many other areas of science, there is an over-reliance on 95% confidence levels, and misunderstandings of the meaning of significance. We also observe a lack of precision in the reporting of measures of uncertainty in many studies, especially when using p-values and even more so with star measures. The paper also stresses the importance of considering behavioural or policy significance in addition to statistical significance. Finally, we stress a number of points that are specific to choice modelling and which require special attention, notably in relation to derived measures such as willingness-to-pay, the treatment of random heterogeneity, and the use of repeated choice data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

A practical reminder for choice modellers on handling and reporting statistical significance, with some field-specific caveats but limited new evidence.

This paper is a commentary that pushes for more thoughtful use of statistical significance in choice modelling. It covers the sources of uncertainty in estimates, how to calculate confidence intervals and conduct tests, and the problems with over-relying on 95% levels or star systems for p-values. The authors do a solid job on the standard statistical points and make them accessible. They correctly note that significance is about the data, not the importance of the finding, and they highlight the need to look at behavioural or policy relevance as well.

Where it adds value is in the choice modelling specifics. They point out issues with reporting uncertainty for willingness-to-pay measures, which are often ratios of parameters. They also discuss random heterogeneity and the complications from repeated choices by the same individuals, where observations are not independent. These are real practical concerns in the field.

The softer area is the claim that imprecise reporting is common. The paper observes a lack of precision in many studies but does not include a systematic review or any counts of how often stars or incomplete p-value reporting appear. That makes the diagnosis feel anecdotal rather than measured.

This kind of paper is aimed at applied researchers and journal editors in discrete choice analysis. People working with stated preference data or travel demand models could pick up useful habits from it. It deserves a serious referee. The advice is grounded in correct principles, and feedback could refine the recommendations for derived measures. I would recommend sending it out for review rather than desk-rejecting it.
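
The remark about repeated choices can be illustrated with a small sketch. It assumes a one-parameter binary logit and the standard sandwich estimator with score contributions summed within each person, and it applies no small-sample correction. This is a generic textbook construction, not the authors' own code; the data and the function names are hypothetical.

```python
import math


def fit_binary_logit(x, y, iterations=25):
    """Newton-Raphson MLE of beta in P(y = 1) = 1 / (1 + exp(-beta * x))."""
    beta = 0.0
    for _ in range(iterations):
        probs = [1.0 / (1.0 + math.exp(-beta * xi)) for xi in x]
        score = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, probs))
        info = sum(pi * (1.0 - pi) * xi * xi for xi, pi in zip(x, probs))
        beta += score / info
    return beta


def classical_and_clustered_se(x, y, person_id, beta):
    """Classical and person-clustered (sandwich) standard errors for beta."""
    probs = [1.0 / (1.0 + math.exp(-beta * xi)) for xi in x]
    info = sum(pi * (1.0 - pi) * xi * xi for xi, pi in zip(x, probs))
    # Sum score contributions within each person before squaring, so that
    # correlation across one person's choices enters the "meat" term.
    score_by_person = {}
    for xi, yi, pi, pid in zip(x, y, probs, person_id):
        score_by_person[pid] = score_by_person.get(pid, 0.0) + (yi - pi) * xi
    meat = sum(s * s for s in score_by_person.values())
    return math.sqrt(1.0 / info), math.sqrt(meat / info ** 2)


# Hypothetical data: 4 people, 3 binary choices each; x is an attribute difference.
x = [1, 2, -1] * 4
y = [1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0]
person_id = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]

beta_hat = fit_binary_logit(x, y)
classical_se, clustered_se = classical_and_clustered_se(x, y, person_id, beta_hat)
print(f"beta = {beta_hat:.3f}, classical s.e. = {classical_se:.3f}, "
      f"clustered s.e. = {clustered_se:.3f}")
```

In this toy dataset each person's choices point the same way, so the clustered standard error comes out larger than the classical one; treating the twelve choices as independent would overstate precision.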

Referee Report

1 major / 1 minor

Summary. This paper offers a commentary on the use of statistical significance in choice modelling. It reviews reasons for uncertainty in parameter estimates, provides a discussion on the computation of measures of uncertainty and confidence intervals, critiques over-reliance on 95% confidence levels and misunderstandings of significance, observes imprecise reporting of p-values and star measures in many studies, stresses the importance of behavioural or policy significance in addition to statistical significance, and highlights choice-modelling-specific issues such as willingness-to-pay, random heterogeneity, and repeated choice data.

Significance. The manuscript correctly recalls and applies standard statistical principles to the context of discrete choice models. If its observations on common reporting practices prove representative, the paper could usefully raise awareness and encourage more precise reporting and interpretation of significance measures in applied choice modelling work. The explicit call to consider policy or behavioural relevance alongside statistical significance is a constructive contribution for empirical researchers.

major comments (1)

[Discussion of reporting practices and p-values/stars] The central claim that there is a lack of precision in the reporting of measures of uncertainty in many studies (especially p-values and star measures) is presented as an observational premise without a systematic literature review, defined sample of papers, or frequency counts of the practices criticised. This observation underpins the recommendations for improved reporting standards; without quantification it is difficult to judge whether the pattern is representative of the broader choice-modelling literature or merely illustrative.

minor comments (1)

[Abstract and introduction] The abstract and opening sections could more explicitly delimit the body of literature from which the observational examples are drawn, to help readers assess the scope of the commentary.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive assessment of our commentary and for the constructive feedback. We address the major comment below.

Referee: The central claim that there is a lack of precision in the reporting of measures of uncertainty in many studies (especially p-values and star measures) is presented as an observational premise without a systematic literature review, defined sample of papers, or frequency counts of the practices criticised. This observation underpins the recommendations for improved reporting standards; without quantification it is difficult to judge whether the pattern is representative of the broader choice-modelling literature or merely illustrative.

Authors: We acknowledge that our observation regarding imprecise reporting of p-values and star measures is presented without a systematic literature review, defined sample, or frequency counts. As the paper is a commentary rather than an empirical study of reporting practices, we did not conduct such an analysis. The statement reflects patterns noted during the preparation of this discussion and our experience with the literature. To address the concern, we will make a partial revision by adding explicit language to frame the observation as illustrative and based on common practices encountered in the field, rather than implying a comprehensive survey. This will clarify the context for our recommendations on reporting standards.

revision: partial

Circularity Check

0 steps flagged

Discursive commentary on reporting practices contains no derivation chain or self-referential reductions

The paper is a commentary that reviews uncertainty in estimates, computation of confidence intervals, statistical tests, over-reliance on 95% levels, imprecise reporting of p-values and stars, and the need to consider behavioural significance alongside statistical significance. It contains no equations, fitted parameters, derivations, or load-bearing self-citations that reduce any claim to its own inputs by construction. The observations on reporting practices are presented as illustrative commentary rather than as outputs of a fitted model or theorem derived from prior self-cited results, rendering the text self-contained with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper draws on established statistical knowledge without introducing new free parameters, ad-hoc axioms, or postulated entities.

axioms (1)

standard math: Standard principles of statistical inference for interpreting p-values, confidence intervals, and significance tests.
The commentary relies on these background principles without re-deriving them.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

We argue that, as in many other areas of science, there is an over-reliance on 95% confidence levels, and misunderstandings of the meaning of significance... lack of precision in the reporting of measures of uncertainty... especially when using p-values and even more so with star measures.
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

Relation between the paper passage and the cited Recognition theorem.

The standard way of computing CIs is to use asymptotic MLE properties... $\hat{\beta}_k \pm z_{\alpha/2} \cdot \hat{\sigma}_k$
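
The quoted interval can be computed directly. The sketch below is a minimal illustration using only the Python standard library; the estimate and standard error are hypothetical, and `wald_summary` is an illustrative name rather than a function from any choice modelling package. It reports the exact p-value and intervals at several confidence levels instead of a single 95% verdict.

```python
from statistics import NormalDist


def wald_summary(estimate, std_err, conf_level=0.95, null_value=0.0):
    """Asymptotic (Wald) confidence interval and two-sided p-value."""
    if std_err <= 0:
        raise ValueError("std_err must be positive")
    std_normal = NormalDist()
    alpha = 1.0 - conf_level
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
    t_ratio = (estimate - null_value) / std_err
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(t_ratio)))
    return {
        "t_ratio": t_ratio,
        "p_value": p_value,
        "ci": (estimate - z_crit * std_err, estimate + z_crit * std_err),
    }


# Hypothetical estimate and standard error, for illustration only.
result = wald_summary(estimate=-0.05, std_err=0.02)
print(f"t-ratio = {result['t_ratio']:.2f}, p = {result['p_value']:.4f}")
for level in (0.90, 0.95, 0.99):
    low, high = wald_summary(-0.05, 0.02, conf_level=level)["ci"]
    print(f"{level:.0%} CI: [{low:.4f}, {high:.4f}]")
```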

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
