
arxiv: 2506.05996 · v4 · pith:BV2KR42Fnew · submitted 2025-06-06 · 💰 econ.EM

Statistical significance in choice modelling: computation, usage and reporting

Pith reviewed 2026-05-22 01:18 UTC · model grok-4.3

classification 💰 econ.EM
keywords: statistical significance · choice modelling · confidence intervals · p-values · willingness to pay · discrete choice · reporting standards · random heterogeneity

The pith

Choice modelling papers over-rely on 95% confidence levels, often misunderstand what statistical significance means, and report uncertainty imprecisely.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper reviews the sources of uncertainty in choice model parameters, explains how to compute confidence intervals and apply statistical tests, and argues for moving past automatic use of 95% thresholds. It documents imprecise reporting practices, especially with p-values and star symbols, and stresses that statistical significance must be weighed against behavioural or policy relevance. The authors highlight choice-modelling specifics such as uncertainty in willingness-to-pay measures, random heterogeneity, and repeated choice data. A sympathetic reader would care because these habits directly shape the credibility of policy conclusions drawn from discrete choice studies in transport, marketing, and economics.

Core claim

The paper claims that choice modelling exhibits the same over-reliance on 95% confidence levels seen elsewhere in science, along with widespread misunderstandings of significance and imprecise reporting of uncertainty measures, particularly p-values and star indicators. It argues that behavioural or policy significance should receive equal attention, and that derived measures such as willingness-to-pay, random heterogeneity parameters, and results from repeated choice data require special handling in uncertainty calculations and reporting.

What carries the argument

The distinction between statistical significance and behavioural or policy significance, together with explicit computation of confidence intervals for both parameters and derived quantities.
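As a minimal sketch of what explicit computation can look like, the Python snippet below builds asymptotic (normal-approximation) confidence intervals at several confidence levels and an exact two-sided p-value from an estimate and its standard error. The function names and all numbers are illustrative assumptions, not values or code from the paper.

```python
from statistics import NormalDist


def asymptotic_ci(estimate, std_err, level=0.95):
    """Two-sided normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_err, estimate + z * std_err


def two_sided_p_value(estimate, std_err, null_value=0.0):
    """Two-sided p-value for H0: parameter == null_value (asymptotic z-test)."""
    z = abs(estimate - null_value) / std_err
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Hypothetical cost coefficient and standard error (made up for illustration).
beta_cost, se_cost = -0.12, 0.05

# Report intervals at several levels rather than a single 95% verdict.
for level in (0.90, 0.95, 0.99):
    lower, upper = asymptotic_ci(beta_cost, se_cost, level)
    print(f"{level:.0%} CI: [{lower:.4f}, {upper:.4f}]")

# Report the p-value itself instead of a star symbol.
print(f"p-value: {two_sided_p_value(beta_cost, se_cost):.4f}")
```

Printing the interval at more than one level, together with the numeric p-value, gives readers the information that a star symbol or a binary "significant at 95%" statement hides.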

If this is right

  • Reporting should shift toward precise confidence intervals rather than reliance on p-value stars or binary significance declarations.
  • Analyses must separately evaluate whether statistically significant effects are large enough to matter for policy or behaviour.
  • Uncertainty propagation for willingness-to-pay and other derived measures requires explicit treatment in every study.
  • Models with random heterogeneity and repeated choices need tailored approaches to testing and reporting statistical significance.
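One common way to propagate uncertainty into a willingness-to-pay ratio is the delta method. The sketch below is an illustration under stated assumptions and not necessarily the procedure the paper recommends. It assumes a linear-in-parameters utility in which willingness to pay is the ratio of an attribute coefficient to the cost coefficient. The function name `wtp_delta_method` and all input values are hypothetical.

```python
import math
from statistics import NormalDist


def wtp_delta_method(beta_attr, beta_cost, var_attr, var_cost,
                     cov_attr_cost, level=0.95):
    """Return (wtp, std_err, (lower, upper)) for wtp = beta_attr / beta_cost."""
    wtp = beta_attr / beta_cost
    # Gradient of the ratio with respect to (beta_attr, beta_cost).
    d_attr = 1.0 / beta_cost
    d_cost = -beta_attr / beta_cost ** 2
    # First-order (delta method) variance, including the covariance term.
    variance = (d_attr ** 2 * var_attr
                + d_cost ** 2 * var_cost
                + 2.0 * d_attr * d_cost * cov_attr_cost)
    std_err = math.sqrt(variance)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return wtp, std_err, (wtp - z * std_err, wtp + z * std_err)


# Hypothetical estimates and (co)variances, made up for illustration.
wtp, se, (lower, upper) = wtp_delta_method(
    beta_attr=-0.06, beta_cost=-0.12,
    var_attr=0.0004, var_cost=0.0025, cov_attr_cost=0.0002,
)
print(f"WTP = {wtp:.3f}, s.e. = {se:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

The covariance term matters: dropping it, or computing the interval from the two coefficient standard errors alone, would misstate the uncertainty of the ratio. The approximation is also likely to be poor when the cost coefficient is imprecisely estimated or close to zero.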

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Widespread adoption of these reporting norms could reduce publication bias toward statistically significant but substantively small effects.
  • Journals in economics and transport could adopt checklists that require both statistical and behavioural significance statements.
  • Software packages for choice modelling might add automated routines that compute and display policy-relevant effect sizes alongside p-values.

Load-bearing premise

The authors' observation of imprecise reporting practices in many studies is representative enough of the broader literature to justify general recommendations without systematic quantification of the problem.

What would settle it

A quantitative audit that samples several hundred published choice modelling papers and counts the share using star symbols, vague p-value statements, or missing confidence intervals for willingness-to-pay and random parameters.
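Such an audit would itself produce estimates with sampling uncertainty. As a rough sketch, the share of sampled papers showing a given practice could be reported with a Wilson score interval. The counts below are invented purely for illustration, and the choice of the Wilson interval is an assumption made here, not something the paper or this page prescribes.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, level=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    p_hat = successes / n
    denom = 1.0 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / n + z ** 2 / (4 * n ** 2)
    ) / denom
    return centre - half_width, centre + half_width


# Hypothetical audit: 300 papers sampled, 174 use star symbols (invented counts).
n_papers, n_with_stars = 300, 174
lower, upper = wilson_interval(n_with_stars, n_papers)
print(f"share = {n_with_stars / n_papers:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```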

Figures

Figures reproduced from arXiv: 2506.05996 by Andrew Daly, Angelo Guevara, Michiel Bliemer, Ricardo Daziano, Stephane Hess, Thijs Dekker.

Figure 1: Graphical representation of the calculation of asymptotic confidence intervals
Figure 2: Relationship between estimates and log-likelihood values used for likelihood ratio, Wald, …
Original abstract

This paper offers a commentary on the use of notions of statistical significance in choice modelling. We review the reasons for uncertainty in parameter estimates, provide a precise discussion on the computation of measures of uncertainty and confidence intervals, and discuss the use of statistical tests. We argue that, as in many other areas of science, there is an over-reliance on 95% confidence levels, and misunderstandings of the meaning of significance. We also observe a lack of precision in the reporting of measures of uncertainty in many studies, especially when using p-values and even more so with star measures. The paper also stresses the importance of considering behavioural or policy significance in addition to statistical significance. Finally, we stress a number of points that are specific to choice modelling and which require special attention, notably in relation to derived measures such as willingness-to-pay, the treatment of random heterogeneity, and the use of repeated choice data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. This paper offers a commentary on the use of statistical significance in choice modelling. It reviews reasons for uncertainty in parameter estimates, provides a discussion on the computation of measures of uncertainty and confidence intervals, critiques over-reliance on 95% confidence levels and misunderstandings of significance, observes imprecise reporting of p-values and star measures in many studies, stresses the importance of behavioural or policy significance in addition to statistical significance, and highlights choice-modelling-specific issues such as willingness-to-pay, random heterogeneity, and repeated choice data.

Significance. The manuscript correctly recalls and applies standard statistical principles to the context of discrete choice models. If its observations on common reporting practices prove representative, the paper could usefully raise awareness and encourage more precise reporting and interpretation of significance measures in applied choice modelling work. The explicit call to consider policy or behavioural relevance alongside statistical significance is a constructive contribution for empirical researchers.

major comments (1)
  1. [Discussion of reporting practices and p-values/stars] The central claim that there is a lack of precision in the reporting of measures of uncertainty in many studies (especially p-values and star measures) is presented as an observational premise without a systematic literature review, defined sample of papers, or frequency counts of the practices criticised. This observation underpins the recommendations for improved reporting standards; without quantification it is difficult to judge whether the pattern is representative of the broader choice-modelling literature or merely illustrative.
minor comments (1)
  1. [Abstract and introduction] The abstract and opening sections could more explicitly delimit the body of literature from which the observational examples are drawn, to help readers assess the scope of the commentary.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive assessment of our commentary and for the constructive feedback. We address the major comment below.

Point-by-point responses
  1. Referee: The central claim that there is a lack of precision in the reporting of measures of uncertainty in many studies (especially p-values and star measures) is presented as an observational premise without a systematic literature review, defined sample of papers, or frequency counts of the practices criticised. This observation underpins the recommendations for improved reporting standards; without quantification it is difficult to judge whether the pattern is representative of the broader choice-modelling literature or merely illustrative.

    Authors: We acknowledge that our observation regarding imprecise reporting of p-values and star measures is presented without a systematic literature review, defined sample, or frequency counts. As the paper is a commentary rather than an empirical study of reporting practices, we did not conduct such an analysis. The statement reflects patterns noted during the preparation of this discussion and our experience with the literature. To address the concern, we will make a partial revision by adding explicit language to frame the observation as illustrative and based on common practices encountered in the field, rather than implying a comprehensive survey. This will clarify the context for our recommendations on reporting standards.

    Revision: partial

Circularity Check

0 steps flagged

Discursive commentary on reporting practices contains no derivation chain or self-referential reductions

Full rationale

The paper is a commentary that reviews uncertainty in estimates, computation of confidence intervals, statistical tests, over-reliance on 95% levels, imprecise reporting of p-values and stars, and the need to consider behavioural significance alongside statistical significance. It contains no equations, fitted parameters, derivations, or load-bearing self-citations that reduce any claim to its own inputs by construction. The observations on reporting practices are presented as illustrative commentary rather than as outputs of a fitted model or theorem derived from prior self-cited results, rendering the text self-contained with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper draws on established statistical knowledge without introducing new free parameters, ad-hoc axioms, or postulated entities.

axioms (1)
  • [standard math] Standard principles of statistical inference for interpreting p-values, confidence intervals, and significance tests
    The commentary relies on these background principles without re-deriving them.

pith-pipeline@v0.9.0 · 5698 in / 1075 out tokens · 39820 ms · 2026-05-22T01:18:15.715938+00:00 · methodology


