Statistical significance in choice modelling: computation, usage and reporting
Pith reviewed 2026-05-22 01:18 UTC · model grok-4.3
The pith
Choice modelling papers over-rely on 95% confidence levels while misunderstanding what significance means and reporting uncertainty imprecisely.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that choice modelling exhibits the same over-reliance on 95% confidence levels seen elsewhere in science, along with widespread misunderstandings of significance and imprecise reporting of uncertainty measures, particularly p-values and star indicators. It argues that behavioural or policy significance should receive equal attention, and that derived measures such as willingness-to-pay, random heterogeneity parameters, and results from repeated choice data require special handling in uncertainty calculations and reporting.
What carries the argument
The distinction between statistical significance and behavioural or policy significance, together with explicit computation of confidence intervals for both parameters and derived quantities.
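As a minimal sketch of what explicit computation can look like for a single parameter, the snippet below builds the standard asymptotic interval (estimate plus or minus a normal critical value times the standard error) at several confidence levels and reports an exact two-sided p-value instead of a star. The function names `wald_interval` and `two_sided_p_value`, and the numerical inputs, are hypothetical illustrations and do not come from the paper.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, level=0.95):
    """Two-sided asymptotic interval: estimate +/- z * std_error."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # e.g. about 1.96 for 95%
    return estimate - z * std_error, estimate + z * std_error


def two_sided_p_value(estimate, std_error, null_value=0.0):
    """Exact two-sided p-value from the asymptotic normal approximation."""
    z = abs(estimate - null_value) / std_error
    return 2.0 * (1.0 - NormalDist().cdf(z))


if __name__ == "__main__":
    # Hypothetical cost coefficient and standard error, for illustration only.
    beta_hat, se = -0.05, 0.028
    print(f"t-ratio = {beta_hat / se:.3f}, p = {two_sided_p_value(beta_hat, se):.4f}")
    for level in (0.90, 0.95, 0.99):
        lower, upper = wald_interval(beta_hat, se, level)
        print(f"{level:.0%} interval: [{lower:.4f}, {upper:.4f}]")
```

With these hypothetical inputs the t-ratio is about 1.79, so the 90% interval excludes zero while the 95% interval includes it. Reporting the interval and the exact p-value keeps that information visible, whereas a binary "significant at 95%" label would discard it.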
If this is right
- Reporting should shift toward precise confidence intervals rather than reliance on p-value stars or binary significance declarations.
- Analyses must separately evaluate whether statistically significant effects are large enough to matter for policy or behaviour.
- Uncertainty propagation for willingness-to-pay and other derived measures requires explicit treatment in every study.
- Models with random heterogeneity and repeated choices need tailored approaches to testing and reporting statistical significance.
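To illustrate the point about uncertainty propagation for willingness-to-pay, here is a minimal sketch of the delta method for a ratio of two coefficients. The delta method is one common choice and simulation-based alternatives exist; nothing here is claimed to be the paper's own procedure. The function name `wtp_delta_method` and all numerical inputs are hypothetical, and the sketch assumes a fixed-coefficient model in which the estimates and their covariance are already available.

```python
import math
from statistics import NormalDist


def wtp_delta_method(b_attr, b_cost, var_attr, var_cost, cov, level=0.95):
    """Point estimate, standard error and interval for WTP = b_attr / b_cost.

    Uses a first-order (delta method) approximation with gradient
    (1 / b_cost, -b_attr / b_cost**2) applied to the 2x2 covariance matrix.
    """
    wtp = b_attr / b_cost
    g_attr = 1.0 / b_cost
    g_cost = -b_attr / b_cost ** 2
    var_wtp = (g_attr ** 2 * var_attr
               + g_cost ** 2 * var_cost
               + 2.0 * g_attr * g_cost * cov)
    if var_wtp < 0.0:
        raise ValueError("inputs do not form a valid covariance matrix")
    se = math.sqrt(var_wtp)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return wtp, se, (wtp - z * se, wtp + z * se)


if __name__ == "__main__":
    # Hypothetical time and cost coefficients with their (co)variances.
    wtp, se, (lower, upper) = wtp_delta_method(
        b_attr=-0.06, b_cost=-0.2, var_attr=0.0001, var_cost=0.0016, cov=0.0
    )
    print(f"WTP = {wtp:.3f}, s.e. = {se:.3f}, 95% interval = [{lower:.3f}, {upper:.3f}]")
```

The covariance term matters: dropping it, or dividing two individually "significant" coefficients without any propagation, can misstate the precision of the derived measure.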
Where Pith is reading between the lines
- Widespread adoption of these reporting norms could reduce publication bias toward statistically significant but substantively small effects.
- Journals in economics and transport could adopt checklists that require both statistical and behavioural significance statements.
- Software packages for choice modelling might add automated routines that compute and display policy-relevant effect sizes alongside p-values.
Load-bearing premise
The authors' observation of imprecise reporting practices in many studies is representative enough of the broader literature to justify general recommendations without systematic quantification of the problem.
What would settle it
A quantitative audit that samples several hundred published choice modelling papers and counts the share using star symbols, vague p-value statements, or missing confidence intervals for willingness-to-pay and random parameters.
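An audit of this kind would itself produce an estimate with sampling uncertainty. As a rough sketch, the share of sampled papers showing a given practice could be reported with a Wilson score interval; the function name `wilson_interval` and the counts below are hypothetical placeholders, not data from any actual audit.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, level=0.95):
    """Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical audit: 300 sampled papers, 180 reporting stars only.
    papers_sampled, stars_only = 300, 180
    lower, upper = wilson_interval(stars_only, papers_sampled)
    print(f"share = {stars_only / papers_sampled:.2f}, 95% interval = [{lower:.3f}, {upper:.3f}]")
```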
Original abstract
This paper offers a commentary on the use of notions of statistical significance in choice modelling. We review the reasons for uncertainty in parameter estimates, provide a precise discussion on the computation of measures of uncertainty and confidence intervals, and discuss the use of statistical tests. We argue that, as in many other areas of science, there is an over-reliance on 95% confidence levels, and misunderstandings of the meaning of significance. We also observe a lack of precision in the reporting of measures of uncertainty in many studies, especially when using p-values and even more so with star measures. The paper also stresses the importance of considering behavioural or policy significance in addition to statistical significance. Finally, we stress a number of points that are specific to choice modelling and which require special attention, notably in relation to derived measures such as willingness-to-pay, the treatment of random heterogeneity, and the use of repeated choice data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper offers a commentary on the use of statistical significance in choice modelling. It reviews reasons for uncertainty in parameter estimates, provides a discussion on the computation of measures of uncertainty and confidence intervals, critiques over-reliance on 95% confidence levels and misunderstandings of significance, observes imprecise reporting of p-values and star measures in many studies, stresses the importance of behavioural or policy significance in addition to statistical significance, and highlights choice-modelling-specific issues such as willingness-to-pay, random heterogeneity, and repeated choice data.
Significance. The manuscript correctly recalls and applies standard statistical principles to the context of discrete choice models. If its observations on common reporting practices prove representative, the paper could usefully raise awareness and encourage more precise reporting and interpretation of significance measures in applied choice modelling work. The explicit call to consider policy or behavioural relevance alongside statistical significance is a constructive contribution for empirical researchers.
major comments (1)
- [Discussion of reporting practices and p-values/stars] The central claim that there is a lack of precision in the reporting of measures of uncertainty in many studies (especially p-values and star measures) is presented as an observational premise without a systematic literature review, defined sample of papers, or frequency counts of the practices criticised. This observation underpins the recommendations for improved reporting standards; without quantification it is difficult to judge whether the pattern is representative of the broader choice-modelling literature or merely illustrative.
minor comments (1)
- [Abstract and introduction] The abstract and opening sections could more explicitly delimit the body of literature from which the observational examples are drawn, to help readers assess the scope of the commentary.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of our commentary and for the constructive feedback. We address the major comment below.
Referee: The central claim that there is a lack of precision in the reporting of measures of uncertainty in many studies (especially p-values and star measures) is presented as an observational premise without a systematic literature review, defined sample of papers, or frequency counts of the practices criticised. This observation underpins the recommendations for improved reporting standards; without quantification it is difficult to judge whether the pattern is representative of the broader choice-modelling literature or merely illustrative.
Authors: We acknowledge that our observation regarding imprecise reporting of p-values and star measures is presented without a systematic literature review, defined sample, or frequency counts. As the paper is a commentary rather than an empirical study of reporting practices, we did not conduct such an analysis. The statement reflects patterns noted during the preparation of this discussion and our experience with the literature. To address the concern, we will make a partial revision by adding explicit language to frame the observation as illustrative and based on common practices encountered in the field, rather than implying a comprehensive survey. This will clarify the context for our recommendations on reporting standards.
Revision: partial
Circularity Check
Discursive commentary on reporting practices contains no derivation chain or self-referential reductions
The paper is a commentary that reviews uncertainty in estimates, computation of confidence intervals, statistical tests, over-reliance on 95% levels, imprecise reporting of p-values and stars, and the need to consider behavioural significance alongside statistical significance. It contains no equations, fitted parameters, derivations, or load-bearing self-citations that reduce any claim to its own inputs by construction. The observations on reporting practices are presented as illustrative commentary rather than as outputs of a fitted model or theorem derived from prior self-cited results, rendering the text self-contained with no circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math: Standard principles of statistical inference for interpreting p-values, confidence intervals, and significance tests
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "We argue that, as in many other areas of science, there is an over-reliance on 95% confidence levels, and misunderstandings of the meaning of significance... lack of precision in the reporting of measures of uncertainty... especially when using p-values and even more so with star measures."
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean: absolute_floor_iff_bare_distinguishability (unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: "The standard way of computing CIs is to use asymptotic MLE properties... $\hat{\beta}_k \pm z_{\alpha/2} \cdot \hat{\sigma}_k$"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.