When is p-hacking detectable?

Stefan Faridani

arxiv: 2506.20035 · v3 · pith:HJBQHMFF · submitted 2025-06-24 · 💰 econ.EM

Pith reviewed 2026-05-19 07:45 UTC · model grok-4.3

classification 💰 econ.EM

keywords p-hacking · selective reporting · t-statistics · projection test · meta-analysis · econometrics

The pith

A projection test detects every form of p-hacking visible in the reported t-statistics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Some kinds of selective reporting leave no detectable trace in the usual histogram of t-statistics or p-values, and even when traces exist, standard tests often lack power to find them. The paper develops a test that measures the distance from the smoothed empirical t-curve to the closest distribution that could have arisen under honest reporting. This projection approach is constructed so that any distortion it misses cannot be caught by any other valid test based on the same t-curve information. When run on a large collection of published economics studies, the test finds that the t-curves for randomized controlled trials and instrumental-variable designs are more distorted than can be produced by chance, rounding, or the Student-t approximation alone.

Core claim

The central claim is that the distance between the smoothed empirical t-curve and the set of all possible honest distributions yields a sharp test for selective reporting. Any form of p-hacking that moves the observed curve away from every honest distribution will produce a positive test statistic, and the test cannot be evaded by a reporting strategy that still satisfies the observable restrictions on the t-curve.
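
To make the statistic concrete, here is one way to write it down, under an assumption this summary does not confirm: that an honest t-statistic is approximately N(h, 1) for some study-specific mean h, so honest t-curves are location mixtures of standard normals. The bandwidth b, the kernel φ_s (the N(0, s²) density) and the mixing distribution π are illustrative notation, not the paper's. The same kernel is applied to the honest curves, which is why their components have variance 1 + b² rather than 1.

```latex
\hat D_b \;=\; \inf_{\pi}\,
  \Big\lVert\, \hat g_b \;-\; \int \phi_{\sqrt{1+b^2}}(\,\cdot - h)\, d\pi(h) \Big\rVert_{L^2},
\qquad
\hat g_b(t) \;=\; \frac{1}{n}\sum_{i=1}^{n} \phi_b(t - T_i),
```

Here π ranges over probability distributions of h and T_1, ..., T_n are the reported t-statistics. The statistic is zero exactly when the smoothed curve is a smoothed honest curve (or a limit of such curves) and positive otherwise; turning it into a test still requires a critical value, which this summary does not describe.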

What carries the argument

The projection test that finds the minimum distance from the smoothed empirical distribution of reported t-statistics to the set of all distributions consistent with honest reporting.
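
The sketch below computes such a minimum distance with NumPy. It is a minimal illustration, not the paper's implementation: it assumes honest t-curves are mixtures of N(h, 1) over a fixed grid of h, uses a Gaussian kernel with a hypothetical bandwidth `bw`, and solves the projection with FISTA (accelerated projected gradient) over mixture weights on the probability simplex. It returns only the distance; no critical value or inference step is included, and all function names are hypothetical.

```python
import numpy as np


def normal_pdf(x, sd=1.0):
    """Density of N(0, sd^2) evaluated elementwise at x."""
    return np.exp(-0.5 * (x / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def smoothed_t_curve(t_stats, t_grid, bw):
    """Gaussian kernel density estimate of the reported t-statistics on t_grid."""
    return normal_pdf(np.subtract.outer(t_grid, t_stats), bw).mean(axis=1)


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)


def projection_distance(t_stats, bw=0.2, iters=3000):
    """L2 distance from the smoothed t-curve to the set of smoothed honest curves.

    ASSUMPTION (not from the paper): honest t-curves are mixtures of N(h, 1)
    over a fixed grid of h. Smoothing N(h, 1) with an N(0, bw^2) kernel gives
    N(h, 1 + bw^2), so both sides are smoothed the same way.
    """
    t_grid = np.linspace(-8.0, 10.0, 361)
    dt = t_grid[1] - t_grid[0]
    h_grid = np.arange(-6.0, 8.01, 0.25)
    g = smoothed_t_curve(t_stats, t_grid, bw)
    # Column j: smoothed density of an honest curve with all mass at h_j.
    basis = normal_pdf(np.subtract.outer(t_grid, h_grid), np.sqrt(1.0 + bw**2))

    # FISTA: minimise dt * ||basis @ w - g||^2 over mixture weights w on the simplex.
    lip = 2.0 * dt * np.linalg.norm(basis, 2) ** 2  # Lipschitz constant of the gradient
    w = y = np.full(h_grid.size, 1.0 / h_grid.size)
    s = 1.0
    for _ in range(iters):
        grad = 2.0 * dt * basis.T @ (basis @ y - g)
        w_next = project_simplex(y - grad / lip)
        s_next = (1.0 + np.sqrt(1.0 + 4.0 * s**2)) / 2.0
        y = w_next + ((s - 1.0) / s_next) * (w_next - w)
        w, s = w_next, s_next
    return float(np.sqrt(dt * np.sum((basis @ w - g) ** 2)))


rng = np.random.default_rng(0)
h = rng.choice([0.0, 1.0, 2.5], size=10_000)
honest = h + rng.standard_normal(h.size)

# Hypothetical p-hacking rule: half of the results with 1 < |t| < 1.96
# are nudged to just above the 1.96 threshold.
hacked = honest.copy()
near = (np.abs(hacked) > 1.0) & (np.abs(hacked) < 1.96) & (rng.random(h.size) < 0.5)
hacked[near] = np.sign(hacked[near]) * (1.96 + 0.3 * rng.random(near.sum()))

print(projection_distance(honest), projection_distance(hacked))
```

In this simulation the honest sample's distance reflects sampling noise only, while the hacked sample should sit noticeably farther from the honest set. Smoothing both sides with the same kernel is the key design choice: comparing a smoothed curve against unsmoothed N(h, 1) mixtures would penalize even honest data.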

If this is right

Histograms of t-statistics and p-values miss some detectable forms of selective reporting.
The new test has power against every distortion that any valid test of the t-curve restrictions can detect.
Application to existing meta-data shows statistically significant excess distortion in the t-curves of RCTs and IVs.
Any evasion strategy must also evade every other valid test based on the same reported statistics.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Meta-analysts could apply the test routinely to flag research designs whose reported statistics are inconsistent with honest reporting.
The method could be extended to incorporate additional sources of benign distortion once they are formally characterized.
Because the test is sharp, it sets a benchmark for what any future t-curve-based detector can hope to achieve.

Load-bearing premise

The set of possible honest distributions can be fully characterized from the reported t-statistics alone without additional information on the underlying data-generating process or the precise form of any benign distortions.

What would settle it

A selective reporting rule that produces an empirical t-curve lying strictly outside the honest set yet yields a projection distance of zero.

Original abstract

We show that some forms of p-hacking cannot be detected by examining the histogram of t-statistics or their p-values. Even when p-hacking is detectable, standard tests may lack power. We propose a novel test that detects every form of selective reporting that is detectable from the distribution of reported t-statistics. Our test statistic is the distance between the smoothed empirical t-curve and the set of possible honest distributions. This projection test is sharp and can only be evaded by selective reporting that also evades all other valid tests of restrictions on the t-curve. We also show how to avoid spurious rejections caused by some benign distortions in the t-curve. Applying the test to the Brodeur et al. (2020) meta-dataset, we find that the t-curves for RCTs and IVs are more distorted than could arise by chance, (de)rounding, or the Student-t approximation.
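
The abstract lists (de)rounding among the benign distortions. When coefficients and standard errors are reported to a few decimals, many t-statistics collapse onto the same ratios (0.02 / 0.01 = 2.00 exactly), creating spikes in the t-curve. One common remedy in this literature, sketched below as an assumption and not necessarily the paper's procedure, redraws each rounded number uniformly within its rounding interval before recomputing t. `derounded_t` is a hypothetical helper.

```python
import numpy as np


def derounded_t(coef, se, decimals, rng):
    """Recompute t after redrawing coef and se uniformly within their rounding intervals.

    coef, se: reported (rounded) values; decimals: number of reported decimals.
    ASSUMPTION: every rounded se exceeds half a rounding unit, so draws stay positive.
    """
    coef = np.asarray(coef, dtype=float)
    se = np.asarray(se, dtype=float)
    half = 0.5 * 10.0 ** (-decimals)  # half-width of the rounding interval
    coef_draw = coef + rng.uniform(-half, half, size=coef.shape)
    se_draw = se + rng.uniform(-half, half, size=se.shape)
    return coef_draw / se_draw


rng = np.random.default_rng(1)
# 1,000 results all reported as coef = 0.02, se = 0.01, i.e. t = 2.00 exactly.
t = derounded_t(np.full(1000, 0.02), np.full(1000, 0.01), 2, rng)
print(t.min(), t.max())  # every value lies in (1, 5): the spike at 2 is spread out
```

The uniform draw is the simplest choice consistent with the reported digits, not one derived from the data.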

Editorial analysis

A structured set of objections, weighed in public.

This section contains a desk editor's note, a referee report, a simulated author's rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's core move is a projection test that measures distance from the empirical t-curve to the set of honest distributions and claims this catches every detectable form of selective reporting from t-statistics alone.

The paper's main claim is that it supplies a sharp test for p-hacking: the distance between the smoothed empirical t-curve and the closest honest distribution, together with a proof that no other valid test using the same information can do better. This projection approach and the sharpness result look new relative to earlier histogram-based checks on t-statistics or p-values. The paper also shows why standard tests can miss some selective reporting and can have low power, then applies the method to the Brodeur et al. meta-dataset and reports excess distortion in the t-curves for RCTs and IVs after accounting for rounding and the Student-t approximation. That empirical piece is concrete and directly relevant to debates about credibility in empirical work.

The soft spot is the characterization of the honest set itself. Honest t-distributions depend on degrees of freedom, and therefore on sample sizes. If these are not recovered or bounded from the reported t-statistics alone, the distance can misclassify curves. The abstract mentions handling some benign distortions such as rounding, but the details of how the honest set is constructed, and whether it covers varying sample sizes across studies, are not visible here. More simulation evidence on size and power under realistic conditions would also help pin down how the test behaves in practice.

The paper is aimed at empirical economists and meta-analysts who work on publication bias and selective reporting. Readers focused on improving evidentiary standards in RCTs, IVs, or similar designs would get the most from the method and the application. I would send it for peer review: the idea is technically distinct and the data exercise adds value, even if the honest-set construction and validation steps need more scrutiny in revision.

Referee Report

1 major / 2 minor

Summary. The paper claims that some forms of p-hacking cannot be detected by examining the histogram of t-statistics or their p-values, and that even when detectable, standard tests may lack power. It proposes a novel projection test whose statistic is the distance between the smoothed empirical t-curve and the set of possible honest distributions; this test is asserted to be sharp in the sense that it detects every form of selective reporting that is detectable from the distribution of reported t-statistics. The author also shows how to avoid spurious rejections from benign distortions such as rounding and applies the test to the Brodeur et al. (2020) meta-dataset, concluding that t-curves for RCTs and IVs exhibit more distortion than can be explained by chance, (de)rounding, or the Student-t approximation.

Significance. If the sharpness claim and the characterization of the honest set hold, the paper would represent a meaningful methodological advance in the detection of selective reporting in empirical economics. The projection approach offers a complete test for all detectable manipulations of the t-curve, going beyond existing histogram-based methods, and the application to a large existing meta-dataset demonstrates practical utility. The explicit treatment of benign distortions is a useful practical contribution.

major comments (1)

[Theoretical section defining the honest distribution set and the projection test] The central sharpness claim (abstract and theoretical development of the projection test) rests on correctly identifying the set of all possible honest t-curves from the reported t-statistics alone. Honest distributions are parameterized by degrees of freedom (hence sample size) and the precise form of continuous or discrete distortions; these parameters are not encoded in the t-values themselves. The manuscript must provide an explicit construction, algorithm, or proof showing how the honest set is recovered or bounded without additional DGP information. If this step relies on assumptions that are not recoverable from the t-statistics, the distance statistic can misclassify honest curves, undermining both the sharpness property and the claim that the test detects every detectable form of selective reporting.
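
The degrees-of-freedom concern can be seen in a quick simulation, offered only as an illustration and using NumPy's `Generator.standard_t`: even with no selection at all, a study with few residual degrees of freedom produces |t| > 1.96 more often than the 5% implied by the normal approximation, so an honest set built only from normal curves could flag such data. The function name is hypothetical.

```python
import numpy as np


def two_sided_rejection_rate(df, crit=1.96, n_draws=200_000, seed=0):
    """Monte Carlo estimate of P(|T| > crit) for a central Student-t with df degrees of freedom."""
    rng = np.random.default_rng(seed)
    draws = rng.standard_t(df, size=n_draws)
    return float(np.mean(np.abs(draws) > crit))


for df in (5, 30, 1000):
    print(df, two_sided_rejection_rate(df))
```

The rate falls toward 0.05 as df grows, consistent with the abstract listing the Student-t approximation among the benign distortions to be accounted for.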

minor comments (2)

[Abstract] The abstract refers to a 'smoothed empirical t-curve' without specifying the smoothing kernel, bandwidth selection rule, or robustness checks; these details are needed for reproducibility and should be stated explicitly.
[Empirical application section] The empirical application would benefit from Monte Carlo simulations or power calculations under controlled selective-reporting scenarios to illustrate finite-sample behavior of the test.
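
As a template for the kind of Monte Carlo exercise the second minor comment asks for, the sketch below estimates size and power by simulation. It deliberately uses a simple stand-in, a one-sided caliper test around 1.96, rather than the projection test, and a hypothetical data-generating process: true means h drawn from {0, 1, 2.5}, and a p-hacking rule that moves a share of results with 1 < |t| < 1.96 to just above the threshold. All names and parameter values are illustrative.

```python
import numpy as np


def caliper_rejects(t_stats, cutoff=1.96, width=0.2, z_crit=1.645):
    """One-sided caliper test: too many |t| just above the cutoff relative to just below?"""
    a = np.abs(t_stats)
    above = np.sum((a >= cutoff) & (a < cutoff + width))
    below = np.sum((a >= cutoff - width) & (a < cutoff))
    n = above + below
    if n == 0:
        return False
    # Under a locally flat t-curve, 'above' is Binomial(n, 0.5); normal approximation.
    z = (above - 0.5 * n) / np.sqrt(0.25 * n)
    return bool(z > z_crit)


def rejection_rate(n_studies, hack_share, reps=200, seed=0):
    """Share of simulated meta-datasets in which the caliper test rejects.

    HYPOTHETICAL DGP: true means h drawn from {0, 1, 2.5}; a share hack_share
    of results with 1 < |t| < 1.96 is moved to just above 1.96.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        h = rng.choice([0.0, 1.0, 2.5], size=n_studies)
        t = h + rng.standard_normal(n_studies)
        a = np.abs(t)
        near = (a > 1.0) & (a < 1.96) & (rng.random(n_studies) < hack_share)
        t[near] = np.sign(t[near]) * (1.96 + 0.3 * rng.random(near.sum()))
        rejections += caliper_rejects(t)
    return rejections / reps


size = rejection_rate(2000, hack_share=0.0)   # honest reporting
power = rejection_rate(2000, hack_share=0.5)  # hypothetical p-hacking
print(size, power)
```

Swapping `caliper_rejects` for any other test function gives the same size-versus-power comparison. With honest data the rejection rate should stay near or below the nominal 5% level, while the hacked scenario should push it well above.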

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and constructive comments. The referee correctly identifies that the sharpness of the projection test depends on a precise definition of the honest set. We address this point below and will revise the manuscript to improve clarity on the construction.

Referee: [Theoretical section defining the honest distribution set and the projection test] The central sharpness claim (abstract and theoretical development of the projection test) rests on correctly identifying the set of all possible honest t-curves from the reported t-statistics alone. Honest distributions are parameterized by degrees of freedom (hence sample size) and the precise form of continuous or discrete distortions; these parameters are not encoded in the t-values themselves. The manuscript must provide an explicit construction, algorithm, or proof showing how the honest set is recovered or bounded without additional DGP information. If this step relies on assumptions that are not recoverable from the t-statistics, the distance statistic can misclassify honest curves, undermining both the sharpness property and the claim that the test detects every detectable form of selective reporting.

Authors: The honest set is defined as the closure of the union, over all possible degrees of freedom and all admissible benign distortions (rounding to a stated precision, use of the t rather than normal approximation, and similar), of the distributions of reported t-statistics that can arise under honest reporting. Because the test statistic is the distance from the observed smoothed curve to this union, no specific df or distortion parameter needs to be recovered from the data; the projection simply finds the closest element in the set. The theoretical section provides a characterization of this set via the properties of the t-family and the admissible distortion operators, which is sufficient to establish sharpness: any selective reporting that produces a curve outside the set is detectable by the test, and any curve inside the set is consistent with some honest DGP. We acknowledge that an explicit computational algorithm for approximating the projection was only sketched rather than fully detailed. We will add a self-contained algorithmic description and a short proof that the set is closed under the relevant operations in the revised manuscript.
revision: yes
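
The rebuttal's argument that no degrees of freedom need to be recovered rests on two elementary properties of the distance to a set. Indexing the honest configurations (degrees of freedom, rounding precision, and so on) by a hypothetical label k with honest set H_k, and writing g for the smoothed empirical t-curve:

```latex
d\Big(g,\ \overline{\textstyle\bigcup_k H_k}\Big)
  \;=\; d\Big(g,\ \textstyle\bigcup_k H_k\Big)
  \;=\; \inf_k\, d(g, H_k),
\qquad d(g, H) \;=\; \inf_{f \in H} \lVert g - f \rVert .
```

The first equality holds because taking the closure of a set does not change the distance to it; the second because an infimum over a union is the smallest of the infima over its pieces. In principle, then, the statistic can be computed one configuration at a time and minimized, without estimating which configuration generated the data.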

Circularity Check

0 steps flagged

No circularity in projection test or honest-set characterization

The paper defines its test statistic directly as the distance between the smoothed empirical t-curve and a theoretically characterized set of all possible honest t-distributions. This set is derived from standard properties of the t-distribution, degrees of freedom, and reporting rules rather than from the empirical data itself or any fitted parameters. The sharpness claim follows mathematically from the definition of projection onto that feasible set and does not reduce to a tautology or self-referential construction. No load-bearing self-citations, ansatzes, or renamings of known results appear in the core derivation; the approach remains independent of the specific dataset to which it is applied.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the ability to characterize the set of all distributions of t-statistics that can arise under honest reporting; this characterization is treated as known and external to the paper.

axioms (1)

domain assumption: The set of possible honest t-distributions can be computed or approximated without knowledge of the original data-generating process.
Invoked when defining the projection target for the test statistic.

pith-pipeline@v0.9.0 · 5671 in / 1338 out tokens · 26787 ms · 2026-05-19T07:45:17.766014+00:00 · methodology
