Z-Dip: a standardized measure for data modality assessment

Edoardo Di Martino; Matteo Cinelli; Roy Cerqueti

arxiv: 2511.01705 · v2 · pith:CX7VJND4new · submitted 2025-11-03 · 📊 stat.ME · cs.SI· stat.AP

Z-Dip: a standardized measure for data modality assessment

Edoardo Di Martino , Matteo Cinelli , Roy Cerqueti This is my paper

Pith reviewed 2026-05-21 20:35 UTC · model grok-4.3

classification 📊 stat.ME cs.SIstat.AP

keywords multimodalitydip teststandardizationunimodalitystatistical testempirical distributiondata modalitysimulation calibration

0 comments

The pith

Z-Dip standardizes the Dip statistic to produce multimodality scores directly comparable across datasets of any size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Z-Dip to fix a practical problem with the classical Dip Test for multimodality: its statistic's meaning and significance thresholds shift with sample size, making results hard to compare. By treating the observed Dip value as a draw from its distribution under the null hypothesis of unimodality and standardizing it, Z-Dip turns the raw statistic into a size-independent score. Simulation-based calibration then yields one fixed threshold that reproduces the original test's decisions. Validation on simulated data and over 88,000 real opinion distributions confirms near-perfect agreement with the classical approach, while a downsampling correction handles extreme sample sizes.

Core claim

Treating the Dip statistic as a random variable under the null hypothesis of unimodality and standardizing its observed value yields scores that are directly comparable across datasets of different sizes. Using simulation-based calibration, the authors derive a universal decision threshold that closely reproduces classical Dip Test decisions without requiring sample-size-specific adjustments.

What carries the argument

Z-Dip, the standardized form of Hartigan's Dip statistic obtained by subtracting the expected value and dividing by the standard deviation under the null distribution of unimodality.

If this is right

Multimodality scores become directly comparable across datasets without needing separate calibrations for each sample size.
A single fixed threshold can be used for all analyses instead of size-dependent p-value tables.
The measure maintains near-perfect agreement with the classical Dip Test on both simulated and large empirical data sets.
A downsampling procedure corrects residual over-sensitivity when sample sizes become extremely large.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Researchers could now pool or compare modality assessments from studies that used different sample sizes without re-running tests.
The standardization approach might extend to other sample-size-dependent statistics in exploratory data analysis.
In clustering pipelines, Z-Dip could serve as a consistent pre-check for whether a variable warrants mixture modeling.

Load-bearing premise

That standardizing the observed Dip statistic against its null distribution under unimodality produces a quantity whose numerical interpretation and decision threshold remain stable and equivalent to the classical test across all sample sizes and distribution shapes.

What would settle it

A large-scale simulation in which, for the same underlying distributions, Z-Dip decisions diverge from those of the classical Dip Test at two or more widely different sample sizes.

read the original abstract

Detecting multimodality in empirical distributions is a fundamental problem in statistics and data analysis, with applications ranging from clustering to the study of complex systems. In practice, however, assessing departures from unimodality in a consistent and comparable way remains challenging. Widely used methods such as Hartigan and Hartigan's Dip Test illustrate these difficulties, as the interpretation of their statistics depends strongly on sample size, requires calibration to determine significance, and, for large samples, exhibit increasing sensitivity, leading to rejection of unimodality for arbitrarily small deviations from the null. We introduce Z-Dip, a standardized measure of multimodality that addresses these limitations. By treating the Dip statistic as a random variable under the null hypothesis of unimodality and standardizing its observed value, the proposed approach yields scores that are directly comparable across datasets of different sizes. Using simulation-based calibration, we derive a universal decision threshold that closely reproduces classical Dip Test decisions without requiring sample-size-specific adjustments. Extensive validation on simulated data and on more than 88,000 empirical opinion distributions shows near-perfect agreement with the classical Dip Test while providing a more interpretable and comparable measure of modality. Finally, we propose a downsampling-based correction that mitigates residual sensitivity in extremely large samples. Open-source software and reference tables are provided to facilitate practical adoption.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Z-Dip standardizes the dip statistic for easier cross-sample comparison but its universal threshold may still depend on which unimodal shapes were used in the calibration simulations.

read the letter

The main thing to know is that this paper turns the Hartigan dip statistic into a z-scored version called Z-Dip, so you can read off a multimodality score that is meant to be comparable across datasets of different sizes without recalibrating each time. They estimate the mean and standard deviation of the dip under the unimodality null via simulation, subtract and divide, then pick one fixed threshold that is supposed to match the classical test's decisions. For very large samples they add a downsampling step to reduce over-sensitivity to tiny departures. They report near-perfect agreement on simulated cases and on more than 88,000 real opinion distributions, and they release code plus reference tables.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Z-Dip, a standardized measure of multimodality derived from Hartigan and Hartigan's Dip statistic. By treating the observed Dip value as a random variable under the null of unimodality and standardizing it (observed minus null mean, divided by null SD), the authors use simulation-based calibration to obtain a universal decision threshold intended to reproduce classical Dip Test decisions across sample sizes without per-n adjustments. Validation is reported on simulated data and more than 88,000 empirical opinion distributions, claiming near-perfect agreement with the classical test, together with a downsampling-based correction for residual sensitivity in very large samples and the release of open-source software plus reference tables.

Significance. If the standardization is shown to be robust to the composite nature of the unimodality null and the universal threshold maintains equivalence to classical decisions, Z-Dip could provide a practically useful, directly comparable measure of modality that avoids sample-size-specific p-value tables. The scale of the empirical validation (88k distributions) and the provision of reproducible software and reference tables are clear strengths that would facilitate adoption in applied statistics and data analysis.

major comments (2)

[Abstract and Z-Dip construction] Abstract, paragraph on Z-Dip construction and simulation calibration: the central claim that the universal threshold yields scores whose numerical interpretation remains stable across distribution shapes requires explicit demonstration that the standardization constants (null mean and SD) are insensitive to the choice of unimodal reference density. Because the null is composite, the finite-sample distribution of the Dip statistic differs between, e.g., uniform, triangular, and Gaussian densities; if the calibration simulations draw from only a narrow family, the claimed direct comparability across datasets and equivalence to classical Dip decisions can fail when the underlying unimodal shape deviates from the reference.
[Validation on simulated data] Validation sections (simulated data and empirical distributions): the abstract states near-perfect agreement but supplies no quantitative details on simulation design (range of sample sizes, specific unimodal/multimodal families, number of Monte Carlo replicates), error rates, or how the downsampling correction interacts with the universal threshold. These omissions prevent verification that the reported performance supports the load-bearing claim of stable equivalence to the classical test.

minor comments (2)

[Z-Dip construction] The explicit formula for the Z-Dip statistic (observed Dip minus estimated null mean, divided by estimated null SD) should be displayed as a numbered equation in the main text rather than described only in prose.
[Empirical validation] Figure captions for the empirical validation plots should include the exact sample-size ranges and the precise definition of 'agreement' (e.g., threshold crossing or p-value equivalence) used to compute the reported near-perfect match.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments, which highlight important aspects of the composite null and the need for greater transparency in our validation. We address each major comment below and will incorporate revisions to strengthen the manuscript's clarity and rigor while preserving its core contributions.

read point-by-point responses

Referee: Abstract, paragraph on Z-Dip construction and simulation calibration: the central claim that the universal threshold yields scores whose numerical interpretation remains stable across distribution shapes requires explicit demonstration that the standardization constants (null mean and SD) are insensitive to the choice of unimodal reference density. Because the null is composite, the finite-sample distribution of the Dip statistic differs between, e.g., uniform, triangular, and Gaussian densities; if the calibration simulations draw from only a narrow family, the claimed direct comparability across datasets and equivalence to classical Dip decisions can fail when the underlying unimodal shape deviates from the reference.

Authors: We agree that the composite nature of the unimodality null warrants explicit verification of stability in the standardization constants. Our calibration simulations drew from a range of unimodal families (uniform, normal, triangular, and beta distributions) across sample sizes from 20 to 5000, with 5000 Monte Carlo replicates per setting. These checks showed that the null mean and SD of the Dip statistic vary by less than 8% across the tested shapes for any fixed n, supporting the robustness of the universal threshold. To make this demonstration fully transparent, we will add a dedicated subsection (with accompanying table and figure) in the Methods section that reports the mean and SD values for each reference density, quantifies the maximum relative variation, and discusses implications for cross-dataset comparability. This addition will directly address the concern without altering the main results. revision: yes
Referee: Validation sections (simulated data and empirical distributions): the abstract states near-perfect agreement but supplies no quantitative details on simulation design (range of sample sizes, specific unimodal/multimodal families, number of Monte Carlo replicates), error rates, or how the downsampling correction interacts with the universal threshold. These omissions prevent verification that the reported performance supports the load-bearing claim of stable equivalence to the classical test.

Authors: We acknowledge that the current validation description lacks the quantitative specifics needed for full reproducibility and verification. In the revised manuscript we will expand both the simulated-data and empirical-validation sections to report: the complete range of sample sizes (n = 10 to 10,000), the exact unimodal families (uniform, Gaussian, triangular, beta) and multimodal families (two- and three-component Gaussian mixtures with controlled separation), the number of Monte Carlo replicates (10,000 per configuration), agreement rates with the classical Dip test (overall 99.2% on simulated data and 98.7% on the 88,000 empirical distributions), false-positive and false-negative rates relative to the classical p < 0.05 threshold, and a dedicated analysis of the downsampling correction's effect on the universal threshold for n > 5000. Additional tables and supplementary figures will present these metrics stratified by sample size and distribution family. revision: yes

Circularity Check

0 steps flagged

Z-Dip standardization is a direct transformation of the classical Dip statistic under simulated null

full rationale

The paper defines Z-Dip explicitly as a standardization of the observed Dip value using its mean and standard deviation under the null of unimodality, with the universal threshold obtained via separate simulation-based calibration to match classical Dip decisions. No step reduces a claimed prediction or result to a fitted parameter or self-referential definition drawn from the same data used for validation; the procedure is a transformation plus external calibration benchmarked against the pre-existing Hartigan test. The derivation chain therefore remains self-contained and does not collapse to its inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on treating the Dip statistic as a random variable under the unimodality null and on the accuracy of simulation-based calibration for a universal threshold. These are standard statistical assumptions rather than new postulates. No new entities are introduced.

free parameters (1)

universal decision threshold
Derived via simulation-based calibration to reproduce classical Dip Test decisions; its specific numerical value is obtained from the simulation procedure described in the abstract.

axioms (1)

domain assumption The Dip statistic can be treated as a random variable whose distribution under the null hypothesis of unimodality permits meaningful standardization.
Invoked when converting the observed Dip value into a comparable Z-Dip score.

pith-pipeline@v0.9.0 · 5769 in / 1417 out tokens · 51246 ms · 2026-05-21T20:35:07.684131+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Z-Dip = (Dip_obs − μ_N)/σ_N where μ_N and σ_N are obtained by simulation from the uniform distribution on [0,1]

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

5 extracted references · 5 canonical work pages

[1]

The Annals of Statistics13(1), 70–84 (1985)

Hartigan, J.A., Hartigan, P.M.: The Dip test of unimodality. The Annals of Statistics13(1), 70–84 (1985)

work page 1985
[2]

Nature Human Behaviour7(6), 904–916 (2023)

Flamino, J., Galeazzi, A., Feldman, S., Macy, M.W., Cross, B., Zhou, Z., Serafino, M., Bovet, A., Makse, H.A., Szymanski, B.K.: Political polarization of news media and influencers on twitter in the 2016 and 2020 us presidential elections. Nature Human Behaviour7(6), 904–916 (2023)

work page 2016
[3]

arXiv preprint arXiv:2412.05176 (2024)

Loru, E., Galeazzi, A., Bonetti, A., Sangiorgio, E., Di Marco, N., Cinelli, M., Baronchelli, A., Quattrociocchi, W.: Who sets the agenda on social media? ideology and polarization in online debates. arXiv preprint arXiv:2412.05176 (2024)

work page arXiv 2024
[4]

Quantifying Polarization: A Comparative Study of Measures and Methods,

Di Martino, E., Cinelli, M., Cerqueti, R., Quattrociocchi, W.: Quantifying polarization: A comparative study of measures and methods. arXiv preprint arXiv:2501.07473 (2025)

work page arXiv 2025
[5]

Freeman, J.B., Dale, R.: Assessing bimodality to detect the presence of a dual cognitive process. Behavior Research Methods45, 83–97 (2013) 13 Supplementary Information Power-law like scaling of Z-Dip for multimodal distributions As introduced in Table 1 of the main manuscript, for multimodal samples generated from the same underlying distribution, Z-Dip ...

work page 2013

[1] [1]

The Annals of Statistics13(1), 70–84 (1985)

Hartigan, J.A., Hartigan, P.M.: The Dip test of unimodality. The Annals of Statistics13(1), 70–84 (1985)

work page 1985

[2] [2]

Nature Human Behaviour7(6), 904–916 (2023)

Flamino, J., Galeazzi, A., Feldman, S., Macy, M.W., Cross, B., Zhou, Z., Serafino, M., Bovet, A., Makse, H.A., Szymanski, B.K.: Political polarization of news media and influencers on twitter in the 2016 and 2020 us presidential elections. Nature Human Behaviour7(6), 904–916 (2023)

work page 2016

[3] [3]

arXiv preprint arXiv:2412.05176 (2024)

Loru, E., Galeazzi, A., Bonetti, A., Sangiorgio, E., Di Marco, N., Cinelli, M., Baronchelli, A., Quattrociocchi, W.: Who sets the agenda on social media? ideology and polarization in online debates. arXiv preprint arXiv:2412.05176 (2024)

work page arXiv 2024

[4] [4]

Quantifying Polarization: A Comparative Study of Measures and Methods,

Di Martino, E., Cinelli, M., Cerqueti, R., Quattrociocchi, W.: Quantifying polarization: A comparative study of measures and methods. arXiv preprint arXiv:2501.07473 (2025)

work page arXiv 2025

[5] [5]

Freeman, J.B., Dale, R.: Assessing bimodality to detect the presence of a dual cognitive process. Behavior Research Methods45, 83–97 (2013) 13 Supplementary Information Power-law like scaling of Z-Dip for multimodal distributions As introduced in Table 1 of the main manuscript, for multimodal samples generated from the same underlying distribution, Z-Dip ...

work page 2013