arxiv: 2510.09276 · v2 · submitted 2025-10-10 · 📊 stat.ME · stat.AP

The bixplot: A variation on the boxplot suited for bimodal data

Pith reviewed 2026-05-18 08:04 UTC · model grok-4.3

classification 📊 stat.ME stat.AP
keywords: bixplot, boxplot, bimodality, univariate clustering, data visualization, multimodality, exploratory analysis

The pith

The bixplot extends the boxplot to detect and display bimodality and multimodality using contiguous univariate clustering.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the bixplot as an extension to standard boxplots specifically for cases where data show multiple peaks or modes. It builds a univariate clustering method that forces clusters to be contiguous and requires each to hold at least a minimum number of unique values. This produces a display that reveals subgroups while retaining the summary statistics of a boxplot and plotting every data point. A reader would care because many real datasets contain hidden bimodal structure that conventional plots obscure or ignore. The paper illustrates the tool on several examples and supplies code in Python and R with options such as coloring points by an external variable.

Core claim

The bixplot is designed to detect and display bimodality and multimodality when the data warrant it by using a univariate clustering method that ensures contiguous clusters with each containing at least a given number of unique members, thereby facilitating the identification and interpretation of potentially meaningful subgroups underlying the data.

What carries the argument

A univariate clustering method that produces contiguous clusters each containing at least a given number of unique members.
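
To make the idea concrete, here is a minimal Python sketch of a contiguous two-cluster split with a floor on the number of unique values per cluster. It is an illustration written under stated assumptions, not the authors' algorithm: the function name `contiguous_two_split`, the parameter `min_unique`, and the cost (total absolute deviation from each cluster's median) are all hypothetical choices.

```python
import numpy as np

def contiguous_two_split(x, min_unique=3):
    """Return a cut point that splits x into two contiguous clusters.

    Assumption: each side must hold at least `min_unique` distinct values,
    and the cost is the summed absolute deviation from each side's median.
    Returns None when no admissible split exists.
    """
    x = np.sort(np.asarray(x, dtype=float))
    uniq = np.unique(x)
    if uniq.size < 2 * min_unique:
        return None  # not enough distinct values for two clusters

    best_cost, best_cut = np.inf, None
    # i = number of distinct values assigned to the left cluster
    for i in range(min_unique, uniq.size - min_unique + 1):
        cut = (uniq[i - 1] + uniq[i]) / 2.0  # midpoint of the gap
        left, right = x[x < cut], x[x > cut]
        cost = (np.abs(left - np.median(left)).sum()
                + np.abs(right - np.median(right)).sum())
        if cost < best_cost:
            best_cost, best_cut = cost, cut
    return best_cut

values = [1, 1.1, 1.2, 1.3, 10, 10.1, 10.2, 10.3]
cut = contiguous_two_split(values, min_unique=3)
```

Because cuts are only tried between adjacent distinct values, tied observations always land in the same cluster, and contiguity holds by construction. Note that this rule always returns a split when one is admissible, so a separate check would be needed to decide whether the data should be split at all.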

If this is right

  • The display highlights potentially meaningful subgroups when bimodality or multimodality is present.
  • Individual data points remain visible so isolated values stand out.
  • An external variable can be shown through color gradations inside the same plot.
  • Implementations are available in both Python and R and are illustrated on real datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The bixplot could be tested on time-series data to detect shifts between regimes.
  • Pairing the visual clusters with a formal multimodality test would strengthen interpretation.
  • The minimum-cluster-size parameter might be chosen automatically from the data size and spread.
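
As one possible way to pair a visual split with a numerical check, the sketch below computes the bimodality coefficient from sample skewness and excess kurtosis. This is a simple heuristic chosen for illustration, not a formal test and not something the paper is stated to use; the function name `bimodality_coefficient` is hypothetical. Values above 5/9 (the value for a uniform distribution) are a commonly quoted rule of thumb for possible bimodality.

```python
import numpy as np

def bimodality_coefficient(x):
    """Bimodality coefficient with sample-size-corrected skewness and kurtosis."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 4:
        raise ValueError("need at least 4 observations")
    d = x - x.mean()
    m2, m3, m4 = (d**2).mean(), (d**3).mean(), (d**4).mean()
    g1 = m3 / m2**1.5            # raw skewness
    g2 = m4 / m2**2 - 3.0        # raw excess kurtosis
    skew = np.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return (skew**2 + 1.0) / (kurt + 3.0 * (n - 1)**2 / ((n - 2) * (n - 3)))

rng = np.random.default_rng(0)
unimodal = rng.normal(0.0, 1.0, 2000)
bimodal = np.concatenate([rng.normal(0.0, 1.0, 1000),
                          rng.normal(8.0, 1.0, 1000)])
bc_uni = bimodality_coefficient(unimodal)   # close to 1/3 for normal data
bc_bi = bimodality_coefficient(bimodal)     # well above 5/9 here
```

The coefficient also rises with skewness, so a heavily skewed unimodal sample can exceed the benchmark; it should be read alongside the plot rather than in place of it.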

Load-bearing premise

The clustering method will separate the data into potentially meaningful subgroups rather than arbitrary partitions.

What would settle it

A dataset with independently known bimodal structure in which the bixplot's clusters fail to separate the two modes.
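
A check of this kind could be sketched as follows, assuming simulated data with known group labels. The helper `split_accuracy` and the cut value `4.0` are hypothetical stand-ins: the cut represents whatever boundary a clustering procedure reports, and no bixplot code is called here.

```python
import numpy as np

def split_accuracy(x, true_labels, cut):
    """Share of points whose side of `cut` matches their known group.

    Cluster names are arbitrary, so the better of the two labelings is used.
    """
    pred = (np.asarray(x, dtype=float) > cut).astype(int)
    acc = float(np.mean(pred == np.asarray(true_labels)))
    return max(acc, 1.0 - acc)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(8.0, 1.0, 200)])
labels = np.repeat([0, 1], 200)

# Hypothetical boundary; in practice this would come from the fitted clusters.
acc = split_accuracy(x, labels, cut=4.0)
```

With well-separated modes the accuracy should be near 1; an accuracy near 0.5 on such data would be the kind of failure described above.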

Figures

Figures reproduced from arXiv: 2510.09276 by Camille M. Montalcini, Peter J. Rousseeuw.

Figure 1: Graphical displays of the bloodfat data from (…
Figure 2: Displaying generated unimodal, bimodal, and multimodal variables by violin plots (left) …
Figure 3: Comparison of violin plots and bixplots for (top row) behavior latency time of fish, where …
Figure 4: (Left) bixplot of the standardized variables of the iris data; (right) plot of the fourth …
Figure 5: Bixplots of the iris petal length using three sizing options for the bodies between density …
Figure 6: Penguins data: bixplots of bill length as a function of island and sex. On the left we …
Figure 7: Penguins data: coloring the rugs of bixplots visualizes relations with an additional …
Original abstract

Boxplots and related visualization methods are widely used exploratory tools for taking a first look at collections of univariate variables. In this note an extension is provided that is specifically designed to detect and display bimodality and multimodality when the data warrant it. For this purpose a univariate clustering method is constructed that ensures contiguous clusters, meaning that no cluster has members inside another cluster, and such that each cluster contains at least a given number of unique members. The resulting bixplot display facilitates the identification and interpretation of potentially meaningful subgroups underlying the data. The bixplot also displays the individual data values, which can draw attention to isolated points. Implementations of the bixplot are available in both Python and R, and their many options are illustrated on several real datasets. For instance, an external variable can be visualized by color gradations inside the display.
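
The two constraints named in the abstract can be stated as simple checks on a labeled sample. The sketch below is an illustration only; the function names `is_contiguous` and `meets_size_floor` are hypothetical, and it assumes that tied values must share a cluster.

```python
import numpy as np

def is_contiguous(x, labels):
    """True if no cluster has members inside another cluster's range."""
    x, labels = np.asarray(x, dtype=float), np.asarray(labels)
    # (min, max) of each cluster, ordered by minimum
    ranges = sorted((x[labels == g].min(), x[labels == g].max())
                    for g in np.unique(labels))
    # each cluster must end strictly before the next one starts
    return all(hi < next_lo
               for (_, hi), (next_lo, _) in zip(ranges, ranges[1:]))

def meets_size_floor(x, labels, min_unique):
    """True if every cluster holds at least `min_unique` distinct values."""
    x, labels = np.asarray(x, dtype=float), np.asarray(labels)
    return all(np.unique(x[labels == g]).size >= min_unique
               for g in np.unique(labels))

x = [1, 2, 3, 10, 11, 12]
ok = is_contiguous(x, [0, 0, 0, 1, 1, 1])           # clusters do not interleave
interleaved = is_contiguous(x, [0, 1, 0, 1, 1, 1])  # value 2 sits inside cluster 0
```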

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces the bixplot, a boxplot variant for visualizing bimodal and multimodal univariate data. It constructs a univariate clustering procedure that enforces contiguous clusters (no cluster members inside another) with each cluster containing at least a user-specified minimum number of unique observations. The display shows these clusters, individual data points, and supports options such as coloring by an external variable. Python and R implementations are provided and illustrated on several real datasets.

Significance. If the clustering procedure can be shown to identify multimodality in a statistically principled way rather than through size constraints alone, the bixplot would offer a useful exploratory visualization tool that extends standard boxplots for subgroup detection. The open implementations and real-data examples would strengthen its practical value in statistical methodology.

major comments (1)
  1. Abstract: the claim that the bixplot 'detects and displays bimodality and multimodality when the data warrant it' rests on the univariate clustering method, yet the description supplies only geometric contiguity and minimum-cardinality constraints with no data-driven separation criterion (e.g., gap statistic, density threshold, or likelihood-ratio test). In a sufficiently large unimodal sample the procedure can therefore return multiple clusters justified solely by the size floor, undermining the 'when the data warrant it' guarantee.
minor comments (2)
  1. Abstract and method description: no equations, pseudocode, or formal definition of the clustering algorithm are supplied, preventing readers from reproducing or verifying the exact procedure.
  2. Consider adding a small simulation study comparing the bixplot clusters against known multimodal mixtures or against kernel-density mode detection to quantify false-positive splits.
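
A baseline for the comparison suggested in the second minor comment could look like the following sketch, which counts modes of a Gaussian kernel density estimate. The function name `count_kde_modes`, the bandwidth rule (Silverman's rule of thumb), and the 10% height threshold for ignoring tail bumps are all assumptions made for illustration. The example inputs are deterministic normal quantiles rather than random draws, so the result does not depend on a seed.

```python
from statistics import NormalDist
import numpy as np

def count_kde_modes(x, grid_size=512, min_rel_height=0.10):
    """Count local maxima of a Gaussian KDE evaluated on a regular grid."""
    x = np.asarray(x, dtype=float)
    n = x.size
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(x.std(ddof=1), (q75 - q25) / 1.34)
    if spread <= 0:
        raise ValueError("data have no spread")
    h = 0.9 * spread * n ** (-0.2)           # Silverman's rule of thumb
    grid = np.linspace(x.min() - 3 * h, x.max() + 3 * h, grid_size)
    z = (grid[:, None] - x[None, :]) / h
    dens = np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))
    mid = dens[1:-1]
    is_peak = ((mid > dens[:-2]) & (mid >= dens[2:])
               & (mid >= min_rel_height * dens.max()))  # skip tiny tail bumps
    return int(is_peak.sum())

nd = NormalDist()
unimodal = np.array([nd.inv_cdf((i + 0.5) / 500) for i in range(500)])
bimodal = np.concatenate([unimodal, unimodal + 8.0])
modes_uni = count_kde_modes(unimodal)
modes_bi = count_kde_modes(bimodal)
```

In a simulation study, the number of clusters reported by the bixplot on repeated unimodal samples could be tabulated against this count to estimate how often a split appears without a matching density mode.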

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We agree that the abstract's phrasing regarding detection of bimodality and multimodality requires clarification to accurately reflect the exploratory and constraint-based nature of the clustering procedure. We address the major comment below and have revised the manuscript accordingly.

Point-by-point responses
  1. Referee: Abstract: the claim that the bixplot 'detects and displays bimodality and multimodality when the data warrant it' rests on the univariate clustering method, yet the description supplies only geometric contiguity and minimum-cardinality constraints with no data-driven separation criterion (e.g., gap statistic, density threshold, or likelihood-ratio test). In a sufficiently large unimodal sample the procedure can therefore return multiple clusters justified solely by the size floor, undermining the 'when the data warrant it' guarantee.

    Authors: We thank the referee for this observation. The univariate clustering procedure partitions the sorted data into contiguous groups (ensuring no interleaving of cluster members) where each group meets a user-specified minimum cardinality of unique observations. No formal statistical modality test, gap statistic, density threshold, or likelihood-based criterion is employed to determine splits; the separation arises from the contiguity constraint combined with the size floor. Consequently, in large unimodal samples, multiple clusters can form simply because the minimum size permits partitioning into several qualifying segments. The original phrasing 'detects and displays bimodality and multimodality when the data warrant it' could be read as implying a data-driven statistical warrant, which is not the case. We have therefore revised the abstract to state that the bixplot 'facilitates the visualization of potential bimodality and multimodality by partitioning the data into contiguous clusters each containing at least a user-specified minimum number of observations.' This revision emphasizes the method's role as an exploratory visualization tool rather than a formal detector of modes, while retaining the practical utility for subgroup exploration. The full manuscript text already describes the procedure in these terms, so only the abstract required adjustment.

    Revision: yes
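
The remark that the size floor alone admits several qualifying segmentations can be quantified with a short counting sketch. It assumes the floor applies to distinct sorted values and that tied values stay together; the function name `count_admissible_partitions` is hypothetical. Splitting n distinct values into k contiguous clusters of at least m values each is a composition problem with C(n - km + k - 1, k - 1) solutions.

```python
from math import comb

def count_admissible_partitions(n_unique, k, min_unique):
    """Number of ways to cut n_unique sorted distinct values into k
    contiguous clusters with at least min_unique values in each."""
    slack = n_unique - k * min_unique  # values left after meeting every floor
    if k < 1 or slack < 0:
        return 0
    return comb(slack + k - 1, k - 1)

small = count_admissible_partitions(10, 2, 3)      # sizes (3,7) ... (7,3)
too_few = count_admissible_partitions(5, 2, 3)     # floor cannot be met
large = count_admissible_partitions(1000, 3, 10)
```

For 1000 distinct values, three clusters, and a floor of 10, there are 471906 admissible partitions, so the constraints by themselves do not single out a split; some additional criterion has to choose among them.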

Circularity Check

0 steps flagged

No significant circularity in bixplot clustering construction

Full rationale

The paper presents an original algorithmic construction for a univariate clustering procedure that enforces contiguity and a minimum number of unique members per cluster, then uses the resulting partitions to extend the boxplot into a bixplot for visualizing potential multimodality. This construction is defined directly by the geometric and cardinality constraints stated in the abstract and method description, without reducing any claimed result to a fitted parameter, self-referential definition, or load-bearing self-citation. No equations or steps equate a derived quantity to its own inputs by construction, and the approach is self-contained as a new exploratory tool rather than a tautological renaming or prediction of prior fits.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities. The minimum cluster size and contiguity requirement are implicit design choices but not quantified or justified here.


