The bixplot: A variation on the boxplot suited for bimodal data
Pith reviewed 2026-05-18 08:04 UTC · model grok-4.3
The pith
The bixplot extends the boxplot to detect and display bimodality and multimodality using contiguous univariate clustering.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The bixplot is designed to detect and display bimodality and multimodality when the data warrant it by using a univariate clustering method that ensures contiguous clusters with each containing at least a given number of unique members, thereby facilitating the identification and interpretation of potentially meaningful subgroups underlying the data.
What carries the argument
A univariate clustering method that produces contiguous clusters each containing at least a given number of unique members.
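The constraint-based clustering described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. The supplementary text excerpted in the reference graph says the authors use a constrained version of k-medoids for given k, following Bradley et al. (2000). This sketch swaps in a different technique: an exact dynamic program over the sorted distinct values with an L1 (medoid) cost. The names `contiguous_kmedoids`, `segment_cost`, and `min_unique` are hypothetical. `min_unique` plays the role that the supplement appears to give to `clusMinN`.

```python
import numpy as np


def segment_cost(values, counts):
    """L1 cost of one cluster: total |x - medoid| over its members.

    In one dimension a weighted median of the sorted distinct values is a
    medoid, so this equals the sum of absolute deviations from it.
    """
    cum = np.cumsum(counts)
    med = values[np.searchsorted(cum, cum[-1] / 2.0)]
    return float(np.sum(counts * np.abs(values - med)))


def contiguous_kmedoids(x, k, min_unique):
    """Hypothetical helper (not the bixplot API): split x into k contiguous
    clusters, each holding at least `min_unique` distinct values, minimizing
    the total L1 (medoid) cost. Returns a label (0 = lowest values) per element.
    """
    if k < 1 or min_unique < 1:
        raise ValueError("k and min_unique must be at least 1")
    x = np.asarray(x, dtype=float)
    values, counts = np.unique(x, return_counts=True)  # sorted distinct values
    u = len(values)
    if k * min_unique > u:
        raise ValueError("too few distinct values for k clusters of this size")

    # cost[i, j]: cost of one cluster made of distinct values i..j (inclusive).
    cost = np.full((u, u), np.inf)
    for i in range(u):
        for j in range(i + min_unique - 1, u):
            cost[i, j] = segment_cost(values[i:j + 1], counts[i:j + 1])

    # best[c, j]: minimal cost of covering values 0..j with c + 1 clusters;
    # start[c, j]: index where the last of those clusters begins.
    best = np.full((k, u), np.inf)
    start = np.zeros((k, u), dtype=int)
    best[0] = cost[0]
    for c in range(1, k):
        for j in range(u):
            for i in range(c * min_unique, j - min_unique + 2):
                cand = best[c - 1, i - 1] + cost[i, j]
                if cand < best[c, j]:
                    best[c, j], start[c, j] = cand, i

    # Walk back from the last value to recover where each cluster starts.
    bounds, j = [], u - 1
    for c in range(k - 1, 0, -1):
        bounds.append(start[c, j])
        j = start[c, j] - 1
    bounds = np.array(sorted(bounds), dtype=int)

    # Tied observations share one distinct value, so they never get split.
    pos = np.searchsorted(values, x)
    return np.searchsorted(bounds, pos, side="right")


x = [1.0, 1.2, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1]
print(contiguous_kmedoids(x, k=2, min_unique=3))  # → [0 0 0 0 1 1 1 1]
```

The clusters are intervals of the sorted distinct values, so no cluster can have members inside another. The precomputed cost table makes the sketch roughly cubic in the number of distinct values. This is fine for illustration but not tuned for large inputs.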
If this is right
- The display highlights potentially meaningful subgroups when bimodality or multimodality is present.
- Individual data points remain visible so isolated values stand out.
- An external variable can be shown through color gradations inside the same plot.
- Implementations are available in both Python and R and are illustrated on several real datasets.
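The abstract says an external variable can be shown by color gradations inside the display, but this page does not specify the mapping. The sketch below is minimal and assumes a linear two-color gradient, with grey for missing values. `gradation_colors` and its endpoint colors are hypothetical choices, not the bixplot's actual options.

```python
import numpy as np


def gradation_colors(z, low=(0.0, 0.0, 1.0), high=(1.0, 0.0, 0.0)):
    """Hypothetical helper: map an external variable z to RGB rows on a linear
    gradient from `low` (smallest z) to `high` (largest z); NaN gets grey.
    """
    z = np.asarray(z, dtype=float)
    lo, hi = np.nanmin(z), np.nanmax(z)
    # Rescale to [0, 1]; a constant variable maps everything to `low`.
    t = (z - lo) / (hi - lo) if hi > lo else np.zeros_like(z)
    t = np.clip(t, 0.0, 1.0)
    low, high = np.asarray(low, dtype=float), np.asarray(high, dtype=float)
    rgb = low + t[:, None] * (high - low)
    rgb[np.isnan(z)] = 0.5  # neutral grey for missing external values
    return rgb


print(gradation_colors([0, 5, 10]))  # rows: blue, purple, red
```

The resulting array holds one RGB row per data point. It could be passed as per-point colors to a plotting call, for example as the `c` argument of matplotlib's `Axes.scatter`.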
Where Pith is reading between the lines
- The bixplot could be tested on time-series data to detect shifts between regimes.
- Pairing the visual clusters with a formal multimodality test would strengthen interpretation.
- The minimum-cluster-size parameter might be chosen automatically from the data size and spread.
Load-bearing premise
The clustering method will separate the data into potentially meaningful subgroups rather than arbitrary partitions.
What would settle it
A dataset with independently known bimodal structure in which the bixplot's clusters fail to separate the two modes.
Original abstract
Boxplots and related visualization methods are widely used exploratory tools for taking a first look at collections of univariate variables. In this note an extension is provided that is specifically designed to detect and display bimodality and multimodality when the data warrant it. For this purpose a univariate clustering method is constructed that ensures contiguous clusters, meaning that no cluster has members inside another cluster, and such that each cluster contains at least a given number of unique members. The resulting bixplot display facilitates the identification and interpretation of potentially meaningful subgroups underlying the data. The bixplot also displays the individual data values, which can draw attention to isolated points. Implementations of the bixplot are available in both Python and R, and their many options are illustrated on several real datasets. For instance, an external variable can be visualized by color gradations inside the display.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the bixplot, a boxplot variant for visualizing bimodal and multimodal univariate data. It constructs a univariate clustering procedure that enforces contiguous clusters (no cluster members inside another) with each cluster containing at least a user-specified minimum number of unique observations. The display shows these clusters, individual data points, and supports options such as coloring by an external variable. Python and R implementations are provided and illustrated on several real datasets.
Significance. If the clustering procedure can be shown to identify multimodality in a statistically principled way rather than through size constraints alone, the bixplot would offer a useful exploratory visualization tool that extends standard boxplots for subgroup detection. The open implementations and real-data examples would strengthen its practical value in statistical methodology.
major comments (1)
- Abstract: the claim that the bixplot 'detects and displays bimodality and multimodality when the data warrant it' rests on the univariate clustering method, yet the description supplies only geometric contiguity and minimum-cardinality constraints with no data-driven separation criterion (e.g., gap statistic, density threshold, or likelihood-ratio test). In a sufficiently large unimodal sample the procedure can therefore return multiple clusters justified solely by the size floor, undermining the 'when the data warrant it' guarantee.
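Simple arithmetic bears out the referee's point. Take the two constraints stated in the abstract. A contiguous split of u distinct sorted values into k clusters, each with at least m distinct values, exists exactly when k·m ≤ u. So the constraints cap k but never single out k = 1 for unimodal data. Whatever selects k among the feasible values has to come from outside these two constraints. The sketch assumes a perfectly normal-shaped, hence unimodal, sample built from normal quantiles. `feasible_cluster_counts` is a hypothetical helper.

```python
from statistics import NormalDist


def feasible_cluster_counts(x, min_unique):
    """Hypothetical helper: every k for which x can be split into k contiguous
    clusters with at least `min_unique` distinct values each.

    With u distinct sorted values such a split exists exactly when
    k * min_unique <= u.
    """
    u = len(set(x))
    return list(range(1, u // min_unique + 1))


# 1000 distinct values with an exactly normal (hence unimodal) shape.
n = 1000
sample = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
print(feasible_cluster_counts(sample, min_unique=50)[-1])  # → 20
```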
minor comments (2)
- Abstract and method description: no equations, pseudocode, or formal definition of the clustering algorithm are supplied, preventing readers from reproducing or verifying the exact procedure.
- Consider adding a small simulation study comparing the bixplot clusters against known multimodal mixtures or against kernel-density mode detection to quantify false-positive splits.
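A simulation of this kind needs a kernel-density mode count to compare the cluster splits against. This is a minimal sketch. It assumes a Gaussian kernel with Scott's rule bandwidth, evaluated on a regular grid. `kde_mode_count` and the grid size are hypothetical choices, and KDE mode counts depend strongly on the bandwidth. The inputs are deterministic normal-quantile samples: a single normal shape, and an equal mixture of two copies shifted to ±4.

```python
import numpy as np
from statistics import NormalDist


def kde_mode_count(x, grid_size=512):
    """Count local maxima of a Gaussian kernel density estimate of x.

    The bandwidth follows Scott's rule, h = sd * n ** (-1/5); the density is
    left unnormalized because only its shape matters for counting modes.
    """
    x = np.asarray(x, dtype=float)
    h = x.std(ddof=1) * len(x) ** (-0.2)
    grid = np.linspace(x.min() - 3 * h, x.max() + 3 * h, grid_size)
    dens = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2).sum(axis=1)
    # A mode: higher than the left neighbour and not lower than the right one.
    peaks = (dens[1:-1] > dens[:-2]) & (dens[1:-1] >= dens[2:])
    return int(peaks.sum())


# Deterministic "samples": standard normal quantiles, alone and as a two-part mixture.
n = 400
base = np.array([NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)])
print(kde_mode_count(base))                                  # → 1
print(kde_mode_count(np.concatenate([base - 4, base + 4])))  # → 2
```

A full study would draw repeated random samples from known mixtures and unimodal laws. It would then record how often the bixplot's cluster count disagrees with the known number of modes and with the KDE mode count.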
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We agree that the abstract's phrasing regarding detection of bimodality and multimodality requires clarification to accurately reflect the exploratory and constraint-based nature of the clustering procedure. We address the major comment below and have revised the manuscript accordingly.
Point-by-point responses
-
Referee: Abstract: the claim that the bixplot 'detects and displays bimodality and multimodality when the data warrant it' rests on the univariate clustering method, yet the description supplies only geometric contiguity and minimum-cardinality constraints with no data-driven separation criterion (e.g., gap statistic, density threshold, or likelihood-ratio test). In a sufficiently large unimodal sample the procedure can therefore return multiple clusters justified solely by the size floor, undermining the 'when the data warrant it' guarantee.
Authors: We thank the referee for this observation. The univariate clustering procedure partitions the sorted data into contiguous groups (ensuring no interleaving of cluster members) where each group meets a user-specified minimum cardinality of unique observations. No formal statistical modality test, gap statistic, density threshold, or likelihood-based criterion is employed to determine splits; the separation arises from the contiguity constraint combined with the size floor. Consequently, in large unimodal samples, multiple clusters can form purely when the minimum size permits partitioning into several qualifying segments. The original phrasing 'detects and displays bimodality and multimodality when the data warrant it' could be read as implying a data-driven statistical warrant, which is not the case. We have therefore revised the abstract to state that the bixplot 'facilitates the visualization of potential bimodality and multimodality by partitioning the data into contiguous clusters each containing at least a user-specified minimum number of observations.' This revision emphasizes the method's role as an exploratory visualization tool rather than a formal detector of modes, while retaining the practical utility for subgroup exploration. The full manuscript text already describes the procedure in these terms, so only the abstract required adjustment.
revision: yes
Circularity Check
No significant circularity in bixplot clustering construction
Full rationale
The paper presents an original algorithmic construction for a univariate clustering procedure that enforces contiguity and a minimum number of unique members per cluster, then uses the resulting partitions to extend the boxplot into a bixplot for visualizing potential multimodality. This construction is defined directly by the geometric and cardinality constraints stated in the abstract and method description, without reducing any claimed result to a fitted parameter, self-referential definition, or load-bearing self-citation. No equations or steps equate a derived quantity to its own inputs by construction, and the approach is self-contained as a new exploratory tool rather than a tautological renaming or prediction of prior fits.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
a univariate clustering method is constructed that ensures contiguous clusters... each cluster contains at least a given number of unique members... silhouette score... Hartigan’s dip test
-
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: LogicNat recovery (tag: unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
the bixplot display facilitates the identification and interpretation of potentially meaningful subgroups
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Daniel Adler, S. Thomas Kelly, Tom M. Elliott, and Jordan Adamson. vioplot: Violin Plot. CRAN, R package, 2025. https://CRAN.R-project.org/package=vioplot. Andreas Alfons. robustHD: Robust methods for high-dimensional data. CRAN, R package,
-
[2]
Technical Report MSR-TR-2000-65. Jonathan B. Freeman and Rick Dale. Assessing bimodality to detect the presence of a dual cognitive process. Behavior Research Methods, 45(1):83–97,
-
[3]
Michael Friendly, Jürgen Symanzik, and Ortac Onder
doi: 10.3758/s13428-012-0225-x. Michael Friendly, Jürgen Symanzik, and Ortac Onder. Visualizing the Titanic disaster. Significance, 16(1):14–19, 2019. https://doi.org/10.1111/j.1740-9713.2019.01229.x. David Hand, F. Daly, A. Lunn, K. McConway, and E. Ostrowski. A Handbook of Small Data Sets. London: Chapman and Hall,
-
[4]
doi: 10.18637/jss.v028.c01. Leon Kaufman and Peter J. Rousseeuw. Clustering by means of Medoids. In Y. Dodge, editor, Statistical Data Analysis Based on the L1-Norm and Related Methods, pages 405–416. North-Holland, Amsterdam,
-
[5]
doi: 10.1093/beheco/araa117. Martin Maechler, Peter Rousseeuw, Anja Struyf, Mia Hubert, and Kurt Hornik. cluster: Methods for Cluster Analysis. CRAN, R package, 2013. https://CRAN.R-project.org/package=cluster. Robert McGill, John W. Tukey, and Wayne A. Larsen. Variations of Box Plots. The American Statistician, 32:12–16,
-
[6]
https://doi.org/10.1016/0377-0427(87)90125-7. Peter J. Rousseeuw, Ida Ruts, and John W. Tukey. The Bagplot: A Bivariate Boxplot. The American Statistician, 53(4):382–387,
-
[7]
Scikit-learn-extra development team
https://doi.org/10.1016/j.is.2021.101804. Scikit-learn-extra development team. Scikit-learn-extra – A set of useful tools compatible with scikit-learn, 2020. https://github.com/scikit-learn-contrib/scikit-learn-extra. David W. Scott, Antonio M. Gotto, James S. Cole, and G. Antony Gorry. Plasma lipids as collateral risk factors in coronary artery disease...
-
[8]
doi: 10.1111/j.1466-8238.2010.00576.x
ISSN 1466-8238. doi: 10.1111/j.1466-8238.2010.00576.x. John W. Tukey. Exploratory Data Analysis. Reading, MA: Addison-Wesley,
-
[9]
Susan VanderPlas, Yawei Ge, Anthony Unwin, and Heike Hofmann. Penguins Go Parallel: A Grammar of Graphics Framework for Generalized Parallel Coordinate Plots. Journal of Computational and Graphical Statistics, 32(4):1572–1587, 2023. https://doi.org/10.1080/10618600.2023.2195462. Michael L. Waskom. seaborn: statistical data visualization. Journal of Open So...
-
[10]
seaborn: statistical data visualization
doi: 10.21105/joss.03021. https://doi.org/10.21105/joss.03021. Supplementary Text A. More examples of the bixplot display. The Top Gear data of Alfons (2016) contains numerical and categorical variables on 297 cars. Here we consider four numeric variables: the Weight of the car, its TopSpeed and Price, and its engine Displacement, as well as the categorica...
-
[11]
Rerunning the bixplot on the cleaned data gave the left panel of Figure S1
with an impossibly low weight of 210 kilograms, so we set that entry to missing. Rerunning the bixplot on the cleaned data gave the left panel of Figure S1. Figure S1: (Left) bixplot of standardized variables of the Top Gear data; (right) plot of TopSpeed versus Displacement. Among the four variables, TopSpeed and Displacement are deemed not unimodal and fitted...
-
[12]
(2000) who constructed a version of k-means with this type of constraint
(Similarly, when a variable has fewer values than clusMinN, only points are drawn.) We have implemented this in a constrained version of k-medoids for given k, following ideas of Bradley et al. (2000) who constructed a version of k-means with this type of constraint. We will provide a sketch of the algorithm. For simplicity we first assume that the univariate...
-
[13]
Martin Maechler and Dario Ringach. diptest: Hartigan’s Dip Test Statistic for Unimodality
https://CRAN.R-project.org/package=lpSolve. Martin Maechler and Dario Ringach. diptest: Hartigan’s Dip Test Statistic for Unimodality. CRAN, R package, 2024. https://CRAN.R-project.org/package=diptest. Stuart Mitchell, Michael O’Sullivan, and Iain Dunning. PuLP: A Linear Programming Toolkit for Python. Technical Report, The University of Auckland, New Zealand,...