Investigating Performance and Practices with Univariate Distribution Charts

Anna Kurtenkova; Daniel Pahr; Laura Lotteraner; Torsten Möller

arxiv: 2604.08378 · v1 · submitted 2026-04-09 · 💻 cs.GR

Pith reviewed 2026-05-10 16:58 UTC · model grok-4.3

classification 💻 cs.GR

keywords: univariate distributions · visualization charts · user study · boxplots · violin plots · histograms · task performance · chart preferences

The pith

Different charts for univariate distributions yield varying accuracy on analysis tasks, and popular options like histograms or boxplots are not always the most effective.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper groups univariate distribution charts into four categories and tests four representatives—boxplots, violin plots, jittered strip plots, and histograms—on how well 215 participants perform low-level visual analysis tasks. Accuracy, misunderstandings, and subjective preferences are measured together through a click-to-select method plus interviews, showing that chart performance differs by task while familiarity and stated preference often fail to predict who does better. Practitioner interviews add that charts favored by general audiences or common in science are not inherently optimal across all uses.

Core claim

Charts for univariate distributions differ in the accuracy they support for low-level tasks, with measurable gaps in performance, distinct patterns of misunderstanding, and a mismatch between what users prefer or know and how well they actually perform; interviews confirm that widely adopted charts such as histograms and boxplots are not the best choice for every task.

What carries the argument

Mixed-methods user study that measures task accuracy via click-to-select on four representative charts (boxplots, violinplots, jittered stripplots, histograms) while also collecting preference and familiarity data plus practitioner interviews.

If this is right

Task accuracy is not uniform across chart types; some charts support certain low-level questions more reliably than others.
User preference and prior familiarity with a chart do not reliably predict higher accuracy on analysis tasks.
Commonly used charts such as histograms and boxplots can underperform relative to less familiar alternatives on specific tasks.
Visualization practitioners select charts based on convention or audience preference rather than measured effectiveness for the intended task.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Design tools could default to the chart type shown to be most accurate for the current analysis goal rather than the user's stated favorite.
Training materials might need to separate familiarity from effectiveness, teaching users to switch chart types when accuracy matters.
Follow-up work could test whether the accuracy gaps persist when participants use the charts in their own data-analysis workflows rather than in a controlled benchmark.

Load-bearing premise

The selected low-level benchmark tasks, click-to-select measurement, and sample of 215 participants stand in for real-world visual analysis needs and broader user populations.

What would settle it

A replication with a different set of tasks or a new participant pool in which accuracy differences between the four charts disappear or reverse would falsify the reported performance gaps.

Figures

Figures reproduced from arXiv: 2604.08378 by Anna Kurtenkova, Daniel Pahr, Laura Lotteraner, Torsten Möller.

**Figure 1.** The four charts used in this study (dataset 1): B Boxplot, V Violinplot, H Histogram, and S Stripplot, each representing one of the four classes of charts: Summary Statistics, Smooth Densities, Binned Densities, Individual Data Points.

**Figure 2.** The datasets used in the study vary in skew (symmetric: 1, 2, 5, 6; left-skewed: 3, 4, 7, 8), outlier values (left: 1, 3, 5, 7; right: 2, 4, 6, 8), and modality (unimodal: 1–4; bimodal: 5–8). The B boxplot (geom_boxplot) in our survey is based on the original publication by Tukey [Tuk77], with the whiskers representing the quartiles ±1.5 · IQR, and any values beyond that range individually plotted as outli…
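The whisker rule quoted in the Figure 2 caption can be illustrated with a minimal sketch. It assumes the common Tukey convention in which fences sit 1.5 · IQR beyond the quartiles and whiskers stop at the most extreme data point inside the fences. The function name `tukey_boxplot_stats`, the quartile method, and the sample values are illustrative assumptions, not taken from the paper; quartile definitions differ between tools, so results may not match `geom_boxplot` exactly.

```python
from statistics import quantiles


def tukey_boxplot_stats(values, k=1.5):
    """Summarize values the way a Tukey-style boxplot would draw them."""
    data = sorted(values)
    # Quartile conventions vary between tools; "inclusive" is one common choice.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers end at the most extreme data points that lie inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    # Anything beyond the fences is drawn as an individual outlier point.
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical sample with one extreme value on the right.
stats = tukey_boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

In this toy sample the upper whisker ends at 9 and the value 30 is plotted separately, which is the kind of feature the RANGE task asks readers to interpret.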

**Figure 3.** Examples of participants’ selections on dataset 6 in the DESCRIBE task, for which participants could leave up to 10 marks on a chart to denote points of interest. A red dot represents a mark left by a participant. B Boxplots exhibit a high concentration on the median line and around outliers. V Violinplots show similar behavior also on quartile markings, as well as peaks and valleys. H Histograms are marke…

**Figure 4.** Examples of participants’ selections. Colored histograms indicate participants’ selections, with the outlines of the chart the selections refer to in black and the correct value in red. For better readability, only the coordinates of valid clicks (i.e., clicks on the chart) along the data axis are displayed. B RANGE: Participants mistake both the range of the box, and the end of the whiskers for the range …

**Figure 5.** Percentage of (a) valid clicks for the CLUSTERS task, (b) additional clicks for the CLUSTERS task, and (c) correct comparisons for both MEDIAN and MEAN. Because no correct value exists for the clusters task in boxplots, all clicks on the chart are counted as invalid, so no additional clicks exist by definition. … the box as a cluster of values, and overall, several difficulties with reading the representatio…

**Figure 6.** Distributions of relative errors for (a) the RANGE and (b) the MEDIAN task that clearly show the superior performance of jittered stripplots and inferior performance of boxplots for identifying the range, and the superior performance of charts with explicit encodings for identifying the median. … CLUSTERS, our analysis shows a significantly higher percentage of invalid clicks for boxplots across all levels o…
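The page does not state how the relative errors in Figure 6 are computed. As a rough sketch under that caveat, one plausible definition normalizes the distance between a clicked value and the correct value by the data range. The function name `relative_error`, the normalization, and the click values below are all assumptions made for illustration.

```python
def relative_error(clicked_value, true_value, data_min, data_max):
    """Distance between a click and the correct value, as a fraction of the data range.

    Assumed definition for illustration; the paper may normalize differently.
    """
    span = data_max - data_min
    if span <= 0:
        raise ValueError("data_max must be greater than data_min")
    return abs(clicked_value - true_value) / span


# Hypothetical clicks along the data axis for a MEDIAN-style task.
true_median = 50.0
clicks = [48.0, 55.0, 70.0]
errors = [relative_error(c, true_median, 0.0, 100.0) for c in clicks]
print(errors)
```

Normalizing by the range makes errors comparable across datasets with different scales, which is one reason such a measure might be preferred over raw pixel or data-unit distances.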

**Figure 7.** Confidence ratings (a) across all tasks and (b) for the MEDIAN task. Overall, participants were most confident with histograms, but the explicit encoding of the median leads to a higher confidence in boxplots and violinplots.

Original abstract

A range of charts with different strengths and weaknesses exists to support the visual analysis of univariate distributions, with a limited understanding of which charts best support which tasks and users, and how practitioners use charts. We categorize the available charts for univariate distributions into four groups and present the results of a mixed-methods comparison (n=215) of participants' perception and preferences across boxplots, violinplots, jittered stripplots, and histograms as representatives of their respective categories. The click-to-select approach in our study, combined with data on participants' subjective experiences and preferences, allows to both measure accuracy on benchmark tasks and discuss participants' choices qualitatively. Our analysis reveals differences between charts in task accuracy, common misunderstandings, and preferences across various low-level tasks, and indicates that chart preference and familiarity do not necessarily align with participants' task performance. Interviews with five visualization practitioners further reveal that charts widely preferred by general audiences (such as histograms) or commonly used in scientific domains (such as boxplots) are not inherently the most effective for all tasks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper supplies new comparative numbers on task accuracy and preference mismatches for boxplots, violinplots, stripplots, and histograms, but the narrow benchmark tasks limit how much the results speak to real analysis.

The main contribution is a mixed-methods user study with 215 participants that directly compares four univariate distribution charts on low-level tasks. The authors used click-to-select to measure accuracy on tasks such as locating medians or outliers, collected preference and familiarity ratings, and added interviews with five practitioners. The findings show chart-specific differences in performance and errors, plus a clear gap between what people prefer or recognize and how well they actually do the tasks. The practitioner comments reinforce that common choices like histograms or boxplots are not always the strongest for every job.

This kind of side-by-side empirical data is still fairly rare in visualization work on distribution charts, so it fills a practical gap. The design is straightforward and the sample size is large enough to spot patterns across the four categories. The qualitative layer helps make sense of the accuracy scores rather than leaving them as raw numbers.

The soft spots are mostly around scope. The tasks stay at isolated benchmark operations, which may not match how people actually scan or compare distributions in messy, real datasets or during longer exploration sessions. The participant group is general rather than domain experts, and five interviews give only a thin view of current practice. Without the full methods and statistics in front of me it is hard to judge effect sizes or variability, but the abstract suggests the differences exist even if they are modest.

This work is aimed at visualization researchers and anyone who teaches or recommends charts for data exploration. A reader running user studies or building guidelines will find the comparison useful to cite or extend. It deserves peer review because the core setup is independent and the topic has clear applied value, even though reviewers will probably ask for tighter claims about generalizability and more expert input.

Referee Report

2 major / 1 minor

Summary. The manuscript reports on a mixed-methods empirical study (n=215 participants) comparing four representative univariate distribution charts—boxplots, violin plots, jittered strip plots, and histograms—using click-to-select tasks for low-level analysis operations. It finds variations in accuracy, common errors, and user preferences, notes misalignment between preference/familiarity and performance, and based on five practitioner interviews concludes that widely used charts are not inherently optimal for all tasks.

Significance. If the results are robust, the work offers practical guidance for visualization design by identifying performance differences across chart types and highlighting the value of empirical evaluation over reliance on convention or preference. The mixed-methods approach, combining quantitative accuracy data with qualitative insights, is a notable strength, as is the relatively large participant sample for the user study component.

major comments (2)

The central claims about differences in effectiveness and the conclusion that common charts (histograms, boxplots) are not inherently optimal rest on the assumption that the chosen low-level benchmark tasks and click-to-select measurement are representative of real-world univariate distribution analysis. The Methods section provides no explicit justification or validation for how these isolated tasks map to integrated workflows or domain-expert use cases, and the general participant sample (rather than experts) limits extrapolation; this is load-bearing for the practitioner-interview claims.
Abstract and Results sections: the description of the study design and sample size is given, but detailed statistical results (including error bars, exact p-values or effect sizes for accuracy differences, and exclusion criteria) are not reported, preventing full verification of the claimed differences in task accuracy and misunderstandings.

minor comments (1)

The four-group categorization of univariate distribution charts in the Introduction could be strengthened by explicit references to prior visualization taxonomies.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive feedback. We value the recognition of the mixed-methods design and sample size as strengths. We address each major comment below, with planned revisions to improve clarity and verifiability.

Referee: The central claims about differences in effectiveness and the conclusion that common charts (histograms, boxplots) are not inherently optimal rest on the assumption that the chosen low-level benchmark tasks and click-to-select measurement are representative of real-world univariate distribution analysis. The Methods section provides no explicit justification or validation for how these isolated tasks map to integrated workflows or domain-expert use cases, and the general participant sample (rather than experts) limits extrapolation; this is load-bearing for the practitioner-interview claims.

Authors: We agree that an explicit justification for the task selection would strengthen the paper. The click-to-select tasks were adapted from established low-level analysis operations in the visualization literature (e.g., identifying extrema, spread, outliers, and shape features). We will add a subsection in Methods that cites relevant task taxonomies and explains their mapping to common univariate analysis workflows. The general participant sample was intentional to assess broad perceptual performance rather than domain expertise; we will expand the Limitations and Discussion sections to address extrapolation to experts and clarify that the five practitioner interviews provide complementary qualitative context on real-world usage rather than direct validation of the quantitative findings.
revision: partial

Referee: Abstract and Results sections: the description of the study design and sample size is given, but detailed statistical results (including error bars, exact p-values or effect sizes for accuracy differences, and exclusion criteria) are not reported, preventing full verification of the claimed differences in task accuracy and misunderstandings.

Authors: We acknowledge that fuller statistical reporting is required for verification. In the revision we will add exact p-values, effect sizes (e.g., Cramér’s V), 95% confidence intervals or error bars to all accuracy figures, and a transparent account of exclusion criteria and data-cleaning steps in both Methods and Results. The Abstract will be updated to reference the key statistical outcomes within length constraints.
revision: yes
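For readers unfamiliar with the effect size named in the rebuttal, the following is a minimal sketch of Cramér’s V for a contingency table, for instance chart type by correct/incorrect response. It uses the standard formula V = sqrt(χ² / (n · (min(rows, cols) − 1))) without bias correction. The function name `cramers_v` and the toy tables are illustrative assumptions and do not represent data from the study.

```python
from math import sqrt


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if n == 0 or 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    if k < 2:
        raise ValueError("table needs at least two rows and two columns")
    return sqrt(chi2 / (n * (k - 1)))


# Toy tables (not study data): rows are groups, columns are outcomes.
perfect = cramers_v([[10, 0], [0, 10]])  # outcome fully determined by group
none = cramers_v([[5, 5], [5, 5]])       # outcome independent of group
print(perfect, none)
```

Values near 0 indicate little association between chart type and outcome, and values near 1 indicate a strong one, which is why the measure suits reporting accuracy differences across categorical conditions.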

Circularity Check

0 steps flagged

No significant circularity in empirical user study

The paper reports results from a mixed-methods empirical study with 215 participants completing click-to-select benchmark tasks on four chart types plus qualitative interviews with five practitioners. All central claims (task accuracy differences, common misunderstandings, preference-performance misalignment, and practitioner insights) are grounded in this independently collected data rather than any derivation, fitted parameter, or self-citation chain. No equations, ansatzes, or uniqueness theorems appear; the work contains no load-bearing self-referential steps that reduce outputs to inputs by construction. This matches the default case of an empirical paper whose findings are falsifiable against external replication.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work is an empirical user study and therefore rests on standard domain assumptions from human-computer interaction research rather than new mathematical constructs or fitted parameters.

axioms (2)

domain assumption: Accuracy on click-to-select benchmark tasks serves as a valid proxy for chart effectiveness in univariate distribution analysis.
Invoked to justify the measurement of performance differences across chart types.
domain assumption: The participant pool and task set generalize to broader visualization practice.
Required for the claims about real-world implications and practitioner interviews.

pith-pipeline@v0.9.0 · 5482 in / 1330 out tokens · 58530 ms · 2026-05-10T16:58:20.124756+00:00 · methodology

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1] [AES05] R. Amar, J. Eagan, and J. Stasko. Low-level components of analytic activity in information visualization. In IEEE Symposium on Information Visualization, 2005, pages 111–117.

[2] [FWM+18] Michael Fernandes, Logan Walls, Sean Munson, Jessica Hullman, and Matthew Kay. Uncertainty Displays Using Quantile Dotplots or CDFs Improve Transit Decision-Making. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, pages 1–12, Montreal QC Canada.

[3] [HGW+24] Anja Heim, Alexander Gall, Manuela Waldner, Eduard Gröller, and Christoph Heinzl. AccuStripes: Visual exploration and comparison of univariate data distributions using color and binning. Computers & Graphics, 119:103906.

[4] [KKHM16] Matthew Kay, Tara Kola, Jessica R. Hullman, and Sean A. Munson. When (ish) is My Bus?: User-centered Visualizations of Uncertainty in Everyday, Mobile Predictive Systems. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, pages 5092–5103. ACM.

[5] [LLL+25] Shuhan Liu, Yangtian Liu, Junxin Li, Yanwei Huang, Yue Shangguan, Zikun Deng, Di Weng, and Yingcai Wu. RidgeBuilder: Interactive Authoring of Expressive Ridgeline Plots. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pages 1–18. ACM.

[6] [MF17] Justin Matejka and George Fitzmaurice. Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pages 1290–1294, Denver Colorado USA.

[7] [MVV22] E. Molina, L. Viale, and P. Vazquez. How should we design violin plots? pages 1–7, Oklahoma City, OK, USA.

[8] [R C21] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

[9] [Shn96] Ben Shneiderman. The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. Proceedings of the 1996 IEEE Symposium on Visual Languages, page 336.

[10] [Wil05] Leland Wilkinson. The grammar of graphics. Statistics and Computing. Springer, New York, NY, second edition.

[11] [Yau12] Nathan Yau. How to Visualize and Compare Distributions in R. https://flowingdata.com/2012/05/15/how-to-visualize-and-compare-distributions/

[12] Accessed: 2025-11-12.
