arXiv: 2601.20805 · v4 · submitted 2026-01-28 · 📊 stat.ME · astro-ph.IM · physics.data-an

Plotting correlated data

Pith reviewed 2026-05-16 10:20 UTC · model grok-4.3

classification 📊 stat.ME · astro-ph.IM · physics.data-an
keywords correlated uncertainties · data visualization · error bars · principal components · covariance matrix · model fitting · conditional uncertainty

The pith

When data uncertainties correlate, vertical error bars alone do not show whether a model line fits the points.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that standard error bars, which display only the square roots of the diagonal elements of a covariance matrix, leave insufficient information for judging model agreement whenever off-diagonal correlations are non-negligible. It demonstrates that the usual rule of thumb, that a model fits if it passes through roughly two-thirds of the bars, no longer applies. The proposed remedy is to add explicit display of the leading principal component of the uncertainties together with the conditional uncertainties on each point. A sympathetic reader cares because many real measurements, from time-series data to spectra, carry shared systematic errors that change how one should read a plot. Without this extra information, visual assessment of model quality can be misleading.
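A small simulation can illustrate why the two-thirds rule of thumb stops being a useful check. This is a minimal sketch, not the paper's own example: the equal-correlation covariance, the value `rho = 0.9`, and the sizes `n_points` and `n_plots` are assumptions chosen for illustration. The true model is drawn through y = 0, and each simulated plot records what fraction of its 1-sigma bars contain that model.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

n_points = 20    # points per simulated plot (assumed)
n_plots = 5000   # number of simulated plots (assumed)
rho = 0.9        # assumed common correlation between any two points

# Unit marginal variances, so every 1-sigma error bar has half-width 1.
cov_indep = np.eye(n_points)
cov_corr = (1.0 - rho) * np.eye(n_points) + rho * np.ones((n_points, n_points))


def coverage_per_plot(cov):
    """Fraction of points whose 1-sigma bar contains the true model, per plot."""
    # The true model is y = 0 everywhere; each row is one simulated data set.
    y = rng.multivariate_normal(np.zeros(n_points), cov, size=n_plots)
    return (np.abs(y) < 1.0).mean(axis=1)


cover_indep = coverage_per_plot(cov_indep)
cover_corr = coverage_per_plot(cov_corr)

for name, cover in [("independent", cover_indep), ("correlated", cover_corr)]:
    print(f"{name:12s} mean coverage = {cover.mean():.3f}, "
          f"spread between plots = {cover.std():.3f}")
```

Averaged over many plots, both cases cover the true model about 68% of the time, because the marginal distributions are identical. What changes is the spread: with strong positive correlation, a single plot tends to show either nearly all points inside their bars or most of them outside, so the fraction seen in any one plot says little about the fit.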

Core claim

If the error bars only show the square root of diagonal elements of some covariance matrix with non-negligible off-diagonal elements, we simply do not have enough information in the plot to judge whether a drawn model line agrees well with the data or not. The paper demonstrates this by showing the contribution of the first principal component of the uncertainties and by displaying the conditional uncertainties of all data points.
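The following sketch makes the missing-information point concrete under stated assumptions. The covariance (equal marginal widths `sigma`, common correlation `rho`) and the two residual patterns `shift` and `zigzag` are hypothetical and chosen only for illustration. Both patterns sit inside every error bar, so a plot with bars alone cannot tell them apart, yet their chi-square values under the full covariance differ greatly.

```python
import numpy as np

n = 5
sigma = 1.0  # marginal (error-bar) half-width of every point, assumed
rho = 0.9    # assumed common correlation

cov = sigma**2 * ((1.0 - rho) * np.eye(n) + rho * np.ones((n, n)))


def chi2(residual, covariance):
    """Squared Mahalanobis distance r^T C^{-1} r."""
    return float(residual @ np.linalg.solve(covariance, residual))


# Two hypothetical residual patterns (data minus model), both inside every bar.
shift = np.full(n, 0.8)                                 # all points high together
zigzag = 0.8 * np.array([1.0, -1.0, 1.0, -1.0, 1.0])    # alternating sign

diag_only = np.diag(np.diag(cov))  # what the error bars alone convey

for name, r in [("shift", shift), ("zigzag", zigzag)]:
    print(f"{name:7s} chi2 (bars only) = {chi2(r, diag_only):6.2f}   "
          f"chi2 (full C) = {chi2(r, cov):6.2f}")
```

With the bars alone, both patterns give a chi-square of 3.2 for five points. With the full covariance, the common shift drops to about 0.7 while the zigzag rises to about 30.7, because a coherent offset is cheap under positive correlation and point-to-point scatter is expensive.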

What carries the argument

Two quantities derived from the uncertainty covariance matrix carry the argument: its leading principal component, which encodes the dominant shared variation across points, and the conditional uncertainty of each data point.
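As an illustration of how a leading principal component might be extracted, here is a minimal sketch. This page does not spell out the paper's exact construction, so the scaling by the square root of the eigenvalue is an assumption, and the uncertainty model (`stat` for independent noise, `syst` for one shared systematic) is hypothetical.

```python
import numpy as np

# Hypothetical uncertainty model: independent noise plus one shared systematic.
stat = 0.3                                   # uncorrelated error per point
syst = np.array([0.5, 0.6, 0.7, 0.8, 0.9])   # fully correlated error per point
cov = stat**2 * np.eye(syst.size) + np.outer(syst, syst)

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
eigvals, eigvecs = np.linalg.eigh(cov)
lam1 = eigvals[-1]
v1 = eigvecs[:, -1]
if v1.sum() < 0:
    v1 = -v1  # the sign of an eigenvector is arbitrary; pick one convention

# 1-sigma collective displacement of each point along the first component.
pc1_shift = np.sqrt(lam1) * v1

# Marginal bars, and what is left of each variance once PC1 is taken out.
marginal = np.sqrt(np.diag(cov))
remainder = np.sqrt(np.diag(cov) - pc1_shift**2)

print("marginal :", np.round(marginal, 3))
print("PC1 shift:", np.round(pc1_shift, 3))
print("remainder:", np.round(remainder, 3))
```

In this toy model the first component points along the shared systematic, so `pc1_shift` accounts for most of each marginal bar. Drawing it as a band or an arrow next to the bars would show how far the points can move together.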

If this is right

  • Model evaluation must incorporate the full covariance rather than treating points as independent when drawing agreement conclusions from a plot.
  • Conditional uncertainties reveal which residuals are independent of the dominant shared error, allowing targeted diagnosis of model deficiencies.
  • Plots that include the first principal component direction make visible the range over which the data can collectively shift without violating the reported uncertainties.
  • Software that renders data with known covariances should offer the option to overlay these derived quantities by default.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same display technique could be extended to two-dimensional plots by showing the leading eigenvectors of the joint covariance.
  • When full covariances are unavailable but partial information exists, approximate conditional bands might still improve judgment over raw diagonal bars.
  • Routine inclusion of these elements would change how experimental papers present results in fields that routinely share calibration or background errors.

Load-bearing premise

The full covariance matrix of the uncertainties is known and available for computing principal components and conditional uncertainties.

What would settle it

A concrete data set whose covariance matrix is known, plotted once with only diagonal error bars and once with the principal-component and conditional-uncertainty overlays, where the two versions lead to opposite conclusions about whether a given model line fits the points.

Original abstract

A very common task in data visualization is to plot many data points with some measured y-value as a function of fixed x-values. Uncertainties on the y-values are typically presented as vertical error bars that represent either a Frequentist confidence interval or Bayesian credible interval for each data point. Most of the time, these error bars represent a 68% confidence/credibility level, which leads to the intuition that a model fits the data reasonably well if its prediction lies within the error bars of roughly two thirds of the data points. Unfortunately, this and other intuitions no longer work when the uncertainties of the data points are correlated. If the error bars only show the square root of diagonal elements of some covariance matrix with non-negligible off-diagonal elements, we simply do not have enough information in the plot to judge whether a drawn model line agrees well with the data or not. In this paper we will demonstrate this problem and discuss ways to add more information to the plots to make it easier to judge the agreement between the data and some model prediction in the plot, as well as glean some insight where the model might be deficient. This is done by explicitly showing the contribution of the first principal component of the uncertainties, and by displaying the conditional uncertainties of all data points.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper demonstrates that standard vertical error bars (square roots of the diagonal elements of a covariance matrix) are insufficient to judge model-data agreement when uncertainties have non-negligible off-diagonal correlations. It proposes two remedies: displaying the leading principal component of the covariance and showing conditional uncertainties for each data point given a model prediction, both derived from the full covariance matrix under a multivariate Gaussian assumption.

Significance. If the visualizations are shown to be effective in examples, the work addresses a genuine and common pitfall in statistical plotting. The remedies follow directly from standard linear-algebra operations on the covariance matrix (principal components and Schur complement for conditionals) without introducing new assumptions beyond the premise that the full covariance is known to the analyst. This could improve interpretability in fields that routinely plot correlated measurements.
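For reference, the Schur-complement step mentioned above is the standard conditioning result for a multivariate Gaussian. The notation here (data vector y, model prediction \mu, covariance C, and the split into a point of interest a and the remaining points b) is an assumption of this sketch and may differ from the paper's.

```latex
% Partition the data into a block of interest a and the remaining points b.
y = \begin{pmatrix} y_a \\ y_b \end{pmatrix}, \qquad
\mu = \begin{pmatrix} \mu_a \\ \mu_b \end{pmatrix}, \qquad
C = \begin{pmatrix} C_{aa} & C_{ab} \\ C_{ba} & C_{bb} \end{pmatrix}.

% Conditional distribution of y_a given y_b:
\mathrm{E}[y_a \mid y_b] = \mu_a + C_{ab} C_{bb}^{-1} (y_b - \mu_b), \qquad
\mathrm{Var}[y_a \mid y_b] = C_{aa} - C_{ab} C_{bb}^{-1} C_{ba}.

% For a single point i conditioned on all other points:
\sigma_{i \mid \mathrm{rest}}^{2} = \frac{1}{(C^{-1})_{ii}} \le C_{ii}.
```

The conditional variance is the Schur complement of C_{bb} in C. It depends only on C, not on the values being conditioned on. The conditional mean does depend on them.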

major comments (1)
  1. [Methods / conditional uncertainties] The central claim rests on the premise that the full covariance matrix C is known and available; the manuscript should explicitly flag this as a prerequisite (e.g., in the section introducing the conditional-uncertainty display) and discuss what happens when only marginal variances are supplied by an experiment.
minor comments (2)
  1. [Abstract] The abstract states the problem clearly but the manuscript would benefit from a short numerical example (e.g., a 5-point data vector with a non-diagonal C) showing the difference between marginal and conditional error bars.
  2. [Visualization proposal] Notation for the first principal component contribution should be defined once and used consistently when describing the added visual element.
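A numerical example of the kind the first minor comment asks for could look like the sketch below. It is not taken from the paper: the five x-values, the unit marginal uncertainties, and the exponentially decaying correlation with length `ell` are all assumptions made for illustration. It compares each point's marginal error bar with its uncertainty conditional on all other points.

```python
import numpy as np

n = 5
x = np.arange(n)
sigma = np.full(n, 1.0)  # marginal uncertainties (assumed)
ell = 2.0                # assumed correlation length in x

# Hypothetical covariance: correlation decays exponentially with separation.
corr = np.exp(-np.abs(x[:, None] - x[None, :]) / ell)
cov = corr * np.outer(sigma, sigma)

marginal = np.sqrt(np.diag(cov))

# Conditional std of each point given all others, via the precision matrix.
precision = np.linalg.inv(cov)
conditional = 1.0 / np.sqrt(np.diag(precision))

# Cross-check one point with the explicit Schur complement.
i = 2
rest = np.delete(np.arange(n), i)
schur = cov[i, i] - cov[i, rest] @ np.linalg.solve(
    cov[np.ix_(rest, rest)], cov[rest, i]
)

print("marginal   :", np.round(marginal, 3))
print("conditional:", np.round(conditional, 3))
print("Schur check for point", i, ":", round(float(np.sqrt(schur)), 3))
```

In this toy case every marginal bar has half-width 1, while the conditional uncertainties shrink to roughly 0.8 at the two end points and roughly 0.7 in the interior, where each point is constrained by neighbours on both sides.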

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive evaluation and the constructive suggestion regarding the prerequisite of a known full covariance matrix. We agree that this assumption should be stated explicitly and that the limitations when only marginal variances are available merit discussion. The revision will incorporate these clarifications without altering the core technical content.

Point-by-point responses
  1. Referee: [Methods / conditional uncertainties] The central claim rests on the premise that the full covariance matrix C is known and available; the manuscript should explicitly flag this as a prerequisite (e.g., in the section introducing the conditional-uncertainty display) and discuss what happens when only marginal variances are supplied by an experiment.

    Authors: We agree with this observation. The conditional-uncertainty visualization relies on the Schur complement of the full covariance matrix C, which presupposes that C is known in its entirety. In the revised manuscript we will insert an explicit statement at the start of the conditional-uncertainties section declaring that the full covariance matrix must be supplied. We will also add a short paragraph noting that, when only the diagonal marginal variances are provided, the conditional uncertainties cannot be evaluated and the analyst must revert to conventional error bars, with the attendant loss of information about correlations. This addition will be placed immediately after the description of the conditional display and will not require any change to the principal-component visualization, which likewise assumes the full matrix.

    revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper is a methodological proposal for improving visualization of correlated data points. It demonstrates the insufficiency of marginal error bars when off-diagonal covariance terms are present and proposes displaying the leading principal component of the covariance matrix plus conditional uncertainties. No equations, derivations, fitted parameters, or predictions are present that reduce to the paper's own inputs by construction. The central claim follows directly from the standard definition of the multivariate Gaussian likelihood and introduces no self-citation chains, ansatzes, or renamings of known results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a methodological visualization paper with no free parameters fitted to data, no additional axioms beyond standard statistical concepts like covariance matrices, and no invented entities.

pith-pipeline@v0.9.0 · 5513 in / 1034 out tokens · 34469 ms · 2026-05-16T10:20:47.673263+00:00 · methodology
