arxiv: 1907.09637 · v1 · pith:WUEUR3X4new · submitted 2019-07-23 · 📊 stat.AP

An Investigation into Outlier Elimination and Calculation Methods in the Determination of Reference Intervals using Serum Immunoglobulin A as a Model Data Collection

Pith reviewed 2026-05-24 17:23 UTC · model grok-4.3

classification 📊 stat.AP
keywords reference intervals · outlier elimination · Tukey method · immunoglobulin A · parametric methods · non-parametric methods · robust methods · Dixon Reed method

The pith

For IgA data, the choice of outlier elimination method shapes reference intervals more than the choice of calculation method.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how different outlier removal techniques and calculation approaches affect reference intervals derived from more than 32,000 serum IgA measurements. It shows that the choice between Tukey and block outlier elimination has a larger impact on the resulting intervals than whether parametric, non-parametric, or robust methods are used to compute them. Tukey elimination removes substantially more values across age groups and produces more consistent intervals no matter which calculation method follows, while block elimination removes almost none. Non-parametric intervals shift more when outliers remain, especially widening for older patients, whereas robust and parametric results stay close. The work concludes that for datasets like this one, Tukey elimination should be favored and that robust methods offer little gain over simpler parametric ones.

Core claim

The outlier elimination method was significantly more determinative of the reference intervals than the calculation method. The Tukey elimination procedure consistently eliminated significantly more values than the block method of Dixon and Reed across all age ranges. If Tukey elimination was applied, variation between reference intervals produced by the different calculation methods was minimal. Block elimination rarely eliminated values. The non-parametric reference intervals were more sensitive to outliers, which, in the IgA context, led to higher and wider reference intervals for the older age groups. There were only minimal differences between robust and parametric reference intervals.

What carries the argument

Tukey versus block (Dixon/Reed) outlier elimination applied before parametric, non-parametric, and robust calculations of reference intervals on a large IgA dataset.
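
The two-stage pipeline described here, elimination first and interval calculation second, can be sketched as follows. This is a minimal illustration in Python, not the authors' code. The function names, the conventional fence multiplier k = 1.5, the quantile interpolation rule, and the omission of any normalizing transformation before the parametric step are all assumptions made for the example. The robust method is left out because its details are not given on this page.

```python
import statistics

def tukey_filter(values, k=1.5):
    """Keep values inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    k = 1.5 is the conventional multiplier; it is an assumption here.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

def nonparametric_interval(values):
    """Central 95% interval from the empirical 2.5th and 97.5th percentiles."""
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return cuts[0], cuts[-1]  # first cut = 2.5%, last cut = 97.5%

def parametric_interval(values):
    """Mean +/- 1.96 SD, assuming the values are approximately normal."""
    m = statistics.fmean(values)
    s = statistics.stdev(values)
    return m - 1.96 * s, m + 1.96 * s
```

Running both interval functions on the filtered and the unfiltered values for a single age band gives a direct view of how much each stage moves the limits.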

If this is right

  • Tukey elimination should be preferred over the block D/R method for datasets similar to the one used in this study.
  • The robust method offers no advantage over the parametric method and, given its added complexity, is not particularly useful.
  • Non-parametric reference intervals are more sensitive to outliers and produce higher and wider intervals for older age groups when outliers remain.
  • Previous literature has focused on the calculation technique and not discussed outlier elimination.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Laboratories handling large skewed biomarker datasets may achieve more stable reference intervals by standardizing on Tukey elimination first.
  • Guidelines that prioritize robust methods over parametric ones may need re-examination if the pattern holds for other common tests.
  • The relative importance of elimination versus calculation could be tested directly on other clinical analytes with known population reference data.

Load-bearing premise

The large IgA dataset is representative of the populations for which reference intervals are intended and the observed differences in intervals are clinically relevant rather than data-specific artifacts.

What would settle it

Repeating the analysis on a different laboratory analyte where block elimination removes more values than Tukey or where robust intervals differ substantially from parametric ones.

Figures

Figures reproduced from arXiv: 1907.09637 by Aidan Zellner, Alice M. Richardson, Brett A. Lidbury, Peter Hobson, Tony Badrick.

Figure 1: As age increases the mean and spread of the … [figure: figures/full_fig_p009_1.png]
Figure 2 [figure: figures/full_fig_p009_2.png]

Original abstract

Background: Reference intervals are essential to interpret diagnostic tests, but their determination has become controversial. Methods: In this paper parametric, non-parametric and robust reference intervals with Tukey and block elimination are calculated from a dataset of over 32,000 serum immunoglobulin A (IgA) measurements. Results: The outlier elimination method was significantly more determinative of the reference intervals than the calculation method. The Tukey elimination procedure consistently eliminated significantly more values than the block method of Dixon and Reed across all age ranges. If Tukey elimination was applied, variation between reference intervals produced by the different calculation methods was minimal. Block elimination rarely eliminated values. The non-parametric reference intervals were more sensitive to outliers, which in the IgA context, led to higher and wider reference intervals for the older age groups. There were only minimal differences between robust and parametric reference intervals. Conclusions: This suggests that Tukey elimination should be preferred over the block D/R method for datasets similar to the one used in this study. These are predominantly new observations, as previous literature has focused on the calculation technique and not discussed outlier elimination. This suggests the robust method is not advantageous over the parametric method and therefore due to its complexity is not particularly useful, contrary to CLSI Guidelines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper compares parametric, non-parametric, and robust methods for calculating reference intervals on a dataset of >32,000 serum IgA measurements, combined with two outlier elimination procedures (Tukey versus block/Dixon-Reed). It reports that the choice of elimination method exerts a larger effect on the resulting intervals than the choice of calculation method, that Tukey removes substantially more points than block elimination, that non-parametric intervals are most sensitive to retained outliers, and that robust and parametric intervals differ only minimally; the authors therefore recommend Tukey elimination and question the added value of the robust approach.

Significance. If the central empirical finding holds, the work would usefully redirect attention from calculation technique to outlier handling in reference-interval construction and would supply evidence against routine use of the robust method. The large sample size (>32,000 observations) is a clear strength that supports stable comparisons across age bands and methods.

major comments (3)
  1. [Results] Results section: the claim that 'the outlier elimination method was significantly more determinative of the reference intervals than the calculation method' is presented only descriptively (Tukey removes far more points; block removes almost none; non-parametric intervals widen when outliers remain). No quantitative metric—such as the range or standard deviation of the six interval endpoints per age band, or a formal decomposition of variance attributable to each factor—is supplied, so the magnitude of 'significantly more' cannot be verified.
  2. [Results] Results and Conclusions: the repeated use of 'significantly' to describe differences between elimination and calculation methods is not accompanied by any statistical test, p-value, or confidence interval on the relative impacts, leaving the comparative claim without formal support.
  3. [Methods] Methods: the precise implementation of the block elimination procedure (Dixon and Reed) and the rules used to define age bands are not stated in sufficient detail to allow independent reproduction of the finding that block elimination 'rarely eliminated values.'
minor comments (1)
  1. A table listing the actual lower and upper reference limits for each of the six method combinations and each age band would allow readers to judge the clinical magnitude of the reported differences.
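
The kind of quantitative summary requested in major comment 1 could look like the sketch below. It is only one possible reading of that request. The dictionary layout, the function names, and above all the numeric limits are hypothetical placeholders invented for illustration; they are not values from the paper.

```python
import statistics
from itertools import combinations

def endpoint_spread(limits):
    """Range and sample SD of one endpoint across all method combinations."""
    vals = list(limits.values())
    return max(vals) - min(vals), statistics.stdev(vals)

def mean_shift(limits, factor_index):
    """Mean absolute difference between combinations differing in one factor only.

    Keys are (elimination, calculation) tuples.
    factor_index 0 varies elimination, 1 varies calculation.
    """
    other = 1 - factor_index
    diffs = [
        abs(limits[a] - limits[b])
        for a, b in combinations(limits, 2)
        if a[other] == b[other] and a[factor_index] != b[factor_index]
    ]
    return statistics.fmean(diffs)

# Placeholder upper limits (arbitrary units) for ONE age band.
# These numbers are made up for the example, not taken from the paper.
upper = {
    ("tukey", "parametric"): 4.0,
    ("tukey", "nonparametric"): 4.1,
    ("tukey", "robust"): 4.0,
    ("block", "parametric"): 4.6,
    ("block", "nonparametric"): 5.0,
    ("block", "robust"): 4.5,
}

spread_range, spread_sd = endpoint_spread(upper)
shift_from_elimination = mean_shift(upper, 0)
shift_from_calculation = mean_shift(upper, 1)
```

Comparing `shift_from_elimination` with `shift_from_calculation` per age band would put a number on "more determinative" without relying on the word "significantly".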

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major comment below, indicating revisions where appropriate to improve clarity and reproducibility.

  1. Referee: [Results] Results section: the claim that 'the outlier elimination method was significantly more determinative of the reference intervals than the calculation method' is presented only descriptively (Tukey removes far more points; block removes almost none; non-parametric intervals widen when outliers remain). No quantitative metric—such as the range or standard deviation of the six interval endpoints per age band, or a formal decomposition of variance attributable to each factor—is supplied, so the magnitude of 'significantly more' cannot be verified.

    Authors: We agree that the comparative claim would be strengthened by a quantitative metric. In the revised manuscript we will add a table (or supplementary table) reporting, for each age band, the range and standard deviation of the six reference-interval endpoints obtained from the different method combinations. This will allow readers to assess the relative magnitude of the effects directly from the data.
    Revision: yes

  2. Referee: [Results] Results and Conclusions: the repeated use of 'significantly' to describe differences between elimination and calculation methods is not accompanied by any statistical test, p-value, or confidence interval on the relative impacts, leaving the comparative claim without formal support.

    Authors: We acknowledge that 'significantly' was employed in a descriptive rather than inferential sense. To prevent any misinterpretation, we will replace the word 'significantly' with 'substantially' (or equivalent phrasing such as 'to a greater extent') in the results and conclusions sections of the revised manuscript.
    Revision: yes

  3. Referee: [Methods] Methods: the precise implementation of the block elimination procedure (Dixon and Reed) and the rules used to define age bands are not stated in sufficient detail to allow independent reproduction of the finding that block elimination 'rarely eliminated values.'

    Authors: We will expand the Methods section to include the exact algorithmic steps and any iterative thresholds applied for the Dixon-Reed block elimination procedure, as well as the precise numerical boundaries and any additional grouping criteria used to define the age bands.
    Revision: yes
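
For orientation, the commonly described form of the Dixon/Reed criterion can be sketched as below. The paper's own implementation is not specified on this page, so this is an assumption about the textbook rule rather than a reproduction of the authors' procedure: an extreme value is flagged when its gap D to the nearest neighbour is at least one third of the full range R. The function name and the handling of tiny samples are hypothetical.

```python
def dixon_reed_flags(values, threshold=1 / 3):
    """Return (low_flag, high_flag) for the smallest and largest value.

    D is the gap between an extreme value and its nearest neighbour,
    R is the range of all values; an extreme is flagged when D / R >= threshold.
    The 1/3 threshold follows the usual description of the rule (assumed here).
    """
    s = sorted(values)
    if len(s) < 3:
        return False, False  # too few points to judge (illustrative choice)
    r = s[-1] - s[0]
    if r == 0:
        return False, False  # all values identical
    low_flag = (s[1] - s[0]) / r >= threshold
    high_flag = (s[-1] - s[-2]) / r >= threshold
    return low_flag, high_flag
```

Under this reading, two neighbouring extreme values mask each other: in `[1, 2, 3, 4, 19, 20]` the top gap is 1 against a range of 19, so nothing is flagged. In a dense sample of tens of thousands of measurements the gap between the two most extreme values is usually small relative to the range, which would be consistent with the reported finding that block elimination rarely removed values.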

Circularity Check

0 steps flagged

No circularity: empirical comparison on external data

The paper conducts an empirical study applying Tukey and block outlier elimination methods alongside parametric, non-parametric, and robust calculation methods to an external dataset of over 32,000 IgA measurements. Resulting reference intervals are compared descriptively across age groups. No mathematical derivations, equations, or first-principles predictions are present that could reduce to self-definitional inputs, fitted parameters renamed as predictions, or self-citation chains. The central observation (outlier elimination being more determinative) is a direct report of computed differences on independent data, with no load-bearing self-citations or ansatzes smuggled in. This matches the default case of a self-contained empirical analysis with no circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper applies existing statistical procedures for reference interval estimation without introducing new free parameters, axioms beyond standard statistical assumptions, or invented entities.

axioms (1)
  • Domain assumption: standard assumptions underlying parametric reference interval methods (e.g., approximate normality or log-normality after transformation).
    Invoked when applying the parametric calculation method to the IgA data.
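
To make the log-normality variant of this assumption concrete, here is a minimal sketch of a parametric interval computed on the log scale and transformed back. The choice of a plain log transform is an assumption for illustration; the transformation actually used in the paper is not stated on this page. The function name is hypothetical.

```python
import math
import statistics

def lognormal_parametric_interval(values, coverage=0.95):
    """Parametric interval assuming log(values) is approximately normal.

    Values must be strictly positive. The limits are mean +/- z * SD on the
    log scale, back-transformed with exp.
    """
    logs = [math.log(v) for v in values]
    z = statistics.NormalDist().inv_cdf(0.5 + coverage / 2)  # about 1.96 for 95%
    m = statistics.fmean(logs)
    s = statistics.stdev(logs)
    return math.exp(m - z * s), math.exp(m + z * s)
```

Because the limits are symmetric on the log scale, they are asymmetric on the original scale, with the upper limit further from the centre than the lower one, which suits right-skewed measurements.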

pith-pipeline@v0.9.0 · 5771 in / 1214 out tokens · 27323 ms · 2026-05-24T17:23:37.147692+00:00 · methodology

    van der Loo MPJ. Distribution based outlier detection in univariate data. Discussion paper 10003, Statistics Netherlands, The Hague, 2010. 17 Figures Figure 1. The Tukey biweight function. 18 Figure 2: Box plots of the values of IgA used to calculate the reference intervals, plotted in one year age groups. Red box plots are used for the female datasets an...