An Investigation into Outlier Elimination and Calculation Methods in the Determination of Reference Intervals using Serum Immunoglobulin A as a Model Data Collection
Pith reviewed 2026-05-24 17:23 UTC · model grok-4.3
The pith
Outlier elimination determines reference intervals more than calculation method for IgA data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The outlier elimination method was significantly more determinative of the reference intervals than the calculation method. The Tukey elimination procedure consistently eliminated significantly more values than the block method of Dixon and Reed across all age ranges. If Tukey elimination was applied, variation between reference intervals produced by the different calculation methods was minimal. Block elimination rarely eliminated values. The non-parametric reference intervals were more sensitive to outliers, which, in the IgA context, led to higher and wider reference intervals for the older age groups. There were only minimal differences between robust and parametric reference intervals.
What carries the argument
Tukey versus block (Dixon/Reed) outlier elimination applied before parametric, non-parametric, and robust calculations of reference intervals on a large IgA dataset.
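To make the two elimination rules concrete, here is a minimal Python sketch. It assumes the common textbook forms: Tukey fences at 1.5 times the interquartile range, and the Dixon/Reed rule that rejects an extreme value when its gap D to the next value exceeds one third of the range R. The function names, the quartile convention, and the sample values are illustrative assumptions, not the paper's implementation, and any transformation the authors applied before elimination is omitted.

```python
import statistics


def tukey_eliminate(values, k=1.5):
    """Keep values inside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # The quartile convention (statistics.quantiles default) is an assumption.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


def dixon_reed_eliminate(values, ratio=1 / 3):
    """Drop the largest value if its gap D to the next value exceeds ratio * R.

    Simplified sketch: only the upper extreme is tested, and only once.
    A block variant would test the innermost member of a suspected
    cluster and, if it fails, drop the whole cluster.
    """
    ordered = sorted(values)
    if len(ordered) < 3:
        return ordered
    d = ordered[-1] - ordered[-2]  # gap between the two largest values
    r = ordered[-1] - ordered[0]   # full range of the data
    if r > 0 and d / r > ratio:
        return ordered[:-1]
    return ordered


# Hypothetical values (not from the paper): a tight cluster plus two high results.
sample = [1.0, 1.2, 1.3, 1.5, 1.6, 1.8, 2.0, 2.1, 2.3, 2.5, 6.0, 6.4]
after_tukey = tukey_eliminate(sample)
after_dr = dixon_reed_eliminate(sample)
print(len(sample), len(after_tukey), len(after_dr))
```

With these made-up values the Tukey fences drop both high results, while the single-point D/R test keeps everything because the gap between the two largest values (0.4) is small relative to the range (5.4). In a large, dense dataset gaps of a third of the range are rare, which is one plausible reading of why block elimination "rarely eliminated values."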
If this is right
- Tukey elimination should be preferred over the block D/R method for datasets similar to the one used in this study.
- The robust method offers no advantage over the parametric method and, given its added complexity, is not particularly useful.
- Non-parametric reference intervals are more sensitive to outliers and produce higher and wider intervals for older age groups when outliers remain.
- Previous literature has focused on the calculation technique and not discussed outlier elimination.
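For orientation, the two simpler calculation methods can be sketched as follows. This is a hedged illustration only: it assumes a central 95% interval, a mean ± 1.96 SD parametric form on untransformed data, and a rank-based percentile rule using rank p(n + 1) with linear interpolation. The robust (biweight) method is omitted, and the function names and data are hypothetical.

```python
import statistics


def parametric_interval(values, z=1.96):
    """Central 95% interval as mean +/- z * SD (assumes approximate normality)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return mean - z * sd, mean + z * sd


def nonparametric_interval(values, lower_p=0.025, upper_p=0.975):
    """Rank-based percentiles using rank p * (n + 1); the convention is an assumption."""
    ordered = sorted(values)
    n = len(ordered)

    def at(p):
        rank = min(max(p * (n + 1), 1), n)  # clamp the rank to [1, n]
        lo = int(rank)                      # integer part of the rank
        frac = rank - lo                    # fractional part for interpolation
        if lo >= n:
            return ordered[-1]
        return ordered[lo - 1] + frac * (ordered[lo] - ordered[lo - 1])

    return at(lower_p), at(upper_p)


# Hypothetical data: the values 1..39, purely to exercise both functions.
sample = [float(i) for i in range(1, 40)]
print(parametric_interval(sample))
print(nonparametric_interval(sample))
```

Because the non-parametric limits are read directly off the ordered data, whatever remains in the tails after elimination feeds straight into them; the parametric limits depend on the tails only through the mean and standard deviation.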
Where Pith is reading between the lines
- Laboratories handling large skewed biomarker datasets may achieve more stable reference intervals by standardizing on Tukey elimination first.
- Guidelines that prioritize robust methods over parametric ones may need re-examination if the pattern holds for other common tests.
- The relative importance of elimination versus calculation could be tested directly on other clinical analytes with known population reference data.
Load-bearing premise
The large IgA dataset is representative of the populations for which reference intervals are intended and the observed differences in intervals are clinically relevant rather than data-specific artifacts.
What would settle it
Repeating the analysis on a different laboratory analyte where block elimination removes more values than Tukey or where robust intervals differ substantially from parametric ones.
Original abstract
Background: Reference intervals are essential to interpret diagnostic tests, but their determination has become controversial.
Methods: In this paper parametric, non-parametric and robust reference intervals with Tukey and block elimination are calculated from a dataset of over 32,000 serum immunoglobulin A (IgA) measurements.
Results: The outlier elimination method was significantly more determinative of the reference intervals than the calculation method. The Tukey elimination procedure consistently eliminated significantly more values than the block method of Dixon and Reed across all age ranges. If Tukey elimination was applied, variation between reference intervals produced by the different calculation methods was minimal. Block elimination rarely eliminated values. The non-parametric reference intervals were more sensitive to outliers, which in the IgA context, led to higher and wider reference intervals for the older age groups. There were only minimal differences between robust and parametric reference intervals.
Conclusions: This suggests that Tukey elimination should be preferred over the block D/R method for datasets similar to the one used in this study. These are predominantly new observations, as previous literature has focused on the calculation technique and not discussed outlier elimination. This suggests the robust method is not advantageous over the parametric method and therefore due to its complexity is not particularly useful, contrary to CLSI Guidelines.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper compares parametric, non-parametric, and robust methods for calculating reference intervals on a dataset of >32,000 serum IgA measurements, combined with two outlier elimination procedures (Tukey versus block/Dixon-Reed). It reports that the choice of elimination method exerts a larger effect on the resulting intervals than the choice of calculation method, that Tukey removes substantially more points than block elimination, that non-parametric intervals are most sensitive to retained outliers, and that robust and parametric intervals differ only minimally; the authors therefore recommend Tukey elimination and question the added value of the robust approach.
Significance. If the central empirical finding holds, the work would usefully redirect attention from calculation technique to outlier handling in reference-interval construction and would supply evidence against routine use of the robust method. The large sample size (>32,000 observations) is a clear strength that supports stable comparisons across age bands and methods.
major comments (3)
- [Results] Results section: the claim that 'the outlier elimination method was significantly more determinative of the reference intervals than the calculation method' is presented only descriptively (Tukey removes far more points; block removes almost none; non-parametric intervals widen when outliers remain). No quantitative metric—such as the range or standard deviation of the six interval endpoints per age band, or a formal decomposition of variance attributable to each factor—is supplied, so the magnitude of 'significantly more' cannot be verified.
- [Results] Results and Conclusions: the repeated use of 'significantly' to describe differences between elimination and calculation methods is not accompanied by any statistical test, p-value, or confidence interval on the relative impacts, leaving the comparative claim without formal support.
- [Methods] Methods: the precise implementation of the block elimination procedure (Dixon and Reed) and the rules used to define age bands are not stated in sufficient detail to allow independent reproduction of the finding that block elimination 'rarely eliminated values.'
minor comments (1)
- A table listing the actual lower and upper reference limits for each of the six method combinations and each age band would allow readers to judge the clinical magnitude of the reported differences.
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address each major comment below, indicating revisions where appropriate to improve clarity and reproducibility.
-
Referee: [Results] Results section: the claim that 'the outlier elimination method was significantly more determinative of the reference intervals than the calculation method' is presented only descriptively (Tukey removes far more points; block removes almost none; non-parametric intervals widen when outliers remain). No quantitative metric—such as the range or standard deviation of the six interval endpoints per age band, or a formal decomposition of variance attributable to each factor—is supplied, so the magnitude of 'significantly more' cannot be verified.
Authors: We agree that the comparative claim would be strengthened by a quantitative metric. In the revised manuscript we will add a table (or supplementary table) reporting, for each age band, the range and standard deviation of the six reference-interval endpoints obtained from the different method combinations. This will allow readers to assess the relative magnitude of the effects directly from the data. revision: yes
-
Referee: [Results] Results and Conclusions: the repeated use of 'significantly' to describe differences between elimination and calculation methods is not accompanied by any statistical test, p-value, or confidence interval on the relative impacts, leaving the comparative claim without formal support.
Authors: We acknowledge that 'significantly' was employed in a descriptive rather than inferential sense. To prevent any misinterpretation, we will replace the word 'significantly' with 'substantially' (or equivalent phrasing such as 'to a greater extent') in the results and conclusions sections of the revised manuscript. revision: yes
-
Referee: [Methods] Methods: the precise implementation of the block elimination procedure (Dixon and Reed) and the rules used to define age bands are not stated in sufficient detail to allow independent reproduction of the finding that block elimination 'rarely eliminated values.'
Authors: We will expand the Methods section to include the exact algorithmic steps and any iterative thresholds applied for the Dixon-Reed block elimination procedure, as well as the precise numerical boundaries and any additional grouping criteria used to define the age bands. revision: yes
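The range-and-standard-deviation summary discussed in the first exchange could be computed along these lines. This is a sketch under stated assumptions: the six upper limits are invented placeholders for a single age band (not results from the paper), and comparing the spread of group means is only one simple way to contrast the two factors.

```python
import statistics

# Hypothetical upper reference limits (g/L) for one age band.
# These numbers are placeholders, NOT values reported in the paper.
upper_limits = {
    ("tukey", "parametric"): 4.0,
    ("tukey", "nonparametric"): 4.1,
    ("tukey", "robust"): 4.0,
    ("block", "parametric"): 4.6,
    ("block", "nonparametric"): 5.2,
    ("block", "robust"): 4.7,
}


def spread(values):
    """Return (range, sample standard deviation) of a collection of limits."""
    values = list(values)
    return max(values) - min(values), statistics.stdev(values)


def grouped_means(limits, axis):
    """Mean limit per level of one factor (axis 0: elimination, axis 1: calculation)."""
    groups = {}
    for key, value in limits.items():
        groups.setdefault(key[axis], []).append(value)
    return {name: statistics.fmean(vals) for name, vals in groups.items()}


overall_range, overall_sd = spread(upper_limits.values())
by_elimination = grouped_means(upper_limits, axis=0)
by_calculation = grouped_means(upper_limits, axis=1)

# Difference between the largest and smallest group mean for each factor.
elimination_effect = max(by_elimination.values()) - min(by_elimination.values())
calculation_effect = max(by_calculation.values()) - min(by_calculation.values())
print(overall_range, overall_sd, elimination_effect, calculation_effect)
```

Reporting both effects per age band, for lower and upper limits separately, would let a reader check the "more determinative" claim without relying on the word "significantly."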
Circularity Check
No circularity: empirical comparison on external data
The paper conducts an empirical study applying Tukey and block outlier elimination methods alongside parametric, non-parametric, and robust calculation methods to an external dataset of over 32,000 IgA measurements. Resulting reference intervals are compared descriptively across age groups. No mathematical derivations, equations, or first-principles predictions are present that could reduce to self-definitional inputs, fitted parameters renamed as predictions, or self-citation chains. The central observation (outlier elimination being more determinative) is a direct report of computed differences on independent data, with no load-bearing self-citations or ansatzes smuggled in. This matches the default case of a self-contained empirical analysis with no circular steps.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Standard assumptions underlying parametric reference interval methods (e.g., approximate normality or log-normality after transformation)
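The normality-after-transformation assumption can be illustrated with a Box-Cox transform, which the reference list suggests was available to the authors. The sketch below is an assumption-laden example: the transformation parameter is passed in rather than estimated, the helper names and sample values are hypothetical, and whether the paper used exactly this pipeline is not stated on this page.

```python
import math
import statistics


def box_cox(x, lam):
    """Box-Cox transform: log(x) when lam == 0, otherwise (x**lam - 1) / lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    return math.log(x) if lam == 0 else (x ** lam - 1) / lam


def inverse_box_cox(y, lam):
    """Back-transform; for lam != 0 this assumes lam * y + 1 is positive."""
    return math.exp(y) if lam == 0 else (lam * y + 1) ** (1 / lam)


def transformed_parametric_interval(values, lam=0.0, z=1.96):
    """Mean +/- z * SD on the transformed scale, reported on the original scale."""
    transformed = [box_cox(v, lam) for v in values]
    mean = statistics.fmean(transformed)
    sd = statistics.stdev(transformed)
    return inverse_box_cox(mean - z * sd, lam), inverse_box_cox(mean + z * sd, lam)


# Hypothetical right-skewed concentrations (not from the paper).
sample = [0.8, 1.1, 1.4, 1.9, 2.3, 2.8, 3.5, 4.6]
low, high = transformed_parametric_interval(sample, lam=0.0)
print(low, high)
```

With lam = 0 this reduces to a log-normal interval, so the lower limit stays positive, which a mean ± 1.96 SD interval on the raw scale does not guarantee for skewed data.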
Reference graph
Works this paper leans on
- [1] Daly CH, Higgins V, Adeli K, Grey VL, Hamid JS. Reference interval estimation: Methodological comparison using extensive simulations and empirical data. Clin Biochem 2017; 50: 502–505
- [2] Horowitz GL, Altaie S, Bodye JC, Ceriotti F, Garg U, Horn P, Pesce A, Sine HE, Zakowski J. Defining, Establishing, and Verifying Reference Intervals in the Clinical Laboratory; Approved Guideline: Third Edition. CLSI document EP28-A3c. Wayne, PA: Clinical and Laboratory Standards Institute; 2010
- [3] Dixon WJ. Processing data for outliers. Biometrics 1953; 9: 74–89
- [4] Reed AH, Henry RJ, Mason WB. Influence of statistical method used on the resulting estimate of normal range. Clin Chem 1971; 17: 275–284
- [5] Tukey JW. Exploratory Data Analysis. Reading, MA: Addison-Wesley; 1977
- [6] Solberg HE. Approved recommendation (1987) on the theory of reference values. Part 5. Statistical treatment of collected reference values. Determination of reference limits. Clin Chim Acta 1987; 170: S13–S32
- [7] Horn PS. A biweight prediction interval for random samples. J Am Stat Assoc 1988; 83: 249–256
- [8] Ilcol YO, Aslan D. Use of total patient data for indirect estimation of reference intervals for 40 clinical chemical analytes in Turkey. Clin Chem Lab Med 2006; 44: 867–76
- [9] Shine B. Use of routine clinical laboratory data to define reference intervals. Ann Clin Biochem 2008; 45: 467–750
- [10] Sikaris K. Application of the Stockholm Hierarchy to Defining the Quality of Reference Intervals and Clinical Decision Limits. Clin Biochem Revs 2012; 33(4): 141–148
- [11] Katayev A, Fleming JK, Luo D, Fisher AH, Sharp TM. Reference Intervals Data Mining: No Longer a Probability Paper Method. Amer J Clin Path 2015; 143(1): 134–142
- [12] Arzideh F, Wosniok W, Haeckel R. Reference limits of plasma and serum creatinine concentrations from intra-laboratory data bases of several German and Italian medical centres: Comparison between direct and indirect procedures. Clin Chim Acta 2010; 411: 215–21
- [13] Hickman PE, Koerbin G, Potter JM, Abhayaratna WP. Statistical considerations for determining high-sensitivity cardiac troponin reference intervals. Clin Biochem 2017; 50: 502–505
- [14] Kulasingam V, Jung BP, Blasutig IM, Baradaran S, Chan MK, Aytekin M, Colantonio DA, Adeli K. Pediatric reference intervals for 28 chemistries and immunoassays on the Roche cobas® 6000 analyzer – a CALIPER pilot study. Clin Biochem 2010; 43: 1045–1050
- [15]
- [16] Solberg HE, Laht A. Detection of outliers in reference distributions; performance of Horn’s algorithm. Clin Chem 2005; 51(12): 2326–2332
- [17] Horn PS, Pesce AJ. Reference intervals: an update. Clin Chim Acta 2003; 334: 5–23
- [18] Wellek S, Lackner KJ, Jennen-Steinmetz C, Reinhard I, Hoffman I, Blettner M. Determination of reference limits: statistical concepts and tools for sample size calculation. Clin Chem Lab Med 2014; 52(12): 1685–1694
- [19] Loh RKS, Vale S, McLean-Tooke A. Quantitative serum immunoglobulin tests. Aust Fam Physician 2013; 42: 195–198
- [20] Gonzalez-Quintela A, Alende R, Gude F, Campos J, Rey J, Meijide LM, Fernandez-Merino C, Vidal C. Serum levels of immunoglobulins (IgG, IgA, IgM) in a general adult population and their relationship with alcohol consumption, smoking and common metabolic abnormalities. Clin Exp Immunol 2007; 151: 42–50
- [21] Eggers KM, Apple FS, Lind L, Lindahl B. The applied statistical approach highly influences the 99th percentile of cardiac troponin I. Clin Biochem 2016; 49: 1109–1112
- [22] Box G, Cox D. An analysis of transformations. J R Stat Soc Series B Stat Methodol 1964; 26: 211–252
- [23] Kafadar K. A biweight approach to the one-sample problem. J Amer Statist Assoc 1982; 77: 416–424
- [24] R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2013
- [25] Ribeiro JR, Diggle PJ. geoR: A package for geostatistical analysis. R-NEWS 2001: 1.
- [26] Richardson A, Neeman T, Yoon H-J, Haslett S. Statistical response to issues with the determination of the troponin 99th percentile. Clin Biochem 2017; 53: 412–414
- [27] Jones GRD, Haeckel R, Loh TP, Sikaris K, Streichert T, Katayev A, Barth JH, Ozarda Y. Indirect methods for reference interval determination – review and recommendations. Clin Chem Lab Med 2018; e0073
- [28] van der Loo MPJ. Distribution based outlier detection in univariate data. Discussion paper 10003, Statistics Netherlands, The Hague, 2010.
Figures
Figure 1. The Tukey biweight function.
Figure 2: Box plots of the values of IgA used to calculate the reference intervals, plotted in one year age groups. Red box plots are used for the female datasets an...