An ordinal measure of interrater absolute agreement
Pith reviewed 2026-05-24 17:30 UTC · model grok-4.3
The pith
A measure of interrater absolute agreement for ordinal scales is constructed from Leti's dispersion index to avoid variance restriction problems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce an interrater absolute agreement measure for ordinal variables built on Leti's dispersion index. This construction avoids the restriction-of-variance problem that can affect traditional agreement measures. They provide an unbiased estimator, study its sampling properties, and develop asymptotic and bootstrap confidence intervals, demonstrating accuracy through simulations and a real application.
What carries the argument
Leti's dispersion index for ordinal variables, adapted as the foundation for an absolute agreement measure between multiple raters.
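To make the construction concrete, here is a minimal sketch of Leti's index and of an agreement score formed as 1 minus the normalized index. It assumes the usual textbook form D = 2 * sum over k = 1..K-1 of F_k (1 - F_k), where F_k is the cumulative relative frequency of category k, with maximum (K - 1) / 2. The function names `leti_index` and `agreement` are illustrative, and the paper's exact normalization and estimator may differ.

```python
from collections import Counter


def leti_index(ratings, n_categories):
    """Leti's dispersion index D = 2 * sum_{k=1}^{K-1} F_k * (1 - F_k).

    ratings: integer category codes in 1..n_categories (assumed coding).
    F_k is the cumulative relative frequency up to category k.
    """
    ratings = list(ratings)
    n = len(ratings)
    counts = Counter(ratings)
    cumulative = 0.0
    total = 0.0
    for k in range(1, n_categories):  # F_K = 1 contributes nothing
        cumulative += counts.get(k, 0) / n
        total += cumulative * (1.0 - cumulative)
    return 2.0 * total


def agreement(ratings, n_categories):
    """Agreement as 1 minus the index divided by its maximum (K - 1) / 2."""
    d_max = (n_categories - 1) / 2.0
    return 1.0 - leti_index(ratings, n_categories) / d_max


# Four raters scoring one target on a 5-point scale.
print(agreement([3, 3, 3, 3], 5))  # 1.0: all raters agree
print(agreement([1, 1, 5, 5], 5))  # 0.0: raters split between the extremes
print(agreement([2, 3, 3, 4], 5))  # 0.625
```

Because the score depends only on how the raters' scores spread over the K categories for the rated target, it does not involve between-target variance, which is the quantity that restriction of variance shrinks.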
If this is right
- The new measure provides a direct quantification of absolute agreement without being distorted by low variance in ratings.
- An unbiased estimator allows reliable point estimation of the agreement level.
- Both asymptotic theory and bootstrap methods yield valid confidence intervals for the measure.
- Simulations confirm the procedure's accuracy for assessing agreement in ordinal data.
- Application to real data illustrates practical utility in fields using ordinal scales.
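As a rough illustration of the bootstrap route to a confidence interval, the sketch below resamples the raters' scores for one target with replacement and takes percentile limits. It is an assumption-laden sketch: it uses the plug-in agreement score rather than the paper's unbiased estimator, a plain percentile interval, and hypothetical function names; the paper's resampling scheme may differ.

```python
import random


def agreement(ratings, n_categories):
    """Plug-in agreement: 1 minus Leti's index over its maximum (K - 1) / 2."""
    n = len(ratings)
    total = 0.0
    for k in range(1, n_categories):
        f_k = sum(1 for r in ratings if r <= k) / n  # cumulative frequency
        total += f_k * (1.0 - f_k)
    return 1.0 - (2.0 * total) / ((n_categories - 1) / 2.0)


def percentile_bootstrap_ci(ratings, n_categories, n_boot=2000,
                            level=0.95, seed=0):
    """Percentile bootstrap interval, resampling raters with replacement."""
    rng = random.Random(seed)  # seeded for reproducibility
    n = len(ratings)
    stats = sorted(
        agreement([rng.choice(ratings) for _ in range(n)], n_categories)
        for _ in range(n_boot)
    )
    alpha = 1.0 - level
    lo_idx = int((alpha / 2.0) * (n_boot - 1))
    hi_idx = int((1.0 - alpha / 2.0) * (n_boot - 1))
    return stats[lo_idx], stats[hi_idx]


# Ten raters, one target, 5-point scale (made-up scores).
scores = [2, 3, 3, 4, 3, 2, 4, 3, 3, 3]
print(agreement(scores, 5), percentile_bootstrap_ci(scores, 5))
```

With few raters the resampled scores take only a handful of distinct values, so percentile limits can be coarse; that is one reason to compare them against the asymptotic interval.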
Where Pith is reading between the lines
- If the measure performs well, it could replace or supplement kappa-like statistics in settings where raters tend to use similar score ranges.
- Extensions might include incorporating weights for different disagreement levels or handling missing ratings.
- The approach could be adapted to other types of categorical data beyond ordinal.
- Further work might compare its power to detect disagreement against existing methods in large samples.
Load-bearing premise
Leti's dispersion index provides a suitable basis for measuring absolute agreement between raters on ordinal scales without introducing new biases.
What would settle it
A simulation where the new measure still shows variance restriction effects similar to traditional measures, or where its estimator is biased in finite samples.
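The finite-sample bias check can be run exactly for small cases, without Monte Carlo noise, by enumerating every possible sample. The sketch below assumes n raters whose scores are independent draws from a common category distribution; under that assumption the plug-in version of Leti's index has expectation (n - 1)/n times the true value, so multiplying by n/(n - 1) removes the bias. Both the sampling model and the correction are illustrative assumptions, not necessarily the estimator derived in the paper.

```python
from itertools import product


def leti_plugin(ratings, n_categories):
    """Plug-in Leti index computed from sample cumulative frequencies."""
    n = len(ratings)
    total = 0.0
    for k in range(1, n_categories):
        f_k = sum(1 for r in ratings if r <= k) / n
        total += f_k * (1.0 - f_k)
    return 2.0 * total


def leti_corrected(ratings, n_categories):
    """Plug-in index rescaled by n / (n - 1) (assumed i.i.d. correction)."""
    n = len(ratings)
    return n / (n - 1) * leti_plugin(ratings, n_categories)


def true_leti(probs):
    """Population Leti index for category probabilities `probs`."""
    cum, total = 0.0, 0.0
    for p in probs[:-1]:
        cum += p
        total += cum * (1.0 - cum)
    return 2.0 * total


def expected_value(estimator, probs, n):
    """Exact expectation over all K**n samples of n i.i.d. ratings."""
    k_cats = len(probs)
    expectation = 0.0
    for sample in product(range(1, k_cats + 1), repeat=n):
        weight = 1.0
        for r in sample:
            weight *= probs[r - 1]
        expectation += weight * estimator(list(sample), k_cats)
    return expectation


probs = [0.1, 0.2, 0.4, 0.2, 0.1]  # hypothetical rating distribution
n = 4                              # number of raters
d_true = true_leti(probs)
bias_plugin = expected_value(leti_plugin, probs, n) - d_true
bias_corrected = expected_value(leti_corrected, probs, n) - d_true
print(bias_plugin, bias_corrected)  # plug-in bias is -d_true / n; corrected is ~0
```

A check that would count against the paper's claim is the same enumeration applied to the authors' estimator under their own sampling model, returning a bias that does not vanish.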
Original abstract
A measure of interrater absolute agreement for ordinal scales is proposed capitalizing on the dispersion index for ordinal variables proposed by Giuseppe Leti. The procedure allows to avoid the problem of restriction of variance that sometimes affects traditional measures of interrater agreement in different fields of application. An unbiased estimator of the proposed measure is introduced and its sampling properties are investigated. In order to construct confidence intervals for interrater absolute agreement both asymptotic results and bootstrapping methods are used and their performance is evaluated. Simulated data are employed to demonstrate the accuracy and practical utility of the new procedure for assessing agreement. Finally, an application to a real case is provided.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a measure of interrater absolute agreement for ordinal scales constructed by adapting Giuseppe Leti's dispersion index for ordinal variables (typically via 1 minus a normalized dispersion). It derives an unbiased estimator, establishes sampling properties, constructs confidence intervals via delta-method asymptotics and bootstrap, evaluates bias/variance/coverage through simulations across numbers of raters, categories, and agreement levels, and illustrates the procedure on a real dataset.
Significance. If the construction and simulation results hold, the index supplies a direct, non-chance-corrected alternative to measures such as weighted kappa that can suffer from marginal variance restriction. The explicit unbiased estimator, dual CI methods, and simulation coverage checks constitute concrete strengths that would make the contribution useful in applied settings where ordinal ratings are common.
minor comments (3)
- [Abstract] The abstract states that the procedure 'allows to avoid the problem of restriction of variance' but does not indicate the precise mechanism (joint-distribution definition versus marginal normalization); a single clarifying sentence would improve readability.
- [Simulations] In the simulation section, coverage results are presented for selected combinations of raters and categories; adding a brief table or figure summarizing coverage across the full grid (including low-agreement and high-category cases) would make the performance claims easier to assess.
- [Application] The real-data example would be strengthened by a side-by-side numerical comparison with at least one conventional index (e.g., quadratic-weighted kappa) on the same ratings to illustrate the practical difference.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of the manuscript and the recommendation of minor revision. The report does not list any specific major comments requiring a point-by-point response.
Circularity Check
No significant circularity
The paper defines its interrater agreement index by direct adaptation of Leti's external dispersion measure for ordinal data (1 minus a normalized form of the index), then applies standard unbiased estimation, delta-method asymptotics, and bootstrap for inference. No equation reduces the proposed quantity to a fitted parameter or self-citation by construction; the core definition operates on the joint rating distribution independently of variance-restriction issues in chance-corrected coefficients. All load-bearing steps (estimator derivation, sampling properties, CI construction) rest on external statistical machinery and the cited Leti index rather than internal self-reference or renaming.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Leti's dispersion index is a valid and appropriate measure of dispersion for ordinal variables
- standard math: Standard asymptotic and bootstrap theory applies to the proposed estimator
Reference graph
Works this paper leans on
[1] Booth, J. G., Butler, R. W., Hall, P. (1994). Bootstrap methods for finite populations. Journal of the American Statistical Association, 89(428), 1282–1289.
[2] Bove, G., Nuzzo, E., Serafini, A. (2018). Measurement of interrater agreement for the assessment of language proficiency. In: S. Capecchi, F. Di Iorio, R. Simone, ASMOD 2018: Proceedings of the Advanced Statistical Modelling for Ordinal Data Conference. Università Federico II di Napoli, 24–26 October 2018. Napoli: FedOAPress, 61–68.
[3] Grilli, L., Rampichini, C. (2002). Scomposizione della dispersione per variabili statistiche ordinali [Dispersion decomposition for ordinal variables]. Statistica, 62, 111–116.
[4] Gross, S. (1980). Median estimation in sample surveys. In Proceedings of the Section on Survey Research Methods, American Statistical Association, 181–184.
[5] James, L. R., Demaree, R. G., Wolf, G. (1984). Estimating within-group interrater reliability with and without response bias. Journal of Applied Psychology, 69, 85–98.
[6] James, L. R., Demaree, R. G., Wolf, G. (1993). rwg: An assessment of within-group interrater agreement. Journal of Applied Psychology, 78, 306–309.
[7] Efron, B. (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics, 7(1), 1–26.
[8] Kuiken, F., Vedder, I. (2017). Functional adequacy in L2 writing. Towards a new rating scale. Language Testing, 34, 321–336.
[9] LeBreton, J. M., Burgess, J. R. D., Kaiser, R. B., Atchley, E. K., James, L. R. (2003). The restriction of variance hypothesis and interrater reliability and agreement: Are ratings from multiple sources really dissimilar? Organizational Research Methods, 6, 80–128.
[10] LeBreton, J. M., Senter, J. L. (2008). Answers to 20 questions about interrater reliability and interrater agreement. Organizational Research Methods, 11(4), 815–852.
[11] Leti, G. (1983). Statistica descrittiva. Il Mulino, Bologna.
[12] Lomnicki, Z. A. (1952). The Standard Error of Gini's Mean Difference. The Annals of Mathematical Statistics, 23, 14, 635–637.
[13] Mashreghi, Z., Haziza, D., Léger, C. (2016). A survey of bootstrap methods in finite population sampling. Statistics Surveys, 10, 1–52.
[14] McGraw, K. O., Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1, 30–46.
[15] Nuzzo, E., Bove, G. (2018). Assessing functional adequacy across tasks: A comparison of learners and native speakers' written texts (submitted for publication).
[16] Piccarreta, R. (2001). A new measure of nominal-ordinal association. Journal of Applied Statistics, 28(1), 107–120.
[17] Sheather, S. J., Jones, M. C. (1991). A Reliable Data-Based Bandwidth Selection Method for Kernel Density Estimation. Journal of the Royal Statistical Society Series B, 53, 683–690.
[18] Shrout, P. E., Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing reliability. Psychological Bulletin, 86, 420–428.
[19] von Eye, A., Mun, E. Y. (2005). Analyzing rater agreement. Manifest variable methods. Lawrence Erlbaum Associates, Mahwah, New Jersey.