
arxiv: 2606.02059 · v1 · pith:IFSDPFKOnew · submitted 2026-06-01 · 📊 stat.ME

ICCDesign: An R Package for the Design and Analysis of ICC-Based Reliability Studies with Continuous Responses

Pith reviewed 2026-06-28 13:20 UTC · model grok-4.3

classification 📊 stat.ME
keywords intraclass correlation · reliability · R package · sample size · confidence interval · Shiny app · McGraw and Wong

The pith

The ICCDesign R package provides an integrated workflow for estimating, planning, and evaluating intraclass correlations in reliability studies.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the ICCDesign package to address the fragmentation of R tools for intraclass correlation coefficient (ICC) analysis in reliability research. It combines point estimation with confidence intervals, sample size planning, reliability evaluation, and a Shiny app in a single package, together with a decision framework for selecting the right ICC form. A sympathetic reader would care because a single workflow removes the need to switch between tools, and with it a source of analytical errors, when applying ICC methods in medical, psychological, and behavioral studies.

Core claim

ICCDesign integrates four core functionalities for ICC-based reliability studies with continuous responses: point estimation and ANOVA-based confidence intervals for supported ICC forms following the McGraw and Wong framework with a four-step decision guide, sample size planning based on Zou's closed-form formulas, automated reliability evaluation using Koo and Li criteria, and an interactive Shiny web application.

What carries the argument

The ICCDesign package and its built-in four-step decision framework that guides selection of the appropriate ICC form under the McGraw and Wong framework.

Load-bearing premise

The built-in four-step decision framework correctly maps user study designs to the appropriate ICC form under the McGraw and Wong framework and the package implementations match the cited methods without coding errors.
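The manuscript does not publish the four steps, so what follows is only a hypothetical sketch of what such a mapping could look like, written in Python for illustration rather than in the package's R. The function name `select_icc_form`, its four yes/no questions, and the label strings are assumptions loosely modelled on the kind of design questions Koo and Li (2016) pose; none of it is taken from ICCDesign.

```python
from itertools import product


def select_icc_form(same_raters: bool, raters_random: bool,
                    average: bool, absolute: bool) -> tuple[str, str]:
    """Map four yes/no design answers to (ANOVA model, ICC label).

    Hypothetical illustration only; not the ICCDesign decision logic.
    """
    unit = "k" if average else "1"  # reliability of a mean of k raters vs. one rater
    if not same_raters:
        # Different raters per subject: only a one-way random model applies,
        # and it has no separate consistency definition.
        return ("one-way random", f"ICC({unit})")
    model = "two-way random" if raters_random else "two-way mixed"
    definition = "A" if absolute else "C"  # A = absolute agreement, C = consistency
    return (model, f"ICC({definition},{unit})")


# Enumerate every combination of answers to see which forms are reachable.
REACHABLE_FORMS = {
    select_icc_form(*answers) for answers in product([True, False], repeat=4)
}
```

Under these assumptions the enumeration reaches ten distinct model and label pairs, which is at least consistent with the abstract's count of six standard forms plus four additional design combinations. Whether the package partitions them the same way is exactly what the premise above leaves open.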

What would settle it

Compare the package output for ICC point estimate and confidence interval on a standard dataset to results obtained from direct implementation of the McGraw and Wong ANOVA formulas or other established packages.
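A direct implementation of the kind this check calls for is short. The sketch below, in Python and using only the standard library, computes the ANOVA mean squares for a complete subjects-by-raters matrix and the three single-rater point estimates in McGraw and Wong notation. The function names and the small ratings matrix are illustrative assumptions, confidence intervals are omitted because they need F quantiles, and nothing here reproduces ICCDesign's own code or interface.

```python
def anova_mean_squares(ratings):
    """Mean squares for a complete n-subjects by k-raters matrix (list of rows)."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    return {
        "n": n,
        "k": k,
        "MSR": ss_rows / (n - 1),
        "MSC": ss_cols / (k - 1),
        "MSE": ss_error / ((n - 1) * (k - 1)),
        "MSW": (ss_cols + ss_error) / (n * (k - 1)),         # one-way within subjects
    }


def single_rater_iccs(ratings):
    """Point estimates of the single-rater ICC forms from the mean squares."""
    ms = anova_mean_squares(ratings)
    n, k = ms["n"], ms["k"]
    msr, msc, mse, msw = ms["MSR"], ms["MSC"], ms["MSE"], ms["MSW"]
    return {
        "ICC(1)": (msr - msw) / (msr + (k - 1) * msw),
        "ICC(C,1)": (msr - mse) / (msr + (k - 1) * mse),
        "ICC(A,1)": (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse)),
    }


# Illustrative data: 6 subjects (rows) rated by 4 raters (columns).
RATINGS = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
ESTIMATES = single_rater_iccs(RATINGS)
```

For this matrix the estimates come out at roughly 0.17 for ICC(1), 0.71 for ICC(C,1), and 0.29 for ICC(A,1). Running the same data through the package and through established tools such as irr or psych, then comparing against these values, would be one way to carry out the proposed check.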

Original abstract

The intraclass correlation coefficient (ICC) is among the most widely used statistics in reliability research, playing a central role in medical measurement, psychological assessment, and behavioral science. However, practical application of ICC faces two major obstacles. First, ICC can be organized into multiple forms under the McGraw and Wong (1996) framework -- including six widely reported standard forms and four additional design combinations -- and researchers must select the appropriate form based on their study design, yet existing guidelines are not always operationalized in software interfaces. Second, available R tools are highly fragmented: sample size calculation, ICC estimation with confidence intervals, and reliability evaluation are distributed across separate packages, compelling researchers to switch between tools and increasing the risk of analytical errors. This paper introduces the ICCDesign package, designed specifically to provide an integrated workflow for ICC-based reliability studies with continuous responses, assuming one continuous rating per subject-rater cell. The package integrates four core functionalities: (1) point estimation, ANOVA-based confidence intervals, and implemented hypothesis tests for supported ICC design combinations following the McGraw and Wong (1996) framework, with a built-in four-step decision framework guiding users toward an appropriate ICC form; (2) sample size planning based on Zou's (2012) closed-form formulas, supporting two planning modes and an inverse assurance calculation; (3) automated reliability evaluation based on Koo and Li (2016) criteria, with an uncertainty notification when the confidence interval spans the 0.75 good-reliability threshold; and (4) an interactive Shiny web application covering the main analysis and planning functionalities. ICCDesign is available from GitHub at https://github.com/KlariZhang/ICCDesign.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces the ICCDesign R package for ICC-based reliability studies with continuous responses. It claims to integrate (1) point estimation, ANOVA-based CIs, and hypothesis tests for McGraw & Wong (1996) ICC forms via a built-in four-step decision framework, (2) sample-size planning using Zou (2012) closed-form formulas in two modes plus inverse assurance, (3) automated reliability evaluation per Koo & Li (2016) criteria with uncertainty notification, and (4) a Shiny web application, addressing fragmentation across existing R tools.

Significance. If the implementations prove correct, the package would usefully consolidate sample-size planning, estimation, and evaluation into one workflow with usability aids, reducing switching errors for researchers in medical measurement and behavioral sciences. The decision framework and Shiny component add practical value. However, the complete absence of any validation, test cases, or numerical checks against the cited sources substantially lowers the assessed significance, as the contribution rests entirely on the unverified claim of faithful integration.

major comments (2)
  1. [Abstract] Abstract and overall manuscript: the central claim that the package correctly implements the McGraw & Wong (1996) forms via a four-step decision framework is unsupported, because the manuscript provides neither the decision logic, pseudocode, nor any worked examples showing how user designs are mapped to specific ICC forms.
  2. [Abstract] Abstract and overall manuscript: no section supplies validation, test cases, side-by-side numerical comparisons against Zou (2012) formulas, McGraw & Wong (1996) CIs, Koo & Li (2016) thresholds, or outputs from other packages; this directly undermines the claim that the integrated functionalities are correctly implemented.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive criticism. The comments correctly identify that the manuscript lacks explicit documentation of the decision framework and any form of validation or numerical checks. We address each point below and will revise the manuscript to incorporate the requested details.

Point-by-point responses
  1. Referee: [Abstract] Abstract and overall manuscript: the central claim that the package correctly implements the McGraw & Wong (1996) forms via a four-step decision framework is unsupported, because the manuscript provides neither the decision logic, pseudocode, nor any worked examples showing how user designs are mapped to specific ICC forms.

    Authors: We agree that the four-step decision framework is described only at a high level in the current manuscript. In the revised version we will add (i) the explicit decision logic in both text and pseudocode, (ii) a table mapping common study-design features (number of raters, fixed vs. random, etc.) to the six standard McGraw & Wong forms plus the four additional combinations, and (iii) two fully worked examples that trace a user-specified design through the four steps to the resulting ICC form, ANOVA model, and confidence-interval formula. revision: yes

  2. Referee: [Abstract] Abstract and overall manuscript: no section supplies validation, test cases, side-by-side numerical comparisons against Zou (2012) formulas, McGraw & Wong (1996) CIs, Koo & Li (2016) thresholds, or outputs from other packages; this directly undermines the claim that the integrated functionalities are correctly implemented.

    Authors: We acknowledge that the manuscript currently contains no validation material. The revised manuscript will include a new “Validation” section containing: (a) unit-test results for the Zou (2012) sample-size formulas against the original closed-form expressions, (b) side-by-side numerical comparisons of ICC point estimates and ANOVA-based CIs with the irr and psych packages for the same data sets, (c) verification that Koo & Li (2016) reliability labels are assigned correctly, including the uncertainty notification when a CI straddles 0.75, and (d) a small set of reproducible R code snippets that readers can run to reproduce the comparisons. revision: yes
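Item (c) of the planned validation can be pictured with a short check like the one below. It is a hedged Python sketch, not the package's R implementation: the function names are hypothetical, the cut-offs of 0.50, 0.75, and 0.90 follow the Koo and Li (2016) guideline, and the handling of values that fall exactly on a cut-off is an assumption, since the guideline does not pin that down.

```python
def koo_li_label(icc: float) -> str:
    """Reliability label under the Koo and Li (2016) cut-offs."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.90:
        return "good"
    return "excellent"  # assumption: a value of exactly 0.90 is labelled excellent


def evaluate_reliability(estimate: float, lower: float, upper: float,
                         threshold: float = 0.75) -> dict:
    """Label the estimate and flag a confidence interval that spans the threshold."""
    return {
        "label": koo_li_label(estimate),
        "ci_labels": (koo_li_label(lower), koo_li_label(upper)),
        # True when the interval starts below the threshold and reaches it or beyond.
        "uncertain": lower < threshold <= upper,
    }
```

For example, `evaluate_reliability(0.80, 0.70, 0.88)` labels the estimate "good" but sets `uncertain` to `True`, because the interval runs from the moderate band into the good band. A unit test for the package would assert the same behaviour on its own output.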

Circularity Check

0 steps flagged

No circularity: software wrapper around externally published methods

Full rationale

The paper introduces an R package that integrates four functionalities by wrapping previously published methods: McGraw and Wong (1996) ICC forms with a four-step decision framework, Zou (2012) sample-size formulas, and Koo and Li (2016) reliability criteria. No new derivations, predictions, fitted parameters, or first-principles results appear in the manuscript. The central claim is the provision of an integrated workflow and Shiny app; all load-bearing statistical content is imported from external citations whose validity is independent of the present work. No self-citation chains, ansatzes, or renamings reduce any claim to its own inputs by construction. This is the expected outcome for a software-description paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper introduces no new mathematical derivations, free parameters, or postulated entities. It packages previously published methods whose assumptions (standard ANOVA models for ICC, closed-form sample-size formulas, fixed reliability thresholds) are inherited from the cited references.



Reference graph

Works this paper leans on

10 extracted references · 4 canonical work pages

  1. Brueckl, M. (2022). irrNA: Coefficients of Interrater Reliability – Generalized for Randomly Incomplete Datasets. R package version 0.2.2. https://CRAN.R-project.org/package=irrNA

  2. Gamer, M., Lemon, J., & Singh, I. F. P. (2019). irr: Various Coefficients of Interrater Reliability and Agreement. R package version 0.84.1. https://CRAN.R-project.org/package=irr

  3. Koo, T. K., & Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of Chiropractic Medicine, 15(2), 155–163. https://doi.org/10.1016/j.jcm.2016.02.012

  4. Liu, Z., Ma, R., Gao, C., & Zhang, Y. (2026). ICCDesign: An R Package for ICC-Based Reliability Studies. Version 0.1.0. https://github.com/KlariZhang/ICCDesign

  5. McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1(1), 30–46. https://doi.org/10.1037/1082-989X.1.1.30

     R Core Team (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/

  6. Revelle, W. (2024). psych: Procedures for Psychological, Psychometric, and Personality Research. R package version 2.4.3. https://CRAN.R-project.org/package=psych

  7. Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428. https://doi.org/10.1037/0033-2909.86.2.420

  8. Wickham, H., Hester, J., Chang, W., & Bryan, J. (2022). devtools: Tools to Make Developing R Packages Easier. R package version 2.4.5. https://CRAN.R-project.org/package=devtools

  9. Wolak, M. E., Fairbairn, D. J., & Paulsen, Y. R. (2012). ICC.Sample.Size: Calculation of Sample Size and Power for ICC. R package version 1.0. https://CRAN.R-project.org/package=ICC.Sample.Size

  10. Zou, G. Y. (2012). Sample size formulas for estimating intraclass correlation coefficients with precision and assurance. Statistics in Medicine, 31(29), 3972–3981. https://doi.org/10.1002/sim.5466