The Effect of Choice of Metric and Scan Length on Reliability in Resting-State fMRI
Pith reviewed 2026-06-28 18:22 UTC · model grok-4.3
The pith
Reliability conclusions in resting-state fMRI depend on the distance metric chosen, and scan length affects reliability more than the time between sessions does.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Applying the distance-based intraclass correlation coefficient to resting-state fMRI correlation matrices from the Midnight Scanning Club dataset reveals that reliability assessments differ depending on whether the Frobenius metric or the Affine Invariant Riemannian Metric is used. Longer scan lengths significantly improve reliability estimates, while the time interval between sessions has less impact.
What carries the argument
The distance-based intraclass correlation coefficient (dbICC) applied to correlation matrices using Frobenius and AIRM distance metrics to quantify reliability across multiple sessions.
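For reference, the quantities named above can be written out as follows. The page does not reproduce the paper's formulas, so this is a sketch based on the standard definitions of the two metrics and on the usual form of dbICC as one minus a ratio of mean squared distances. The symbols $A$, $B$, $\lambda_i$, $\mathrm{MSD}_w$, and $\mathrm{MSD}_b$ are notation introduced here, not taken from the paper.

```latex
% Frobenius distance between two correlation matrices A and B
d_F(A, B) = \lVert A - B \rVert_F
          = \sqrt{\sum_{i,j} (a_{ij} - b_{ij})^2}

% Affine Invariant Riemannian Metric (A, B symmetric positive definite);
% \lambda_i are the eigenvalues of A^{-1} B
d_{\mathrm{AIRM}}(A, B)
  = \bigl\lVert \log\bigl(A^{-1/2} B A^{-1/2}\bigr) \bigr\rVert_F
  = \sqrt{\sum_i \log^2 \lambda_i}

% dbICC: MSD_w is the mean squared distance between sessions of the same
% subject, MSD_b the mean squared distance between sessions of different subjects
\mathrm{dbICC} = 1 - \frac{\mathrm{MSD}_w}{\mathrm{MSD}_b}
```

Under this form, dbICC approaches 1 when sessions from the same subject are much closer to each other than sessions from different subjects, whichever distance is plugged in.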
If this is right
- Reliability conclusions can change based on the distance metric selected for the analysis.
- Longer scan lengths lead to higher reliability in connectivity estimates.
- The time between sessions affects reliability less than scan length does.
- The geometry-respecting metric provides an alternative view of reliability compared to the standard metric.
Where Pith is reading between the lines
- Future studies could compare these metrics against other reliability measures to determine which aligns better with biological variability.
- Optimal scan protocols might prioritize longer continuous scans over more frequent short ones.
- The method could be applied to task-based fMRI or other neuroimaging modalities to assess similar effects.
Load-bearing premise
That the patterns observed in ten subjects with ten sessions each apply broadly to other resting-state fMRI experiments with different numbers of subjects or scan setups.
What would settle it
A study with a different dataset or more subjects finding that the two metrics produce identical reliability rankings or that scan length has no effect on reliability would challenge the claims.
Original abstract
Resting-state fMRI (rs-fMRI) is widely used to investigate brain functional connectivity, but the reliability of these measurements remains a key concern for ensuring reproducibility. The distance-based intraclass correlation coefficient (dbICC) generalizes classical ICC to more general data types, making it well-suited for assessing the reliability of measures of functional connectivity. In this study, we applied dbICC to assess the reliability of rs-fMRI data from the Midnight Scanning Club (MSC) dataset, which consists of 10 subjects, each undergoing 10 sessions of 30-minute rs-fMRI scans. The functional connectivity was estimated using Pearson's correlation coefficients between all pairs of brain regions, resulting in a correlation matrix for each session. We compared two distance metrics, the widely used Frobenius metric and the Affine Invariant Riemannian Metric (AIRM), the latter selected to respect the geometry of the space of covariance matrices, to evaluate how the choice of metric affects the reliability of estimating correlation. In addition, we investigated the impact of scan length and time interval between sessions on reliability. Results based on each metric agreed in some respects but disagreed in others, illustrating the impact of choice of metric. We also found that longer scan lengths significantly improve reliability, while the time interval between sessions has less impact.
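To make the contrast between the two metrics concrete, here is a minimal Python sketch. It is not the authors' code: the function names and the toy 2x2 matrices (built by the hypothetical helper `corr2`) are illustrative assumptions. It shows one way the metrics can disagree: two pairs of correlation matrices that are equally far apart under the Frobenius metric are not equally far apart under AIRM, which stretches distances near the boundary of the positive-definite cone.

```python
import numpy as np

def frobenius_distance(a, b):
    """Frobenius (entry-wise Euclidean) distance between two matrices."""
    return float(np.linalg.norm(a - b, ord="fro"))

def airm_distance(a, b):
    """AIRM distance between symmetric positive-definite matrices.

    Equals sqrt(sum(log(lam_i)^2)), where lam_i are the eigenvalues of
    inv(a) @ b, obtained here through a Cholesky whitening step.
    """
    chol = np.linalg.cholesky(a)                 # a = chol @ chol.T
    half = np.linalg.solve(chol, b)              # inv(chol) @ b
    whitened = np.linalg.solve(chol, half.T).T   # inv(chol) @ b @ inv(chol).T
    whitened = (whitened + whitened.T) / 2       # symmetrize against round-off
    eigvals = np.linalg.eigvalsh(whitened)
    return float(np.sqrt(np.sum(np.log(eigvals) ** 2)))

def corr2(r):
    """Hypothetical toy example: a 2x2 correlation matrix with correlation r."""
    return np.array([[1.0, r], [r, 1.0]])

# Both pairs differ by 0.4 in the off-diagonal entry.
frob_low = frobenius_distance(corr2(0.0), corr2(0.4))
frob_high = frobenius_distance(corr2(0.5), corr2(0.9))
airm_low = airm_distance(corr2(0.0), corr2(0.4))
airm_high = airm_distance(corr2(0.5), corr2(0.9))

print(f"Frobenius: {frob_low:.3f} vs {frob_high:.3f}")
print(f"AIRM:      {airm_low:.3f} vs {airm_high:.3f}")
```

The Frobenius distances match, while the AIRM distance is larger for the pair of strongly correlated matrices. Because dbICC is built entirely from pairwise distances, a difference of this kind is enough for the two metrics to produce different reliability estimates from the same sessions.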
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript applies the distance-based intraclass correlation coefficient (dbICC) to functional connectivity matrices derived from Pearson correlations in the Midnight Scanning Club (MSC) dataset (10 subjects, each with 10 sessions of 30-minute rs-fMRI scans). It compares reliability estimates under the Frobenius metric versus the Affine Invariant Riemannian Metric (AIRM), and examines how scan length and inter-session interval affect those estimates, reporting partial agreement/disagreement between metrics and a stronger effect of scan length than interval.
Significance. If the central claims hold after addressing sample-size and reporting limitations, the work would usefully illustrate that metric geometry can produce divergent reliability conclusions in rs-fMRI and that scan duration is a more actionable lever than inter-session spacing for improving reproducibility.
major comments (2)
- [Abstract] Abstract and dataset description: the claim that metric choice produces both agreements and disagreements, and that longer scans significantly improve reliability, rests on a single cohort of n=10 subjects. No power analysis, cross-dataset replication, or sensitivity checks are described that would separate intrinsic metric/scan-length effects from subject-specific or acquisition idiosyncrasies.
- [Abstract] Abstract: the statement that 'longer scan lengths significantly improve reliability' is presented without accompanying statistical details, confidence intervals, or exclusion criteria, so the magnitude and robustness of the reported effect cannot be evaluated from the given information.
minor comments (1)
- [Abstract] The abstract refers to 'correlation matrix for each session' but does not specify whether matrices are Fisher-z transformed or otherwise regularized before distance computation; this detail affects interpretation of both metrics.
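The two preprocessing options the referee mentions can be sketched as follows. This is an illustration only: the abstract does not say which, if any, the authors used, and the function names and the shrinkage weight `alpha` are assumptions made here. A Fisher z-transform acts on each correlation separately, while shrinkage toward the identity is one simple way to guarantee the positive definiteness that AIRM requires.

```python
import numpy as np

def fisher_z(corr):
    """Fisher z-transform of the off-diagonal correlations; diagonal set to 0."""
    z = np.zeros_like(corr, dtype=float)
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    z[off_diag] = np.arctanh(corr[off_diag])
    return z

def shrink_to_identity(corr, alpha=0.1):
    """Blend a correlation matrix with the identity.

    The diagonal stays at 1, and every eigenvalue lam becomes
    (1 - alpha) * lam + alpha, so a positive semi-definite input
    ends up with smallest eigenvalue of at least alpha.
    """
    return (1.0 - alpha) * corr + alpha * np.eye(corr.shape[0])

# Toy case: fewer time points than regions gives a singular correlation matrix.
rng = np.random.default_rng(0)
series = rng.standard_normal((8, 10))        # 8 frames, 10 regions (synthetic)
raw = np.corrcoef(series, rowvar=False)
shrunk = shrink_to_identity(raw, alpha=0.1)

min_eig_raw = np.linalg.eigvalsh(raw).min()
min_eig_shrunk = np.linalg.eigvalsh(shrunk).min()
z = fisher_z(raw)
```

With the zero-diagonal convention used here, the z-transformed matrix has trace zero and therefore cannot be positive definite, so it pairs naturally with the Frobenius metric rather than with AIRM. This is one reason the referee's question bears on how the two metrics are compared.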
Simulated Author's Rebuttal
Thank you for the referee's comments highlighting limitations in sample size and statistical reporting in the abstract. We address each point below with proposed revisions where feasible.
- Referee: [Abstract] Abstract and dataset description: the claim that metric choice produces both agreements and disagreements, and that longer scans significantly improve reliability, rests on a single cohort of n=10 subjects. No power analysis, cross-dataset replication, or sensitivity checks are described that would separate intrinsic metric/scan-length effects from subject-specific or acquisition idiosyncrasies.
  Authors: The MSC dataset was selected specifically for its rare structure of 10 sessions per subject, which enables within-subject reliability estimation across metrics that larger single-session cohorts cannot provide. We acknowledge the absence of formal power analysis or cross-dataset replication. In revision we will add sensitivity analyses (e.g., subsampling sessions) and an explicit limitations paragraph on generalizability; however, replication on independent datasets lies outside the current study scope.
  Revision: partial
- Referee: [Abstract] Abstract: the statement that 'longer scan lengths significantly improve reliability' is presented without accompanying statistical details, confidence intervals, or exclusion criteria, so the magnitude and robustness of the reported effect cannot be evaluated from the given information.
  Authors: We agree the abstract should be more informative. The full manuscript already compares dbICC across discrete scan lengths (5–30 min) and reports the corresponding values; we will revise the abstract to include the key quantitative results (effect magnitudes and any inferential statistics) and state that no sessions or subjects were excluded.
  Revision: yes
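The comparison of dbICC across scan lengths discussed in this exchange could look roughly like the Python sketch below. It runs on synthetic data, not the MSC dataset, and rests on several assumptions: dbICC is taken as one minus the ratio of within-subject to between-subject mean squared distance, only the Frobenius metric is used, and the helper names, simulation settings, and frame counts are all invented for illustration.

```python
import numpy as np

def dbicc(dist, subject_ids):
    """dbICC = 1 - MSD_within / MSD_between from a pairwise distance matrix."""
    ids = np.asarray(subject_ids)
    same = ids[:, None] == ids[None, :]
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)  # count each pair once
    sq = np.asarray(dist, dtype=float) ** 2
    msd_within = sq[same & upper].mean()
    msd_between = sq[~same & upper].mean()
    return 1.0 - msd_within / msd_between

def frobenius_matrix(mats):
    """All pairwise Frobenius distances between a list of equal-sized matrices."""
    flat = np.stack([m.ravel() for m in mats])
    diff = flat[:, None, :] - flat[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def simulate(n_subjects=6, n_sessions=4, n_regions=5, n_frames=800, seed=0):
    """Synthetic sessions: each subject has a fixed mixing matrix (its 'true' connectivity)."""
    rng = np.random.default_rng(seed)
    series, ids = [], []
    for subject in range(n_subjects):
        mixing = np.eye(n_regions) + 0.6 * rng.standard_normal((n_regions, n_regions))
        for _ in range(n_sessions):
            series.append(rng.standard_normal((n_frames, n_regions)) @ mixing)
            ids.append(subject)
    return series, ids

series, ids = simulate()
results = {}
for n_frames in (50, 200, 800):
    # Truncate every session to its first n_frames time points.
    mats = [np.corrcoef(x[:n_frames], rowvar=False) for x in series]
    results[n_frames] = dbicc(frobenius_matrix(mats), ids)

for n_frames, value in results.items():
    print(f"{n_frames:4d} frames: dbICC = {value:.3f}")
```

In this toy setting longer truncations yield higher dbICC because sampling noise in each correlation matrix shrinks while the between-subject differences stay fixed. Whether the same pattern holds in real data, and by how much, is what the quantitative results promised in the rebuttal would need to show. Resampling subjects or sessions around `dbicc` would be one way to attach the confidence intervals the referee asks for.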
Circularity Check
No circularity: empirical application of dbICC to MSC data
The paper applies the distance-based ICC (dbICC) to correlation matrices from the MSC dataset (10 subjects, 10 sessions each) to compare Frobenius and AIRM metrics and assess scan-length effects on reliability. No equations, fitted parameters, or self-citations are presented that reduce the reported agreements/disagreements or scan-length findings to tautological definitions or inputs by construction. The analysis is a direct statistical computation on external data without self-referential predictions or uniqueness claims imported from the authors' prior work.