The Effect of Choice of Metric and Scan Length on Reliability in Resting-State fMRI

Philip T. Reiss; R. Todd Ogden; Seonjoo Lee; Yu Huang

arxiv: 2606.00767 · v1 · pith:DNJMR3SBnew · submitted 2026-05-30 · 📊 stat.ME

Pith reviewed 2026-06-28 18:22 UTC · model grok-4.3

classification 📊 stat.ME

keywords resting-state fMRI · reliability · dbICC · metric choice · scan length · functional connectivity · reproducibility · correlation matrices

The pith

The choice of distance metric for assessing reliability in resting-state fMRI leads to different conclusions, and longer scan lengths improve reliability more than the time between sessions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how the selection of a distance metric and the duration of scans influence the reliability of functional connectivity measurements in resting-state fMRI. Using data from ten subjects scanned multiple times, it applies a generalized intraclass correlation coefficient with two different metrics: the standard Frobenius metric and the Affine Invariant Riemannian Metric. The results show partial agreement but notable disagreements between the metrics on reliability levels. Longer scans are found to enhance reliability substantially, whereas the interval between sessions has a smaller effect. This matters because unreliable measures can undermine studies of brain connectivity and reproducibility in neuroscience.

Core claim

Applying the distance-based intraclass correlation coefficient to resting-state fMRI correlation matrices from the Midnight Scanning Club dataset reveals that reliability assessments differ depending on whether the Frobenius metric or the Affine Invariant Riemannian Metric is used. Longer scan lengths significantly improve reliability estimates, while the time interval between sessions has less impact.

What carries the argument

The distance-based intraclass correlation coefficient (dbICC), applied to correlation matrices under the Frobenius and AIRM distance metrics, quantifies reliability across multiple sessions.
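
For readers who want these quantities in symbols, the following is a sketch of the definitions as they are usually stated in the literature, not a transcription of the paper's equations. The notation is an assumption of this note: $A$ and $B$ are two session-level correlation matrices, $\lambda_k(\cdot)$ denotes the $k$-th eigenvalue, and $\mathrm{MSD}_w$ and $\mathrm{MSD}_b$ are the mean squared distances between sessions of the same subject and of different subjects.

```latex
\begin{align*}
d_F(A, B) &= \lVert A - B \rVert_F, \\
d_{\mathrm{AIRM}}(A, B) &= \bigl\lVert \log\bigl(A^{-1/2} B A^{-1/2}\bigr) \bigr\rVert_F
  = \Bigl(\sum_{k} \log^2 \lambda_k\bigl(A^{-1} B\bigr)\Bigr)^{1/2}, \\
\mathrm{dbICC} &= 1 - \frac{\mathrm{MSD}_w}{\mathrm{MSD}_b}.
\end{align*}
```

Two properties follow directly from these forms. AIRM requires $A$ and $B$ to be strictly positive definite, and it satisfies $d_{\mathrm{AIRM}}(WAW^\top, WBW^\top) = d_{\mathrm{AIRM}}(A, B)$ for any invertible $W$, which the Frobenius distance does not.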

If this is right

Reliability conclusions can change based on the distance metric selected for the analysis.
Longer scan lengths lead to higher reliability in connectivity estimates.
The time between sessions affects reliability less than scan length does.
The geometry-respecting metric provides an alternative view of reliability compared to the standard metric.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Future studies could compare these metrics against other reliability measures to determine which aligns better with biological variability.
Optimal scan protocols might prioritize longer continuous scans over more frequent short ones.
The method could be applied to task-based fMRI or other neuroimaging modalities to assess similar effects.

Load-bearing premise

That the patterns observed in ten subjects with ten sessions each apply broadly to other resting-state fMRI experiments with different numbers of subjects or scan setups.

What would settle it

A study with a different dataset or more subjects finding that the two metrics produce identical reliability rankings or that scan length has no effect on reliability would challenge the claims.

Figures

Figures reproduced from arXiv: 2606.00767 by Philip T. Reiss, R. Todd Ogden, Seonjoo Lee, Yu Huang.

**Figure 1.** Distance matrices for all ROIs and two representative networks computed using two different metrics. Each row and column corresponds to a scan session; each cell represents the distance between a pair of sessions. Darker colors represent larger distances.

**Figure 2.** Density plots for all ROIs and two representative networks comparing between-subject (red) and within-subject (blue) distances using the Frobenius metric and AIRM.

**Figure 3.** Scatterplot for all ROIs comparing pairwise distances calculated using the Frobenius metric and AIRM. Each point represents the distance between a pair of sessions. Triangles indicate within-subject distances, colored by subject ID, while grey circles represent between-subject distances.

Original abstract

Resting-state fMRI (rs-fMRI) is widely used to investigate brain functional connectivity, but the reliability of these measurements remains a key concern for ensuring reproducibility. The distance-based intraclass correlation coefficient (dbICC) generalizes classical ICC to more general data types, making it well-suited for assessing the reliability of measures of functional connectivity. In this study, we applied dbICC to assess the reliability of rs-fMRI data from the Midnight Scanning Club (MSC) dataset, which consists of 10 subjects, each undergoing 10 sessions of 30-minute rs-fMRI scans. The functional connectivity was estimated using Pearson's correlation coefficients between all pairs of brain regions, resulting in a correlation matrix for each session. We compared two distance metrics, the widely used Frobenius metric and the Affine Invariant Riemannian Metric (AIRM), the latter selected to respect the geometry of the space of covariance matrices, to evaluate how the choice of metric affects the reliability of estimating correlation. In addition, we investigated the impact of scan length and time interval between sessions on reliability. Results based on each metric agreed in some respects but disagreed in others, illustrating the impact of choice of metric. We also found that longer scan lengths significantly improve reliability, while the time interval between sessions has less impact.
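
The pipeline the abstract describes (per-session Pearson correlation matrices, pairwise distances under two metrics, then dbICC) can be sketched in a few lines. This is a minimal illustration on synthetic data, not the authors' code: the function names, the random "subjects", and the plug-in dbICC estimator (one minus the ratio of mean squared within-subject to mean squared between-subject distance) are all assumptions of this sketch.

```python
import numpy as np


def correlation_matrix(ts):
    """Pearson correlations between ROI time series; ts is (timepoints, rois)."""
    return np.corrcoef(ts, rowvar=False)


def frobenius_distance(a, b):
    return float(np.linalg.norm(a - b, ord="fro"))


def airm_distance(a, b):
    """|| log(A^{-1/2} B A^{-1/2}) ||_F; both inputs must be positive definite."""
    w, v = np.linalg.eigh(a)
    a_inv_sqrt = (v / np.sqrt(w)) @ v.T
    m = a_inv_sqrt @ b @ a_inv_sqrt
    eig = np.linalg.eigvalsh((m + m.T) / 2.0)  # symmetrize against round-off
    return float(np.sqrt(np.sum(np.log(eig) ** 2)))


def dbicc(mats, subject_ids, dist):
    """1 - (mean squared within-subject distance) / (mean squared between-subject distance)."""
    within, between = [], []
    for i in range(len(mats)):
        for j in range(i + 1, len(mats)):
            d2 = dist(mats[i], mats[j]) ** 2
            (within if subject_ids[i] == subject_ids[j] else between).append(d2)
    return 1.0 - float(np.mean(within)) / float(np.mean(between))


# Synthetic stand-in for real data (hypothetical): 4 subjects x 3 sessions, 6 ROIs.
rng = np.random.default_rng(0)
mats, ids = [], []
for subject in range(4):
    mixing = rng.normal(size=(6, 6))  # subject-specific covariance structure
    for _ in range(3):
        ts = rng.normal(size=(200, 6)) @ mixing.T
        mats.append(correlation_matrix(ts))
        ids.append(subject)

for name, dist in [("Frobenius", frobenius_distance), ("AIRM", airm_distance)]:
    print(name, round(dbicc(mats, ids, dist), 3))
```

Because `dbicc` takes the distance as an argument, swapping metrics changes only one call, which is the comparison the abstract reports on.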

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies dbICC to MSC data to compare Frobenius and AIRM metrics on correlation matrices and reports that metric choice produces mixed agreement while longer scans boost reliability more than inter-session interval.

The main thing to know is that this work takes the Midnight Scanning Club dataset (ten subjects, each scanned ten times) and uses distance-based ICC to check how Frobenius versus AIRM distance changes reliability estimates for resting-state correlation matrices. It also looks at scan length and session spacing. The concrete comparison on this repeated-measures data is the new piece; prior work has used dbICC but not this exact metric pairing on MSC.

The dataset choice is a strength. Multiple sessions per person let them estimate reliability without mixing subjects, and picking AIRM because it respects covariance geometry is a defensible move over the usual Frobenius norm. The finding that the two metrics sometimes agree and sometimes do not is useful to see in one place.

The soft spot is the sample. Ten subjects is thin ground for general statements about metric effects or scan-length benefits, and nothing in the abstract shows power checks, sensitivity runs, or tests on another dataset. The claim that longer scans “significantly improve” reliability is stated without numbers, error bars, or pipeline details, so it is hard to gauge how robust it is. The stress-test note on limited generalizability holds up on the evidence given.

This is for labs already doing rs-fMRI reliability work who might want to try AIRM themselves. A methods reader could get value from the side-by-side, but someone needing broad scan-design rules would find it suggestive rather than definitive.

I would bring it to a reading group to talk through the metric results. I would not cite it in the next year unless the full numbers turn out unusually clean. It deserves peer review because the question is practical and the data setup is reasonable, even if revisions will likely need more subjects or replication.

Referee Report

2 major / 1 minor

Summary. The manuscript applies the distance-based intraclass correlation coefficient (dbICC) to functional connectivity matrices derived from Pearson correlations in the Midnight Scanning Club (MSC) dataset (10 subjects, each with 10 sessions of 30-minute rs-fMRI scans). It compares reliability estimates under the Frobenius metric versus the Affine Invariant Riemannian Metric (AIRM), and examines how scan length and inter-session interval affect those estimates, reporting partial agreement/disagreement between metrics and a stronger effect of scan length than interval.

Significance. If the central claims hold after addressing sample-size and reporting limitations, the work would usefully illustrate that metric geometry can produce divergent reliability conclusions in rs-fMRI and that scan duration is a more actionable lever than inter-session spacing for improving reproducibility.

major comments (2)

[Abstract] Abstract and dataset description: the claim that metric choice produces both agreements and disagreements, and that longer scans significantly improve reliability, rests on a single cohort of n=10 subjects. No power analysis, cross-dataset replication, or sensitivity checks are described that would separate intrinsic metric/scan-length effects from subject-specific or acquisition idiosyncrasies.
[Abstract] Abstract: the statement that 'longer scan lengths significantly improve reliability' is presented without accompanying statistical details, confidence intervals, or exclusion criteria, so the magnitude and robustness of the reported effect cannot be evaluated from the given information.
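
The scan-length comparison at issue in these comments can be sketched as a truncation experiment: keep only the first n timepoints of every session, recompute the correlation matrices, and recompute dbICC. The code below is a hedged illustration on synthetic time series, using the Frobenius distance only. The helper names and the data are hypothetical, and lengths are given in timepoints because the repetition time needed to convert them to minutes is not stated on this page.

```python
import numpy as np


def dbicc_frobenius(mats, subject_ids):
    """Plug-in dbICC under the Frobenius distance."""
    within, between = [], []
    for i in range(len(mats)):
        for j in range(i + 1, len(mats)):
            d2 = np.sum((mats[i] - mats[j]) ** 2)  # squared Frobenius distance
            (within if subject_ids[i] == subject_ids[j] else between).append(d2)
    return 1.0 - float(np.mean(within)) / float(np.mean(between))


def dbicc_by_scan_length(series, subject_ids, lengths):
    """Recompute dbICC after truncating every session to its first n timepoints."""
    results = {}
    for n in lengths:
        mats = [np.corrcoef(ts[:n], rowvar=False) for ts in series]
        results[n] = dbicc_frobenius(mats, subject_ids)
    return results


# Synthetic stand-in (hypothetical): 5 subjects x 4 sessions, 6 ROIs, 600 timepoints.
rng = np.random.default_rng(1)
series, ids = [], []
for subject in range(5):
    mixing = rng.normal(size=(6, 6))  # subject-specific covariance structure
    for _ in range(4):
        series.append(rng.normal(size=(600, 6)) @ mixing.T)
        ids.append(subject)

for n, value in dbicc_by_scan_length(series, ids, [100, 200, 400, 600]).items():
    print(n, round(value, 3))
```

A table of this kind, with an uncertainty estimate attached to each row, is the sort of detail the second major comment asks the abstract to summarize.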

minor comments (1)

[Abstract] The abstract refers to 'correlation matrix for each session' but does not specify whether matrices are Fisher-z transformed or otherwise regularized before distance computation; this detail affects interpretation of both metrics.
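
To see why this detail matters, consider two common preprocessing choices. Neither is confirmed as the paper's method, and the function names and the shrinkage weight `alpha` are assumptions of this sketch. A Fisher z-transform with the diagonal set to zero yields a matrix that is not positive definite, so it suits the Frobenius distance but not AIRM. Linear shrinkage toward the identity keeps the unit diagonal and lifts near-zero eigenvalues, which AIRM needs when a session has fewer timepoints than ROIs.

```python
import numpy as np


def fisher_z(corr):
    """Element-wise Fisher z-transform (arctanh) with the diagonal set to 0."""
    z = np.arctanh(np.clip(corr, -0.999999, 0.999999))
    np.fill_diagonal(z, 0.0)
    return z


def shrink_to_identity(corr, alpha=0.1):
    """Linear shrinkage (1 - alpha) * corr + alpha * I; alpha is an illustrative value."""
    return (1.0 - alpha) * corr + alpha * np.eye(corr.shape[0])


# Hypothetical short session: 5 timepoints, 8 ROIs, so the matrix is rank-deficient.
rng = np.random.default_rng(3)
raw = np.corrcoef(rng.normal(size=(5, 8)), rowvar=False)
shrunk = shrink_to_identity(raw)

print("smallest eigenvalue, raw:   ", np.linalg.eigvalsh(raw).min())
print("smallest eigenvalue, shrunk:", np.linalg.eigvalsh(shrunk).min())
print("smallest eigenvalue, Fisher z:", np.linalg.eigvalsh(fisher_z(raw)).min())
```

Since either step changes the distances that enter dbICC, stating which one was applied (if any) affects how both metrics' results should be read.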

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the referee's comments highlighting limitations in sample size and statistical reporting in the abstract. We address each point below with proposed revisions where feasible.

Referee: [Abstract] Abstract and dataset description: the claim that metric choice produces both agreements and disagreements, and that longer scans significantly improve reliability, rests on a single cohort of n=10 subjects. No power analysis, cross-dataset replication, or sensitivity checks are described that would separate intrinsic metric/scan-length effects from subject-specific or acquisition idiosyncrasies.

Authors: The MSC dataset was selected specifically for its rare structure of 10 sessions per subject, which enables within-subject reliability estimation across metrics that larger single-session cohorts cannot provide. We acknowledge the absence of formal power analysis or cross-dataset replication. In revision we will add sensitivity analyses (e.g., subsampling sessions) and an explicit limitations paragraph on generalizability; however, replication on independent datasets lies outside the current study scope.

revision: partial

Referee: [Abstract] Abstract: the statement that 'longer scan lengths significantly improve reliability' is presented without accompanying statistical details, confidence intervals, or exclusion criteria, so the magnitude and robustness of the reported effect cannot be evaluated from the given information.

Authors: We agree the abstract should be more informative. The full manuscript already compares dbICC across discrete scan lengths (5–30 min) and reports the corresponding values; we will revise the abstract to include the key quantitative results (effect magnitudes and any inferential statistics) and state that no sessions or subjects were excluded.

revision: yes
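
One way the session-subsampling sensitivity analysis mentioned in the rebuttal might look is sketched below. It is an assumption of this note, not the authors' stated procedure: draw a random subset of sessions per subject, recompute dbICC from the matching rows and columns of a precomputed distance matrix, and inspect the spread over draws. The function names and the toy distance matrix are hypothetical.

```python
import numpy as np


def dbicc_from_distances(dist, subject_ids):
    """Plug-in dbICC from a symmetric session-by-session distance matrix."""
    dist = np.asarray(dist, dtype=float)
    ids = np.asarray(subject_ids)
    same = ids[:, None] == ids[None, :]
    upper = np.triu(np.ones_like(same, dtype=bool), k=1)  # each pair counted once
    msd_within = np.mean(dist[same & upper] ** 2)
    msd_between = np.mean(dist[~same & upper] ** 2)
    return 1.0 - msd_within / msd_between


def subsampled_dbicc(dist, subject_ids, sessions_per_subject, n_draws, seed=0):
    """dbICC over random draws of sessions_per_subject (>= 2) sessions per subject."""
    rng = np.random.default_rng(seed)
    dist = np.asarray(dist, dtype=float)
    ids = np.asarray(subject_ids)
    values = []
    for _ in range(n_draws):
        keep = np.concatenate([
            rng.choice(np.flatnonzero(ids == s), size=sessions_per_subject, replace=False)
            for s in np.unique(ids)
        ])
        values.append(dbicc_from_distances(dist[np.ix_(keep, keep)], ids[keep]))
    return np.array(values)


# Toy distance matrix: 2 subjects x 3 sessions, within distance 1, between distance 2.
toy_ids = np.array([0, 0, 0, 1, 1, 1])
toy_dist = np.where(toy_ids[:, None] == toy_ids[None, :], 1.0, 2.0)
np.fill_diagonal(toy_dist, 0.0)

draws = subsampled_dbicc(toy_dist, toy_ids, sessions_per_subject=2, n_draws=50)
print("mean:", draws.mean(), "std:", draws.std())
```

Working from a precomputed distance matrix keeps the resampling independent of the metric, so the same loop would serve for both Frobenius and AIRM distances.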

Circularity Check

0 steps flagged

No circularity: empirical application of dbICC to MSC data

The paper applies the distance-based ICC (dbICC) to correlation matrices from the MSC dataset (10 subjects, 10 sessions each) to compare Frobenius and AIRM metrics and assess scan-length effects on reliability. No equations, fitted parameters, or self-citations are presented that reduce the reported agreements/disagreements or scan-length findings to tautological definitions or inputs by construction. The analysis is a direct statistical computation on external data without self-referential predictions or uniqueness claims imported from the authors' prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract introduces no new free parameters, axioms, or invented entities; it applies the existing dbICC method and two standard metrics to an existing public dataset.
