An interpretable and trustworthy AI framework for large-scale longitudinal structure-pain association studies using data from the Osteoarthritis Initiative (OAI)
Pith reviewed 2026-06-28 06:03 UTC · model grok-4.3
The pith
Filtering deep learning MOAKS predictions with conformal prediction scales the analysis of knee structure and pain trajectories up to 2,175 knees.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The framework first predicts MOAKS features from MRIs with conformal prediction intervals, retains only high-confidence knee-level outputs, and feeds those into an LCMM that identifies two pain trajectories. In the expanded cohort of 2,175 knees, the estimated odds ratios (95% CI) for the rapid-progression trajectory are 1.62 (1.12-2.35) for BML, 1.83 (1.24-2.70) for cartilage loss, and 2.50 (1.75-3.57) for meniscal extrusion.
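As a quick plausibility check on the reported intervals, the sketch below recovers the log-odds-ratio standard error implied by each 95% CI. It assumes the intervals are Wald-type, that is, symmetric on the log scale with z of about 1.96; the abstract does not state how the intervals were computed, and the function name `implied_log_or_stats` is illustrative only.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile (assumes a Wald-type interval)

def implied_log_or_stats(or_point, ci_low, ci_high, z=Z95):
    """Return (log_or, se, geometric_midpoint) implied by a log-symmetric CI."""
    log_or = math.log(or_point)
    # Width of the interval on the log scale equals 2 * z * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    # For a log-symmetric interval the geometric midpoint equals the point estimate.
    midpoint = math.sqrt(ci_low * ci_high)
    return log_or, se, midpoint

# Odds ratios and 95% CIs as reported in the abstract.
reported = {
    "BML": (1.62, 1.12, 2.35),
    "CART": (1.83, 1.24, 2.70),
    "ME": (2.50, 1.75, 3.57),
}

for name, (or_point, lo, hi) in reported.items():
    log_or, se, mid = implied_log_or_stats(or_point, lo, hi)
    print(f"{name}: log(OR)={log_or:.3f}, implied SE={se:.3f}, "
          f"z={log_or / se:.2f}, geometric midpoint={mid:.2f}")
```

For all three features the geometric midpoint of the interval agrees with the reported point estimate to two decimals, which is what a log-symmetric interval would produce.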
What carries the argument
Conformal-prediction filtering of deep-learning MOAKS outputs before they enter the longitudinal latent class mixed model, which selects reliable inputs and thereby enlarges the analyzable sample while preserving uncertainty awareness.
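The filtering idea can be sketched with split conformal prediction for classification. This is a minimal illustration under stated assumptions, not the paper's implementation: the abstract does not specify the conformal variant, the nonconformity score, or the retention rule. Here the score is assumed to be one minus the predicted probability of the true class, and a case counts as "high-confidence" when its prediction set contains exactly one class. All data and function names are hypothetical.

```python
import math

def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return float("inf")  # too few calibration points: every class is kept
    return sorted(cal_scores)[rank - 1]

def prediction_set(probs, qhat):
    """Classes whose nonconformity score 1 - p(class) does not exceed qhat."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= qhat]

def filter_high_confidence(test_probs, qhat):
    """Keep only cases whose prediction set is a single class (assumed rule)."""
    kept = {}
    for idx, probs in enumerate(test_probs):
        pset = prediction_set(probs, qhat)
        if len(pset) == 1:
            kept[idx] = pset[0]
    return kept

# Hypothetical calibration set: class probabilities and true binary labels.
cal_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7], [0.95, 0.05],
             [0.1, 0.9], [0.55, 0.45], [0.85, 0.15], [0.25, 0.75]]
cal_labels = [0, 1, 0, 1, 0, 1, 1, 0, 1]
cal_scores = [1.0 - p[y] for p, y in zip(cal_probs, cal_labels)]

qhat = conformal_quantile(cal_scores, alpha=0.1)

# Hypothetical new knees: the second one is ambiguous and gets a two-class set.
test_probs = [[0.97, 0.03], [0.5, 0.5], [0.15, 0.85]]
kept = filter_high_confidence(test_probs, qhat)
print(f"threshold={qhat:.2f}, retained={kept}")
```

In this toy run the ambiguous knee is dropped and the other two are retained with their single predicted class, which is the behavior the filtering step relies on to trade sample size for label reliability.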
If this is right
- Bone marrow lesions, cartilage loss, and meniscal extrusion each function as risk factors for rapid rather than stable pain progression.
- After conformal filtering, the Matthews correlation coefficient rises from 0.69 to 0.91 for bone marrow lesions, from 0.45 to 0.80 for cartilage loss, and from 0.59 to 0.89 for meniscal extrusion.
- Two distinct longitudinal pain trajectories can be recovered from the larger filtered cohort.
- The same structural associations appear across four complementary pain measures.
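For readers unfamiliar with the metric cited in the list above, the following sketch computes the Matthews correlation coefficient from binary confusion-matrix counts. The counts are hypothetical and chosen only to show how dropping uncertain cases can raise the MCC; they are not taken from the paper.

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom

# Hypothetical counts before filtering (all knees) and after filtering
# (high-confidence knees only); most errors are assumed to sit in the dropped cases.
mcc_all = mcc(tp=40, fp=15, fn=10, tn=135)
mcc_filtered = mcc(tp=30, fp=2, fn=1, tn=100)

print(f"MCC on all predictions:      {mcc_all:.2f}")
print(f"MCC on retained predictions: {mcc_filtered:.2f}")
```

Because the MCC uses all four cells of the confusion matrix, it is less easily inflated by class imbalance than accuracy, which is presumably why it is the headline metric here. A gain measured on the retained subset says nothing about the knees that were filtered out.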
Where Pith is reading between the lines
- The same filtering step could be inserted into other longitudinal imaging studies to increase usable sample size without proportional increases in manual annotation effort.
- If the conformal calibration remains valid across scanners or populations, the framework could be reused for multi-center osteoarthritis cohorts.
- The reported odds ratios supply quantitative targets that future intervention trials could use to test whether modifying these structures alters pain trajectory membership.
Load-bearing premise
The retained high-confidence MOAKS predictions are sufficiently accurate and unbiased that they do not distort the odds ratios estimated by the latent class mixed model.
What would settle it
Re-estimate the odds ratios after replacing the filtered deep-learning scores with manual MOAKS scores on the overlapping subset of knees and check whether the point estimates or confidence intervals change materially.
Original abstract
Purpose: To develop an interpretable and trustworthy AI framework that combines deep learning based MRI Osteoarthritis Knee Score (MOAKS) prediction with interpretable statistical modeling to study structure-pain relationships at scale using data from the Osteoarthritis Initiative (OAI). Materials and Methods: We first developed a deep learning framework to predict MOAKS features directly from knee MRIs and incorporated conformal prediction to provide prediction uncertainty quantification. This uncertainty-aware strategy enables explicit filtering of model outputs, retaining only high-confidence MOAKS predictions at the knee level. Second, we applied a longitudinal latent class mixed model (LCMM) to examine associations between key structural abnormalities and four complementary knee pain measurements. Results: Among the three MRI-defined abnormalities (i.e., bone marrow lesions (BML), cartilage loss (CART), and meniscal extrusion (ME)), our framework substantially improved the Matthews correlation coefficient (MCC) and some other metrics. For example, MCC increased from 0.69 to 0.91 for BML, from 0.45 to 0.80 for CART, and from 0.59 to 0.89 for ME. Using these high-confidence predictions, we expanded the sample size to 2,175 knees for the LCMM analysis. Two distinct pain trajectories were identified (rapid and stable pain progression). The estimated odds ratios (95% CI) for the rapid progression group were 1.62 (1.12-2.35) for BML, 1.83 (1.24-2.70) for CART loss, and 2.50 (1.75-3.57) for ME. Conclusion: These results highlight the importance of these structural abnormalities as risk factors for pain and functional progression in osteoarthritis.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a framework that trains a deep learning model to predict MOAKS features (BML, cartilage loss, meniscal extrusion) from knee MRIs, applies conformal prediction to quantify uncertainty and retain only high-confidence outputs, and feeds the filtered predictions (n=2,175 knees) into a longitudinal latent class mixed model (LCMM) to identify two pain trajectories and estimate odds ratios for rapid progression (1.62 for BML, 1.83 for CART loss, 2.50 for ME). The work reports large gains in Matthews correlation coefficient (e.g., 0.69 to 0.91 for BML) over baseline methods and concludes that the structural features are risk factors for pain progression.
Significance. If the conformal-filtered predictions prove representative, the approach would enable larger-scale, uncertainty-aware structure-pain analyses in osteoarthritis than manual MOAKS scoring permits while keeping the statistical modeling step interpretable and separate from the DL step.
major comments (2)
- [Results] Results section (LCMM analysis on n=2,175 knees): the manuscript reports the odds ratios 1.62 (1.12-2.35), 1.83 (1.24-2.70), and 2.50 (1.75-3.57) but contains no comparison of pain scores, baseline demographics, or imaging characteristics between the high-confidence retained subset and the discarded knees; without this or a sensitivity analysis varying the conformal threshold, the central claim that the associations are trustworthy cannot be evaluated.
- [Methods] Methods (conformal prediction filtering step): the claim that retaining only high-confidence MOAKS predictions yields unbiased inputs for the LCMM rests on the untested assumption that prediction confidence is independent of factors that also affect pain trajectories; the abstract and results provide no empirical check on this independence.
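The retained-versus-discarded comparison requested in the first comment could take the form of standardized mean differences (SMDs) on baseline covariates. The sketch below is one possible way to compute them; the covariate values are hypothetical (not OAI data), and the 0.1 cut-off is a commonly used rule of thumb rather than anything specified in the manuscript.

```python
import math
import statistics

def standardized_mean_difference(retained, discarded):
    """SMD between two groups using the average of the two sample variances."""
    mean_diff = statistics.fmean(retained) - statistics.fmean(discarded)
    pooled_sd = math.sqrt(
        (statistics.variance(retained) + statistics.variance(discarded)) / 2
    )
    return mean_diff / pooled_sd if pooled_sd > 0 else 0.0

# Hypothetical baseline covariates: (retained knees, discarded knees).
baseline = {
    "age": ([61, 58, 66, 70, 63, 59], [64, 69, 72, 67, 71]),
    "bmi": ([27.5, 29.1, 30.2, 26.8, 28.4, 31.0], [28.0, 29.5, 27.9, 30.6, 28.8]),
}

SMD_THRESHOLD = 0.1  # assumed rule-of-thumb cut-off for meaningful imbalance

for name, (retained, discarded) in baseline.items():
    smd = standardized_mean_difference(retained, discarded)
    flag = "imbalanced" if abs(smd) > SMD_THRESHOLD else "balanced"
    print(f"{name}: SMD={smd:+.2f} ({flag})")
```

Repeating the same table at several conformal thresholds would give the sensitivity analysis the referee asks for, since each threshold defines a different retained subset.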
Simulated Author's Rebuttal
We appreciate the referee's thoughtful review and the opportunity to clarify and strengthen our manuscript. We address the major comments below and will incorporate revisions as indicated.
- Referee: [Results] Results section (LCMM analysis on n=2,175 knees): the manuscript reports the odds ratios 1.62 (1.12-2.35), 1.83 (1.24-2.70), and 2.50 (1.75-3.57) but contains no comparison of pain scores, baseline demographics, or imaging characteristics between the high-confidence retained subset and the discarded knees; without this or a sensitivity analysis varying the conformal threshold, the central claim that the associations are trustworthy cannot be evaluated.
  Authors: We agree that such a comparison is necessary to evaluate potential selection bias introduced by the conformal filtering. In the revised version, we will add a supplementary table comparing baseline pain scores, demographics (age, sex, BMI), and imaging characteristics (e.g., Kellgren-Lawrence grade) between the retained high-confidence knees (n=2,175) and the discarded ones. We will also perform a sensitivity analysis by reporting the LCMM results at different conformal prediction thresholds (e.g., 80%, 90%, 95% confidence) to demonstrate the robustness of the odds ratios.
  Revision: yes
- Referee: [Methods] Methods (conformal prediction filtering step): the claim that retaining only high-confidence MOAKS predictions yields unbiased inputs for the LCMM rests on the untested assumption that prediction confidence is independent of factors that also affect pain trajectories; the abstract and results provide no empirical check on this independence.
  Authors: We acknowledge the importance of empirically testing this independence assumption. In the revision, we will include an analysis examining the correlation between the conformal prediction scores (or p-values) and key factors influencing pain trajectories, such as baseline WOMAC pain scores, age, and BMI. If no significant associations are found, this will support the validity of the filtering step; otherwise, we will discuss potential implications.
  Revision: yes
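The independence check promised in the second response could be as simple as correlating a retention indicator with each baseline factor. The sketch below uses a binary retained/discarded flag, for which the Pearson correlation is the point-biserial correlation; the rebuttal mentions conformal scores or p-values instead, which would slot into the same function. The values are hypothetical, and a fuller analysis would likely use a multivariable model of retention.

```python
import math

def pearson_r(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)

def correlation_t_stat(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r = 0; needs |r| < 1."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical data: 1 = knee retained after filtering, 0 = discarded,
# paired with a hypothetical baseline WOMAC pain score for the same knee.
retained_flag = [1, 1, 0, 1, 0, 1, 0, 1]
womac_pain = [2, 3, 8, 1, 7, 4, 9, 2]

r = pearson_r(retained_flag, womac_pain)
t = correlation_t_stat(r, len(womac_pain))
print(f"point-biserial r={r:.2f}, t={t:.2f} on {len(womac_pain) - 2} df")
```

In this toy example retention is strongly negatively correlated with baseline pain, which is the pattern that would signal the selection problem the referee describes: the filtered cohort would under-represent knees with high pain.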
Circularity Check
No circularity: DL prediction and LCMM steps remain independent
The paper trains a DL model to predict MOAKS features from MRI images, applies conformal prediction to retain only high-confidence outputs, and then feeds those filtered structural labels into a separate LCMM that regresses against external OAI pain measurements. No equation reduces the reported odds ratios to a quantity fitted inside the DL step, no self-citation supplies a load-bearing uniqueness theorem, and the pain outcomes are measured independently of the imaging predictions. The derivation chain therefore contains no self-definitional, fitted-input, or self-citation reductions.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: High-confidence MOAKS predictions from the DL model are sufficiently accurate and unbiased for use in LCMM association analysis.