Automatic Grading of Individual Knee Osteoarthritis Features in Plain Radiographs using Deep Convolutional Neural Networks

Aleksei Tiulpin; Simo Saarakkala

arXiv: 1907.08020 · v1 · pith:6PVCUJKD · submitted 2019-07-18 · eess.IV · cs.CV · cs.LG

Pith reviewed 2026-05-24 19:24 UTC · model grok-4.3

classification: eess.IV · cs.CV · cs.LG

keywords: knee osteoarthritis · deep convolutional neural networks · Kellgren-Lawrence grade · OARSI grading · radiographic assessment · multi-task learning · transfer learning

The pith

An ensemble of deep residual networks predicts KL and OARSI grades for knee osteoarthritis in radiographs with Cohen's kappa of 0.82 for the KL grade and 0.79-0.94 for the individual OARSI features.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a multi-task deep learning method to automatically grade both the composite Kellgren-Lawrence score and the individual OARSI features such as osteophytes and joint space narrowing from knee radiographs. It trains an ensemble of 50-layer residual networks on the full OAI dataset using transfer learning and evaluates on the independent MOST dataset. The approach yields strong agreement with expert labels and surpasses prior methods in detecting the presence of radiographic OA. A reader would care because manual grading suffers from only moderate consistency between raters, so reliable automation could standardize assessments used in research and clinical decisions.
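To make "multi-task" concrete: one common formulation, assumed here rather than taken from the paper, gives each grade its own classification head on a shared backbone and sums the per-task cross-entropy losses. The task keys below are hypothetical; the class counts follow the grading scales (KL grades 0-4, OARSI osteophyte and JSN grades 0-3), giving one KL head plus six OARSI heads.

```python
import math

# Hypothetical task keys; class counts follow the grading scales
# (KL grades 0-4, OARSI osteophyte and JSN grades 0-3).
TASKS = {
    "KL": 5,
    "OST_FL": 4, "OST_TL": 4, "JSN_L": 4,  # lateral compartment
    "OST_FM": 4, "OST_TM": 4, "JSN_M": 4,  # medial compartment
}


def log_softmax(logits):
    """Numerically stable log-softmax of a list of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def multitask_loss(logits_per_task, targets, weights=None):
    """Sum of per-task cross-entropy losses for one image.

    logits_per_task: {task: list of raw scores, one per grade}
    targets: {task: integer reference grade}
    weights: optional {task: float}; equal weights (1.0) if omitted.
    """
    total = 0.0
    for task, logits in logits_per_task.items():
        if len(logits) != TASKS[task]:
            raise ValueError(f"{task}: expected {TASKS[task]} scores")
        w = 1.0 if weights is None else weights[task]
        total += -w * log_softmax(logits)[targets[task]]
    return total
```

With untrained (all-zero) logits the loss is ln 5 + 6 ln 4 ≈ 9.93, i.e. chance level for every head. Equal task weights are the simplest choice; the summary does not say whether the paper weights tasks differently.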

Core claim

Our multi-task method based on an ensemble of deep residual networks with squeeze-excitation and ResNeXt blocks yields Cohen's kappa coefficients of 0.82 for KL-grade and 0.79-0.94 for the OARSI features, with an AUC of 0.98 for detecting radiographic OA on the MOST dataset.
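The summary reports Cohen's kappa without stating which weighting is used. For ordinal scales such as KL 0-4, a quadratic-weighted kappa is a common choice; the sketch below assumes that weighting (linear weights are offered as an option) and is an illustration, not the paper's evaluation code. `weighted_kappa` is a hypothetical name.

```python
def weighted_kappa(y_true, y_pred, n_classes, weighting="quadratic"):
    """Cohen's kappa for ordinal grades 0..n_classes-1 with disagreement weights.

    "quadratic" penalizes a disagreement by its squared grade distance,
    "linear" by its absolute distance; with two classes both reduce to
    unweighted Cohen's kappa.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    if weighting not in ("quadratic", "linear"):
        raise ValueError("weighting must be 'quadratic' or 'linear'")
    n = len(y_true)
    k = n_classes
    # Observed joint distribution of (reference grade, predicted grade).
    observed = [[0.0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1.0 / n
    # Marginal grade frequencies give the disagreement expected by chance.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        d = abs(i - j) / (k - 1)
        return d * d if weighting == "quadratic" else d

    pairs = [(i, j) for i in range(k) for j in range(k)]
    disagree_obs = sum(weight(i, j) * observed[i][j] for i, j in pairs)
    disagree_exp = sum(weight(i, j) * row[i] * col[j] for i, j in pairs)
    if disagree_exp == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - disagree_obs / disagree_exp
```

With five KL grades and quadratic weights, a prediction that is off by one grade costs (1/4)² = 1/16 of a maximal 0-versus-4 disagreement, which is why weighted kappas on ordinal scales tend to be higher than unweighted ones.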

What carries the argument

Ensemble of 50-layer residual networks incorporating squeeze-excitation and ResNeXt blocks for simultaneous prediction of KL and multiple OARSI grades.
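The squeeze-excitation (SE) block named here comes from Hu et al. (reference [19]): it pools each channel to a scalar ("squeeze"), passes the channel vector through a small bottleneck ending in a sigmoid ("excitation"), and rescales each channel by its gate. A minimal pure-Python sketch on a tiny C×H×W feature map, with biases omitted and weights supplied by the caller (a real network learns them); `se_block` and the nested-list layout are illustrative only.

```python
import math


def se_block(feature_map, w1, w2):
    """Squeeze-and-excitation on a C x H x W nested-list feature map.

    w1: (C // r) x C bottleneck weights; w2: C x (C // r) expansion weights,
    where r is the reduction ratio. Biases are omitted for brevity.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields a gate in (0, 1) per channel.
    h = [max(0.0, sum(w * zc for w, zc in zip(wrow, z))) for wrow in w1]
    s = [1.0 / (1.0 + math.exp(-sum(w * hc for w, hc in zip(wrow, h)))) for wrow in w2]
    # Scale: multiply every spatial position of channel c by its gate s[c].
    return [[[v * s[c] for v in row] for row in ch] for c, ch in enumerate(feature_map)]
```

In an SE-ResNeXt network this gating is applied to the output of each residual branch before the identity shortcut is added.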

If this is right

The method provides more consistent grading than typical human readers for both overall severity and specific features.
Radiographic OA can be detected with near-perfect AUC and average precision on held-out data from a different study.
Transfer learning from ImageNet combined with fine-tuning on OAI enables strong performance on MOST without additional adaptation.
Multi-task training allows joint learning of the composite score and the fine-grained features.
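Detecting radiographic OA is a binary task (KL ≥ 2) that can be derived from the same 5-class KL output. One common way, assumed here rather than taken from the paper, is to sum the predicted probabilities of grades 2-4 and score that against the binarized labels; the ROC AUC can then be computed with the rank-based (Mann-Whitney) formula. Both function names are hypothetical.

```python
def prob_radiographic_oa(kl_probs, threshold=2):
    """P(KL >= threshold) from a 5-class KL probability vector (grades 0-4)."""
    return sum(kl_probs[threshold:])


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic.

    Equals the probability that a randomly chosen positive case scores higher
    than a randomly chosen negative one (ties count as one half). O(P*N): fine
    for illustration, too slow for a full cohort.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```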

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Deployment in clinical workflows could reduce variability in OA severity assessment across different healthcare providers.
Similar multi-task CNN approaches might extend to automated grading in other joint diseases or imaging modalities.
Large epidemiological studies could benefit from using these automated scores to track OA progression at scale.

Load-bearing premise

The labels provided in the OAI and MOST datasets serve as sufficiently accurate ground truth for both training the model and measuring its performance.

What would settle it

Performance measured against a panel of multiple radiologists on a fresh set of radiographs acquired under different conditions, or a substantial drop in accuracy on radiographs from a new population.

Figures

Figures reproduced from arXiv: 1907.08020 by Aleksei Tiulpin, Simo Saarakkala.

**Figure 1.** Examples of knee osteoarthritis features graded according to the Osteoarthritis Research Society International (OARSI) grading atlas and the Kellgren-Lawrence (KL) grading scale. FL, TL, FM and TM represent the femoral lateral, tibial lateral, femoral medial and tibial medial compartments, respectively. In subplot (a), a right knee without visual OA-related changes is presented (KL 0, all OARSI grades also zero). In the…

**Figure 2.** Schematic representation of the workflow of our approach. We use transfer learning from ImageNet, train two deep neural network models, average their predictions, and predict six knee joint radiographic features according to the OARSI grading atlas as well as the KL grade. OARSI grades for osteophytes in femoral-lateral (FL), tibial-lateral (TL), femoral-medial (FM) and tibial-medial (TM) compa…

**Figure 3.** ROC and precision-recall curves demonstrating the performance of detecting the presence of radiographic OA (KL ≥ 2), osteophytes (grade ≥ 1) and joint-space narrowing (grade ≥ 1).

**Figure 4.** Confusion matrices for the OARSI grade prediction tasks. Subplots (a)-(c) show the matrices for automatic grading of femoral osteophytes (FO), tibial osteophytes (TO) and joint space narrowing (JSN) in the lateral compartment, and subplots (d)-(f) show the confusion matrices in the same order for the medial compartment. The numbers indicate percentages.

**Supplementary Figure 1.** Confusion matrix for Kellgren-Lawrence (KL) grading. The numbers indicate percentages.

**Supplementary Figure 2.** Distributions of lateral-compartment OARSI grades in the MOST (2a, 2c, 2e) and OAI (2b, 2d, 2f) datasets.

**Supplementary Figure 3.** Distributions of lateral-compartment OARSI grades in the MOST (3a, 3c, 3e) and OAI (3b, 3d, 3f) datasets.
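The confusion matrices for the OARSI features and for the KL grade report percentages. Assuming each row (one reference grade) is normalized to sum to 100, which is the usual convention but is not stated in the captions, the computation is a short sketch; `confusion_percent` is a hypothetical name.

```python
def confusion_percent(y_true, y_pred, n_classes):
    """Confusion matrix with each row (reference grade) normalized to percent.

    Rows index the reference grade, columns the predicted grade. A grade that
    never occurs in y_true yields a row of zeros instead of dividing by zero.
    """
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    percent = []
    for row in counts:
        total = sum(row)
        percent.append([100.0 * c / total if total else 0.0 for c in row])
    return percent
```

Row normalization makes the diagonal read as per-grade sensitivity, so rare grades are not visually swamped by common ones.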

Original abstract

Knee osteoarthritis (OA) is the most common musculoskeletal disease in the world. In primary healthcare, knee OA is diagnosed using clinical examination and radiographic assessment. The Osteoarthritis Research Society International (OARSI) atlas of OA radiographic features allows independent assessment of knee osteophytes, joint space narrowing and other knee features. This provides a fine-grained OA severity assessment of the knee, compared to the gold standard and most commonly used Kellgren-Lawrence (KL) composite score. However, both OARSI and KL grading systems suffer from moderate inter-rater agreement, and therefore, the use of computer-aided methods could help to improve the reliability of the process. In this study, we developed a robust, automatic method to simultaneously predict KL and OARSI grades in knee radiographs. Our method is based on Deep Learning and leverages an ensemble of deep residual networks with 50 layers, squeeze-excitation and ResNeXt blocks. Here, we used transfer learning from ImageNet with fine-tuning on the whole Osteoarthritis Initiative (OAI) dataset. Independent testing of our model was performed on the whole Multicenter Osteoarthritis Study (MOST) dataset. Our multi-task method yielded Cohen's kappa coefficients of 0.82 for KL grade and 0.79, 0.84, 0.94, 0.83, 0.84 and 0.90 for femoral osteophytes, tibial osteophytes and joint space narrowing in the lateral and medial compartments, respectively. Furthermore, our method yielded an area under the ROC curve of 0.98 and an average precision of 0.98 for detecting the presence of radiographic OA (KL ≥ 2), which is better than the current state of the art.
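The abstract reports average precision (AP) alongside ROC AUC for detecting KL ≥ 2. The paper's exact AP implementation is not given here; a common definition, and the one documented for scikit-learn's `average_precision_score`, is AP = Σₙ (Rₙ - Rₙ₋₁) Pₙ over decreasing score thresholds. The sketch below computes it directly, assuming no tied scores, after binarizing ordinal grades at a threshold (KL ≥ 2 for radiographic OA; grade ≥ 1 for the osteophyte and JSN curves in Figure 3). `binarize` and `average_precision` are hypothetical helper names.

```python
def binarize(grades, threshold):
    """Map ordinal grades to 1 if grade >= threshold, else 0."""
    return [1 if g >= threshold else 0 for g in grades]


def average_precision(labels, scores):
    """Step-wise average precision: mean of precision at each positive's rank.

    Assumes untied scores; recall rises by 1 / n_pos at every positive, so
    sum((R_n - R_{n-1}) * P_n) reduces to the mean precision at positives.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            ap += (tp / rank) / n_pos
    return ap
```

Unlike ROC AUC, AP depends on the prevalence of the positive class, so reporting both, as the abstract does, gives a fuller picture when KL ≥ 2 knees are a minority.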

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Solid external validation on MOST after OAI training for multi-task KL and OARSI grading, but the moderate inter-rater label noise caps how much the kappas can be read as real reliability gains.

The paper trains an ensemble of ResNet50 variants with squeeze-excitation and ResNeXt blocks on the full OAI dataset using transfer learning from ImageNet, then evaluates on the entire independent MOST cohort. It reports Cohen's kappas of 0.82 for KL grade and 0.79–0.94 across the OARSI features, plus AUC 0.98 and AP 0.98 for radiographic OA detection, claiming improvement over prior work on the detection task.

The external test set and simultaneous prediction of multiple features are the clearest strengths; many similar papers stop at internal cross-validation, so this setup gives more credible numbers. The multi-task framing also makes sense for a single model that outputs both the composite KL score and the finer OARSI details. The methods are standard but executed on large, well-known cohorts, which is useful for benchmarking.

The soft spot is the ground-truth labels. The abstract notes that both KL and OARSI systems have only moderate inter-rater agreement, yet all metrics are computed against single-rater annotations. Without a multi-rater consensus on the test set or a direct comparison of model disagreement to human disagreement rates, it is difficult to tell whether the model exceeds the label noise ceiling or simply reproduces it. The paper acknowledges the reliability problem but does not appear to solve it in the reported results. Training details such as exact hyperparameter search and ensemble construction are not visible in the abstract, though the overall pipeline looks reproducible from the description.

This work is mainly for groups already building or evaluating automated OA grading tools and for readers who need concrete performance numbers on OAI/MOST. It is not a methodological advance, but the external validation and multi-task results give it enough substance to deserve referee time. A reviewer could usefully press on the label-noise question and ask for human-agreement baselines. I would send it to review rather than desk-reject.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a multi-task deep learning method based on an ensemble of 50-layer residual networks incorporating squeeze-excitation and ResNeXt blocks. The model is trained via ImageNet transfer learning followed by fine-tuning on the full Osteoarthritis Initiative (OAI) dataset and evaluated on the independent Multicenter Osteoarthritis Study (MOST) dataset. It reports Cohen's kappa of 0.82 for KL-grade, kappas of 0.79–0.94 for OARSI femoral/tibial osteophytes and joint-space narrowing (lateral/medial), and AUC 0.98 / average precision 0.98 for detecting radiographic OA (KL ≥ 2), stated to exceed current state-of-the-art.

Significance. If the results hold after addressing label-noise concerns, the work supplies concrete evidence that CNN ensembles can achieve high numerical agreement with single-rater labels on an external test set for both composite KL grading and fine-grained OARSI features. The independent MOST evaluation and multi-task formulation are clear strengths that support reproducibility and practical utility claims.

major comments (2)

[Abstract] Abstract: the reported Cohen's kappas (0.82 KL; 0.79–0.94 OARSI) and AUC 0.98 are measured exclusively against single-rater labels; the abstract itself states that both KL and OARSI systems have only moderate inter-rater agreement, yet no section quantifies whether the model metrics exceed typical human inter-rater kappa or were validated against multi-rater consensus on MOST. This directly limits interpretation of the central performance claims.
[Abstract] Abstract (and Results): the assertion that AUC 0.98 and AP 0.98 are 'better than the current state-of-the-art' is presented without naming the specific prior methods, their reported numbers, or the exact evaluation protocol on MOST, making the comparative claim impossible to verify from the given information.

minor comments (1)

[Abstract] Abstract: training hyperparameters, ensemble size, and any overfitting controls are omitted, which would aid assessment of robustness even if full details appear later in the manuscript.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments, which help clarify the interpretation of our results. We respond to each major comment below and indicate the revisions we will make to the manuscript.

Referee: [Abstract] Abstract: the reported Cohen's kappas (0.82 KL; 0.79–0.94 OARSI) and AUC 0.98 are measured exclusively against single-rater labels; the abstract itself states that both KL and OARSI systems have only moderate inter-rater agreement, yet no section quantifies whether the model metrics exceed typical human inter-rater kappa or were validated against multi-rater consensus on MOST. This directly limits interpretation of the central performance claims.

Authors: We agree that all reported metrics reflect agreement with single-rater labels on MOST, which is the standard evaluation setting for this scale of external validation. The manuscript already notes the moderate inter-rater reliability of both grading systems. Because multi-rater consensus labels are not available for the full MOST cohort, we cannot directly demonstrate that model performance exceeds human inter-rater agreement on this specific test set. We will revise the abstract and add a short paragraph in the Discussion to (i) explicitly state that metrics are versus single-rater labels and (ii) cite representative inter-rater kappa ranges from the literature for context. revision: partial
Referee: [Abstract] Abstract (and Results): the assertion that AUC 0.98 and AP 0.98 are 'better than the current state-of-the-art' is presented without naming the specific prior methods, their reported numbers, or the exact evaluation protocol on MOST, making the comparative claim impossible to verify from the given information.

Authors: We accept that the abstract claim requires explicit references to be verifiable. The full manuscript contains comparisons to prior CNN-based OA grading studies, but the abstract does not name them. We will revise the abstract to list the key prior works, their reported AUC/AP values, and the datasets/protocols used, thereby making the state-of-the-art comparison self-contained and transparent. revision: yes

Circularity Check

0 steps flagged

No circularity: direct empirical evaluation on held-out data

The paper trains an ensemble of ResNet-based models via transfer learning on the OAI dataset and reports Cohen's kappa, AUC, and average precision on the independent MOST test set against the provided single-rater labels. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains appear in the derivation of the reported metrics. The evaluation is a standard held-out performance measurement against external benchmarks and is therefore self-contained.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

As an applied machine learning paper, it relies on standard assumptions in deep learning and the quality of the provided datasets rather than new physical axioms or invented entities.

free parameters (2)

Learning rate and other training hyperparameters
Chosen during fine-tuning to achieve reported performance.
Ensemble configuration
Number and combination of networks in the ensemble.
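Figure 2's caption states that two networks are trained and their predictions averaged. A minimal sketch of that combination step, assuming (this summary does not confirm it) that averaging is applied to each task's class probabilities and the final grade is the arg-max of the average; the function names are hypothetical.

```python
def average_probabilities(per_model_probs):
    """Element-wise mean of class-probability vectors from several models."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [sum(p[k] for p in per_model_probs) / n_models for k in range(n_classes)]


def ensemble_grades(per_model_outputs):
    """Combine models by averaging probabilities per task, then taking arg-max.

    per_model_outputs: list with one {task: probability list} dict per model.
    Returns {task: predicted grade}.
    """
    grades = {}
    for task in per_model_outputs[0]:
        avg = average_probabilities([out[task] for out in per_model_outputs])
        grades[task] = max(range(len(avg)), key=avg.__getitem__)
    return grades
```

Averaging probabilities rather than hard votes lets a confident model outweigh an uncertain one, and the averaged vector can feed the KL ≥ 2 probability directly.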

axioms (2)

Domain assumption: pretraining on ImageNet transfers useful features to radiographic images.
The method relies on transfer learning from ImageNet.
Domain assumption: the OAI and MOST datasets provide representative samples for training and testing.
Used for training and independent testing.

Reference graph

Works this paper leans on

[1] Arden, N. & Nevitt, M. C. Osteoarthritis: epidemiology. Best practice & research Clin. rheumatology 20, 3–25 (2006)

[2] Cross, M. et al. The global burden of hip and knee osteoarthritis: estimates from the global burden of disease 2010 study. Annals rheumatic diseases 73, 1323–1330 (2014)

[3] Wluka, A. E., Lombard, C. B. & Cicuttini, F. M. Tackling obesity in knee osteoarthritis. Nat. Rev. Rheumatol. 9, 225 (2013)

[4] Tiulpin, A., Thevenot, J., Rahtu, E., Lehenkari, P. & Saarakkala, S. Automatic knee osteoarthritis diagnosis from plain radiographs: A deep learning-based approach. Sci. reports 8, 1727 (2018)

[5] Kellgren, J. & Lawrence, J. Radiological assessment of osteo-arthrosis. Annals rheumatic diseases 16, 494 (1957)

[6] Altman, R. D. & Gold, G. Atlas of individual radiographic features in osteoarthritis, revised. Osteoarthr. cartilage 15, A1–A56 (2007)

[7] Esteva, A. et al. A guide to deep learning in healthcare. Nat. medicine 25, 24 (2019)

[8] Pedoia, V. et al. 3D convolutional neural networks for detection and severity staging of meniscus and PFJ cartilage morphological degenerative changes in osteoarthritis and anterior cruciate ligament subjects. J. Magn. Reson. Imaging 49, 400–410 (2019)

[9] Norman, B., Pedoia, V. & Majumdar, S. Use of 2D U-Net convolutional neural networks for automated cartilage and meniscus segmentation of knee MR imaging data to determine relaxometry and morphometry. Radiology 288, 177–185 (2018)

[10] Tiulpin, A., Finnilä, M., Lehenkari, P., Nieminen, H. J. & Saarakkala, S. Deep-learning for tidemark segmentation in human osteochondral tissues imaged with micro-computed tomography. arXiv preprint arXiv:1907.05089 (2019)

[11] Tiulpin, A. et al. Multimodal machine learning-based knee osteoarthritis progression prediction from plain radiographs and clinical data. arXiv preprint arXiv:1904.06236 (2019)

[12] Antony, J., McGuinness, K., Moran, K. & O’Connor, N. E. Automatic detection of knee joints and quantification of knee osteoarthritis severity using convolutional neural networks. In International conference on machine learning and data mining in pattern recognition, 376–390 (Springer, 2017)

[13] Norman, B., Pedoia, V., Noworolski, A., Link, T. M. & Majumdar, S. Applying densely connected convolutional neural networks for staging osteoarthritis severity from plain radiographs. J. digital imaging 1–7 (2018)

[14] Xue, Y., Zhang, R., Deng, Y., Chen, K. & Jiang, T. A preliminary examination of the diagnostic value of deep learning in hip osteoarthritis. PloS one 12, e0178992 (2017)

[15] Oka, H. et al. Normal and threshold values of radiographic parameters for knee osteoarthritis using a computer-assisted measuring system (KOACAD): the ROAD study. J. Orthop. Sci. 15, 781–789 (2010)

[16] Thomson, J., O’Neill, T., Felson, D. & Cootes, T. Detecting osteophytes in radiographs of the knee to diagnose osteoarthritis. In International Workshop on Machine Learning in Medical Imaging, 45–52 (Springer, 2016)

[17] Antony, A. J. Automatic quantification of radiographic knee osteoarthritis severity and associated diagnostic features using deep convolutional neural networks. Ph.D. thesis, Dublin City University (2018)

[18] Antony, J., McGuinness, K., O’Connor, N. E. & Moran, K. Quantifying radiographic knee osteoarthritis severity using deep convolutional neural networks. In 2016 23rd International Conference on Pattern Recognition (ICPR), 1195–1200 (IEEE, 2016)

[19] Hu, J., Shen, L. & Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, 7132–7141 (2018)

[20] Xie, S., Girshick, R., Dollár, P., Tu, Z. & He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1492–1500 (2017)

[21] Lindner, C. et al. Fully automatic segmentation of the proximal femur using random forest regression voting. IEEE transactions on medical imaging 32, 1462–1472 (2013)

[22] Shin, H.-C. et al. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE transactions on medical imaging 35, 1285–1298 (2016)

[23] Deng, J. et al. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09 (2009)

[24] Kothari, M. et al. Fixed-flexion radiography of the knee provides reproducible joint space width measurements in osteoarthritis. Eur. radiology 14, 1568–1573 (2004)

[25] Tiulpin, A., Thevenot, J., Rahtu, E. & Saarakkala, S. A novel method for automatic localization of joint area on knee plain radiographs. In Scandinavian Conference on Image Analysis, 290–301 (Springer, 2017)

[26] He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, 770–778 (2016)

[27] Qiu, S. Global weighted average pooling bridges pixel-level localization and image-level classification. arXiv preprint arXiv:1809.08264 (2018)

[28] Kingma, D. P. & Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

[29] Tiulpin, A. Solt: Streaming over lightweight transformations. https://github.com/MIPT-Oulu/solt (2019)

[30] Paszke, A. et al. Automatic differentiation in PyTorch. In NIPS-W (2017)

[31] Riddle, D. L., Jiranek, W. A. & Hull, J. R. Validity and reliability of radiographic knee osteoarthritis measures by arthroplasty surgeons. Orthopedics 36, e25–e32 (2013)

[32] Ching, T. et al. Opportunities and obstacles for deep learning in biology and medicine. J. The Royal Soc. Interface 15, 20170387 (2018)

[33] Hinton, G., Vinyals, O. & Dean, J. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
