pith.

arxiv: 2605.28176 · v1 · pith:GLJJL7DX · submitted 2026-05-27 · 💻 cs.CV

From Kellgren-Lawrence to Calcium Pyrophosphate Crystal Deposition: A Soft-Labelling Framework for Knee Osteoarthritis Assessment

Pith reviewed 2026-06-29 13:07 UTC · model grok-4.3

classification 💻 cs.CV
keywords knee osteoarthritis · soft labelling · deep learning · X-ray grading · Kellgren-Lawrence · CPPD · ordinal classification · medical imaging

The pith

Soft-labelling with unimodal distributions improves ordinal grading of knee osteoarthritis on X-rays over one-hot labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an ordinal deep learning framework that replaces one-hot labels with soft unimodal probability distributions for grading knee X-rays on both the Kellgren-Lawrence and CPPD scales. It tests four formulations (binomial, beta, triangular, and exponential) on a dataset of 2172 images, 968 of which are jointly annotated for both tasks. All strategies outperformed the conventional one-hot baseline: the triangular formulation reached the highest QWK (0.796) and the lowest MAE (0.438) on CPPD grading, and the beta formulation reached a QWK of 0.777 and an MAE of 0.529 on KL grading. A sympathetic reader would care because this addresses the mismatch between standard classification losses and the ordinal, uncertain nature of clinical severity scores while respecting the observed asymmetry between the two scales.

Core claim

The central claim is that an ordinal DL framework based on soft-labelling, replacing one-hot targets with unimodal probability distributions centred on the annotated grade, consistently outperforms nominal one-hot supervision for both KL and CPPD grading tasks. Specifically, the triangular formulation achieved the highest QWK and lowest MAE for CPPD (QWK = 0.796; MAE = 0.438), while the beta-based approach provided the best overall performance for KL (QWK = 0.777; MAE = 0.529; AMAE = 0.523; MMAE = 0.775), with all soft-labelling strategies demonstrating statistically significant improvements over the baseline (p < 0.001).

What carries the argument

Soft-labelling via unimodal probability distributions (binomial, beta, triangular, exponential) centred on the annotated grade, used as targets instead of one-hot vectors.
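The pith names the target families without giving their formulas. As a minimal sketch, assuming grades indexed 0..K-1 and Python 3.9+, the code below builds a one-hot target and two unimodal soft targets. The binomial success probability p = (grade + 0.5) / K and the exponential `power` and `tau` defaults are illustrative assumptions, not values reported in the paper, and all function names are hypothetical.

```python
import math


def one_hot(grade: int, num_classes: int) -> list[float]:
    """Nominal baseline target: all probability mass on the annotated grade."""
    return [1.0 if k == grade else 0.0 for k in range(num_classes)]


def binomial_soft_label(grade: int, num_classes: int) -> list[float]:
    """Binomial(n = K - 1, p) probabilities over grades 0..K-1.

    Assumption (illustrative): p = (grade + 0.5) / K. Then (n + 1) * p equals
    grade + 0.5, a non-integer, so the binomial has a single mode at `grade`.
    """
    n = num_classes - 1
    p = (grade + 0.5) / num_classes
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(num_classes)]


def exponential_soft_label(grade: int, num_classes: int,
                           power: float = 1.0, tau: float = 1.0) -> list[float]:
    """Softmax of -|k - grade|**power / tau; `power` and `tau` are hypothetical defaults."""
    scores = [math.exp(-abs(k - grade) ** power / tau) for k in range(num_classes)]
    total = sum(scores)
    return [s / total for s in scores]


if __name__ == "__main__":
    K = 5  # the KL scale has grades 0..4
    for name, make in [("one-hot", one_hot),
                       ("binomial", binomial_soft_label),
                       ("exponential", exponential_soft_label)]:
        print(f"{name:>11}:", [round(v, 3) for v in make(2, K)])
```

For grade 2 on the five-grade KL scale, this binomial target is [1, 4, 6, 4, 1] / 16: 37.5% of the mass stays on the annotated grade and 25% goes to each neighbour, whereas the one-hot target gives the neighbours nothing.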

If this is right

  • All four soft-labelling strategies improve Quadratic Weighted Kappa and reduce Mean Absolute Error compared to one-hot labels on both grading tasks.
  • The triangular formulation yields the best overall metrics for CPPD grading.
  • The beta formulation yields the best overall metrics for KL grading, including the lowest class-wise errors.
  • The performance gains are statistically significant at p < 0.001 across the 2172-image dataset.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The approach may extend to other ordinal scoring tasks in radiology where annotation uncertainty is high.
  • Joint modelling of KL and CPPD could further exploit the asymmetric clinical relationship if the framework is adapted to multi-task training.
  • If the unimodal assumption holds across datasets, the method could lower sensitivity to inter-rater variability in clinical annotations.

Load-bearing premise

Unimodal probability distributions centred on the annotated grade accurately capture both the ordinal uncertainty of the scores and the asymmetric clinical relationship between the KL and CPPD scales.
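The premise bundles two properties that can at least be checked on the targets themselves: each target should peak at the annotated grade and should not increase when moving away from it. A sketch under stated assumptions: the beta target integrates a Beta(a, b) density over K equal-width bins of [0, 1], with a and b set so the density's mode sits at the centre of the annotated bin, and the triangular target is a simplified symmetric triangle. The `concentration` and `half_width` defaults and all function names are hypothetical, and the paper's own parameterisations may differ. Nothing here tests the KL-CPPD asymmetry, which the rebuttal concedes the experiments do not address.

```python
def beta_soft_label(grade: int, num_classes: int, concentration: float = 8.0,
                    steps_per_bin: int = 200) -> list[float]:
    """Probability mass of a Beta(a, b) density in each of K equal bins of [0, 1].

    Assumption: a + b - 2 = concentration (hypothetical sharpness knob) and the
    Beta mode (a - 1) / (a + b - 2) is placed at the centre of bin `grade`.
    """
    m = (grade + 0.5) / num_classes
    a = 1.0 + concentration * m
    b = 1.0 + concentration * (1.0 - m)
    width = 1.0 / num_classes
    h = width / steps_per_bin
    masses = []
    for k in range(num_classes):
        start = k * width
        # Midpoint rule on the unnormalised density x**(a-1) * (1-x)**(b-1).
        xs = (start + (i + 0.5) * h for i in range(steps_per_bin))
        masses.append(sum(x ** (a - 1) * (1 - x) ** (b - 1) for x in xs) * h)
    total = sum(masses)  # dividing by the total absorbs the Beta-function constant
    return [mass / total for mass in masses]


def triangular_soft_label(grade: int, num_classes: int,
                          half_width: float = 1.5) -> list[float]:
    """Simplified triangle max(0, 1 - |k - grade| / half_width), renormalised."""
    weights = [max(0.0, 1.0 - abs(k - grade) / half_width) for k in range(num_classes)]
    total = sum(weights)
    return [w / total for w in weights]


def is_unimodal_at(dist: list[float], grade: int) -> bool:
    """True if probabilities never increase when moving away from `grade`."""
    left_ok = all(dist[k] <= dist[k + 1] for k in range(grade))
    right_ok = all(dist[k] >= dist[k + 1] for k in range(grade, len(dist) - 1))
    return left_ok and right_ok


if __name__ == "__main__":
    K = 5
    for j in range(K):
        beta, tri = beta_soft_label(j, K), triangular_soft_label(j, K)
        print(j, [round(v, 3) for v in beta], [round(v, 3) for v in tri],
              is_unimodal_at(beta, j) and is_unimodal_at(tri, j))
```

One design difference shows up immediately: this triangular target assigns exactly zero probability to distant grades (grade 2 on a five-grade scale gives [0, 0.2, 0.6, 0.2, 0]), while the beta target keeps every grade strictly positive.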

What would settle it

A replication study on an independent set of knee X-rays where the soft-labelling models fail to achieve higher QWK or lower MAE than the one-hot baseline would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.28176 by César Hervás-Martínez, Edoardo Cipolletta, Emilio Filippucci, Francisco Bérchez-Moreno, Luca Romeo, Maria Chiara Fiorentino, Pedro A. Gutiérrez, Riccardo Rosati, Víctor M. Vargas.

Figure 1. Knee radiographs illustrating the complexity of jointly assessing Osteoarthritis …
Figure 2. Representative examples of the different Kellgren–Lawrence (KL) and Calcium …
Figure 3. Schematic representation of the proposed ordinal deep learning framework for …
Figure 4. Example of the binomial (a), exponential (b), beta (c) and triangular (d) discrete …
Figure 5. Combined violin and box plots of the AMAE distributions for all methodologies.
Figure 6. Confusion matrices comparing the best-performing soft-labelling models for each …
Figure 7. Grad-CAM visualisations across all severity grades for the nominal baseline and …
Figure 8. Residuals with the difference of the average obtained contingency tables and …
Figure 9. Grad-CAM analysis of model robustness under co-occurring pathologies. The …
Original abstract

Background and objective. Conventional Deep Learning (DL) approaches for Knee Osteoarthritis (KOA) grading rely on one-hot labels, which fail to capture both the ordinal uncertainty of Kellgren–Lawrence (KL) and Calcium Pyrophosphate Deposition Disease (CPPD) severity scores and the asymmetric relationship between the two scales observed in clinical practice.

Methods. We retrospectively collected 2172 knee X-ray images, including 968 radiographs jointly annotated for KL and CPPD severity. An ordinal DL framework based on soft-labelling was developed for both tasks, replacing one-hot targets with unimodal probability distributions centred on the annotated grade. Four formulations were investigated: binomial, beta, triangular, and exponential.

Results. All soft-labelling strategies consistently outperformed the nominal baseline. For CPPD grading, the triangular formulation achieved the highest Quadratic Weighted Kappa (QWK) and the lowest Mean Absolute Error (MAE) (QWK = 0.796; MAE = 0.438), while the beta formulation yielded the most balanced class-wise performance considering Average MAE (AMAE) and Maximum MAE (MMAE) across classes (AMAE = 0.458; MMAE = 0.573). For KL grading, the beta-based approach provided the best overall performance, achieving the highest QWK together with the lowest MAE and class-wise errors (QWK = 0.777; MAE = 0.529; AMAE = 0.523; MMAE = 0.775). Statistical analysis demonstrated significant improvements over conventional one-hot supervision (p < 0.001).
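The abstract reports four metrics. The following dependency-free sketch shows how they are commonly defined: QWK from quadratic disagreement weights applied to the observed and chance-expected confusion matrices, MAE over all images, and AMAE and MMAE as the mean and the maximum of the per-grade MAE. The function names are illustrative, and computing per-grade MAE only over grades that occur in `y_true` is an assumption about how empty classes are handled.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the true grade, columns the predicted grade."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def qwk(y_true, y_pred, num_classes):
    """Quadratic weighted kappa: 1 - sum(W * O) / sum(W * E)."""
    n = len(y_true)
    observed = confusion_matrix(y_true, y_pred, num_classes)
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(row[j] for row in observed) for j in range(num_classes)]
    num = den = 0.0
    for i in range(num_classes):
        for j in range(num_classes):
            w = (i - j) ** 2 / (num_classes - 1) ** 2      # quadratic disagreement weight
            num += w * observed[i][j]
            den += w * row_tot[i] * col_tot[j] / n          # expected count under independence
    if den == 0:
        raise ValueError("QWK is undefined when all ratings fall in a single class")
    return 1.0 - num / den


def mae(y_true, y_pred):
    """Mean absolute grade error over all images."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_mae(y_true, y_pred):
    """MAE computed separately for each true grade present in y_true."""
    by_class = {}
    for t, p in zip(y_true, y_pred):
        by_class.setdefault(t, []).append(abs(t - p))
    return {c: sum(errs) / len(errs) for c, errs in sorted(by_class.items())}


def amae(y_true, y_pred):
    """Average of the per-grade MAEs (each grade weighted equally)."""
    errs = per_class_mae(y_true, y_pred)
    return sum(errs.values()) / len(errs)


def mmae(y_true, y_pred):
    """Worst per-grade MAE."""
    return max(per_class_mae(y_true, y_pred).values())
```

For the toy labels y_true = [0, 0, 1, 1, 2, 2] and y_pred = [0, 1, 1, 1, 2, 0], MAE and AMAE are both 0.5 but MMAE is 1.0, because grade 2 carries the largest errors, and QWK is 2/7. This is why class-wise errors are reported alongside the pooled ones: a dominant grade can hide poor performance on a rare one.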

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes a soft-labelling framework for ordinal grading of knee osteoarthritis on X-rays, replacing one-hot targets with four fixed unimodal distributions (binomial, beta, triangular, exponential) centered on the annotated KL or CPPD grade. Using 2172 images (968 jointly annotated), it reports that all soft-labelling variants outperform the one-hot baseline on QWK and MAE for both tasks, with the triangular distribution best for CPPD (QWK=0.796, MAE=0.438) and beta best for KL (QWK=0.777, MAE=0.529), all with p<0.001.

Significance. If the gains arise from faithful modeling of ordinal uncertainty rather than generic regularization, the approach could improve robustness in medical ordinal classification tasks. The use of jointly annotated cases and multiple distribution families is a positive empirical step, but the heuristic nature of the labels limits claims about capturing clinical asymmetry or uncertainty.

major comments (3)
  1. [Methods] Methods (soft-labelling section): The four distributions are defined with fixed, hand-chosen parameters and applied independently to KL and CPPD; no derivation from inter-rater agreement data, longitudinal progression, or joint KL-CPPD statistics is provided, so the claim that they capture 'ordinal uncertainty' and 'asymmetric relationship' rests on an untested assumption.
  2. [Results] Results and data description: The 968 jointly annotated radiographs are used only for separate per-task training; no joint model, cross-task loss, or analysis of KL-CPPD co-occurrence is presented, leaving the background claim of asymmetry unaddressed by the experiments.
  3. [Results] Evaluation: Performance improvements are reported on QWK/MAE but no ablation compares the chosen unimodal forms against alternatives (e.g., learned label smoothing or empirical inter-rater distributions), so it is unclear whether gains exceed what standard regularization would achieve.
minor comments (2)
  1. [Abstract] Abstract: The statistical test yielding p<0.001 is not named (paired t-test, Wilcoxon, etc.), and the exact data splits or cross-validation scheme are not summarized.
  2. [Methods] Notation: The precise functional forms and parameter values for the beta, triangular, and exponential distributions should be given explicitly (e.g., as equations) rather than described only qualitatively.
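The abstract does not name the test behind p < 0.001. Purely as an illustration, and not as the authors' procedure, the sketch below runs an exact paired sign-flip permutation test on matched scores, for example per-run or per-fold metric values of a soft-labelling model and the one-hot baseline (the pairing unit is an assumption). The Wilcoxon signed-rank test and the paired t-test are common alternatives.

```python
from itertools import product


def paired_sign_flip_test(scores_a: list[float], scores_b: list[float]) -> float:
    """Exact two-sided paired sign-flip (permutation) test on the mean difference.

    Under H0 each paired difference is symmetric about zero, so flipping its
    sign leaves the null distribution unchanged. All 2**n sign patterns are
    enumerated, so this is only practical for small n (roughly 20 or fewer).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    extreme = 0
    for signs in product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs))) / n
        if stat >= observed - 1e-12:  # small tolerance so floating-point ties count
            extreme += 1
    return extreme / 2**n
```

Because the observed sign pattern and its full negation always count as extreme, the smallest attainable p-value is 2 / 2**n, so an exact test of this kind can only report p < 0.001 with at least 11 pairs. Knowing the number of pairs is therefore part of what the requested description of the test should include.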

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below, indicating revisions where the manuscript will be updated to clarify claims and strengthen the evaluation.

Point-by-point responses
  1. Referee: [Methods] Methods (soft-labelling section): The four distributions are defined with fixed, hand-chosen parameters and applied independently to KL and CPPD; no derivation from inter-rater agreement data, longitudinal progression, or joint KL-CPPD statistics is provided, so the claim that they capture 'ordinal uncertainty' and 'asymmetric relationship' rests on an untested assumption.

    Authors: We agree that the parameters were selected heuristically based on general ordinal properties rather than derived from inter-rater statistics, longitudinal data, or joint KL-CPPD co-occurrence in this dataset. The claims in the introduction regarding capturing ordinal uncertainty and asymmetry therefore rest on the suitability of unimodal distributions rather than empirical derivation. In the revised manuscript we will qualify these claims in the methods, introduction, and discussion, add a dedicated limitations paragraph, and emphasize that the contribution is the empirical demonstration of performance gains over one-hot labels. revision: yes

  2. Referee: [Results] Results and data description: The 968 jointly annotated radiographs are used only for separate per-task training; no joint model, cross-task loss, or analysis of KL-CPPD co-occurrence is presented, leaving the background claim of asymmetry unaddressed by the experiments.

    Authors: The jointly annotated cases were used exclusively for separate per-task training and evaluation. No joint model, cross-task loss, or co-occurrence analysis was performed, as the study scope was limited to validating soft-labelling for each grading task independently. The background reference to asymmetry draws from clinical literature rather than our results. We will revise the manuscript to remove any implication that the experiments address asymmetry and will add a future-work statement on multi-task or joint modeling. revision: yes

  3. Referee: [Results] Evaluation: Performance improvements are reported on QWK/MAE but no ablation compares the chosen unimodal forms against alternatives (e.g., learned label smoothing or empirical inter-rater distributions), so it is unclear whether gains exceed what standard regularization would achieve.

    Authors: We acknowledge that the current evaluation lacks ablations against other regularization strategies. We will add an ablation study comparing the four soft-labelling distributions against standard label smoothing (multiple epsilon values) using the same backbone and metrics. The revised results section will report these comparisons to clarify whether the unimodal forms provide benefits beyond generic smoothing. revision: yes
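Point 3 asks whether an ordinal target shape helps beyond generic smoothing. The structural difference is already visible in the targets: standard label smoothing spreads its off-grade mass uniformly across all other grades, while a unimodal target concentrates it on neighbouring grades. A minimal sketch, with `eps = 0.1` and `tau = 1.0` chosen only for illustration and hypothetical helper names:

```python
import math


def label_smoothing(grade: int, num_classes: int, eps: float = 0.1) -> list[float]:
    """Standard (nominal) label smoothing: (1 - eps) * one_hot + eps / K."""
    return [(1 - eps) * (1.0 if k == grade else 0.0) + eps / num_classes
            for k in range(num_classes)]


def exponential_soft_label(grade: int, num_classes: int, tau: float = 1.0) -> list[float]:
    """Illustrative unimodal target: softmax of -|k - grade| / tau."""
    scores = [math.exp(-abs(k - grade) / tau) for k in range(num_classes)]
    total = sum(scores)
    return [s / total for s in scores]


def adjacent_share(dist: list[float], grade: int) -> float:
    """Fraction of the off-grade probability mass placed on grade - 1 and grade + 1."""
    off_grade = 1.0 - dist[grade]
    adjacent = sum(dist[k] for k in (grade - 1, grade + 1) if 0 <= k < len(dist))
    return adjacent / off_grade


def expected_distance(dist: list[float], grade: int) -> float:
    """Expected absolute grade error |k - grade| under the target distribution."""
    return sum(q * abs(k - grade) for k, q in enumerate(dist))


if __name__ == "__main__":
    K = 5
    for j in range(K):
        ls, ex = label_smoothing(j, K), exponential_soft_label(j, K)
        print(j, round(adjacent_share(ls, j), 3), round(adjacent_share(ex, j), 3))
```

On a five-grade scale, label smoothing sends 25% to 50% of its off-grade mass to adjacent grades, depending on where the annotated grade sits, whereas this exponential target sends roughly 64% to 80%. Whether that structural difference translates into better QWK and MAE is exactly what the promised ablation would have to show.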

Circularity Check

0 steps flagged

No circularity: empirical comparison of fixed heuristic label encodings on collected data

Full rationale

The paper performs an empirical study: 2172 radiographs (968 jointly annotated) are used to train ordinal DL models under one-hot vs. four fixed unimodal soft-label distributions (binomial, beta, triangular, exponential) centered on the annotated grade. Performance is measured by QWK, MAE, AMAE, MMAE with statistical tests. No equations derive a target quantity from fitted parameters within the paper; the distributions are chosen as alternative encodings rather than learned or self-referential. No self-citation chain, uniqueness theorem, or ansatz smuggling supports a central claim. The work is self-contained against external benchmarks (held-out test performance) and does not reduce any reported result to a quantity defined by its own inputs.
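The audit notes that the compared models differ only in their training targets. Assuming the soft targets are used with a standard cross-entropy loss (the review does not state the exact loss), the sketch below shows where the target enters: the loss is H(q, p) = -Σ q_k log p_k, and its gradient with respect to the logits is softmax(logits) - q. Minimising H(q, p) is equivalent to minimising KL(q || p), since the two differ only by the entropy of the fixed target q. Function names and example values are hypothetical.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax (subtracts the max logit first)."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_norm for z in logits]


def soft_cross_entropy(logits: list[float], target: list[float]) -> float:
    """H(q, p) = -sum_k q_k * log p_k with p = softmax(logits) and q = target."""
    return -sum(q * lp for q, lp in zip(target, log_softmax(logits)))


def soft_cross_entropy_grad(logits: list[float], target: list[float]) -> list[float]:
    """Gradient of H(q, p) w.r.t. the logits: p - q (assumes sum(target) == 1)."""
    return [math.exp(lp) - q for lp, q in zip(log_softmax(logits), target)]


if __name__ == "__main__":
    logits = [0.5, -1.0, 2.0, 0.0, 0.3]   # hypothetical network outputs
    one_hot = [0.0, 0.0, 1.0, 0.0, 0.0]
    soft = [0.05, 0.2, 0.5, 0.2, 0.05]    # hypothetical unimodal target
    for name, q in [("one-hot", one_hot), ("soft", soft)]:
        grad = soft_cross_entropy_grad(logits, q)
        print(name, round(soft_cross_entropy(logits, q), 4), [round(g, 3) for g in grad])
```

With a one-hot q, every wrong grade is pushed down in proportion to its predicted probability regardless of distance. With a unimodal q, the gradient on an adjacent grade's logit is p_k - q_k, so that grade is only pushed down once its predicted probability exceeds its target share.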

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review based solely on the abstract; no explicit free parameters, invented entities, or additional axioms described beyond standard assumptions of supervised learning on annotated medical images.

axioms (1)
  • Domain assumption: The 968 jointly annotated radiographs provide accurate ground-truth grades that reflect clinical severity and the observed asymmetry between scales.
    Framework trains and evaluates directly against these annotations as targets.

pith-pipeline@v0.9.1-grok · 5880 in / 1091 out tokens · 48289 ms · 2026-06-29T13:07:03.704290+00:00 · methodology

Reference graph

Works this paper leans on

51 extracted references · 46 canonical work pages

  1. [1]

    K. D. Allen, L. Thoma, Y. Golightly, Epidemiology of osteoarthritis, Osteoarthritis and cartilage 30 (2) (2022) 184–195. doi:10.1016/j.joca.2021.04.020

  2. [2]

    L. Sharma, Osteoarthritis of the knee, New England Journal of Medicine 384 (1) (2021) 51–59. doi:10.1056/NEJMcp1903768

  3. [3]

    G. Sakellariou, P. G. Conaghan, W. Zhang, J. W. Bijlsma, P. Boyesen, M. A. D’Agostino, M. Doherty, D. Fodor, M. Kloppenburg, F. Miese, et al., EULAR recommendations for the use of imaging in the clinical management of peripheral joint osteoarthritis, Annals of the rheumatic diseases 76 (9) (2017) 1484–1494. doi:10.1136/annrheumdis-2016-210815

  4. [4]

    M. D. Kohn, A. A. Sassoon, N. D. Fernando, Classifications in brief: Kellgren-Lawrence classification of osteoarthritis, Clinical Orthopaedics and Related Research® 474 (8) (2016) 1886–1893. doi:10.1007/s11999-016-4732-4

  5. [5]

    G. Filippou, E. Filippucci, P. Mandl, A. Abhishek, A critical review of the available evidence on the diagnosis and clinical features of CPPD: do we really need imaging?, Clinical rheumatology 40 (7) (2021) 2581–2592. doi:10.1007/s10067-020-05516-3

  6. [6]

    Q. D. Buchlak, J. Clair, N. Esmaili, A. Barmare, S. Chandrasekaran, Clinical outcomes associated with robotic and computer-navigated total knee arthroplasty: a machine learning-augmented systematic review, European Journal of Orthopaedic Surgery & Traumatology 32 (5) (2022) 915–931. doi:10.1007/s00590-021-03059-0

  7. [7]

    Y. X. Teoh, A. Othmani, K. W. Lai, S. L. Goh, J. Usman, Stratifying knee osteoarthritis features through multitask deep hybrid learning: data from the osteoarthritis initiative, Computer methods and programs in biomedicine 242 (2023) 107807. doi:10.1016/j.cmpb.2023.107807

  8. [8]

    S. M. Ahmed, R. J. Mstafa, A comprehensive survey on bone segmentation techniques in knee osteoarthritis research: From conventional methods to deep learning, Diagnostics 12 (3) (2022) 611. doi:10.3390/diagnostics12030611

  9. [9]

    L. Si, J. Zhong, J. Huo, K. Xuan, Z. Zhuang, Y. Hu, Q. Wang, H. Zhang, W. Yao, Deep learning in knee imaging: a systematic review utilizing a checklist for artificial intelligence in medical imaging (CLAIM), European Radiology 32 (2) (2022) 1353–1361. doi:10.1007/s00330-021-08190-4

  10. [10]

    W. Lv, J. Peng, J. Hu, Y. Lu, Z. Zhou, H. Xu, K. Xing, X. Zhang, L. Lu, LMSST-GCN: Longitudinal MRI sub-structural texture guided graph convolution network for improved progression prediction of knee osteoarthritis, Computer Methods and Programs in Biomedicine 261 (2025) 108600. doi:10.1016/j.cmpb.2025.108600

  11. [11]

    F. Hinterwimmer, I. Lazic, C. Suren, M. T. Hirschmann, F. Pohlig, D. Rueckert, R. Burgkart, R. von Eisenhart-Rothe, Machine learning in knee arthroplasty: specific data are key—a systematic review, Knee Surgery, Sports Traumatology, Arthroscopy 30 (2) (2022) 376–388. doi:10.1007/s00167-021-06848-6

  12. [12]

    P. Chen, L. Gao, X. Shi, K. Allen, L. Yang, Fully automatic knee osteoarthritis severity grading using deep neural networks with a novel ordinal loss, Computerized Medical Imaging and Graphics 75 (2019) 84–92. doi:10.1016/j.compmedimag.2019.06.002

  13. [13]

    C. W. Yong, K. Teo, B. P. Murphy, Y. C. Hum, Y. K. Tee, K. Xia, K. W. Lai, Knee osteoarthritis severity classification with ordinal regression module, Multimedia Tools and Applications 81 (29) (2022) 41497–41509. doi:10.1007/s11042-021-10557-0

  14. [14]

    C. Kokkotis, S. Moustakidis, E. Papageorgiou, G. Giakas, D. Tsaopoulos, Machine learning in knee osteoarthritis: A review, Osteoarthritis and Cartilage Open 2 (3) (2020) 100069. doi:10.1016/j.ocarto.2020.100069

  15. [15]

    A. Upadhyay, O. Sawant, P. Choudhary, Detection of knee osteoarthritis stages using convolutional neural network, SN Computer Science 4 (3) (2023) 257. doi:10.1007/s42979-022-01644-6

  16. [16]

    Y. Wang, S. Li, B. Zhao, J. Zhang, Y. Yang, B. Li, A ResNet-based approach for accurate radiographic diagnosis of knee osteoarthritis, CAAI Transactions on Intelligence Technology 7 (3) (2022) 512–521. doi:10.1049/cit2.12079

  17. [17]

    M. W. Brejnebøl, P. Hansen, J. U. Nybing, R. Bachmann, U. Ratjen, I. V. Hansen, A. Lenskjold, M. Axelsen, M. Lundemann, M. Boesen, External validation of an artificial intelligence tool for radiographic knee osteoarthritis severity classification, European Journal of Radiology 150 (2022) 110249. doi:10.1016/j.ejrad.2022.110249

  18. [18]

    S. V. Chaugule, V. Malemath, Knee osteoarthritis grading using DenseNet and radiographic images, SN Computer Science 4 (1) (2022) 63. doi:10.1007/s42979-022-01468-4

  19. [19]

    V. Kalpana, G. H. Kumar, et al., Evaluating the efficacy of deep learning models for knee osteoarthritis prediction based on Kellgren-Lawrence grading system, e-Prime-Advances in Electrical Engineering, Electronics and Energy 5 (2023) 100266. doi:10.1016/j.prime.2023.100266

  20. [20]

    M. Jahan, M. Z. Hasan, I. J. Samia, K. Fatema, M. A. H. Rony, M. S. Arefin, A. Moustafa, KOA-CCTNet: An enhanced knee osteoarthritis grade assessment framework using modified compact convolutional transformer model, IEEE Access 12 (2024) 107719–107741. doi:10.1109/ACCESS.2024.3435572

  21. [21]

    S. Maqsood, N. Maqsood, S. Shahid, F. E. Subhan, M. A. Sarwar, M. Yousufi, A. Qurthobi, A. Zafar, M. A. Khan, R. Damaševičius, et al., Knee osteoarthritis network: A hybrid transformer-based approach for enhanced detection and grading of knee osteoarthritis, Engineering Applications of Artificial Intelligence 159 (2025) 111751. doi:10.1016/j.engappai....

  22. [22]

    T. Albuquerque, R. Cruz, J. S. Cardoso, Ordinal losses for classification of cervical cancer risk, PeerJ Computer Science 7 (2021) e457. doi:10.7717/peerj-cs.457

  23. [23]

    T. T. Le Vuong, K. Kim, B. Song, J. T. Kwak, Joint categorical and ordinal learning for cancer grading in pathology images, Medical image analysis 73 (2021) 102206. doi:10.1016/j.media.2021.102206

  24. [24]

    L. Wang, H. Wang, Y. Su, F. Lure, J. Li, A novel hybrid ordinal learning model with health care application, IEEE Transactions on Automation Science and Engineering 22 (2024) 339–352. doi:10.1109/TASE.2024.3350894

  25. [25]

    M. Rivera-Gavilán, V. M. Vargas, P. A. Gutiérrez, J. Briceño, C. Hervás-Martínez, D. Guijo-Rubio, Ordinal classification approach for donor-recipient matching in liver transplantation with circulatory death donors, in: International Work-Conference on Artificial Neural Networks, Springer, 2023, pp. 517–528. doi:10.1007/978-3-031-43078-7_42

  26. [26]

    H. L. Le, H. G. Roh, H. J. Kim, J. T. Kwak, A 3D multi-task regression and ordinal regression deep neural network for collateral imaging from dynamic susceptibility contrast-enhanced MR perfusion in acute ischemic stroke, Computer Methods and Programs in Biomedicine 225 (2022) 107071. doi:10.1016/j.cmpb.2022.107071

  27. [27]

    X. Liu, F. Fan, L. Kong, Z. Diao, W. Xie, J. Lu, J. You, Unimodal regularized neuron stick-breaking for ordinal classification, Neurocomputing 388 (2020) 34–44. doi:10.1016/j.neucom.2020.01.025

  28. [28]

    Q. Li, J. Wang, Z. Yao, Y. Li, P. Yang, J. Yan, C. Wang, S. Pu, Unimodal-concentrated loss: Fully adaptive label distribution learning for ordinal regression, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 20513–20522. doi:10.1109/CVPR52688.2022.01986

  29. [29]

    V. M. Vargas, P. A. Gutiérrez, C. Hervás-Martínez, Unimodal regularisation based on beta distribution for deep ordinal regression, Pattern Recognition 122 (2022) 108310. doi:10.1016/j.patcog.2021.108310

  30. [30]

    V. M. Vargas, P. A. Gutiérrez, R. Rosati, L. Romeo, E. Frontoni, C. Hervás-Martínez, Exponential loss regularisation for encouraging ordinal constraint to shotgun stocks quality assessment, Applied Soft Computing 138 (2023) 110191. doi:10.1016/j.asoc.2023.110191

  31. [31]

    V. M. Vargas, P. A. Gutiérrez, J. Barbero-Gómez, C. Hervás-Martínez, Soft labelling based on triangular distributions for ordinal classification, Information Fusion 93 (2023) 258–267. doi:10.1016/j.inffus.2023.01.003

  32. [32]

    V. M. Vargas, A. M. Duran-Rosal, D. Guijo-Rubio, P. A. Gutierrez, C. Hervas-Martinez, Generalised triangular distributions for ordinal deep learning: Novel proposal and optimisation, Information Sciences 648 (2023) 119606. doi:10.1016/j.ins.2023.119606

  33. [33]

    J. S. Cardoso, R. P. Cruz, T. Albuquerque, Unimodal distributions for ordinal regression, IEEE Transactions on Artificial Intelligence 6 (2025) 2498–2509. doi:10.1109/TAI.2025.3549740

  34. [34]

    V. M. Vargas, D. Guijo-Rubio, R. Ayllón-Gavilán, A. M. Gómez-Orellana, P. A. Gutiérrez, C. Hervás-Martínez, Soft labelling for deep ordinal classification: an experimental review, IEEE Transactions on Knowledge and Data Engineering (2026). doi:10.1109/TKDE.2026.3681678

  35. [35]

    V. van Veldhuizen, V. Botha, C. Lu, M. E. Cesur, K. G. Lipman, E. D. de Jong, H. Horlings, C. I. Sanchez, C. G. Snoek, L. Wessels, et al., Foundation models in medical imaging: A review and outlook, arXiv preprint arXiv:2506.09095 (2025). doi:10.48550/arXiv.2506.09095

  36. [36]

    O. Elharrouss, Y. Himeur, Y. Mahmood, S. Alrabaee, A. Ouamane, F. Bensaali, Y. Bechqito, A. Chouchane, ViTs as backbones: Leveraging vision transformers for feature extraction, Information Fusion 118 (2025) 102951. doi:10.1016/j.inffus.2025.102951

  37. [37]

    P. A. Gutiérrez, M. Perez-Ortiz, J. Sanchez-Monedero, F. Fernandez-Navarro, C. Hervas-Martinez, Ordinal regression methods: survey and experimental study, IEEE Transactions on Knowledge and Data Engineering 28 (1) (2015) 127–146. doi:10.1109/TKDE.2015.2457911

  38. [38]

    J. Moon, P. Jadhav, S. Choi, Deep learning analysis for rheumatologic imaging: current trends, future directions, and the role of human, Journal of rheumatic diseases 32 (2) (2025) 73–88. doi:10.4078/jrd.2024.0128

  39. [39]

    K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778. doi:10.1109/CVPR.2016.90

  40. [40]

    A. Gómez-Orellana, D. Guijo-Rubio, P. Gutiérrez, C. Hervás-Martínez, V. Vargas, ORFEO: Ordinal classifier and regressor fusion for estimating an ordinal categorical target, Eng. Applications of Artificial Intelligence 133 (2024) 108462. doi:10.1016/j.engappai.2024.108462

  41. [41]

    F. Bérchez-Moreno, R. Ayllón-Gavilán, V. M. Vargas, D. Guijo-Rubio, C. Hervás-Martínez, J. C. Fernández, P. A. Gutiérrez, dlordinal: A python package for deep ordinal classification, Neurocomputing (2025) 129305. doi:10.1016/j.neucom.2024.129305

  42. [42]

    J. de La Torre, D. Puig, A. Valls, Weighted kappa loss function for multi-class classification of ordinal data in deep learning, Pattern Recognition Letters 105 (2018) 144–154. doi:10.1016/j.patrec.2017.05.018

  43. [43]

    J. Cohen, Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit., Psychological bulletin 70 (4) (1968) 213. doi:10.1037/h0026256

  44. [44]

    M. J. Warrens, Cohen’s quadratically weighted kappa is higher than linearly weighted kappa for tridiagonal agreement tables, Statistical Methodology 9 (3) (2012) 440–444. doi:10.1016/j.stamet.2011.08.006

  45. [45]

    C. J. Willmott, K. Matsuura, Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance, Climate research 30 (1) (2005) 79–82. doi:10.3354/cr030079

  46. [46]

    S. Baccianella, A. Esuli, F. Sebastiani, Evaluation measures for ordinal regression, in: 2009 Ninth international conference on intelligent systems design and applications, IEEE, 2009, pp. 283–287. doi:10.1109/ISDA.2009.230

  47. [47]

    M. Cruz-Ramírez, C. Hervás-Martínez, J. Sánchez-Monedero, P. A. Gutiérrez, Metrics to guide a multi-objective evolutionary algorithm for ordinal classification, Neurocomputing 135 (2014) 21–31. doi:10.1016/j.neucom.2013.05.058

  48. [48]

    J. C. Fernandez Caballero, F. J. Martinez, C. Hervas, P. A. Gutierrez, Sensitivity versus accuracy in multiclass problems using memetic pareto evolutionary neural networks, IEEE Transactions on Neural Networks 21 (5) (2010) 750–770. doi:10.1109/TNN.2010.2041468

  49. [49]

    V. M. Vargas, A. M. Gómez-Orellana, P. A. Gutiérrez, C. Hervás-Martínez, D. Guijo-Rubio, Ebano: A novel ensemble based on unimodal ordinal classifiers for the prediction of significant wave height, Knowledge-Based Systems 300 (2024) 112223. doi:10.1016/j.knosys.2024.112223

  50. [50]

    R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: Visual explanations from deep networks via gradient-based localization, in: Proceedings of the IEEE international conference on computer vision, 2017, pp. 618–626. doi:10.1109/ICCV.2017.74

  51. [51]

    S. Kullback, Information theory and statistics, Courier Corporation, 1997.