pith. sign in

arxiv: 2605.01066 · v1 · submitted 2026-05-01 · 💻 cs.LG

A dimensional R2 regression metric

Pith reviewed 2026-05-09 19:19 UTC · model grok-4.3

classification 💻 cs.LG
keywords R2 scoreregression evaluationmultidimensional regressionmodel metricsnoise sensitivityvariance explainedDim-R2
0
0 comments X

The pith

Dim-R2 extends the R2 metric to handle regression data of any dimension while showing detailed accuracy patterns and resisting noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors propose the Dimensional R2 score to fix three problems with the usual R2 in regression. Usual R2 only handles up to two dimensions, squeezes all results into one number that hides details, and can produce confusing negative scores from small noise in some channels. Dim-R2 works for any number of dimensions, returns separate scores for each dimension to reveal where the model does well or poorly, and lowers the impact of noise. This matters because many real regression problems involve high-dimensional outputs, such as predicting multiple related variables at once, and better metrics help improve the models.

Core claim

The paper presents Dim-R2 as a simple extension of the R2 score that accepts data of arbitrary dimensionality, supplies a vector of accuracy values rather than a single scalar, and shows reduced sensitivity to low-variance noise channels. Experiments on synthetic sinusoidal data and three real multidimensional regression datasets confirm that it highlights patterns in prediction accuracy that standard R2 conceals.

What carries the argument

The Dimensional R2 score (Dim-R2), a direct generalization of the standard R-squared formula that processes each dimension independently to produce a multidimensional accuracy profile.

If this is right

  • Regression models on high-dimensional targets can be assessed without the information loss of collapsing to one number.
  • Specific dimensions where predictions fail become identifiable for targeted model fixes.
  • Large negative scores from noise in low-variance channels are avoided, keeping the metric interpretable.
  • The approach applies across synthetic and real datasets from different domains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Analysts could use Dim-R2 to compare models across different output dimensionalities more fairly.
  • It may encourage the design of loss functions that optimize per-dimension performance explicitly.
  • This metric could improve evaluation in any area that uses multidimensional regression, such as predicting multiple outcomes simultaneously.

Load-bearing premise

The particular way of extending the R2 calculation to multiple dimensions preserves its original properties of normalization and interpretability without adding unexpected distortions.

What would settle it

Compare Dim-R2 and standard R2 on a dataset with one low-variance noisy dimension and perfect prediction on others; if Dim-R2 still produces large negative values or loses its ability to show per-dimension accuracy, the advantages do not hold.

Figures

Figures reproduced from arXiv: 2605.01066 by Adam Hantman, Jaesung Yoo, Jian Zhong Guo, Kanaka Rajan, Stefan Lemke.

Figure 1
Figure 1. Figure 1: Schematic of Dim-R2 on 3D y and yˆ with A = Data, Anorm = Apool = T ime. 3.2 Datasets 3.2.1 Synthetic Sinusoidal Dataset To illustrate the dimensional view of regression accuracy (Section 4.1), we generated waveforms of shape (Data, Time, Channels)=(1000, 100, 5) ( [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Synthetic sinusoidal dataset y and yˆ with time-varying noise per channel. No Bias and Varying Channel Bias conditions differ in added bias, introducing variability across channel and data dimensions ( [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Dimensional view of Dim-R2 on synthetic sinusoidal data with [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Dimensional view of Dim-R2 on DC-RNN neural data. Each heatmap shows Dim-R2 [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Dimensional view of Dim-R2 on MNIST image reconstruction with [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Dimensional view of Dim-R2 on CelebA image reconstruction with [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Synthetic sinusoidal dataset y and yˆ with noise channels, with corresponding mean R2 and Dim-R2 scores. (a) Noise channel variance: y=0.01, yˆ=0.01, (b) y=0.01, yˆ=1.00. Full combinations of noise channel variances in [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Dim-R2 yields higher scores than mean R2 in the presence of low-variance noise channels. [PITH_FULL_IMAGE:figures/full_fig_p008_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Dim-R2 and mean R2 on DC-RNN neural data predictions across preprocessing Gaussian [PITH_FULL_IMAGE:figures/full_fig_p009_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Dim-R2 presents rich patterns of prediction accuracy across designated dimensions (Axis). [PITH_FULL_IMAGE:figures/full_fig_p016_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Dim-R2 is more resilient to noise channels than conventional mean R2 because high [PITH_FULL_IMAGE:figures/full_fig_p016_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Simple 2D sine wave example to illustrate how [PITH_FULL_IMAGE:figures/full_fig_p017_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Synthetic sinusoidal datasets with time-varying noise ( [PITH_FULL_IMAGE:figures/full_fig_p017_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Synthetic sinusoidal datasets with noise channels and corresponding mean R2 and Dim-R2 [PITH_FULL_IMAGE:figures/full_fig_p018_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Examples of y and yˆ from DC-RNN trained to reproduce neural activity. This single session contains 78 trials, with 42, 93, 11, and 106 neurons from DCN, M1, Striatum, and Thalamus, respectively. This session corresponds to Session index 21 with a 50ms Gaussian filter in [PITH_FULL_IMAGE:figures/full_fig_p019_15.png] view at source ↗
Figure 16
Figure 16. Figure 16: Dimensional view of Dim-R2 on DC-RNN neural data. Each heatmap shows Dim [PITH_FULL_IMAGE:figures/full_fig_p022_16.png] view at source ↗
Figure 17
Figure 17. Figure 17: Dim-R2 highlights the presence of channels with high predictive accuracy in the presence [PITH_FULL_IMAGE:figures/full_fig_p023_17.png] view at source ↗
Figure 18
Figure 18. Figure 18: Variance weighted mean R2 scores measured on simulated sinusoidal data (Fig. 7) across [PITH_FULL_IMAGE:figures/full_fig_p024_18.png] view at source ↗
Figure 19
Figure 19. Figure 19: Mean D2 absolute error scores measured on simulated sinusoidal data (Fig. 7) across [PITH_FULL_IMAGE:figures/full_fig_p024_19.png] view at source ↗
Figure 20
Figure 20. Figure 20: Mean explained variance scores measured on simulated sinusoidal data (Fig. 7) across [PITH_FULL_IMAGE:figures/full_fig_p025_20.png] view at source ↗
Figure 21
Figure 21. Figure 21: Mean variance-weighted explained variance scores measured on simulated sinusoidal data [PITH_FULL_IMAGE:figures/full_fig_p025_21.png] view at source ↗
Figure 22
Figure 22. Figure 22: Mean correlation scores measured on simulated sinusoidal data (Fig. 7) across hyper [PITH_FULL_IMAGE:figures/full_fig_p026_22.png] view at source ↗
Figure 23
Figure 23. Figure 23: Dim-R2 highlights the presence of channels with high predictive accuracy in the presence [PITH_FULL_IMAGE:figures/full_fig_p027_23.png] view at source ↗
read the original abstract

R2 score is the standard metric for evaluating regression tasks, offering a normalized magnitude-agnostic measure of accuracy that captures variance. However, R2 has three key limitations: it is limited to at most two dimensional inputs, it reduces the score to a single scalar that hides rich patterns of prediction accuracy, and it is sensitive to low-variance noise channels which can yield large, uninterpretable negative values. We introduce the Dimensional R2 score (Dim-R2), a simple extension of R2 that accepts data of arbitrary dimensionality, provides a multidimensional view of accuracy, and reduces sensitivity to noise. We demonstrate its advantages on both synthetic sinusoidal data and three multidimensional regression datasets. Dim-R2 offers an interpretable and flexible metric that highlights patterns in regression accuracy, guiding regression modeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper introduces the Dimensional R² (Dim-R2) score as a simple extension of the standard R² metric for regression on data of arbitrary dimensionality. Dim-R2 computes R² per dimension and aggregates via L2-norm (with optional per-channel reporting), reducing exactly to scalar R² for 1D targets. It reduces sensitivity to low-variance noise channels via a data-driven variance threshold that preserves the [0,1] range and variance-explained interpretation in expectation. The approach is demonstrated on synthetic sinusoidal data and three real multidimensional regression datasets, highlighting improved interpretability and noise robustness.

Significance. If the claims hold, Dim-R2 provides a practical, interpretable metric for high-dimensional regression evaluation that extends R²'s desirable properties without new artifacts or loss of interpretability. The explicit per-dimension construction with exact 1D reduction, the variance-threshold noise handling, and the confirmation on both synthetic and real data are strengths. This could aid multi-output regression tasks in machine learning where standard R² is limited.

minor comments (3)
  1. [Abstract] Abstract: The summary of benefits is clear but would be strengthened by briefly stating the explicit per-dimension + L2-norm definition and the variance-threshold mechanism, as these are central to the contribution.
  2. [Experiments] Experiments section: Include quantitative tables comparing Dim-R2 values against standard R² on the three real datasets, with specific effect sizes for noise reduction, to make the advantages more concrete and reproducible.
  3. [Method] Method: Specify the exact formula for the variance threshold (e.g., how the data-driven cutoff is computed) and confirm it requires no additional hyperparameters beyond standard R² assumptions.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of our work and the recommendation for minor revision. The referee accurately captures the core contributions of Dim-R2, including its exact reduction to scalar R², per-dimension interpretability, and variance-threshold noise handling. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper defines Dim-R2 explicitly as a per-dimension R² computation followed by optional L2-norm aggregation (or per-channel reporting), which reduces exactly to scalar R² on 1D targets by algebraic construction. This is a direct definitional extension with no fitted parameters, no self-citation load-bearing steps, and no predictions that collapse to inputs. All claimed properties (range preservation, noise robustness via variance thresholding, multidimensional view) follow immediately from the stated formulas and the standard premises of R²; the synthetic and real-data sections simply verify these consequences without introducing new artifacts or hidden assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that R2 properties extend naturally to higher dimensions; no free parameters or new entities are described in the abstract.

axioms (1)
  • domain assumption The standard R2 score properties can be extended to arbitrary dimensionality while preserving interpretability and reducing noise sensitivity
    This assumption underpins the definition and claimed advantages of Dim-R2.

pith-pipeline@v0.9.0 · 5433 in / 1176 out tokens · 35634 ms · 2026-05-09T19:19:26.988906+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

39 extracted references · 39 canonical work pages · 1 internal anchor

  1. [1]

    M. I. Jordan and T. M. Mitchell. Machine learning: Trends, perspectives, and prospects.Science, 349(6245):255–260, 2015

  2. [2]

    The coefficient of determination r-squared is more informative than smape, mae, mape, mse and rmse in regression analysis evaluation.Peerj computer science, 7:e623, 2021

    Davide Chicco, Matthijs J Warrens, and Giuseppe Jurman. The coefficient of determination r-squared is more informative than smape, mae, mape, mse and rmse in regression analysis evaluation.Peerj computer science, 7:e623, 2021

  3. [3]

    An introduction to regression analysis

    Alan O Sykes. An introduction to regression analysis. 1993

  4. [4]

    R2: a useful measure of model performance when predicting a dichotomous outcome.Statistics in medicine, 18(4):375–384, 1999

    Arlene Ash and Michael Shwartz. R2: a useful measure of model performance when predicting a dichotomous outcome.Statistics in medicine, 18(4):375–384, 1999. 9

  5. [5]

    Cortical pattern generation during dexterous movement is input-driven.Nature, 577(7790):386–391, 2020

    Britton A Sauerbrei, Jian-Zhong Guo, Jeremy D Cohen, Matteo Mischiati, Wendy Guo, Mayank Kabra, Nakul Verma, Brett Mensh, Kristin Branson, and Adam W Hantman. Cortical pattern generation during dexterous movement is input-driven.Nature, 577(7790):386–391, 2020

  6. [6]

    Pedregosa, G

    F. Pedregosa, G. Varoquaux, A. Gramfort, V . Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V . Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python.Journal of Machine Learning Research, 12:2825–2830, 2011

  7. [7]

    Deep learning for accelerated and robust mri reconstruction.Magnetic Resonance Materials in Physics, Biology and Medicine, 37(3):335–368, 2024

    Reinhard Heckel, Mathews Jacob, Akshay Chaudhari, Or Perlman, and Efrat Shimron. Deep learning for accelerated and robust mri reconstruction.Magnetic Resonance Materials in Physics, Biology and Medicine, 37(3):335–368, 2024

  8. [8]

    A comprehensive review of deep learning-based hyperspectral image reconstruction for agri-food quality appraisal.Artificial Intelligence Review, 58(4):96, 2025

    Md Toukir Ahmed, Ocean Monjur, Alin Khaliduzzaman, and Mohammed Kamruzzaman. A comprehensive review of deep learning-based hyperspectral image reconstruction for agri-food quality appraisal.Artificial Intelligence Review, 58(4):96, 2025

  9. [9]

    Image response regression via deep neural networks.Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(5):1589–1614, 07 2023

    Daiwei Zhang, Lexin Li, Chandra Sripada, and Jian Kang. Image response regression via deep neural networks.Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(5):1589–1614, 07 2023

  10. [10]

    Inferring brain-wide interactions using data-constrained recurrent neural network models.BioRxiv, pages 2020–12, 2020

    Matthew G Perich, Charlotte Arlt, Sofia Soares, Megan E Young, Clayton P Mosher, Juri Minxha, Eugene Carter, Ueli Rutishauser, Peter H Rudebeck, Christopher D Harvey, et al. Inferring brain-wide interactions using data-constrained recurrent neural network models.BioRxiv, pages 2020–12, 2020

  11. [11]

    Jaesung Yoo, Ilhan Yoo, Ina Youn, Sung-Min Kim, Ri Yu, Kwangsoo Kim, Keewon Kim, and Seung-Bo Lee. Residual one-dimensional convolutional neural network for neuromuscular disorder classification from needle electromyography signals with explainability.Computer Methods and Programs in Biomedicine, 226:107079, 2022

  12. [12]

    Rethinking brain-wide interactions through multi-region ‘network of networks’ models.Current Opinion in Neurobiology, 65:146–151, 2020

    Matthew G Perich and Kanaka Rajan. Rethinking brain-wide interactions through multi-region ‘network of networks’ models.Current Opinion in Neurobiology, 65:146–151, 2020. Whole- brain interactions between neural circuits

  13. [13]

    Multiple linear regression.Nature methods, 12(12):1103–1105, 2015

    Martin Krzywinski and Naomi Altman. Multiple linear regression.Nature methods, 12(12):1103–1105, 2015

  14. [14]

    Simple linear regression.Nature methods, 12(11), 2015

    Naomi Altman and Martin Krzywinski. Simple linear regression.Nature methods, 12(11), 2015

  15. [15]

    Hyperspectral imaging-based multiple predicting models for functional component contents in brassica juncea.Agriculture, 12(10), 2022

    Jae-Hyeong Choi, Soo Hyun Park, Dae-Hyun Jung, Yun Ji Park, Jung-Seok Yang, Jai-Eok Park, Hyein Lee, and Sang Min Kim. Hyperspectral imaging-based multiple predicting models for functional component contents in brassica juncea.Agriculture, 12(10), 2022

  16. [16]

    Qingyang Liu, Yanrong Hu, and Hongjiu Liu. Enhanced stock price prediction with opti- mized ensemble modeling using multi-source heterogeneous data: Integrating lstm attention mechanism and multidimensional gray model.Journal of Industrial Information Integration, 42:100711, 2024

  17. [17]

    Assessing stroke severity using electronic health record data: a machine learning approach.BMC medical informatics and decision making, 20(1):8, 2020

    Emily Kogan, Kathryn Twyman, Jesse Heap, Dejan Milentijevic, Jennifer H Lin, and Mark Alberts. Assessing stroke severity using electronic health record data: a machine learning approach.BMC medical informatics and decision making, 20(1):8, 2020

  18. [18]

    Statistical learning with sparsity

    Trevor Hastie, Robert Tibshirani, and Martin Wainwright. Statistical learning with sparsity. Monographs on statistics and applied probability, 143(143):8, 2015

  19. [19]

    Harris, K

    Charles R. Harris, K. Jarrod Millman, Stéfan J. van der Walt, Ralf Gommers, Pauli Virtanen, David Cournapeau, Eric Wieser, Julian Taylor, Sebastian Berg, Nathaniel J. Smith, Robert Kern, Matti Picus, Stephan Hoyer, Marten H. van Kerkwijk, Matthew Brett, Allan Haldane, Jaime Fer- nández del Río, Mark Wiebe, Pearu Peterson, Pierre Gérard-Marchant, Kevin She...

  20. [20]

    Variational autoencoder for deep learning of images, labels and captions

    Yunchen Pu, Zhe Gan, Ricardo Henao, Xin Yuan, Chunyuan Li, Andrew Stevens, and Lawrence Carin. Variational autoencoder for deep learning of images, labels and captions. In D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 29. Curran Associates, Inc., 2016

  21. [21]

    Tutorial on variational autoencoders,

    Carl Doersch. Tutorial on variational autoencoders.arXiv preprint arXiv:1606.05908, 2016

  22. [22]

    Springer International Publishing, Cham, 2021

    Lucas Pinheiro Cinelli, Matheus Araújo Marins, Eduardo Antúnio Barros da Silva, and Sérgio Lima Netto.Variational Autoencoder, pages 111–149. Springer International Publishing, Cham, 2021

  23. [23]

    Mnist handwritten digit database.ATT Labs [Online]

    Yann LeCun, Corinna Cortes, and CJ Burges. Mnist handwritten digit database.ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist, 2, 2010

  24. [24]

    Deep learning face attributes in the wild

    Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. InProceedings of International Conference on Computer Vision (ICCV), December 2015

  25. [25]

    Measuring and controlling solution de- generacy across task-trained recurrent neural networks.arXiv preprint arXiv:2410.03972, 2024

    Ann Huang, Satpreet H Singh, and Kanaka Rajan. Measuring and controlling solution de- generacy across task-trained recurrent neural networks.arXiv preprint arXiv:2410.03972, 2024

  26. [26]

    Dual policy as self-model for planning.Journal of Korean Institute of Intelligent Systems, 34(1):15–24, 2024

    Jaesung Yoo, Fernanda de la Torre, and Guangyu Robert Yang. Dual policy as self-model for planning.Journal of Korean Institute of Intelligent Systems, 34(1):15–24, 2024

  27. [27]

    Machine learning and artificial intelligence in neuroscience: A primer for researchers.Brain, Behavior, and Immunity, 115:470–479, 2024

    Fakhirah Badrulhisham, Esther Pogatzki-Zahn, Daniel Segelcke, Tamas Spisak, and Jan V ollert. Machine learning and artificial intelligence in neuroscience: A primer for researchers.Brain, Behavior, and Immunity, 115:470–479, 2024

  28. [28]

    Catalyzing next-generation artificial intelligence through neuroai.Nature communications, 14(1):1597, 2023

    Anthony Zador, Sean Escola, Blake Richards, Bence Ölveczky, Yoshua Bengio, Kwabena Boahen, Matthew Botvinick, Dmitri Chklovskii, Anne Churchland, Claudia Clopath, et al. Catalyzing next-generation artificial intelligence through neuroai.Nature communications, 14(1):1597, 2023

  29. [29]

    Artificial neural networks for neuroscientists: a primer.Neuron, 107(6):1048–1070, 2020

    Guangyu Robert Yang and Xiao-Jing Wang. Artificial neural networks for neuroscientists: a primer.Neuron, 107(6):1048–1070, 2020

  30. [30]

    Temporal encoding in deep reinforcement learning agents.Scientific Reports, 13(1):22335, 2023

    Dongyan Lin, Ann Zixiang Huang, and Blake Aaron Richards. Temporal encoding in deep reinforcement learning agents.Scientific Reports, 13(1):22335, 2023

  31. [31]

    Disrupting cortico-cerebellar communication impairs dexterity.eLife, 10:e65906, jul 2021

    Jian-Zhong Guo, Britton A Sauerbrei, Jeremy D Cohen, Matteo Mischiati, Austin R Graves, Ferruccio Pisanello, Kristin M Branson, and Adam W Hantman. Disrupting cortico-cerebellar communication impairs dexterity.eLife, 10:e65906, jul 2021

  32. [32]

    Mensh, Yitzhak Schiller, Ron Meir, Omri Barak, Ronen Talmon, Adam W

    Shahar Levy, Maria Lavzin, Hadas Benisty, Amir Ghanayim, Uri Dubin, Shay Achvat, Zohar Brosh, Fadi Aeed, Brett D. Mensh, Yitzhak Schiller, Ron Meir, Omri Barak, Ronen Talmon, Adam W. Hantman, and Jackie Schiller. Cell-type-specific outcome representation in the primary motor cortex.Neuron, 107(5):954–971.e9, 2020

  33. [33]

    Nicholas A. Steinmetz, Cagatay Aydin, Anna Lebedeva, Michael Okun, Marius Pachitariu, Marius Bauza, Maxime Beau, Jai Bhagat, Claudia Böhm, Martijn Broux, Susu Chen, Jennifer Colonell, Richard J. Gardner, Bill Karsh, Fabian Kloosterman, Dimitar Kostadinov, Carolina Mora-Lopez, John O’Callaghan, Junchol Park, Jan Putzeys, Britton Sauerbrei, Rik J. J. van Da...

  34. [34]

    APT: The Animal Part Tracker

    Allen Lee, Mayank Kabra, Alice Robie, Stephen Huston, Felipe Rodriguez, Roian Egnor, Austin Edwards, and Kristin Branson. APT: The Animal Part Tracker. https://github.com/ kristinbranson/APT, 2020. Howard Hughes Medical Institute, Janelia Research Campus. GNU GPLv3 license. 11

  35. [35]

    Yoo, Morris A

    Andy B. Yoo, Morris A. Jette, and Mark Grondona. Slurm: Simple linux utility for resource management. In Dror Feitelson, Larry Rudolph, and Uwe Schwiegelshohn, editors,Job Scheduling Strategies for Parallel Processing, pages 44–60, Berlin, Heidelberg, 2003. Springer Berlin Heidelberg

  36. [36]

    Adam: A Method for Stochastic Optimization

    Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980, 2014

  37. [37]

    Backpropagation through time: what it does and how to do it.Proceedings of the IEEE, 78(10):1550–1560, 1990

    Paul J Werbos. Backpropagation through time: what it does and how to do it.Proceedings of the IEEE, 78(10):1550–1560, 1990

  38. [38]

    Automatic differentiation in pytorch

    Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. 2017. Workshop Paper

  39. [39]

    Review learning: Real world validation of privacy preserving continual learning across medical institutions.Computers in Biology and Medicine, 192:110239, 2025

    Jaesung Yoo, Sunghyuk Choi, Ye Seul Yang, Suhyeon Kim, Jieun Choi, Dongkyeong Lim, Yaeji Lim, Hyung Joon Joo, Dae Jung Kim, Rae Woong Park, Hyung-Jin Yoon, and Kwangsoo Kim. Review learning: Real world validation of privacy preserving continual learning across medical institutions.Computers in Biology and Medicine, 192:110239, 2025. 12 A Common regression...