Soft Learning
Pith reviewed 2026-05-20 14:14 UTC · model grok-4.3
The pith
Soft Learning learns optimal non-negative weights to combine diverse specialists, guaranteeing performance that matches or exceeds the best weighted mix while training far faster than deep networks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Soft Learning maintains a library of heterogeneous specialists and discovers provably optimal combination weights through cross-validated non-negative least squares. This construction guarantees that the resulting model will match or exceed the best weighted combination of its specialists. The method trains 72-435 times faster than deep networks on CPU hardware alone, requires no hyperparameter tuning, and supplies inherent interpretability via the learned weights that indicate which algorithmic family fits the data.
What carries the argument
Cross-validated non-negative least squares, which solves for non-negative weights that minimize validation error when combining the prediction outputs of the specialist models.
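A minimal sketch of this mechanism, assuming squared-error regression and three toy specialists written in NumPy rather than the paper's actual library. The function names (`fit_linear`, `out_of_fold_predictions`, `cv_nnls_weights`) and the synthetic data are hypothetical; only `scipy.optimize.nnls` is a real library call.

```python
import numpy as np
from scipy.optimize import nnls

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns a predict function."""
    coef, *_ = np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)
    return lambda Xn: np.c_[np.ones(len(Xn)), Xn] @ coef

def fit_quadratic(X, y):
    """Least squares on [1, x, x^2] features."""
    feats = lambda Z: np.c_[np.ones(len(Z)), Z, Z ** 2]
    coef, *_ = np.linalg.lstsq(feats(X), y, rcond=None)
    return lambda Xn: feats(Xn) @ coef

def fit_mean(X, y):
    """Constant predictor that always returns the training mean."""
    mu = y.mean()
    return lambda Xn: np.full(len(Xn), mu)

SPECIALISTS = [fit_linear, fit_quadratic, fit_mean]  # toy stand-ins

def out_of_fold_predictions(X, y, fitters, n_folds=5, seed=0):
    """Each row is predicted by specialists that never saw that row."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    Z = np.zeros((len(X), len(fitters)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(X)), held_out)
        for k, fit in enumerate(fitters):
            Z[held_out, k] = fit(X[train], y[train])(X[held_out])
    return Z

def cv_nnls_weights(X, y, fitters, n_folds=5, seed=0):
    """Solve min_w ||y - Z w||^2 subject to w >= 0 on out-of-fold predictions."""
    Z = out_of_fold_predictions(X, y, fitters, n_folds, seed)
    weights, _ = nnls(Z, y)
    return weights, Z

# Synthetic data, purely for illustration.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
y = 1.0 + X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=300)

weights, Z = cv_nnls_weights(X, y, SPECIALISTS)
print(dict(zip(["linear", "quadratic", "mean"], weights.round(3))))
```

The learned weights can be read as a rough indication of which specialist the data favors, which is the interpretability the summary refers to. Note that the weights are only constrained to be non-negative; nothing in this sketch forces them to sum to one.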
If this is right
- Performance is guaranteed to remain the same or improve when any new specialist is added to the library.
- The learned weights reveal which modeling paradigm best matches a given dataset without extra analysis.
- No GPU hardware or hyperparameter tuning is required to reach competitive or superior results on both classification and regression tasks.
- The same framework applies uniformly to the 25 classification and 12 regression datasets tested.
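The first point above can be checked on the fitting objective itself. The sketch below uses random stand-ins for specialist predictions (an assumption, not the paper's data) and shows that the non-negative least-squares residual on the fitting sample cannot grow when a column is added, because the old solution padded with a zero weight remains feasible. It says nothing about error on new test data.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n, K = 200, 6
y = rng.normal(size=n)
# Noisy copies of the target play the role of specialist predictions.
Z = y[:, None] + rng.normal(scale=1.0, size=(n, K))

residuals = []
for k in range(1, K + 1):
    _, rnorm = nnls(Z[:, :k], y)  # fit using only the first k specialists
    residuals.append(rnorm)

print([round(r, 3) for r in residuals])  # non-increasing on the fitting sample
```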
Where Pith is reading between the lines
- Practitioners could stop asking which single algorithm is best and instead ask what combination of available specialists is optimal for the data at hand.
- Resource-limited settings might adopt this style of combination to reach high performance without specialized hardware.
- The guarantee structure could be tested on streaming or continually arriving data to see whether the weights remain stable over time.
Load-bearing premise
Weights found by non-negative least squares on cross-validation folds will continue to produce good combinations on completely new test data.
What would settle it
A held-out test set on which the Soft Learning output performs materially worse than its single best specialist despite the non-negative least-squares combination being applied.
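One way to run this check, sketched under the assumption of squared-error loss. The helper `compare_to_best_specialist` and its `margin` parameter (a threshold for "materially worse") are hypothetical, and the three-point example is hand-made to show that non-negative weights alone do not prevent the combination from losing to a single specialist on held-out data.

```python
import numpy as np

def compare_to_best_specialist(y_test, P_test, weights, margin=0.0):
    """Compare held-out MSE of the weighted combination with each specialist.

    P_test has one column of test predictions per specialist.
    """
    ensemble_mse = float(np.mean((y_test - P_test @ weights) ** 2))
    single_mse = np.mean((y_test[:, None] - P_test) ** 2, axis=0)
    best = int(np.argmin(single_mse))
    return {
        "ensemble_mse": ensemble_mse,
        "best_single_mse": float(single_mse[best]),
        "best_index": best,
        "ensemble_worse": bool(ensemble_mse > single_mse[best] + margin),
    }

# Hand-made example: specialist 0 is exact, specialist 1 is off by one.
y_test = np.array([1.0, 2.0, 3.0])
P_test = np.column_stack([y_test, y_test + 1.0])
report = compare_to_best_specialist(y_test, P_test, np.array([0.5, 0.5]))
print(report)  # ensemble MSE 0.25 versus best single MSE 0.0
```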
Original abstract
Modern machine learning forces practitioners to choose between powerful but expensive deep networks and fast but limited classical algorithms. Here we introduce Soft Learning, a framework that maintains a library of heterogeneous specialists -- spanning linear models, tree ensembles, kernel machines, and neural networks -- and discovers provably optimal combination weights through cross-validated non-negative least squares. Soft Learning is guaranteed to match or exceed the best weighted combination of its specialists, trains over two orders of magnitude faster than deep networks on CPU alone (72-435x faster across tested configurations), provides inherent interpretability through learned weights that reveal which algorithmic paradigm best fits the data, and is future-proof: adding specialists is mathematically guaranteed to maintain or improve performance. Across 37 datasets (25 classification, 12 regression) against nine methods including CatBoost and tuned deep networks, Soft Learning ranks first on 70% of tasks, achieves the best mean rank (Friedman test, p = 1.12 x 10^-12), and is the only method to simultaneously excel at both classification and regression -- all without GPU hardware or hyperparameter tuning. These results suggest a paradigm shift from "which algorithm is best?" to "what is the provably optimal combination?" -- a question Soft Learning answers with formal guarantees for any data modality.
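For readers unfamiliar with the mean-rank and Friedman-test comparison quoted in the abstract, the following sketch shows how such numbers are typically computed with SciPy. The score matrix is synthetic and the method count of four is an arbitrary assumption; none of these values come from the paper.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

rng = np.random.default_rng(1)
n_datasets, n_methods = 37, 4  # 37 matches the abstract; 4 methods is arbitrary
scores = rng.uniform(0.6, 0.9, size=(n_datasets, n_methods))
scores[:, 0] += 0.05  # hypothetical method 0 is nudged upward on purpose

# Rank methods within each dataset; rank 1 is the highest score.
ranks = rankdata(-scores, axis=1)
mean_ranks = ranks.mean(axis=0)

# Friedman test: one sample per method, paired across datasets.
stat, p_value = friedmanchisquare(*scores.T)
print(mean_ranks.round(2), p_value)
```

A small p-value here only indicates that the methods' ranks differ across datasets; it does not by itself show that the top-ranked method generalizes better.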
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Soft Learning as a method to combine predictions from a diverse set of specialist models (linear, tree-based, kernel, and neural) by solving for non-negative weights using cross-validated non-negative least squares. It asserts formal optimality guarantees, significant speed advantages over deep networks, and superior empirical performance across 37 datasets in both classification and regression tasks, all without hyperparameter tuning or specialized hardware.
Significance. Should the central claims regarding out-of-sample optimality and generalization of the learned weights be substantiated, this approach could meaningfully advance ensemble methods by offering a principled, efficient, and interpretable alternative to both classical algorithms and deep learning. The ability to add specialists while maintaining guarantees and the lack of need for GPU resources are strong practical advantages. The work also provides a clear path toward understanding which paradigms suit particular data.
major comments (2)
- [Abstract] The claim that Soft Learning is 'guaranteed to match or exceed the best weighted combination of its specialists' is based on the cross-validated non-negative least squares solution. However, since this solution is obtained from the same cross-validation folds used in evaluation, the optimality may not extend to unseen test data without additional safeguards against overfitting in the weight estimation step.
- [Empirical evaluation] The reported best mean rank and first-place ranking on 70% of tasks rely on the learned weights generalizing from CV to test. Given that specialist predictions are often correlated and the number of specialists is not specified as small, a nested cross-validation loop isolating the weight-learning generalization error would be necessary to support these claims robustly.
minor comments (2)
- [Abstract] Ensure that the number of specialists and their types are clearly stated in the main text for reproducibility.
- [Introduction] The transition from 'which algorithm is best?' to 'what is the provably optimal combination?' is compelling but would benefit from a brief discussion of related work on meta-learning and stacking.
Simulated Author's Rebuttal
We thank the referee for their detailed review and constructive comments on our work on Soft Learning. We provide point-by-point responses to the major comments below.
- Referee: [Abstract] The claim that Soft Learning is 'guaranteed to match or exceed the best weighted combination of its specialists' is based on the cross-validated non-negative least squares solution. However, since this solution is obtained from the same cross-validation folds used in evaluation, the optimality may not extend to unseen test data without additional safeguards against overfitting in the weight estimation step.
  Authors: The optimality guarantee applies specifically to the cross-validation data used for weight estimation. The non-negative least squares solver finds the weights that minimize the squared error on the out-of-fold specialist predictions, which are generated without using the target instances in training the specialists. This ensures the combination is optimal for those CV predictions. For the test data, we apply the learned weights and evaluate empirically, without claiming a formal optimality guarantee on the test distribution. We agree that this distinction should be clarified to avoid misinterpretation. In the revised manuscript, we will update the abstract and add a section explaining the scope of the guarantees.
  Revision: partial
- Referee: [Empirical evaluation] The reported best mean rank and first-place ranking on 70% of tasks rely on the learned weights generalizing from CV to test. Given that specialist predictions are often correlated and the number of specialists is not specified as small, a nested cross-validation loop isolating the weight-learning generalization error would be necessary to support these claims robustly.
  Authors: We recognize the value of nested cross-validation for isolating the generalization performance of the weight estimation step, particularly given potential correlations among specialist predictions. In our current implementation, we employ a single cross-validation procedure to balance computational efficiency with the scale of our experiments across 37 datasets. The number of specialists is 9 in the reported experiments, which is modest. While a full nested CV would strengthen the claims, the observed performance advantages and the statistical significance (Friedman test p-value) provide supporting evidence that the weights generalize effectively. We will revise the manuscript to specify the number of specialists, discuss this limitation, and include a nested CV analysis on a representative subset of datasets.
  Revision: partial
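The nested cross-validation the referee asks for can be sketched as follows: an inner loop learns the non-negative weights on each outer-training split, and the outer held-out fold, untouched by weight estimation, scores the result. This is an illustration under assumptions (squared-error regression, three toy NumPy specialists, synthetic data, hypothetical function names), not the authors' planned analysis.

```python
import numpy as np
from scipy.optimize import nnls

def fit_linear(X, y):
    coef, *_ = np.linalg.lstsq(np.c_[np.ones(len(X)), X], y, rcond=None)
    return lambda Xn: np.c_[np.ones(len(Xn)), Xn] @ coef

def fit_quadratic(X, y):
    feats = lambda Z: np.c_[np.ones(len(Z)), Z, Z ** 2]
    coef, *_ = np.linalg.lstsq(feats(X), y, rcond=None)
    return lambda Xn: feats(Xn) @ coef

def fit_mean(X, y):
    mu = y.mean()
    return lambda Xn: np.full(len(Xn), mu)

FITTERS = [fit_linear, fit_quadratic, fit_mean]  # toy stand-ins

def kfold_indices(n, n_folds, seed):
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), n_folds)

def inner_weights(X, y, fitters, n_folds, seed):
    """Inner loop: NNLS weights from out-of-fold predictions on X, y only."""
    Z = np.zeros((len(X), len(fitters)))
    for held_out in kfold_indices(len(X), n_folds, seed):
        train = np.setdiff1d(np.arange(len(X)), held_out)
        for k, fit in enumerate(fitters):
            Z[held_out, k] = fit(X[train], y[train])(X[held_out])
    w, _ = nnls(Z, y)
    return w

def nested_cv_mse(X, y, fitters, outer_folds=5, inner_folds=5, seed=0):
    """Outer loop: score weights on folds never used to estimate them."""
    ens, single = [], []
    for test in kfold_indices(len(X), outer_folds, seed):
        train = np.setdiff1d(np.arange(len(X)), test)
        w = inner_weights(X[train], y[train], fitters, inner_folds, seed + 1)
        P = np.column_stack([fit(X[train], y[train])(X[test]) for fit in fitters])
        ens.append(np.mean((y[test] - P @ w) ** 2))
        single.append(np.mean((y[test][:, None] - P) ** 2, axis=0))
    return float(np.mean(ens)), np.mean(single, axis=0)

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 2))
y = 1.0 + X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=300)
ens_mse, single_mse = nested_cv_mse(X, y, FITTERS)
print(ens_mse, single_mse)
```

Comparing `ens_mse` with the smallest entry of `single_mse` gives an estimate of whether the learned combination beats the best single specialist out of sample, which is the quantity the single-loop procedure does not isolate.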
Circularity Check
Optimality guarantee reduces to NNLS fit on CV folds by construction
specific steps
- fitted input called prediction [Abstract]
  "Soft Learning is guaranteed to match or exceed the best weighted combination of its specialists, ... discovers provably optimal combination weights through cross-validated non-negative least squares."
  The guarantee is obtained by fitting NNLS weights on the cross-validation folds; the reported superiority on the 37 datasets is therefore the in-sample fit on those folds, not a prediction that must generalize beyond the data used to compute the weights.
full rationale
The paper's central guarantee that Soft Learning 'is guaranteed to match or exceed the best weighted combination' is achieved by solving non-negative least squares on the same cross-validation folds later used to report performance. This makes the headline claims (best mean rank, first on 70% of tasks, matches/exceeds best specialist) a direct consequence of the fitted weights rather than an independent prediction on held-out test data. No nested outer loop isolates the generalization of the weight-learning step itself. The derivation chain therefore collapses the 'provable optimality' claim into the fitting procedure on the evaluation data.
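The scope of the guarantee can be written out in a few lines. The notation below is assumed for illustration and is not taken from the paper: $Z \in \mathbb{R}^{n \times K}$ holds the out-of-fold predictions of $K$ specialists, $y \in \mathbb{R}^n$ the targets, $e_k$ the $k$-th unit vector, and $(y^{\ast}, Z^{\ast})$ a fresh test sample.

```latex
% NNLS weights fitted on the cross-validation sample
\hat{w} \;=\; \arg\min_{w \ge 0} \; \lVert y - Z w \rVert_2^2 .

% Optimality on that sample: every non-negative w is feasible, including each e_k
\lVert y - Z \hat{w} \rVert_2^2 \;\le\; \lVert y - Z w \rVert_2^2
\quad \text{for all } w \ge 0,
\qquad\text{hence}\qquad
\lVert y - Z \hat{w} \rVert_2^2 \;\le\; \min_{k} \lVert y - Z e_k \rVert_2^2 .

% What does not follow: the same inequality with (y^*, Z^*) in place of (y, Z)
\lVert y^{\ast} - Z^{\ast} \hat{w} \rVert_2^2 \;\le\; \min_{k} \lVert y^{\ast} - Z^{\ast} e_k \rVert_2^2
\quad \text{(not implied)} .
```

The first two lines hold by construction of the minimizer; the third would need a separate generalization argument, such as an oracle inequality with its own assumptions.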
Axiom & Free-Parameter Ledger
free parameters (1)
- combination weights
axioms (1)
- domain assumption: The library of specialists contains sufficiently complementary models so that a convex combination improves over the best single model.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear?)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: Soft Learning ... discovers provably optimal combination weights through cross-validated non-negative least squares ... oracle inequality ... Krogh-Vedelsby decomposition
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.