Automated Machine Learning in Practice: State of the Art and Recent Results
Pith reviewed 2026-05-24 19:15 UTC · model grok-4.3
The pith
AutoML methods automate model building and deliver competitive results on business tasks per current benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
This paper gives an overview of the state of the art in AutoML with a focus on practical applicability in a business context, and provides recent benchmark results on the most important AutoML algorithms.
What carries the argument
Empirical benchmarks comparing leading AutoML frameworks on datasets chosen to reflect business use cases.
If this is right
- Organizations can apply AutoML tools to build predictive models with reduced need for specialized data scientists.
- Certain AutoML frameworks show consistent accuracy across preprocessing, feature selection, and model choice steps.
- Benchmark results identify which algorithms handle typical business data volumes and feature types effectively.
- The overview supports decisions on tool selection by showing trade-offs in speed, accuracy, and ease of use.
Where Pith is reading between the lines
- Wider adoption could speed up digitization projects in sectors with limited ML talent pools.
- Extending benchmarks to time-series or unstructured data common in industry would test broader utility.
- Combining AutoML outputs with domain-specific constraints might improve results on regulated business problems.
Load-bearing premise
The selected AutoML algorithms and benchmark tasks are representative of the most important methods and real-world business use cases.
What would settle it
A new set of benchmarks on diverse proprietary business datasets where all surveyed AutoML systems underperform manual expert tuning by a wide margin would undermine the claim of practical applicability.
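As a rough illustration of this test, the sketch below compares per-dataset scores of an AutoML system against an expert-tuned baseline and reports how often, and by how much, AutoML falls short. The dataset names, the scores, and the 0.05 "wide margin" threshold are hypothetical placeholders chosen for illustration; none of them come from the paper.

```python
from statistics import mean

# Hypothetical accuracy scores per dataset; not taken from the paper.
automl_scores = {"churn": 0.81, "fraud": 0.92, "credit": 0.74, "sales": 0.66}
expert_scores = {"churn": 0.83, "fraud": 0.91, "credit": 0.80, "sales": 0.75}


def shortfall_report(automl, expert, wide_margin=0.05):
    """Summarize how far AutoML trails the expert baseline on shared datasets."""
    shared = sorted(set(automl) & set(expert))
    # Positive gap means the expert-tuned model scored higher.
    gaps = {name: expert[name] - automl[name] for name in shared}
    wide_losses = [name for name in shared if gaps[name] > wide_margin]
    return {
        "mean_gap": mean(list(gaps.values())),
        "wide_losses": wide_losses,
        # The claim is undermined only if AutoML loses widely everywhere.
        "claim_undermined": len(wide_losses) == len(shared),
    }


report = shortfall_report(automl_scores, expert_scores)
print(report)
```

With these placeholder numbers AutoML loses by a wide margin on only two of the four datasets, so the check would not count as a refutation. A real study would also need to fix the threshold and the metric in advance.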
Original abstract
A main driver behind the digitization of industry and society is the belief that data-driven model building and decision making can contribute to higher degrees of automation and more informed decisions. Building such models from data often involves the application of some form of machine learning. Thus, there is an ever growing demand in work force with the necessary skill set to do so. This demand has given rise to a new research topic concerned with fitting machine learning models fully automatically - AutoML. This paper gives an overview of the state of the art in AutoML with a focus on practical applicability in a business context, and provides recent benchmark results on the most important AutoML algorithms.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper provides an overview of the state of the art in Automated Machine Learning (AutoML), with emphasis on practical applicability in business contexts, and presents recent benchmark results on the most important AutoML algorithms.
Significance. If the benchmark results are reproducible and the selected algorithms and tasks are defensible as representative, the work could help practitioners identify suitable AutoML tools; however, the current lack of methodological detail and selection criteria reduces its value as a reliable reference.
major comments (2)
- [Abstract] The claim of providing 'recent benchmark results on the most important AutoML algorithms' is load-bearing for the paper's contribution, yet the manuscript supplies no description of the experimental methodology, chosen datasets, performance metrics, statistical tests, or error analysis, rendering the results unverifiable.
- The central claim that the paper covers 'the most important AutoML algorithms' and benchmarks relevant to business use cases requires a documented selection protocol; no inclusion/exclusion criteria, systematic literature search description, or argument for coverage of dominant paradigms (Bayesian optimization, evolutionary methods, meta-learning, NAS) or business constraints (imbalance, missing data, interpretability) is supplied.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback highlighting the need for greater methodological transparency. We address each major comment below and will revise the manuscript to strengthen these aspects.
- Referee: [Abstract] The claim of providing 'recent benchmark results on the most important AutoML algorithms' is load-bearing for the paper's contribution, yet the manuscript supplies no description of the experimental methodology, chosen datasets, performance metrics, statistical tests, or error analysis, rendering the results unverifiable.
  Authors: We agree that the abstract's emphasis on benchmark results requires supporting methodological detail to ensure verifiability. In the revision we will insert a new subsection (likely Section 4 or equivalent) that explicitly describes the experimental methodology, the chosen datasets and their characteristics, the performance metrics employed, the statistical tests used for comparisons, and any error or sensitivity analysis conducted.
  Revision: yes
- Referee: The central claim that the paper covers 'the most important AutoML algorithms' and benchmarks relevant to business use cases requires a documented selection protocol; no inclusion/exclusion criteria, systematic literature search description, or argument for coverage of dominant paradigms (Bayesian optimization, evolutionary methods, meta-learning, NAS) or business constraints (imbalance, missing data, interpretability) is supplied.
  Authors: We concur that a documented selection protocol is needed to substantiate coverage of the most important algorithms and business-relevant constraints. The revised manuscript will add a dedicated subsection outlining the literature search strategy, explicit inclusion/exclusion criteria, and a rationale showing how the selected methods represent the dominant paradigms (Bayesian optimization, evolutionary methods, meta-learning, NAS) while addressing practical business issues such as class imbalance, missing data, and interpretability requirements.
  Revision: yes
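To make the request for statistical tests concrete, here is a minimal sketch of one simple option: an exact two-sided sign test on paired per-dataset scores of two frameworks. The paper does not state which test, if any, it uses, so the choice of the sign test, the function name `sign_test_p_value`, and all scores below are assumptions for illustration only.

```python
from math import comb


def sign_test_p_value(scores_a, scores_b):
    """Exact two-sided sign test on paired per-dataset scores (ties dropped)."""
    wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
    wins_b = sum(b > a for a, b in zip(scores_a, scores_b))
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # every pair tied, so there is no evidence of a difference
    k = min(wins_a, wins_b)
    # Probability of k or fewer wins for the weaker side under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired scores for two frameworks on eight datasets.
framework_a = [0.81, 0.92, 0.74, 0.66, 0.88, 0.79, 0.70, 0.85]
framework_b = [0.78, 0.90, 0.75, 0.60, 0.84, 0.77, 0.70, 0.80]
p_value = sign_test_p_value(framework_a, framework_b)
print(p_value)
```

The sign test uses only the direction of each difference, which makes it robust but conservative. A revision could equally report a rank-based test, provided the choice is documented alongside the datasets and metrics.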
Circularity Check
Survey paper with no internal derivations exhibits no circularity
This manuscript is a literature survey and benchmark report on AutoML methods drawn from external sources. It contains no mathematical derivations, predictions, or fitted parameters that could reduce to quantities defined within the paper itself. The selection of algorithms and tasks, while potentially open to critique on representativeness, does not constitute circularity under the defined criteria, as no load-bearing claim reduces by construction to self-defined inputs. The paper is self-contained against external benchmarks and literature.