
arxiv: 1907.08392 · v1 · pith:WA74GXO6new · submitted 2019-07-19 · 💻 cs.LG · cs.AI · stat.ML

Automated Machine Learning in Practice: State of the Art and Recent Results

Pith reviewed 2026-05-24 19:15 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · stat.ML
keywords AutoML · automated machine learning · benchmarks · state of the art · practical applicability · business context · machine learning pipelines

The pith

AutoML methods automate model building and deliver competitive results on business tasks per current benchmarks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reviews the state of the art in automated machine learning with emphasis on practical use in business settings. It surveys leading algorithms and reports recent benchmark results comparing their performance. Growing demand for machine learning skills drives interest in these tools because they aim to reduce reliance on scarce expert labor. The focus on applicability helps identify which systems can handle real deployment without extensive manual work. Readers gain a map of current options and evidence on how well they perform on representative tasks.

Core claim

This paper gives an overview of the state of the art in AutoML with a focus on practical applicability in a business context, and provides recent benchmark results on the most important AutoML algorithms.

What carries the argument

Empirical benchmarks comparing leading AutoML frameworks on datasets chosen to reflect business use cases.

If this is right

  • Organizations can apply AutoML tools to build predictive models with reduced need for specialized data scientists.
  • Certain AutoML frameworks show consistent accuracy across preprocessing, feature selection, and model choice steps.
  • Benchmark results identify which algorithms handle typical business data volumes and feature types effectively.
  • The overview supports decisions on tool selection by showing trade-offs in speed, accuracy, and ease of use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Wider adoption could speed up digitization projects in sectors with limited ML talent pools.
  • Extending benchmarks to time-series or unstructured data common in industry would test broader utility.
  • Combining AutoML outputs with domain-specific constraints might improve results on regulated business problems.

Load-bearing premise

The selected AutoML algorithms and benchmark tasks are representative of the most important methods and real-world business use cases.

What would settle it

A new set of benchmarks on diverse proprietary business datasets where all surveyed AutoML systems underperform manual expert tuning by a wide margin would undermine the claim of practical applicability.

Figures

Figures reproduced from arXiv: 1907.08392 by Anastasia Varlet, Christian Westermann, Katharina Rombach, Lukas Tuggener, Mohammadreza Amirian, Stefan Lörwald, Thilo Stadelmann.

Figure 1: Schematic overview of the Portfolio Hyperband workflow. Portfolio Hyperband [13], [43]: Inspired by PoSH Auto-sklearn [43] that combines a portfolio of initial configurations with successive halving (SH) and Bayesian optimization, we tested a system that combines a portfolio with Hyperband [13]. Our goal was to combine the portfolio variant of meta-learning, which is very simple and fast, with Hyperband th…
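The idea in the caption, seeding a bandit-style search with a fixed portfolio of starting configurations, can be illustrated with a minimal sketch. This is not the authors' implementation: it runs a single successive-halving bracket rather than full Hyperband (which runs several brackets with different budget trade-offs), and the names `toy_loss`, `PORTFOLIO`, `portfolio_search`, the learning-rate search space, and all numeric values are assumptions made for illustration only.

```python
import random


def toy_loss(config, budget):
    """Hypothetical stand-in for training a model with `budget` resource units.

    Lower is better. The loss shrinks as the budget grows and is smallest
    when the learning rate is 0.1 (an arbitrary choice for this sketch).
    """
    return (config["lr"] - 0.1) ** 2 + 1.0 / budget


def successive_halving(configs, min_budget=1, eta=3, evaluate=toy_loss):
    """Run one successive-halving bracket and return the surviving config.

    Each round evaluates every survivor at the current budget, keeps the
    best 1/eta of them, and multiplies the budget by eta.
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]


def portfolio_search(portfolio, n_random=6, seed=0):
    """Seed the bracket with portfolio configs plus randomly sampled ones."""
    rng = random.Random(seed)
    sampled = [{"lr": 10 ** rng.uniform(-4, 0)} for _ in range(n_random)]
    return successive_halving(portfolio + sampled)


# Assumed portfolio: configurations that supposedly worked well on earlier tasks.
PORTFOLIO = [{"lr": 0.001}, {"lr": 0.01}, {"lr": 0.1}]

best = portfolio_search(PORTFOLIO)
print(best)
```

In this toy setting the portfolio already contains the optimum, so the search converges after two rounds; on real tasks the portfolio only supplies good starting points and the random samples cover the rest of the space.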
Original abstract

A main driver behind the digitization of industry and society is the belief that data-driven model building and decision making can contribute to higher degrees of automation and more informed decisions. Building such models from data often involves the application of some form of machine learning. Thus, there is an ever growing demand in work force with the necessary skill set to do so. This demand has given rise to a new research topic concerned with fitting machine learning models fully automatically - AutoML. This paper gives an overview of the state of the art in AutoML with a focus on practical applicability in a business context, and provides recent benchmark results on the most important AutoML algorithms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper provides an overview of the state of the art in Automated Machine Learning (AutoML), with emphasis on practical applicability in business contexts, and presents recent benchmark results on the most important AutoML algorithms.

Significance. If the benchmark results are reproducible and the selected algorithms and tasks are defensible as representative, the work could help practitioners identify suitable AutoML tools; however, the current lack of methodological detail and selection criteria reduces its value as a reliable reference.

major comments (2)
  1. [Abstract] The claim of providing 'recent benchmark results on the most important AutoML algorithms' is load-bearing for the paper's contribution, yet the manuscript supplies no description of the experimental methodology, chosen datasets, performance metrics, statistical tests, or error analysis, rendering the results unverifiable.
  2. The central claim that the paper covers 'the most important AutoML algorithms' and benchmarks relevant to business use cases requires a documented selection protocol; no inclusion/exclusion criteria, systematic literature search description, or argument for coverage of dominant paradigms (Bayesian optimization, evolutionary methods, meta-learning, NAS) or business constraints (imbalance, missing data, interpretability) is supplied.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for greater methodological transparency. We address each major comment below and will revise the manuscript to strengthen these aspects.

point-by-point responses
  1. Referee: [Abstract] The claim of providing 'recent benchmark results on the most important AutoML algorithms' is load-bearing for the paper's contribution, yet the manuscript supplies no description of the experimental methodology, chosen datasets, performance metrics, statistical tests, or error analysis, rendering the results unverifiable.

    Authors: We agree that the abstract's emphasis on benchmark results requires supporting methodological detail to ensure verifiability. In the revision we will insert a new subsection (likely Section 4 or equivalent) that explicitly describes the experimental methodology, the chosen datasets and their characteristics, the performance metrics employed, the statistical tests used for comparisons, and any error or sensitivity analysis conducted. revision: yes

  2. Referee: The central claim that the paper covers 'the most important AutoML algorithms' and benchmarks relevant to business use cases requires a documented selection protocol; no inclusion/exclusion criteria, systematic literature search description, or argument for coverage of dominant paradigms (Bayesian optimization, evolutionary methods, meta-learning, NAS) or business constraints (imbalance, missing data, interpretability) is supplied.

    Authors: We concur that a documented selection protocol is needed to substantiate coverage of the most important algorithms and business-relevant constraints. The revised manuscript will add a dedicated subsection outlining the literature search strategy, explicit inclusion/exclusion criteria, and a rationale showing how the selected methods represent the dominant paradigms (Bayesian optimization, evolutionary methods, meta-learning, NAS) while addressing practical business issues such as class imbalance, missing data, and interpretability requirements. revision: yes

Circularity Check

0 steps flagged

Survey paper with no internal derivations exhibits no circularity

full rationale

This manuscript is a literature survey and benchmark report on AutoML methods drawn from external sources. It contains no mathematical derivations, predictions, or fitted parameters that could reduce to quantities defined within the paper itself. The selection of algorithms and tasks, while potentially open to critique on representativeness, does not constitute circularity under the defined criteria, as no load-bearing claim reduces by construction to self-defined inputs. The paper is self-contained against external benchmarks and literature.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a survey paper; the abstract introduces no new free parameters, axioms, or invented entities.



Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 3 internal anchors

  1. [1]

    Applied Data Science: Lessons Learned for the Data-Driven Business

    M. Braschler, K. Stockinger, and T. Stadelmann (Eds.), Applied Data Science: Lessons Learned for the Data-Driven Business. Springer International Publishing, 2019

  2. [2]

    Learning neural models for end-to-end clustering,

    B. B. Meier, I. Elezi, M. Amirian, O. Dürr, and T. Stadelmann, “Learning neural models for end-to-end clustering,” in IAPR Workshop on Artificial Neural Networks in Pattern Recognition, pp. 126–138, Springer, 2018

  3. [3]

    Automatic machine learning: methods, systems, challenges,

    F. Hutter, L. Kotthoff, and J. Vanschoren, “Automatic machine learning: methods, systems, challenges,” Challenges in Machine Learning, 2019

  4. [4]

    Auto-weka: Combined selection and hyperparameter optimization of classification algorithms,

    C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown, “Auto-weka: Combined selection and hyperparameter optimization of classification algorithms,” in Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 847–855, ACM, 2013

  5. [5]

    Machine learning for predictive maintenance: A multiple classifier approach,

    G. A. Susto, A. Schirru, S. Pampuri, S. McLoone, and A. Beghi, “Machine learning for predictive maintenance: A multiple classifier approach,” IEEE Transactions on Industrial Informatics, vol. 11, no. 3, pp. 812–820, 2015

  6. [6]

    Improving rail network velocity: A machine learning approach to predictive maintenance,

    H. Li, D. Parikh, Q. He, B. Qian, Z. Li, D. Fang, and A. Hampapur, “Improving rail network velocity: A machine learning approach to predictive maintenance,” Transportation Research Part C: Emerging Technologies, vol. 45, pp. 17–26, 2014

  7. [7]

    Machine learning algorithms for damage detection under operational and environmental variability,

    E. Figueiredo, G. Park, C. R. Farrar, K. Worden, and J. Figueiras, “Machine learning algorithms for damage detection under operational and environmental variability,” Structural Health Monitoring, vol. 10, no. 6, pp. 559–572, 2011

  8. [8]

    Framework for personalized prediction of treatment response in relapsing remitting multiple sclerosis,

    E. Stühler, S. Braune, F. Lionetto, Y. Heer, P. Kassraian-Fard, E. Jules, C. Westermann, A. Bergmann, P. van Hövell, and N. S. Group, “Framework for personalized prediction of treatment response in relapsing remitting multiple sclerosis,” BMC medical research methodology, submitted

  9. [9]

    How neural networks can help loan officers to make better informed application decisions,

    M. Handzic, F. Tjandrawibawa, and J. Yeo, “How neural networks can help loan officers to make better informed application decisions,” Informing Science, vol. 6, pp. 97–109, 2003

  10. [10]

    Auto claim fraud detection using bayesian learning neural networks,

    S. Viaene, G. Dedene, and R. A. Derrig, “Auto claim fraud detection using bayesian learning neural networks,” Expert Systems with Applications, vol. 29, no. 3, pp. 653–666, 2005

  11. [11]

    Consolidated tree classifier learning in a car insurance fraud detection domain with class imbalance,

    J. M. Pérez, J. Muguerza, O. Arbelaitz, I. Gurrutxaga, and J. I. Martín, “Consolidated tree classifier learning in a car insurance fraud detection domain with class imbalance,” in International Conference on Pattern Recognition and Image Analysis, pp. 381–389, Springer, 2005

  12. [12]

    A survey of machine learning techniques for food sales prediction,

    G. Tsoumakas, “A survey of machine learning techniques for food sales prediction,” Artificial Intelligence Review, pp. 1–7, 2018

  13. [13]

    Hyperband: A novel bandit-based approach to hyperparameter optimization,

    L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar, “Hyperband: A novel bandit-based approach to hyperparameter optimization,” The Journal of Machine Learning Research, vol. 18, no. 1, pp. 6765–6816, 2017

  14. [14]

    Automated generation and selection of interpretable features for enterprise security,

    J. Duan, Z. Zeng, A. Oprea, and S. Vasudevan, “Automated generation and selection of interpretable features for enterprise security,” in 2018 IEEE International Conference on Big Data (Big Data), pp. 1258–1265, IEEE, 2018

  15. [15]

    Learning to learn by gradient descent by gradient descent,

    M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. De Freitas, “Learning to learn by gradient descent by gradient descent,” in Advances in Neural Information Processing Systems, pp. 3981–3989, 2016

  16. [16]

    Neural architecture search with reinforcement learning,

    B. Zoph and Q. V. Le, “Neural architecture search with reinforcement learning,” in Proceedings of International Conference on Learning Representations (ICLR), 2017

  17. [17]

    Efficient and robust automated machine learning,

    M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter, “Efficient and robust automated machine learning,” in Advances in Neural Information Processing Systems, pp. 2962–2970, 2015

  18. [18]

    Feature selection as a one-player game,

    R. Gaudel and M. Sebag, “Feature selection as a one-player game,” in International Conference on Machine Learning, pp. 359–366, 2010

  19. [19]

    Explorekit: Automatic feature generation and selection,

    G. Katz, E. C. R. Shin, and D. Song, “Explorekit: Automatic feature generation and selection,” in Data Mining (ICDM), 2016 IEEE 16th International Conference on, pp. 979–984, IEEE, 2016

  20. [20]

    Learning feature engineering for classification,

    F. Nargesian, H. Samulowitz, U. Khurana, E. B. Khalil, and D. Turaga, “Learning feature engineering for classification,” in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI, vol. 17, pp. 2529–2535, 2017

  21. [21]

    Autolearn: automated feature generation and selection,

    A. Kaul, S. Maheshwary, and V. Pudi, “Autolearn: automated feature generation and selection,” in Data Mining (ICDM), 2017 IEEE International Conference on, pp. 217–226, IEEE, 2017

  22. [22]

    Stability selection,

    N. Meinshausen and P. Bühlmann, “Stability selection,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 72, no. 4, pp. 417–473, 2010

  23. [23]

    Meta-learning by landmarking various learning algorithms,

    B. Pfahringer, H. Bensusan, and C. G. Giraud-Carrier, “Meta-learning by landmarking various learning algorithms,” in ICML, pp. 743–750, 2000

  24. [24]

    Learning curve prediction with bayesian neural networks,

    A. Klein, S. Falkner, J. T. Springenberg, and F. Hutter, “Learning curve prediction with bayesian neural networks,” 2016

  25. [25]

    Neural networks for predicting algorithm runtime distributions,

    K. Eggensperger, M. Lindauer, and F. Hutter, “Neural networks for predicting algorithm runtime distributions,” in IJCAI, pp. 1442–1448, 2018

  26. [26]

    A comparison of ranking methods for classification algorithm selection,

    P. B. Brazdil and C. Soares, “A comparison of ranking methods for classification algorithm selection,” in European conference on machine learning, pp. 63–75, Springer, 2000

  27. [27]

    Long short-term memory,

    S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997

  28. [28]

    Learning to learn without gradient descent by gradient descent,

    Y. Chen, M. W. Hoffman, S. G. Colmenarejo, M. Denil, T. P. Lillicrap, M. Botvinick, and N. de Freitas, “Learning to learn without gradient descent by gradient descent,” in Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 748–756, JMLR.org, 2017

  29. [29]

    Support-vector networks,

    C. Cortes and V. Vapnik, “Support-vector networks,” Machine learning, vol. 20, no. 3, pp. 273–297, 1995

  30. [30]

    Simple and efficient architecture search for convolutional neural networks,

    T. Elsken, J.-H. Metzen, and F. Hutter, “Simple and efficient architecture search for convolutional neural networks,” in Proceedings of International Conference on Learning Representations (ICLR), 2018

  31. [31]

    Large-scale evolution of image classifiers,

    E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, J. Tan, Q. V. Le, and A. Kurakin, “Large-scale evolution of image classifiers,” in Proceedings of the 34th International Conference on Machine Learning (D. Precup and Y. W. Teh, eds.), vol. 70 of Proceedings of Machine Learning Research, (International Convention Centre, Sydney, Australia), pp. 29...

  32. [32]

    Amc: Automl for model compression and acceleration on mobile devices,

    Y. He, J. Lin, Z. Liu, H. Wang, L.-J. Li, and S. Han, “Amc: Automl for model compression and acceleration on mobile devices,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 784–800, 2018

  33. [33]

    Analysis of the automl challenge series 2015-2018,

    I. Guyon, L. Sun-Hosoya, M. Boullé, H. Escalante, S. Escalera, Z. Liu, D. Jajetic, B. Ray, M. Saeed, M. Sebag, et al., “Analysis of the automl challenge series 2015-2018,” 2017

  34. [34]

    A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning

    E. Brochu, V. M. Cora, and N. De Freitas, “A tutorial on bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning,” arXiv preprint arXiv:1012.2599, 2010

  35. [35]

    Sequential model-based optimization for general algorithm configuration,

    F. Hutter, H. H. Hoos, and K. Leyton-Brown, “Sequential model-based optimization for general algorithm configuration,” in International Conference on Learning and Intelligent Optimization, pp. 507–523, Springer, 2011

  36. [36]

    Using meta-learning to initialize bayesian optimization of hyperparameters,

    M. Feurer, J. T. Springenberg, and F. Hutter, “Using meta-learning to initialize bayesian optimization of hyperparameters,” in Proceedings of the 2014 International Conference on Meta-learning and Algorithm Selection-Volume 1201, pp. 3–10, Citeseer, 2014

  37. [37]

    Non-stochastic best arm identification and hyperparameter optimization,

    K. Jamieson and A. Talwalkar, “Non-stochastic best arm identification and hyperparameter optimization,” in Artificial Intelligence and Statistics, pp. 240–248, 2016

  38. [38]

    Population Based Training of Neural Networks

    M. Jaderberg, V. Dalibard, S. Osindero, W. M. Czarnecki, J. Donahue, A. Razavi, O. Vinyals, T. Green, I. Dunning, K. Simonyan, et al., “Population based training of neural networks,” arXiv preprint arXiv:1711.09846, 2017

  39. [39]

    Gradient-based hyperparameter optimization through reversible learning,

    D. Maclaurin, D. Duvenaud, and R. Adams, “Gradient-based hyperparameter optimization through reversible learning,” in International Conference on Machine Learning, pp. 2113–2122, 2015

  40. [40]

    Genetic programming: an introduction

    W. Banzhaf, P. Nordin, R. E. Keller, and F. D. Francone, Genetic programming: an introduction, vol. 1. Morgan Kaufmann San Francisco, 1998

  41. [41]

    The kernel trick for distances,

    B. Schölkopf, “The kernel trick for distances,” in Advances in neural information processing systems, pp. 301–307, 2001

  42. [42]

    Atm: A distributed, collaborative, scalable system for automated machine learning,

    T. Swearingen, W. Drevo, B. Cyphers, A. Cuesta-Infante, A. Ross, and K. Veeramachaneni, “Atm: A distributed, collaborative, scalable system for automated machine learning,” in IEEE International Conference on Big Data, 2017

  43. [43]

    Practical automated machine learning for the automl challenge 2018,

    M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, and F. Hutter, “Practical automated machine learning for the automl challenge 2018,” in International Workshop on Automatic Machine Learning at ICML, 2018

  44. [44]

    Deep learning in the wild,

    T. Stadelmann, M. Amirian, I. Arabaci, M. Arnold, G. F. Duivesteijn, I. Elezi, M. Geiger, S. Lörwald, B. B. Meier, K. Rombach, et al., “Deep learning in the wild,” in IAPR Workshop on Artificial Neural Networks in Pattern Recognition, pp. 17–38, Springer, 2018

  45. [45]

    Automating biomedical data science through tree-based pipeline optimization,

    R. S. Olson, R. J. Urbanowicz, P. C. Andrews, N. A. Lavender, J. H. Moore, et al., “Automating biomedical data science through tree-based pipeline optimization,” in European Conference on the Applications of Evolutionary Computation, pp. 123–137, Springer, 2016

  46. [46]

    Openml: Networked science in machine learning,

    J. Vanschoren, J. N. van Rijn, B. Bischl, and L. Torgo, “Openml: Networked science in machine learning,” SIGKDD Explorations, vol. 15, no. 2, pp. 49–60, 2013

  47. [47]

    Learning to Optimize

    K. Li and J. Malik, “Learning to optimize,” arXiv preprint arXiv:1606.01885, 2016