Two-stage Optimization for Machine Learning Workflow

Alexandre Quemy

arXiv: 1907.00678 · v1 · submitted 2019-07-01 · cs.LG · cs.AI


Pith reviewed 2026-05-25 12:25 UTC · model grok-4.3


keywords: autoML, data pipelines, hyperparameter tuning, machine learning workflows, two-stage optimization, time allocation, pipeline specificity


The pith

A two-stage optimization builds data pipelines before configuring algorithms, and the study finds that data preprocessing has a larger impact on model quality than hyperparameter tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a two-stage optimization process for machine learning workflows that first constructs data pipelines and then tunes algorithm settings. Experiments compare the two stages to establish that data preprocessing steps often influence final model quality more than adjustments to algorithm parameters. Time-allocation policies are given to divide search effort between the stages in a way that does not depend on any particular meta-optimizer. A metric is also defined to measure whether a given pipeline is tied to one algorithm or works independently, which supports pruning and cold-start meta-learning. These elements together aim to reduce manual work in building production machine learning systems.
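The two-stage process described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: plain random search stands in for the meta-optimizer, the budget is counted in evaluations rather than wall-clock time, `split` is a hypothetical parameter giving stage 1's share of that budget, and the search spaces, operator names and `toy_evaluate` objective are made up for the example.

```python
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, str]
Space = Dict[str, List[str]]


def random_search(space: Space, objective: Callable[[Config], float],
                  n_iter: int, rng: random.Random,
                  start: Config) -> Tuple[Config, float]:
    """Random search from an incumbent: sample n_iter configurations, keep the best."""
    # The starting configuration is evaluated once and not counted against n_iter.
    best_cfg, best_score = dict(start), objective(start)
    for _ in range(n_iter):
        cfg = {name: rng.choice(values) for name, values in space.items()}
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def two_stage(pipeline_space: Space, algo_space: Space,
              default_pipeline: Config, default_algo: Config,
              evaluate: Callable[[Config, Config], float],
              budget: int, split: float,
              seed: int = 0) -> Tuple[Config, Config, float]:
    """Stage 1 searches pipelines with the algorithm at its defaults;
    stage 2 freezes the best pipeline and tunes the algorithm.
    `split` (hypothetical) is the fraction of the evaluation budget given to stage 1."""
    rng = random.Random(seed)
    n_pipeline = int(budget * split)
    n_algo = budget - n_pipeline
    best_pipe, _ = random_search(
        pipeline_space, lambda p: evaluate(p, default_algo),
        n_pipeline, rng, default_pipeline)
    best_algo, best_score = random_search(
        algo_space, lambda a: evaluate(best_pipe, a),
        n_algo, rng, default_algo)
    return best_pipe, best_algo, best_score


def toy_evaluate(pipe: Config, algo: Config) -> float:
    """Made-up objective for illustration only; it is not data from the paper."""
    score = 0.5
    score += {"none": 0.0, "near_miss": 0.04, "cnn": 0.08}[pipe["rebalance"]]
    score += {"none": 0.0, "standard": 0.05}[pipe["normalize"]]
    score += {"3": 0.0, "5": 0.02, "10": 0.01}[algo["max_depth"]]
    return score


# Hypothetical search spaces and defaults.
pipelines = {"rebalance": ["none", "near_miss", "cnn"],
             "normalize": ["none", "standard"]}
algorithms = {"max_depth": ["3", "5", "10"]}
start_pipe = {"rebalance": "none", "normalize": "none"}
start_algo = {"max_depth": "3"}
print(two_stage(pipelines, algorithms, start_pipe, start_algo,
                toy_evaluate, budget=20, split=0.5))
```

Because each stage starts from its incumbent, the returned score cannot fall below that of the default pipeline with the default algorithm. A real system would replace `toy_evaluate` with, for example, cross-validated accuracy of the fitted model.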

Core claim

The paper claims that data pipeline construction contributes more to model performance than algorithm configuration, that time can be split efficiently between the two stages using agnostic policies, and that a pipeline-algorithm specificity metric enables targeted pruning and meta-learning.

What carries the argument

Two-stage optimization process that separates data pipeline search from algorithm configuration, together with time-allocation policies and a pipeline specificity metric.
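The pith does not state how the pipeline specificity metric is computed, so the sketch below uses a hypothetical stand-in to show the idea: normalize each algorithm's scores across pipelines, then measure how much a pipeline's normalized score varies from one algorithm to another. A score near 0 suggests an algorithm-independent pipeline; a score near 1 suggests a pipeline that is best for some algorithms and worst for others. The function names, the min-max normalization and the mean-absolute-deviation formula are assumptions, not the paper's definition.

```python
from statistics import mean
from typing import Dict

# scores[algorithm][pipeline] -> validation score (e.g. accuracy)
Scores = Dict[str, Dict[str, float]]


def normalize_per_algorithm(scores: Scores) -> Scores:
    """Min-max normalize each algorithm's scores so different algorithms are comparable."""
    normalized: Scores = {}
    for algo, by_pipe in scores.items():
        lo, hi = min(by_pipe.values()), max(by_pipe.values())
        span = hi - lo
        normalized[algo] = {
            pipe: (s - lo) / span if span > 0 else 0.0 for pipe, s in by_pipe.items()
        }
    return normalized


def specificity(scores: Scores, pipeline: str) -> float:
    """Hypothetical specificity in [0, 1]: twice the mean absolute deviation of the
    pipeline's normalized score across algorithms (a deviation never exceeds 1/2 here)."""
    values = [by_pipe[pipeline] for by_pipe in normalize_per_algorithm(scores).values()]
    centre = mean(values)
    return 2 * mean(abs(v - centre) for v in values)


# Made-up scores for two algorithms and three pipelines, for illustration only.
scores = {
    "svm": {"a": 0.90, "b": 0.70, "c": 0.80},
    "tree": {"a": 0.60, "b": 0.90, "c": 0.75},
}
for pipeline in ("a", "b", "c"):
    print(pipeline, round(specificity(scores, pipeline), 3))
```

Under this stand-in, pipeline `c` ranks in the middle for both algorithms and gets a specificity close to 0, so it could be reused across algorithms for a cold start. Pipelines `a` and `b` swap places between the two algorithms and score 1. Any pruning rule built on top of this, such as discarding pipelines that are both non-specific and poor everywhere, is likewise an assumption.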

If this is right

Machine learning model building can be automated by allocating more search resources to pipeline construction than to parameter tuning.
Time-allocation policies can be used with any meta-optimizer to balance the two stages without redesign.
The specificity metric supports removal of algorithm-dependent pipelines and transfer of pipelines across algorithms for faster cold starts.
Production deployment of machine learning becomes more scalable when pipeline search is treated as the primary stage.
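The claim that time-allocation policies work with any meta-optimizer can be illustrated by writing the policy against a narrow ask/tell interface. In this sketch, which is not the paper's code, `Optimizer`, `RandomOptimizer`, `fixed_split`, `alternating` and `run` are hypothetical names. The budget is counted in evaluations rather than seconds, and the two policies are simple examples rather than the ones evaluated in the paper.

```python
import random
from typing import Callable, Dict, List, Protocol, Tuple

Config = Dict[str, str]
Policy = Callable[[int, int], str]  # (step, budget) -> "pipeline" or "algorithm"


class Optimizer(Protocol):
    """Assumed ask/tell interface; the policy never looks inside the optimizer."""
    def ask(self) -> Config: ...
    def tell(self, config: Config, score: float) -> None: ...


class RandomOptimizer:
    """Stand-in meta-optimizer: random search behind the ask/tell interface."""
    def __init__(self, space: Dict[str, List[str]], seed: int = 0) -> None:
        self.space = space
        self.rng = random.Random(seed)

    def ask(self) -> Config:
        return {name: self.rng.choice(values) for name, values in self.space.items()}

    def tell(self, config: Config, score: float) -> None:
        pass  # random search ignores feedback; a model-based optimizer would update here


def fixed_split(step: int, budget: int, fraction: float = 0.5) -> str:
    """Spend the first `fraction` of the budget on pipelines, the rest on the algorithm."""
    return "pipeline" if step < int(budget * fraction) else "algorithm"


def alternating(step: int, budget: int, period: int = 5) -> str:
    """Switch stage every `period` evaluations, starting with the pipeline stage."""
    return "pipeline" if (step // period) % 2 == 0 else "algorithm"


def run(policy: Policy, pipe_opt: Optimizer, algo_opt: Optimizer,
        pipe: Config, algo: Config,
        evaluate: Callable[[Config, Config], float],
        budget: int) -> Tuple[Config, Config, float]:
    """Drive two optimizers; the policy only decides which one gets the next evaluation."""
    best = evaluate(pipe, algo)
    for step in range(budget):
        if policy(step, budget) == "pipeline":
            candidate = pipe_opt.ask()
            score = evaluate(candidate, algo)  # algorithm held at its incumbent
            pipe_opt.tell(candidate, score)
            if score > best:
                pipe, best = candidate, score
        else:
            candidate = algo_opt.ask()
            score = evaluate(pipe, candidate)  # pipeline held at its incumbent
            algo_opt.tell(candidate, score)
            if score > best:
                algo, best = candidate, score
    return pipe, algo, best


def toy_evaluate(pipe: Config, algo: Config) -> float:
    """Made-up objective for illustration only."""
    return (0.5 + {"none": 0.0, "near_miss": 0.04, "cnn": 0.08}[pipe["rebalance"]]
            + {"3": 0.0, "5": 0.02}[algo["max_depth"]])


pipe_space = {"rebalance": ["none", "near_miss", "cnn"]}
algo_space = {"max_depth": ["3", "5"]}
for policy in (fixed_split, alternating):
    result = run(policy, RandomOptimizer(pipe_space, 1), RandomOptimizer(algo_space, 2),
                 {"rebalance": "none"}, {"max_depth": "3"}, toy_evaluate, budget=20)
    print(policy.__name__, result)
```

Because `run` only calls `ask` and `tell`, replacing `RandomOptimizer` with a Bayesian or evolutionary optimizer that exposes the same two methods would leave the policy code unchanged. One simplification: when the incumbent pipeline changes, the scores the algorithm optimizer received earlier were measured under a different pipeline, which a real implementation would need to account for.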

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same separation of pipeline and configuration stages could be tested in domains outside standard supervised learning such as reinforcement learning or time-series forecasting.
The policies might be adapted to dynamic time budgets that change during a single run based on early performance signals.
Combining the specificity metric with existing pipeline libraries could reduce redundant searches across many algorithms.

Load-bearing premise

The observed greater impact of data pipelines over algorithm configuration, along with the effectiveness of the time policies, will hold for datasets and meta-optimizers beyond those used in the experiments.

What would settle it

Repeating the experiments on a fresh collection of datasets with a different meta-optimizer and obtaining results where algorithm configuration consistently improves performance more than pipeline search would falsify the central importance claim.

Figures

Figures reproduced from arXiv: 1907.00678 by Alexandre Quemy.

**Figure 1.** Typical machine learning workflow. On the one hand, there are many reasons why a data source cannot be used directly and requires preprocessing: too many variables, an imbalanced dataset, missing values, outliers, noise, domain restrictions specific to the algorithms, etc. On the other hand, data preprocessing has a huge impact on model performance [4, 3, 5].

**Figure 2.** Example of real-life pipelines designed with SAS (left) and IBM Watson Studio

**Figure 3.** Two-stage optimization process.

**Figure 4.** Each node can be instantiated with an operator or left empty.
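Figure 4's rule, that each node of a pipeline prototype holds one operator or stays empty, makes the pipeline configuration space a Cartesian product over nodes. A minimal sketch, assuming a hypothetical three-node prototype: the node names, operators and parameter settings are illustrative, and the three NearMiss settings only mirror the Near Miss row (1 parameter, 3 values) of the Table A.5 excerpt in the reference section.

```python
from itertools import product
from typing import Dict, Iterator, List, Optional, Tuple

# node -> {operator name, or None for an empty node: list of parameter settings}
Prototype = Dict[str, Dict[Optional[str], List[object]]]
Choice = Tuple[Optional[str], object]


def node_choices(options: Dict[Optional[str], List[object]]) -> List[Choice]:
    """Every (operator, setting) pair for one node; an empty node contributes (None, None)."""
    choices: List[Choice] = []
    for operator, settings in options.items():
        if operator is None:
            choices.append((None, None))
        else:
            choices.extend((operator, s) for s in settings)
    return choices


def enumerate_pipelines(prototype: Prototype) -> Iterator[Dict[str, Choice]]:
    """Cartesian product of the node choices, in the prototype's node order."""
    nodes = list(prototype)
    for combo in product(*(node_choices(prototype[n]) for n in nodes)):
        yield dict(zip(nodes, combo))


# Hypothetical three-node prototype; operator names and settings are illustrative.
prototype: Prototype = {
    "rebalance": {None: [], "NearMiss": [1, 2, 3]},     # three settings
    "normalize": {None: [], "StandardScaler": [None]},  # one operator, no parameter
    "features": {None: [], "PCA": [2, 5]},              # two settings
}
configs = list(enumerate_pipelines(prototype))
print(len(configs))  # (1 + 3) * (1 + 1) * (1 + 2) = 24
print(configs[0])    # all nodes left empty
```

The count grows multiplicatively with every node and operator added, which is why pruning the pipeline space, as the specificity metric is meant to support, can matter in practice.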

**Figure 5.** Density of pipeline configurations (left). The vertical line represents the baseline

**Figure 6.** Accuracy depending on the time spent on each phase of the optimization process.

**Figure 7.** Evolution of the best score over time for different policies. Split 300 and Split 0 are…

**Figure 8.** Heatmap depicting the accuracy depending on the pipeline parameter configuration…

Original abstract

Machine learning techniques play a preponderant role in dealing with massive amounts of data and are employed in almost every possible domain. Building a high-quality machine learning model to be deployed in production is a challenging task for both subject matter experts and machine learning practitioners. For broader adoption and scalability of machine learning systems, the construction and configuration of machine learning workflows need to become more automated. In the last few years, several techniques, known as autoML, have been developed in this direction. In this paper, we present a two-stage optimization process to build data pipelines and configure machine learning algorithms. First, we study the impact of data pipelines compared to algorithm configuration in order to show the importance of data preprocessing over hyperparameter tuning. The second part presents policies to efficiently allocate search time between data pipeline construction and algorithm configuration. These policies are agnostic to the meta-optimizer. Last, we present a metric to determine whether a data pipeline is specific to or independent of the algorithm, enabling fine-grained pipeline pruning and meta-learning for the cold-start problem.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper frames autoML as two-stage pipeline-then-config optimization with meta-optimizer-agnostic time policies and a pipeline specificity metric, but the key claim on preprocessing importance rests on experiments whose reach beyond the tested cases is unclear.


The main takeaway is that the paper proposes a two-stage optimization for machine learning workflows, separating data pipeline construction from algorithm configuration, and introduces time-allocation policies that work across different meta-optimizers, along with a metric for pipeline specificity. What stands out as useful is the focus on making autoML more efficient by deciding how much time to spend on each stage. The claim that data preprocessing has a bigger impact than hyperparameter tuning is backed by their study, which could help practitioners prioritize correctly. The agnostic nature of the policies is a good point because it means they can be applied on top of various existing optimizers without changing them. The specificity metric seems like a reasonable way to handle pruning and cold-start issues in meta-learning.

The potential issue is whether the findings about the relative importance of pipelines hold up outside the datasets and optimizers used. The stress-test note raises a fair point that, without some analysis of sensitivity to data characteristics or search methods, the conclusion might be tied to this specific setup. If the paper only shows results on a handful of benchmarks, that would be a limitation worth noting in review. The abstract does not give details on the experiments, so the strength of the evidence is hard to gauge from it alone, though the full text may clarify this.

Overall, this paper is for researchers and engineers working on automated machine learning systems who are looking for ways to improve workflow construction and search efficiency. Someone already familiar with the autoML literature might see it as an incremental but practical step. I would recommend sending it to peer review: the core ideas are clear and address a real deployment problem, even if the experimental validation needs closer scrutiny to confirm the generality of the results.

Referee Report

2 major / 2 minor

Summary. The paper proposes a two-stage optimization process for machine learning workflows. It first empirically studies the relative impact of data pipeline construction versus algorithm hyperparameter configuration to argue for the greater importance of data preprocessing. It then introduces meta-optimizer-agnostic policies for allocating search time between the two stages and defines a metric to assess whether a given data pipeline is algorithm-specific or independent, supporting pruning and meta-learning for cold-start problems.

Significance. If the empirical comparisons and policy evaluations hold under broader conditions, the work could inform more efficient AutoML designs by directing attention to data preprocessing and providing practical, optimizer-independent time-allocation heuristics. The pipeline-specificity metric is a concrete, potentially reusable contribution for meta-learning pipelines.

major comments (2)

[Experiments] Experiments section: the headline claim that data pipelines have greater impact than algorithm configuration rests on results from a fixed collection of datasets and particular meta-optimizers. Without additional cross-domain validation (e.g., on high-dimensional, noisy, or heterogeneous-feature datasets) or sensitivity analysis, the general methodological recommendation does not follow.
[Time Allocation Policies] Time-allocation policies section: the reported effectiveness of the proposed policies is demonstrated only within the same experimental setup; the meta-optimizer-agnostic claim requires explicit tests on at least one additional search algorithm or a different class of meta-optimizer to confirm robustness.

minor comments (2)

[Abstract] Abstract contains grammatical errors ('Machines learning techniques plays' should read 'Machine learning techniques play').
[Metric Definition] Notation for the specificity metric should be introduced with a clear equation or definition before its use in the pruning discussion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, indicating planned revisions to the manuscript.


Referee: [Experiments] Experiments section: the headline claim that data pipelines have greater impact than algorithm configuration rests on results from a fixed collection of datasets and particular meta-optimizers. Without additional cross-domain validation (e.g., on high-dimensional, noisy, or heterogeneous-feature datasets) or sensitivity analysis, the general methodological recommendation does not follow.

Authors: We agree that the headline claim would be strengthened by broader validation. The experiments use a collection of standard benchmark datasets, but we acknowledge the limitation regarding cross-domain coverage. In the revision we will add a sensitivity analysis subsection and results on additional high-dimensional and heterogeneous-feature datasets to better support the scope of the methodological recommendation. revision: yes
Referee: [Time Allocation Policies] Time-allocation policies section: the reported effectiveness of the proposed policies is demonstrated only within the same experimental setup; the meta-optimizer-agnostic claim requires explicit tests on at least one additional search algorithm or a different class of meta-optimizer to confirm robustness.

Authors: The policies are formulated without reference to any particular meta-optimizer internals and are therefore intended to be agnostic. Nevertheless, the empirical demonstration was limited to the optimizers used in the study. To confirm robustness we will include results with at least one additional search algorithm (from a different class) in the revised Time Allocation Policies section. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical study with no derivations or self-referential fitting


The paper describes a two-stage optimization process and reports experimental comparisons of data-pipeline impact versus algorithm configuration, plus time-allocation policies and a specificity metric. No equations, derivations, fitted parameters presented as predictions, or load-bearing self-citations appear in the provided text. All central claims rest on direct empirical measurements rather than any reduction by construction to the paper's own inputs or prior self-citations.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only; no free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.0 · 5700 in / 953 out tokens · 17769 ms · 2026-05-25T12:25:05.121716+00:00 · methodology


Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages · 7 internal anchors

[1] R. Elshawi, M. Maher, S. Sakr, Automated machine learning: State-of-the-art and open challenges (2019). arXiv:1906.02287

[2] F. Hutter, L. Kotthoff, J. Vanschoren, Automatic machine learning: methods, systems, challenges, Challenges in Mach. Learn.

[3] S. F. Crone, S. Lessmann, R. Stahlbock, The impact of preprocessing on data mining: An evaluation of classifier sensitivity in direct marketing, Eur. J. Oper. Res. 173 (3) (2006) 781–800

[4] T. Dasu, T. Johnson, Exploratory data mining and data cleaning, Vol. 479, John Wiley & Sons, 2003

[5] N. M. Nawi, W. H. Atomi, M. Z. Rehman, The effect of data pre-processing on optimized training of artificial neural networks, Procedia Technology 11 (2013) 32–39, Int. Conf. Elect. Eng. Info

[6] D. H. Wolpert, The lack of a priori distinctions between learning algorithms, Neural Comput. 8 (7) (1996) 1341–1390

[7] M. Chessell, F. Scheepers, N. Nguyen, R. van Kessel, R. van der Starre, Governing and managing big data for analytics and decision makers, IBM Redguides for Business Leaders

[8] A. Quemy, Data pipeline selection and optimization, in: Proc. Int. Workshop on Design, Optim., Languages and Anal. Processing of Big Data, 2019

[9] D. C. Montgomery, Design and analysis of experiments, John Wiley & Sons, 2017

[10] J. Bergstra, Y. Bengio, Random search for hyper-parameter optimization, J. Mach. Learn. Res. 13 (Feb) (2012) 281–305

[11] F. Hutter, H. H. Hoos, K. Leyton-Brown, Sequential model-based optimization for general algorithm configuration, in: Proc. Int. Conf. Learn. Intel. Optim., Springer-Verlag, Berlin, Heidelberg, 2011, pp. 507–523

[12] J. Bergstra, R. Bardenet, Y. Bengio, B. Kégl, Algorithms for hyper-parameter optimization, in: Proc. Int. Conf. Neural Inf. Process. Syst., 2011, pp. 2546–2554

[13] C. Thornton, F. Hutter, H. H. Hoos, K. Leyton-Brown, Auto-weka: Combined selection and hyperparameter optimization of classification algorithms, in: Int. Conf. Knowl. Disc. Data Min., ACM, 2013, pp. 847–855

[14] L. Kotthoff, C. Thornton, H. H. Hoos, F. Hutter, K. Leyton-Brown, Auto-weka 2.0: Automatic model selection and hyperparameter optimization in weka, J. Mach. Learn. Res. 18 (1) (2017) 826–830

[15] M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, F. Hutter, Efficient and robust automated machine learning, in: C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, R. Garnett (Eds.), Proc. Int. Conf. Neural Inf. Process. Syst., 2015, pp. 2962–2970

[16] J. Bergstra, B. Komer, C. Eliasmith, D. Yamins, D. D. Cox, Hyperopt: a python library for model selection and hyperparameter optimization, Comput. Sci. & Discovery 8 (1) (2015) 014008

[17] J. Snoek, H. Larochelle, R. P. Adams, Practical bayesian optimization of machine learning algorithms, in: Proc. Int. Conf. Neural Inf. Process. Syst., 2012, pp. 2951–2959

[18] J. Wilson, F. Hutter, M. Deisenroth, Maximizing acquisition functions for bayesian optimization, in: Proc. Int. Conf. Neural Inf. Process. Syst., 2018, pp. 9884–9895

[19] P. I. Frazier, A tutorial on bayesian optimization, arXiv preprint arXiv:1807.02811

[20] J. Močkus, On bayesian methods for seeking the extremum, in: Optimization Techniques IFIP Technical Conference, Springer, 1975, pp. 400–404

[21] H. Rakotoarison, M. Sebag, AutoML with Monte Carlo Tree Search, in: Workshop AutoML 2018 @ ICML/IJCAI-ECAI, Pavel Brazdil, Christophe Giraud-Carrier, and Isabelle Guyon, Stockholm, Sweden, 2018

[22] T. Domhan, J. T. Springenberg, F. Hutter, Speeding up automatic hyperparameter optimization of deep neural networks by extrapolation of learning curves, in: Int. Conf. Artif. Intel., 2015

[23] K. Swersky, J. Snoek, R. P. Adams, Freeze-thaw bayesian optimization, arXiv preprint arXiv:1406.3896

[24] K. Jamieson, A. Talwalkar, Non-stochastic best arm identification and hyperparameter optimization, in: Artificial Intelligence and Statistics, 2016, pp. 240–248

[25] L. Li, K. Jamieson, Hyperband: A novel bandit-based approach to hyperparameter optimization, J. Mach. Learn. Res. 18 (2018) 1–52

[26] J. Nalepa, M. Myller, S. Piechaczek, K. Hrynczenko, M. Kawulok, Genetic selection of training sets for (not only) artificial neural networks, in: Proc. Int. Conf. Beyond Databases, Architectures Struct., 2018, pp. 194–206

[27] R. S. Olson, N. Bartley, R. J. Urbanowicz, J. H. Moore, Evaluation of a tree-based pipeline optimization tool for automating data science, in: Proc. Gen. and Evol. Comput. Conf., ACM, 2016, pp. 485–492

[28] B. Chen, H. Wu, W. Mo, I. Chattopadhyay, H. Lipson, Autostacker: A compositional evolutionary learning system, in: Proc. Gen. and Evol. Comput. Conf., 2018, pp. 402–409

[29] X. Sun, J. Lin, B. Bischl, Reinbo: Machine learning pipeline search and configuration with bayesian optimization embedded reinforcement learning, CoRR abs/1904.05381

[30] J. Kim, S. Kim, S. Choi, Learning to warm-start bayesian hyperparameter optimization, arXiv preprint arXiv:1710.06219

[31] B. Bilalli, A. Abelló, T. Aluja-Banet, On the predictive power of meta-features in openml, Int. J. Appl. Math. Comput. Sci. 27 (4) (2017) 697–712

[32] J. Vanschoren, Meta-learning: A survey, arXiv preprint arXiv:1810.03548

[33] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980

[34] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, J. Mach. Learn. Res. 12 (2011) 2825–2830

[35] K. Eggensperger, M. Feurer, F. Hutter, J. Bergstra, J. Snoek, H. Hoos, K. Leyton-Brown, Towards an empirical foundation for assessing bayesian optimization of hyperparameters, in: NIPS workshop on Bayesian Optimization in Theory and Practice, 2013

[36] B. Bilalli, A. Abelló, T. Aluja-Banet, R. Wrembel, Intelligent assistance for data pre-processing, Computer Standards & Interfaces (2018) 101–109

[37] B. Bischl, G. Casalicchio, M. Feurer, F. Hutter, M. Lang, R. G. Mantovani, J. N. van Rijn, J. Vanschoren, Openml benchmarking suites and the openml100, arXiv preprint arXiv:1708.03731

[38] A. Chowdhury, M. Magdon-Ismail, B. Yener, Quantifying error contributions of computational steps, algorithms and hyperparameter choices in image classification pipelines, CoRR abs/1903.02521

Appendix A. Pipeline configuration space

Table A.5: Pipeline search space (excerpt; columns #𝜆 · |Λ| · impl.)
Rebalance / No operator: 0 · 0 · -
Rebalance / Near Miss: 1 · 3 · imblearn
Rebalance / Condensed Nearest Neig...
