Two-stage Optimization for Machine Learning Workflow
Pith reviewed 2026-05-25 12:25 UTC · model grok-4.3
The pith
A two-stage optimization process builds data pipelines before configuring algorithms and finds that preprocessing has a larger impact than hyperparameter tuning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that data pipeline construction contributes more to model performance than algorithm configuration, that time can be split efficiently between the two stages using agnostic policies, and that a pipeline-algorithm specificity metric enables targeted pruning and meta-learning.
What carries the argument
Two-stage optimization process that separates data pipeline search from algorithm configuration, together with time-allocation policies and a pipeline specificity metric.
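The separation of stages can be illustrated with a minimal sketch. It is not the paper's implementation: the search spaces, the `toy_score` function, the random-search optimizer, and the fixed `pipeline_share` policy are all hypothetical stand-ins, and the sketch assumes stage 1 holds the algorithm at a default configuration, which the summary above does not state. The budget is counted in evaluations rather than seconds to keep the example deterministic.

```python
import random

# Hypothetical toy search spaces; neither comes from the paper.
PIPELINES = ["none", "scale", "scale+pca", "impute+scale"]
CONFIGS = [{"depth": d} for d in range(1, 9)]


def toy_score(pipeline, config):
    """Stand-in for a cross-validated score (higher is better)."""
    base = {"none": 0.60, "scale": 0.75, "scale+pca": 0.80, "impute+scale": 0.70}
    return base[pipeline] + 0.01 * min(config["depth"], 5)


def two_stage_search(budget, pipeline_share, default_config, rng):
    """Spend pipeline_share of the evaluation budget on stage 1, the rest on stage 2."""
    stage1 = int(budget * pipeline_share)
    stage2 = budget - stage1

    # Stage 1: search pipelines with the algorithm held at its default configuration.
    best_pipeline = PIPELINES[0]
    best = toy_score(best_pipeline, default_config)
    for _ in range(stage1):
        candidate = rng.choice(PIPELINES)
        score = toy_score(candidate, default_config)
        if score > best:
            best_pipeline, best = candidate, score

    # Stage 2: tune the algorithm with the chosen pipeline held fixed.
    best_config = default_config
    for _ in range(stage2):
        candidate = rng.choice(CONFIGS)
        score = toy_score(best_pipeline, candidate)
        if score > best:
            best_config, best = candidate, score

    return best_pipeline, best_config, best
```

Because the policy only decides how many evaluations each stage receives, the random sampling in either loop could be swapped for another optimizer without touching the split, which is the sense in which such a policy would be optimizer-agnostic.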
If this is right
- Machine learning model building can be automated by allocating more search resources to pipeline construction than to parameter tuning.
- Time-allocation policies can be used with any meta-optimizer to balance the two stages without redesign.
- The specificity metric supports removal of algorithm-dependent pipelines and transfer of pipelines across algorithms for faster cold starts.
- Production deployment of machine learning becomes more scalable when pipeline search is treated as the primary stage.
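The summary does not give the definition of the specificity metric, so the following is only one hypothetical way such a quantity could be computed, not the paper's formula. It assumes a table of scores indexed by pipeline and algorithm, rescales each algorithm's scores to [0, 1], and takes the spread of a pipeline's rescaled scores across algorithms as its specificity. The function names and the choice of population standard deviation are assumptions made for illustration.

```python
from statistics import pstdev


def normalized_scores(scores):
    """Rescale each algorithm's scores to [0, 1] across pipelines."""
    algorithms = list(next(iter(scores.values())).keys())
    out = {pipeline: {} for pipeline in scores}
    for algo in algorithms:
        column = [scores[pipeline][algo] for pipeline in scores]
        lo, hi = min(column), max(column)
        span = (hi - lo) or 1.0  # avoid division by zero on a flat column
        for pipeline in scores:
            out[pipeline][algo] = (scores[pipeline][algo] - lo) / span
    return out


def specificity(scores):
    """Hypothetical metric: spread of a pipeline's rescaled scores across algorithms.

    A value near 0 means the pipeline ranks similarly for every algorithm
    (algorithm-independent); a larger value means its benefit depends on the algorithm.
    """
    norm = normalized_scores(scores)
    return {pipeline: pstdev(list(norm[pipeline].values())) for pipeline in norm}
```

Under this reading, pipelines with high values would be candidates for algorithm-dependent pruning, and pipelines with low values would be candidates for transfer across algorithms.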
Where Pith is reading between the lines
- The same separation of pipeline and configuration stages could be tested in domains outside standard supervised learning, such as reinforcement learning or time-series forecasting.
- The policies might be adapted to dynamic time budgets that change during a single run based on early performance signals.
- Combining the specificity metric with existing pipeline libraries could reduce redundant searches across many algorithms.
Load-bearing premise
The observed greater impact of data pipelines over algorithm configuration, along with the effectiveness of the time policies, will hold for datasets and meta-optimizers beyond those used in the experiments.
What would settle it
Repeating the experiments on a fresh collection of datasets with a different meta-optimizer and obtaining results where algorithm configuration consistently improves performance more than pipeline search would falsify the central importance claim.
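A replication of this kind could tally its outcome with a small helper like the one below. This is a sketch under assumptions: the per-dataset record layout (`baseline`, `pipeline_only`, `config_only`) and the function name are hypothetical, and the paper may measure impact differently.

```python
def compare_gains(results):
    """Split datasets by which stage improved the baseline score more.

    results maps a dataset name to a dict with three scores:
    'baseline' (no pipeline, default configuration),
    'pipeline_only' (best pipeline, default configuration),
    'config_only' (no pipeline, best configuration).
    Ties are left out of both lists.
    """
    pipeline_wins, config_wins = [], []
    for name, r in results.items():
        pipeline_gain = r["pipeline_only"] - r["baseline"]
        config_gain = r["config_only"] - r["baseline"]
        if pipeline_gain > config_gain:
            pipeline_wins.append(name)
        elif config_gain > pipeline_gain:
            config_wins.append(name)
    return pipeline_wins, config_wins
```

If the second list were consistently the longer one on fresh datasets and with a different meta-optimizer, that would count against the claim; a formal test on the paired gains would be needed to call the difference significant.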
Original abstract
Machines learning techniques plays a preponderant role in dealing with massive amount of data and are employed in almost every possible domain. Building a high quality machine learning model to be deployed in production is a challenging task, from both, the subject matter experts and the machine learning practitioners. For a broader adoption and scalability of machine learning systems, the construction and configuration of machine learning workflow need to gain in automation. In the last few years, several techniques have been developed in this direction, known as autoML. In this paper, we present a two-stage optimization process to build data pipelines and configure machine learning algorithms. First, we study the impact of data pipelines compared to algorithm configuration in order to show the importance of data preprocessing over hyperparameter tuning. The second part presents policies to efficiently allocate search time between data pipeline construction and algorithm configuration. Those policies are agnostic from the metaoptimizer. Last, we present a metric to determine if a data pipeline is specific or independent from the algorithm, enabling fine-grain pipeline pruning and meta-learning for the coldstart problem.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a two-stage optimization process for machine learning workflows. It first empirically studies the relative impact of data pipeline construction versus algorithm hyperparameter configuration to argue for the greater importance of data preprocessing. It then introduces meta-optimizer-agnostic policies for allocating search time between the two stages and defines a metric to assess whether a given data pipeline is algorithm-specific or independent, supporting pruning and meta-learning for cold-start problems.
Significance. If the empirical comparisons and policy evaluations hold under broader conditions, the work could inform more efficient AutoML designs by directing attention to data preprocessing and providing practical, optimizer-independent time-allocation heuristics. The pipeline-specificity metric is a concrete, potentially reusable contribution for meta-learning pipelines.
major comments (2)
- [Experiments] Experiments section: the headline claim that data pipelines have greater impact than algorithm configuration rests on results from a fixed collection of datasets and particular meta-optimizers. Without additional cross-domain validation (e.g., on high-dimensional, noisy, or heterogeneous-feature datasets) or sensitivity analysis, the general methodological recommendation does not follow.
- [Time Allocation Policies] Time-allocation policies section: the reported effectiveness of the proposed policies is demonstrated only within the same experimental setup; the meta-optimizer-agnostic claim requires explicit tests on at least one additional search algorithm or a different class of meta-optimizer to confirm robustness.
minor comments (2)
- [Abstract] Abstract contains grammatical errors ('Machines learning techniques plays' should read 'Machine learning techniques play').
- [Metric Definition] Notation for the specificity metric should be introduced with a clear equation or definition before its use in the pruning discussion.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below, indicating planned revisions to the manuscript.
Point-by-point responses
- Referee: [Experiments] Experiments section: the headline claim that data pipelines have greater impact than algorithm configuration rests on results from a fixed collection of datasets and particular meta-optimizers. Without additional cross-domain validation (e.g., on high-dimensional, noisy, or heterogeneous-feature datasets) or sensitivity analysis, the general methodological recommendation does not follow.
  Authors: We agree that the headline claim would be strengthened by broader validation. The experiments use a collection of standard benchmark datasets, but we acknowledge the limitation regarding cross-domain coverage. In the revision we will add a sensitivity analysis subsection and results on additional high-dimensional and heterogeneous-feature datasets to better support the scope of the methodological recommendation.
  Revision: yes
- Referee: [Time Allocation Policies] Time-allocation policies section: the reported effectiveness of the proposed policies is demonstrated only within the same experimental setup; the meta-optimizer-agnostic claim requires explicit tests on at least one additional search algorithm or a different class of meta-optimizer to confirm robustness.
  Authors: The policies are formulated without reference to any particular meta-optimizer internals and are therefore intended to be agnostic. Nevertheless, the empirical demonstration was limited to the optimizers used in the study. To confirm robustness we will include results with at least one additional search algorithm (from a different class) in the revised Time Allocation Policies section.
  Revision: yes
Circularity Check
No circularity: empirical study with no derivations or self-referential fitting
The paper describes a two-stage optimization process and reports experimental comparisons of data-pipeline impact versus algorithm configuration, plus time-allocation policies and a specificity metric. No equations, derivations, fitted parameters presented as predictions, or load-bearing self-citations appear in the provided text. All central claims rest on direct empirical measurements rather than any reduction by construction to the paper's own inputs or prior self-citations.