Towards AutoML in the presence of Drift: first results

Eduardo F. Morales; Hugo Jair Escalante; Isabelle Guyon; Jorge G. Madrid; Lisheng Sun-Hosoya; Michele Sebag; Wei-Wei Tu; Yang Yu

arxiv: 1907.10772 · v1 · pith:6CV6HHNBnew · submitted 2019-07-24 · 💻 cs.LG · cs.AI· stat.ML

Towards AutoML in the presence of Drift: first results

Jorge G. Madrid , Hugo Jair Escalante , Eduardo F. Morales , Wei-Wei Tu , Yang Yu , Lisheng Sun-Hosoya , Isabelle Guyon , Michele Sebag This is my paper

Pith reviewed 2026-05-24 16:35 UTC · model grok-4.3

classification 💻 cs.LG cs.AIstat.ML

keywords AutoMLconcept driftlifelong learningAuto-Sklearndata distribution changemodel adaptationsupervised learning

0 comments

The pith

Auto-Sklearn can be extended with drift detection to adapt models automatically in lifelong learning with slowly changing data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an AutoML approach for supervised learning tasks where data distributions evolve gradually rather than staying fixed. It modifies Auto-Sklearn by adding mechanisms that work with concept drift detectors to decide when initial models require updates. Experiments use benchmark data from AutoML competitions that match this slow-drift, lifelong-learning scenario. The results indicate that the combined system handles these conditions more effectively than the original Auto-Sklearn.

Core claim

By extending Auto-Sklearn with mechanisms that cope with non-stationary data and pairing it with concept drift detection, the system can automatically determine when models trained on earlier data must be adapted in a lifelong learning setting where distributions change relatively slowly.

What carries the argument

Extended Auto-Sklearn combined with concept drift detection techniques that trigger model adaptation.

If this is right

AutoML pipelines can be deployed in domains such as spam filtering or user-preference modeling without assuming fixed i.i.d. data.
Initial models can be maintained automatically instead of requiring manual retraining schedules.
The same AutoML search process can be reused across successive data windows once drift is detected.
Benchmark results from AutoML competitions become directly relevant to real evolving applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Production systems using this approach might reduce the frequency of full retraining cycles.
The method could be combined with other drift-aware techniques such as online feature selection.
Further tests on datasets with varying drift speeds would clarify the range of applicability.

Load-bearing premise

Data distributions change relatively slowly over time in a lifelong learning setting.

What would settle it

A test on data that drifts rapidly enough that the drift detectors fail to trigger timely updates and performance falls below a static Auto-Sklearn baseline.

read the original abstract

Research progress in AutoML has lead to state of the art solutions that can cope quite wellwith supervised learning task, e.g., classification with AutoSklearn. However, so far thesesystems do not take into account the changing nature of evolving data over time (i.e., theystill assume i.i.d. data); even when this sort of domains are increasingly available in realapplications (e.g., spam filtering, user preferences, etc.). We describe a first attempt to de-velop an AutoML solution for scenarios in which data distribution changes relatively slowlyover time and in which the problem is approached in a lifelong learning setting. We extendAuto-Sklearn with sound and intuitive mechanisms that allow it to cope with this sort ofproblems. The extended Auto-Sklearn is combined with concept drift detection techniquesthat allow it to automatically determine when the initial models have to be adapted. Wereport experimental results in benchmark data from AutoML competitions that adhere tothis scenario. Results demonstrate the effectiveness of the proposed methodology.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a modest first extension of Auto-Sklearn that adds drift detection to trigger updates under slow concept drift in lifelong learning.

read the letter

This paper is a first attempt to make AutoML handle slow concept drift in a lifelong learning setting. The authors extend Auto-Sklearn by adding drift detection so the system can decide when to adapt its models. They test the idea on benchmark data from AutoML competitions that fit the slow-change scenario and report that the approach is effective. What is new is the combination of AutoML search with drift detection rather than treating them separately. The mechanisms are described as sound and intuitive, which is a plus for a practical system. They correctly note that standard AutoML assumes stationary data, and they target the regime where changes happen gradually enough that full re-optimization remains possible. The paper does a good job of keeping the scope realistic. It does not claim to solve fast drift or provide a complete lifelong AutoML solution. That honesty makes the contribution easier to evaluate. The soft spots are the usual ones for early work. The results are summarized at a high level with no visible error bars or detailed comparisons in the abstract. It is not clear how much the drift detection adds over just running Auto-Sklearn periodically or on recent data. The computational overhead of repeated AutoML searches is also not discussed. These are not fatal, but they mean the evidence for effectiveness is still preliminary. This kind of paper is for people building AutoML tools that need to run over time with changing data. A reader working on streaming or online learning applications will find it a useful starting point. It deserves serious referee attention because the problem is important and the basic idea is reasonable. Referees can ask for more experiments and baselines, but the direction is worth pursuing. I would bring this to a reading group if the group is interested in AutoML or concept drift. I would not cite it in my own work yet because it is framed as first results. It should go to peer review.

Referee Report

3 major / 2 minor

Summary. The manuscript describes a first attempt to extend Auto-Sklearn for lifelong learning under slow concept drift. It augments the system with mechanisms to adapt models and combines it with concept-drift detectors that trigger re-optimization when distribution shift is detected; effectiveness is demonstrated on AutoML-competition benchmarks that exhibit gradual drift.

Significance. If the empirical gains hold under proper controls, the work fills a clear gap: standard AutoML pipelines assume i.i.d. data, yet many deployed tasks (spam, user preferences) exhibit slow drift. The approach is pragmatic—leveraging existing drift detectors rather than inventing new ones—and the use of public competition benchmarks supports reproducibility.

major comments (3)

[§4] §4 (Method): the precise interface between the drift detector and Auto-Sklearn’s meta-learning / ensemble stages is not specified. It is unclear whether the detector merely triggers a full re-run or whether it supplies a warm-start or a restricted search space; without this detail the claim that the extension is “sound and intuitive” cannot be evaluated.
[§5] §5 (Experiments): no baseline that runs vanilla Auto-Sklearn periodically (or with a fixed retraining schedule) is reported. Consequently it is impossible to isolate the contribution of the drift detector versus the simple fact of periodic re-optimization.
[Table 2] Table 2 / Figure 3: the reported accuracy curves lack error bars or statistical significance tests across the multiple runs or folds; given that Auto-Sklearn itself is stochastic, the visual improvement cannot be assessed for robustness.

minor comments (2)

[Abstract] Abstract: “lead” should be “led”; “thesesystems” is missing a space.
Notation: the manuscript uses both “concept drift” and “distribution shift” without clarifying whether they are treated as synonyms or distinct notions.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

read point-by-point responses

Referee: [§4] §4 (Method): the precise interface between the drift detector and Auto-Sklearn’s meta-learning / ensemble stages is not specified. It is unclear whether the detector merely triggers a full re-run or whether it supplies a warm-start or a restricted search space; without this detail the claim that the extension is “sound and intuitive” cannot be evaluated.

Authors: We agree the interface description in §4 is insufficiently precise. The drift detector triggers a complete re-optimization of the Auto-Sklearn pipeline (including meta-learning and ensemble construction) upon detection; no warm-start or search-space restriction is currently passed from the detector. We will expand §4 with an explicit diagram and pseudocode clarifying this trigger-only interface and the exact extensions made to Auto-Sklearn’s lifelong-learning loop. revision: yes
Referee: [§5] §5 (Experiments): no baseline that runs vanilla Auto-Sklearn periodically (or with a fixed retraining schedule) is reported. Consequently it is impossible to isolate the contribution of the drift detector versus the simple fact of periodic re-optimization.

Authors: The referee correctly identifies a missing control. We will add a new baseline that periodically invokes vanilla Auto-Sklearn at fixed intervals chosen to match the average detection frequency observed with the drift detectors. Results for this baseline will be reported alongside the proposed method in the revised §5 and Table 2. revision: yes
Referee: [Table 2] Table 2 / Figure 3: the reported accuracy curves lack error bars or statistical significance tests across the multiple runs or folds; given that Auto-Sklearn itself is stochastic, the visual improvement cannot be assessed for robustness.

Authors: We accept this criticism. Auto-Sklearn’s internal stochasticity (random seeds, model selection) makes single-run curves insufficient. In the revision we will repeat all experiments with at least five independent seeds, add error bars (standard deviation) to Figure 3 and the accuracy curves in Table 2, and report paired statistical significance tests (e.g., Wilcoxon) between methods. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper describes an empirical extension of Auto-Sklearn combined with concept drift detection for slowly changing data distributions in a lifelong learning setting. It reports results on AutoML competition benchmarks without presenting equations, derivations, fitted parameters used as predictions, or load-bearing self-citations. The core claims rest on experimental outcomes rather than any self-referential definitions or reductions of results to inputs by construction. No instances of the enumerated circularity patterns are present.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Based solely on the abstract, the approach rests on the domain assumption of slow drift and the availability of suitable benchmark datasets; no free parameters or invented entities are mentioned.

axioms (1)

domain assumption Data distribution changes relatively slowly over time
Stated in the abstract as the scenario the method targets.

pith-pipeline@v0.9.0 · 5735 in / 1122 out tokens · 39530 ms · 2026-05-24T16:35:19.389166+00:00 · methodology

Towards AutoML in the presence of Drift: first results

Core claim

What carries the argument

If this is right

Where Pith is reading between the lines

Load-bearing premise

What would settle it

discussion (0)