Efficient Novelty-Driven Neural Architecture Search

Huiqi Li; Miao Zhang; Shirui Pan; Steven Su; Taoping Liu

arxiv: 1907.09109 · v1 · pith:QV5UHNV7new · submitted 2019-07-22 · 💻 cs.LG · cs.NE· stat.ML

Efficient Novelty-Driven Neural Architecture Search

Miao Zhang , Huiqi Li , Shirui Pan , Taoping Liu , Steven Su This is my paper

Pith reviewed 2026-05-24 17:59 UTC · model grok-4.3

classification 💻 cs.LG cs.NEstat.ML

keywords neural architecture searchone-shot NASnovelty searchweight sharingCIFAR-10Penn Treebanksupernet training

0 comments

The pith

Sampling architectures by novelty trains a one-shot supernet whose inherited weights better predict final test accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard one-shot neural architecture search struggles because validation accuracy on inherited weights from the supernet does not reliably indicate how well an architecture will perform after independent training. This paper replaces reward-based controllers with a novelty search procedure that selects diverse architectures for each training step of a single-path supernet. The resulting shared weights yield a stronger correlation with true test accuracy. Using this method the authors locate an architecture that reaches 2.51 percent test error on CIFAR-10 after 7.5 hours on a single GPU and competitive perplexities on PTB language modeling.

Core claim

The paper establishes that sampling architectures for supernet training via novelty search, rather than via a learned controller or validation accuracy, yields a weight-sharing model in which the accuracy of a candidate with inherited weights more reliably forecasts its accuracy after full retraining. A single-path supernet is employed so that only a subset of weights is updated per step. Extensive experiments confirm that this sampling strategy discovers high-performing cells that achieve state-of-the-art results on CIFAR-10 and PTB while requiring only modest search compute.

What carries the argument

Novelty search procedure that ranks candidate architectures by how different they are from previously sampled ones, used to decide which sub-networks update the single-path supernet at each step.

If this is right

The discovered architectures transfer successfully to ImageNet and WikiText-2.
Search completes in 7.5 hours on one GPU while matching or exceeding prior one-shot methods.
Memory footprint stays low because only one path is active during supernet training.
The absence of a separate controller simplifies the overall algorithm.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If novelty sampling improves correlation, then other diversity-promoting strategies such as evolutionary algorithms might produce similar gains.
Single-path supernets combined with novelty could scale to larger search spaces where full supernets become intractable.
The approach leaves open whether the same novelty criterion would help in multi-path or progressive shrinking variants of one-shot NAS.

Load-bearing premise

Novelty-based sampling of architectures during supernet training will produce inherited weights that correlate more strongly with final test accuracy than accuracy-based or controller-based sampling.

What would settle it

Running the novelty search procedure on the same search space and then measuring the Spearman rank correlation between inherited validation accuracy and independently trained test accuracy; if the correlation stays as low as in prior one-shot methods, the central premise fails.

read the original abstract

One-Shot Neural architecture search (NAS) attracts broad attention recently due to its capacity to reduce the computational hours through weight sharing. However, extensive experiments on several recent works show that there is no positive correlation between the validation accuracy with inherited weights from the supernet and the test accuracy after re-training for One-Shot NAS. Different from devising a controller to find the best performing architecture with inherited weights, this paper focuses on how to sample architectures to train the supernet to make it more predictive. A single-path supernet is adopted, where only a small part of weights are optimized in each step, to reduce the memory demand greatly. Furthermore, we abandon devising complicated reward based architecture sampling controller, and sample architectures to train supernet based on novelty search. An efficient novelty search method for NAS is devised in this paper, and extensive experiments demonstrate the effectiveness and efficiency of our novelty search based architecture sampling method. The best architecture obtained by our algorithm with the same search space achieves the state-of-the-art test error rate of 2.51\% on CIFAR-10 with only 7.5 hours search time in a single GPU, and a validation perplexity of 60.02 and a test perplexity of 57.36 on PTB. We also transfer these search cell structures to larger datasets ImageNet and WikiText-2, respectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper replaces controllers with novelty sampling for supernet training but reports no correlation measurements or ablations to support the key claim.

read the letter

The paper's main move is to drop the usual reward-based controller and instead sample architectures for supernet training using a novelty search procedure. They stick with a single-path supernet to keep memory low and report 2.51% CIFAR-10 error after 7.5 GPU hours plus reasonable PTB perplexities. The final numbers are competitive and the search cost is modest, which matters for anyone trying to run NAS in practice. The single-path choice is a sensible engineering step that reduces the usual memory overhead. The soft spot is the missing link between the novelty sampling and the stated goal. The abstract notes that prior one-shot methods lack correlation between supernet accuracy and final test accuracy, then positions novelty sampling as the fix, yet nothing in the reported experiments measures that correlation, compares novelty sampling against random or accuracy-driven baselines on the same supernet, or even defines the novelty metric in detail. Without those controls the good final numbers could come from the search space, the single-path design, or post-hoc choices rather than the sampling method itself. The work engages honestly with a known limitation in the one-shot NAS literature. It is worth sending to referees so they can request the ablations and correlation data; a reader working on practical NAS would get some value from seeing the method even if the central mechanism remains unproven.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a one-shot NAS method that trains a single-path supernet by sampling architectures according to a novelty search criterion instead of a learned controller or validation accuracy. The central claim is that novelty-driven sampling produces supernet weights whose inherited accuracies correlate more strongly with final test accuracy after independent retraining, enabling efficient discovery of high-performing architectures. The authors report a 2.51% CIFAR-10 test error (7.5 GPU-hours) and PTB validation/test perplexities of 60.02/57.36, with transfer results on ImageNet and WikiText-2.

Significance. If the novelty-sampling hypothesis is substantiated, the work would offer a controller-free, memory-efficient alternative to existing one-shot NAS pipelines while matching or exceeding reported performance on standard benchmarks. The single-path supernet design is a practical contribution to reducing memory footprint.

major comments (3)

[Abstract, §4] Abstract and §4 (Experiments): the central claim that novelty sampling improves correlation between inherited supernet accuracy and post-retraining test accuracy is unsupported; no correlation coefficients, scatter plots, or ablation tables compare novelty sampling against random or accuracy-driven sampling on the identical single-path supernet.
[§3] §3 (Method): the novelty metric itself is never defined mathematically; no equation, distance function, or algorithm box specifies how novelty scores are computed or how they determine sampling probabilities.
[§4] §4 (Experiments): the reported SOTA numbers (2.51% CIFAR-10, PTB perplexities) are not accompanied by any statistical test or controlled comparison isolating the novelty component from the single-path supernet architecture or post-hoc selection procedure.

minor comments (1)

[§3] Notation for the single-path supernet and weight-sharing scheme is introduced without a clear diagram or pseudocode, making the memory-reduction claim harder to verify.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to strengthen the presentation of our results.

read point-by-point responses

Referee: [Abstract, §4] Abstract and §4 (Experiments): the central claim that novelty sampling improves correlation between inherited supernet accuracy and post-retraining test accuracy is unsupported; no correlation coefficients, scatter plots, or ablation tables compare novelty sampling against random or accuracy-driven sampling on the identical single-path supernet.

Authors: We agree that direct quantitative support for the correlation claim is missing. The manuscript reports final architecture performance after novelty-driven supernet training but does not include correlation coefficients, scatter plots, or ablations against random or accuracy-driven sampling on the same single-path supernet. We will add these analyses in the revised version. revision: yes
Referee: [§3] §3 (Method): the novelty metric itself is never defined mathematically; no equation, distance function, or algorithm box specifies how novelty scores are computed or how they determine sampling probabilities.

Authors: Section 3 introduces the novelty search criterion for sampling, but we acknowledge that a precise mathematical definition is not provided. We will add the formal equation for the novelty score (including the distance function) and an algorithm box describing how novelty determines sampling probabilities in the revised manuscript. revision: yes
Referee: [§4] §4 (Experiments): the reported SOTA numbers (2.51% CIFAR-10, PTB perplexities) are not accompanied by any statistical test or controlled comparison isolating the novelty component from the single-path supernet architecture or post-hoc selection procedure.

Authors: The experiments section presents benchmark results and comparisons to prior NAS methods. However, we agree that statistical tests and controlled ablations isolating the novelty sampling component from the supernet design and selection procedure are not included. We will incorporate these in the revision. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on external public benchmarks

full rationale

The paper evaluates its novelty-driven sampling method via final test error rates on CIFAR-10 and perplexities on PTB after independent retraining of discovered architectures. These are independent external benchmarks rather than quantities defined inside the paper's equations or fitted parameters. No self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citation chains appear in the abstract or described derivation. The single-path supernet and novelty sampling procedure are presented as design choices whose effectiveness is measured against public datasets, making the central result self-contained against external validation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that novelty-based sampling improves supernet predictiveness; no explicit free parameters or invented entities are stated in the abstract.

axioms (1)

domain assumption Weight sharing within a single-path supernet allows partial weight updates to train many architectures efficiently.
Invoked when the paper adopts the single-path supernet to reduce memory demand.

pith-pipeline@v0.9.0 · 5780 in / 1213 out tokens · 27266 ms · 2026-05-24T17:59:29.545883+00:00 · methodology

Efficient Novelty-Driven Neural Architecture Search

Core claim

What carries the argument

If this is right

Where Pith is reading between the lines

Load-bearing premise

What would settle it

discussion (0)