Neural Architecture Search with Reinforcement Learning

Barret Zoph; Quoc V. Le

arxiv: 1611.01578 · v2 · pith:BHOWV3D6new · submitted 2016-11-05 · 💻 cs.LG · cs.AI · cs.NE

classification 💻 cs.LG · cs.AI · cs.NE

keywords model · cell · neural · state-of-the-art · achieves · architecture · learning · networks

Neural networks are powerful and flexible models that work well for many difficult learning tasks in image, speech and natural language understanding. Despite their success, neural networks are still hard to design. In this paper, we use a recurrent network to generate the model descriptions of neural networks and train this RNN with reinforcement learning to maximize the expected accuracy of the generated architectures on a validation set. On the CIFAR-10 dataset, our method, starting from scratch, can design a novel network architecture that rivals the best human-invented architecture in terms of test set accuracy. Our CIFAR-10 model achieves a test error rate of 3.65 percent, which is 0.09 percent better and 1.05x faster than the previous state-of-the-art model that used a similar architectural scheme. On the Penn Treebank dataset, our model can compose a novel recurrent cell that outperforms the widely-used LSTM cell, and other state-of-the-art baselines. Our cell achieves a test set perplexity of 62.4 on the Penn Treebank, which is 3.6 perplexity better than the previous state-of-the-art model. The cell can also be transferred to the character language modeling task on PTB and achieves a state-of-the-art perplexity of 1.214.
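
The search loop described in the abstract (a controller samples an architecture description, the architecture is scored on a validation set, and the score is used as a reinforcement learning reward) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a REINFORCE-style policy gradient with a moving-average baseline, replaces the recurrent controller with independent softmax distributions (one per decision), and replaces child-network training with a hypothetical `toy_reward` function. The search space, all function names, and all hyperparameters are illustrative assumptions.

```python
import math
import random

# Hypothetical toy search space: one list of choices per controller decision.
SEARCH_SPACE = [
    [1, 3, 5, 7],      # filter height
    [1, 3, 5, 7],      # filter width
    [24, 36, 48, 64],  # number of filters
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_architecture(logits, rng):
    """Sample one choice index per decision from the controller's policies."""
    actions = []
    for step_logits in logits:
        probs = softmax(step_logits)
        actions.append(rng.choices(range(len(probs)), weights=probs)[0])
    return actions


def toy_reward(actions):
    """Stand-in for validation accuracy (assumption: no network is trained).

    Returns the fraction of decisions that match an arbitrary target.
    """
    target = [1, 1, 2]  # indices into SEARCH_SPACE, i.e. 3x3 filters, 48 channels
    matches = sum(a == t for a, t in zip(actions, target))
    return matches / len(target)


def train_controller(steps=3000, lr=0.3, baseline_decay=0.9, seed=0):
    """Update the controller logits with a REINFORCE-style rule."""
    rng = random.Random(seed)
    logits = [[0.0] * len(choices) for choices in SEARCH_SPACE]
    baseline = 0.0
    for _ in range(steps):
        actions = sample_architecture(logits, rng)
        reward = toy_reward(actions)
        advantage = reward - baseline
        for step_logits, action in zip(logits, actions):
            probs = softmax(step_logits)
            for i in range(len(step_logits)):
                # Gradient of log softmax probability of the sampled action.
                grad_log_prob = (1.0 if i == action else 0.0) - probs[i]
                step_logits[i] += lr * advantage * grad_log_prob
        # Exponential moving average of past rewards reduces variance.
        baseline = baseline_decay * baseline + (1 - baseline_decay) * reward
    return logits


def best_architecture(logits):
    """Return the most probable choice for each decision."""
    arch = []
    for choices, step_logits in zip(SEARCH_SPACE, logits):
        best_index = max(range(len(choices)), key=lambda i: step_logits[i])
        arch.append(choices[best_index])
    return arch
```

Calling `best_architecture(train_controller())` returns the highest-probability choice for each decision. On this toy reward the policy should drift toward the target choices. In a real search, `toy_reward` would be replaced by training the sampled network and measuring its validation accuracy, which is by far the most expensive step.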

Forward citations

Cited by 45 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
cs.CL 2026-05 conditional novelty 8.0

AutoTTS discovers width-depth test-time scaling controllers through agentic search in a pre-collected trajectory environment, yielding better accuracy-cost tradeoffs than hand-designed baselines on math reasoning task...
AGAN: Towards Automated Design of Generative Adversarial Networks
cs.LG 2019-06 unverdicted novelty 8.0

AGAN is the first neural architecture search method for GANs that discovers architectures outperforming state-of-the-art on CIFAR-10 unsupervised image generation and competitive on supervised tasks.
1GC-7RC: One Graphic Card -- Seven Research Challenges! How Good Are AI Agents at Doing Your Job?
cs.LG 2026-05 unverdicted novelty 7.0

Introduces the 1GC-7RC benchmark to evaluate AI coding agents on seven diverse ML tasks under single-GPU time and access constraints.
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling
cs.CL 2026-05 unverdicted novelty 7.0

AutoTTS discovers superior test-time scaling strategies for LLMs via cheap controller synthesis in a pre-collected trajectory environment, outperforming manual baselines on math benchmarks with low discovery cost.
AutoSOTA: An End-to-End Automated Research System for State-of-the-Art AI Model Discovery
cs.CL 2026-04 unverdicted novelty 7.0

AutoSOTA uses eight specialized agents to replicate and optimize models from recent AI papers, producing 105 new SOTA results in about five hours per paper on average.
Neural Architecture Search of Time-to-First-Spike-Coded Spiking Neural Networks for Efficient Eye-based Emotion Recognition
cs.NE 2025-12 unverdicted novelty 7.0

TNAS-ER uses an ANN-assisted evolutionary search to optimize TTFS SNN architectures, achieving high emotion recognition performance with improved energy efficiency on neuromorphic hardware.
Soft Head Selection for Injecting ICL-Derived Task Embeddings
cs.CL 2025-07 conditional novelty 7.0

SITE applies soft gradient-based head selection to inject ICL-derived task embeddings, outperforming prior embedding adaptation and few-shot ICL across generation, reasoning, and NLU tasks on 12 LLMs from 4B to 70B pa...
COCO-Inpaint: A Benchmark for Detecting and Localizing Inpainting-Based Image Manipulations
cs.CV 2025-04 unverdicted novelty 7.0

COCO-Inpaint supplies a large-scale dataset and evaluation protocol focused on inpainting-based image forgeries to benchmark existing detection methods.
Auto-FP: An Experimental Study of Automated Feature Preprocessing for Tabular Data
cs.LG 2023-10 unverdicted novelty 7.0

Experimental comparison of 15 HPO and NAS algorithms for automated feature preprocessing on 45 tabular datasets finds evolution-based methods and random search as top performers.
Learning to learn with quantum neural networks via classical neural networks
quant-ph 2019-07 unverdicted novelty 7.0

Classical RNNs trained on small instances provide parameter initializations for QAOA and VQE that reduce total optimization iterations and generalize across problem sizes.
Blending-target Domain Adaptation by Adversarial Meta-Adaptation Networks
cs.LG 2019-07 unverdicted novelty 7.0

AMEAN applies adversarial meta-learning to discover implicit meta-sub-target clusters in blended target data, reducing intra-target category misalignment and outperforming standard DA methods on three BTDA benchmarks.
Neural Network Architecture Search with Differentiable Cartesian Genetic Programming for Regression
cs.NE 2019-07 unverdicted novelty 7.0

dCGPANN encodes neural nets so evolutionary operators can rewire, prune, adapt activations and add skips while gradient descent tunes parameters, yielding smaller networks with lower regression error in fixed time.
NetTailor: Tuning the Architecture, Not Just the Weights
cs.CV 2019-06 unverdicted novelty 7.0

NetTailor adapts CNN architecture for new tasks by assembling pre-trained universal blocks with task-specific layers, trained via activation mimicry and complexity penalties to match accuracy while reducing size for s...
Surrogate Neural Architecture Codesign Package (SNAC-Pack)
cs.LG 2026-05 unverdicted novelty 6.0

SNAC-Pack automates hardware-aware neural architecture codesign for FPGAs via surrogate-based multi-objective search, QAT/pruning compression, and hls4ml synthesis, yielding compact models with reduced resources on je...
RELO: Reinforcement Learning to Localize for Visual Object Tracking
cs.CV 2026-05 unverdicted novelty 6.0

RELO formulates visual object tracking localization as a Markov decision process solved by reinforcement learning with combined IoU and AUC rewards, augmented by layer-aligned temporal token propagation, and reports 5...
RELO: Reinforcement Learning to Localize for Visual Object Tracking
cs.CV 2026-05 unverdicted novelty 6.0

RELO replaces handcrafted spatial priors with a reinforcement learning policy for target localization in visual tracking and reports 57.5% AUC on LaSOText without template updates.
OMEGA: Optimizing Machine Learning by Evaluating Generated Algorithms
cs.AI 2026-04 unverdicted novelty 6.0

OMEGA framework generates novel ML classifiers via meta-prompts and executable code that outperform scikit-learn baselines on 20 benchmark datasets.
TRON: Trainable, architecture-reconfigurable random optical neural networks
physics.optics 2026-04 unverdicted novelty 6.0

TRON demonstrates a trainable and reconfigurable optical neural network that combines multi-scattering media with DMD-based matrix multiplication and performs in-situ optimization plus neural architecture search on th...
DeepFedNAS: Efficient Hardware-Aware Architecture Adaptation for Heterogeneous IoT Federations via Pareto-Guided Supernet Training
cs.LG 2026-01 unverdicted novelty 6.0

DeepFedNAS delivers up to 1.21% higher accuracy and 61x faster architecture search for federated learning on heterogeneous IoT by replacing random supernet sampling with Pareto-optimal elite architectures and using a ...
LLaVA-Video: Video Instruction Tuning With Synthetic Data
cs.CV 2024-10 unverdicted novelty 6.0

LLaVA-Video-178K is a new synthetic video instruction dataset that, when combined with existing data to train LLaVA-Video, produces strong results on video understanding benchmarks.
AutoPV: Automatically Design Your Photovoltaic Power Forecasting Model
cs.LG 2024-08 unverdicted novelty 6.0

AutoPV applies neural architecture search with a custom search space drawn from time series forecasting and photovoltaic models to automatically produce architectures that outperform predefined state-of-the-art models...
Learnable Parameter Similarity
cs.LG 2019-07 unverdicted novelty 6.0

LPS uses a second-order neural network to learn an end-to-end metric for second-order parameter similarity and introduces the ModelSet500 benchmark with 500 trained models.
Video Action Recognition Via Neural Architecture Searching
cs.CV 2019-07 unverdicted novelty 6.0

Uses differentiable NAS with temporal segments and pseudo-3D operators to discover a video action recognition network that outperforms hand-designed models on UCF101 with ~1% of the parameters when trained from scratch.
Sibyl-AutoResearch: Autonomous Research Needs Self-Evolving Trial-and-Error Harnesses, Not Paper Generators
cs.MA 2026-05 unverdicted novelty 5.0

Sibyl-AutoResearch introduces self-evolving trial-and-error harnesses with auditable conversion units that link trial signals to updated research behaviors and harness repairs in autonomous systems.
Heterogeneous Connectivity in Sparse Networks: Fan-in Profiles, Gradient Hierarchy, and Topological Equilibria
cs.LG 2026-04 unverdicted novelty 5.0

Arbitrary heterogeneous fan-in profiles in sparse networks match uniform random accuracy at high sparsity, but initializing RigL dynamic sparse training with equilibrium-matched lognormal profiles improves performance...
From LLM to Silicon: RL-Driven ASIC Architecture Exploration for On-Device AI Inference
cs.AR 2026-04 unverdicted novelty 5.0

An RL agent using Soft Actor-Critic with Mixture-of-Experts jointly optimizes ASIC architecture, memory hierarchy, and partitioning for AI inference, achieving 29809 tokens/s for Llama 3.1 at 3nm and under 13mW for Sm...
Efficient Accelerated Graph Edit Distance Computation on GPU
cs.DC 2026-03 unverdicted novelty 5.0

FAST-GED delivers orders-of-magnitude speedups over NetworkX for graph edit distance on GPUs while often reaching optimal solutions and outperforming approximate methods.
Optimized Architectures for Kolmogorov-Arnold Networks
cs.LG 2025-12 unverdicted novelty 5.0

Overprovisioned KANs with sparsification, deep supervision, and depth selection under differentiable MDL yield smaller models with competitive accuracy on benchmarks.
CoLLM-NAS: Collaborative Large Language Models for Efficient Knowledge-Guided Neural Architecture Search
cs.AI 2025-09 unverdicted novelty 5.0

CoLLM-NAS introduces a collaborative two-LLM framework with Navigator, Generator, and Coordinator modules to perform knowledge-guided neural architecture search, reporting state-of-the-art results on ImageNet and NAS-...
Bridging the phenotype-target gap for molecular generation via multi-objective reinforcement learning
cs.LG 2025-09 unverdicted novelty 5.0

SmilesGEN uses dual VAEs to jointly model drug structures and transcriptional responses, generating molecules with higher validity, novelty, and similarity to known ligands than prior methods.
Implantable Adaptive Cells: A Novel Enhancement for Pre-Trained U-Nets in Medical Image Segmentation
cs.CV 2024-05 unverdicted novelty 5.0

Introduces Implantable Adaptive Cells inserted into pre-trained U-Nets via Partially-Connected DARTS to achieve approximately 5 percentage point gains in segmentation accuracy on four medical MRI/CT datasets.
EPNAS: Efficient Progressive Neural Architecture Search
cs.LG 2019-07 unverdicted novelty 5.0

EPNAS uses a progressive search policy with REINFORCE performance prediction to search neural architectures in parallel, supporting multiple resource constraints and outperforming ENAS and PNAS on CIFAR-10 and ImageNe...
ARMIN: Towards a More Efficient and Light-weight Recurrent Memory Network
cs.LG 2019-06 unverdicted novelty 5.0

ARMIN introduces auto-addressing via hidden states and a novel RNN cell to produce a lighter recurrent memory network with lower overhead than existing MANNs or vanilla LSTMs.
Learning to Cope with Adversarial Attacks
cs.LG 2019-06 unverdicted novelty 5.0

MLAH agent in deep RL demonstrates hierarchical coping mechanisms and improved reward maintenance under spaced adversarial attacks, at the expense of stability.
Hyp-RL : Hyperparameter Optimization by Reinforcement Learning
cs.LG 2019-06 unverdicted novelty 5.0

Reinforcement learning selects hyperparameters sequentially by learning from actual future validation loss reductions and outperforms SMBO methods on 50 datasets.
BayMOTH: Bayesian optiMizatiOn with meTa-lookahead -- a simple approacH
cs.LG 2026-04 unverdicted novelty 4.0

BayMOTH unifies meta-Bayesian optimization with a usefulness-based fallback to lookahead, demonstrating competitive results on function optimization tasks even under low task relatedness.
Exploring Vision Neural Network Pruning via Screening Methodology
cs.LG 2025-02 unverdicted novelty 4.0

A unified F-statistic screening and weighted evaluation method prunes both unstructured and structured parameters in FNNs and CNNs, claiming order-of-magnitude size reduction with competitive accuracy on vision datasets.
Training speedups via batching for geometric learning: an analysis of static and dynamic algorithms
cs.LG 2025-02 unverdicted novelty 4.0

Experiments on QM9 and AFLOW datasets show that static and dynamic batching for GNNs can yield up to 2.7x training speedups depending on data, model, batch size, hardware, and training steps, with occasional differenc...
Self-Adaptive 2D-3D Ensemble of Fully Convolutional Networks for Medical Image Segmentation
eess.IV 2019-07 unverdicted novelty 4.0

Self-adaptive 2D-3D FCN ensemble optimized by multiobjective evolution for prostate segmentation on PROMISE12 achieves top-10 ranking with smaller size than prior auto-designed models.
MLFriend: Interactive Prediction Task Recommendation for Event-Driven Time-Series Data
cs.LG 2019-06 unverdicted novelty 4.0

MLFriend enumerates prediction tasks for event-driven time-series data and interactively recommends useful ones, with evaluation on three datasets yielding 2885 tasks of which 722 were deemed useful by experts.
Split and Aggregation Learning for Foundation Models Over Mobile Embodied AI Network (MEAN): A Comprehensive Survey
cs.IT 2026-05 unverdicted novelty 3.0

The paper surveys split and aggregation learning for foundation models in 6G networks to improve efficiency, resource use, and data privacy in distributed AI.
Genetic Deep Learning for Lung Cancer Screening
cs.CV 2019-07 unverdicted novelty 3.0

Genetic algorithm designs a CNN for lung cancer detection in CXRs achieving 97.15% accuracy, outperforming Inception-V3 and ResNet-152 with 4x and 14x fewer parameters.
A Unified Deep Framework for Joint 3D Pose Estimation and Action Recognition from a Single RGB Camera
cs.CV 2019-07 unverdicted novelty 3.0

A multitask framework lifts 2D keypoints to 3D poses via a two-stream network then applies ENAS to model spatio-temporal pose evolution for action recognition on Human3.6M, MSR Action3D and SBU datasets.
Genetic Network Architecture Search
cs.NE 2019-07 unverdicted novelty 3.0

Genetic algorithm searches convolution cell architectures with weight sharing via SGD, reporting 96% accuracy on CIFAR10 and 80.1% on CIFAR100.
Spiking Neural Network Architecture Search: A Survey
cs.NE 2025-10 unverdicted novelty 2.0

A survey of Spiking Neural Network architecture search techniques viewed through a hardware/software co-design lens.