arxiv: 1806.09055 · v2 · pith:7NAHN7TI · submitted 2018-06-24 · 💻 cs.LG · cs.CL · cs.CV · stat.ML

DARTS: Differentiable Architecture Search

classification: 💻 cs.LG · cs.CL · cs.CV · stat.ML
keywords: architecture, search, architectures, differentiable, efficient, non-differentiable, addresses, algorithm
read the original abstract

This paper addresses the scalability challenge of architecture search by formulating the task in a differentiable manner. Unlike conventional approaches of applying evolution or reinforcement learning over a discrete and non-differentiable search space, our method is based on the continuous relaxation of the architecture representation, allowing efficient search of the architecture using gradient descent. Extensive experiments on CIFAR-10, ImageNet, Penn Treebank and WikiText-2 show that our algorithm excels in discovering high-performance convolutional architectures for image classification and recurrent architectures for language modeling, while being orders of magnitude faster than state-of-the-art non-differentiable techniques. Our implementation has been made publicly available to facilitate further research on efficient architecture search algorithms.
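The abstract's central idea is to relax the discrete choice among candidate operations into a continuous one, so that the architecture itself can be optimized by gradient descent. The toy sketch below illustrates that idea under stated assumptions. The four scalar operations, the squared-error loss, the synthetic data, and the helper names (`CANDIDATE_OPS`, `mixed_op`, `loss_and_grad`, `search`) are hypothetical and do not come from the paper or its released code. Only the architecture weights `alphas` are trained here. DARTS itself also trains the network weights and alternates the two updates (weights on training data, architecture on validation data), which this sketch omits.

```python
import math
import random

# Hypothetical toy search space: each candidate op maps a scalar x to k * x.
CANDIDATE_OPS = {
    "zero": lambda x: 0.0,
    "identity": lambda x: x,
    "negate": lambda x: -x,
    "double": lambda x: 2.0 * x,
}


def softmax(alphas):
    """Numerically stable softmax over the architecture weights."""
    top = max(alphas)
    exps = [math.exp(a - top) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, alphas, ops):
    """Continuous relaxation: softmax(alphas)-weighted sum of every candidate op."""
    probs = softmax(alphas)
    return sum(p * op(x) for p, op in zip(probs, ops))


def loss_and_grad(alphas, ops, data):
    """Mean squared error of the mixed op and its analytic gradient w.r.t. alphas."""
    probs = softmax(alphas)
    loss = 0.0
    grad = [0.0] * len(alphas)
    for x, y in data:
        outs = [op(x) for op in ops]
        mixed = sum(p * o for p, o in zip(probs, outs))
        err = mixed - y
        loss += err * err
        for j, (p, o) in enumerate(zip(probs, outs)):
            # For a softmax mixture, d(mixed)/d(alpha_j) = p_j * (o_j - mixed).
            grad[j] += 2.0 * err * p * (o - mixed)
    n = len(data)
    return loss / n, [g / n for g in grad]


def search(ops, data, steps=500, lr=0.1):
    """Plain gradient descent on the architecture weights, from a uniform mixture."""
    alphas = [0.0] * len(ops)
    for _ in range(steps):
        _, grad = loss_and_grad(alphas, ops, data)
        alphas = [a - lr * g for a, g in zip(alphas, grad)]
    return alphas


if __name__ == "__main__":
    random.seed(0)
    # Toy data generated by the "double" op; stands in for a validation set.
    data = [(x, 2.0 * x) for x in (random.uniform(-1.0, 1.0) for _ in range(32))]
    names = list(CANDIDATE_OPS)
    ops = [CANDIDATE_OPS[name] for name in names]
    alphas = search(ops, data)
    # Discretize: keep the op with the largest architecture weight.
    best = max(range(len(names)), key=lambda i: alphas[i])
    print(names[best])  # double
```

Because every op in this toy space is linear in `x`, the search only has to move probability mass toward `double`; the final discrete choice is read off by keeping the operation with the largest architecture weight. In a real cell-based search space each edge would carry its own mixed operation and its own `alphas`.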

This paper has not been read by Pith yet.

Forward citations

Cited by 33 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. AGAN: Towards Automated Design of Generative Adversarial Networks

    cs.LG 2019-06 unverdicted novelty 8.0

    AGAN is the first neural architecture search method for GANs that discovers architectures outperforming state-of-the-art on CIFAR-10 unsupervised image generation and competitive on supervised tasks.

  2. 1GC-7RC: One Graphic Card -- Seven Research Challenges! How Good Are AI Agents at Doing Your Job?

    cs.LG 2026-05 unverdicted novelty 7.0

    Introduces the 1GC-7RC benchmark to evaluate AI coding agents on seven diverse ML tasks under single-GPU time and access constraints.

  3. Lattice fermion formulation via Physics-Informed Neural Networks: Ginsparg-Wilson relation and Overlap fermions

    hep-lat 2026-05 unverdicted novelty 7.0

    Physics-Informed Neural Networks construct lattice Dirac operators satisfying the Ginsparg-Wilson relation, reproducing overlap fermions to high accuracy and discovering a Fujikawa-type generalized relation via algebr...

  4. Lattice fermion formulation via Physics-Informed Neural Networks: Ginsparg-Wilson relation and Overlap fermions

    hep-lat 2026-05 unverdicted novelty 7.0

    Physics-informed neural networks construct overlap fermions by optimizing to the Ginsparg-Wilson relation and autonomously discover both the standard and generalized Fujikawa-type versions of the relation.

  5. AutoSOTA: An End-to-End Automated Research System for State-of-the-Art AI Model Discovery

    cs.CL 2026-04 unverdicted novelty 7.0

    AutoSOTA uses eight specialized agents to replicate and optimize models from recent AI papers, producing 105 new SOTA results in about five hours per paper on average.

  6. Soft Head Selection for Injecting ICL-Derived Task Embeddings

    cs.CL 2025-07 conditional novelty 7.0

    SITE applies soft gradient-based head selection to inject ICL-derived task embeddings, outperforming prior embedding adaptation and few-shot ICL across generation, reasoning, and NLU tasks on 12 LLMs from 4B to 70B pa...

  7. Switchable Normalization for Learning-to-Normalize Deep Representation

    cs.CV 2019-07 unverdicted novelty 7.0

    Switchable Normalization learns per-layer weights to combine channel, layer, and minibatch normalizers, claiming robustness to batch size and better results than fixed normalizers on ImageNet, COCO, CityScapes, ADE20K...

  8. NetTailor: Tuning the Architecture, Not Just the Weights

    cs.CV 2019-06 unverdicted novelty 7.0

    NetTailor adapts CNN architecture for new tasks by assembling pre-trained universal blocks with task-specific layers, trained via activation mimicry and complexity penalties to match accuracy while reducing size for s...

  9. PACE: Two-Timescale Self-Evolution for Small Language Model Agents

    cs.LG 2026-05 unverdicted novelty 6.0

    PACE coordinates low-risk prompt evolution with validated higher-risk control-logic updates to improve frozen SLM agents on benchmarks without model retraining.

  10. AutoMCU: Feasibility-First MCU Neural Network Customization via LLM-based Multi-Agent Systems

    cs.LG 2026-05 unverdicted novelty 6.0

    AutoMCU uses feasibility-first LLM multi-agent coordination to automate MCU-constrained neural network design, delivering competitive accuracy on CIFAR-10/100 in 1-2 hours versus hundreds of GPU hours for prior HW-NAS...

  11. PEML: Parameter-efficient Multi-Task Learning with Optimized Continuous Prompts

    cs.CL 2026-05 unverdicted novelty 6.0

    PEML co-optimizes continuous prompts and low-rank adaptations to deliver up to 6.67% average accuracy gains over existing multi-task PEFT methods on GLUE, SuperGLUE, and other benchmarks.

  12. CHAL: Council of Hierarchical Agentic Language

    cs.AI 2026-05 unverdicted novelty 6.0

    CHAL is a multi-agent dialectic system that performs structured belief optimization over defeasible domains using Bayesian-inspired graph representations and configurable meta-cognitive value system hyperparameters.

  13. Hybrid-LoRA: Bridging Full Fine-Tuning and Low-Rank Adaptation for Post-Training

    cs.LG 2026-05 unverdicted novelty 6.0

    Hybrid-LoRA selectively full fine-tunes modules with high sensitivity to low-rank adaptation using a novel score and applies LoRA elsewhere, matching full fine-tuning at 10% budget and outperforming PEFT baselines by ...

  14. SURGE: Surrogate Gradient Adaptation in Binary Neural Networks

    cs.LG 2026-05 unverdicted novelty 6.0

    SURGE introduces a dual-path gradient compensator and adaptive scaler to improve surrogate gradient estimation in binarized neural network training.

  15. SURGE: Surrogate Gradient Adaptation in Binary Neural Networks

    cs.LG 2026-05 unverdicted novelty 6.0

    SURGE proposes a dual-path gradient compensator and adaptive scaler to learn better surrogate gradients for binary neural network training, outperforming prior methods on classification, detection, and language tasks.

  16. Taming Asynchronous CPU-GPU Coupling for Frequency-aware Latency Estimation on Mobile Edge

    cs.AR 2026-04 unverdicted novelty 6.0

    FLAME models layer-wise overlapping parallelism and asynchronous CPU-GPU pipeline bubbles to estimate inference latency across frequencies with sparse profiling and low error for DNNs and SLMs.

  17. Stochastic Regret Guarantees for Online Zeroth- and First-Order Bilevel Optimization

    cs.LG 2025-11 unverdicted novelty 6.0

    Introduces a novel search direction enabling sublinear stochastic bilevel regret guarantees for first- and zeroth-order online bilevel optimization algorithms without relying on window smoothing.

  18. Quantum Circuit Design using a Progressive Widening Enhanced Monte Carlo Tree Search

    quant-ph 2025-02 unverdicted novelty 6.0

    Progressive widening MCTS with sampling action space automates quantum circuit design, cutting evaluations 10-100x and CNOT gates up to 3x versus prior MCTS on chemistry and linear-equation tasks.

  19. LLaVA-Video: Video Instruction Tuning With Synthetic Data

    cs.CV 2024-10 unverdicted novelty 6.0

    LLaVA-Video-178K is a new synthetic video instruction dataset that, when combined with existing data to train LLaVA-Video, produces strong results on video understanding benchmarks.

  20. Learnable Parameter Similarity

    cs.LG 2019-07 unverdicted novelty 6.0

    LPS uses a second-order neural network to learn an end-to-end metric for second-order parameter similarity and introduces the ModelSet500 benchmark with 500 trained models.

  21. Video Action Recognition Via Neural Architecture Searching

    cs.CV 2019-07 unverdicted novelty 6.0

    Uses differentiable NAS with temporal segments and pseudo-3D operators to discover a video action recognition network that outperforms hand-designed models on UCF101 with ~1% of the parameters when trained from scratch.

  22. End-to-end Automated Deep Neural Network Optimization for PPG-based Blood Pressure Estimation on Wearables

    cs.LG 2026-04 unverdicted novelty 5.0

    An end-to-end hardware-aware optimization pipeline produces DNNs for PPG-based blood pressure estimation with up to 7.99% lower error and 83x fewer parameters that fit on ultra-low-power SoCs like GAP8.

  23. TusoAI: Agentic Optimization for Scientific Methods

    cs.AI 2025-09 unverdicted novelty 5.0

    TusoAI is an LLM-based agent that builds and iteratively optimizes domain-specific computational methods for scientific data analysis, outperforming expert baselines on RNA-seq denoising and earth monitoring while rep...

  24. On Constraint Qualifications for MPECs with Applications to Bilevel Hyperparameter Optimization for Machine Learning

    math.OC 2025-08 unverdicted novelty 5.0

    Clarifies relationships among MPEC constraint qualifications and fully characterizes MPEC-LICQ for the MPEC from bilevel hyperparameter optimization in L1-loss SVM classification.

  25. ORFS-agent: Tool-Using Agents for Chip Design Optimization

    cs.AI 2025-06 unverdicted novelty 5.0

    ORFS-agent uses LLM agents to tune parameters in chip design flows, improving geometric-mean wirelength, clock period, and co-optimization objectives by up to 2.7% over OR-AutoTuner with 40% fewer iterations on ASAP7 ...

  26. EPNAS: Efficient Progressive Neural Architecture Search

    cs.LG 2019-07 unverdicted novelty 5.0

    EPNAS uses a progressive search policy with REINFORCE performance prediction to search neural architectures in parallel, supporting multiple resource constraints and outperforming ENAS and PNAS on CIFAR-10 and ImageNe...

  27. Deployment-Aligned Low-Precision Neural Architecture Search for Spaceborne Edge AI

    cs.CV 2026-04 unverdicted novelty 4.0

    Deployment-aligned low-precision NAS recovers about two-thirds of the accuracy drop from post-training quantization, achieving 0.826 mIoU on-device for a 95k-parameter model on Intel Movidius Myriad X without added co...

  28. Exploring Vision Neural Network Pruning via Screening Methodology

    cs.LG 2025-02 unverdicted novelty 4.0

    A unified F-statistic screening and weighted evaluation method prunes both unstructured and structured parameters in FNNs and CNNs, claiming order-of-magnitude size reduction with competitive accuracy on vision datasets.

  29. Adaptive Reorganization of Neural Pathways for Continual Learning with Spiking Neural Networks

    cs.NE 2023-09 unverdicted novelty 4.0

    SOR-SNN employs Self-Organizing Regulation networks to reorganize a single SNN into sparse pathways, achieving better performance, energy efficiency, memory use, backward transfer, and self-repair on continual learnin...

  30. Self-Adaptive 2D-3D Ensemble of Fully Convolutional Networks for Medical Image Segmentation

    eess.IV 2019-07 unverdicted novelty 4.0

    Self-adaptive 2D-3D FCN ensemble optimized by multiobjective evolution for prostate segmentation on PROMISE12 achieves top-10 ranking with smaller size than prior auto-designed models.

  31. Genetic Deep Learning for Lung Cancer Screening

    cs.CV 2019-07 unverdicted novelty 3.0

    Genetic algorithm designs a CNN for lung cancer detection in CXRs achieving 97.15% accuracy, outperforming Inception-V3 and ResNet-152 with 4x and 14x fewer parameters.

  32. Genetic Network Architecture Search

    cs.NE 2019-07 unverdicted novelty 3.0

    Genetic algorithm searches convolution cell architectures with weight sharing via SGD, reporting 96% accuracy on CIFAR-10 and 80.1% on CIFAR-100.

  33. Spiking Neural Network Architecture Search: A Survey

    cs.NE 2025-10 unverdicted novelty 2.0

    A survey of Spiking Neural Network architecture search techniques viewed through a hardware/software co-design lens.