pith. machine review for the scientific record.

arxiv: 1711.09846 · v2 · submitted 2017-11-27 · 💻 cs.LG · cs.NE

Recognition: unknown

Population Based Training of Neural Networks

Authors on Pith: no claims yet
classification: 💻 cs.LG · cs.NE
keywords: training · hyperparameter · hyperparameters · learning · maximise · networks · performance · population
original abstract

Neural networks dominate the modern machine learning landscape, but their training and success still suffer from sensitivity to empirical choices of hyperparameters such as model architecture, loss function, and optimisation algorithm. In this work we present \emph{Population Based Training (PBT)}, a simple asynchronous optimisation algorithm which effectively utilises a fixed computational budget to jointly optimise a population of models and their hyperparameters to maximise performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training. With just a small modification to a typical distributed hyperparameter training framework, our method allows robust and reliable training of models. We demonstrate the effectiveness of PBT on deep reinforcement learning problems, showing faster wall-clock convergence and higher final performance of agents by optimising over a suite of hyperparameters. In addition, we show the same method can be applied to supervised learning for machine translation, where PBT is used to maximise the BLEU score directly, and also to training of Generative Adversarial Networks to maximise the Inception score of generated images. In all cases PBT results in the automatic discovery of hyperparameter schedules and model selection which results in stable training and better final performance.
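To make the exploit/explore loop described in the abstract concrete, here is a minimal sketch of PBT on a toy quadratic objective. This is not the paper's code: the objective, population size, update interval, and selection cutoff are illustrative assumptions, and the paper's PBT runs asynchronously across distributed workers rather than in a synchronous loop like this. The truncation-selection pattern (bottom performers copy the top performers, then perturb the inherited hyperparameters) follows the strategy the paper describes.

```python
import random

# Hypothetical toy PBT sketch: each population member descends a quadratic
# loss with its own learning rate. Every INTERVAL steps, the bottom CUTOFF
# fraction copies weights and hyperparameters from the top fraction
# (exploit), then multiplies the inherited learning rate by a random
# factor (explore). All constants here are illustrative assumptions.

POP_SIZE = 10
STEPS = 200
INTERVAL = 10          # steps between exploit/explore phases
PERTURB = (0.8, 1.2)   # multiplicative perturbation factors
CUTOFF = 0.2           # truncation selection fraction

def loss(theta):
    # Toy objective: a simple quadratic bowl.
    return sum(t * t for t in theta)

def sgd_step(theta, lr):
    # Gradient of the quadratic is 2*theta.
    return [t - lr * 2.0 * t for t in theta]

# Each member carries weights ("theta") and a hyperparameter ("lr").
population = [
    {"theta": [random.uniform(-1, 1), random.uniform(-1, 1)],
     "lr": 10 ** random.uniform(-3, -1)}
    for _ in range(POP_SIZE)
]

for step in range(1, STEPS + 1):
    # Train: one optimisation step per member.
    for m in population:
        m["theta"] = sgd_step(m["theta"], m["lr"])

    # Exploit + explore at regular intervals.
    if step % INTERVAL == 0:
        ranked = sorted(population, key=lambda m: loss(m["theta"]))
        k = max(1, int(CUTOFF * POP_SIZE))
        for loser in ranked[-k:]:
            winner = random.choice(ranked[:k])
            loser["theta"] = list(winner["theta"])               # exploit: copy weights
            loser["lr"] = winner["lr"] * random.choice(PERTURB)  # explore: perturb hyperparameter

best = min(population, key=lambda m: loss(m["theta"]))
print(f"best loss {loss(best['theta']):.6f} with final lr {best['lr']:.4g}")
```

Because the learning rate is re-perturbed at every exploit step, each surviving lineage traces out a learning-rate schedule over training rather than a single fixed value, which is the behaviour the abstract highlights.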

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. 3D-Anchored Lookahead Planning for Persistent Robotic Scene Memory via World-Model-Based MCTS

    cs.RO 2026-04 unverdicted novelty 7.0

    3D-ALP achieves 0.65 success on memory-dependent 5-step robotic reach tasks versus near-zero for reactive baselines by anchoring MCTS planning to a persistent 3D camera-to-world frame.

  2. Reward-Guided Semantic Evolution for Test-time Adaptive Object Detection

    cs.CV 2026-05 unverdicted novelty 6.0

    RGSE adapts text embeddings at test time via evolutionary search, using cosine similarity rewards from high-confidence visual proposals to improve open-vocabulary object detection under distribution shifts.

  3. Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution

    cs.CL 2026-04 unverdicted novelty 6.0

    Vocabulary dropout prevents diversity collapse in LLM co-evolution by masking proposer logits, yielding average +4.4 point solver gains on mathematical reasoning benchmarks at 8B scale.

  4. Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning

    cs.RO 2025-11 unverdicted novelty 6.0

    Isaac Lab is a unified GPU-native platform combining high-fidelity physics, photorealistic rendering, multi-frequency sensors, domain randomization, and learning pipelines for scalable multi-modal robot policy training.

  5. Can Tabular Foundation Models Guide Exploration in Robot Policy Learning?

    cs.RO 2026-04 unverdicted novelty 5.0

    TFM-S3 uses a tabular foundation model to predict returns and guide intermittent global exploration within an SVD-derived policy subspace, yielding faster early convergence and better final performance than TD3 and po...

  6. Scalable Hyperparameter-Divergent Ensemble Training with Automatic Learning Rate Exploration for Large Models

    cs.LG 2026-04 unverdicted novelty 5.0

    HDET lets data-parallel replicas explore a spread of learning rates independently before averaging parameters, with an auto-LR controller driven by inter-replica loss differences to produce a self-adapting schedule wi...

  7. Generalization Guarantees on Data-Driven Tuning of Gradient Descent with Langevin Updates

    cs.LG 2026-04 unverdicted novelty 5.0

    LGD reaches Bayes optimality at optimal hyperparameters and admits an O(dh) pseudo-dimension bound for meta-learning hyperparameters on convex regression tasks.

  8. EvoNash-MARL: A Closed-Loop Multi-Agent Reinforcement Learning Framework for Medium-Horizon Equity Allocation

    cs.AI 2026-04 unverdicted novelty 4.0

    EvoNash-MARL achieves 19.6% annualized returns on equity allocation from 2014-2024 versus 11.7% for SPY, with evidence of robustness under constraints but no strong statistical superiority per WRC and SPA-lite tests.