Population Based Training of Neural Networks

2017 · cs.LG · arXiv 1711.09846

22 Pith papers cite this work.

abstract

Neural networks dominate the modern machine learning landscape, but their training and success still suffer from sensitivity to empirical choices of hyperparameters such as model architecture, loss function, and optimisation algorithm. In this work we present Population Based Training (PBT), a simple asynchronous optimisation algorithm which effectively utilises a fixed computational budget to jointly optimise a population of models and their hyperparameters to maximise performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally sub-optimal strategy of trying to find a single fixed set to use for the whole course of training. With just a small modification to a typical distributed hyperparameter training framework, our method allows robust and reliable training of models. We demonstrate the effectiveness of PBT on deep reinforcement learning problems, showing faster wall-clock convergence and higher final performance of agents by optimising over a suite of hyperparameters. In addition, we show the same method can be applied to supervised learning for machine translation, where PBT is used to maximise the BLEU score directly, and also to training of Generative Adversarial Networks to maximise the Inception score of generated images. In all cases PBT results in the automatic discovery of hyperparameter schedules and model selection which results in stable training and better final performance.
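The abstract describes PBT only at a high level. Below is a minimal, synchronous sketch of the exploit/explore idea, assuming a toy quadratic objective, a single tuned hyperparameter (the learning rate), truncation selection (the bottom 20% of workers copy a random member of the top 20%), and multiplicative perturbation by 0.8 or 1.2. The names `Worker`, `train_steps`, `evaluate`, `pbt`, and `TARGET` are hypothetical illustrations, not the paper's code; the paper's method runs workers asynchronously rather than in lock-step rounds.

```python
import random
from dataclasses import dataclass

# Hypothetical toy problem: minimise a quadratic whose optimum is TARGET.
TARGET = [3.0, -2.0]


@dataclass
class Worker:
    theta: list[float]            # model parameters
    lr: float                     # the single hyperparameter being tuned
    score: float = float("-inf")  # last measured performance (higher is better)


def loss(theta: list[float]) -> float:
    # Squared distance to TARGET (t * t avoids float-pow OverflowError).
    return sum((t - c) * (t - c) for t, c in zip(theta, TARGET))


def train_steps(w: Worker, steps: int) -> None:
    # Plain gradient descent using this worker's current learning rate.
    for _ in range(steps):
        grad = [2.0 * (t - c) for t, c in zip(w.theta, TARGET)]
        w.theta = [t - w.lr * g for t, g in zip(w.theta, grad)]


def evaluate(w: Worker) -> float:
    # Performance is the negated loss, so higher is better.
    w.score = -loss(w.theta)
    return w.score


def pbt(pop_size: int = 10, rounds: int = 20, steps_per_round: int = 5,
        frac: float = 0.2, seed: int = 0) -> Worker:
    rng = random.Random(seed)
    # Log-uniform initial learning rates; every worker starts from theta = 0.
    population = [Worker(theta=[0.0, 0.0], lr=10 ** rng.uniform(-4.0, -0.5))
                  for _ in range(pop_size)]
    n = max(1, int(frac * pop_size))
    for _ in range(rounds):
        # Train every member for a short interval, then score it.
        for w in population:
            train_steps(w, steps_per_round)
            evaluate(w)
        ranked = sorted(population, key=lambda m: m.score, reverse=True)
        top, bottom = ranked[:n], ranked[-n:]
        for loser in bottom:
            winner = rng.choice(top)
            # Exploit: copy the winner's parameters and hyperparameter.
            loser.theta = list(winner.theta)
            # Explore: perturb the copied learning rate, clamped to a stable range.
            loser.lr = min(max(winner.lr * rng.choice([0.8, 1.2]), 1e-5), 0.9)
    # Re-score so that workers overwritten in the last round have fresh scores.
    for w in population:
        evaluate(w)
    return max(population, key=lambda m: m.score)


if __name__ == "__main__":
    best = pbt()
    print(best.lr, loss(best.theta))
```

Because losers inherit a perturbed copy of a winner's learning rate, each lineage's learning rate changes from round to round; this is the sense in which PBT yields a hyperparameter schedule rather than a single fixed setting. The clamp to `[1e-5, 0.9]` is an assumption added only to keep this toy gradient descent stable.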

citation-role summary

background 1

citation-polarity summary

unclear 1

representative citing papers

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

cs.CL · 2023-09-28 · unverdicted · novelty 8.0

Promptbreeder evolves both task prompts and the mutation prompts that improve them using LLMs, outperforming Chain-of-Thought and Plan-and-Solve on arithmetic and commonsense reasoning benchmarks.

What Do Evolutionary Coding Agents Evolve?

cs.NE · 2026-05-19 · unverdicted · novelty 7.0

Evolutionary coding agents achieve most benchmark gains through a small subset of edit types and by cycling previously deleted code lines rather than developing new algorithmic structures.

Auto-FP: An Experimental Study of Automated Feature Preprocessing for Tabular Data

cs.LG · 2023-10-04 · unverdicted · novelty 7.0

Experimental comparison of 15 HPO and NAS algorithms for automated feature preprocessing on 45 tabular datasets finds evolution-based methods and random search as top performers.

3D-Anchored Lookahead Planning for Persistent Robotic Scene Memory via World-Model-Based MCTS

cs.RO · 2026-04-13 · unverdicted · novelty 7.0

3D-ALP achieves a 0.65 success rate on memory-dependent 5-step robotic reach tasks versus near-zero for reactive baselines by anchoring MCTS planning to a persistent 3D camera-to-world frame.

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play

cs.AI · 2026-05-16 · unverdicted · novelty 6.0

PopuLoRA shows that co-evolving populations of LoRA adapters through cross-evaluated self-play can outperform compute-matched single-agent baselines on multiple code and math reasoning benchmarks.

Beyond Partner Diversity: An Influence-Based Team Steering Framework for Zero-Shot Human-Machine Teaming

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

The IBTS framework uses influence shaping to improve zero-shot human-machine teaming beyond partner diversity alone, with gains shown in Overcooked-AI simulations and a 30-subject human study.

Large Language Models as Optimization Controllers: Adaptive Continuation for SIMP Topology Optimization

cs.CE · 2026-03-26 · unverdicted · novelty 6.0

An LLM acting as a real-time controller for SIMP topology optimization parameters outperforms fixed schedules and heuristics, delivering 5.7-18.1% lower compliance on 2D and 3D benchmarks.

Differentiable Evolutionary Reinforcement Learning

cs.AI · 2025-12-15 · unverdicted · novelty 6.0

DERL is a differentiable bi-level method that evolves optimal reward structures for RL policies by composing atomic primitives and using meta-gradients from validation performance.

Proximal Policy Distillation

cs.LG · 2024-07-21 · conditional · novelty 6.0

PPD integrates PPO into policy distillation so the student collects and uses its own rewards, yielding better sample efficiency and robustness than standard student-distill or teacher-distill on Atari, MuJoCo, and Procgen tasks.

Attentive Multi-Task Deep Reinforcement Learning

cs.LG · 2019-07-05 · unverdicted · novelty 6.0

An attention mechanism dynamically groups task knowledge at state granularity in multi-task DRL to enable positive transfer and avoid negative transfer, matching or exceeding prior methods with fewer parameters.

Reward-Guided Semantic Evolution for Test-time Adaptive Object Detection

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

RGSE adapts text embeddings at test time via evolutionary search, using cosine similarity rewards from high-confidence visual proposals to improve open-vocabulary object detection under distribution shifts.

Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution

cs.CL · 2026-04-03 · unverdicted · novelty 6.0

Vocabulary dropout prevents diversity collapse in LLM co-evolution by masking proposer logits, yielding an average solver gain of +4.4 points on mathematical reasoning benchmarks at 8B scale.

Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning

cs.RO · 2025-11-06 · unverdicted · novelty 6.0

Isaac Lab is a unified GPU-native platform combining high-fidelity physics, photorealistic rendering, multi-frequency sensors, domain randomization, and learning pipelines for scalable multi-modal robot policy training.

FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast

cs.AI · 2026-05-15 · unverdicted · novelty 5.0

FORGE is a staged population protocol that evolves prompt-injected memory (Rules, Examples, or Mixed) for ReAct agents via reflection and broadcast, yielding 1.7-7.7× gains over zero-shot and 29-72% over Reflexion on CybORG CAGE-2.

Growing Action Spaces

cs.LG · 2019-06-28 · unverdicted · novelty 5.0

A curriculum of growing action spaces combined with simultaneous off-policy value estimation accelerates learning in large multi-agent action spaces.

Can Tabular Foundation Models Guide Exploration in Robot Policy Learning?

cs.RO · 2026-04-30 · unverdicted · novelty 5.0

TFM-S3 uses a tabular foundation model to predict returns and guide intermittent global exploration within an SVD-derived policy subspace, yielding faster early convergence and better final performance than TD3 and population-based methods under fixed rollout budgets.

Scalable Hyperparameter-Divergent Ensemble Training with Automatic Learning Rate Exploration for Large Models

cs.LG · 2026-04-27 · unverdicted · novelty 5.0

HDET lets data-parallel replicas explore a spread of learning rates independently before averaging parameters, with an auto-LR controller driven by inter-replica loss differences to produce a self-adapting schedule without extra sweeps.

Generalization Guarantees on Data-Driven Tuning of Gradient Descent with Langevin Updates

cs.LG · 2026-04-13 · unverdicted · novelty 5.0

LGD reaches Bayes optimality at optimal hyperparameters and admits an O(dh) pseudo-dimension bound for meta-learning hyperparameters on convex regression tasks.

Acoustic Model Optimization Based On Evolutionary Stochastic Gradient Descent with Anchors for Automatic Speech Recognition

cs.CL · 2019-07-10 · unverdicted · novelty 4.0

ESGD with anchors guarantees no degradation from the anchor model and reports improved loss and ASR performance on BN50 and SWB300 datasets.

EvoNash-MARL: A Closed-Loop Multi-Agent Reinforcement Learning Framework for Medium-Horizon Equity Allocation

cs.AI · 2026-04-13 · unverdicted · novelty 4.0

EvoNash-MARL achieves 19.6% annualized returns on equity allocation from 2014-2024 versus 11.7% for SPY, with evidence of robustness under constraints but no strong statistical superiority per WRC and SPA-lite tests.

Automated Machine Learning in Practice: State of the Art and Recent Results

cs.LG · 2019-07-19 · unverdicted · novelty 3.0

A survey of AutoML methods, with benchmarks of their performance on business applications.

On the notion of number in humans and machines

cs.CV · 2019-06-27 · unverdicted · novelty 2.0

Experiments indicate deep learning models achieve higher accuracy on numerosity tasks for counts below human subitizing capacity.
