arxiv: 2211.09760 · v1 · pith:IX2NZSVRnew · submitted 2022-11-17 · 💻 cs.LG · math.OC · stat.ML

VeLO: Training Versatile Learned Optimizers by Scaling Up

classification 💻 cs.LG · math.OC · stat.ML
keywords optimizer · deep · learning · optimizers · hand-designed · learned · models · scaling

While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers. In this work, we leverage the same scaling approach behind the success of deep learning to learn versatile optimizers. We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates. Meta-trained with approximately four thousand TPU-months of compute on a wide variety of optimization tasks, our optimizer not only exhibits compelling performance, but optimizes in interesting and unexpected ways. It requires no hyperparameter tuning, instead automatically adapting to the specifics of the problem being optimized. We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive optimizer benchmark suite with baselines at velo-code.github.io.
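
To make the idea of "a small neural network that ingests gradients and outputs parameter updates" concrete, here is a minimal toy sketch in Python with NumPy. It is not the architecture from the paper: the function names (`init_learned_optimizer`, `learned_update`), the two input features, the hidden size, and the `step_scale` constant are all assumptions chosen for illustration. The network weights here are random and untrained, so the update is not expected to reduce the loss; in a learned optimizer those weights would be fitted by meta-training.

```python
import numpy as np

def init_learned_optimizer(rng, n_features=2, hidden=8):
    """Randomly initialise a tiny per-parameter MLP (hypothetical layout)."""
    return {
        "w1": rng.normal(scale=0.1, size=(n_features, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(scale=0.1, size=(hidden, 1)),
        "b2": np.zeros(1),
    }

def learned_update(opt_params, grads, momentum, decay=0.9, step_scale=0.01):
    """Map gradients to parameter updates with a small neural network."""
    # Running average of gradients, kept as optimizer state.
    new_momentum = decay * momentum + (1.0 - decay) * grads
    # One feature row per parameter: raw gradient and its running average.
    feats = np.stack([grads.ravel(), new_momentum.ravel()], axis=-1)
    hidden = np.tanh(feats @ opt_params["w1"] + opt_params["b1"])
    out = hidden @ opt_params["w2"] + opt_params["b2"]
    # Reshape the per-parameter outputs back to the parameter shape.
    update = step_scale * out.reshape(grads.shape)
    return update, new_momentum

rng = np.random.default_rng(0)
opt_params = init_learned_optimizer(rng)   # untrained optimizer weights
theta = np.array([1.0, -2.0, 3.0])         # parameters being optimized
momentum = np.zeros_like(theta)
grads = 2.0 * theta                        # gradient of sum(theta**2)
update, momentum = learned_update(opt_params, grads, momentum)
theta = theta - update                     # apply the predicted update
```

The design point the sketch tries to show is that the update rule has no hand-set learning rate schedule of its own: everything between the gradient features and the update is a function of `opt_params`, which is what meta-training would adjust.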

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

    cs.AI 2024-08 unverdicted novelty 8.0

    The AI Scientist framework enables LLMs to independently conduct the full scientific process from idea generation to paper writing and review, demonstrated across three ML subfields with papers costing under $15 each.

  2. FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows

    cs.CV 2026-06 unverdicted novelty 7.0

    FlowBender introduces closed-loop training that lets conditional flow models learn correction policies from their own task-specific alignment errors, outperforming supervised and guidance baselines on fidelity and pla...

  3. Learn2Splat: Extending the Horizon of Learned 3DGS Optimization

    cs.CV 2026-05 unverdicted novelty 7.0

    A meta-learned optimizer for 3DGS that extends the optimization horizon via checkpoint buffers and latent gradient-scale encoding, delivering better early novel-view quality and long-term stability with zero-shot gene...

  4. Quasi-Equivariant Metanetworks

    cs.LG 2026-04 unverdicted novelty 7.0

    Quasi-equivariant metanetworks relax strict equivariance to preserve functional identity in weight-space learning while improving expressivity for feedforward, convolutional, and transformer networks.

  5. Dynamic Neural Graph Encoding of Inference Processes in Deep Weight Space

    cs.LG 2026-07 unverdicted novelty 5.0

    DNG-Encoder represents NN weights as dynamic graphs to preserve sequential inference and powers INR2JLS, which raises INR classification accuracy by ~10% on CIFAR-100-INR.

  6. AI Training Manager: Bounded Closed-Loop Control of Adaptive Training Recipes

    cs.AI 2026-06 unverdicted novelty 4.0

    An LLM-based bounded controller adapts ML training parameters from structured telemetry to correct overfitting and exploration issues, shown on TinyStories and robotic RL tasks.