Bilevel programming for hyperparameter optimization and meta-learning
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
method 1
citation-polarity summary
verdicts: UNVERDICTED 2
roles: method 1
polarities: use method 1
representative citing papers
AGILS is an alternating gradient algorithm for bilevel optimization that uses Moreau envelope reformulation to handle inexact lower-level solves, with convergence to stationary points proven under stated assumptions.
citing papers explorer
- When Losses Align: Gradient-Based Composite Loss Weighting for Efficient Pretraining
  A bilevel method learns composite pretraining loss weights online via gradient alignment with a downstream objective, matching tuned baselines at roughly 30% extra cost over one training run.
- Alternating Gradient-Type Algorithm for Bilevel Optimization with Inexact Lower-Level Solutions via Moreau Envelope-based Reformulation
  AGILS is an alternating gradient algorithm for bilevel optimization that uses Moreau envelope reformulation to handle inexact lower-level solves, with convergence to stationary points proven under stated assumptions.