Optimizing ML training with metagradient descent. arXiv preprint arXiv:2503.13751.
Five Pith papers cite this work.
Citation-role and citation-polarity summary
Fields: cs.LG (5)
Verdicts: unverdicted (5)
Roles: method (1)
Polarities: use method (1)
Citing papers explorer
-
On the Accuracy of Newton Step and Influence Function Data Attributions
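A new analysis that does not assume global strong convexity yields tight scaling laws for well-behaved logistic regression. The Newton step (NS) error scales as Θ(kd/n²), and the difference between NS and influence function (IF) attributions scales as Θ((k+d)√(kd)/n²).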
-
NoiseRater: Meta-Learned Noise Valuation for Diffusion Model Training
NoiseRater meta-learns instance-level importance scores for noise in diffusion training via bilevel optimization, then uses a two-stage pipeline to improve efficiency and generation quality on FFHQ and ImageNet.
-
Efficient Estimation of Kernel Surrogate Models for Task Attribution
Kernel surrogate models with a first-order gradient approximation achieve 25% higher correlation with leave-one-out ground truth for task attribution and 40% better downstream data selection than linear surrogates.
-
Generalization Guarantees on Data-Driven Tuning of Gradient Descent with Langevin Updates
LGD (gradient descent with Langevin updates) reaches Bayes optimality at optimal hyperparameters and admits an O(dh) pseudo-dimension bound for meta-learning its hyperparameters on convex regression tasks.
-
How to sketch a learning algorithm
A sketching method based on higher-order derivatives enables efficient data deletion predictions for deep learning models under a stability assumption with near-linear overhead in error and failure parameters.