Optimizing ML training with metagradient descent. arXiv preprint arXiv:2503.13751

[EIC+25] Logan Engstrom, Andrew Ilyas, Benjamin Chen, Axel Feldmann, William Moses, Aleksander Madry · 2025 · arXiv 2503.13751

5 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

method 1

citation-polarity summary

use method 1

representative citing papers

On the Accuracy of Newton Step and Influence Function Data Attributions

cs.LG · 2025-12-14 · unverdicted · novelty 7.0

New analysis without global strong convexity yields tight scaling laws: NS error ~Θ(kd/n²) and NS-IF difference ~Θ((k+d)√(kd)/n²) for well-behaved logistic regressions.

NoiseRater: Meta-Learned Noise Valuation for Diffusion Model Training

cs.LG · 2026-05-02 · unverdicted · novelty 6.0

NoiseRater meta-learns instance-level importance scores for noise in diffusion training via bilevel optimization, then uses a two-stage pipeline to improve efficiency and generation quality on FFHQ and ImageNet.

Efficient Estimation of Kernel Surrogate Models for Task Attribution

cs.LG · 2026-02-03 · unverdicted · novelty 6.0

Kernel surrogate models with first-order gradient approximation achieve 25% higher correlation to leave-one-out ground truth for task attribution and 40% better downstream data selection than linear surrogates.

Generalization Guarantees on Data-Driven Tuning of Gradient Descent with Langevin Updates

cs.LG · 2026-04-13 · unverdicted · novelty 5.0

LGD reaches Bayes optimality at optimal hyperparameters and admits an O(dh) pseudo-dimension bound for meta-learning hyperparameters on convex regression tasks.

How to sketch a learning algorithm

cs.LG · 2026-04-08 · unverdicted · novelty 5.0

A sketching method based on higher-order derivatives enables efficient data deletion predictions for deep learning models under a stability assumption with near-linear overhead in error and failure parameters.

citing papers explorer

On the Accuracy of Newton Step and Influence Function Data Attributions

cs.LG · 2025-12-14 · unverdicted · none · ref 7

NoiseRater: Meta-Learned Noise Valuation for Diffusion Model Training

cs.LG · 2026-05-02 · unverdicted · none · ref 9

Efficient Estimation of Kernel Surrogate Models for Task Attribution

cs.LG · 2026-02-03 · unverdicted · none · ref 2

Generalization Guarantees on Data-Driven Tuning of Gradient Descent with Langevin Updates

cs.LG · 2026-04-13 · unverdicted · none · ref 5

How to sketch a learning algorithm

cs.LG · 2026-04-08 · unverdicted · none · ref 5
