Estimating Example Difficulty Using Variance of Gradients

Chirag Agarwal; Daniel D'souza; Sara Hooker

arxiv: 2008.11600 · v4 · pith:PTZVU7WC · submitted 2020-08-26 · 💻 cs.CV · cs.LG

keywords: examples, model, challenging, data, difficulty, efficient, further, gradients

In machine learning, a question of great interest is understanding what examples are challenging for a model to classify. Identifying atypical examples ensures the safe deployment of models, isolates samples that require further human inspection and provides interpretability into model behavior. In this work, we propose Variance of Gradients (VoG) as a valuable and efficient metric to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing. We show that data points with high VoG scores are far more difficult for the model to learn and over-index on corrupted or memorized examples. Further, restricting the evaluation to the test set instances with the lowest VoG improves the model's generalization performance. Finally, we show that VoG is a valuable and efficient ranking for out-of-distribution detection.
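To make the VoG idea concrete, here is a minimal sketch in plain Python. It assumes that, for each example, the gradient of the true-class logit with respect to every input pixel has already been collected at K training checkpoints; that setup and the helper names `vog_score`, `class_normalize` and `top_k_hardest` are illustrative assumptions, not the authors' code. The sketch scores an example by the per-pixel standard deviation of those gradients across checkpoints, averaged over pixels. The per-class z-score is an assumed normalization step and should be checked against the paper before relying on it.

```python
from statistics import mean, pstdev


def vog_score(grads_per_checkpoint):
    """Variance-of-gradients score for ONE example (illustrative sketch).

    grads_per_checkpoint: list of K gradient maps, each a flat list of N floats.
    Assumption: entry [t][p] is d(true-class logit)/d(pixel p) at checkpoint t.
    """
    if len(grads_per_checkpoint) < 2:
        raise ValueError("need gradients from at least two checkpoints")
    n_pixels = len(grads_per_checkpoint[0])
    # Spread of each pixel's gradient across checkpoints (population std).
    per_pixel = [pstdev([g[p] for g in grads_per_checkpoint]) for p in range(n_pixels)]
    # Average over pixels to get a single scalar per example.
    return mean(per_pixel)


def class_normalize(scores, labels):
    """Assumed step: z-score VoG within each class so classes are comparable."""
    by_class = {}
    for s, y in zip(scores, labels):
        by_class.setdefault(y, []).append(s)
    stats = {y: (mean(v), pstdev(v)) for y, v in by_class.items()}
    normalized = []
    for s, y in zip(scores, labels):
        mu, sd = stats[y]
        # A class whose scores are all equal gets 0.0 instead of dividing by zero.
        normalized.append((s - mu) / sd if sd > 0 else 0.0)
    return normalized


def top_k_hardest(scores, k):
    """Indices of the k highest-VoG (most challenging) examples, e.g. for auditing."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    stable = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]    # gradients never change
    unstable = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]  # gradients drift over training
    scores = [vog_score(stable), vog_score(unstable)]
    print(scores, top_k_hardest(scores, 1))
```

Ranking with `top_k_hardest` mirrors the abstract's use of VoG to surface a tractable subset for human review; conversely, keeping only the lowest-scoring test examples corresponds to the restricted evaluation the abstract describes.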

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
cs.LG 2026-04 unverdicted novelty 6.0

COMPASS uses semantic clustering on multilingual embeddings to select auxiliary data for PEFT adapters, outperforming linguistic-similarity baselines on multilingual benchmarks while supporting continual adaptation.
LiLAW: Lightweight Learnable Adaptive Weighting to Learn Sample Difficulty & Improve Noisy Training
cs.LG 2025-09 unverdicted novelty 5.0

LiLAW learns to weight samples as easy, moderate or hard using three global scalars updated by one gradient step on a validation batch to improve noisy training performance.