
An empirical study of example forgetting during deep neural network learning. arXiv preprint arXiv:1812.05159

Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, Geoffrey J Gordon · 2018 · arXiv 1812.05159

14 Pith papers cite this work. Polarity classification is still being indexed.


citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

The Benefits of Temporal Correlations: SGD Learns k-Juntas from Random Walks Efficiently

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Temporal correlations from lazy random walks enable efficient SGD learning of k-juntas via temporal-difference loss on ReLU networks, achieving linear sample complexity in d.

Eliciting Latent Predictions from Transformers with the Tuned Lens

cs.LG · 2023-03-14 · accept · novelty 7.0

Training per-layer affine probes on frozen transformers yields more reliable latent predictions than the logit lens and enables detection of malicious inputs from prediction trajectories.

Multimodal Distribution Matching for Vision-Language Dataset Distillation

cs.CV · 2026-05-22 · unverdicted · novelty 6.0

MDM distills vision-language datasets via joint embedding clustering, weight-space model interpolation, and geometry-aware distribution matching on the unit hypersphere.

LiBaGS: Lightweight Boundary Gap Synthesis for Targeted Synthetic Data Selection

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

LiBaGS scores and selects synthetic data near decision boundaries using proximity, uncertainty, density, and validity, with boundary-gap allocation and marginal stopping to improve training accuracy.

Let the Target Select for Itself: Data Selection via Target-Aligned Paths

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

Target-aligned data selection via normalized endpoint loss drop on a validation-induced reference path achieves competitive performance with reduced computational overhead.

Leveraging Data Symmetries to Select an Optimal Subset of Training Data under Label Noise

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

Exploiting data symmetries boosts k-NN to select near-optimal low-noise subsets from noisy datasets, approaching Bayes-optimal performance in high dimensions, with learned representations aiding partial symmetry knowledge.

Not All Forgetting Is Equal: Architecture-Dependent Retention Dynamics in Fine-Tuned Image Classifiers

cs.LG · 2026-04-13 · unverdicted · novelty 6.0

Fine-tuning causes architecture-specific forgetting of individual samples that is stochastic across seeds, with ViTs showing more predictable decay than CNNs and class-level patterns that are semantically consistent.

Beyond Loss Values: Robust Dynamic Pruning via Loss Trajectory Alignment

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

AlignPrune uses a Dynamic Alignment Score from loss trajectories to identify noisy samples more accurately than per-sample loss, improving pruning accuracy by up to 6.3% on noisy benchmarks.

Data Agent: Learning to Select Data via End-to-End Dynamic Optimization

cs.LG · 2026-03-08 · unverdicted · novelty 6.0

Data Agent learns a co-evolving sample selection policy end-to-end that accelerates training by over 50% on ImageNet-1k and MMLU with no performance loss.

Surprisingly High Redundancy in Electronic Structure Data Across Materials Explained by Low Intrinsic Dimensionality

cond-mat.mtrl-sci · 2025-07-11 · unverdicted · novelty 6.0

Electronic structure datasets across materials show high redundancy, explained by low intrinsic dimensionality, which allows them to be pruned to 1/100 of their original size while preserving chemical accuracy.

EPS: Efficient Patch Sampling for Video Overfitting in Deep Super-Resolution Model Training

cs.CV · 2024-11-25 · unverdicted · novelty 6.0

EPS uses DCT features to cluster patches by spatial-temporal complexity and adaptively samples from the highest cluster, cutting training patches by 75-91.69% and speeding sampling up to 82.1x versus EMT while claiming preserved quality.

Demystifying CLIP Data

cs.CV · 2023-09-28 · accept · novelty 6.0

MetaCLIP curates balanced 400M-pair subsets from CommonCrawl that outperform CLIP data, reaching 70.8% zero-shot ImageNet accuracy on ViT-B versus CLIP's 68.3%.

Labeled TrustSet Guided: Batch Active Learning with Reinforcement Learning

cs.LG · 2026-04-14 · unverdicted · novelty 5.0

BRAL-T uses TrustSet-guided reinforcement learning for batch active learning and reports state-of-the-art results on 10 image classification benchmarks plus 2 fine-tuning tasks.

LiLAW: Lightweight Learnable Adaptive Weighting to Learn Sample Difficulty & Improve Noisy Training

cs.LG · 2025-09-25 · unverdicted · novelty 5.0

LiLAW learns to weight samples as easy, moderate or hard using three global scalars updated by one gradient step on a validation batch to improve noisy training performance.
