An empirical study of example forgetting during deep neural network learning. arXiv preprint arXiv:1812.05159
14 Pith papers cite this work.
Citation roles: background (1)
Citation polarities: background (1)
Citing papers
-
The Benefits of Temporal Correlations: SGD Learns k-Juntas from Random Walks Efficiently
Temporal correlations from lazy random walks enable efficient SGD learning of k-juntas via a temporal-difference loss on ReLU networks, achieving sample complexity linear in the input dimension d.
-
Eliciting Latent Predictions from Transformers with the Tuned Lens
Training per-layer affine probes on frozen transformers yields more reliable latent predictions than the logit lens and enables detection of malicious inputs from prediction trajectories.
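To make the contrast with the logit lens concrete, here is a minimal NumPy sketch under stated assumptions: the final LayerNorm has no learned scale or shift, the unembedding is a plain matrix `W_U`, and each per-layer translator is written in the residual form `h + A h + b`. All function names are hypothetical, and training (minimizing the KL divergence to the final-layer prediction) is only indicated by the loss function, not implemented.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    # Simplified final LayerNorm without learned scale/shift (assumption).
    mu = h.mean(-1, keepdims=True)
    var = h.var(-1, keepdims=True)
    return (h - mu) / np.sqrt(var + eps)

def log_softmax(z):
    # Numerically stable log-softmax over the vocabulary axis.
    z = z - z.max(-1, keepdims=True)
    return z - np.log(np.exp(z).sum(-1, keepdims=True))

def logit_lens(h, W_U):
    # Decode an intermediate hidden state directly with the model's unembedding.
    return layer_norm(h) @ W_U

def tuned_lens(h, A, b, W_U):
    # Apply a per-layer affine translator (residual form h + A h + b), then decode.
    return layer_norm(h + h @ A.T + b) @ W_U

def kl_to_final(final_logits, probe_logits):
    # KL(final || probe), averaged over positions: the quantity a translator
    # would be trained to minimize (training loop not shown).
    p_log = log_softmax(final_logits)
    q_log = log_softmax(probe_logits)
    return float((np.exp(p_log) * (p_log - q_log)).sum(-1).mean())

# Hypothetical toy sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
d_model, vocab = 16, 50
W_U = rng.normal(size=(d_model, vocab))
h = rng.normal(size=(4, d_model))
A = np.zeros((d_model, d_model))
b = np.zeros(d_model)
# A zero translator reduces the tuned lens to the logit lens.
print(np.allclose(tuned_lens(h, A, b, W_U), logit_lens(h, W_U)))
```

Because a zero translator reproduces the logit lens exactly, a zero-initialized residual translator is a convenient starting point for training.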
-
Multimodal Distribution Matching for Vision-Language Dataset Distillation
MDM distills vision-language datasets via joint embedding clustering, weight-space model interpolation, and geometry-aware distribution matching on the unit hypersphere.
-
LiBaGS: Lightweight Boundary Gap Synthesis for Targeted Synthetic Data Selection
LiBaGS scores and selects synthetic data near decision boundaries using proximity, uncertainty, density, and validity, with boundary-gap allocation and marginal stopping to improve training accuracy.
-
Let the Target Select for Itself: Data Selection via Target-Aligned Paths
Target-aligned data selection via normalized endpoint loss drop on a validation-induced reference path achieves competitive performance with reduced computational overhead.
-
Leveraging Data Symmetries to Select an Optimal Subset of Training Data under Label Noise
Exploiting data symmetries boosts k-NN to select near-optimal low-noise subsets from noisy datasets, approaching Bayes-optimal performance in high dimensions; learned representations help when the symmetries are only partially known.
-
Not All Forgetting Is Equal: Architecture-Dependent Retention Dynamics in Fine-Tuned Image Classifiers
Fine-tuning causes architecture-specific forgetting of individual samples that is stochastic across seeds; ViTs show more predictable decay than CNNs, and class-level forgetting patterns are semantically consistent.
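Per-sample forgetting of this kind is often quantified with the forgetting events defined in the example-forgetting study cited above (arXiv:1812.05159): an example is forgotten when it goes from correctly classified at one presentation to misclassified at the next. The sketch below counts such events from a hypothetical record of per-presentation correctness; assigning never-learned examples an infinite count is our reading of that study's convention and should be checked against it.

```python
import numpy as np

def forgetting_counts(correct):
    """Count forgetting events per example.

    correct: boolean array of shape (num_presentations, num_examples), where
    correct[t, i] is True if example i was classified correctly the t-th time it was seen.
    """
    correct = np.asarray(correct, dtype=bool)
    # Forgetting event: correct at presentation t, misclassified at presentation t + 1.
    events = correct[:-1] & ~correct[1:]
    counts = events.sum(axis=0).astype(float)
    # Assumed convention: examples never classified correctly get an infinite count.
    counts[~correct.any(axis=0)] = np.inf
    return counts

# Hypothetical history: 4 presentations of 3 examples.
history = np.array([
    [1, 0, 0],
    [0, 0, 1],
    [1, 0, 1],
    [1, 0, 1],
], dtype=bool)
counts = forgetting_counts(history)
# Example 0 is forgotten once, example 1 is never learned (inf),
# and example 2 is learned and never forgotten ("unforgettable", count 0).
print(counts)
```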
-
Beyond Loss Values: Robust Dynamic Pruning via Loss Trajectory Alignment
AlignPrune uses a Dynamic Alignment Score from loss trajectories to identify noisy samples more accurately than per-sample loss, improving pruning accuracy by up to 6.3% on noisy benchmarks.
-
Data Agent: Learning to Select Data via End-to-End Dynamic Optimization
Data Agent learns a co-evolving sample selection policy end-to-end that accelerates training by over 50% on ImageNet-1k and MMLU with no performance loss.
-
Surprisingly High Redundancy in Electronic Structure Data Across Materials Explained by Low Intrinsic Dimensionality
Electronic structure datasets across materials show high redundancy explained by low intrinsic dimensionality, allowing them to be pruned to 1/100 of their size while preserving chemical accuracy.
-
EPS: Efficient Patch Sampling for Video Overfitting in Deep Super-Resolution Model Training
EPS uses DCT features to cluster patches by spatial-temporal complexity and adaptively samples from the highest-complexity cluster, cutting the number of training patches by 75% to 91.69% and speeding up sampling by up to 82.1x versus EMT, while reportedly preserving quality.
-
Demystifying CLIP Data
MetaCLIP curates balanced 400M-pair subsets from CommonCrawl that outperform CLIP data, reaching 70.8% zero-shot ImageNet accuracy on ViT-B versus CLIP's 68.3%.
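MetaCLIP's curation combines substring matching of captions against a metadata list with a balancing step that caps how many pairs each metadata entry contributes. Below is a toy Python sketch of that balancing idea as we understand it, assuming a per-entry threshold `t` and keeping a text if any of its matched entries is sampled with probability `t / max(count, t)`. The function names and data are hypothetical, and the real pipeline runs at CommonCrawl scale with a far larger `t`.

```python
import random
from collections import Counter

def substring_match(texts, metadata):
    # For each text, list the metadata entries that occur in it as substrings.
    return [[e for e in metadata if e in text] for text in texts]

def balance(texts, matches, t, seed=0):
    rng = random.Random(seed)
    counts = Counter(e for m in matches for e in m)
    # Entries matched more than t times are down-sampled toward t; tail entries keep probability 1.
    prob = {e: t / max(c, t) for e, c in counts.items()}
    kept = []
    for text, entries in zip(texts, matches):
        # Keep a text if any of its matched entries is sampled; unmatched texts are dropped.
        if any(rng.random() < prob[e] for e in entries):
            kept.append(text)
    return kept

# Hypothetical toy data.
metadata = ["cat", "dog", "axolotl"]
texts = ["a cat"] * 100 + ["a dog"] * 100 + ["an axolotl"] * 3 + ["random stuff"]
matches = substring_match(texts, metadata)
kept = balance(texts, matches, t=10)
# Head entries ("cat", "dog") are down-sampled; every rare "axolotl" pair survives,
# and the unmatched caption is dropped.
print(len(kept))
```

The balancing step flattens the head of the entry distribution while leaving rare concepts intact, which is the sense in which the curated subset is "balanced".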
-
Labeled TrustSet Guided: Batch Active Learning with Reinforcement Learning
BRAL-T uses TrustSet-guided reinforcement learning for batch active learning and reports state-of-the-art results on 10 image classification benchmarks plus 2 fine-tuning tasks.
-
LiLAW: Lightweight Learnable Adaptive Weighting to Learn Sample Difficulty & Improve Noisy Training
LiLAW learns to weight samples as easy, moderate or hard using three global scalars updated by one gradient step on a validation batch to improve noisy training performance.