Dataset pruning: Reducing training data by examining generalization influence. arXiv preprint arXiv:2205.09329
7 Pith papers cite this work.
representative citing papers
REGLU guides LoRA-based unlearning via representation subspaces and orthogonal regularization, outperforming prior methods on the forget-retain trade-off in LLM benchmarks.
Electronic structure datasets across materials show high redundancy from low intrinsic dimensionality, allowing pruning to 1/100th size with preserved chemical accuracy.
SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.
OrderDP is a plug-and-play data pruning method that selects a random subset and then the top-q samples, guaranteeing unbiased surrogate-loss training; it comes with a convergence analysis and cuts training cost by over 40% on CIFAR and ImageNet.
SLAP is a new batch-aware pruning framework that uses distribution-aware stratified sampling and Hessian-approximated gradients to select data, claiming 20-40% less data while matching or exceeding full-dataset performance on LLM instruction tuning tasks.
Adaptive Data Dropout uses performance feedback to dynamically modulate training data exposure, reducing effective steps while matching static dropout accuracy on image benchmarks.
citing papers explorer
- Towards Multimodal Active Learning: Efficient Learning with Limited Paired Data
Introduces the first active learning framework for unaligned multimodal data that selects alignments using uncertainty and diversity to cut annotation costs by up to 40% on benchmarks while preserving accuracy.
- Surprisingly High Redundancy in Electronic Structure Data Across Materials Explained by Low Intrinsic Dimensionality
Electronic structure datasets across materials show high redundancy from low intrinsic dimensionality, allowing pruning to 1/100th size with preserved chemical accuracy.