Coresets for data-efficient training of machine learning models

Baharan Mirzasoleiman, Jeff Bilmes, Jure Leskovec · 2019 · arXiv 1906.01827

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

ContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priming

cs.CL · 2026-05-04 · unverdicted · novelty 7.0

ContextualJailbreak uses evolutionary search over simulated primed dialogues with novel mutations to reach 90-100% attack success on open LLMs and transfers to some closed frontier models at 15-90% rates.

Let the Target Select for Itself: Data Selection via Target-Aligned Paths

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

Target-aligned data selection via normalized endpoint loss drop on a validation-induced reference path achieves competitive performance with reduced computational overhead.

GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization

cs.DB · 2026-04-09 · unverdicted · novelty 6.0

GRACE dynamically constructs and updates coresets for LLM training using representation diversity, gradient-based importance, and k-NN graph propagation to improve efficiency and performance.

DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks

cs.LG · 2025-02-01 · unverdicted · novelty 6.0

DUET is a global-to-local method that optimizes LLM training data mixtures via Bayesian optimization guided by influence-based selection and feedback from unseen evaluation tasks, with a regret bound showing convergence to the optimal mixture.

LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models

cs.CL · 2024-07-17 · unverdicted · novelty 6.0

LMMS-EVAL delivers a standardized multimodal evaluation framework with lite and live variants that target the trade-offs among coverage, cost, and zero contamination.

citing papers explorer

Showing 5 of 5 citing papers.

ContextualJailbreak: Evolutionary Red-Teaming via Simulated Conversational Priming · cs.CL · 2026-05-04 · unverdicted · none · ref 17
Let the Target Select for Itself: Data Selection via Target-Aligned Paths · cs.LG · 2026-05-10 · unverdicted · none · ref 30
GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization · cs.DB · 2026-04-09 · unverdicted · none · ref 59
DUET: Optimizing Training Data Mixtures via Feedback from Unseen Evaluation Tasks · cs.LG · 2025-02-01 · unverdicted · none · ref 17
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models · cs.CL · 2024-07-17 · unverdicted · none · ref 6
