Rethinking LLM evaluation: Can we evaluate LLMs with 200x less data? arXiv preprint arXiv:2510.10457
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.LG (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
A joint task-model adaptation method learns optimal weights for data selection indicators via ICL proxies on small validation sets, matching or exceeding full-dataset fine-tuning performance with only 30% of samples on GSM8K.
citing papers explorer
- Predicting Performance of Symbolic and Prompt Programs with Examples
Proposes RAP, a retrieval-based approximate prior method, to predict performance of symbolic programs and LLM prompts on new tasks using a Bernoulli model and corpus-derived performance distributions.
- Learning Multi-Indicator Weights for Data Selection: A Joint Task-Model Adaptation Framework with Efficient Proxies
A joint task-model adaptation method learns optimal weights for data selection indicators via ICL proxies on small validation sets, matching or exceeding full-dataset fine-tuning performance with only 30% of samples on GSM8K.