Proxy metrics from next-token distributions over expert solutions outperform loss and compute baselines for ranking LLMs, selecting pretraining data, and extrapolating performance across compute scales.
International Conference on Learning Representations , year=
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
SelfCheckGPT detects hallucinations by checking consistency across multiple sampled responses from black-box LLMs on WikiBio biography generation tasks.
EPGS detects high-confidence factual errors in LLMs by using embedding perturbations to measure gradient sensitivity as a proxy for sharp versus flat minima.
citing papers explorer
-
Forecasting Downstream Performance of LLMs With Proxy Metrics
Proxy metrics from next-token distributions over expert solutions outperform loss and compute baselines for ranking LLMs, selecting pretraining data, and extrapolating performance across compute scales.
-
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
SelfCheckGPT detects hallucinations by checking consistency across multiple sampled responses from black-box LLMs on WikiBio biography generation tasks.
-
From Flat Facts to Sharp Hallucinations: Detecting Stubborn Errors via Gradient Sensitivity
EPGS detects high-confidence factual errors in LLMs by using embedding perturbations to measure gradient sensitivity as a proxy for sharp versus flat minima.