A Bias-Variance-Covariance Decomposition of Kernel Scores for Generative Models

Florian Buettner; Sebastian G. Gruber

arXiv: 2310.05833 · v2 · pith:2BIVJY44new · submitted 2023-10-09 · cs.LG · stat.ML



keywords: uncertainty, models, decomposition, estimation, framework, kernel, language, bias-variance-covariance


Abstract

Generative models, like large language models, are becoming increasingly relevant in our daily lives, yet a theoretical framework to assess their generalization behavior and uncertainty does not exist. Particularly, the problem of uncertainty estimation is commonly solved in an ad-hoc and task-dependent manner. For example, natural language approaches cannot be transferred to image generation. In this paper, we introduce the first bias-variance-covariance decomposition for kernel scores. This decomposition represents a theoretical framework from which we derive a kernel-based variance and entropy for uncertainty estimation. We propose unbiased and consistent estimators for each quantity which only require generated samples but not the underlying model itself. Based on the wide applicability of kernels, we demonstrate our framework via generalization and uncertainty experiments for image, audio, and language generation. Specifically, kernel entropy for uncertainty estimation is more predictive of performance on CoQA and TriviaQA question answering datasets than existing baselines and can also be applied to closed-source models.
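
The abstract states that the proposed estimators need only generated samples, not the model itself. A minimal sketch of what such a sample-only estimate could look like follows. It assumes the kernel entropy of a distribution P is the negative expected kernel value between two independent draws, -E[k(X, X')], and estimates it with a U-statistic over distinct sample pairs. The names `rbf_kernel` and `kernel_entropy_estimate`, the Gaussian kernel, and the vector representation of samples (for example, text embeddings) are illustrative assumptions, not taken from the paper.

```python
import math
from itertools import combinations
from typing import Callable, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, bandwidth: float = 1.0) -> float:
    """Gaussian (RBF) kernel on equal-length vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def kernel_entropy_estimate(
    samples: Sequence[Vector],
    kernel: Callable[[Vector, Vector], float] = rbf_kernel,
) -> float:
    """U-statistic estimate of -E[k(X, X')] from generated samples only.

    Assumption: kernel entropy is taken to be -E[k(X, X')] for independent
    X, X' ~ P. Averaging over distinct pairs (no i == j terms) keeps the
    estimate of E[k(X, X')] unbiased for a symmetric kernel.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to form a pair")
    pair_sum = sum(kernel(a, b) for a, b in combinations(samples, 2))
    n_pairs = n * (n - 1) / 2
    return -pair_sum / n_pairs


# Hypothetical usage: embeddings of several generations for one prompt.
consistent = [[0.0, 0.0], [0.0, 0.1], [0.1, 0.0]]
scattered = [[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]]
print(kernel_entropy_estimate(consistent))
print(kernel_entropy_estimate(scattered))
```

Under these assumptions and with the RBF kernel, the estimate lies in [-1, 0): values near -1 mean the samples are nearly identical, and values near 0 mean they are spread out, which would be read as higher uncertainty. Because only the samples enter the computation, the same sketch applies to outputs of a closed-source model.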



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Disentangling Ambiguity from Instability in Large Language Models: A Clinical Text-to-SQL Case Study
cs.CL · 2026-02 · unverdicted · novelty 6.0

CLUES decomposes semantic uncertainty into separate ambiguity and instability scores for clinical Text-to-SQL, with instability via Schur complement, outperforming Kernel Language Entropy on failure prediction while e...