arxiv: 2504.03623 · v1 · pith:NPUBBEBOnew · submitted 2025-04-04 · 💻 cs.CV

Quantifying the uncertainty of model-based synthetic image quality metrics

classification 💻 cs.CV
keywords model, embeddings, faed, uncertainty, embedding, feature, trustworthiness, autoencoder

The quality of synthetically generated images (e.g. those produced by diffusion models) is often evaluated using information about image contents encoded by pretrained auxiliary models. For example, the Fréchet Inception Distance (FID) uses embeddings from an InceptionV3 model pretrained to classify ImageNet. The effectiveness of this feature embedding model has a considerable impact on the trustworthiness of the calculated metric (affecting its suitability in several domains, including medical imaging). Here, uncertainty quantification (UQ) is used to provide a heuristic measure of the trustworthiness of the feature embedding model and an FID-like metric called the Fréchet Autoencoder Distance (FAED). We apply Monte Carlo dropout to a feature embedding model (convolutional autoencoder) to model the uncertainty in its embeddings. The distribution of embeddings for each input is then used to compute a distribution of FAED values. We express uncertainty as the predictive variance of the embeddings as well as the standard deviation of the computed FAED values. We find that their magnitude correlates with the extent to which the inputs are out-of-distribution relative to the model's training data, providing some validation of the approach's ability to assess the trustworthiness of the FAED.
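
The pipeline described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a toy two-layer encoder (hypothetical weights `w1`, `w2`) standing in for the convolutional autoencoder, keeps dropout active at inference to obtain several stochastic embeddings per input, and computes one Fréchet distance per stochastic pass. The abstract does not specify how the embedding distribution is turned into a distribution of FAED values, so the per-pass pairing used here is an assumption, as are all function names and parameter values.

```python
import numpy as np


def frechet_distance(a, b):
    """Fréchet distance between Gaussians fitted to two embedding sets of shape (N, D)."""
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    cov_a = np.cov(a, rowvar=False)
    cov_b = np.cov(b, rowvar=False)
    # tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_a @ cov_b).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    mean_term = np.sum((mu_a - mu_b) ** 2)
    return float(mean_term + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


def mc_dropout_embeddings(x, w1, w2, drop_prob, n_passes, rng):
    """Embed x with dropout left on; returns an array of shape (n_passes, N, D)."""
    hidden = np.maximum(x @ w1, 0.0)                      # toy encoder layer (assumption)
    keep = 1.0 - drop_prob
    masks = rng.random((n_passes,) + hidden.shape) < keep  # fresh mask for every pass
    dropped = hidden[None] * masks / keep                  # inverted-dropout scaling
    return dropped @ w2


def faed_with_uncertainty(real, synth, w1, w2, drop_prob=0.2, n_passes=20, seed=0):
    """Return mean FAED, its standard deviation, and the mean predictive variance."""
    rng = np.random.default_rng(seed)
    emb_real = mc_dropout_embeddings(real, w1, w2, drop_prob, n_passes, rng)
    emb_synth = mc_dropout_embeddings(synth, w1, w2, drop_prob, n_passes, rng)
    # One distance per stochastic pass (assumed pairing of passes).
    faed = np.array([frechet_distance(r, s) for r, s in zip(emb_real, emb_synth)])
    # Variance across passes for each input and dimension, averaged to a scalar.
    pred_var = float(emb_synth.var(axis=0).mean())
    return float(faed.mean()), float(faed.std()), pred_var


# Synthetic stand-ins for image features; a real setup would use encoder inputs.
rng = np.random.default_rng(42)
w1 = rng.normal(size=(16, 32)) / 4.0
w2 = rng.normal(size=(32, 8)) / np.sqrt(32)
real = rng.normal(size=(500, 16))
synth = rng.normal(loc=0.5, size=(500, 16))

mean_faed, std_faed, pred_var = faed_with_uncertainty(real, synth, w1, w2)
print(f"FAED mean={mean_faed:.3f}, std={std_faed:.3f}, predictive variance={pred_var:.3f}")
```

In this sketch, the standard deviation of the per-pass distances plays the role of the FAED uncertainty, and the averaged embedding variance plays the role of the predictive variance; whether either grows for out-of-distribution inputs depends on the trained model and is not something this toy encoder demonstrates.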

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. The FID Lottery: Quantifying Hidden Randomness in Generative-Model Evaluation

    cs.CV 2026-06 unverdicted novelty 6.0

    FID variance from training seeds is 3.2 times larger than from sampling seeds on hundreds of SiT models, with 1-2% coefficient of variation that barely shrinks with more compute, leading to a multi-seed evaluation protocol.