A Note on the Inception Score

Shane Barratt , Rishi Sharma

Authors on Pith no claims yet

classification 📊 stat.ML cs.LG

keywords modelsgenerativecomparingevaluationinceptionmetricscorewhen

read the original abstract

Deep generative models are powerful tools that have produced impressive results in recent years. These advances have been for the most part empirically driven, making it essential that we use high quality evaluation metrics. In this paper, we provide new insights into the Inception Score, a recently proposed and widely used evaluation metric for generative models, and demonstrate that it fails to provide useful guidance when comparing models. We discuss both suboptimalities of the metric itself and issues with its application. Finally, we call for researchers to be more systematic and careful when evaluating and comparing generative models, as the advancement of the field depends upon it.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

OccDirector: Language-Guided Behavior and Interaction Generation in 4D Occupancy Space
cs.CV 2026-04 unverdicted novelty 7.0

OccDirector uses a VLM-guided Spatio-Temporal MMDiT model with history anchoring to generate physically plausible 4D occupancy from language scripts, supported by the new OccInteract-85k dataset.
Diffusion Models Beat GANs on Image Synthesis
cs.LG 2021-05 accept novelty 7.0

Diffusion models with architecture improvements and classifier guidance achieve superior FID scores to GANs on unconditional and conditional ImageNet image synthesis.
Large Scale GAN Training for High Fidelity Natural Image Synthesis
cs.LG 2018-09 accept novelty 7.0

BigGANs achieve state-of-the-art class-conditional synthesis on ImageNet 128x128 with Inception Score 166.5 and FID 7.4 by scaling GANs and applying orthogonal regularization plus truncation.
TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders
cs.CV 2026-04 unverdicted novelty 6.0

TC-AE improves reconstruction and generative performance in deep compression by decomposing token-to-latent compression into two stages and using joint self-supervised training.
Movie Gen: A Cast of Media Foundation Models
cs.CV 2024-10 unverdicted novelty 5.0

A 30B-parameter transformer and related models generate high-quality videos and audio, claiming state-of-the-art results on text-to-video, video editing, personalization, and audio generation tasks.