MIND: Monge Inception Distance for Generative Models Evaluation
Pith reviewed 2026-05-11 00:46 UTC · model grok-4.3
The pith
MIND replaces the Fréchet Inception Distance with a sliced Wasserstein metric that needs far fewer samples to evaluate generative models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MIND computes the sliced Wasserstein distance between the distributions of Inception-v3 features extracted from real and generated data. By averaging the exact one-dimensional Wasserstein distances obtained through sorting, the metric avoids all covariance estimation and achieves better sample efficiency and computational speed than FID while remaining correlated with it.
What carries the argument
The sliced Wasserstein distance applied to Inception-v3 features, which reduces the comparison of two high-dimensional distributions to repeated one-dimensional optimal transport problems solved by sorting.
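As a rough illustration of the sorting-based computation described above, the following NumPy sketch estimates a sliced Wasserstein distance between two equal-sized feature sets. The function name `sliced_wasserstein`, the default of 100 projections, the choice p = 2, and the normalization are assumptions made for this example; the paper's exact settings are not given in this summary, and random arrays stand in for Inception-v3 features.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=100, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    x, y: arrays of shape (n, d) with the same number of rows n.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Random directions, normalized to lie on the unit sphere.
    theta = rng.standard_normal((d, n_projections))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)
    # Project onto each direction and sort; in 1D, matching sorted
    # samples in order is the optimal transport plan.
    x_proj = np.sort(x @ theta, axis=0)
    y_proj = np.sort(y @ theta, axis=0)
    # Average |difference|^p over samples and projections, then take the p-th root.
    return float(np.mean(np.abs(x_proj - y_proj) ** p) ** (1.0 / p))

# Stand-in features (assumption: real use would pass Inception-v3 activations).
rng = np.random.default_rng(0)
features_a = rng.standard_normal((1000, 16))
features_b = features_a + 1.0  # same cloud shifted by 1 in every coordinate

d_same = sliced_wasserstein(features_a, features_a)
d_shift = sliced_wasserstein(features_a, features_b)
```

No mean vector or covariance matrix is estimated anywhere in this sketch; the cost is dominated by one matrix product and one sort per projection.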
If this is right
- MIND with 5k samples achieves the evaluation performance of FID with 50k samples.
- The metric is two orders of magnitude faster to compute than FID.
- It resists moment-matching adversarial attacks that can fool FID.
- Even 1k or 2k samples yield informative scores for quick model iteration.
- MIND maintains high correlation with FID on standard benchmarks.
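To make the moment-matching point concrete, here is a small synthetic sketch, not the paper's attack or experiment. It builds a non-Gaussian sample whose empirical mean and covariance are forced to equal those of a Gaussian sample, then compares a Fréchet distance (the FID formula applied to raw arrays) with a sliced Wasserstein estimate. All names, dimensions, and the scale-mixture construction are assumptions chosen for illustration.

```python
import numpy as np

def sym_power(m, power):
    """Power of a symmetric positive-definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * w ** power) @ v.T

def frechet_distance(x, y):
    """Fréchet distance between Gaussians fitted to x and y."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x, cov_y = np.cov(x, rowvar=False), np.cov(y, rowvar=False)
    # Tr((cov_x cov_y)^{1/2}) is the sum of square roots of the eigenvalues.
    eigvals = np.linalg.eigvals(cov_x @ cov_y).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    return float(np.sum((mu_x - mu_y) ** 2)
                 + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)

def sliced_wasserstein(x, y, n_projections=100, seed=0):
    """Sliced 2-Wasserstein estimate for equal-sized samples (sorting-based)."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((x.shape[1], n_projections))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)
    diff = np.sort(x @ theta, axis=0) - np.sort(y @ theta, axis=0)
    return float(np.sqrt(np.mean(diff ** 2)))

rng = np.random.default_rng(0)
n, d = 5000, 8
real = rng.standard_normal((n, d))
# Scale mixture of Gaussians: non-Gaussian along every direction.
scale = rng.choice([0.2, 1.4], size=(n, 1))
fake = scale * rng.standard_normal((n, d))

# Affine map that makes the empirical mean and covariance of `fake`
# equal to those of `real` (exact moment matching).
a = sym_power(np.cov(fake, rowvar=False), -0.5) @ sym_power(np.cov(real, rowvar=False), 0.5)
matched = (fake - fake.mean(axis=0)) @ a + real.mean(axis=0)

fd = frechet_distance(real, matched)   # numerically zero: moments agree
sw = sliced_wasserstein(real, matched) # stays clearly positive
```

In this toy setting the Fréchet distance cannot see the difference because it depends only on the first two moments, while the sorted projections still differ in shape.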
Where Pith is reading between the lines
- If MIND becomes standard, generative model development cycles could shorten because fewer samples are needed for each evaluation round.
- The robustness property suggests MIND may better reflect true distributional differences rather than superficial statistics.
- Future work could test whether combining MIND with other feature extractors yields further gains in domains beyond images.
Load-bearing premise
That the sliced Wasserstein distance on Inception-v3 features accurately reflects meaningful differences between real and generated distributions without needing the Gaussian assumption used by FID.
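For reference, the standard textbook forms of the two quantities show where the Gaussian assumption enters. The notation below ($P_r$, $P_g$, $\mu$, $\Sigma$, $\sigma$) is chosen for this note and may differ from the paper's.

```latex
\begin{align}
\mathrm{FID}(P_r, P_g) &= \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right), \\
\mathrm{SW}_p^p(P_r, P_g) &= \int_{\mathbb{S}^{d-1}}
  W_p^p\!\left( \theta_\# P_r,\, \theta_\# P_g \right) \mathrm{d}\sigma(\theta), \\
W_p^p(\hat{\mu}_n, \hat{\nu}_n) &= \frac{1}{n} \sum_{i=1}^{n}
  \lvert x_{(i)} - y_{(i)} \rvert^p .
\end{align}
```

The first line depends on the feature distributions only through their means $\mu$ and covariances $\Sigma$, which is exact for Gaussians. The second averages one-dimensional Wasserstein distances over directions $\theta$ on the unit sphere with uniform measure $\sigma$, and the third is the one-dimensional distance between two empirical measures of equal size $n$, where $x_{(i)}$ and $y_{(i)}$ are the sorted projected samples.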
What would settle it
Run MIND and FID on the same set of models and datasets; if MIND with 5k samples fails to rank models in the same order as FID with 50k samples on a broad benchmark, or if human preference studies disagree with MIND rankings, the replacement claim would not hold.
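A minimal sketch of the ranking comparison described above, assuming one score per model from each metric. The numbers are placeholders for five hypothetical models, not results from the paper, and the helper `spearman` assumes there are no tied scores.

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation for score vectors without ties."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])

# Placeholder scores for five hypothetical models (NOT results from the paper).
fid_50k = np.array([3.1, 7.4, 12.0, 20.5, 41.2])
mind_5k = np.array([0.8, 1.5, 1.4, 3.0, 5.2])

rho = spearman(fid_50k, mind_5k)
# Strict test: do both metrics order the models identically?
same_order = bool(np.all(np.argsort(fid_50k) == np.argsort(mind_5k)))
```

Here the second and third models swap places under the placeholder MIND scores, so the rank correlation is high but the strict same-order check fails. Which of the two criteria counts as "settling it" is a judgment the benchmark designer has to make.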
Original abstract
We propose the Monge Inception Distance (MIND), a metric for evaluating generative models that addresses key limitations of the widely adopted Fréchet Inception Distance (FID). The MIND metric leverages the sliced Wasserstein distance to compare distributions by averaging one-dimensional optimal transport distances, efficiently computed via sorting. This approach circumvents the estimation of high-dimensional means and covariance matrices, which underlie FID's poor sample complexity and vulnerability to adversarial attacks. We empirically demonstrate three primary advantages: (i) it is more sample-efficient by one order of magnitude, (ii) it is faster to compute by two orders of magnitude, (iii) it is more robust to adversarial attacks such as moment-matching. We show that MIND with 5k samples can replace the evaluation performance of FID with 50k samples, providing high correlation with this standard benchmark and superior discriminative performance. We further demonstrate that even smaller sample sizes (e.g., 1k or 2k) remain highly informative for rapid model iteration.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MIND, a metric for generative model evaluation that computes the sliced Wasserstein distance on Inception-v3 features instead of the Gaussian moment-matching used by FID. It claims three advantages: one-order-of-magnitude better sample efficiency, two-order-of-magnitude faster computation, and greater robustness to adversarial attacks such as moment matching. The central empirical result is that MIND evaluated on 5k samples achieves correlation and discriminative performance comparable to FID on 50k samples, with even smaller sizes (1k–2k) remaining informative.
Significance. If the reported efficiency and robustness gains hold across broader model classes and datasets, MIND could meaningfully reduce the computational cost of generative-model evaluation and enable faster iteration, especially in regimes where collecting 50k high-quality samples is expensive. The absence of fitted parameters and the direct use of optimal transport on projections are positive features that avoid some of the known instabilities of covariance estimation.
major comments (3)
- [§4.3] §4.3 and Figure 4: the claim that MIND at 5k samples 'replaces' FID at 50k samples is supported only by correlation coefficients on standard benchmarks; no statistical test (e.g., bootstrap confidence intervals or paired significance test) is reported for the difference in discriminative power, leaving open whether the observed parity is robust or dataset-specific.
- [§3.1] §3.1, Eq. (3): while the sliced Wasserstein distance is correctly noted to be a lower bound on the true Wasserstein distance, the manuscript provides no analysis or ablation showing that the directions missed by random projections do not systematically affect perceptual ranking of generative models; this is load-bearing for the claim that MIND faithfully substitutes for FID.
- [§4.4] §4.4: the robustness experiments against moment-matching attacks report single-run results without specifying the number of random projections, the attack magnitude, or variance across seeds; this weakens the generality of the 'more robust' claim relative to the sample-efficiency result.
minor comments (2)
- [§3.2] The definition of the number of random projections L is introduced in §3.2 but its default value and sensitivity are only shown in an appendix; moving a brief sensitivity plot to the main text would improve readability.
- [Table 1] Table 1 caption states 'correlation with FID' but does not specify whether Pearson or Spearman correlation is used; this should be stated explicitly.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and commit to revisions that strengthen the statistical support, empirical validation, and experimental details of the manuscript.
- Referee: [§4.3] §4.3 and Figure 4: the claim that MIND at 5k samples 'replaces' FID at 50k samples is supported only by correlation coefficients on standard benchmarks; no statistical test (e.g., bootstrap confidence intervals or paired significance test) is reported for the difference in discriminative power, leaving open whether the observed parity is robust or dataset-specific.
  Authors: We agree that statistical tests are needed to substantiate the replacement claim. In the revised version we will report bootstrap confidence intervals (1000 resamples) on the Pearson and Spearman correlations between MIND@5k and FID@50k across all evaluated datasets. We will also add paired Wilcoxon signed-rank tests on the per-model discriminative scores (e.g., ranking accuracy on held-out model pairs) computed over 10 independent random seeds, thereby quantifying whether the observed parity is statistically significant and consistent across datasets.
  revision: yes
- Referee: [§3.1] §3.1, Eq. (3): while the sliced Wasserstein distance is correctly noted to be a lower bound on the true Wasserstein distance, the manuscript provides no analysis or ablation showing that the directions missed by random projections do not systematically affect perceptual ranking of generative models; this is load-bearing for the claim that MIND faithfully substitutes for FID.
  Authors: We acknowledge the absence of a dedicated ablation on projection coverage. While a complete theoretical characterization of missed directions remains an open question in the sliced optimal transport literature, we will add an empirical ablation that varies the number of random projections from 10 to 500 and measures the stability of model rankings on the same Inception features. We will also report the Kendall-tau distance between rankings obtained with increasing projection counts and show that rankings converge rapidly and remain consistent with FID rankings. This provides concrete evidence that, in practice on standard generative-model benchmarks, the directions captured by a modest number of projections suffice to preserve perceptual ordering.
  revision: partial
- Referee: [§4.4] §4.4: the robustness experiments against moment-matching attacks report single-run results without specifying the number of random projections, the attack magnitude, or variance across seeds; this weakens the generality of the 'more robust' claim relative to the sample-efficiency result.
  Authors: We agree that the robustness section requires additional specification and statistical reporting. In the revision we will explicitly state that all attack experiments use 100 random projections, detail the attack magnitudes (additive perturbations to the first two moments at levels 0.1, 0.5, and 1.0 times the feature standard deviation), and report mean and standard deviation of MIND and FID scores over five independent random seeds. These changes will make the robustness advantage directly comparable in rigor to the sample-efficiency results.
  revision: yes
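The bootstrap confidence intervals promised in the first response could be computed along the following lines. This is a sketch under stated assumptions: it uses a percentile bootstrap over models with the Pearson correlation only, the function name `bootstrap_corr_ci` is hypothetical, and the paired scores are synthetic stand-ins rather than paper data.

```python
import numpy as np

def bootstrap_corr_ci(a, b, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the Pearson correlation of paired scores."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(a)
    stats = []
    while len(stats) < n_resamples:
        idx = rng.integers(0, n, size=n)  # resample models with replacement
        if np.ptp(a[idx]) == 0.0 or np.ptp(b[idx]) == 0.0:
            continue  # correlation is undefined for a constant resample
        stats.append(np.corrcoef(a[idx], b[idx])[0, 1])
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)

# Synthetic paired scores standing in for FID@50k and MIND@5k (NOT paper data).
rng = np.random.default_rng(1)
fid_scores = rng.uniform(2.0, 50.0, size=30)
mind_scores = 0.1 * fid_scores + rng.normal(0.0, 0.3, size=30)

low, high = bootstrap_corr_ci(fid_scores, mind_scores)
```

A Spearman variant would need a rank function that handles ties, since resampling with replacement duplicates models; that detail is left out here to keep the sketch short.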
Circularity Check
MIND definition is direct and non-circular; empirical claims are external to the construction
The paper defines MIND explicitly as the average of 1D optimal transport distances (via sorting) on random projections of Inception-v3 features. This is a straightforward application of the known sliced Wasserstein distance with no fitted parameters, no self-referential predictions, and no reduction of the central quantity to prior author results by construction. The reported sample-efficiency, speed, and robustness advantages are presented as empirical observations on standard benchmarks rather than algebraic identities. No load-bearing self-citations, uniqueness theorems, or ansatzes imported from the authors' earlier work appear in the derivation chain.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Sliced Wasserstein distance provides a useful approximation to the true Wasserstein distance between high-dimensional distributions
- domain assumption: Inception-v3 features are a suitable embedding space for comparing image distributions
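One way to probe the first assumption empirically is to check how stable model rankings are as the number of random projections grows. The sketch below does this on synthetic Gaussian "models"; the projection counts, shifts, and helper names are assumptions for illustration, and it reports the Kendall tau coefficient (tau-a, no ties) rather than a Kendall-tau distance.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections, seed=0):
    """Sliced 2-Wasserstein estimate for equal-sized samples (sorting-based)."""
    rng = np.random.default_rng(seed)
    theta = rng.standard_normal((x.shape[1], n_projections))
    theta /= np.linalg.norm(theta, axis=0, keepdims=True)
    diff = np.sort(x @ theta, axis=0) - np.sort(y @ theta, axis=0)
    return float(np.sqrt(np.mean(diff ** 2)))

def kendall_tau(a, b):
    """Kendall rank correlation (tau-a) for score vectors without ties."""
    a, b = np.asarray(a), np.asarray(b)
    n = len(a)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            # +1 for a concordant pair, -1 for a discordant pair.
            s += np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
    return float(s / (n * (n - 1) / 2))

rng = np.random.default_rng(0)
n, d = 2000, 16
reference = rng.standard_normal((n, d))
# Five synthetic "models": Gaussians whose mean drifts away from the reference.
models = [rng.standard_normal((n, d)) + shift for shift in (0.0, 0.1, 0.2, 0.4, 0.8)]

counts = [10, 50, 100, 500]
scores = {k: [sliced_wasserstein(reference, m, k) for m in models] for k in counts}
# Agreement of each ranking with the ranking at the largest projection count.
taus = {k: kendall_tau(scores[k], scores[counts[-1]]) for k in counts}
```

Rankings that agree across projection counts on a toy problem like this say nothing about perceptual ordering on real features; the sketch only shows the shape of the check.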
Forward citations
Cited by 1 Pith paper
- Improved Baselines with Representation Autoencoders: RAE v2 reaches gFID 1.06 on ImageNet-256 in 80 epochs by combining multi-layer encoder sums, complementary REPA targets, and free guidance via output reparameterization.