Demystifying MMD GANs
Pith reviewed 2026-05-15 01:03 UTC · model grok-4.3
The pith
Gradient estimators for MMD GANs and Wasserstein GANs are unbiased, but finite-sample discriminators bias the generator updates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We show that gradient estimators used in the optimization process for both MMD GANs and Wasserstein GANs are unbiased, but learning a discriminator based on samples leads to biased gradients for the generator parameters. We also discuss the issue of kernel choice for the MMD critic, and characterize the kernel corresponding to the energy distance used for the Cramer GAN critic. Being an integral probability metric, the MMD benefits from training strategies recently developed for Wasserstein GANs. In experiments, the MMD GAN is able to employ a smaller critic network than the Wasserstein GAN, resulting in a simpler and faster-training algorithm with matching performance. We also propose an improved measure of GAN convergence, the Kernel Inception Distance, and show how to use it to dynamically adapt learning rates during GAN training.
What carries the argument
The MMD critic whose gradient estimators are shown to be unbiased when the kernel is fixed, together with the finite-sample bias that appears once the discriminator is learned from data.
If this is right
- MMD GANs can use smaller critic networks than Wasserstein GANs while achieving matching performance.
- Training strategies developed for Wasserstein GANs transfer directly to MMD GANs because both rely on integral probability metrics.
- The Kernel Inception Distance can serve as a dynamic learning-rate scheduler during GAN training.
- The kernel corresponding to the energy distance is explicitly characterized, allowing direct comparison between Cramer GAN and MMD GAN critics.
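The learning-rate point in the list above can be made concrete with a small sketch. The paper's own adaptation rule is not described in this review, so the patience rule below is a stand-in, not the authors' procedure. The class name `KIDPlateauScheduler` and its parameters (`decay`, `patience`, `min_delta`) are hypothetical, and the KID readings are placeholders rather than measured results.

```python
class KIDPlateauScheduler:
    """Hypothetical helper: decay the learning rate when KID stops improving."""

    def __init__(self, lr, decay=0.8, patience=3, min_delta=0.0):
        self.lr = lr
        self.decay = decay          # multiplicative factor applied on a plateau
        self.patience = patience    # evaluations without improvement before decaying
        self.min_delta = min_delta  # minimum drop in KID that counts as improvement
        self.best_kid = float("inf")
        self.bad_evals = 0

    def step(self, kid):
        """Record one KID evaluation (lower is better) and return the current lr."""
        if kid < self.best_kid - self.min_delta:
            self.best_kid = kid
            self.bad_evals = 0
        else:
            self.bad_evals += 1
            if self.bad_evals >= self.patience:
                self.lr *= self.decay
                self.bad_evals = 0
        return self.lr


scheduler = KIDPlateauScheduler(lr=1e-4, decay=0.5, patience=2)
# Placeholder KID readings for illustration only, not measured results.
for kid in [0.10, 0.08, 0.09, 0.085, 0.07]:
    print(kid, scheduler.step(kid))
```

A raw comparison of two KID readings ignores estimator noise, so a practical version would probably compare readings with some form of significance check before decaying.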
Where Pith is reading between the lines
- The sample-induced bias identified here may be one concrete mechanism behind the well-known instability of many GAN training runs.
- Similar unbiasedness proofs could be attempted for other integral probability metric critics, potentially unifying design rules across a wider family of GAN variants.
- Adaptive use of the Kernel Inception Distance might improve convergence monitoring in non-image generative tasks where FID-style metrics are unavailable.
Load-bearing premise
The theoretical unbiasedness of the critic gradients assumes the kernel is fixed and positive definite, and that any remaining finite-sample bias does not dominate other optimization difficulties.
What would settle it
Train an MMD GAN critic on an effectively infinite data set and verify whether the observed generator gradients exactly match the closed-form unbiased estimator derived in the paper.
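An infinite-data experiment does not fit in a snippet, but the mechanism can be illustrated on a toy problem. The sketch below is not from the paper. It assumes scalar Gaussian data, uses selection among three RBF bandwidths as a stand-in for learning a critic, and looks at the bias of the estimated MMD value rather than of a generator gradient. Both samples are drawn from the same distribution, so the true squared MMD is zero for every kernel.

```python
import math
import random


def rbf(x, y, bandwidth):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth):
    """Unbiased (U-statistic) MMD^2 estimate for scalar samples of equal size."""
    n = len(xs)
    k_xx = sum(rbf(xs[i], xs[j], bandwidth) for i in range(n) for j in range(n) if i != j)
    k_yy = sum(rbf(ys[i], ys[j], bandwidth) for i in range(n) for j in range(n) if i != j)
    k_xy = sum(rbf(x, y, bandwidth) for x in xs for y in ys)
    return (k_xx + k_yy) / (n * (n - 1)) - 2.0 * k_xy / (n * n)


def simulate(trials=300, n=20, bandwidths=(0.25, 1.0, 4.0), seed=0):
    """Average the estimate with a fixed kernel vs. a kernel picked on the same data."""
    rng = random.Random(seed)
    fixed_total = 0.0
    selected_total = 0.0
    for _ in range(trials):
        # Both samples come from the same N(0, 1), so the true MMD^2 is zero.
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
        estimates = [mmd2_unbiased(xs, ys, b) for b in bandwidths]
        fixed_total += estimates[1]       # kernel chosen before seeing the data
        selected_total += max(estimates)  # "critic" tuned on this very sample
    return fixed_total / trials, selected_total / trials


fixed_mean, selected_mean = simulate()
print("fixed kernel mean:   ", fixed_mean)
print("selected kernel mean:", selected_mean)
```

The fixed-kernel average should sit near zero, as unbiasedness predicts. The selected-kernel average can never fall below it, because the maximum over candidates is at least as large as any single candidate on every trial. That gap is the toy version of the bias that appears once the critic is fitted to the same finite sample.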
Original abstract
We investigate the training and performance of generative adversarial networks using the Maximum Mean Discrepancy (MMD) as critic, termed MMD GANs. As our main theoretical contribution, we clarify the situation with bias in GAN loss functions raised by recent work: we show that gradient estimators used in the optimization process for both MMD GANs and Wasserstein GANs are unbiased, but learning a discriminator based on samples leads to biased gradients for the generator parameters. We also discuss the issue of kernel choice for the MMD critic, and characterize the kernel corresponding to the energy distance used for the Cramer GAN critic. Being an integral probability metric, the MMD benefits from training strategies recently developed for Wasserstein GANs. In experiments, the MMD GAN is able to employ a smaller critic network than the Wasserstein GAN, resulting in a simpler and faster-training algorithm with matching performance. We also propose an improved measure of GAN convergence, the Kernel Inception Distance, and show how to use it to dynamically adapt learning rates during GAN training.
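The Kernel Inception Distance named in the abstract is a squared MMD computed between Inception-network features of real and generated images, with a cubic polynomial kernel. The sketch below assumes that the features are already extracted. Random vectors stand in for Inception activations, and the names `poly_kernel`, `kid`, `real`, `close` and `far` are illustrative, not taken from the paper's code.

```python
import random


def poly_kernel(x, y):
    """Cubic polynomial kernel k(x, y) = (x.y / d + 1)^3, d = feature dimension."""
    d = len(x)
    dot = sum(a * b for a, b in zip(x, y))
    return (dot / d + 1.0) ** 3


def kid(real_feats, fake_feats):
    """Unbiased MMD^2 estimate between two lists of feature vectors."""
    m, n = len(real_feats), len(fake_feats)
    k_rr = sum(poly_kernel(real_feats[i], real_feats[j])
               for i in range(m) for j in range(m) if i != j)
    k_ff = sum(poly_kernel(fake_feats[i], fake_feats[j])
               for i in range(n) for j in range(n) if i != j)
    k_rf = sum(poly_kernel(r, f) for r in real_feats for f in fake_feats)
    return k_rr / (m * (m - 1)) + k_ff / (n * (n - 1)) - 2.0 * k_rf / (m * n)


# Placeholder features: random vectors stand in for Inception activations.
rng = random.Random(0)
dim = 8
real = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(100)]
close = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(100)]
far = [[rng.gauss(1.0, 1.0) for _ in range(dim)] for _ in range(100)]

print("KID(real, close):", kid(real, close))
print("KID(real, far):  ", kid(real, far))
```

Because the estimate is a U-statistic, it can come out slightly negative when the two feature sets are drawn from the same distribution. That is expected behaviour for an unbiased estimator of a quantity whose true value is zero.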
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper investigates MMD GANs and provides a theoretical clarification that gradient estimators for both MMD GANs and Wasserstein GANs are unbiased when the critic is fixed, while learning a discriminator from samples induces bias in the generator gradients. It discusses kernel choice for the MMD critic, characterizes the kernel for the energy distance used in Cramer GANs, and proposes the Kernel Inception Distance (KID) as an improved convergence measure that can be used to adapt learning rates dynamically. Experiments show that MMD GANs achieve matching performance to WGANs using smaller critic networks, resulting in simpler and faster training.
Significance. If the central distinction between population-level unbiasedness (via U-statistics for fixed positive-definite kernels) and finite-sample bias holds, the work offers a useful clarification of gradient issues in integral probability metric GANs, extending prior WGAN results with an independent derivation. The empirical finding that smaller critics suffice and the introduction of KID for practical training provide concrete value for the field.
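For reference, the population-level unbiasedness mentioned above can be written out in the standard U-statistic form. The notation is an assumption of this sketch, not copied from the paper: samples x_1, ..., x_m come from P, samples y_1, ..., y_n come from Q, k is a fixed positive-definite kernel, G_\theta is the generator and Z its noise input.

```latex
\widehat{\mathrm{MMD}}_u^2(X, Y)
  = \frac{1}{m(m-1)} \sum_{i \neq j} k(x_i, x_j)
  + \frac{1}{n(n-1)} \sum_{i \neq j} k(y_i, y_j)
  - \frac{2}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} k(x_i, y_j),
\qquad
\mathbb{E}\big[\widehat{\mathrm{MMD}}_u^2(X, Y)\big] = \mathrm{MMD}_k^2(P, Q).

% With y_j = G_\theta(z_j) and the kernel k held fixed, unbiasedness of the
% gradient is the statement that differentiation and expectation commute:
\mathbb{E}\big[\nabla_\theta \widehat{\mathrm{MMD}}_u^2(X, G_\theta(Z))\big]
  = \nabla_\theta \, \mathrm{MMD}_k^2(P, Q_\theta).
```

The second line holds only while k does not depend on the sample. Once the kernel is fitted to the same data, the expectation no longer factors this way.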
major comments (2)
- [Abstract and theoretical analysis] Abstract and theoretical section: the claim that gradient estimators are unbiased for fixed-critic MMD relies on interchanging gradient and expectation under a fixed positive-definite kernel. When the critic is a neural network, the effective kernel depends on critic parameters; the paper should explicitly state whether critic parameters are held fixed during the generator gradient computation and provide the precise conditions under which the interchange remains valid.
- [Experiments] Experiments section: the claim of matching performance with smaller networks is central to the practical contribution, yet no variance across random seeds, multiple runs, or statistical significance tests are reported. This makes it difficult to assess whether the observed equivalence is robust or could be due to training variability.
minor comments (2)
- [Abstract] The abstract introduces KID without a one-sentence definition; adding a brief parenthetical description would improve readability.
- [Kernel discussion] In the kernel characterization for the energy distance, ensure the final kernel expression is numbered as an equation and the derivation steps are clearly separated from surrounding text.
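As a reference point for the kernel comment above, the standard distance-induced kernel construction links the energy distance to an MMD. The block below states that general construction; the paper's own expression may use different notation. The symbols \rho (a semimetric, for example the Euclidean distance), z_0 (an arbitrary base point) and D_E (the energy distance) are assumptions of this sketch.

```latex
% Distance-induced kernel for a semimetric \rho and an arbitrary base point z_0:
k_{\rho}(x, y) = \tfrac{1}{2}\big[\rho(x, z_0) + \rho(y, z_0) - \rho(x, y)\big].

% Substituting k_\rho into
% MMD^2 = E k(X,X') + E k(Y,Y') - 2 E k(X,Y), the z_0 terms cancel:
\mathrm{MMD}_{k_\rho}^2(P, Q)
  = \mathbb{E}\,\rho(X, Y)
  - \tfrac{1}{2}\,\mathbb{E}\,\rho(X, X')
  - \tfrac{1}{2}\,\mathbb{E}\,\rho(Y, Y')
  = \tfrac{1}{2}\, D_{E}(P, Q),
\qquad X, X' \sim P,\; Y, Y' \sim Q \text{ independent}.
```

Here D_E(P, Q) = 2 E \rho(X, Y) - E \rho(X, X') - E \rho(Y, Y'), so the energy distance is twice the squared MMD under k_\rho, and the choice of z_0 does not affect the result.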
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive feedback on our manuscript. We address each major comment below and will incorporate clarifications and additional reporting in the revised version.
- Referee: [Abstract and theoretical analysis] Abstract and theoretical section: the claim that gradient estimators are unbiased for fixed-critic MMD relies on interchanging gradient and expectation under a fixed positive-definite kernel. When the critic is a neural network, the effective kernel depends on critic parameters; the paper should explicitly state whether critic parameters are held fixed during the generator gradient computation and provide the precise conditions under which the interchange remains valid.
  Authors: We thank the referee for this observation. In the standard alternating optimization used for MMD GANs (and WGANs), the critic parameters are held fixed during the generator update step; only the generator parameters are optimized while the kernel induced by the current critic remains constant. Under this fixed-kernel regime the interchange of gradient and expectation is justified by the dominated convergence theorem for the bounded continuous functions arising from a positive-definite kernel. We will add an explicit paragraph in the theoretical section stating these conditions and confirming that the critic is frozen during generator gradient computation.
  Revision: yes
- Referee: [Experiments] Experiments section: the claim of matching performance with smaller networks is central to the practical contribution, yet no variance across random seeds, multiple runs, or statistical significance tests are reported. This makes it difficult to assess whether the observed equivalence is robust or could be due to training variability.
  Authors: We agree that the absence of variance estimates and statistical tests weakens the empirical claim. Although the reported runs were performed with multiple random seeds and produced qualitatively consistent results, we did not include standard deviations or significance tests in the original manuscript. In the revision we will add error bars computed over at least five independent seeds for the key FID/KID curves and include a brief discussion of statistical significance for the observed performance parity between the smaller MMD critic and the larger WGAN critic.
  Revision: yes
Circularity Check
No significant circularity: independent derivation of bias properties
The paper's central claims rest on standard properties of U-statistics for the MMD estimator and the ability to interchange gradient and expectation when the kernel is fixed and positive definite. The distinction between population-level unbiasedness of the gradient estimator and finite-sample bias induced by learning the critic is derived directly from these properties without reducing to fitted parameters, self-definitions, or load-bearing self-citations. Training strategies are borrowed from WGAN literature (non-overlapping authors) but the MMD-specific bias analysis is presented as an independent contribution. No step in the provided derivation chain collapses by construction to its inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- kernel bandwidth or choice
axioms (1)
- domain assumption: MMD is an integral probability metric benefiting from WGAN training strategies
invented entities (1)
- Kernel Inception Distance (no independent evidence)
Forward citations
Cited by 22 Pith papers
-
Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI
HM3D offers 1000 building-scale 3D environments that are larger and higher-fidelity than existing datasets, enabling better-performing embodied AI agents for tasks like PointGoal navigation.
-
DirectTryOn: One-Step Virtual Try-On via Straightened Conditional Transport
DirectTryOn achieves state-of-the-art one-step virtual try-on performance by applying pure conditional transport, garment preservation loss, and self-consistency loss to straighten trajectories in pretrained generativ...
-
STRIDE: Training-Free Diversity Guidance via PCA-Directed Feature Perturbation in Single-Step Diffusion Models
STRIDE boosts diversity in one-step diffusion models by injecting PCA-aligned pink noise into transformer features while preserving text alignment and quality.
-
Active Sampling for Ultra-Low-Bit-Rate Video Compression via Conditional Controlled Diffusion
ActDiff-VC achieves up to 64.6% bitrate reduction at matched NIQE and improves perceptual metrics like KID and FID by using content-adaptive keyframe selection and budget-aware sparse trajectory selection to condition...
-
Faithful Extreme Image Rescaling with Learnable Reversible Transformation and Semantic Priors
FaithEIR combines learnable reversible latent transformations, an adaptive high-frequency detail prior, and semantic conditioning to outperform prior methods in fidelity and perceptual quality for extreme image rescaling.
-
OccDirector: Language-Guided Behavior and Interaction Generation in 4D Occupancy Space
OccDirector uses a VLM-guided Spatio-Temporal MMDiT model with history anchoring to generate physically plausible 4D occupancy from language scripts, supported by the new OccInteract-85k dataset.
-
FIT: A Large-Scale Dataset for Fit-Aware Virtual Try-On
FIT is a large-scale dataset of 1.13M try-on triplets with exact size data plus a synthetic generation pipeline that enables training of virtual try-on models capable of depicting realistic garment fit including ill-f...
-
Dress-ED: Instruction-Guided Editing for Virtual Try-On and Try-Off
Dress-ED is the first large-scale benchmark unifying virtual try-on, try-off, and text-guided garment editing with 146k verified samples plus a multimodal diffusion baseline.
-
Diffusion Posterior Sampling for General Noisy Inverse Problems
Diffusion models solve noisy (non)linear inverse problems via approximated posterior sampling that blends diffusion steps with manifold gradients without strict consistency projection.
-
SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations
SDEdit performs guided image synthesis and editing by adding noise to inputs and refining them via denoising with a diffusion model's SDE prior, outperforming GAN methods in human studies without task-specific training.
-
TOPOS: High-Fidelity and Efficient Industry-Grade 3D Head Generation
TOPOS creates high-fidelity 3D heads with fixed industry topology from single images via a specialized VAE with Perceiver Resampler and a rectified flow transformer.
-
CRAFT: Clinical Reward-Aligned Finetuning for Medical Image Synthesis
CRAFT adapts diffusion models to medical images via clinical reward alignment from LLMs and VLMs, improving alignment scores and cutting low-quality generations by 20.4% on average across modalities.
-
Score-Based Generative Modeling through Anisotropic Stochastic Partial Differential Equations
Anisotropic SPDEs preserve geometric data structure over longer timescales in score-based generative modeling, yielding better image quality than standard SDE baselines and flow matching in unconditional and condition...
-
Stylistic Attribute Control in Latent Diffusion Models
A technique for parametric stylistic control in latent diffusion models learns disentangled directions from synthetic datasets and applies them via guidance composition while preserving semantics.
-
InpaintSLat: Inpainting Structured 3D Latents via Initial Noise Optimization
Optimizing initial noise via backpropagation approximation and spectral parameterization in structured 3D latent diffusion yields higher contextual consistency and prompt alignment in training-free inpainting.
-
FashionStylist: An Expert Knowledge-enhanced Multimodal Dataset for Fashion Understanding
FashionStylist is an expert-annotated benchmark dataset that unifies outfit-to-item grounding, completion, and evaluation tasks for multimodal large language models in fashion.
-
One-to-More: High-Fidelity Training-Free Anomaly Generation with Attention Control
O2MAG generates high-fidelity text-guided anomalies from a single image without training by manipulating self-attention in diffusion models with anomaly masks and dual enhancements.
-
CaloArt: Large-Patch x-Prediction Diffusion Transformers for High-Granularity Calorimeter Shower Generation
CaloArt achieves top FPD, high-level, and classifier metrics on CaloChallenge datasets 2 and 3 while keeping single-GPU generation at 9-11 ms per shower by combining large-patch tokenization, x-prediction, and conditi...
-
SAMIC: A Lightweight Semantic-Aware Mamba for Efficient Perceptual Image Compression
SAMIC introduces semantic-aware Mamba blocks and SVD-based redundancy reduction to achieve efficient perceptual image compression with improved rate-distortion-perception tradeoffs.
-
Learning to Emulate Chaos: Adversarial Optimal Transport Regularization
Adversarial optimal transport objectives train neural emulators with improved long-term statistical fidelity on chaotic systems.
-
LoRaQ: Optimized Low Rank Approximation for 4-bit Quantization
LoRaQ enables fully sub-16-bit quantized diffusion models by optimizing low-rank error compensation in a data-free way, outperforming prior methods at equal memory cost on Pixart-Σ and SANA while supporting mixed low-...
-
Protecting and Preserving Protest Dynamics for Responsible Analysis
A responsible computing framework substitutes real protest imagery with labeled synthetic reproductions from conditional image synthesis to enable privacy-aware analysis of collective action patterns.