Revisiting Classifier Two-Sample Tests
13 Pith papers cite this work.
abstract
The goal of two-sample tests is to assess whether two samples, $S_P \sim P^n$ and $S_Q \sim Q^m$, are drawn from the same distribution. Perhaps intriguingly, one relatively unexplored method to build two-sample tests is the use of binary classifiers. In particular, construct a dataset by pairing the $n$ examples in $S_P$ with a positive label, and by pairing the $m$ examples in $S_Q$ with a negative label. If the null hypothesis "$P = Q$" is true, then the classification accuracy of a binary classifier on a held-out subset of this dataset should remain near chance-level. As we will show, such Classifier Two-Sample Tests (C2ST) learn a suitable representation of the data on the fly, return test statistics in interpretable units, have a simple null distribution, and their predictive uncertainty allows one to interpret where $P$ and $Q$ differ. The goal of this paper is to establish the properties, performance, and uses of C2ST. First, we analyze their main theoretical properties. Second, we compare their performance against a variety of state-of-the-art alternatives. Third, we propose their use to evaluate the sample quality of generative models with intractable likelihoods, such as Generative Adversarial Networks (GANs). Fourth, we showcase the novel application of GANs together with C2ST for causal discovery.
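The procedure described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the k-nearest-neighbour classifier, the 50/50 train/test split, the helper names (`knn_predict`, `c2st`, `gaussian_sample`), and the toy Gaussian data are all assumptions chosen to keep the example self-contained. The p-value assumes that, under the null hypothesis, held-out accuracy is approximately normal with mean 1/2 and variance 1/(4 n_test), which is the normal approximation to a Binomial(n_test, 1/2) count of correct predictions.

```python
import math
import random
from statistics import NormalDist


def knn_predict(train, x, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def c2st(sample_p, sample_q, k=5, test_fraction=0.5, seed=0):
    """Return (held-out accuracy, one-sided p-value) for H0: P = Q."""
    rng = random.Random(seed)
    # Pair examples from P with label 1 and examples from Q with label 0.
    data = [(x, 1) for x in sample_p] + [(x, 0) for x in sample_q]
    rng.shuffle(data)
    # Hold out a subset for testing; fit the classifier on the rest.
    n_test = int(len(data) * test_fraction)
    test, train = data[:n_test], data[n_test:]
    correct = sum(knn_predict(train, x, k) == y for x, y in test)
    accuracy = correct / n_test
    # Assumed null: accuracy ~ N(1/2, 1 / (4 * n_test)).
    null = NormalDist(mu=0.5, sigma=math.sqrt(0.25 / n_test))
    p_value = 1.0 - null.cdf(accuracy)
    return accuracy, p_value


def gaussian_sample(rng, n, mean):
    """Toy 2-D data: n points with independent N(mean, 1) coordinates."""
    return [(rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)) for _ in range(n)]


rng = random.Random(1)
same_acc, same_p = c2st(gaussian_sample(rng, 200, 0.0),
                        gaussian_sample(rng, 200, 0.0))
diff_acc, diff_p = c2st(gaussian_sample(rng, 200, 0.0),
                        gaussian_sample(rng, 200, 3.0))
print(f"P = Q : accuracy={same_acc:.3f}, p-value={same_p:.3f}")
print(f"P != Q: accuracy={diff_acc:.3f}, p-value={diff_p:.3g}")
```

When both samples come from the same distribution, the held-out accuracy should stay near 0.5. When the means differ, the accuracy should rise well above chance and the p-value should shrink. The accuracy is the test statistic "in interpretable units" that the abstract refers to.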
citing papers explorer
- Proton Structure from Neural Simulation-Based Inference at the LHC
  Neural simulation-based inference on unbinned top-quark pair data at 13 TeV yields improved gluon PDF precision over traditional binned analyses while incorporating experimental and theoretical uncertainties.
- Amortized Simulation-Based Inference in Generalized Bayes via Neural Posterior Estimation
  Introduces the first amortized neural posterior estimator conditioned on both data and temperature β for generalized Bayesian inference, matching MCMC performance on standard SBI benchmarks.
- Hard-Label Black-Box Attacks on 3D Point Clouds
  A spectrum-aware decision boundary algorithm enables effective hard-label black-box adversarial attacks on 3D point cloud models by fusing spectral information across classes and performing curvature-aware iterative optimization.
- Generative Modeling with Orbit-Space Particle Flow Matching
  OGPP is a particle flow-matching method using orbit-space canonicalization and geometric paths that achieves lower error and fewer steps than prior approaches on 3D benchmarks.
- Asymptotically Optimal Tests for One- and Two-Sample Problems
  Hoeffding's relative entropy threshold test between empirical distributions is asymptotically optimal for both one- and two-sample hypothesis testing, with a strong converse for the two-sample case.
- A discriminative approach for finding and characterizing positivity violations using decision trees
  Decision trees partition covariate space to detect positivity violations in causal inference, augmented by random forests to quantify violation robustness within each subspace.
- Demystifying MMD GANs
  MMD GANs have unbiased critic gradients but biased generator gradients from sample-based learning, and the Kernel Inception Distance provides a practical new measure for GAN convergence and dynamic learning rate adaptation.
- Pre-trained Tabular Foundation Models as Versatile Summary Networks for Neural Posterior Estimation
  Pre-trained TabPFN acts as an effective training-free summary network for neural posterior estimation, matching or outperforming standard methods while preserving useful marginal and location information in the posterior.
- Learning to Test: Physics-Informed Representation for Dynamical Instability Detection
  A physics-informed neural representation is learned from safe data to support distributional hypothesis testing for dynamical instability in stochastic DAE systems without repeated simulations.
- SynVA: A Modular Toolkit for Vessel Generation and Aneurysm Editing
  SynVA toolkit generates realistic vascular meshes and anatomically plausible aneurysms, releasing 50,000 labeled samples for medical vision tasks.
- Neural Posterior Estimation of Terrain Parameters from Radar Sounder Data
  Neural posterior estimation trained on simulated radar data enables probabilistic inference of terrain parameters from real Mars radar sounder profiles while conditioning on reference surface assumptions.
- Machine Learning Techniques for Astrophysics and Cosmology: Simulation-Based Inference
  Simulation-based inference uses neural networks trained on simulations to enable parameter inference in cosmology and astrophysics where traditional likelihood calculations are intractable.
- Bayesian Rain Field Reconstruction using Commercial Microwave Links and Diffusion Model Priors