Many Wrongs Make a Right: Leveraging Biased Simulations Towards Unbiased Parameter Inference
Pith reviewed 2026-05-13 21:06 UTC · model grok-4.3
The pith
Biased simulations can be combined in a mixture model to produce unbiased estimates of signal fractions with calibrated uncertainties.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We introduce a Template-Adapted Mixture Model that treats each biased simulation as a template and learns a data-driven combination to recover the true signal and background densities inside the signal region. By exploiting the diversity of the biases across many simulations, the model estimates the signal fraction without requiring perfect knowledge of how each simulation deviates from reality. When applied to a Gaussian toy problem and to a semi-realistic di-Higgs measurement, the resulting fraction estimates show substantially smaller bias, with uncertainty intervals whose observed coverage matches the claimed rate.
What carries the argument
The Template-Adapted Mixture Model, which reweights or selects among multiple biased simulation templates to form data-driven estimates of the signal-region distributions for signal and background.
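As a rough illustration of the idea (not the paper's implementation), here is a minimal one-dimensional sketch: each process density is modeled as a learned convex combination of biased Gaussian templates, and the signal fraction is fit to data by maximum likelihood. The template parameters, the two-template-per-process setup, and all names (`nll`, `kappa_hat`) are invented for this example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# "Reality": signal N(0,1), background N(2,1); true signal fraction = 0.3.
data = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(2.0, 1.0, 700)])

# Biased simulations: templates with shifted means, mimicking mismodeling.
sig_templates = [(-0.2, 1.0), (0.15, 1.0)]  # (mean, std) per biased signal sim
bkg_templates = [(1.8, 1.0), (2.2, 1.0)]    # (mean, std) per biased background sim

def gauss(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def nll(theta):
    # theta = (kappa, ws, wb): signal fraction plus one mixing weight per process.
    kappa, ws, wb = theta
    s = ws * gauss(data, *sig_templates[0]) + (1 - ws) * gauss(data, *sig_templates[1])
    b = wb * gauss(data, *bkg_templates[0]) + (1 - wb) * gauss(data, *bkg_templates[1])
    return -np.sum(np.log(kappa * s + (1 - kappa) * b + 1e-300))

res = minimize(nll, x0=[0.5, 0.5, 0.5], bounds=[(1e-3, 1 - 1e-3)] * 3)
kappa_hat = res.x[0]  # adapted estimate of the signal fraction
```

Because the biased templates bracket the true densities here, the fitted combination can interpolate toward reality; if the truth lay outside the template span, the estimate would stay biased, which is exactly the diversity premise the review flags.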
If this is right
- Signal-fraction estimates become less sensitive to the detailed mismodeling present in any one simulation.
- Uncertainties remain calibrated even when the individual simulations are systematically biased.
- The same framework can be applied to other inference tasks that reduce to estimating population fractions in mixed samples.
- Performance improves when the set of biased simulations spans a wider range of possible mismodeling patterns.
Where Pith is reading between the lines
- The method could be extended to full parameter fits rather than single-fraction estimation by treating each parameter bin as its own mixture problem.
- If the bias-diversity requirement is met, future experiments might deliberately generate families of differently biased simulations instead of pursuing a single high-fidelity one.
- The approach shares structure with ensemble debiasing techniques in machine learning and could be tested on non-HEP tasks such as medical imaging or astrophysical source separation.
Load-bearing premise
The collection of biased simulations is diverse enough that their combination can cancel the domain-shift bias without introducing new uncontrolled errors.
What would settle it
Run the method on a dataset whose true signal fraction is known independently; if the reported interval fails to cover the true value at the claimed rate, the calibration claim is false.
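That falsification test is a standard coverage (closure) study. A toy sketch follows, using a simple binomial counting estimator as a stand-in for the paper's method: draw many pseudo-datasets with a known signal fraction and count how often the 95% interval covers it.

```python
import numpy as np

rng = np.random.default_rng(1)
true_frac, n_events, n_trials = 0.3, 2000, 200

covered = 0
for _ in range(n_trials):
    # Pseudo-dataset: the number of signal events is binomial by construction.
    k = rng.binomial(n_events, true_frac)
    est = k / n_events
    err = np.sqrt(est * (1 - est) / n_events)  # normal-approximation standard error
    covered += (est - 1.96 * err <= true_frac <= est + 1.96 * err)

coverage = covered / n_trials  # should sit near the nominal 0.95
```

If a method's reported intervals fail this check (observed coverage well below nominal), the calibration claim is falsified; the paper's study would substitute the mixture-model interval for the toy one here.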
Original abstract
In particle physics, as in many areas of science, parameter inference relies on simulations to bridge the gap between theory and experiment. Recent developments in simulation-based inference have boosted the sensitivity of analyses; however, biases induced by simulation-data mismodeling can be difficult to control within standard inference pipelines. In this work, we propose a Template-Adapted Mixture Model to confront this problem in the context of signal fraction estimation: inferring the population proportion of signal in a mixed sample of signal and background, both of which follow arbitrarily complex distributions. We harness many biased simulations to perform data-driven estimates of each process distribution in the signal region, substantially reducing the bias on the signal fraction due to the domain shift between simulation and reality. We explore different methodological choices, including model selection, feature representation, and statistical method, and apply them to a Gaussian toy example and to a semi-realistic di-Higgs measurement. We find that the presented methods successfully leverage the biased simulations to provide estimates with well-calibrated uncertainties.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a Template-Adapted Mixture Model that combines multiple biased simulations to obtain data-driven estimates of signal and background densities in the signal region, thereby reducing bias in the inferred signal fraction for parameter inference. The approach is tested on a Gaussian toy example and a semi-realistic di-Higgs measurement, with exploration of model selection, feature choices, and statistical methods, ultimately claiming well-calibrated uncertainties on the resulting estimates.
Significance. If the central claim holds beyond the specific bias structures tested, the method could offer a practical route to mitigating simulation-data domain shifts in signal-fraction estimation without requiring explicit bias parameterization, which is a common challenge in high-energy physics analyses. The use of an ensemble of biased simulations to span the mismatch manifold is a constructive idea, and the dual validation on toy and physics-inspired examples provides a reasonable starting point for assessing practicality and uncertainty calibration.
major comments (3)
- [Abstract] The central claim that the mixture yields unbiased signal-fraction posteriors with calibrated uncertainties rests on the assumption that the chosen templates fully span the bias manifold; however, the di-Higgs example only injects specific bias structures, leaving open whether residual bias orthogonal to the templates would be absorbed into the reported uncertainties or propagate undetected (see Abstract and the description of the Template-Adapted Mixture Model).
- [Results (toy and di-Higgs sections)] Quantitative metrics for bias reduction and uncertainty calibration (e.g., coverage probabilities, bias magnitude before/after adaptation, or posterior width comparisons) are not reported in sufficient detail to verify the success statement; the abstract asserts 'well-calibrated uncertainties' but the toy and di-Higgs results lack explicit tables or figures showing these diagnostics.
- [Methodological choices] The identifiability of mixture weights from data alone, and the criteria used for model selection and feature representation, require explicit validation against degeneracy or overfitting; without these, it is unclear whether the data-driven adaptation step itself introduces uncontrolled bias when the true process distributions deviate from the spanned template space.
minor comments (2)
- [Method] Notation for the adapted templates and mixture weights should be defined more clearly in the equations to avoid ambiguity when describing the adaptation step.
- [Figures] Figure captions for the toy Gaussian and di-Higgs results could include explicit statements of the injected bias parameters and the recovered signal fraction to improve readability.
Simulated Author's Rebuttal
We thank the referee for their insightful and constructive comments on our manuscript. We address each major comment point by point below, indicating the revisions we will incorporate to strengthen the presentation and address the concerns raised.
Point-by-point responses
Referee: [Abstract] The central claim that the mixture yields unbiased signal-fraction posteriors with calibrated uncertainties rests on the assumption that the chosen templates fully span the bias manifold; however, the di-Higgs example only injects specific bias structures, leaving open whether residual bias orthogonal to the templates would be absorbed into the reported uncertainties or propagate undetected (see Abstract and the description of the Template-Adapted Mixture Model).
Authors: We agree that the performance of the Template-Adapted Mixture Model depends on the templates spanning the relevant directions of bias. In the revised manuscript we will explicitly articulate this assumption in the abstract and in the methods description of the model. We will also add a new discussion subsection on limitations when the true bias lies partially outside the spanned space, together with a supplementary numerical test that introduces an orthogonal bias component to illustrate the resulting behavior of the posterior and uncertainties. revision: yes
Referee: [Results (toy and di-Higgs sections)] Quantitative metrics for bias reduction and uncertainty calibration (e.g., coverage probabilities, bias magnitude before/after adaptation, or posterior width comparisons) are not reported in sufficient detail to verify the success statement; the abstract asserts 'well-calibrated uncertainties' but the toy and di-Higgs results lack explicit tables or figures showing these diagnostics.
Authors: We accept that more quantitative diagnostics are needed to support the claims. In the revised version we will insert tables in both the toy-model and di-Higgs results sections that report (i) bias in the signal-fraction estimate before and after adaptation, (ii) empirical coverage probabilities of the reported uncertainty intervals, and (iii) comparisons of posterior widths. New figures will be added to visualize these metrics across the range of simulation biases examined. revision: yes
Referee: [Methodological choices] The identifiability of mixture weights from data alone, and the criteria used for model selection and feature representation, require explicit validation against degeneracy or overfitting; without these, it is unclear whether the data-driven adaptation step itself introduces uncontrolled bias when the true process distributions deviate from the spanned template space.
Authors: We will clarify that the mixture weights are identifiable when the template distributions are linearly independent in the chosen feature space; a short statement and reference to standard mixture-model theory will be added. For model selection and feature representation we used cross-validation on the data likelihood; we will expand the methods section with an explicit validation subsection that includes checks for degeneracy and overfitting on controlled simulations. We will also discuss the possibility of uncontrolled bias when the true distributions lie outside the template span and note that the Bayesian uncertainty quantification offers partial robustness, while acknowledging that this remains an area for further study. revision: partial
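The linear-independence condition invoked in this response can be probed numerically: evaluate the candidate templates on a grid and check the rank of their Gram matrix. A toy check (the Gaussian templates and grid are illustrative, not from the paper):

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 400)

def gauss(mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Distinct templates: full-rank Gram matrix -> mixture weights identifiable.
T = np.stack([gauss(-0.5, 1.0), gauss(0.5, 1.0), gauss(2.0, 1.0)])
rank_good = np.linalg.matrix_rank(T @ T.T)

# Duplicated template: rank-deficient Gram matrix -> weights degenerate,
# since reallocating weight between the two copies leaves the model unchanged.
T_dup = np.stack([gauss(0.0, 1.0), gauss(0.0, 1.0), gauss(2.0, 1.0)])
rank_dup = np.linalg.matrix_rank(T_dup @ T_dup.T)
```

A rank deficiency like the second case signals that the data alone cannot pin down the weights, which is the degeneracy the referee asks the authors to validate against.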
Circularity Check
No circularity: data-driven mixture adaptation remains independent of target parameter
full rationale
The Template-Adapted Mixture Model uses external biased simulations and data to estimate process densities in the signal region before inferring the signal fraction. No equation reduces the target fraction to a fitted input by construction, no self-citation supplies a uniqueness theorem, and no ansatz is smuggled via prior work. The derivation chain is self-contained against the provided simulations and data, consistent with the reader's assessment of score 2.0.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Biased simulations provide useful but imperfect approximations to the true signal and background distributions that can be adapted via mixture modeling using real data.
Reference graph
Works this paper leans on
- [6] A. L. Read, Linear interpolation of histograms, Nucl. Instrum. Meth. A 425, 357 (1999)
- [7] K. Cranmer, G. Lewis, L. Moneta, A. Shibata, and W. Verkerke (ROOT), HistFactory: A tool for creating statistical models for use with RooFit and RooStats (2012)
- [9] G. E. P. Box, Science and statistics, Journal of the American Statistical Association 71, 791 (1976)
- [10] C. M. Bishop, Pattern Recognition and Machine Learning (Springer-Verlag Berlin, Heidelberg, 2006)
- [11] G. Hinton, Products of experts, in 1999 Ninth International Conference on Artificial Neural Networks ICANN (Conf. Publ. No. 470), Vol. 1 (1999) pp. 1–6
- [13] G. E. Hinton, Training products of experts by minimizing contrastive divergence, Neural Computation 14, 1771 (2002)
- [14] K. Cranmer, J. Pavez, and G. Louppe, Approximating Likelihood Ratios with Calibrated Discriminative Classifiers (2015), arXiv:1506.02169 [stat.AP]
- [15] A. Ghosh, Measuring quantum interference in the off-shell Higgs to four leptons process with Machine Learning, in Journées de Rencontre des Jeunes Chercheurs 2019 (JRJC 2019) (2020) pp. 171–176
- [16] R. Gomez Ambrosio, J. ter Hoeve, M. Madigan, J. Rojo, and V. Sanz, Unbinned multivariate observables for global SMEFT analyses from machine learning, JHEP 03, 033, arXiv:2211.02058 [hep-ph]
- [17] H. Bahl and S. Brass, Constraining CP-violation in the Higgs-top-quark interaction using machine-learning-based inference, JHEP 03, 017, arXiv:2110.10177 [hep-ph]
- [19] R. Schöfbeck, Refinable modeling for unbinned SMEFT analyses, Mach. Learn. Sci. Tech. 6, 015007 (2025), arXiv:2406.19076 [hep-ph]
- [21] R. Mastandrea, B. Nachman, and T. Plehn, Constraining the Higgs potential with neural simulation-based inference for di-Higgs production, Phys. Rev. D 110, 056004 (2024), arXiv:2405.15847 [hep-ph]
- [23] G. Aad et al. (ATLAS), An implementation of neural simulation-based inference for parameter estimation in ATLAS, Rept. Prog. Phys. 88, 067801 (2025), arXiv:2412.01600 [physics.data-an]
- [25] S. Benevedes and J. Thaler, Frequentist uncertainties on neural density ratios with wifi ensembles, Phys. Rev. D 112, 056024 (2025), arXiv:2506.00113 [hep-ph]
- [27] C. Chang, B. Farmer, A. Fowlie, and A. Kvellestad, Bring the noise: exact inference from noisy simulations in collider physics (2025), arXiv:2502.08157 [hep-ph]
- [29] B. Nachman and J. Thaler, Learning from many collider events at once, Phys. Rev. D 103, 116013 (2021), arXiv:2101.07263 [physics.data-an]
- [30] M. Gutmann and A. Hyvärinen, Noise-contrastive estimation: A new estimation principle for unnormalized statistical models, in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, Vol. 9, edited by Y. W. Teh and M. Titterington (PMLR, 2010)
- [31] R. B. Davies, Hypothesis testing when a nuisance parameter is present only under the alternative, Biometrika 64, 247 (1977)
- [32] R. B. Davies, Hypothesis testing when a nuisance parameter is present only under the alternatives, Biometrika 74, 33 (1987)
- [33] P. J. Huber, Robust estimation of a location parameter, in Breakthroughs in Statistics: Methodology and Distribution, edited by S. Kotz and N. L. Johnson (Springer New York, New York, NY, 1992) pp. 492–518
- [34] Y. Pawitan, In All Likelihood: Statistical Modelling and Inference Using Likelihood (OUP Oxford, 2013)
- [35] D. M. Blei, A. Y. Ng, and M. I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res. 3, 993–1022 (2003)
- [36] M. D. Hoffman, D. M. Blei, C. Wang, and J. W. Paisley, Stochastic variational inference, J. Mach. Learn. Res. 14, 1303 (2013)
- [39] B. M. Dillon, D. A. Faroughy, J. F. Kamenik, and M. Szewc, Learning Latent Jet Structure, Symmetry 13, 1167 (2021)
- [41] A. Srivastava and C. Sutton, Autoencoding variational inference for topic models (2017), arXiv:1703.01488 [stat.ML]
- [42] G. J. Feldman and R. D. Cousins, A unified approach to the classical statistical analysis of small signals, Phys. Rev. D 57, 3873 (1998), arXiv:physics/9711021
- [43] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in Advances in Neural Information Processing Systems (2019)
- [44] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization (2017), arXiv:1412.6980 [cs.LG]
- [45] M. M. Deza and E. Deza, Encyclopedia of Distances (Springer Berlin Heidelberg, 2009) pp. 1–583
- [46] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12, 2825 (2011)
- [47] M. Betancourt, A conceptual introduction to Hamiltonian Monte Carlo (2018), arXiv:1701.02434 [stat.ME]
- [48] Stan Development Team, Stan modeling language users guide and reference manual, https://mc-stan.org/docs/
- [49] J. Alwall, M. Herquet, F. Maltoni, O. Mattelaer, and T. Stelzer, MadGraph 5: Going Beyond, JHEP 06, 128, arXiv:1106.0522 [hep-ph]
- [50] J. Alwall, R. Frederix, S. Frixione, V. Hirschi, F. Maltoni, O. Mattelaer, H. S. Shao, T. Stelzer, P. Torrielli, and M. Zaro, The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations, JHEP 07, 079, arXiv:1405.0301 [hep-ph]
- [52] T. Sjostrand, S. Mrenna, and P. Z. Skands, PYTHIA 6.4 Physics and Manual, JHEP 05, 026, arXiv:hep-ph/0603175
- [53] T. Sjostrand, S. Mrenna, and P. Z. Skands, A Brief Introduction to PYTHIA 8.1, Comput. Phys. Commun. 178, 852 (2008), arXiv:0710.3820 [hep-ph]
- [54] T. Sjöstrand, S. Ask, J. R. Christiansen, R. Corke, N. Desai, P. Ilten, S. Mrenna, S. Prestel, C. O. Rasmussen, and P. Z. Skands, An introduction to PYTHIA 8.2, Comput. Phys. Commun. 191, 159 (2015), arXiv:1410.3012 [hep-ph]
- [55] J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaître, A. Mertens, and M. Selvaggi (DELPHES 3), DELPHES 3, a modular framework for fast simulation of a generic collider experiment, JHEP 02, 057, arXiv:1307.6346 [hep-ex]
- [56] M. Cacciari, G. P. Salam, and G. Soyez, The anti-kt jet clustering algorithm, JHEP 04, 063, arXiv:0802.1189 [hep-ph]
- [57] G. Aad et al. (ATLAS), Search for nonresonant pair production of Higgs bosons in the bb̄bb̄ final state in pp collisions at √s = 13 TeV with the ATLAS detector, Phys. Rev. D 108, 052003 (2023), arXiv:2301.03212 [hep-ex]