Complexity-Aware Theory Testing from Bell Witnesses
Pith reviewed 2026-05-10 17:58 UTC · model grok-4.3
The pith
Bell witnesses bound the KL divergence to local models in bits per trial, allowing direct comparison with complexity penalties.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A witness obtained from a coarse-graining of full Bell trials yields, through data processing, a lower bound on the Kullback-Leibler (KL) distance to a competitor class in terms of the induced witness distribution. For binary Bell-game witnesses this reduces to a Bernoulli bound, and in the CHSH scenario the local image collapses to a single threshold, giving the closed-form expression D_KL(Bern(ω) || Bern(3/4)) under uniform inputs, with a corresponding extension to known nonuniform designs. A finite-sample Hoeffding argument gives a lower confidence bound under independent trials. Because the bound is measured in bits per trial, it can be compared directly with an MDL/BIC-type complexity penalty and thereby yields a conservative crossover criterion for when a more expressive competitor becomes worthwhile.
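The closed-form bound and its comparison with a complexity penalty can be illustrated numerically. The sketch below is a minimal example under stated assumptions: the function names are hypothetical, the penalty is taken to be the usual BIC-type term (k/2) log2(n) spread over n trials, and the example value of ω is the Tsirelson win probability cos²(π/8) rather than any measured quantity. The paper's exact crossover criterion may be defined differently.

```python
import math


def kl_bernoulli_bits(omega: float, p: float) -> float:
    """D_KL(Bern(omega) || Bern(p)) in bits, with 0 * log(0) taken as 0."""
    def term(a: float, b: float) -> float:
        return 0.0 if a == 0.0 else a * math.log2(a / b)
    return term(omega, p) + term(1.0 - omega, 1.0 - p)


def chsh_witness_gap_bits(omega: float, threshold: float = 0.75) -> float:
    """Witness lower bound on the per-trial KL distance to the local class.

    Returns 0 when omega does not exceed the local threshold, because the
    witness then certifies no gap.
    """
    return kl_bernoulli_bits(omega, threshold) if omega > threshold else 0.0


def bic_penalty_bits_per_trial(extra_params: int, n_trials: int) -> float:
    """Assumed BIC-type penalty (k/2) * log2(n), expressed per trial."""
    return 0.5 * extra_params * math.log2(n_trials) / n_trials


# Illustrative inputs only (not taken from any experiment).
omega = math.cos(math.pi / 8) ** 2  # Tsirelson win probability, about 0.8536
gap = chsh_witness_gap_bits(omega)
penalty = bic_penalty_bits_per_trial(extra_params=2, n_trials=10_000)
print(f"gap = {gap:.4f} bits/trial, penalty = {penalty:.6f} bits/trial")
print("extra parameters pay for themselves (conservatively):", gap > penalty)
```

Because the witness gap is only a lower bound on the full KL distance, a comparison of this kind is conservative: it can understate, but not overstate, the per-trial evidence against the local class.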
What carries the argument
The coarse-graining map from full Bell trial outcomes to a witness distribution, which via the data processing inequality supplies a lower bound on the KL divergence to the local competitor class.
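Written out, the step reads as follows. The notation is chosen here for illustration and is not taken from the paper: P is the distribution of full trial records, \mathcal{L} is the local class, f is the coarse-graining map, and f_{\#} denotes the push-forward under f.

```latex
\inf_{Q \in \mathcal{L}} D_{\mathrm{KL}}(P \,\|\, Q)
  \;\ge\; \inf_{Q \in \mathcal{L}} D_{\mathrm{KL}}(f_{\#}P \,\|\, f_{\#}Q)
  \;=\; \inf_{q \in f_{\#}\mathcal{L}} D_{\mathrm{KL}}(f_{\#}P \,\|\, q).
% CHSH with uniform inputs: f_{\#}P = \mathrm{Bern}(\omega), and every
% q \in f_{\#}\mathcal{L} is \mathrm{Bern}(p) with p \le 3/4, with p = 3/4 attained.
% Hence, for \omega > 3/4,
\inf_{q \in f_{\#}\mathcal{L}} D_{\mathrm{KL}}(\mathrm{Bern}(\omega) \,\|\, q)
  \;=\; D_{\mathrm{KL}}\!\left(\mathrm{Bern}(\omega) \,\|\, \mathrm{Bern}(3/4)\right).
```

The inequality is the data processing inequality applied to each Q separately; the final equality uses the fact that the Bernoulli KL divergence decreases in p as p approaches ω from below.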
Load-bearing premise
Coarse-graining the detailed Bell trial outcomes into a witness statistic still permits the data processing inequality to lower-bound the KL divergence to the local models.
What would settle it
A Bell dataset in which the maximum-likelihood KL divergence from the observed frequencies to the nearest local model is strictly smaller than the numerical value given by the witness formula would contradict the lower-bound claim.
Original abstract
Bell statistical-strength analyses and complexity-based model selection are usually treated separately. Here we relate them by showing that a witness obtained from a coarse-graining of full Bell trials yields, through data processing, a lower bound on the Kullback-Leibler (KL) distance to a competitor class in terms of the induced witness distribution. For binary Bell-game witnesses this reduces to a Bernoulli bound, and in the CHSH scenario the local image collapses to a single threshold, giving the closed-form expression D_KL(Bern(omega) || Bern(3/4)) under uniform inputs, with a corresponding extension to known nonuniform designs. A finite-sample Hoeffding argument gives a lower confidence bound under independent trials. We also include a non-CHSH example based on the three-party Mermin-GHZ game. Because the bound is measured in bits per trial, it can be compared directly with an MDL/BIC-type complexity penalty and thereby yields a conservative crossover criterion for when a more expressive competitor becomes worthwhile. For the reproducible four-photon data of Wang et al., the witness certifies a positive information gap against locality, while a full-table comparison across local, no-signaling, saturated, and two compact nonlocal families favors low-dimensional nonlocal descriptions once complexity is charged. A four-parameter unbiased-correlator control shows that the data support compact nonlocality over locality, while only weakly distinguishing the specific cosine structure of the two-parameter model; an AIC comparison instead favors broader nonlocal controls. We also report witness-based benchmarks from additional published CHSH experiments and discuss the interpretational scope of BIC for constrained or non-regular model classes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that coarse-graining full Bell trial outcomes into a binary witness (e.g., CHSH game win indicator) permits application of the data processing inequality, yielding a lower bound on the KL divergence from the observed witness distribution to the local model class. For uniform-input CHSH this collapses to the closed-form D_KL(Bern(ω) || Bern(3/4)); extensions are given for nonuniform inputs and the Mermin-GHZ game. A Hoeffding lower-confidence bound is supplied for finite samples. The resulting per-trial information gap is compared directly to BIC/MDL complexity penalties to obtain a conservative crossover criterion for preferring more expressive nonlocal models. The method is applied to the reproducible four-photon data of Wang et al. (positive gap against locality) and other published CHSH experiments; a four-parameter unbiased-correlator control and AIC comparisons are used to assess compact versus broad nonlocal families.
Significance. If the derivation is sound, the work supplies a principled, information-theoretic link between Bell-witness statistical strength and complexity-aware model selection, allowing direct numerical comparison of information gain against penalty terms. This is potentially useful for interpreting experimental Bell violations in terms of model preference rather than binary rejection of locality. Credit is due for the closed-form reductions, the finite-sample bound, the explicit Mermin-GHZ example, and the reproducible benchmarks on published data sets.
major comments (2)
- [Derivation of the CHSH bound] The central reduction of the local image to Bern(3/4) under uniform inputs (abstract and derivation section) relies on the push-forward measure induced by the coarse-graining map; the manuscript should explicitly verify that this map sends the entire local set precisely onto the Bernoulli family with p ≤ 3/4, as any measure-theoretic gap would invalidate the infimal KL expression.
- [Application to Wang et al. data] In the four-photon data analysis, the four-parameter unbiased-correlator control is fitted to the same data later used for AIC model comparison; this data-dependent choice risks circularity in the reported preference for compact nonlocality, and a hold-out or cross-validated procedure is needed to establish robustness of the conclusion.
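The image question raised in the first major comment can be probed numerically by enumerating all 16 deterministic local strategies under uniform inputs. The sketch below (function name hypothetical) collects their CHSH win probabilities. A general local model is a mixture of these strategies via shared randomness, so its win probability lies in the convex hull of the enumerated values.

```python
from fractions import Fraction
from itertools import product


def deterministic_win_probabilities():
    """CHSH win probability of each deterministic local strategy.

    A strategy is a pair of response functions a(x), b(y) with bit-valued
    inputs and outputs; a trial is won when a XOR b equals x AND y.
    Inputs (x, y) are assumed uniform over the four pairs.
    """
    probs = []
    for a0, a1, b0, b1 in product((0, 1), repeat=4):
        a, b = (a0, a1), (b0, b1)
        wins = sum(1 for x in (0, 1) for y in (0, 1)
                   if (a[x] ^ b[y]) == (x & y))
        probs.append(Fraction(wins, 4))
    return probs


probs = deterministic_win_probabilities()
print(sorted(set(probs)))  # distinct win probabilities
print(max(probs))          # local threshold under uniform inputs
```

The enumeration returns only the values 1/4 and 3/4, so mixtures cover the interval [1/4, 3/4], and the upper endpoint 3/4 is the one that matters for the infimal KL expression when ω > 3/4.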
minor comments (2)
- The abstract states an extension to nonuniform designs but the main text should supply the explicit KL expression or derivation for at least one nonuniform input distribution to support reproducibility.
- Table or figure captions for the benchmark comparisons should include the exact number of trials and the numerical value of the Hoeffding lower bound used for each experiment.
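For the nonuniform case raised in the first minor comment, one natural candidate expression replaces the threshold 3/4 by the best local win probability under the known input distribution q(x, y), giving D_KL(Bern(ω_q) || Bern(β_q)). The sketch below computes β_q by brute force; it is an assumption about how the extension could be written, not the paper's stated formula, and the function names and the example design are hypothetical. It also assumes the inputs are independent of any hidden variable.

```python
import math
from itertools import product


def local_threshold(q):
    """Largest CHSH win probability of any deterministic local strategy
    when the input pair (x, y) is drawn with known probability q[(x, y)].

    Mixtures of deterministic strategies cannot exceed this maximum.
    """
    best = 0.0
    for a0, a1, b0, b1 in product((0, 1), repeat=4):
        a, b = (a0, a1), (b0, b1)
        win = sum(q[(x, y)] for x in (0, 1) for y in (0, 1)
                  if (a[x] ^ b[y]) == (x & y))
        best = max(best, win)
    return best


def kl_bernoulli_bits(omega, p):
    """D_KL(Bern(omega) || Bern(p)) in bits, with 0 * log(0) taken as 0."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log2(a / b)
    return term(omega, p) + term(1.0 - omega, 1.0 - p)


# Hypothetical nonuniform design, for illustration only.
q = {(0, 0): 0.4, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.2}
beta = local_threshold(q)
omega_q = 0.88  # illustrative observed win rate under design q
gap = kl_bernoulli_bits(omega_q, beta) if omega_q > beta else 0.0
print(f"local threshold = {beta:.3f}, witness gap = {gap:.4f} bits/trial")
```

Each deterministic strategy loses on an odd number of input pairs, and a strategy exists that loses on any single chosen pair, so the threshold equals one minus the smallest input probability (0.8 for the design above, 3/4 for uniform inputs).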
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment of the work's potential utility, and constructive suggestions. We address each major comment below and indicate the revisions planned for the next version of the manuscript.
-
Referee: [Derivation of the CHSH bound] The central reduction of the local image to Bern(3/4) under uniform inputs (abstract and derivation section) relies on the push-forward measure induced by the coarse-graining map; the manuscript should explicitly verify that this map sends the entire local set precisely onto the Bernoulli family with p ≤ 3/4, as any measure-theoretic gap would invalidate the infimal KL expression.
Authors: We agree that an explicit verification of the image under the coarse-graining map will improve the rigor of the derivation. In the revised manuscript we will insert a short paragraph immediately following the definition of the witness map. This paragraph will show that (i) every local hidden-variable model induces a Bernoulli distribution on the CHSH win indicator with success probability at most 3/4 (by the standard CHSH inequality), and (ii) the value 3/4 is attained by a deterministic local strategy, while mixtures of deterministic strategies realize every p in the interval [1/4, 3/4]. Consequently the push-forward of the local set is the family of Bernoulli distributions with 1/4 ≤ p ≤ 3/4, so for observed ω > 3/4 the infimal KL distance is attained at Bern(3/4) and the closed-form expression is justified. revision: yes
-
Referee: [Application to Wang et al. data] In the four-photon data analysis, the four-parameter unbiased-correlator control is fitted to the same data later used for AIC model comparison; this data-dependent choice risks circularity in the reported preference for compact nonlocality, and a hold-out or cross-validated procedure is needed to establish robustness of the conclusion.
Authors: We acknowledge that fitting the four-parameter control on the full data set and subsequently performing AIC comparisons on the same data introduces a legitimate robustness concern. The model class itself is fixed in advance (unbiased correlators with four free parameters), but parameter estimation is data-dependent. To address this, the revised manuscript will add a hold-out analysis: the Wang et al. data will be randomly partitioned into a training subset (70 %) and a held-out test subset (30 %); the four-parameter control will be fitted only on the training subset, after which both the witness-based information gap and the AIC scores will be recomputed on the test subset. The resulting numbers will be reported alongside the original full-data figures, allowing readers to assess whether the preference for compact nonlocality persists under this stricter protocol. revision: yes
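The partitioning step of the proposed hold-out protocol can be sketched as follows. This is a minimal illustration under stated assumptions: the trial records are simulated placeholders (not the Wang et al. data), each record is assumed to have the form (x, y, a, b) with bit-valued entries, the function names are hypothetical, and the fit of the four-parameter control on the training subset is omitted.

```python
import random


def split_trials(trials, train_fraction=0.7, seed=0):
    """Randomly partition trial records into training and held-out subsets."""
    rng = random.Random(seed)
    shuffled = list(trials)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def win_rate(trials):
    """Fraction of CHSH wins; a trial (x, y, a, b) wins if a XOR b == x AND y."""
    wins = sum(1 for x, y, a, b in trials if (a ^ b) == (x & y))
    return wins / len(trials)


# Simulated placeholder records with an assumed win probability of 0.85.
rng = random.Random(1)
trials = []
for _ in range(2000):
    x, y, a = rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1)
    won = rng.random() < 0.85
    b = (a ^ (x & y)) if won else (a ^ (x & y) ^ 1)
    trials.append((x, y, a, b))

train, test = split_trials(trials)
print(len(train), len(test))
print(f"held-out win rate = {win_rate(test):.3f}")
```

The held-out win rate would then feed the witness-based gap, while any fitted model would be estimated on the training subset only.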
Circularity Check
No significant circularity; derivation follows from DPI on standard local bound
The core claim applies the data processing inequality to the coarse-graining map sending full Bell trials to the binary witness outcome. Under uniform inputs the push-forward of the local set is exactly the family Bern(p) with 1/4 ≤ p ≤ 3/4, so the infimal KL is D_KL(Bern(ω) || Bern(3/4)) for observed ω > 3/4. This is a direct mathematical consequence of the standard CHSH bound and the DPI; it does not reduce to a fitted parameter or self-citation. The finite-sample Hoeffding bound is likewise standard. Data-model comparisons (AIC, four-parameter controls) are post-hoc applications to published experiments and do not enter the derivation of the witness-based lower bound itself.
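The finite-sample step can be sketched with the one-sided Hoeffding inequality, which for n independent trials gives ω ≥ ω̂ − sqrt(ln(1/δ) / (2n)) with probability at least 1 − δ. The code below is one standard form of that argument, not necessarily the exact construction used in the paper; the function names and the example numbers are hypothetical.

```python
import math


def kl_bernoulli_bits(omega, p):
    """D_KL(Bern(omega) || Bern(p)) in bits, with 0 * log(0) taken as 0."""
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log2(a / b)
    return term(omega, p) + term(1.0 - omega, 1.0 - p)


def hoeffding_lower_bound(omega_hat, n_trials, delta=0.05):
    """One-sided (1 - delta) lower confidence bound on the true win rate,
    assuming independent trials."""
    return omega_hat - math.sqrt(math.log(1.0 / delta) / (2.0 * n_trials))


def kl_gap_lower_confidence_bits(omega_hat, n_trials, delta=0.05,
                                 threshold=0.75):
    """Lower confidence bound on the witness gap in bits per trial.

    The Bernoulli KL to the threshold is increasing in omega above the
    threshold, so plugging in the lower bound on omega is conservative.
    """
    omega_low = hoeffding_lower_bound(omega_hat, n_trials, delta)
    if omega_low <= threshold:
        return 0.0
    return kl_bernoulli_bits(omega_low, threshold)


# Illustrative numbers only.
low = hoeffding_lower_bound(0.85, 10_000)
gap_low = kl_gap_lower_confidence_bits(0.85, 10_000)
print(f"omega lower bound = {low:.4f}, gap lower bound = {gap_low:.4f} bits/trial")
```

With few trials the lower bound on ω can fall below the threshold, in which case the certified gap is zero even if the point estimate exceeds 3/4.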
Axiom & Free-Parameter Ledger
free parameters (1)
- omega
axioms (2)
- standard math: Data processing inequality for Kullback-Leibler divergence under coarse-graining maps
- domain assumption: Independence of trials for Hoeffding concentration inequality
Reference graph
Works this paper leans on
- [1] J. Rissanen, Modeling by shortest data description, Automatica 14, 465 (1978)
- [3] P. D. Grünwald, The Minimum Description Length Principle (MIT Press, Cambridge, MA, 2007)
- [4] G. Schwarz, Estimating the dimension of a model, The Annals of Statistics 6, 461 (1978)
- [5] M. Drton and M. Plummer, A Bayesian information criterion for singular models, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 323 (2017)
- [6] W. Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association 58, 13 (1963)
- [7] S. Watanabe, Algebraic Geometry and Statistical Learning Theory (Cambridge University Press, Cambridge, 2009)
- [8] K. Wang, Z. Hou, K. Qian, L. Chen, M. Krenn, M. Aspelmeyer, A. Zeilinger, S. Zhu, and X.-S. Ma, Violation of Bell inequality with unentangled photons, Science Advances 11, eadr1794 (2025)
- [9] W. van Dam, R. D. Gill, and P. D. Grünwald, The statistical strength of nonlocality proofs, IEEE Transactions on Information Theory 51, 2812 (2005)
- [13] N. D. Mermin, Extreme quantum entanglement in a superposition of macroscopically distinct states, Physical Review Letters 65, 1838 (1990)
- [14] B. Hensen, N. Kalb, M. S. Blok, A. E. Dréau, A. Reiserer, R. F. L. Vermeulen, R. N. Schouten, M. Markham, D. J. Twitchen, K. Goodenough, D. Elkouss, S. Wehner, T. H. Taminiau, and R. Hanson, Loophole-free Bell test using electron spins in diamond: Second experiment and additional analysis, Scientific Reports 6, 30289 (2016)
- [16] K. D. Jöns, L. Schweickert, M. A. M. Versteegh, D. Dalacu, P. J. Poole, A. Gulinatti, A. Giudice, V. Zwiller, and M. E. Reimer, Bright nanoscale source of deterministic entangled photon pairs violating Bell’s inequality, Scientific Reports 7, 1700 (2017)
- [17] H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control 19, 716 (1974)