Complexity-Aware Theory Testing from Bell Witnesses

Jianshuo Gao

arxiv: 2604.08918 · v1 · submitted 2026-04-10 · 🪐 quant-ph

Pith reviewed 2026-05-10 17:58 UTC · model grok-4.3

classification 🪐 quant-ph

keywords: Bell inequality witnesses · Kullback-Leibler divergence · data processing inequality · CHSH scenario · BIC complexity penalty · model selection · quantum nonlocality · finite-sample bounds

The pith

Bell witnesses bound the KL divergence to local models in bits per trial, allowing direct comparison with complexity penalties.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper links Bell statistical analyses to complexity-based model selection by showing that coarse-graining the full set of measurement outcomes from a Bell experiment into a simple witness statistic produces a lower bound on the Kullback-Leibler distance from the data to the nearest local model. This bound follows from the data processing inequality applied to the induced witness distribution. In the CHSH case with uniform inputs the bound reduces to the explicit Bernoulli form D_KL(Bern(ω) || Bern(3/4)). Because the quantity is expressed in bits per trial, it can be balanced directly against an MDL or BIC penalty to mark the crossover point where a more expressive nonlocal model becomes statistically preferable. The approach is demonstrated on published four-photon data and yields witness-based benchmarks for other CHSH experiments.

Core claim

A witness obtained from a coarse-graining of full Bell trials yields, through data processing, a lower bound on the Kullback-Leibler (KL) distance to a competitor class in terms of the induced witness distribution. For binary Bell-game witnesses this reduces to a Bernoulli bound, and in the CHSH scenario the local image collapses to a single threshold, giving the closed-form expression D_KL(Bern(ω) || Bern(3/4)) under uniform inputs, with a corresponding extension to known nonuniform designs. A finite-sample Hoeffding argument gives a lower confidence bound under independent trials. Because the bound is measured in bits per trial, it can be compared directly with an MDL/BIC-type complexity penalty.
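The closed-form CHSH expression is easy to evaluate numerically. The following is a minimal sketch, not the paper's code: the function names are illustrative, and the example win rate cos²(π/8) is the standard maximal quantum value for the CHSH game, used here only as a test input.

```python
import math

def bernoulli_kl_bits(p: float, q: float) -> float:
    """D_KL(Bern(p) || Bern(q)) in bits, using the convention 0 * log(0/q) = 0."""
    if not (0.0 <= p <= 1.0 and 0.0 < q < 1.0):
        raise ValueError("need 0 <= p <= 1 and 0 < q < 1")
    total = 0.0
    if p > 0.0:
        total += p * math.log2(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log2((1.0 - p) / (1.0 - q))
    return total

def chsh_witness_bound_bits(omega: float, omega_loc: float = 0.75) -> float:
    """Witness lower bound in bits per trial; zero unless omega exceeds the local threshold."""
    return bernoulli_kl_bits(omega, omega_loc) if omega > omega_loc else 0.0

# Illustrative input: the maximal quantum CHSH win rate, about 0.8536.
TSIRELSON = math.cos(math.pi / 8) ** 2
print(chsh_witness_bound_bits(TSIRELSON))
```

The bound is clipped to zero at or below the threshold because a win rate of at most 3/4 is reproducible by a local model, so the witness certifies no gap there.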

What carries the argument

The coarse-graining map from full Bell trial outcomes to a witness distribution, which via the data processing inequality supplies a lower bound on the KL divergence to the local competitor class.
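Written out, the chain of inequalities for the uniform-input CHSH case might look as follows. The symbols are notation assumed here for illustration and may differ from the paper's: P is the full trial distribution, Q a local model, 𝓛 the local set, W the witness map with push-forward W_#, and ω the win probability under P.

```latex
\begin{align}
D_{\mathrm{KL}}(P \,\|\, Q)
  &\ge D_{\mathrm{KL}}\bigl(W_{\#}P \,\|\, W_{\#}Q\bigr)
  && \text{(data processing)} \\
\inf_{Q \in \mathcal{L}} D_{\mathrm{KL}}(P \,\|\, Q)
  &\ge \inf_{Q \in \mathcal{L}} D_{\mathrm{KL}}\bigl(\mathrm{Bern}(\omega) \,\|\, W_{\#}Q\bigr) \\
  &= D_{\mathrm{KL}}\bigl(\mathrm{Bern}(\omega) \,\|\, \mathrm{Bern}(3/4)\bigr),
  \qquad \omega > 3/4, \\
  &= \omega \log_2 \frac{4\omega}{3} + (1-\omega)\log_2\bigl(4(1-\omega)\bigr)
  \quad \text{bits per trial.}
\end{align}
```

The equality in the third line uses two facts: no local model wins with probability above 3/4, and D_KL(Bern(ω) || Bern(q)) decreases in q for q < ω, so the closest local image is the one at the threshold.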

Load-bearing premise

Coarse-graining the detailed Bell trial outcomes into a witness statistic still permits the data processing inequality to lower-bound the KL divergence to the local models.
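The premise can be checked numerically on a toy pair of distributions. The sketch below is an illustration under stated assumptions, not the paper's analysis: `isotropic_box` and `noisy_local` are hypothetical helper names, the first builds a trial distribution with win rate ω spread evenly over winning and losing outcomes, and the second is one particular local model (always output 0, mixed with uniform noise), not the nearest local model.

```python
import itertools
import math

TRIALS = list(itertools.product((0, 1), repeat=4))  # tuples (x, y, a, b)

def wins(x, y, a, b):
    """CHSH game win condition: a XOR b equals x AND y."""
    return (a ^ b) == (x & y)

def kl_bits(p, q):
    """KL divergence in bits between two dicts over the same keys (q strictly positive)."""
    return sum(pv * math.log2(pv / q[key]) for key, pv in p.items() if pv > 0)

def coarse_grain(dist):
    """Push a full trial distribution forward to the binary win indicator."""
    out = {0: 0.0, 1: 0.0}
    for trial, prob in dist.items():
        out[int(wins(*trial))] += prob
    return out

def isotropic_box(omega):
    """Uniform inputs; the two winning output pairs share omega, the two losing share 1 - omega."""
    return {t: 0.25 * (omega if wins(*t) else 1.0 - omega) / 2.0 for t in TRIALS}

def noisy_local(eps):
    """Local model: deterministic strategy a = b = 0 with weight 1 - eps, uniform outputs with weight eps."""
    return {(x, y, a, b): 0.25 * ((1.0 - eps) * (a == 0 and b == 0) + eps / 4.0)
            for (x, y, a, b) in TRIALS}

P = isotropic_box(0.85)
Q = noisy_local(0.10)
full_gap = kl_bits(P, Q)
witness_gap = kl_bits(coarse_grain(P), coarse_grain(Q))
print(full_gap, witness_gap)  # data processing guarantees full_gap >= witness_gap
```

Because the witness discards everything except win or loss, the coarse-grained divergence can only be smaller or equal, which is what makes the witness bound conservative.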

What would settle it

A Bell dataset in which the maximum-likelihood KL divergence from the observed frequencies to the nearest local model is strictly smaller than the numerical value given by the witness formula would contradict the lower-bound claim.

Figures

Figures reproduced from arXiv: 2604.08918 by Jianshuo Gao.

**Figure 2.** Figure 2 visualizes this degradation. Measured in bits per trial, the CHSH witness certificate is δ_{CHSH,r}(ω) := (1/log 2) D_KL(Bern(ω) || Bern(ω_loc(r))) (Eq. 21). Combining this with Eq. (8) yields a conservative crossover rule

**Figure 3.** Figure 3 shows the result. All points lie on or above the diagonal, as they must if the witness is a lower bound. More interesting is how the gap behaves with violation size. Near the local boundary the bound is markedly conservative: the median full-table gain is about 6.2 times the median witness lower bound in the weak-violation regime. Around the Wang-like regime the median ratio drops to about 1.4, and for st…

**Figure 4.** Figure 4 shows the resulting phase diagram. The color indicates the frequency with which the saturated model beats locality by BIC over repeated synthetic tables at a given (n, S). The dashed line is the witness-predicted crossover from the lower bound alone. As expected, the witness line is conservative: the full-table transition occurs somewhat earlier, because the full table exploits structure beyond the single…

Original abstract

Bell statistical-strength analyses and complexity-based model selection are usually treated separately. Here we relate them by showing that a witness obtained from a coarse-graining of full Bell trials yields, through data processing, a lower bound on the Kullback-Leibler (KL) distance to a competitor class in terms of the induced witness distribution. For binary Bell-game witnesses this reduces to a Bernoulli bound, and in the CHSH scenario the local image collapses to a single threshold, giving the closed-form expression D_KL(Bern(omega) || Bern(3/4)) under uniform inputs, with a corresponding extension to known nonuniform designs. A finite-sample Hoeffding argument gives a lower confidence bound under independent trials. We also include a non-CHSH example based on the three-party Mermin-GHZ game. Because the bound is measured in bits per trial, it can be compared directly with an MDL/BIC-type complexity penalty and thereby yields a conservative crossover criterion for when a more expressive competitor becomes worthwhile. For the reproducible four-photon data of Wang et al., the witness certifies a positive information gap against locality, while a full-table comparison across local, no-signaling, saturated, and two compact nonlocal families favors low-dimensional nonlocal descriptions once complexity is charged. A four-parameter unbiased-correlator control shows that the data support compact nonlocality over locality, while only weakly distinguishing the specific cosine structure of the two-parameter model; an AIC comparison instead favors broader nonlocal controls. We also report witness-based benchmarks from additional published CHSH experiments and discuss the interpretational scope of BIC for constrained or non-regular model classes.
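For the non-CHSH example, the local threshold of a three-party game can be found by brute force. The sketch below assumes the textbook GHZ game (inputs drawn uniformly from 000, 011, 101, 110; the players win when a XOR b XOR c equals x OR y OR z); the paper's exact Mermin-GHZ witness may be defined differently, and the function name is illustrative.

```python
import itertools
from fractions import Fraction

# Assumed promise set of the textbook GHZ game (an even number of ones among x, y, z).
INPUTS = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

def ghz_win_fraction(fa, fb, fc):
    """Win fraction of a deterministic strategy; each f is (output on input 0, output on input 1)."""
    wins = sum((fa[x] ^ fb[y] ^ fc[z]) == (x | y | z) for x, y, z in INPUTS)
    return Fraction(wins, len(INPUTS))

responses = list(itertools.product((0, 1), repeat=2))  # the 4 response functions per player
best_local = max(
    ghz_win_fraction(fa, fb, fc)
    for fa, fb, fc in itertools.product(responses, repeat=3)
)
print(best_local)  # 3/4
```

Mixtures of deterministic strategies cannot exceed the best deterministic one, so under this assumed game the Bernoulli bound again takes the form D_KL(Bern(ω) || Bern(3/4)).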

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Gao reduces Bell witnesses to KL bounds via data processing, giving a clean closed-form crossover for CHSH that can be compared directly to BIC penalties.

The main point is that this paper turns a coarse-grained Bell witness into a lower bound on KL divergence to the local set, which then serves as a complexity-aware test. For CHSH with uniform inputs the local image is just Bernoulli with p at most 3/4, so the bound simplifies to D_KL(Bern(ω) || Bern(3/4)) and can be set against an MDL-style penalty in bits per trial. The same logic extends to nonuniform inputs and the Mermin-GHZ game, with a standard Hoeffding finite-sample version added on top.
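To make the comparison against a complexity penalty concrete, one can solve for the sample size at which the accumulated witness gain overtakes the penalty. The sketch below assumes the standard BIC form, in which a model with Δk extra parameters pays (Δk/2)·log2(n) bits in total; the paper's own crossover rule may use different constants, and the inputs (0.0463 bits per trial, 4 extra parameters) are hypothetical.

```python
import math

def bic_crossover_trials(delta_bits: float, extra_params: int) -> int:
    """Smallest n on the increasing branch of n*delta - (k/2)*log2(n) where the gain covers the penalty.

    For every larger n the inequality n*delta >= (k/2)*log2(n) continues to hold.
    """
    if delta_bits <= 0 or extra_params <= 0:
        raise ValueError("delta_bits and extra_params must be positive")
    half_k = extra_params / 2.0

    def surplus(n: int) -> float:
        return n * delta_bits - half_k * math.log2(n)

    # The surplus is convex in n with its minimum near half_k / (delta * ln 2);
    # from there on it only increases, so bisection is valid.
    lo = max(1, math.ceil(half_k / (delta_bits * math.log(2))))
    if surplus(lo) >= 0:
        return lo
    hi = 2 * lo
    while surplus(hi) < 0:
        lo, hi = hi, 2 * hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if surplus(mid) >= 0:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical inputs: a gap of 0.0463 bits per trial and 4 extra parameters.
n_star = bic_crossover_trials(0.0463, 4)
print(n_star)
```

Since the witness gap is only a lower bound on the true per-trial gain, a crossover computed this way is conservative: the full-table comparison can tip earlier, never later.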

Referee Report

2 major / 2 minor

Summary. The paper claims that coarse-graining full Bell trial outcomes into a binary witness (e.g., CHSH game win indicator) permits application of the data processing inequality, yielding a lower bound on the KL divergence from the observed witness distribution to the local model class. For uniform-input CHSH this collapses to the closed-form D_KL(Bern(ω) || Bern(3/4)); extensions are given for nonuniform inputs and the Mermin-GHZ game. A Hoeffding lower-confidence bound is supplied for finite samples. The resulting per-trial information gap is compared directly to BIC/MDL complexity penalties to obtain a conservative crossover criterion for preferring more expressive nonlocal models. The method is applied to the reproducible four-photon data of Wang et al. (positive gap against locality) and other published CHSH experiments; a four-parameter unbiased-correlator control and AIC comparisons are used to assess compact versus broad nonlocal families.
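The finite-sample step can be sketched with the textbook one-sided Hoeffding inequality. This is an assumption about the form of the argument, not a reproduction of the paper's constants: with probability at least 1 − α the true win rate is at least ω̂ − sqrt(ln(1/α)/(2n)), and since the Bernoulli KL to the threshold increases with ω above 3/4, plugging in that lower limit gives a lower confidence bound. The function names and trial counts are hypothetical.

```python
import math

def bernoulli_kl_bits(p: float, q: float) -> float:
    """D_KL(Bern(p) || Bern(q)) in bits for 0 <= p <= 1 and 0 < q < 1."""
    total = 0.0
    if p > 0.0:
        total += p * math.log2(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log2((1.0 - p) / (1.0 - q))
    return total

def hoeffding_witness_bound_bits(wins: int, n: int, alpha: float = 0.05,
                                 omega_loc: float = 0.75) -> float:
    """Lower confidence bound (level 1 - alpha) on the witness gap, assuming independent trials."""
    if not (0 <= wins <= n and n > 0 and 0.0 < alpha < 1.0):
        raise ValueError("need 0 <= wins <= n, n > 0, 0 < alpha < 1")
    omega_hat = wins / n
    # One-sided Hoeffding radius for a mean of n variables bounded in [0, 1].
    omega_low = omega_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if omega_low <= omega_loc:
        return 0.0  # the data do not certify a gap at this confidence level
    return bernoulli_kl_bits(omega_low, omega_loc)

# Hypothetical counts, not taken from any experiment.
print(hoeffding_witness_bound_bits(wins=8500, n=10000))
```

The independence assumption matters here: Hoeffding's inequality in this form does not cover memory effects between trials.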

Significance. If the derivation is sound, the work supplies a principled, information-theoretic link between Bell-witness statistical strength and complexity-aware model selection, allowing direct numerical comparison of information gain against penalty terms. This is potentially useful for interpreting experimental Bell violations in terms of model preference rather than binary rejection of locality. Credit is due for the closed-form reductions, the finite-sample bound, the explicit Mermin-GHZ example, and the reproducible benchmarks on published data sets.

major comments (2)

[Derivation of the CHSH bound] The central reduction of the local image to Bern(3/4) under uniform inputs (abstract and derivation section) relies on the push-forward measure induced by the coarse-graining map; the manuscript should explicitly verify that this map sends the entire local set precisely onto the Bernoulli family with p ≤ 3/4, as any measure-theoretic gap would invalidate the infimal KL expression.
[Application to Wang et al. data] In the four-photon data analysis, the four-parameter unbiased-correlator control is fitted to the same data later used for AIC model comparison; this data-dependent choice risks circularity in the reported preference for compact nonlocality, and a hold-out or cross-validated procedure is needed to establish robustness of the conclusion.

minor comments (2)

The abstract states an extension to nonuniform designs but the main text should supply the explicit KL expression or derivation for at least one nonuniform input distribution to support reproducibility.
Table or figure captions for the benchmark comparisons should include the exact number of trials and the numerical value of the Hoeffding lower bound used for each experiment.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading, positive assessment of the work's potential utility, and constructive suggestions. We address each major comment below and indicate the revisions planned for the next version of the manuscript.

Referee: [Derivation of the CHSH bound] The central reduction of the local image to Bern(3/4) under uniform inputs (abstract and derivation section) relies on the push-forward measure induced by the coarse-graining map; the manuscript should explicitly verify that this map sends the entire local set precisely onto the Bernoulli family with p ≤ 3/4, as any measure-theoretic gap would invalidate the infimal KL expression.

Authors: We agree that an explicit verification of the image under the coarse-graining map will improve the rigor of the derivation. In the revised manuscript we will insert a short paragraph immediately following the definition of the witness map. This paragraph will show that (i) every local hidden-variable model induces a Bernoulli distribution on the CHSH win indicator with success probability at most 3/4 (by the standard CHSH inequality), and (ii) the value 3/4 is attained by a deterministic local strategy, while every smaller value down to 1/4 is attained by mixing deterministic strategies through shared randomness. Consequently the push-forward of the local set is a family of Bernoulli distributions whose largest parameter is exactly 3/4, so for ω > 3/4 the infimal KL distance is attained at Bern(3/4) and the closed-form expression is justified. revision: yes
Referee: [Application to Wang et al. data] In the four-photon data analysis, the four-parameter unbiased-correlator control is fitted to the same data later used for AIC model comparison; this data-dependent choice risks circularity in the reported preference for compact nonlocality, and a hold-out or cross-validated procedure is needed to establish robustness of the conclusion.

Authors: We acknowledge that fitting the four-parameter control on the full data set and subsequently performing AIC comparisons on the same data introduces a legitimate robustness concern. The model class itself is fixed in advance (unbiased correlators with four free parameters), but parameter estimation is data-dependent. To address this, the revised manuscript will add a hold-out analysis: the Wang et al. data will be randomly partitioned into a training subset (70 %) and a held-out test subset (30 %); the four-parameter control will be fitted only on the training subset, after which both the witness-based information gap and the AIC scores will be recomputed on the test subset. The resulting numbers will be reported alongside the original full-data figures, allowing readers to assess whether the preference for compact nonlocality persists under this stricter protocol. revision: yes
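The partitioning step of such a hold-out protocol could be sketched as below. Everything here is illustrative: the trial records are synthetic placeholders generated on the spot (not the Wang et al. data), the helper names are hypothetical, and the model-fitting step on the training subset is left out.

```python
import random

def split_trials(trials, train_fraction=0.7, seed=0):
    """Shuffle with a fixed seed and split into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(trials)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def chsh_win_rate(trials):
    """Fraction of (x, y, a, b) records satisfying a XOR b == x AND y."""
    wins = sum((a ^ b) == (x & y) for x, y, a, b in trials)
    return wins / len(trials)

def sample_trial(rng, omega=0.8):
    """Synthetic placeholder record with CHSH win probability omega."""
    x, y, a = rng.randint(0, 1), rng.randint(0, 1), rng.randint(0, 1)
    target = x & y
    b = a ^ target if rng.random() < omega else a ^ target ^ 1
    return (x, y, a, b)

data_rng = random.Random(1)
trials = [sample_trial(data_rng) for _ in range(1000)]
train, test = split_trials(trials)
print(len(train), len(test), chsh_win_rate(test))
```

Fixing the seed makes the partition reproducible, so the test-subset witness gap and information criteria can be recomputed by readers from the same split.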

Circularity Check

0 steps flagged

No significant circularity; derivation follows from DPI on standard local bound

The core claim applies the data processing inequality to the coarse-graining map sending full Bell trials to the binary witness outcome. Under uniform inputs the push-forward of the local set is exactly Bern(p) with p ≤ 3/4, so the infimal KL is D_KL(Bern(ω) || Bern(3/4)) for observed ω > 3/4. This is a direct mathematical consequence of the standard CHSH bound and the DPI; it does not reduce to a fitted parameter or self-citation. The finite-sample Hoeffding bound is likewise standard. Data-model comparisons (AIC, four-parameter controls) are post-hoc applications to published experiments and do not enter the derivation of the witness-based lower bound itself.
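The local threshold that the rationale relies on can be confirmed by enumerating all 16 deterministic CHSH strategies under uniform inputs. This is a minimal sketch with illustrative names; it uses the game form of CHSH, in which the players win when a XOR b equals x AND y.

```python
import itertools
from fractions import Fraction

def chsh_win_fraction(a0, a1, b0, b1):
    """Win fraction under uniform inputs when Alice outputs a_x and Bob outputs b_y."""
    a, b = (a0, a1), (b0, b1)
    wins = sum((a[x] ^ b[y]) == (x & y) for x in (0, 1) for y in (0, 1))
    return Fraction(wins, 4)

values = sorted({chsh_win_fraction(*s) for s in itertools.product((0, 1), repeat=4)})
print(values)  # [Fraction(1, 4), Fraction(3, 4)]
```

Each deterministic strategy wins on exactly one or three of the four input pairs, so local win rates obtained by mixing them fill the interval from 1/4 to 3/4, and for an observed ω > 3/4 the closest local image is Bern(3/4).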

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on standard inequalities from information theory and probability applied to the Bell setting. No new free parameters beyond data-dependent omega or invented entities are introduced.

free parameters (1)

omega
Observed Bernoulli parameter extracted from the witness distribution on experimental data.

axioms (2)

standard math: Data processing inequality for Kullback-Leibler divergence under coarse-graining maps
Invoked to obtain the lower bound from the full trial distribution to the witness distribution.
domain assumption: Independence of trials for Hoeffding concentration inequality
Required to convert the KL bound into a finite-sample lower confidence bound.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

[1] J. Rissanen, Modeling by shortest data description, Automatica 14, 465 (1978)

[2] A. Barron, J. Rissanen, and B. Yu, The minimum description length principle in coding and modeling, IEEE Transactions on Information Theory 44, 2743 (1998)

[3] P. D. Grünwald, The Minimum Description Length Principle (MIT Press, Cambridge, MA, 2007)

[4] G. Schwarz, Estimating the dimension of a model, The Annals of Statistics 6, 461 (1978)

[5] M. Drton and M. Plummer, A Bayesian information criterion for singular models, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 323 (2017)

[6] W. Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association 58, 13 (1963)

[7] S. Watanabe, Algebraic Geometry and Statistical Learning Theory (Cambridge University Press, Cambridge, 2009)

[8] K. Wang, Z. Hou, K. Qian, L. Chen, M. Krenn, M. Aspelmeyer, A. Zeilinger, S. Zhu, and X.-S. Ma, Violation of Bell inequality with unentangled photons, Science Advances 11, eadr1794 (2025)

[9] W. van Dam, R. D. Gill, and P. D. Grünwald, The statistical strength of nonlocality proofs, IEEE Transactions on Information Theory 51, 2812 (2005)

[10] Y. Zhang, E. Knill, and S. Glancy, Statistical strength of experiments to reject local realism with photon pairs and inefficient detectors, Physical Review A 81, 032117 (2010)

[11] Y. Zhang, S. Glancy, and E. Knill, Efficient quantification of experimental evidence against local realism, Physical Review A 88, 052119 (2013)

[12] M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 3rd ed. (Springer, New York, 2008)

[13] N. D. Mermin, Extreme quantum entanglement in a superposition of macroscopically distinct states, Physical Review Letters 65, 1838 (1990)

[14] B. Hensen, N. Kalb, M. S. Blok, A. E. Dréau, A. Reiserer, R. F. L. Vermeulen, R. N. Schouten, M. Markham, D. J. Twitchen, K. Goodenough, D. Elkouss, S. Wehner, T. H. Taminiau, and R. Hanson, Loophole-free Bell test using electron spins in diamond: Second experiment and additional analysis, Scientific Reports 6, 30289 (2016)

[15] S. Storz, J. Schär, A. Kulikov, et al., Loophole-free Bell inequality violation with superconducting circuits, Nature 617, 265 (2023)

[16] K. D. Jöns, L. Schweickert, M. A. M. Versteegh, D. Dalacu, P. J. Poole, A. Gulinatti, A. Giudice, V. Zwiller, and M. E. Reimer, Bright nanoscale source of deterministic entangled photon pairs violating Bell’s inequality, Scientific Reports 7, 1700 (2017)

[17] H. Akaike, A new look at the statistical model identification, IEEE Transactions on Automatic Control 19, 716 (1974)
