Pith · machine review for the scientific record

arxiv: 2605.09316 · v1 · submitted 2026-05-10 · 🪐 quant-ph · cs.AI

Recognition: 2 Lean theorem links

Neural Information Causality

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 04:59 UTC · model grok-4.3

classification 🪐 quant-ph cs.AI
keywords: neural information causality · query-separated architectures · random-access communication · information causality · representation learning · CHSH correlations · Tsirelson bound

The pith

Query-separated neural architectures induce random-access communication tasks bounded by information causality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Query-separated computation requires encoding data before the query arrives, so the intermediate representation must function as a communication message rather than a simple feature map. The paper embeds information causality into this regime to produce Neural-IC, which separates an embedding inequality relating random-access performance to interface mutual information from any additional physical capacity bound on that interface. A reader would care because the resulting framework supplies an operational diagnostic for query leakage, precision loss, and episode memory without defining capacity after the fact. It also supplies an exact one-bit classical benchmark and shows that the relevant quantum gain is fair query-conditioned access rather than exceeding the total information limit. Controlled simulations confirm that apparent violations arise only when query separation is broken or capacity is undercounted.

Core claim

Every query-separated architecture induces a random-access communication experiment and obeys the embedding inequality I_N-RAC ≤ I(⃗a:H,B). Any independently certified physical capacity bound on the interface then implies the stricter bound I_N-RAC ≤ C_H. This separation treats the representation as a message whose performance is limited by communication constraints rather than by post-hoc capacity definitions. For CHSH-type correlation layers the same embedding produces nested Neural-RAC protocols whose biases multiply across depth, and stability of a one-bit bottleneck at arbitrary depth selects the Tsirelson threshold. The paper also gives the exact one-bit classical RAC benchmark and an explicit demonstration that the relevant quantum enhancement is fair query-conditioned access, not total information beyond the bottleneck.
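The depth scaling behind the Tsirelson selection can be checked numerically from the closed forms quoted later on this page (per-query success P = (1 + E^n)/2, information score 2^n [1 − h(P)]); a minimal sketch, with function names of our choosing:

```python
import math

def h(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def neural_rac_score(n, E):
    """Information score of the nested (N = 2**n, m = 1) protocol built
    from isotropic cells of bias E: biases multiply across depth
    (E -> E**n), and each of the 2**n addressable bits contributes
    1 - h((1 + E**n) / 2)."""
    P = (1 + E ** n) / 2
    return 2 ** n * (1 - h(P))

# At the Tsirelson bias E = 1/sqrt(2) the one-bit bound I <= 1 holds at
# every depth; any strictly larger bias eventually violates it.
tsirelson = 1 / math.sqrt(2)
assert all(neural_rac_score(n, tsirelson) <= 1.0 for n in range(1, 30))
assert any(neural_rac_score(n, 0.75) > 1.0 for n in range(1, 30))
```

The small-bias expansion 1 − h((1+δ)/2) ≈ δ²/(2 ln 2) makes the threshold visible by hand: the score behaves like (2E²)^n / (2 ln 2), which stays bounded exactly when E ≤ 1/√2.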

What carries the argument

The embedding inequality I_N-RAC ≤ I(⃗a:H,B) that maps any query-separated neural computation onto a random-access communication experiment.

Load-bearing premise

Query-separated computation in neural architectures directly induces a random-access communication experiment to which information causality applies without further assumptions on the encoding or decoding maps.
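The premise can be made concrete: any encoder/decoder pair in which the encoder never receives the query induces a random-access experiment by construction. A minimal sketch of that induced experiment (the trivial "send the first bit" strategy at the end is ours, for illustration):

```python
import random

def induced_rac_experiment(encode, decode, n_bits=2, trials=20000, seed=0):
    """Run the random-access experiment induced by a query-separated
    pair: encode sees only the data bits, decode sees only the
    interface message H and the query index. Returns the empirical
    success probability."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        a = [rng.randint(0, 1) for _ in range(n_bits)]
        H = encode(a)                 # interface message, no query access
        k = rng.randrange(n_bits)     # query arrives after encoding
        wins += int(decode(H, k) == a[k])
    return wins / trials

# Illustrative one-bit interface: transmit a[0] and guess it for every
# query. For N = 2 this lands at the classical benchmark P = 3/4.
P = induced_rac_experiment(lambda a: a[0], lambda H, k: H)
assert 0.7 < P < 0.8
```

Any measured success probability of such a pair is, by this construction, the success probability of a random-access code through the interface, which is what lets information causality speak to it.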

What would settle it

A simulation or experiment in which a strictly query-separated network achieves measured I_N-RAC strictly larger than the simultaneously measured I(⃗a:H,B) while the interface mutual information is accurately estimated.
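Accurate estimation is the hard part of such a test. For binary queries and answers both sides reduce to mutual informations of small discrete channels, which (as the paper's appendix excerpts note) are closed-form in the confusion matrix. A minimal plug-in estimator sketch, with no bias correction and a function name of our choosing:

```python
import math

def mutual_information_binary(counts):
    """Plug-in mutual information (bits) from a 2x2 contingency table
    counts[a][b] of joint occurrences of input bit a and output bit b.
    A rough sketch; real reports need bias correction and intervals."""
    total = sum(sum(row) for row in counts)
    pa = [sum(row) / total for row in counts]
    pb = [sum(counts[a][b] for a in range(2)) / total for b in range(2)]
    mi = 0.0
    for a in range(2):
        for b in range(2):
            p = counts[a][b] / total
            if p > 0:
                mi += p * math.log2(p / (pa[a] * pb[b]))
    return mi

# A perfectly correlated channel carries one full bit...
assert abs(mutual_information_binary([[500, 0], [0, 500]]) - 1.0) < 1e-12
# ...while an independent one carries none.
assert abs(mutual_information_binary([[250, 250], [250, 250]])) < 1e-12
```

Plug-in estimators of mutual information are biased upward at small sample sizes, which is one concrete way an apparent violation of the embedding inequality could be an estimation artifact rather than broken query separation.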

Figures

Figures reproduced from arXiv: 2605.09316 by Jeongho Bang, Marcin Pawłowski.

Figure 1 · view at source ↗
Figure 2 · view at source ↗
Figure 3. Classical one-bit benchmark against nested correlation protocols. · view at source ↗
Figure 4 · view at source ↗
Figure 5 · view at source ↗
Figure 6. Closed-form information score. · view at source ↗
Figure 7. Bias scan at fixed depth. · view at source ↗
Figure 8. Finite-depth critical bias. · view at source ↗
Figure 9. Capacity accounting checks for finite-precision and noisy interfaces; the diagonal is the ideal accounting line. · view at source ↗
Figure 10. Finite-depth phase boundaries for several effective capacities. · view at source ↗
Figure 11. Controlled leakage and capacity-accounting simulations. · view at source ↗
Figure 12. Quantum correlation-layer sweep. · view at source ↗
Figure 13. Effective isotropic bias: as φ increases from 0 to π/4, E_iso rises smoothly from the classical edge 1/2 to the Tsirelson value 1/√2. Feeding this single scalar into the nested isotropic analysis of Appendix A yields the predicted Neural-RAC behavior at depth n, namely P = (1 + E_iso(φ)^n)/2 (Eq. D26) and I_N-RAC(n, E_iso(φ)) = 2^n [1 − h((1 + E_iso(φ)^n)/2)] (Eq. D27). · view at source ↗
Figure 14. Predicted Neural-RAC information score. · view at source ↗
Figure 15. Regularized optimization of the quantum correlation-layer angle for the toy utility in Eq. (D28); with no penalty the optimum is the · view at source ↗
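The φ-sweep behind Figure 13 can be reproduced from the measurement family quoted in the Appendix D excerpts (Â_0 = σ_z, Â_1 = σ_x, B̂_t(φ) = cos φ σ_z ± sin φ σ_x on |Φ+⟩). A small numerical sketch, assuming the effective isotropic bias is the CHSH-twirled average of (−1)^{st} E_st:

```python
import numpy as np

# Pauli operators and the Bell state |Phi+> = (|00> + |11>)/sqrt(2).
Z = np.array([[1, 0], [0, -1]], dtype=float)
X = np.array([[0, 1], [1, 0]], dtype=float)
phi_plus = np.array([1, 0, 0, 1], dtype=float) / np.sqrt(2)

def correlator(A, B):
    """<Phi+| A (x) B |Phi+>."""
    return phi_plus @ np.kron(A, B) @ phi_plus

def effective_isotropic_bias(phi):
    """Average of (-1)**(s*t) E_st over the four CHSH settings, with
    A0 = Z, A1 = X and B_t(phi) = cos(phi) Z -+ sin(phi) X."""
    A = [Z, X]
    B = [np.cos(phi) * Z + np.sin(phi) * X,
         np.cos(phi) * Z - np.sin(phi) * X]
    return sum((-1) ** (s * t) * correlator(A[s], B[t])
               for s in (0, 1) for t in (0, 1)) / 4

# phi = 0 gives the classical edge 1/2; phi = pi/4 the Tsirelson 1/sqrt(2).
assert abs(effective_isotropic_bias(0.0) - 0.5) < 1e-12
assert abs(effective_isotropic_bias(np.pi / 4) - 1 / np.sqrt(2)) < 1e-12
```

In closed form this family gives E_iso(φ) = (cos φ + sin φ)/2, which matches the smooth rise from 1/2 to 1/√2 described in the Figure 13 caption.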
Original abstract

Query-separated computation forces a representation to play an operational role: data are encoded before a query is known, and a later decoder can answer only through the intermediate interface. In this regime the representation functions as a message rather than merely as a feature map. We formalize this observation by embedding information causality (IC) into representation learning, obtaining a framework called neural information causality (Neural-IC). The revised formulation separates two logically distinct statements. First, every query-separated architecture induces a random-access communication experiment and obeys the embedding inequality $I_{\mathrm{N\text{-}RAC}}\le I(\vec a:H,B)$. Second, any independently certified physical capacity bound on the interface, such as a hard $m$-bit alphabet, a finite-precision register, or a power-constrained noisy channel, implies $I_{\mathrm{N\text{-}RAC}}\le C_H$. This separation avoids treating capacity as a post hoc definition and makes Neural-IC an operational diagnostic for query leakage, precision leakage, and episode-specific memory. We also provide an exact one-bit classical RAC benchmark, showing explicitly that the relevant quantum enhancement is not total information beyond the bottleneck, but fair query-conditioned access. For CHSH-type correlation layers, nested Neural-RAC protocols multiply correlation biases across depth; requiring stability of a one-bit bottleneck for arbitrary depth selects the Tsirelson threshold. We extend the analysis to asymmetric seed biases, to multi-capacity finite-depth phase diagrams, and to correlated data via a conditional information score. Controlled simulations, including straight-through binary bottlenecks and deliberately leaky ablations, verify that apparent violations are accounted for by broken query separation or undercounted capacity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Neural Information Causality (Neural-IC), a framework that embeds information causality into query-separated neural architectures. It separates two claims: (1) every query-separated architecture induces a random-access communication experiment obeying the embedding inequality I_N-RAC ≤ I(⃗a:H,B), and (2) any independently certified physical capacity bound on the interface implies I_N-RAC ≤ C_H. The paper supplies an exact one-bit classical RAC benchmark, shows that nested CHSH-type correlation layers with one-bit bottleneck stability select the Tsirelson threshold, extends the analysis to asymmetric seed biases, multi-capacity finite-depth phase diagrams, and correlated data via a conditional information score, and reports controlled simulations (including straight-through binary bottlenecks and leaky ablations) that attribute apparent violations to broken query separation or undercounted capacity.

Significance. If the embedding is shown to be rigorous, the work supplies an operational diagnostic for query leakage, precision leakage, and episode-specific memory in neural representations by importing information-causality constraints. The explicit separation of the embedding inequality from capacity bounds avoids post-hoc definitions, the one-bit RAC benchmark clarifies that quantum enhancement concerns fair query-conditioned access rather than total information, and the stability argument for Tsirelson selection plus the ablation simulations provide concrete, falsifiable tests. These elements could usefully constrain capacity analyses in deep learning and quantum-inspired models.

major comments (3)
  1. [Abstract and formalization of embedding inequality] Abstract and the formalization section: the claim that 'every query-separated architecture induces a random-access communication experiment and obeys the embedding inequality I_N-RAC ≤ I(⃗a:H,B)' requires an explicit construction mapping the neural encoder output H to a non-adaptive RAC message and the decoder to the RAC receiver, with a proof that mutual-information terms are identical. Joint optimization of encoder and decoder (via batch-norm statistics, attention, or gradient flow) can create effective query-dependent pathways that violate the strict separation presupposed by the classical RAC definition; without this construction the inequality remains an analogy rather than a derived embedding.
  2. [CHSH-type correlation layers and stability analysis] Section on CHSH-type correlation layers and nested Neural-RAC protocols: the argument that 'requiring stability of a one-bit bottleneck for arbitrary depth selects the Tsirelson threshold' must be shown to derive the bound independently rather than presupposing the known quantum value. If the stability criterion is calibrated against the Tsirelson bound itself, the selection becomes circular and does not constitute an independent derivation from the Neural-IC axioms.
  3. [Controlled simulations and ablations] Simulation section: the controlled experiments with straight-through binary bottlenecks and leaky ablations are load-bearing for the claim that 'apparent violations are accounted for by broken query separation or undercounted capacity.' The manuscript must specify data-exclusion rules, error-analysis procedures, and the precise definition of 'query separation' used to label runs as valid or invalid; without these the verification cannot be reproduced or falsified.
minor comments (2)
  1. [Abstract] Notation: the vector ⃗a and the subscript N-RAC should be defined at first use with an explicit reference to the corresponding RAC parties and message alphabet.
  2. [One-bit classical RAC benchmark] The one-bit classical RAC benchmark is presented as 'exact'; a short appendix deriving the classical bound from first principles (rather than citing it) would improve self-containedness.
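On minor comment 2, the exact value is easy to confirm from first principles in the smallest case: a brute-force search over all deterministic one-bit strategies (our check, not the paper's appendix derivation) recovers the N = 2 benchmark of 3/4.

```python
import itertools

def best_one_bit_rac(n_bits=2):
    """Exhaustive search over deterministic one-bit strategies for the
    N-bit random-access code: the encoder is any map {0,1}^N -> {0,1},
    the decoder any map (message, query) -> {0,1}. Shared randomness is
    a mixture of deterministic strategies, so it cannot beat this."""
    datasets = list(itertools.product((0, 1), repeat=n_bits))
    best = 0.0
    for enc in itertools.product((0, 1), repeat=len(datasets)):
        encode = dict(zip(datasets, enc))
        # Decoder table: one output bit per (message, query) pair.
        for dec in itertools.product((0, 1), repeat=2 * n_bits):
            wins = sum(dec[encode[a] * n_bits + k] == a[k]
                       for a in datasets for k in range(n_bits))
            best = max(best, wins / (len(datasets) * n_bits))
    return best

assert best_one_bit_rac(2) == 0.75   # exact one-bit classical benchmark
```

The search space grows doubly exponentially in N, so this only settles small cases; it is meant as the kind of self-contained derivation the comment requests, not a replacement for the general proof.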

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below, providing clarifications and indicating where revisions will strengthen the presentation.

Point-by-point responses
  1. Referee: Abstract and the formalization section: the claim that 'every query-separated architecture induces a random-access communication experiment and obeys the embedding inequality I_N-RAC ≤ I(⃗a:H,B)' requires an explicit construction mapping the neural encoder output H to a non-adaptive RAC message and the decoder to the RAC receiver, with a proof that mutual-information terms are identical. Joint optimization of encoder and decoder (via batch-norm statistics, attention, or gradient flow) can create effective query-dependent pathways that violate the strict separation presupposed by the classical RAC definition; without this construction the inequality remains an analogy rather than a derived embedding.

    Authors: The formalization section defines query-separated architectures as those in which the encoder has no access to the query vector and the decoder receives only the interface representation H together with the query. This definition directly supplies the required mapping: H is the non-adaptive message and the decoder is the RAC receiver. The equality of the mutual-information terms follows immediately from the operational definition of the induced experiment. Nevertheless, to eliminate any ambiguity regarding possible query-dependent pathways introduced by joint optimization, we will insert an explicit theorem together with its proof in the revised manuscript. The proof will show that the architectural constraints (encoder independence from query and decoder access limited to H) preclude the leakage mechanisms the referee identifies. revision: partial

  2. Referee: Section on CHSH-type correlation layers and nested Neural-RAC protocols: the argument that 'requiring stability of a one-bit bottleneck for arbitrary depth selects the Tsirelson threshold' must be shown to derive the bound independently rather than presupposing the known quantum value. If the stability criterion is calibrated against the Tsirelson bound itself, the selection becomes circular and does not constitute an independent derivation from the Neural-IC axioms.

    Authors: The stability criterion is obtained by applying the embedding inequality recursively to the nested CHSH-type layers while enforcing that the one-bit bottleneck capacity remains finite at every depth. The maximum sustainable correlation bias is thereby fixed by the information-causality constraint alone; the Tsirelson value emerges as the unique number satisfying this recurrence. No external quantum bound is inserted. We will expand the relevant section with the full inductive derivation from the Neural-IC axioms to make the independence explicit and to forestall any appearance of circularity. revision: yes

  3. Referee: Simulation section: the controlled experiments with straight-through binary bottlenecks and leaky ablations are load-bearing for the claim that 'apparent violations are accounted for by broken query separation or undercounted capacity.' The manuscript must specify data-exclusion rules, error-analysis procedures, and the precise definition of 'query separation' used to label runs as valid or invalid; without these the verification cannot be reproduced or falsified.

    Authors: We agree that full reproducibility requires these details. In the revised manuscript we will add a dedicated reproducibility subsection that states: (i) the operational definition of query separation (encoder output statistically independent of query, verified by zero mutual information under gradient flow), (ii) the data-exclusion rule (a run is discarded if any detected leakage exceeds a pre-specified threshold of 0.01 bits), and (iii) the error-analysis procedure (bootstrap resampling with 10 000 iterations to obtain 95 % confidence intervals on all reported information scores). These additions will render the simulation claims directly falsifiable. revision: yes
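The error-analysis procedure in (iii) amounts to a percentile bootstrap over per-trial outcomes. A compact sketch of that procedure (ours, not the authors' exact pipeline):

```python
import random

def bootstrap_ci(outcomes, resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the RAC success
    probability, from a list of per-trial 0/1 outcomes."""
    rng = random.Random(seed)
    n = len(outcomes)
    means = sorted(
        sum(rng.choice(outcomes) for _ in range(n)) / n
        for _ in range(resamples)
    )
    lo = means[int(alpha / 2 * resamples)]
    hi = means[int((1 - alpha / 2) * resamples) - 1]
    return lo, hi

# 200 synthetic trials with empirical success rate 0.75.
lo, hi = bootstrap_ci([1] * 150 + [0] * 50, resamples=2000)
assert lo < 0.75 < hi
```

With resamples=10_000 and alpha=0.05 this matches the 10 000-iteration, 95% intervals the rebuttal commits to; the information score's interval follows by pushing the endpoint probabilities through the closed-form score.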

Circularity Check

0 steps flagged

No significant circularity in the derivation of Neural-IC

Full rationale

The paper's central derivation embeds the established information causality principle into query-separated neural architectures by observing that such architectures correspond to random-access communication experiments. The embedding inequality I_N-RAC ≤ I(⃗a:H,B) is presented as a direct consequence of query separation, which is an operational definition rather than a derived result. The further implication I_N-RAC ≤ C_H follows from applying an independently certified capacity bound to the interface. The Tsirelson threshold arises from requiring stability of a one-bit bottleneck at arbitrary depth, supported by an explicit classical RAC benchmark and simulations. Since the core claims rely on a mapping to a known physical principle and new operational interpretations, without self-referential reductions or fitted predictions masquerading as derivations, the chain is self-contained and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the applicability of the information causality principle to induced random-access experiments and the existence of independently certifiable physical capacity bounds such as m-bit alphabets or power-constrained channels.

axioms (1)
  • domain assumption: Information causality holds for random-access communication experiments induced by any query-separated architecture.
    Invoked to obtain the embedding inequality I_N-RAC ≤ I(⃗a:H,B) from the representation acting as a message.
invented entities (1)
  • Neural-IC framework (no independent evidence)
    purpose: To embed information causality into representation learning and separate embedding from capacity bounds for operational diagnostics.
    Newly introduced construct in the paper; no independent evidence provided beyond the framework itself.

pith-pipeline@v0.9.0 · 5590 in / 1341 out tokens · 53853 ms · 2026-05-12T04:59:16.362081+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
- matches: the paper's claim is directly supported by a theorem in the formal canon.
- supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: the paper appears to rely on the theorem as machinery.
- contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · 2 internal anchors

  1. [1] Therefore (−1)^{st} E_{st} = 1/√2 for every input pair; Lemma 2 then gives Eq. (84). Theorem 11 is the tightness complement to Theorem 8: Neural-IC rules out E > 1/√2 under arbitrary-depth nesting, while quantum mechanics attains E = 1/√2.

  2. [2] The boundary is therefore not merely a formal upper bound; it is a physically realized frontier. (Sec. C: Classical communication semantics, dense coding, and Holevo limits.) IC is normally phrased for classical communication because the capacity unit is then unambiguous. If the transmitted bottleneck is quantum and entanglement-assisted, dense coding can cha…

  3. [3] The large-n approximation E_crit(n; C_H) ≃ (1/√2)(2 C_H ln 2)^{1/(2n)} (Eq. 94) explains the slow convergence of finite-depth scans near criticality. Finally, the controlled Neural-RAC ablations in Fig. 11 test the diagnostic interpretation directly at N = 8. The strict models use a straight-through binary bottleneck trained end-to-end with the encoder blind to the qu…

  4. [4] Industrial Technology Infrastructure Program (acknowledgments anchor). The finite-depth boundary approaches the quantum threshold from above, clarifying why slightly supercritical correlations may require sufficiently large nesting depth before violating the one-bit Neural-IC bound. (Fig. 9: capacity accounting checks for finite-precision and noisy interfaces; the diagonal is the ideal accounting line I_N-RAC = C_H.) Lossless har…

  5. [5] Isotropic CHSH cells as biased XOR constraints: the isotropic CHSH cell (Definition 9) is a no-signaling box with inputs (s, t) ∈ {0,1}² and outputs (A, B) ∈ {0,1}² such that the local outcomes are uniformly random and Pr[A ⊕ B = st | s, t] = (1 + E)/2 for all s, t (Eq. A2), where E ∈ [0,1] is the bias (correlation strength). The two "winning" pairs each occur with probability (1 + E)/4 and the two "losing"…

  6. [6] Tree notation for the nested protocol: the nested (N, m) = (2^n, 1) protocol in a form convenient for proofs. The pictorial "pyramid" of Fig. 3 in Ref. [11] is precisely a full binary tree of depth n whose internal nodes correspond to CHSH cells. Indexing: let {0,1}^{≤n−1} denote the set of all binary strings ("words") of length at most n − 1; each word w…

  7. [7] Encoding and decoding rules. Encoding (Alice): for each internal node w ∈ {0,1}^{≤n−1}, once the children's messages x_{w0} and x_{w1} are available, Alice sets s_w := x_{w0} ⊕ x_{w1} and x_w := x_{w0} ⊕ A_w (Eq. A10). At the end of this upward recursion, Alice transmits the single-bit bottleneck message x := x_ε (Eq. A11), where ε denotes the empty word (the root). Decoding (Bob): Bob receives x and the query b = b_1 b_2 ⋯ b_n ∈ {0,1}^n…

  8. [8] One-step correctness identity and error propagation: a single-node identity expresses exactly what Bob recovers from w when he applies the update Eq. (A13). Lemma 7 (one-step child recovery up to the local CHSH error): fix an internal node w and let e_w be the CHSH error bit of the cell at w, e_w := (A_w ⊕ B_w) ⊕ s_w t_w (Eq. A15). Then the encoding/decoding…

  9. [9] Even-parity probability and the bias multiplication E ↦ E^n: Lemma 8 (parity of i.i.d. biased bits). Let e_1, …, e_n ∈ {0,1} be i.i.d. with Pr[e_i = 0] = (1 + E)/2 and Pr[e_i = 1] = (1 − E)/2 (Eq. A29), where E ∈ [−1, 1]. Let E_⊕ :=…

  10. [10] Mutual information of a binary channel, exact formulas: in Neural-RAC the relevant object is the conditional channel a_K → β given the event (b = K). Because a_K and β are binary, the mutual information can be written in closed form in terms of the (conditional) confusion matrix…

  11. [11] From data to I_N-RAC, estimators and bias: given empirical samples from a Neural-RAC device (classical neural model, QNN, or a simulator), how does one estimate I_N-RAC? There are two natural sampling designs. Design A (per-query batching): for each K ∈ {0, …, N − 1}, run T_K trials with t…

  12. [12] Confidence intervals and error propagation: a practical report of I_N-RAC should include an uncertainty statement. Under the symmetry design (Eq. B16), the number of successes S := Σ_{t=1}^{T} 1{β^(t) = a^(t)_{b^(t)}} is Binomial(T, P); hence classical binomial confidence interval…

  13. [13] Critical-regime sample complexity, why Tsirelson is numerically delicate: even when the total information score is O(1), the per-query advantage can be exponentially small in the depth n. Lemma 9 (small-bias expansion of 1 − h): let δ ∈ [−1, 1] and set p = (1 + δ)/2; then 1 − h(p)…

  14. [14] From ±1 correlators to CHSH winning probabilities: the winning predicate is A ⊕ B = st (Eq. D1). Quantum measurement outcomes, however, are naturally ±1 random variables; the conversion is elementary but worth stating carefully because it is the algebraic hinge that connects quantum correlators to the…

  15. [15] CHSH twirling, reduction to an isotropic one-parameter family: the nesting analysis in Appendix A assumes an isotropic CHSH cell, Pr[A ⊕ B = st | s, t] = (1 + E)/2 for all (s, t) (Eq. D9), for some bias E ∈ [0, 1]. If the raw correlations are not isotropic, Ref. [11] emphasizes that one can apply a purely local randomization (no communication) that pres…

  16. [16] A one-parameter quantum family, tuning E_iso up to Tsirelson: a simple quantum construction where a single angle parameter controls the effective isotropic bias E_iso. This is exactly the form one would like in a QNN module: a small set of trainable parameters controlling correlation strength. It uses the Bell state |Φ+⟩ = (|00⟩ + |11⟩)/√2.

  17. [17] Let σ̂_x, σ̂_z be Pauli matrices. Alice's CHSH settings are fixed as Â_0 := σ̂_z, Â_1 := σ̂_x (Eq. D20). Bob's settings form a one-parameter family in the x-z plane: B̂_0(φ) := cos φ σ̂_z + sin φ σ̂_x and B̂_1(φ) := cos φ σ̂_z − sin φ σ̂_x (Eq. D21), where φ ∈ [0, π/4]. Theorem 16 (closed-form CHSH correlator and effective isotropic bias for the family Eq. D21): let E_st(φ) := ⟨Φ+| Â_s ⊗ B̂_t(φ)…

  18. [18] Feeding this single scalar into the nested isotropic analysis of Appendix A immediately yields the predicted Neural-RAC behavior at depth n, namely P = (1 + E_iso(φ)^n)/2 (Eq. D26) and I_N-RAC(n, E_iso(φ)) = 2^n [1 − h((1 + E_iso(φ)^n)/2)] (Eq. D27). Fig. 14 visualizes this dependence for representative angles φ = 0, π/8, and π/4 against the Neural-IC limit m = 1 (dashed), showing that…

  19. [19] A. Graves, G. Wayne, and I. Danihelka, Neural Turing machines (2014), arXiv:1410.5401 [cs.NE].

  20. [20] A. Graves, G. Wayne, M. Reynolds, T. Harley, I. Danihelka, A. Grabska-Barwinska, S. G. Colmenarejo, E. Grefenstette, T. Ramalho, J. Agapiou, et al., Hybrid computing using a neural network with dynamic external memory, Nature 538, 471 (2016).

  21. [21] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih, T. Rocktäschel, S. Riedel, and D. Kiela, Retrieval-augmented generation for knowledge-intensive NLP tasks, in Advances in Neural Information Processing Systems (2020).

  22. [22] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, Attention is all you need, in Advances in Neural Information Processing Systems (2017).

  23. [23] J. F. Clauser, M. A. Horne, A. Shimony, and R. A. Holt, Proposed experiment to test local hidden-variable theories, Physical Review Letters 23, 880 (1969).

  24. [24] N. Brunner, D. Cavalcanti, S. Pironio, V. Scarani, and S. Wehner, Bell nonlocality, Reviews of Modern Physics 86, 419 (2014).

  25. [25] B. S. Cirel'son, Quantum generalizations of Bell's inequality, Letters in Mathematical Physics 4, 93 (1980).

  26. [26] M. Navascués, S. Pironio, and A. Acín, Bounding the set of quantum correlations, Physical Review Letters 98, 010401 (2007).

  27. [27] A. Tavakoli, A. Pozas-Kerstjens, P. Brown, and M. Araújo, Semidefinite programming relaxations for quantum correlations, Reviews of Modern Physics 96, 045006 (2024).

  28. [28] T. Fritz, A. B. Sainz, R. Augusiak, J. B. Brask, R. Chaves, A. Leverrier, and A. Acín, Local orthogonality as a multipartite principle for quantum correlations, Nature Communications 4, 2263 (2013).

  29. [29] M. Pawłowski, T. Paterek, D. Kaszlikowski, V. Scarani, A. Winter, and M. Żukowski, Information causality as a physical principle, Nature 461, 1101 (2009).

  30. [30] S. Popescu and D. Rohrlich, Quantum nonlocality as an axiom, Foundations of Physics 24, 379 (1994).

  31. [31] W. van Dam, Implausible consequences of superstrong nonlocality (2005), arXiv:quant-ph/0501159.

  32. [32] G. Brassard, H. Buhrman, N. Linden, A. A. Méthot, A. Tapp, and F. Unger, Limit on nonlocality in any world in which communication complexity is not trivial, Physical Review Letters 96, 250401 (2006).

  33. [33] P. Jain, M. Gachechiladze, and N. Miklin, Information causality as a tool for bounding the set of quantum correlations, Physical Review Letters 133, 160201 (2024).

  34. [34] N. Tishby, F. C. Pereira, and W. Bialek, The information bottleneck method (2000), arXiv:physics/0004057.

  35. [35] A. A. Alemi, I. Fischer, J. V. Dillon, and K. Murphy, Deep variational information bottleneck, in International Conference on Learning Representations (2017).

  36. [36] Y. Bengio, A. Courville, and P. Vincent, Representation learning: A review and new perspectives, IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 1798 (2013).

  37. [37] A. M. Saxe, Y. Bansal, J. Dapello, M. S. Advani, A. Kolchinsky, B. D. Tracey, and D. D. Cox, On the information bottleneck theory of deep learning, Journal of Statistical Mechanics: Theory and Experiment 2019, 124020 (2019).

  38. [38] K. Kawaguchi, Z. Deng, X. Ji, and J. Huang, How does information bottleneck help deep learning?, in Proceedings of the 40th International Conference on Machine Learning (2023).

  39. [39] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Nature 549, 195 (2017).

  40. [40] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, and P. J. Coles, Variational quantum algorithms, Nature Reviews Physics 3, 625 (2021).

  41. [41] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Barren plateaus in quantum neural network training landscapes, Nature Communications 9, 4812 (2018).

  42. [42] M. Schuld and N. Killoran, Quantum machine learning in feature Hilbert spaces, Physical Review Letters 122, 040504 (2019).

  43. [43] E. H. Lieb and M. B. Ruskai, Proof of the strong subadditivity of quantum-mechanical entropy, Journal of Mathematical Physics 14, 1938 (1973).

  44. [44] C. H. Bennett and S. J. Wiesner, Communication via one- and two-particle operators on Einstein-Podolsky-Rosen states, Physical Review Letters 69, 2881 (1992).

  45. [45] A. S. Holevo, Bounds for the quantity of information transmitted by a quantum communication channel, Problems of Information Transmission 9, 177 (1973).

  46. [46] L. Paninski, Estimation of entropy and mutual information, Neural Computation 15, 1191 (2003).

  47. [47] L. D. Brown, T. T. Cai, and A. DasGupta, Interval estimation for a binomial proportion, Statistical Science 16, 101 (2001).

  48. [48] S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence (Oxford University Press, 2013).

  49. [49] G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche, Sponge functions, in ECRYPT Hash Workshop (2007).

  50. [50] G. Bertoni, J. Daemen, M. Peeters, and G. Van Assche, On the indifferentiability of the sponge construction, in Advances in Cryptology – EUROCRYPT 2008, Lecture Notes in Computer Science, Vol. 4965 (Springer, 2008), pp. 181–197.

  51. [51] G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, and R. Van Keer, Keccak implementation overview (2012), version 3.2.

  52. [52] J. Wetzels and W. Bokslag, Sponges and engines: An introduction to Keccak and Keyak (2015), arXiv:1510.02856 [cs.CR].