Path Integral Solution for Dissipative Generative Dynamics

Xidi Wang

arXiv: 2601.00860 · v2 · submitted 2025-12-30 · cs.LG · cs.AI · physics.app-ph · quant-ph

Pith reviewed 2026-05-16 19:09 UTC · model grok-4.3

keywords: path integral, dissipative dynamics, Koopman operator, language generation, quantum dynamics, generative models, information dissipation, spectral analysis

The pith

Language generation requires dissipative quantum dynamics with non-local context aggregation, while conservation laws cause fundamental failure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that purely mechanical systems can produce coherent text only when modeled as dissipative quantum dynamics that admit exact path integral solutions and allow controlled information loss. A reader would care because this reframes intelligence in machines as depending on irreversible processes and non-local aggregation rather than reversible computation alone. Spectral decomposition via Koopman operators isolates decay, growth, and neutral modes that together enable directed information flow. Hamiltonian constraints remove the dissipative modes and degrade performance even when model capacity stays fixed. The result positions language generation as an instance of dissipative quantum field theory.

Core claim

Dissipative quantum dynamics with analytically tractable non-local context aggregation produce coherent text generation, while conservation laws cause fundamental failure. Koopman operators with closed-form path integral propagators show that irreversible computation requires both controlled information dissipation and causal context aggregation. Spectral analysis yields an emergent eigenvalue structure that separates decay modes for forgetting, growth modes for amplification, and neutral modes for preservation; these are the essential ingredients for directed information flow. Hamiltonian constraints eliminate the dissipative modes and degrade performance despite unchanged model capacity.

What carries the argument

Koopman operators with closed-form path integral propagators on dissipative quantum dynamics, which perform spectral decomposition into decay, growth, and neutral eigenvalue modes to enable directed information flow.
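As a rough illustration of the mode separation described here, the following Python sketch groups a set of discrete-time eigenvalues by the sign of a rate γ = ln|λ|. The convention γ = ln|λ|, the tolerance `tol`, the function name `classify_modes`, and the example spectrum are all assumptions made for illustration; none of them is taken from the paper.

```python
import math

def classify_modes(eigenvalues, tol=1e-6):
    """Group eigenvalues by the sign of the rate gamma = ln|lambda| (assumed convention)."""
    modes = {"decay": [], "growth": [], "neutral": []}
    for lam in eigenvalues:
        gamma = math.log(abs(lam))
        if gamma < -tol:
            modes["decay"].append(lam)      # |lambda| < 1: amplitude shrinks
        elif gamma > tol:
            modes["growth"].append(lam)     # |lambda| > 1: amplitude grows
        else:
            modes["neutral"].append(lam)    # |lambda| = 1 within tolerance
    return modes

# Hypothetical spectrum, chosen only to show all three classes.
spectrum = [
    0.5 + 0.0j,
    0.9j,
    1.0 + 0.0j,
    complex(math.cos(1.0), math.sin(1.0)),
    1.2 + 0.0j,
]
modes = classify_modes(spectrum)
counts = {name: len(vals) for name, vals in modes.items()}
print(counts)
```

The phase of each eigenvalue is ignored here; only the modulus decides the class, which is why the two unit-modulus entries land in the neutral group.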

If this is right

Irreversible computation in language models requires both controlled information dissipation and causal non-local context aggregation.
Spectral eigenvalue structure naturally separates forgetting, amplification, and preservation functions needed for directed information flow.
Imposing Hamiltonian conservation eliminates dissipative modes and degrades generative performance even when capacity is unchanged.
Language generation is established as a dissipative quantum field theory rather than a conservative mechanical system.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Explicit dissipation mechanisms could be added to existing neural architectures to improve long-range coherence without increasing parameter count.
The same path-integral and Koopman framework might apply to other sequential generative tasks such as music or molecular design.
Training dynamics in conventional neural networks may implicitly simulate the required dissipative effects through optimization.

Load-bearing premise

The generative process in language models can be exactly represented by dissipative quantum dynamics that admit closed-form path integral solutions and Koopman operator spectral decomposition.

What would settle it

A conservative Hamiltonian language model that achieves coherent text generation on standard benchmarks without any dissipation terms would falsify the necessity of dissipative modes.
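To make the contrast between conservative and dissipative updates concrete, here is a toy Python sketch of a two-dimensional linear map whose eigenvalues are exp(γ ± iθ). Setting γ = 0 gives a pure rotation (norm-preserving, all modes neutral), while γ < 0 contracts the state. The map, the parameter values, and the function names are illustrative assumptions and are not the model studied in the paper.

```python
import math

def step(state, theta, gamma):
    """One linear update: rotate by theta, then scale by exp(gamma)."""
    x, y = state
    c, s = math.cos(theta), math.sin(theta)
    scale = math.exp(gamma)
    return (scale * (c * x - s * y), scale * (s * x + c * y))

def norm_after(steps, theta, gamma):
    """Norm of the state after repeated updates, starting from (1, 0)."""
    state = (1.0, 0.0)
    for _ in range(steps):
        state = step(state, theta, gamma)
    return math.hypot(*state)

conservative = norm_after(50, theta=0.3, gamma=0.0)    # rotation only
dissipative = norm_after(50, theta=0.3, gamma=-0.05)   # rotation plus decay
print(conservative, dissipative)
```

In this toy setting the conservative map can only move information around, whereas the dissipative map can also discard it, which is the distinction the falsification test above turns on.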

Figures

Figures reproduced from arXiv: 2601.00860 by Xidi Wang.

**Figure 2.** Koopman eigenvalue spectrum (32 layers).

Original abstract

Can purely mechanical systems generate intelligent language? We prove that dissipative quantum dynamics with analytically tractable non-local context aggregation produce coherent text generation, while conservation laws cause fundamental failure. Employing Koopman operators with closed-form path integral propagators, we show irreversible computation fundamentally requires both controlled information dissipation and causal context aggregation. Spectral analysis reveals emergent eigenvalue structure, separating into decay modes (forgetting), growth modes (amplification), and neutral modes (preservation) -- the essential ingredients for directed information flow. Hamiltonian constraints force the elimination of these dissipative modes and degrading performance despite unchanged model capacity. This establishes language generation as dissipative quantum field theory, proving mechanical systems acquire intelligence through the combination of dissipation and non-locality, not through conservation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The manuscript claims to prove that dissipative quantum dynamics with analytically tractable non-local context aggregation produce coherent text generation via closed-form path integral solutions using Koopman operators, while conservation laws cause fundamental failure. Spectral analysis is asserted to reveal emergent eigenvalue modes separating into decay (forgetting), growth (amplification), and neutral (preservation) modes as essential for directed information flow, thereby establishing language generation as dissipative quantum field theory.

Significance. If the claimed exact equivalence between standard autoregressive language model generation and dissipative quantum dynamics with closed-form propagators were rigorously established, the work would provide a novel theoretical bridge between quantum dissipative systems and generative AI, potentially explaining the necessity of irreversibility for coherent computation. It would also supply a spectral framework for understanding information flow in models. However, the absence of derivations renders the significance speculative at present.

major comments (3)

Abstract: The central claim of a 'proof' that autoregressive token prediction exactly equals dissipative quantum dynamics admitting closed-form path integral solutions is unsupported, as no derivation of the Hamiltonian, Lindblad operators, or explicit propagator from any concrete architecture (e.g., transformer attention or softmax) is supplied.
Abstract: The separation into decay, growth, and neutral modes is presented as emerging from spectral analysis of the dissipative dynamics, yet no explicit eigenvalue calculation, Koopman operator spectrum, or error-controlled decomposition is provided to show these modes are forced rather than introduced by ansatz.
Abstract: The assertion that Hamiltonian constraints force elimination of dissipative modes and degrade performance lacks any side-by-side quantitative comparison of conserved versus dissipative propagators on the same token distribution or any performance metric quantifying the claimed fundamental failure.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the careful reading and for identifying the points where the manuscript's claims require stronger supporting derivations. We agree that the abstract and main text would be substantially strengthened by explicit constructions of the Hamiltonian, Lindblad operators, and propagator from concrete model components, together with the requested spectral calculations and quantitative comparisons. The revised manuscript will incorporate these elements in the main text and a new appendix.

Point-by-point responses

Referee: Abstract: The central claim of a 'proof' that autoregressive token prediction exactly equals dissipative quantum dynamics admitting closed-form path integral solutions is unsupported, as no derivation of the Hamiltonian, Lindblad operators, or explicit propagator from any concrete architecture (e.g., transformer attention or softmax) is supplied.

Authors: We accept that the current manuscript asserts the equivalence without supplying the intermediate steps. In the revision we will derive the effective non-Hermitian Hamiltonian and Lindblad operators directly from the transformer attention matrix and the softmax normalization, then obtain the closed-form path-integral propagator via the Koopman operator. The derivation will start from the standard autoregressive loss and show how dissipation and non-local aggregation arise naturally. revision: yes
Referee: Abstract: The separation into decay, growth, and neutral modes is presented as emerging from spectral analysis of the dissipative dynamics, yet no explicit eigenvalue calculation, Koopman operator spectrum, or error-controlled decomposition is provided to show these modes are forced rather than introduced by ansatz.

Authors: The referee is correct that the manuscript does not yet display the explicit spectrum. The revised version will include the full eigenvalue problem for the Koopman operator of the dissipative generator, together with a controlled truncation error bound that demonstrates the three classes of modes (decay, growth, neutral) are required by the non-Hermitian structure and are not chosen by hand. revision: yes
Referee: Abstract: The assertion that Hamiltonian constraints force the elimination of dissipative modes and degrade performance lacks any side-by-side quantitative comparison of conserved versus dissipative propagators on the same token distribution or any performance metric quantifying the claimed fundamental failure.

Authors: We agree that a direct empirical comparison is necessary. The revision will add a controlled experiment that evolves the same initial token distribution under both a purely Hamiltonian (unitary) propagator and the dissipative propagator, reporting perplexity, next-token accuracy, and a coherence metric on a fixed validation set. This will quantify the performance gap attributable to the absence of dissipative modes. revision: yes
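For reference, perplexity, one of the metrics named in this response, is the exponential of the mean negative log-probability assigned to the observed tokens. The sketch below is a generic helper under that standard definition; the function name and the example probabilities are hypothetical and do not come from the paper or its planned experiment.

```python
import math

def perplexity(token_probs):
    """Exponential of the mean negative log-probability of the observed tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical per-token probabilities assigned by two models to the same text.
model_a = [0.25, 0.25, 0.25, 0.25]
model_b = [0.5, 0.5, 0.5, 0.5]
print(perplexity(model_a), perplexity(model_b))
```

A lower value means the model assigns higher probability to the validation text, so a comparison of the two propagators would report this number for each on the same token sequence.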

Circularity Check

2 steps flagged

Central claim that language generation equals dissipative quantum dynamics reduces to unshown equivalence assumption by construction

specific steps

self-definitional [Abstract]
"This establishes language generation as dissipative quantum field theory, proving mechanical systems acquire intelligence through the combination of dissipation and non-locality, not through conservation."

The conclusion that language generation IS dissipative QFT is reached by assuming the generative process can be exactly represented by dissipative quantum dynamics with closed-form path integral solutions; the 'proof' therefore restates the modeling premise rather than deriving the equivalence from architecture or data.
self-definitional [Abstract]
"Spectral analysis reveals emergent eigenvalue structure, separating into decay modes (forgetting), growth modes (amplification), and neutral modes (preservation) -- the essential ingredients for directed information flow."

The three mode classes are defined directly by the dissipative dynamics assumptions and then labeled 'emergent' from spectral analysis; no explicit eigenvalue calculation or comparison against a non-dissipative baseline is provided, so the separation is imposed by the model choice rather than independently obtained.

full rationale

The paper's load-bearing step is the assertion that autoregressive generation is exactly dissipative quantum dynamics admitting closed-form path integrals and Koopman spectral decomposition. This equivalence is introduced as the modeling premise rather than derived from any transformer equations, attention propagator, or token distribution. The subsequent 'proof' that dissipation produces coherent generation and conservation laws cause failure, plus the separation into decay/growth/neutral modes, therefore follows tautologically from the initial representational choice. No independent derivation of Hamiltonian/Lindblad operators or explicit eigenvalue computation is exhibited, rendering the emergent structure and the final identification of language generation as dissipative QFT circular with the input ansatz.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on mapping language generation to dissipative quantum systems via Koopman operators and path integrals; this introduces domain assumptions about exact representability without shown validation or independent evidence.

axioms (1)

domain assumption: Language generation dynamics admit an exact representation as dissipative quantum systems with closed-form path integral propagators.
Invoked to enable the spectral analysis separating decay, growth, and neutral modes.

invented entities (1)

decay modes, growth modes, and neutral modes (no independent evidence)
purpose: To separate forgetting, amplification, and preservation for directed information flow in text generation
Introduced as emergent from the eigenvalue structure of the dissipative system.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages · 1 internal anchor

[1]

ensures the system remains bounded without external energy input.

C. Energy Budget: Self-Funded Dynamics

From eigenvalue analysis:

$$\sum_j \gamma_j^{(-)} = -1400.35 \quad \text{(total decay)} \tag{6}$$

$$\sum_j \gamma_j^{(+)} = +935.73 \quad \text{(total growth)} \tag{7}$$

$$\sum_j \gamma_j = -464.62 \quad \text{(net dissipation)} \tag{8}$$

Since |Decay| > |Growth|, the magnitude of total growth is bounded by total decay, ensuring net dissipation wi...

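The three totals quoted in Eqs. (6)-(8) can be checked against each other with a few lines of Python. This is only an arithmetic consistency check on the reported numbers; the variable names are illustrative and nothing here recomputes the eigenvalues themselves.

```python
# Values quoted in Eqs. (6)-(8) of the excerpt.
total_decay = -1400.35
total_growth = 935.73
reported_net = -464.62

# Net rate should be the sum of the decay and growth totals.
net = total_decay + total_growth
consistent = abs(net - reported_net) < 1e-9

# The excerpt's condition |Decay| > |Growth|.
decay_dominates = abs(total_decay) > abs(total_growth)

print(round(net, 2), consistent, decay_dominates)
```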
[2]

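Setup

Let $\psi \in \mathbb{C}^d$ denote the hidden state. Define:

Prior. Before observation, $\psi$ has Gaussian distribution with mean $\mu_0 \in \mathbb{C}^d$ and precision matrix $\Lambda_0^{-1} \in \mathbb{C}^{d \times d}$ (inverse covariance):

$$p(\psi) \propto \exp\left(-\tfrac{1}{2}(\psi - \mu_0)^\dagger \Lambda_0^{-1} (\psi - \mu_0)\right). \tag{13}$$

Likelihood. Observing target $v \in \mathbb{C}^d$ through measurement matrix $W \in \mathbb{C}^{d \times d}$ with noise level $\sigma^2 > 0$:

$$p(v \mid \psi) \propto \exp\left(-\frac{1}{2\sigma^2}\|W\psi - v\|^2\right). \tag{14}$$
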
[3]

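Completing the Square to Obtain Posterior

By Bayes' theorem, the posterior satisfies $\log p(\psi \mid v) = \log p(v \mid \psi) + \log p(\psi) + \text{const}$. Expanding the quadratic forms and collecting terms in $\psi$:

$$\log p(\psi \mid v) = -\tfrac{1}{2}\,\psi^\dagger \Lambda_1^{-1} \psi + \eta^\dagger \psi + \psi^\dagger \eta + \text{const}, \tag{15}$$

where we define:

$$\Lambda_1^{-1} := \Lambda_0^{-1} + \sigma^{-2} W^\dagger W, \tag{16}$$

$$\eta := \Lambda_0^{-1}\mu_0 + \sigma^{-2} W^\dagger v. \tag{17}$$

The key algebraic identity (completing the square...
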
[4]

Gaussian Posterior

Substituting Eq. (18) into Eq. (15):

$$p(\psi \mid v) = \mathcal{N}(\psi;\, \mu_1, \Lambda_1), \tag{19}$$

with:

$$\text{Precision:}\quad \Lambda_1^{-1} = \Lambda_0^{-1} + \sigma^{-2} W^\dagger W, \tag{20}$$

$$\text{Mean:}\quad \mu_1 = \Lambda_1\left(\Lambda_0^{-1}\mu_0 + \sigma^{-2} W^\dagger v\right). \tag{21}$$

The precision formula (20) shows that precisions add: observation increases certainty. This update is equivalent to the Kalman filter [12] in information form.

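The update in Eqs. (20)-(21) can be sketched numerically in the scalar case d = 1, where the precision matrix, W, and the state are single (possibly complex) numbers. The sketch below is an illustration under that simplifying assumption; the function names and the numerical values are hypothetical, and the check simply confirms that the posterior mean maximizes the sum of the log prior (13) and log likelihood (14).

```python
def posterior(mu0, prec0, w, v, sigma2):
    """Scalar (d = 1) version of Eqs. (20)-(21); complex values allowed."""
    prec1 = prec0 + (w.conjugate() * w).real / sigma2          # precisions add
    mu1 = (prec0 * mu0 + w.conjugate() * v / sigma2) / prec1   # precision-weighted mean
    return mu1, prec1

def log_post(psi, mu0, prec0, w, v, sigma2):
    """Unnormalized log posterior: log prior (13) plus log likelihood (14)."""
    prior_term = -0.5 * prec0 * abs(psi - mu0) ** 2
    likelihood_term = -abs(w * psi - v) ** 2 / (2.0 * sigma2)
    return prior_term + likelihood_term

# Hypothetical inputs, chosen only for illustration.
mu0, prec0 = 1.0 + 0.5j, 2.0
w, v, sigma2 = 0.8 - 0.3j, 2.0 + 1.0j, 0.5

mu1, prec1 = posterior(mu0, prec0, w, v, sigma2)
best = log_post(mu1, mu0, prec0, w, v, sigma2)

# Any small displacement from mu1 should lower the log posterior.
nearby = [log_post(mu1 + eps, mu0, prec0, w, v, sigma2)
          for eps in (0.01, -0.01, 0.01j, -0.01j)]
print(mu1, prec1, all(value < best for value in nearby))
```

The posterior precision exceeds the prior precision for any nonzero `w`, which is the scalar form of the statement that observation increases certainty.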
[5]

Measurement Action

The negative log-likelihood (14) gives the measurement action:

$$S_{\text{meas}} = -\log p(v \mid \psi) = \frac{1}{2\sigma^2}\|W\psi - v\|^2 + \text{const}. \tag{22}$$

For direct observation ($W = I$) with target $v = \psi_{\text{target}}$, this yields Eq. (12).

B. Derivation II: Quantum Measurement Theory

[6]

Gaussian Pointer Model

Consider measuring observable $\hat{A}$ using an auxiliary pointer system [7, 9]. The pointer starts in Gaussian state

$$\phi(q) = \frac{1}{(2\pi\sigma^2)^{1/4}} \exp\left(-\frac{q^2}{4\sigma^2}\right), \tag{23}$$

where $q$ is the pointer position and $\sigma$ characterizes the pointer width. The measurement interaction $\hat{U} = \exp\left(-i\,\hat{A} \otimes \hat{p}_{\text{ptr}}\right)$ entangles system and pointer. For system eigenstate $|a\rangle$ of $\hat{A}$, the poin...

[7]

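Weak Measurement Limit

A measurement is weak when $\sigma \gg \Delta A$, where $\Delta A = \sqrt{\langle \hat{A}^2 \rangle - \langle \hat{A} \rangle^2}$ is the observable spread. Expanding Eq. (24) to leading order in $\sigma^{-2}$ and normalizing [7]:

$$|\psi\rangle \to |\psi\rangle + \frac{q - \langle \hat{A} \rangle}{2\sigma^2}\,(\hat{A} - \langle \hat{A} \rangle)\,|\psi\rangle + O(\sigma^{-4}). \tag{25}$$

The state shifts toward the measured value by an amount proportional to the "surprise" $(q - \langle \hat{A} \rangle)$ and inversely proportional to $\sigma^2$.
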
[8]

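Continuous Measurement

For continuous monitoring, divide time $T$ into $N$ intervals of duration $\delta t = T/N$, with $\sigma^2 = \sigma_0^2/\delta t$ to ensure finite information rate as $\delta t \to 0$. The measurement record $\{q_k\}$ has distribution:

$$P(q_k \mid \psi_k) \propto \exp\left(-\frac{(q_k - \langle \hat{A} \rangle_k)^2}{2\sigma^2}\right). \tag{26}$$

Taking the product over all intervals and the limit $N \to \infty$:

$$P[\{q(t)\} \mid \psi_0] \propto \exp\left(-\int_0^T \frac{(q(t) - \langle \hat{A} \rangle_t)^2}{2\sigma_0^2}\,dt\right). \tag{27}$$
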
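The passage from Eq. (26) to Eq. (27) rests on the scaling σ² = σ₀²/δt, which turns the sum of per-interval exponents into a Riemann sum. The Python sketch below checks this numerically for one made-up example; the record q(t) = sin t, the expectation value 0.5 t, and the choices T = 1 and σ₀ = 1 are assumptions for illustration only.

```python
import math

def discrete_action(n_steps, total_time=1.0, sigma0=1.0):
    """Sum of per-interval exponents from Eq. (26) with sigma^2 = sigma0^2 / dt."""
    dt = total_time / n_steps
    sigma2 = sigma0 ** 2 / dt
    total = 0.0
    for k in range(n_steps):
        t = k * dt
        q = math.sin(t)   # hypothetical measurement record q(t)
        a = 0.5 * t       # hypothetical expectation value <A>_t
        total += (q - a) ** 2 / (2.0 * sigma2)
    return total

# Closed-form value of the exponent in Eq. (27) for these choices (T = 1, sigma0 = 1):
# integral of (sin t - t/2)^2 over [0, 1], divided by 2.
exact = 0.5 * (
    (0.5 - math.sin(2.0) / 4.0)
    - (math.sin(1.0) - math.cos(1.0))
    + 1.0 / 12.0
)

for n in (10, 100, 10000):
    print(n, discrete_action(n), exact)
```

Without the 1/δt scaling of σ², the discrete sum would grow with the number of intervals instead of settling on a finite integral.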
[9]

Path Integral Weight

Eq. (27) establishes that the path integral weight from continuous weak measurement is:

$$\exp(-S_{\text{meas}}), \qquad S_{\text{meas}} = \int_0^T \frac{\|q(t) - \langle \hat{A} \rangle_t\|^2}{2\sigma_0^2}\,dt. \tag{28}$$

Identifying $q(t) \to \psi_{\text{target}}$ (observation) and $\langle \hat{A} \rangle \to \psi$ (state) recovers Eq. (12).

C. Derivation III: Maximum Entropy Principle

[10]

Translation-Invariant Kernels

A similarity measure $K(\psi, v)$ comparing hidden state $\psi$ to target $v$ is translation-invariant if $K(\psi, v) = k(\psi - v)$ for some function $k$. Bochner's theorem [13] states that a continuous translation-invariant kernel is positive semi-definite if and only if it is the Fourier transform of a non-negative measure:

$$k(\delta) = \int_{\mathbb{R}^d} p(\omega)\, e^{i\omega\cdot\delta}\, d\omega, \tag{29}$$

...

[11]

Maximum Entropy Characterization

Among all translation-invariant kernels with fixed second moment $\mathbb{E}_p[\|\omega\|^2] = c$, which kernel has maximum entropy in frequency space? Maximizing $H[p] = -\int p \log p \, d\omega$ subject to $\int p \, d\omega = 1$ and $\int \|\omega\|^2 p \, d\omega = c$ via calculus of variations yields:

$$p(\omega) \propto \exp\left(-\lambda\|\omega\|^2\right), \tag{30}$$

which is Gaussian. The corresponding kernel in position space: K(ψ, v...

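The variational step behind Eq. (30) can be spelled out as follows. This is a standard textbook calculation written under the Fourier convention of Eq. (29); the multiplier name $\alpha$ is introduced here for illustration, and the normalization of the truncated position-space kernel in the paper may differ.

```latex
\begin{align*}
\mathcal{L}[p] &= -\int p \log p \, d\omega
  - \alpha \left( \int p \, d\omega - 1 \right)
  - \lambda \left( \int \|\omega\|^2 p \, d\omega - c \right), \\
\frac{\delta \mathcal{L}}{\delta p(\omega)} &= -\log p(\omega) - 1 - \alpha - \lambda \|\omega\|^2 = 0
  \;\Longrightarrow\; p(\omega) \propto \exp\left(-\lambda \|\omega\|^2\right), \\
k(\delta) &= \int_{\mathbb{R}^d} p(\omega)\, e^{i\omega\cdot\delta}\, d\omega
  \;\propto\; \exp\left(-\frac{\|\delta\|^2}{4\lambda}\right).
\end{align*}
```

Under this convention, a kernel of the form $\exp(-\|\psi - v\|^2 / 2\sigma^2)$ would correspond to $\sigma^2 = 2\lambda$.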
[12]

Kernel as Likelihood

Interpreting the kernel as likelihood:

$$P(\text{observe } v \mid \text{state } \psi) \propto K(\psi, v) = \exp\left(-\frac{\|\psi - v\|^2}{2\sigma^2}\right). \tag{32}$$

The action as negative log-likelihood:

$$S_{\text{meas}} = -\log P(v \mid \psi) = \frac{\|\psi - v\|^2}{2\sigma^2} + \text{const}, \tag{33}$$

which is Eq. (12).

D. Convergence of the Three Derivations

The three frameworks yield identical results but answer different questions:

• Maximum entropy: Why s...

[13]

B. O. Koopman, Hamiltonian systems and transformation in Hilbert space, Proc. Natl. Acad. Sci. U.S.A. 17, 315 (1931)

[14]

I. Mezić, Spectral properties of dynamical systems, model reduction and decompositions, Nonlinear Dyn. 41, 309 (2005)

[15]

S. Greydanus, M. Dzamba, and J. Yosinski, Hamiltonian neural networks, Adv. Neural Inf. Process. Syst. 32 (2019)

[16]

R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, Neural ordinary differential equations, Adv. Neural Inf. Process. Syst. 31 (2018)

[17]

G. Lindblad, On the generators of quantum dynamical semigroups, Commun. Math. Phys. 48, 119 (1976)

[18]

H. J. Carmichael, An Open Systems Approach to Quantum Optics (Springer, Berlin, 1993)

[19]

H. M. Wiseman and G. J. Milburn, Quantum Measurement and Control (Cambridge University Press, 2009)

[20]

A. Katharopoulos, A. Vyas, N. Pappas, and F. Fleuret, Transformers are RNNs: Fast autoregressive transformers with linear attention, Proc. Int. Conf. Mach. Learn., pp. 5156–5165 (2020)

[21]

L. Diósi, Continuous quantum measurement and Itô formalism, Phys. Lett. A 129, 419 (1988)

[22]

V. P. Belavkin, Nondemolition measurements, nonlinear filtering and dynamic programming of quantum stochastic processes, Lect. Notes Control Inf. Sci. 121, 245 (1988)

[23]

N. Gisin and I. C. Percival, The quantum-state diffusion model applied to open systems, J. Phys. A: Math. Gen. 25, 5677 (1992)

[24]

R. E. Kalman, A new approach to linear filtering and prediction problems, J. Basic Eng. 82, 35 (1960)

[25]

S. Bochner, Lectures on Fourier Integrals (Princeton University Press, 1959)

[26]

W. H. Zurek, Decoherence, einselection, and the quantum origins of the classical, Rev. Mod. Phys. 75, 715 (2003)

[27]

R. El-Ganainy, K. G. Makris, M. Khajavikhan, Z. H. Musslimani, S. Rotter, and D. N. Christodoulides, Non-Hermitian physics and PT symmetry, Nat. Phys. 14, 11 (2018)

[28]

C. Weedbrook, S. Pirandola, R. García-Patrón, N. J. Cerf, T. C. Ralph, J. H. Shapiro, and S. Lloyd, Gaussian quantum information, Rev. Mod. Phys. 84, 621 (2012)

[29]

J. Lee-Thorp, J. Ainslie, I. Eckstein, and S. Ontañón, FNet: Mixing tokens with Fourier transforms, Proc. NAACL-HLT, pp. 4296–4313 (2022)

[30]

R. Eldan and Y. Li, TinyStories: How small can language models be and still speak coherent English?, arXiv:2305.07759 (2023)
