Recognition: 3 theorem links
Phase-Associative Memory: Sequence Modeling in Complex Hilbert Space
Pith reviewed 2026-05-10 19:36 UTC · model grok-4.3
The pith
PAM, a complex-valued sequence model, scales more steeply with parameter count than a matched real-valued baseline, so the performance gap between the two narrows as models grow.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Phase-Associative Memory (PAM) is a complex-valued sequence model whose state S_t ∈ ℂ^{d×d} accumulates outer products of complex token embeddings, with retrieval through the real part of the conjugate inner product, Re⟨K|Q⟩/√d. Compared against a structurally matched real-valued ablation on WikiText-103 across parameter scales from 5M to 100M, PAM sits at higher absolute loss but scales more steeply, with power-law exponents of −0.15 versus −0.12 in loss and −0.65 versus −0.49 in perplexity, so the performance gap between the two architectures narrows monotonically with model size.
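The update and readout rules can be sketched in a few lines of numpy. This is a minimal sketch of only what the abstract specifies: the association direction (value times conjugate key), the absence of gating or decay, and how embeddings are produced are all assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # state dimension, the model's free parameter

def pam_step(S, k, v):
    """Accumulate one association: S_t = S_{t-1} + v k^H (outer product)."""
    return S + np.outer(v, np.conj(k))

def pam_retrieve(S, q):
    """Read out via the real part of the conjugate inner product, scaled by sqrt(d)."""
    return np.real(S @ q) / np.sqrt(d)

# Store a handful of random complex key/value embeddings in the state matrix.
S = np.zeros((d, d), dtype=complex)
keys = rng.standard_normal((5, d)) + 1j * rng.standard_normal((5, d))
vals = rng.standard_normal((5, d)) + 1j * rng.standard_normal((5, d))
for k, v in zip(keys, vals):
    S = pam_step(S, k, v)

out = pam_retrieve(S, keys[0])  # real-valued readout of shape (d,)
```

Because the accumulation is a sum of rank-one terms, S_t never exceeds rank t, which is consistent with the effective-rank saturation the paper reports.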
What carries the argument
The complex-valued state matrix in PAM that stores sequence information through accumulated outer products of embeddings, accessed via the real part of the conjugate inner product.
If this is right
- PAM's performance gap to the real model decreases steadily as the number of parameters grows.
- The architecture may reach the loss levels typical of current large real-valued models using roughly ten times fewer parameters.
- Effective language models based on this approach could run on consumer hardware rather than requiring massive data center resources.
Where Pith is reading between the lines
- Confirming the scaling trend at scales exceeding 100 million parameters would indicate whether the complex mechanism provides a sustained advantage.
- The results suggest that incorporating phase information from complex numbers might enable more efficient encoding of linguistic context than real-valued methods alone.
- Applying similar Hilbert-space formalisms to other sequence modeling tasks could test the generality of the observed scaling benefit.
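A toy illustration of the second point above: in a complex associative memory, the relative phase between two stored associations controls whether they reinforce or cancel at retrieval, a degree of freedom a real-valued memory lacks (real coefficients can only be ±1). The sketch follows the abstract's outer-product accumulation and Re⟨·⟩ readout; the sizes and values are arbitrary.

```python
import numpy as np

d = 8
k = (np.ones(d) + 0j) / np.sqrt(d)   # unit-norm complex key
v = np.arange(1.0, d + 1.0)          # real-valued "value" for readability

def store(pairs):
    S = np.zeros((d, d), dtype=complex)
    for key, val in pairs:
        S = S + np.outer(val, np.conj(key))  # S += v k^H
    return S

def readout(S, q):
    return np.real(S @ q)  # real part of the retrieval

# Same two associations, differing only in the phase of the second key:
in_phase = readout(store([(k, v), (np.exp(0j) * k, v)]), k)           # constructive
anti_phase = readout(store([(k, v), (np.exp(1j * np.pi) * k, v)]), k)  # destructive
```

With phase 0 the readout is 2v; with phase π the two associations cancel to (numerically) zero, showing interference that the real-valued ablation cannot express.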
Load-bearing premise
The faster improvement with scale in PAM results from its complex Hilbert space operations rather than from any incidental differences in how the two models are implemented or optimized.
What would settle it
Training both PAM and the real-valued ablation at a scale of several hundred million parameters and checking whether the perplexity advantage for PAM continues to widen or begins to reverse.
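A back-of-envelope extrapolation shows why "several hundred million" is the right scale to test. The reported perplexity-gap ratios (2.18× at 5M, 1.36× at 100M) recover the −0.16 exponent difference, and extending that power law locates the naive crossover point; this assumes the fits persist beyond the measured regime, which is exactly what is in question.

```python
import math

# Perplexity-gap ratios (PAM / real baseline) reported across the sweep.
n1, r1 = 5e6, 2.18   # at 5M parameters
n2, r2 = 1e8, 1.36   # at 100M parameters

# If each model follows ppl(N) = a * N^b, the ratio is itself a power law
# in N with exponent equal to the difference of the two fitted exponents.
delta_b = math.log(r2 / r1) / math.log(n2 / n1)

# Parameter count at which the ratio would reach 1 (the curves cross).
n_cross = n2 * (1.0 / r2) ** (1.0 / delta_b)
print(f"exponent gap ≈ {delta_b:.3f}, naive crossover ≈ {n_cross / 1e6:.0f}M parameters")
```

The recovered exponent gap is ≈ −0.16, matching −0.65 − (−0.49), and the crossover lands around 700M parameters, squarely in the range the decisive experiment would probe.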
Figures
Original abstract
Experiments probing natural language processing by both humans and LLMs suggest that the meaning of a semantic expression is indeterminate prior to the act of interpretation rather than being specifiable simply as the sum of its parts (i.e. compositionality). This observer-dependent act dynamically actualizes meaning under genuine contextuality more consistent with quantum logical mechanisms than with classical Boolean approaches that assume separability, motivating an approach to language modeling that utilizes a Hilbert space formalism. In this work, we introduce Phase-Associative Memory (PAM) -- a complex-valued sequence model whose state S_t \in \mathbb{C}^{d \times d} accumulates outer products of complex token embeddings retrieved through the conjugate inner product $\mathrm{Re}\langle K \mid Q\rangle / \sqrt{d}$ -- and evaluate it against a structurally matched real-valued ablation. Both architectures train stably across a 5M--100M parameter sweep on WikiText-103 under identical conditions; PAM sits at higher absolute loss at every measured scale but improves more rapidly with parameter count, with power-law exponents of $-0.15$ vs.\ $-0.12$ in loss and $-0.65$ vs.\ $-0.49$ in perplexity that narrow the gap between the two architectures monotonically. Further investigation of complex-valued sequence modeling at larger scales could reveal that the loss plateau characteristic of real-valued state-of-the-art language models (e.g. transformers) is reachable with PAM-style architectures with an order of magnitude fewer parameters than the current frontier ($\sim$1T), implying that similar capabilities are achievable at sizes runnable on consumer-grade hardware.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Phase-Associative Memory (PAM), a complex-valued sequence model whose state S_t in C^{d x d} is updated via outer products of complex token embeddings using the conjugate inner product Re<K|Q>/sqrt(d). Motivated by claims of contextuality in semantic interpretation, PAM is evaluated against a structurally matched real-valued ablation on WikiText-103 across a 5M–100M parameter sweep under identical training conditions. The central empirical result is that PAM exhibits higher absolute loss at every scale but steeper power-law scaling (loss exponents -0.15 vs. -0.12; perplexity -0.65 vs. -0.49), with the performance gap narrowing monotonically; the authors suggest this implies PAM-style models could reach current loss plateaus with far fewer parameters.
Significance. If the reported scaling advantage is robustly attributable to the complex Hilbert-space mechanism rather than capacity or optimization mismatches, the work would provide concrete evidence that complex-valued associative memory can alter scaling behavior in sequence models. The direct ablation, stable training across scales, and explicit power-law fits on held-out data are positive features that allow falsifiable comparison; however, the absolute performance remains worse than the real baseline at all measured points, so the result is primarily of interest for its implications on long-term efficiency rather than immediate superiority.
major comments (2)
- [Abstract] The claim that PAM and the real ablation are 'structurally matched' and trained 'under identical conditions' across the 5M–100M sweep is load-bearing for attributing the exponent difference (−0.15 vs. −0.12 in loss) to the phase-associative outer-product update. Complex parameters are conventionally counted as two real scalars; without explicit confirmation that the real model's width was appropriately widened, or that effective degrees of freedom (including conjugate-inner-product overhead and real-part projection) were equalized, the steeper PAM scaling could arise from an under-capacity real baseline rather than from the Hilbert-space formalism.
- [Abstract] The power-law exponents are presented without fitting details, number of points, R² values, or uncertainty estimates. Because the headline result is the difference in these exponents, the absence of this information prevents assessment of whether the gap (−0.03 in loss, −0.16 in perplexity) is statistically distinguishable from noise or from small variations in the 5M–100M regime.
minor comments (1)
- [Abstract] The abstract refers to 'genuine contextuality' and 'quantum logical mechanisms' without a precise definition or citation to the specific quantum-logic axioms being invoked; a brief clarification of which non-classical feature (e.g., non-commutativity of the outer-product update) is being tested would strengthen the motivation.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments. We address each major comment point by point below, providing clarifications and committing to revisions that strengthen the manuscript without altering its core claims.
Point-by-point responses
-
Referee: The claim that PAM and the real ablation are 'structurally matched' and trained 'under identical conditions' across the 5M–100M sweep is load-bearing for attributing the exponent difference (-0.15 vs. -0.12 in loss) to the phase-associative outer-product update. Complex parameters are conventionally counted as two real scalars; without explicit confirmation that the real model's width was doubled or that effective degrees of freedom (including conjugate-inner-product overhead and real-part projection) were equalized, the steeper PAM scaling could arise from an under-capacity real baseline rather than the Hilbert-space formalism.
Authors: We appreciate the referee raising this critical issue of capacity matching. In designing the real-valued ablation, we adjusted the model dimensions such that the total number of real-valued parameters is equivalent between the two architectures. Specifically, since each complex parameter contributes two real degrees of freedom, the real-valued model's hidden size and embedding dimensions were scaled up accordingly to match the effective capacity. The conjugate inner product and real-part projection operations do not introduce additional trainable parameters. We will include a detailed description of the architectural hyperparameters and parameter counting procedure in the revised Methods section, along with a table comparing the configurations, to make this matching explicit and allow for independent verification. revision: yes
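The capacity-matching convention described above can be made concrete. In this sketch the layer shape is an illustrative assumption, not the paper's actual architecture: a complex d×d map carries 2d² real degrees of freedom, so a matched real layer widens its dimension by a factor of √2.

```python
import math

def complex_linear_real_params(d: int) -> int:
    # A complex d x d weight matrix stores a real and an imaginary part
    # for each entry, i.e. 2 * d^2 real trainable scalars.
    return 2 * d * d

def matched_real_width(d: int) -> int:
    # Real d' x d' layer with d'^2 ~= 2 * d^2 real parameters.
    return round(d * math.sqrt(2))

d = 64
d_real = matched_real_width(d)
print(complex_linear_real_params(d), d_real * d_real)  # 8192 vs 8281, within ~1%
```

Rounding to integer widths leaves a small residual mismatch (here about 1%), which is one reason the promised per-scale configuration table matters for independent verification.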
-
Referee: The power-law exponents are presented without reported fitting details, number of points, R² values, or uncertainty estimates. Because the headline result is the difference in these exponents, the absence of this information prevents assessment of whether the gap (-0.03 in loss, -0.16 in perplexity) is statistically distinguishable from noise or from small variations in the 5M–100M regime.
Authors: We agree that the lack of fitting details limits the interpretability of the scaling results. The exponents were obtained by fitting a power-law model of the form L(N) = a * N^b to the loss (and similarly for perplexity) using ordinary least squares regression on logarithmically transformed data, based on the five model sizes in the 5M to 100M range. In the revised manuscript, we will report the R² values for the fits (which exceed 0.98 for both models), the number of points used, and uncertainty estimates obtained via bootstrap resampling of the data points. These additions will enable a quantitative assessment of the significance of the observed differences in the scaling exponents. revision: yes
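The fitting procedure described here, OLS on log-transformed data with bootstrap resampling of the points, takes only a few lines to reproduce. The five (N, loss) points below are synthetic and illustrative, generated from an assumed power law with exponent −0.15; they are not the paper's measurements.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative (N, loss) points spanning 5M-100M; NOT the paper's data.
# Generated from L(N) = a * N^b with b = -0.15 plus small log-space noise.
N = np.array([5e6, 1e7, 2.5e7, 5e7, 1e8])
loss = 30.0 * N ** -0.15 * np.exp(rng.normal(0.0, 0.01, size=N.size))

def powerlaw_exponent(N, y):
    """OLS in log-log space: log y = log a + b log N; return the slope b."""
    b, _ = np.polyfit(np.log(N), np.log(y), 1)
    return b

b_hat = powerlaw_exponent(N, loss)

# Bootstrap the exponent by resampling the points with replacement.
boots = []
while len(boots) < 1000:
    idx = rng.integers(0, N.size, size=N.size)
    if np.unique(N[idx]).size < 2:  # skip degenerate resamples (single N value)
        continue
    boots.append(powerlaw_exponent(N[idx], loss[idx]))
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"b ≈ {b_hat:.3f}, bootstrap 95% CI [{lo:.3f}, {hi:.3f}]")
```

With only five points, the bootstrap interval is wide relative to a 0.03 exponent gap, which is why reporting it, as the authors promise, is essential for the headline comparison.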
Circularity Check
No significant circularity; results are direct empirical measurements
full rationale
The paper defines PAM via standard complex outer-product accumulation and conjugate inner-product retrieval, then reports empirical training curves and fitted power-law exponents on held-out WikiText-103 data for both PAM and a structurally matched real ablation. No derivation chain, uniqueness theorem, or ansatz is invoked that reduces the reported scaling exponents (-0.15 vs -0.12 loss, -0.65 vs -0.49 perplexity) to quantities defined by the authors' own fitted parameters or prior self-citations. The central claim is an observed difference in measured scaling behavior under identical training conditions; this is falsifiable against external benchmarks and does not collapse by construction.
Axiom & Free-Parameter Ledger
free parameters (1)
- state dimension d
axioms (1)
- Domain assumption: Semantic meaning in language is indeterminate prior to interpretation and exhibits genuine contextuality better captured by quantum logic than by classical Boolean compositionality.
invented entities (1)
- Phase-Associative Memory state S_t (no independent evidence)
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
The relation between the paper passage and the cited Recognition theorem is unclear.
PAM—a complex-valued sequence model whose state S_t ∈ ℂ^{d×d} accumulates outer products of complex token embeddings retrieved through the conjugate inner product Re⟨K|Q⟩/√d
-
IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · tag: unclear
The relation between the paper passage and the cited Recognition theorem is unclear.
operational quantum logic established that any system whose observables are contextual requires a non-Boolean algebraic structure naturally housed in a complex Hilbert space with the conjugate inner product
-
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · J_uniquely_calibrated_via_higher_derivative · tag: unclear
The relation between the paper passage and the cited Recognition theorem is unclear.
PAM and SAM both show monotonic perplexity decrease with parameter count... PAM's slope is steeper: −0.15 vs −0.12 in loss
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
PAM trains stably across the 5M–100M sweep on WikiText-103 (Figure 2) and reaches validation perplexity competitive with a structurally matched real-valued ablation under identical training, without optimization specialized to the complex arithmetic
-
[2]
The matrix state S_t accumulates associations well below its d² capacity. The effective rank, measured by the entropy of the singular-value spectrum, saturates at ~10 out of d = 64 within the first 10–15 tokens and remains bounded thereafter. The gated decay determines occupancy
-
[3]
PAM and SAM both show monotonic perplexity decrease with parameter count (Figure 2), but PAM's slope is steeper: −0.15 vs −0.12 in loss and −0.65 vs −0.49 in perplexity. The validation perplexity gap narrows monotonically from 2.18× at 5M to 1.36× at 100M
-
[4]
Within the quantum semantic framework, we interpret the empirical ~1.69-nat irreducible-loss floor characterized for real-valued transformer fits as the diagonal projection of the complex-valued von Neumann entropy of ρ_t|c. A natively Hilbert-space architecture can in principle reach below this floor, with the gap set by the structure of ρ_t|c. Anchor...