Prime Fourier Embeddings: A Principled Basis for Modular Arithmetic

Donghun Lee; Hyunsang Hwang; Suhyun Bae

arxiv: 2606.23044 · v1 · pith:P7M5RY6Bnew · submitted 2026-06-22 · 💻 cs.LG · cs.AI

Pith reviewed 2026-06-26 09:19 UTC · model grok-4.3

keywords: prime fourier embeddings, modular arithmetic, equivariant maps, Schur's lemma, Chinese remainder theorem, harmonic analysis, group representations, neural embeddings

The pith

Prime Fourier Embeddings encode integers so that modular arithmetic reduces to selecting independent prime channels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Prime Fourier Embeddings that represent integers as prime-indexed pairs of cosines and sines drawn from the harmonic analysis of the rationals. It establishes that any linear map respecting the product-group symmetries of these embeddings must be block-diagonal, with one block per prime, because Schur's lemma applied to the character decomposition forces this form. For square-free composite moduli the Chinese Remainder Theorem therefore directly identifies which prime blocks carry the computation. Empirical checks confirm the prediction through ablation studies that reveal specialization ratios above 500x between relevant and irrelevant channels together with perfect in-distribution accuracy on all tested square-free moduli.
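The channel-selection idea can be made concrete with a minimal sketch. It assumes the simplest possible form of the embedding, one (cos, sin) pair per prime p at angle 2πn/p; the paper's exact map is not reproduced on this page, and the names `pfe`, `add_channel`, and `read_residue` are hypothetical helpers introduced only for illustration.

```python
import math

def pfe(n, primes):
    """Assumed PFE form: one (cos, sin) pair per prime p, at angle 2*pi*n/p."""
    return {p: (math.cos(2 * math.pi * n / p), math.sin(2 * math.pi * n / p))
            for p in primes}

def add_channel(u, v):
    """Add two encoded integers inside one prime channel (angle-sum identity)."""
    (c1, s1), (c2, s2) = u, v
    return (c1 * c2 - s1 * s2, s1 * c2 + c1 * s2)

def read_residue(pair, p):
    """Recover the residue mod p from a (cos, sin) pair."""
    c, s = pair
    angle = math.atan2(s, c) % (2 * math.pi)
    return round(angle * p / (2 * math.pi)) % p

primes = [3, 5, 7, 23]
a, b = pfe(9, primes), pfe(17, primes)

# Each channel computes (9 + 17) mod p independently of the others.
residues = {p: read_residue(add_channel(a[p], b[p]), p) for p in primes}
print(residues)  # {3: 2, 5: 1, 7: 5, 23: 3}
```

Under this assumed form, a task such as (a + b) mod 23 only ever needs the p = 23 pair, which is the sense in which the computation reduces to selecting a channel.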

Core claim

Prime Fourier Embeddings, derived from the harmonic analysis of the rationals, carry a representation of a product group indexed by primes. Any linear map equivariant under this product-group action must be block-diagonal with one independent block per prime, a direct consequence of Schur's lemma applied to the resulting character decomposition. The Chinese Remainder Theorem then predicts the task-relevant blocks for square-free composite moduli.

What carries the argument

The prime-indexed Fourier components that realize a product-group representation, to which Schur's lemma applies and forces block-diagonal equivariant linear maps.
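The step from Schur's lemma to block-diagonality can be written out as a short, standard derivation. It is a sketch under the stated premise that the prime channels carry pairwise non-isomorphic irreducible representations; the symbols $V_p$, $\rho_p$, and $W_{pq}$ are notation introduced here, not taken from the paper.

```latex
% Assumption: V = \bigoplus_p V_p, where G acts on V_p through an irreducible
% representation \rho_p, and \rho_p \not\cong \rho_q whenever p \neq q.
% W_{pq} denotes the block of W mapping V_q to V_p.
\begin{align*}
W\,\rho(g) &= \rho(g)\,W \quad \forall g \in G
  && \text{(equivariance)} \\
W_{pq}\,\rho_q(g) &= \rho_p(g)\,W_{pq} \quad \forall g \in G
  && \text{(block } (p,q) \text{ of the line above)} \\
W_{pq} &= 0 \quad \text{for } p \neq q
  && \text{(Schur: no nonzero intertwiner between non-isomorphic irreps)} \\
W &= \bigoplus_{p} W_{pp}
  && \text{(block-diagonal, one block per prime)}
\end{align*}
```

The whole argument therefore rests on the second assumption in the comment: if two channels shared an irreducible component, the corresponding $W_{pq}$ would no longer be forced to vanish.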

If this is right

Modular arithmetic on square-free moduli factors into independent computations, one per prime factor.
The Chinese Remainder Theorem supplies the exact list of active prime channels before any training occurs.
Ablation studies isolate each prime block, confirming specialization ratios exceeding 500x.
In-distribution test accuracy reaches 100 percent on all square-free composite moduli examined.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same symmetry-matching strategy could be applied to other arithmetic operations whose groups admit analogous decompositions.
Pre-structured embeddings aligned with task symmetries may reduce the data needed for models to discover algebraic rules.
Extension to moduli containing square factors would test whether separate handling of prime-power components is required.

Load-bearing premise

The construction of Prime Fourier Embeddings from the harmonic analysis of the rationals produces a representation whose symmetry group is precisely the product group over primes, so that Schur's lemma applies directly and the Chinese Remainder Theorem isolates the relevant channels.

What would settle it

A concrete linear map that remains equivariant under the product group action on Prime Fourier Embeddings yet mixes information across distinct prime blocks, or an ablation experiment on a square-free composite modulus in which task-irrelevant channels fail to show strong specialization.
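A search for such a counterexample can be sketched numerically. The snippet below assumes the group acts on each prime channel by the rotation 2πn/p (an assumption about the action, not the paper's stated definition), projects a random matrix onto the equivariant maps by group averaging, and inspects the cross-prime blocks. All function names are hypothetical.

```python
import math
import random

def rho(n, primes):
    """Assumed action of the integer n: rotation by 2*pi*n/p in channel p."""
    size = 2 * len(primes)
    m = [[0.0] * size for _ in range(size)]
    for k, p in enumerate(primes):
        c, s = math.cos(2 * math.pi * n / p), math.sin(2 * math.pi * n / p)
        m[2 * k][2 * k], m[2 * k][2 * k + 1] = c, -s
        m[2 * k + 1][2 * k], m[2 * k + 1][2 * k + 1] = s, c
    return m

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def symmetrize(w, primes):
    """Average rho(-n) W rho(n) over one full period; the result is equivariant."""
    period = math.prod(primes)
    size = len(w)
    acc = [[0.0] * size for _ in range(size)]
    for n in range(period):
        t = matmul(matmul(rho(-n, primes), w), rho(n, primes))
        for i in range(size):
            for j in range(size):
                acc[i][j] += t[i][j] / period
    return acc

random.seed(0)
primes = [3, 5]
w = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
w_eq = symmetrize(w, primes)

# Entries whose row and column belong to different prime channels.
cross = max(abs(w_eq[i][j]) for i in range(4) for j in range(4) if i // 2 != j // 2)

# Equivariance check: w_eq should commute with the action of n = 1.
g = rho(1, primes)
lhs, rhs = matmul(w_eq, g), matmul(g, w_eq)
commutator = max(abs(lhs[i][j] - rhs[i][j]) for i in range(4) for j in range(4))

print(f"largest cross-prime entry: {cross:.2e}")
print(f"largest commutator entry:  {commutator:.2e}")
```

Both printed values should sit at floating-point noise for odd primes. This is a sanity check on one assumed action, not a proof; a nonzero cross-prime block surviving the averaging would be exactly the kind of counterexample described above.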

Figures

Figures reproduced from arXiv: 2606.23044 by Donghun Lee, Hyunsang Hwang, Suhyun Bae.

**Figure 1.** PFE encoding for (9 + 17) mod 23 = 3. The active prime p = 23 encodes the wrap-around addition. The purple region marks the overlap between the red and blue arcs: geometrically, the portion of the circle claimed by both a and b when their sum exceeds the modulus (= p). Its angular size is (a + b − p)/p, equal to (a + b) mod p normalized by p, which is the label. Inactive primes (p = 29,…

**Figure 2.** PFE encoding for (13 + 15) mod 21 = 7. Since 21 = 3 × 7, the prime channels p = 3 and p = 7 are load-bearing, carrying residues (13 + 15) mod 3 = 1 and (13 + 15) mod 7 = 0 respectively. The intermediate bars show CRT reconstruction: the unique c ∈ Z/21Z satisfying c ≡ 1 (mod 3) and c ≡ 0 (mod 7) is c = 7, recovered as 3 + 3 + 1 (mod 21) = 7 + 0 (mod 21) = 7.
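The CRT reconstruction in the Figure 2 example can be reproduced in a few lines. This is a generic incremental CRT solver written for illustration (the function name `crt` is hypothetical), not the procedure the paper's model learns.

```python
def crt(residues):
    """Combine {modulus: residue} for pairwise coprime moduli into one residue."""
    total, modulus = 0, 1
    for m, r in residues.items():
        # Solve total + modulus * t ≡ r (mod m) for t via the modular inverse.
        t = ((r - total) * pow(modulus, -1, m)) % m
        total += modulus * t
        modulus *= m
    return total % modulus, modulus

a, b = 13, 15
residues = {p: (a + b) % p for p in (3, 7)}
c, n = crt(residues)
print(residues, c, n)  # {3: 1, 7: 0} 7 21
```

The two per-prime residues are computed independently and only combined at the end, which mirrors the caption's claim that the p = 3 and p = 7 channels carry the whole computation for N = 21.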

**Figure 3.** Nested block-diagonal structure of an equivariant linear map W. Each R(θ_{p,d}) ∈ GL(ℝ²) is a 2 × 2 rotation with θ_{p,d} = 2π/p^{d+1}.

4.1. Experiment 1: Prime Specialization on Single-Prime Tasks. Setup. We train on (a + b) mod p for each prime p in a fixed subset P of P0, varying |P| ∈ {4, 6, 8, 10, 12, 14, 16} and input range r ∈ {100, 500, 1000, 2000, 4000}. For each configuration we train a separate model p…
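One concrete instance of the block structure in the Figure 3 caption is the map that sends the embedding of n to the embedding of n + 1. The sketch below reads the caption's angle as 2π/p^(d+1) and assumes a nested embedding with one (cos, sin) pair per prime p and level d; both the embedding form and the helper names are assumptions made for illustration.

```python
import math

def embed(n, primes, depth):
    """Assumed nested PFE: for each prime p and level d, the pair at angle 2*pi*n/p**(d+1)."""
    v = []
    for p in primes:
        for d in range(depth):
            angle = 2 * math.pi * n / p ** (d + 1)
            v += [math.cos(angle), math.sin(angle)]
    return v

def shift_matrix(primes, depth):
    """Block-diagonal W whose 2x2 blocks are rotations by 2*pi/p**(d+1)."""
    size = 2 * len(primes) * depth
    w = [[0.0] * size for _ in range(size)]
    k = 0
    for p in primes:
        for d in range(depth):
            theta = 2 * math.pi / p ** (d + 1)
            c, s = math.cos(theta), math.sin(theta)
            w[k][k], w[k][k + 1] = c, -s
            w[k + 1][k], w[k + 1][k + 1] = s, c
            k += 2
    return w

def apply(w, v):
    """Matrix-vector product on nested lists."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]

primes, depth = (3, 5), 2
w = shift_matrix(primes, depth)
n = 13
shifted = apply(w, embed(n, primes, depth))
target = embed(n + 1, primes, depth)
err = max(abs(x - y) for x, y in zip(shifted, target))
print(f"max deviation from embed(n + 1): {err:.2e}")
```

Adding 1 to the input never moves information between (p, d) blocks in this construction; each pair is simply rotated in place.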

**Figure 4.** Experiment 1 (part I). Top: mean diagonal drop (ablate own prime). Bottom: mean off-diagonal drop (ablate other prime). Rows index |P|; columns index input range r.

**Figure 6.** Experiment 2: mean factor drop.

4.2. Experiment 2: CRT Decomposition on Composite Moduli. Setup. We train on (a + b) mod N for squarefree composite moduli N, covering two-factor composites N ∈ {15, 21, 33, 35, 55, 77} and three-factor composites N ∈ {105, 165, 231, 385}, each formed as a product of primes from {3, 5, 7, 11}, embedded within the full prime basis P = {3, 5, 7, 11, 13, 17, 19, 23}. We sweep ov…
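The moduli in this setup, and the channels that the Chinese Remainder Theorem marks as task-relevant for each, can be enumerated directly from the two prime sets quoted above. The helper `channel_plan` is a hypothetical name; the labeling rule (relevant means the prime divides N) is the one the setup implies, stated here as an assumption.

```python
from itertools import combinations
from math import prod

factor_pool = (3, 5, 7, 11)                 # primes used to build the moduli
basis = (3, 5, 7, 11, 13, 17, 19, 23)       # full prime basis P from the setup

def channel_plan(factors, basis):
    """Split the basis into channels that divide N and channels that do not."""
    relevant = tuple(p for p in basis if p in factors)
    irrelevant = tuple(p for p in basis if p not in factors)
    return relevant, irrelevant

# All squarefree composites with two or three factors from the pool.
moduli = {prod(f): f for k in (2, 3) for f in combinations(factor_pool, k)}

for n in sorted(moduli):
    relevant, irrelevant = channel_plan(moduli[n], basis)
    print(f"N={n:<4} relevant={relevant} irrelevant={irrelevant}")
```

The enumeration yields exactly the ten moduli listed in the setup, and the relevant/irrelevant split is fixed before any training, which is what makes the ablation result a prediction rather than a fit.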

**Figure 5.** Experiment 1 (part II). Top: specialization ratio capped at 500×. Bottom: convergence rate (test acc > 0.85). Rows index |P|; columns index input range r.

[Heatmap axes captured from an adjacent figure: columns index input range (100, 500, 1000, 2000, 4000); rows index the composite moduli N = 15 (3×5), 21 (3×7), 33 (3×11), 35 (5×7), 55 (5×11), 77 (7×11), 105 (3×5×7), 165 (3×5×11), 231 (3×7×11), 385 (5×7×11).]

**Figure 8.** Experiment 2: factor/nonfactor ratio.

… ablating an irrelevant channel has negligible effect. This is a falsifiable prediction of the representation theory, confirmed across a systematic sweep of moduli, prime counts, and input ranges. Selection, not discovery. PFE transforms the grokking bottleneck from a representational discovery problem into a selection problem. By providing a Fourier basis for Z/NZ direct…

**Figure 9.** Experiment 2: test accuracy.

… leaving their separation as an implicit task for the network. The adelic character factorization (Appendix A.6, Theorem A.26) establishes that the Fourier basis on A_Q factorizes into independent local components indexed by primes. PFE implements these components directly, making the prime basis a principled rather than arbitrary choice for modular arithmetic tasks whose structu…

**Figure 10.** Per-prime ablation profiles, Experiment 1, r = 100.

**Figure 11.** Per-prime ablation profiles, Experiment 1, r = 500.

**Figure 12.** Per-prime ablation profiles, Experiment 1, r = 1000.

**Figure 13.** Per-prime ablation profiles, Experiment 1, r = 2000.

**Figure 14.** Per-prime ablation profiles, Experiment 1, r = 4000.

**Figure 15.** Mean diagonal and off-diagonal drop as a function of input range for each value of |P|.

**Figure 16.** Specialization ratio as a function of mean r/p_task, consistent with the equivariance interpretation.

**Figure 17.** CRT ablation profiles, all composites, r = 100.

**Figure 18.** CRT ablation profiles, all composites, r = 500.

**Figure 19.** CRT ablation profiles, all composites, r = 1000.

**Figure 20.** CRT ablation profiles, all composites, r = 2000.

**Figure 21.** CRT ablation profiles, all composites, r = 4000.

Original abstract

Numbers have algebraic structure that standard neural embeddings often fail to expose. We introduce Prime Fourier Embeddings (PFE), which encode integers as prime-indexed (cos, sin) pairs derived from the harmonic analysis of Q, providing a pre-structured representation in which modular arithmetic reduces to selecting the relevant prime channel rather than discovering algebraic structure from scratch. We prove that any linear map equivariant with respect to the product group action on PFE must be block-diagonal with one independent block per prime -- a consequence of Schur's lemma applied to the resulting character decomposition. For square-free composite moduli, the Chinese Remainder Theorem predicts which prime channels are task-relevant. Both predictions are confirmed empirically: ablation studies show specialization ratios exceeding 500x between task-relevant and task-irrelevant channels, with perfect in-distribution test accuracy across all square-free composite moduli tested.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PFE introduces a prime-indexed Fourier embedding with a Schur's lemma argument for block-diagonal equivariant maps, but the key claim that the construction yields an exact product-group representation needs direct verification from the full derivation.

The new piece is the Prime Fourier Embeddings construction: integers get encoded as prime-indexed (cos, sin) pairs taken from the harmonic analysis of Q. This is meant to make modular arithmetic a matter of selecting the relevant prime channel rather than learning the structure. They then apply Schur's lemma to the character decomposition under the product group action to conclude that any equivariant linear map must be block-diagonal with one block per prime. For square-free composite moduli the Chinese Remainder Theorem is used to predict which blocks are task-relevant.

The empirical results are the strongest part on current evidence. The reported ablation studies show specialization ratios above 500x between relevant and irrelevant channels together with perfect in-distribution accuracy across the tested moduli. That level of clean separation is a concrete signal worth checking.

The soft spot is the representation-theoretic premise. The block-diagonal claim and the automatic CRT channel identification both require that the PFE construction induces a representation whose symmetry group is precisely the product over primes, with distinct irreps and no cross terms. If the harmonic analysis produces shared irreps or a different group action, Schur's lemma does not force the stated structure. The abstract states the proof but does not supply the derivation, so it is not possible to confirm the group action step. The experiments match the predicted outcome but do not test the premise itself.

This is for people working on equivariant architectures or on injecting algebraic structure into networks for arithmetic or crypto tasks. The construction is distinct from prior published work referenced in the abstract and the results are sharp enough to merit a serious referee who can examine the full proof and experimental protocol.

Referee Report

2 major / 2 minor

Summary. The paper introduces Prime Fourier Embeddings (PFE), which encode integers as prime-indexed (cos, sin) pairs derived from the harmonic analysis of Q. It claims to prove, via Schur's lemma applied to the character decomposition under the product group action, that any equivariant linear map on PFE must be block-diagonal with one independent block per prime. For square-free composite moduli, the Chinese Remainder Theorem is said to identify the task-relevant prime channels. Both the block-diagonal structure and the channel predictions are reported as confirmed by ablation studies showing specialization ratios exceeding 500x and perfect in-distribution test accuracy.

Significance. If the representation-theoretic claims hold, the work supplies a pre-structured embedding that reduces modular arithmetic to channel selection rather than structure discovery, with potential implications for equivariant architectures and algebraic reasoning in neural networks. The reported empirical specialization provides a concrete, falsifiable signature of the predicted block structure.

major comments (2)

[theoretical derivation of equivariant maps] The central proof (theoretical section following the PFE definition): the application of Schur's lemma to conclude block-diagonality per prime requires that the PFE construction induces a representation whose symmetry group is precisely the product group over primes with no shared irreps or cross-prime characters; the manuscript states this follows from the harmonic analysis of Q but does not exhibit the explicit character decomposition or verify absence of entanglement, which is load-bearing for the claim that CRT channel selection follows automatically.
[ablation studies] Empirical validation (ablation studies section): the reported specialization ratios >500x and perfect accuracy are presented as confirmation of the CRT-derived prediction, yet the text provides neither the precise protocol for labeling task-relevant vs. irrelevant channels, data exclusion rules, nor error bars on the ratios; without these, the experiments cannot be assessed as a direct test of the group-representation premise rather than a post-hoc fit.

minor comments (2)

[PFE definition] Notation for the product group action and the precise definition of the PFE map (prime-indexed pairs) should be stated with an explicit formula early in the manuscript to allow readers to check the representation property directly.
[abstract and experiments] The abstract claims 'perfect in-distribution test accuracy across all square-free composite moduli tested' but does not list the specific moduli or model architectures used; adding this table or list would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments. We address each major point below and outline the revisions we will make to strengthen the manuscript.

Referee: The central proof (theoretical section following the PFE definition): the application of Schur's lemma to conclude block-diagonality per prime requires that the PFE construction induces a representation whose symmetry group is precisely the product group over primes with no shared irreps or cross-prime characters; the manuscript states this follows from the harmonic analysis of Q but does not exhibit the explicit character decomposition or verify absence of entanglement, which is load-bearing for the claim that CRT channel selection follows automatically.

Authors: We agree that an explicit character decomposition would make the argument fully self-contained. In the revised manuscript we will add a dedicated subsection deriving the character table of the PFE representation under the product-group action. Using the orthogonality relations from the harmonic analysis on Q, we will show that all irreps remain prime-specific with no cross-prime mixing, thereby confirming that Schur's lemma applies block-wise exactly as claimed. revision: yes
Referee: Empirical validation (ablation studies section): the reported specialization ratios >500x and perfect accuracy are presented as confirmation of the CRT-derived prediction, yet the text provides neither the precise protocol for labeling task-relevant vs. irrelevant channels, data exclusion rules, nor error bars on the ratios; without these, the experiments cannot be assessed as a direct test of the group-representation premise rather than a post-hoc fit.

Authors: We accept that the experimental protocol requires additional detail for reproducibility and for a direct test of the theory. The revised ablation section will specify: (i) the exact rule for labeling task-relevant channels via the prime factors given by the CRT for each square-free modulus; (ii) the data-partitioning criteria that exclude non-square-free cases and enforce strict train-test separation; (iii) the computation of specialization ratios together with standard-error bars obtained from five independent random seeds. These additions will allow readers to verify that the reported specialization is a direct consequence of the predicted block structure. revision: yes
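Item (iii) of this response can be sketched as follows. The definition used here, accuracy drop from ablating a task-relevant channel divided by the drop from ablating an irrelevant one and capped at 500 as in the Figure 5 caption, is an assumption about how the ratio is computed. The per-seed numbers are placeholders invented for the sketch and are NOT results from the paper.

```python
import statistics

def specialization_ratio(relevant_drop, irrelevant_drop, cap=500.0, eps=1e-12):
    """Assumed definition: relevant drop over irrelevant drop, capped at `cap`."""
    return min(cap, relevant_drop / max(irrelevant_drop, eps))

def mean_and_stderr(values):
    """Sample mean and standard error of the mean across seeds."""
    mean = statistics.fmean(values)
    stderr = statistics.stdev(values) / len(values) ** 0.5
    return mean, stderr

# Placeholder accuracy drops for five seeds (illustrative only, not measured).
relevant_drops = [0.70, 0.72, 0.71, 0.69, 0.73]
irrelevant_drops = [0.002, 0.001, 0.004, 0.002, 0.0]

ratios = [specialization_ratio(r, i)
          for r, i in zip(relevant_drops, irrelevant_drops)]
mean, stderr = mean_and_stderr(ratios)
print(f"specialization ratio: {mean:.1f} ± {stderr:.1f} (n={len(ratios)} seeds)")
```

One design point worth noting: once the cap is hit, the ratio stops carrying information about how small the irrelevant drop is, so reporting the raw diagonal and off-diagonal drops alongside the capped ratio would make the error bars easier to interpret.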

Circularity Check

1 step flagged

PFE defined via prime-indexed pairs makes product-group decomposition and Schur block-diagonality hold by construction

specific steps

self-definitional [Abstract (proof claim)]
"We prove that any linear map equivariant with respect to the product group action on PFE must be block-diagonal with one independent block per prime -- a consequence of Schur's lemma applied to the resulting character decomposition."

PFE is introduced as 'prime-indexed (cos, sin) pairs derived from the harmonic analysis of Q'. Because the embedding is indexed and structured per prime from the outset, the symmetry group is the product group and the irreps are distinct per prime by the definition of the coordinates. Schur's lemma then forces block-diagonality tautologically from that definition rather than as a derived property of the harmonic analysis.

full rationale

The central theoretical claim applies Schur's lemma to conclude that equivariant maps on PFE must be block-diagonal per prime. However, PFE is explicitly constructed as prime-indexed (cos, sin) pairs, so the representation is defined to factor as a product over primes with no cross terms. The character decomposition into distinct per-prime irreps therefore follows directly from the embedding definition rather than from independent harmonic analysis of Q. The CRT channel prediction is standard and non-circular, and the empirical specialization ratios are post-training observations rather than fitted inputs renamed as predictions. This produces moderate circularity confined to the load-bearing representation premise.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the harmonic analysis of Q yielding a representation whose symmetry is the product group over primes, plus standard application of Schur's lemma; no free parameters are described in the abstract.

axioms (1)

standard math: Schur's lemma applies to the character decomposition of the PFE representation under the product group action
Invoked to conclude that equivariant linear maps must be block-diagonal per prime.

invented entities (1)

Prime Fourier Embeddings (PFE): no independent evidence
purpose: Pre-structured integer representation that reduces modular arithmetic to prime-channel selection
Newly introduced construction derived from harmonic analysis of Q.

pith-pipeline@v0.9.1-grok · 5672 in / 1402 out tokens · 45792 ms · 2026-06-26T09:19:35.648589+00:00 · methodology

