
arxiv: 2606.23044 · v1 · pith:P7M5RY6Bnew · submitted 2026-06-22 · 💻 cs.LG · cs.AI

Prime Fourier Embeddings: A Principled Basis for Modular Arithmetic

Pith reviewed 2026-06-26 09:19 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords prime fourier embeddings · modular arithmetic · equivariant maps · schur's lemma · chinese remainder theorem · harmonic analysis · group representations · neural embeddings

The pith

Prime Fourier Embeddings encode integers so that modular arithmetic reduces to selecting independent prime channels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Prime Fourier Embeddings that represent integers as prime-indexed pairs of cosines and sines drawn from the harmonic analysis of the rationals. It establishes that any linear map respecting the product-group symmetries of these embeddings must be block-diagonal, with one block per prime, because Schur's lemma applied to the character decomposition forces this form. For square-free composite moduli the Chinese Remainder Theorem therefore directly identifies which prime blocks carry the computation. Empirical checks confirm the prediction through ablation studies that reveal specialization ratios above 500x between relevant and irrelevant channels together with perfect in-distribution accuracy on all tested square-free moduli.
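
As a rough illustration of the encoding described above, the following Python sketch maps an integer to one (cos, sin) pair per prime and performs addition inside a single channel. It assumes the simplest possible form, angle 2πn/p with one frequency per prime; the paper's actual map may differ (the Figure 3 caption suggests additional prime-power depths). The names `pfe_encode`, `add_in_channel`, and `decode_channel` are hypothetical, and the prime basis is the one quoted in the Experiment 2 setup.

```python
import math

def pfe_encode(n, primes):
    """Map integer n to one (cos, sin) pair per prime, using angle 2*pi*n/p (assumed form)."""
    out = []
    for p in primes:
        angle = 2.0 * math.pi * (n % p) / p
        out.append((math.cos(angle), math.sin(angle)))
    return out

def add_in_channel(u, v):
    """Add two encoded residues in one channel via the angle-addition formulas."""
    (c1, s1), (c2, s2) = u, v
    return (c1 * c2 - s1 * s2, s1 * c2 + c1 * s2)

def decode_channel(pair, p):
    """Recover the residue mod p from a (cos, sin) pair."""
    c, s = pair
    angle = math.atan2(s, c) % (2.0 * math.pi)
    return round(angle * p / (2.0 * math.pi)) % p

PRIMES = [3, 5, 7, 11, 13, 17, 19, 23]  # basis quoted in the Experiment 2 setup
a, b = pfe_encode(9, PRIMES), pfe_encode(17, PRIMES)
idx = PRIMES.index(23)
residue = decode_channel(add_in_channel(a[idx], b[idx]), 23)
print(residue)  # the Figure 1 example: (9 + 17) mod 23 = 3
```

Only the p = 23 channel is read out here; the other channels are computed but ignored, which is the channel-selection behaviour the summary refers to.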

Core claim

Prime Fourier Embeddings derived from the harmonic analysis of the rationals induce a representation of the multiplicative group such that any linear map equivariant under the product group action must be block-diagonal with one independent block per prime, a direct consequence of Schur's lemma on the resulting character decomposition; the Chinese Remainder Theorem then predicts the task-relevant blocks for square-free composite moduli.
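
The block-diagonal conclusion can be sketched in a few lines under a simplifying assumption that is not taken from the paper: one rotation channel per prime, with the product group $\prod_{p \in P} \mathbb{Z}/p\mathbb{Z}$ acting channel-wise. The symbols $\rho$, $W_{pq}$, $a_p$, $b_p$ are notation introduced here for illustration.

```latex
\[
\rho(g) = \bigoplus_{p \in P} R\!\left(\tfrac{2\pi g_p}{p}\right),
\qquad g = (g_p)_{p \in P} \in \prod_{p \in P} \mathbb{Z}/p\mathbb{Z},
\qquad
R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.
\]
% Equivariance W rho(g) = rho(g) W, read off block by block:
\[
W_{pq}\, R\!\left(\tfrac{2\pi g_q}{q}\right) = R\!\left(\tfrac{2\pi g_p}{p}\right) W_{pq}
\qquad \text{for all } g .
\]
% Off-diagonal blocks: choose g_q = 0 and g_p = 1 with p \neq q.
\[
\Bigl(I - R\!\left(\tfrac{2\pi}{p}\right)\Bigr) W_{pq} = 0
\;\Longrightarrow\; W_{pq} = 0,
\qquad \text{since } \det\!\Bigl(I - R\!\left(\tfrac{2\pi}{p}\right)\Bigr) = 2 - 2\cos\tfrac{2\pi}{p} \neq 0 .
\]
% Diagonal blocks (p \geq 3): W_{pp} commutes with a non-trivial rotation.
\[
W_{pp} = a_p I + b_p J, \qquad J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.
\]
```

This elementary computation agrees with what Schur's lemma gives for pairwise inequivalent irreducible blocks; it says nothing about whether the paper's construction actually realizes this group action, which is the premise the referee report questions.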

What carries the argument

The prime-indexed Fourier components realize a product-group representation; Schur's lemma applies to that representation and forces every equivariant linear map to be block-diagonal.

If this is right

  • Modular arithmetic on square-free moduli factors into independent computations, one per prime factor.
  • The Chinese Remainder Theorem supplies the exact list of active prime channels before any training occurs.
  • Ablation studies isolate each prime block, confirming specialization ratios exceeding 500x.
  • In-distribution test accuracy reaches 100 percent on all square-free composite moduli examined.
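
The channel-selection step in the bullets above can be sketched as follows. This is a minimal illustration, not the authors' code: `active_primes` and `crt_reconstruct` are hypothetical helper names, and the example reuses the numbers from the Figure 2 caption.

```python
def active_primes(modulus, basis):
    """Basis primes dividing the modulus; raises if the modulus is not covered."""
    factors = [p for p in basis if modulus % p == 0]
    product = 1
    for p in factors:
        product *= p
    if product != modulus:
        raise ValueError("modulus is not a square-free product of basis primes")
    return factors

def crt_reconstruct(residues, primes):
    """Unique c mod prod(primes) with c = r_i (mod p_i), for pairwise-coprime p_i."""
    modulus = 1
    for p in primes:
        modulus *= p
    total = 0
    for r, p in zip(residues, primes):
        partial = modulus // p
        total += r * partial * pow(partial, -1, p)  # modular inverse, Python 3.8+
    return total % modulus

BASIS = [3, 5, 7, 11, 13, 17, 19, 23]
channels = active_primes(21, BASIS)           # primes predicted to carry the task
residues = [(13 + 15) % p for p in channels]  # per-channel results
result = crt_reconstruct(residues, channels)
print(channels, residues, result)
```

For N = 21 the selected channels are 3 and 7, the residues are 1 and 0, and the reconstruction returns 7, matching the Figure 2 example. A modulus with a repeated prime factor is rejected, mirroring the square-free restriction.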

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same symmetry-matching strategy could be applied to other arithmetic operations whose groups admit analogous decompositions.
  • Pre-structured embeddings aligned with task symmetries may reduce the data needed for models to discover algebraic rules.
  • Extension to moduli containing square factors would test whether separate handling of prime-power components is required.

Load-bearing premise

The construction of Prime Fourier Embeddings from the harmonic analysis of the rationals produces a representation whose symmetry group is precisely the product group over primes, so that Schur's lemma applies directly and the Chinese Remainder Theorem isolates the relevant channels.

What would settle it

A concrete linear map that remains equivariant under the product group action on Prime Fourier Embeddings yet mixes information across distinct prime blocks, or an ablation experiment on a square-free composite modulus in which removing task-irrelevant channels hurts accuracy about as much as removing task-relevant ones, so that no strong specialization appears.
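
A check of the first criterion can be sketched numerically. The Python below assumes a toy setting with two prime channels (p = 3 and q = 5) and the channel-wise action of Z/3 × Z/5; none of the helper names come from the paper. It measures how far a candidate matrix is from equivariance and how much mass sits in the cross-prime blocks, then shows that the group-averaged (hence equivariant) version of an arbitrary matrix has no cross-prime mass.

```python
import math

def rot(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def block_diag(a, b):
    """4x4 block-diagonal matrix from two 2x2 blocks."""
    m = [[0.0] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            m[i][j] = a[i][j]
            m[i + 2][j + 2] = b[i][j]
    return m

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def rho(j, k, p=3, q=5):
    """Element (j, k) of Z/p x Z/q acting on two prime channels (assumed action)."""
    return block_diag(rot(2 * math.pi * j / p), rot(2 * math.pi * k / q))

GROUP = [(j, k) for j in range(3) for k in range(5)]

def equivariance_error(w):
    """Largest entry of rho(g) W - W rho(g) over all group elements."""
    worst = 0.0
    for j, k in GROUP:
        left, right = matmul(rho(j, k), w), matmul(w, rho(j, k))
        worst = max(worst, max(abs(left[r][c] - right[r][c])
                               for r in range(4) for c in range(4)))
    return worst

def cross_block_mass(w):
    """Largest absolute entry in the two off-diagonal 2x2 blocks."""
    return max(max(abs(w[r][c]), abs(w[c][r])) for r in range(2) for c in range(2, 4))

def project_equivariant(w):
    """Group average (1/|G|) sum_g rho(g) W rho(g)^-1, which is always equivariant."""
    acc = [[0.0] * 4 for _ in range(4)]
    for j, k in GROUP:
        term = matmul(matmul(rho(j, k), w), rho(-j, -k))  # rho(-j, -k) is the inverse
        for r in range(4):
            for c in range(4):
                acc[r][c] += term[r][c] / len(GROUP)
    return acc

# Arbitrary matrix with cross-prime entries (illustrative values only).
W = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.5, 0.0],
     [1.5, 1.0, 0.0, -2.0], [3.0, 0.5, 1.0, 2.0]]
W_eq = project_equivariant(W)
print(equivariance_error(W) > 1e-6, equivariance_error(W_eq) < 1e-9,
      cross_block_mass(W_eq) < 1e-9)
```

A counterexample in the sense of the paragraph above would be a matrix with near-zero `equivariance_error` and clearly non-zero `cross_block_mass`; in this toy setting the averaging argument rules that out.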

Figures

Figures reproduced from arXiv: 2606.23044 by Donghun Lee, Hyunsang Hwang, Suhyun Bae.

Figure 1
Figure 1: PFE encoding for (9 + 17) mod 23 = 3. The active prime p = 23 encodes the wrap-around addition. The purple region marks the overlap between the red and blue arcs: geometrically, the portion of the circle claimed by both a and b when their sum exceeds the modulus (= p). Its angular size is (a + b − p)/p, equal to (a + b) mod p normalized by p, which is the label. Inactive primes (p = 29,…
Figure 2
Figure 2: PFE encoding for (13 + 15) mod 21 = 7. Since 21 = 3 × 7, the prime channels p = 3 and p = 7 are load-bearing, carrying residues (13 + 15) mod 3 = 1 and (13 + 15) mod 7 = 0 respectively. The intermediate bars show CRT reconstruction: the unique c ∈ Z/21Z satisfying c ≡ 1 (mod 3) and c ≡ 0 (mod 7) is c = 7, recovered as 3 + 3 + 1 (mod 21) = 7 + 0 (mod 21) = 7.
Figure 3
Figure 3: Nested block-diagonal structure of an equivariant linear map W. Each R(θ_{p,d}) ∈ GL(ℝ²) is a 2 × 2 rotation with θ_{p,d} = 2π/p^{d+1}.
Adjacent paper text: 4.1. Experiment 1: Prime Specialization on Single-Prime Tasks. Setup. We train on (a + b) mod p for each prime p in a fixed subset P of P_0, varying |P| ∈ {4, 6, 8, 10, 12, 14, 16} and input range r ∈ {100, 500, 1000, 2000, 4000}. For each configuration we train a separate model p…
Figure 4
Figure 4: Experiment 1 (part I). Top: mean diagonal drop (ablate own prime). Bottom: mean off-diagonal drop (ablate other prime). Rows index |P|; columns index input range r.
Figure 6
Figure 6: Experiment 2, mean factor drop.
Adjacent paper text: 4.2. Experiment 2: CRT Decomposition on Composite Moduli. Setup. We train on (a + b) mod N for square-free composite moduli N, covering two-factor composites N ∈ {15, 21, 33, 35, 55, 77} and three-factor composites N ∈ {105, 165, 231, 385}, each formed as a product of primes from {3, 5, 7, 11}, embedded within the full prime basis P = {3, 5, 7, 11, 13, 17, 19, 23}. We sweep ov…
Figure 5
Figure 5: Experiment 1 (part II). Top: specialization ratio capped at 500×. Bottom: convergence rate (test acc > 0.85). Rows index |P|; columns index input range r.
[Heatmap residue from an adjacent figure: rows N = 15 (3×5), 21 (3×7), 33 (3×11), 35 (5×7), 55 (5×11), 77 (7×11), 105 (3×5×7), 165 (3×5×11), 231 (3×7×11), 385 (5×7×11); columns input range 100, 500, 1000, 2000, 4000; cell values truncated in the source.]
Figure 8
Figure 8: Experiment 2, factor/nonfactor ratio.
Adjacent paper text: …ablating an irrelevant channel has negligible effect. This is a falsifiable prediction of the representation theory confirmed across a systematic sweep of moduli, prime counts, and input ranges. Selection, not discovery. PFE transforms the grokking bottleneck from a representational discovery problem into a selection problem. By providing a Fourier basis for Z/NZ direct…
Figure 9
Figure 9: Experiment 2, test accuracy.
Adjacent paper text: …leaving their separation as an implicit task for the network. The adelic character factorization (Appendix A.6, Theorem A.26) establishes that the Fourier basis on A_Q factorizes into independent local components indexed by primes. PFE implements these components directly, making the prime basis a principled rather than arbitrary choice for modular arithmetic tasks whose structu…
Figure 10
Figure 10: Per-prime ablation profiles, Experiment 1, r = 100.
Figure 11
Figure 11: Per-prime ablation profiles, Experiment 1, r = 500.
Figure 12
Figure 12: Per-prime ablation profiles, Experiment 1, r = 1000.
Figure 13
Figure 13: Per-prime ablation profiles, Experiment 1, r = 2000.
Figure 14
Figure 14: Per-prime ablation profiles, Experiment 1, r = 4000.
Figure 15
Figure 15: Mean diagonal and off-diagonal drop as a function of input range for each value of |P|.
Figure 16
Figure 16: Specialization ratio as a function of mean r/p_task, consistent with the equivariance interpretation.
Figure 17
Figure 17: CRT ablation profiles, all composites, r = 100.
Figure 18
Figure 18: CRT ablation profiles, all composites, r = 500.
Figure 19
Figure 19: CRT ablation profiles, all composites, r = 1000.
Figure 20
Figure 20: CRT ablation profiles, all composites, r = 2000.
Figure 21
Figure 21: CRT ablation profiles, all composites, r = 4000.
Original abstract

Numbers have algebraic structure that standard neural embeddings often fail to expose. We introduce Prime Fourier Embeddings (PFE), which encode integers as prime-indexed (cos, sin) pairs derived from the harmonic analysis of Q, providing a pre-structured representation in which modular arithmetic reduces to selecting the relevant prime channel rather than discovering algebraic structure from scratch. We prove that any linear map equivariant with respect to the product group action on PFE must be block-diagonal with one independent block per prime -- a consequence of Schur's lemma applied to the resulting character decomposition. For square-free composite moduli, the Chinese Remainder Theorem predicts which prime channels are task-relevant. Both predictions are confirmed empirically: ablation studies show specialization ratios exceeding 500x between task-relevant and task-irrelevant channels, with perfect in-distribution test accuracy across all square-free composite moduli tested.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Prime Fourier Embeddings (PFE), which encode integers as prime-indexed (cos, sin) pairs derived from the harmonic analysis of Q. It claims to prove, via Schur's lemma applied to the character decomposition under the product group action, that any equivariant linear map on PFE must be block-diagonal with one independent block per prime. For square-free composite moduli, the Chinese Remainder Theorem is said to identify the task-relevant prime channels. Both the block-diagonal structure and the channel predictions are reported as confirmed by ablation studies showing specialization ratios exceeding 500x and perfect in-distribution test accuracy.

Significance. If the representation-theoretic claims hold, the work supplies a pre-structured embedding that reduces modular arithmetic to channel selection rather than structure discovery, with potential implications for equivariant architectures and algebraic reasoning in neural networks. The reported empirical specialization provides a concrete, falsifiable signature of the predicted block structure.

major comments (2)
  1. [theoretical derivation of equivariant maps] The central proof (theoretical section following the PFE definition): the application of Schur's lemma to conclude block-diagonality per prime requires that the PFE construction induces a representation whose symmetry group is precisely the product group over primes with no shared irreps or cross-prime characters; the manuscript states this follows from the harmonic analysis of Q but does not exhibit the explicit character decomposition or verify absence of entanglement, which is load-bearing for the claim that CRT channel selection follows automatically.
  2. [ablation studies] Empirical validation (ablation studies section): the reported specialization ratios >500x and perfect accuracy are presented as confirmation of the CRT-derived prediction, yet the text gives no precise protocol for labeling task-relevant vs. irrelevant channels, no data exclusion rules, and no error bars on the ratios; without these, the experiments cannot be assessed as a direct test of the group-representation premise rather than a post-hoc fit.
minor comments (2)
  1. [PFE definition] Notation for the product group action and the precise definition of the PFE map (prime-indexed pairs) should be stated with an explicit formula early in the manuscript to allow readers to check the representation property directly.
  2. [abstract and experiments] The abstract claims 'perfect in-distribution test accuracy across all square-free composite moduli tested' but does not list the specific moduli or model architectures used; adding this table or list would improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive comments. We address each major point below and outline the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: The central proof (theoretical section following the PFE definition): the application of Schur's lemma to conclude block-diagonality per prime requires that the PFE construction induces a representation whose symmetry group is precisely the product group over primes with no shared irreps or cross-prime characters; the manuscript states this follows from the harmonic analysis of Q but does not exhibit the explicit character decomposition or verify absence of entanglement, which is load-bearing for the claim that CRT channel selection follows automatically.

    Authors: We agree that an explicit character decomposition would make the argument fully self-contained. In the revised manuscript we will add a dedicated subsection deriving the character table of the PFE representation under the product-group action. Using the orthogonality relations from the harmonic analysis on Q, we will show that all irreps remain prime-specific with no cross-prime mixing, thereby confirming that Schur's lemma applies block-wise exactly as claimed. revision: yes

  2. Referee: Empirical validation (ablation studies section): the reported specialization ratios >500x and perfect accuracy are presented as confirmation of the CRT-derived prediction, yet the text gives no precise protocol for labeling task-relevant vs. irrelevant channels, no data exclusion rules, and no error bars on the ratios; without these, the experiments cannot be assessed as a direct test of the group-representation premise rather than a post-hoc fit.

    Authors: We accept that the experimental protocol requires additional detail for reproducibility and for a direct test of the theory. The revised ablation section will specify: (i) the exact rule for labeling task-relevant channels via the prime factors given by the CRT for each square-free modulus; (ii) the data-partitioning criteria that exclude non-square-free cases and enforce strict train-test separation; (iii) the computation of specialization ratios together with standard-error bars obtained from five independent random seeds. These additions will allow readers to verify that the reported specialization is a direct consequence of the predicted block structure. revision: yes
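
Item (iii) of the response could look roughly like the following. This is a sketch under stated assumptions: the ratio is taken to be mean diagonal drop divided by mean off-diagonal drop (following the Figure 4 and Figure 5 captions, including the 500× cap), the standard error is the sample standard deviation over seeds divided by √n, and the drop matrices are hypothetical numbers, not results from the paper.

```python
import math
import statistics

CAP = 500.0  # the Figure 5 caption reports ratios capped at 500x

def specialization_ratio(drop, cap=CAP, eps=1e-12):
    """drop[i][j]: accuracy lost on task i when channel j is ablated (assumed layout)."""
    n = len(drop)
    diag = statistics.mean(drop[i][i] for i in range(n))
    off = statistics.mean(drop[i][j] for i in range(n) for j in range(n) if i != j)
    return min(cap, diag / max(off, eps))

def mean_and_stderr(values):
    """Sample mean and standard error (sample std / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical drop matrices for five seeds, NOT data from the paper.
seeds = []
for s in range(5):
    off = 0.001 * (s + 1)
    seeds.append([[0.90, off, off], [off, 0.85, off], [off, off, 0.95]])

ratios = [specialization_ratio(m) for m in seeds]
mean_ratio, stderr = mean_and_stderr(ratios)
print(ratios, mean_ratio, stderr)
```

Capping before averaging, as done here, biases the mean downward whenever any seed hits the cap; reporting the number of capped seeds alongside the mean would make that visible.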

Circularity Check

1 step flagged

PFE defined via prime-indexed pairs makes product-group decomposition and Schur block-diagonality hold by construction

specific steps
  1. self-definitional [Abstract (proof claim)]
    "We prove that any linear map equivariant with respect to the product group action on PFE must be block-diagonal with one independent block per prime -- a consequence of Schur's lemma applied to the resulting character decomposition."

    PFE is introduced as 'prime-indexed (cos, sin) pairs derived from the harmonic analysis of Q'. Because the embedding is indexed and structured per prime from the outset, the symmetry group is the product group and the irreps are distinct per prime by the definition of the coordinates. Schur's lemma then forces block-diagonality tautologically from that definition rather than as a derived property of the harmonic analysis.

full rationale

The central theoretical claim applies Schur's lemma to conclude that equivariant maps on PFE must be block-diagonal per prime. However, PFE is explicitly constructed as prime-indexed (cos, sin) pairs, so the representation is defined to factor as a product over primes with no cross terms. The character decomposition into distinct per-prime irreps therefore follows directly from the embedding definition rather than from independent harmonic analysis of Q. The CRT channel prediction is standard and non-circular, and the empirical specialization ratios are post-training observations rather than fitted inputs renamed as predictions. This produces moderate circularity confined to the load-bearing representation premise.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the harmonic analysis of Q yielding a representation whose symmetry is the product group over primes, plus standard application of Schur's lemma; no free parameters are described in the abstract.

axioms (1)
  • [standard math] Schur's lemma applies to the character decomposition of the PFE representation under the product group action
    Invoked to conclude that equivariant linear maps must be block-diagonal per prime.
invented entities (1)
  • Prime Fourier Embeddings (PFE) [no independent evidence]
    purpose: Pre-structured integer representation that reduces modular arithmetic to prime-channel selection
    Newly introduced construction derived from harmonic analysis of Q.

pith-pipeline@v0.9.1-grok · 5672 in / 1402 out tokens · 45792 ms · 2026-06-26T09:19:35.648589+00:00 · methodology

