pith. machine review for the scientific record

arxiv: 2603.29496 · v2 · submitted 2026-03-31 · 💻 cs.AI · cs.LG

Recognition: 2 theorem links · Lean Theorem

Metriplector: From Field Theory to Neural Architecture

Authors on Pith: no claims yet

Pith reviewed 2026-05-13 23:53 UTC · model grok-4.3

classification: 💻 cs.AI · cs.LG
keywords: metriplectic dynamics · neural architecture · field theory · stress-energy tensor · image recognition · robotic control · language modeling · Poisson equation

The pith

Metriplector configures inputs as physical fields and lets metriplectic evolution perform the neural computation, with readout from the stress-energy tensor.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Metriplector as a neural primitive in which task inputs define fields, sources, and operators whose coupled evolution constitutes the computation itself. The metriplectic structure supports a spectrum of modes: its dissipative branch alone reduces to a screened Poisson equation solved exactly by conjugate gradient, while the full dynamics including the antisymmetric Poisson bracket generate field evolution for image recognition, language modeling, and control. Across five evaluated domains the architecture achieves reported accuracies such as 81 percent on CIFAR-100 and 88 percent success on robotic reaching, all with parameter counts under a few million and using the stress-energy tensor as the natural output. A sympathetic reader would care because the approach replaces hand-crafted layers with a single physics-derived dynamical system that can be dialed from simple dissipation to richer bracket-driven behavior.

Core claim

The metriplectic formulation admits a natural spectrum of instantiations as neural architectures: the dissipative branch yields a screened Poisson equation solved exactly via conjugate gradient, while activating the full structure including the antisymmetric Poisson bracket supplies field dynamics that perform image recognition, language modeling, robotic control, Sudoku solving, and maze pathfinding, with the stress-energy tensor providing the readout.
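At equilibrium the dissipative branch is a single linear solve: with graph Laplacian L_W and screening term Λ (the paper's appendix excerpt, anchor 18 below, writes A = L_W + Λ), the field satisfies (L_W + Λ)ψ* = b. A minimal sketch of that solve with SciPy's conjugate gradient; the grid size, unit conductances, and screening value are hypothetical stand-ins, not the paper's learned quantities:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

n = 16                                   # hypothetical 16x16 spatial grid
I = sp.eye(n)
T = sp.diags([2 * np.ones(n), -np.ones(n - 1), -np.ones(n - 1)], [0, 1, -1])
L_W = sp.kron(I, T) + sp.kron(T, I)      # 5-point grid Laplacian (unit conductances w_ij)

Lam = 0.5 * sp.eye(n * n)                # screening term Lambda (toy value)
b = np.random.default_rng(0).normal(size=n * n)   # source / forcing term

# Dissipative branch at equilibrium: (L_W + Lambda) psi* = b, solved by CG.
psi_star, info = cg(L_W + Lam, b)
print(info, np.linalg.norm((L_W + Lam) @ psi_star - b))  # 0, residual ~ 0
```

Because A is symmetric positive definite, CG converges to the exact equilibrium up to solver tolerance, which is what lets the maze and Sudoku instantiations treat the layer as an exact solver rather than an unrolled approximation.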

What carries the argument

Coupled metriplectic dynamics of multiple fields driven by sources and operators, with the stress-energy tensor derived from Noether's theorem serving as the readout mechanism.
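For readers who have not met the metriplectic (GENERIC) form: the state evolves as dψ/dt = J∇H + M∇S, with an antisymmetric bracket J carrying conservative dynamics and a symmetric positive semidefinite M carrying dissipation. A toy sketch of one Euler-integrated trajectory; the quadratic H and S are illustrative choices, and the degeneracy conditions (J∇S = 0, M∇H = 0) of a full GENERIC system are skipped for brevity:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4                                    # hypothetical number of fields

A = rng.normal(size=(K, K))
J = A - A.T                              # antisymmetric Poisson bracket, J = -J^T
B = rng.normal(size=(K, K))
M = B @ B.T                              # symmetric PSD dissipation, M = M^T >= 0

psi = rng.normal(size=K)
dt = 1e-3
energy0 = 0.5 * psi @ psi
for _ in range(5000):
    dH = psi                             # gradient of the toy Hamiltonian H = 0.5 psi.psi
    dS = -psi                            # gradient of the toy entropy S = -0.5 psi.psi
    # GENERIC / metriplectic Euler step: d(psi)/dt = J dH + M dS
    psi = psi + dt * (J @ dH + M @ dS)

# The J term alone conserves H (psi . J psi = 0); the M term dissipates it.
print(energy0, 0.5 * psi @ psi)          # energy has decayed
```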

If this is right

  • The dissipative branch alone produces exact solutions to screened Poisson equations via conjugate gradient.
  • The full metriplectic structure supplies field dynamics capable of image recognition, language modeling, and robotic control.
  • Task-specific architectures built from the same primitive achieve 81.03 percent on CIFAR-100, 88 percent CEM success on Reacher, 97.2 percent exact Sudoku solve rate, 1.182 bits per byte on language modeling, and perfect F1 on maze pathfinding.
  • The same primitive supports generalization from 15 by 15 training grids to unseen 39 by 39 grids in pathfinding.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The spectrum suggests neural computation can be viewed as a tunable physical evolution rather than a stack of unrelated layers.
  • Applying the same field setup to new domains such as physics simulation or scientific computing could test whether the physics grounding transfers without redesign.
  • Parameter counts under one million for control tasks raise the question of whether metriplectic scaling laws differ from those of standard attention-based models.

Load-bearing premise

That arbitrary task inputs can be configured as fields, sources, and operators such that the resulting metriplectic evolution produces useful computation whose stress-energy readout matches task labels without task-specific fitting that undermines the physics interpretation.

What would settle it

An experiment showing that the stress-energy tensor readout requires extensive task-specific parameter adjustments to match labels, or that performance collapses when field configurations are forced to obey strict physical consistency constraints.

Figures

Figures reproduced from arXiv: 2603.29496 by Dan Oprisa, Peter Toth.

Figure 1: Metriplector field interaction. Top: K fields ψk evolve via metriplectic dynamics over the spatial grid (gradient arrows show ∇ψk); the outer product ∇ψa ⊗ ∇ψb yields three stress-energy components. Bottom: per-field gradient energy Ek = |∇ψk|², cross-field correlation Eab = ∇ψa · ∇ψb, and vorticity Vab = ∇ψa × ∇ψb, summed and projected via Conv1×1 into h. Shown for K=3; the full model uses K=32. …
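The readout Figure 1 describes is concrete enough to sketch: finite-difference gradients of the K fields give the per-field energies, cross-field correlations, and vorticities, which a per-cell linear map ("Conv1×1") projects into the hidden state. All shapes and the projection weights below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
K, H, W = 3, 8, 8                        # K fields on an HxW grid (toy sizes)
psi = rng.normal(size=(K, H, W))

# Spatial gradients of each field via finite differences.
gy, gx = np.gradient(psi, axis=(1, 2))

E_k = gx**2 + gy**2                                        # |grad psi_k|^2 per field
E_ab = (np.einsum('ahw,bhw->abhw', gx, gx)
        + np.einsum('ahw,bhw->abhw', gy, gy))              # grad psi_a . grad psi_b
V_ab = (np.einsum('ahw,bhw->abhw', gx, gy)
        - np.einsum('ahw,bhw->abhw', gy, gx))              # grad psi_a x grad psi_b (2D)

# Stack all features and project per cell ("Conv1x1") into hidden size D.
feats = np.concatenate([E_k,
                        E_ab.reshape(K * K, H, W),
                        V_ab.reshape(K * K, H, W)])        # (K + 2K^2, H, W)
D = 16                                                     # hypothetical hidden dim
W1x1 = rng.normal(size=(D, feats.shape[0])) / np.sqrt(feats.shape[0])
h = np.einsum('df,fhw->dhw', W1x1, feats)
print(h.shape)                                             # (16, 8, 8)
```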
Figure 2: The metriplectic spectrum. All five domains instantiate the same GENERIC equation. Maze and Sudoku use only the dissipative branch (M), solved at equilibrium via CG. Language uses causal dissipation via scan. CIFAR-100 and Reacher activate full metriplectic structure via Euler integration; Reacher additionally incorporates canonical symplectic JΩ, port-Hamiltonian action conditioning, and multi-step AR tra…
Figure 3: Metriplector architecture. A single round of the recurrent V-cycle (repeated R times with shared weights). The cell encoder produces per-cell features h from input, previous predictions, position, and round fraction. Learned symmetric conductances wij define the graph Laplacian. Damping and source MLPs produce per-cell screening and forcing terms. K independent screened Poisson equations are solved via CG…
Figure 4: CIFAR-100 metriplectic layer (×12, non-shared weights). The representation h (D=128) flows along the top via residual connections. Each layer projects h down to K=32 physics fields ψ, evolves them under the full metriplectic equation (diffusion + advection + damping + source), extracts physically meaningful features via the stress-energy tensor (Noether readout), and projects back to D via gated mixing. Th…
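Putting the pieces together, a hedged sketch of the layer pattern Figure 4 describes: project h down to fields, run a few Euler steps of diffusion plus bracket coupling plus damping, read out gradient-energy features, and gate the result back into h. Everything below (shapes, step count, the toy periodic Laplacian, the scalar gate) is an illustrative stand-in rather than the paper's parameterization:

```python
import numpy as np

rng = np.random.default_rng(5)

def metriplector_layer(h, P_down, P_up, J, gate):
    """One CIFAR-style layer in the pattern of Figure 4 (toy stand-in)."""
    # 1. Project the hidden state down to K physics fields psi.
    psi = np.einsum('kd,dhw->khw', P_down, h)

    # 2. A few Euler steps: diffusion + bracket coupling + damping.
    for _ in range(4):
        lap = (np.roll(psi, 1, 1) + np.roll(psi, -1, 1) +
               np.roll(psi, 1, 2) + np.roll(psi, -1, 2) - 4 * psi)  # periodic toy Laplacian
        psi = psi + 0.1 * (lap + np.einsum('ab,bhw->ahw', J, psi) - 0.05 * psi)

    # 3. Noether-style readout: per-field gradient energy (cf. Figure 1).
    gy, gx = np.gradient(psi, axis=(1, 2))
    T = gx**2 + gy**2

    # 4. Project back to D and apply a gated residual update.
    return h + gate * np.einsum('dk,khw->dhw', P_up, T)

D, K, H, W = 8, 4, 8, 8
Jraw = rng.normal(size=(K, K))
h = rng.normal(size=(D, H, W))
out = metriplector_layer(h,
                         rng.normal(size=(K, D)) / np.sqrt(D),  # down-projection
                         rng.normal(size=(D, K)) / np.sqrt(K),  # up-projection
                         Jraw - Jraw.T,                         # antisymmetric bracket
                         0.1)                                   # scalar gate (toy)
print(out.shape)                                                # (8, 8, 8)
```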
Figure 5: Reacher world model architecture. A convolutional encoder maps 64×64 images to an 8×8×128 latent grid. The action modulates PDE operators via port-Hamiltonian conditioning (zero-initialized). Four MetriplectorLayers evolve fresh ψ fields—organized as 8 conjugate (q, p) pairs with canonical symplectic JΩ—and extract T^{μν} features. Multi-step AR training (scheduled warmup) closes the exposure bias gap. The p…
Figure 6: Causal Poisson language model. Tokens are embedded and passed through L=6 non-shared CausalPoissonLayers. Each layer solves the causal Poisson recurrence via O(N log N) parallel associative scan, applies progressive multigrid (token → chunk → section scales with shifted pooling for causal safety), computes cross-field outer products (ψ ⊗ ψ), and integrates all features into the hidden state h via a round …
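The caption's "O(N log N) parallel associative scan" is the standard trick for first-order linear recurrences: pairs (a_t, b_t) representing h_t = a_t h_{t-1} + b_t compose associatively, so the whole sequence can be evaluated in log-many doubling passes. The paper's exact recurrence is not quoted here; the sketch below assumes the scalar first-order case:

```python
import numpy as np

def linear_scan(a, b):
    """Evaluate h_t = a_t * h_{t-1} + b_t (h_0 = b_0) by parallel doubling.

    Pairs compose associatively: (a2, b2) after (a1, b1) = (a1*a2, a2*b1 + b2),
    so log2(N) passes reproduce the sequential recurrence.
    """
    a, b = a.copy(), b.copy()
    shift = 1
    while shift < len(b):
        # Combine each element with the prefix ending `shift` steps earlier.
        b[shift:] = b[shift:] + a[shift:] * b[:-shift]
        a[shift:] = a[shift:] * a[:-shift]
        shift *= 2
    return b

rng = np.random.default_rng(3)
N = 1024
a = rng.uniform(0.5, 0.95, size=N)       # per-token decay (causal screening)
b = rng.normal(size=N)                   # per-token source

h = linear_scan(a, b)

h_ref = np.empty(N)                      # sequential reference
h_ref[0] = b[0]
for t in range(1, N):
    h_ref[t] = a[t] * h_ref[t - 1] + b[t]
print(np.allclose(h, h_ref))             # True
```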
Figure 7: CIFAR-100: Accuracy vs. parameters. Metriplector variants (blue) achieve 80%+ accuracy with 2.26M parameters—10–15× fewer than conventional architectures (gray) at similar accuracy levels.
Figure 8: CIFAR-100 ablation impacts. Accuracy improvement from each architectural component relative to the PhysicsSelects-V2 K=32 baseline (78.4%). The operator-from-h principle (+14.3%) and the Poisson bracket J (+13.4%) are the most important factors. Fresh ψ per layer (+3.4%) validates the non-recurrent physics design. … a sharp increase at L11, reflecting the final layer concentrating discriminative features be…
Figure 9: CIFAR-100 physics diagnostics across 12 layers. (a) Singular value spectrum of the learned Poisson tensor J—values appear in degenerate pairs, the hallmark of skew-symmetric structure. (b) J effective rank and null-space dimensions; ∥J∥F grows monotonically with depth, indicating stronger cross-field coupling in later layers. (c) ψ magnitude and standard deviation remain stable (0.25–0.5) across L0–L10, …
Figure 10: ψ field specialization across 12 layers (32 fields × 12 layers). (a) Field magnitude: most fields remain moderate through L0–L10, with a sharp spike at L11 where specific fields (e.g., k=3, 20, 21) concentrate discriminative features. (b) Spatial variance: early layers develop spatially uniform fields (low variance); L11 shows dramatic spatial structure as the model localizes class-specific features befor…
Figure 11: CIFAR-100 forward dynamics budget. (a) Hidden state h: magnitude and per-layer update ∆h both grow monotonically; the step size γ increases from 0.12 to 0.22, self-organizing an expanding dynamics budget. (b) ψ fields: magnitude and standard deviation are stable (0.25–0.5) across L0–L10, with spatial variance spiking at L11.
Figure 12: Emergent symplectic layer specialization. (a) Evolution of Jspec (spectral radius of the full Poisson bracket J = JΩ + Jcross) during training. All layers begin near canonical (Jspec = 1.0) and develop a depth-ordered hierarchy: L2 acquires the richest inter-pair coupling (+56% beyond canonical). The "AR on" marker at epoch 10 shows the onset of multi-step autoregressive training. (b) Final layer configur…
Figure 13: Emergent box discovery in Sudoku. The object layer discovers all 9 Sudoku 3×3 boxes from an 8-connected spatial lattice with no box-level supervision. Assignments are sharp (τ=0.136, entropy ≈0.032) and static across all puzzles.
Figure 14: Learned conductance matrix Wsym (Sudoku). Entries show coupling strength between cell types (digits 1–9 and padding). The model learns sparse, structured couplings: strong links (4↔pad, 2↔4, 5↔7) emerge without supervision.
Figure 15: Language modeling BPB progression. GPT baseline (1.224 BPB, dashed) shown for reference; Metriplector reaches 1.182 BPB with 3.6× fewer training tokens.
original abstract

We present Metriplector, a neural architecture primitive in which the input configures an abstract physical system -- fields, sources, and operators -- and the dynamics of that system is the computation. Multiple fields evolve via coupled metriplectic dynamics, and the stress-energy tensor T^{\mu\nu}, derived from Noether's theorem, provides the readout. The metriplectic formulation admits a natural spectrum of instantiations: the dissipative branch alone yields a screened Poisson equation solved exactly via conjugate gradient; activating the full structure -- including the antisymmetric Poisson bracket -- gives field dynamics for image recognition, language modeling, and robotic control. We evaluate Metriplector across five domains, each using a task-specific architecture built from this shared primitive with progressively richer physics: 81.03% on CIFAR-100 with 2.26M parameters; 88% CEM success on Reacher robotic control with under 1M parameters; 97.2% exact Sudoku solve rate with zero structural injection; 1.182 bits/byte on language modeling with 3.6x fewer training tokens than a GPT baseline; and F1=1.0 on maze pathfinding, generalizing from 15x15 training grids to unseen 39x39 grids.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Metriplector, a neural architecture primitive in which task inputs configure fields, sources, and operators whose coupled metriplectic dynamics (Poisson bracket plus dissipation) perform the computation, with readout given by the stress-energy tensor T^{μν} derived from Noether's theorem. The approach is instantiated across a spectrum from pure dissipative dynamics (screened Poisson equation solved by conjugate gradient) to full structure, and evaluated on five domains using task-specific architectures: 81.03% accuracy on CIFAR-100 (2.26M parameters), 88% CEM success on Reacher (<1M parameters), 97.2% exact Sudoku solve rate with zero structural injection, 1.182 bits/byte on language modeling (3.6× fewer tokens than GPT baseline), and F1=1.0 on maze pathfinding with generalization from 15×15 to 39×39 grids.

Significance. If the central claim holds—that arbitrary inputs can be configured as fields/sources/operators such that metriplectic evolution produces task solutions whose stress-energy readout matches labels without task-specific fitting that undermines the physics interpretation—the work could provide a novel unified primitive bridging field theory and neural computation. The reported parameter efficiency, exact Sudoku performance, and out-of-distribution maze generalization would be notable strengths if supported by explicit derivations and ablations.

major comments (2)
  1. [Abstract] The claim of 97.2% exact Sudoku solve rate with 'zero structural injection' is load-bearing for the assertion of a general physics primitive. Without an explicit description of how the grid is mapped to initial fields, sources, and operators, it remains unclear whether row/column constraints are pre-encoded in the configuration step rather than emerging from the dynamics alone.
  2. [Abstract, Experiments] Performance numbers (e.g., 81.03% on CIFAR-100, 1.182 bits/byte on LM) are reported without error bars, training details, ablation studies, or derivations showing how the metriplectic equations yield the observed outputs. This prevents verification that the results follow from the stated dynamics rather than from the task-specific configuration choices.
minor comments (2)
  1. The notation for the stress-energy tensor T^{μν} and its derivation via Noether's theorem should include the explicit metric signature and coordinate conventions used in the field equations.
  2. The manuscript would benefit from a dedicated section clarifying the precise mapping procedure from task inputs to fields/sources/operators for each domain, to allow readers to assess generality.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments, which help clarify the presentation of the input-to-field mapping and strengthen the experimental reporting. We address each major comment below and will revise the manuscript accordingly.

point-by-point responses
  1. Referee: [Abstract] The claim of 97.2% exact Sudoku solve rate with 'zero structural injection' is load-bearing for the assertion of a general physics primitive. Without an explicit description of how the grid is mapped to initial fields, sources, and operators, it remains unclear whether row/column constraints are pre-encoded in the configuration step rather than emerging from the dynamics alone.

    Authors: We agree that an explicit description of the Sudoku input configuration is required to substantiate the 'zero structural injection' claim. In the revised manuscript we will insert a new subsection (Experiments, Sudoku) that provides the precise mapping: the 9x9 grid is encoded as a scalar field φ(x) whose initial value at each cell is set by a delta-source term proportional to the given number (or zero for empty cells); the metriplectic operators are instantiated solely from the abstract Poisson structure and dissipation kernel without any row- or column-specific terms; the Sudoku constraints arise endogenously from the conservation properties enforced by the stress-energy tensor readout. The exact initialization equations and operator definitions will be supplied so that readers can verify the absence of pre-encoded constraints. revision: yes

  2. Referee: [Abstract, Experiments] Performance numbers (e.g., 81.03% on CIFAR-100, 1.182 bits/byte on LM) are reported without error bars, training details, ablation studies, or derivations showing how the metriplectic equations yield the observed outputs. This prevents verification that the results follow from the stated dynamics rather than from the task-specific configuration choices.

    Authors: We concur that the current reporting lacks the statistical and methodological detail needed for independent verification. The revised manuscript will expand the Experiments section with: (i) mean and standard deviation over five independent random seeds for every reported metric; (ii) complete hyperparameter tables and optimization schedules for each of the five tasks; (iii) ablation tables that isolate the contribution of the antisymmetric Poisson bracket versus the dissipative branch alone; and (iv) explicit derivations (for Sudoku and maze) that step through the metriplectic evolution equations and show how the stress-energy tensor components map to the task labels. These additions will demonstrate that the reported performance is a direct consequence of the coupled dynamics. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation chain.

full rationale

The paper defines Metriplector as a primitive where task inputs configure fields/sources/operators and metriplectic dynamics (Poisson bracket plus dissipation) perform the computation, with readout via the stress-energy tensor obtained from Noether's theorem. The abstract and description present this as a spectrum of instantiations from screened Poisson to full dynamics, evaluated on multiple domains with task-specific architectures built from the shared primitive. No quoted equations or steps reduce the claimed dynamics or readout to a fitted parameter renamed as prediction, a self-citation chain, or an ansatz smuggled via prior work by the same authors. The configuration step is described as generic and physics-motivated rather than shown to embed the solution by construction. The derivation therefore remains self-contained against external physical principles without the specific reductions required for a positive circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that task inputs can be faithfully encoded as fields and operators whose metriplectic evolution yields task solutions; this is a domain assumption imported from physics without independent verification in the abstract.

axioms (2)
  • domain assumption Metriplectic dynamics govern the evolution of the configured fields and operators
    Invoked as the computational mechanism without derivation in the abstract.
  • domain assumption The stress-energy tensor derived from Noether's theorem provides a sufficient readout for downstream tasks
    Stated as the output mechanism without further justification in the abstract.

pith-pipeline@v0.9.0 · 5513 in / 1351 out tokens · 54462 ms · 2026-05-13T23:53:53.989363+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 8 internal anchors

  1. [1]

    Relational inductive biases, deep learning, and graph networks

    Battaglia, P. W., Hamrick, J. B., Bapst, V., et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261.

  2. [2]

    Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

    Bronstein, M. M., Bruna, J., Cohen, T., and Veličković, P. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges. arXiv preprint arXiv:2104.13478.

  3. [3]

    CliffordNet: All You Need Is Geometric Algebra

    Ji, Z. CliffordNet: All you need is geometric algebra. arXiv preprint arXiv:2601.06793.

  4. [4]

    The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

    Penedo, G., Kydlíček, H., Lozhkov, A., Mitchell, M., Colin, C., Mou, G., Ponferrada, E. G., Wolf, T., and Thrush, T. The FineWeb datasets: Decanting the web for the finest text data at scale. arXiv preprint arXiv:2406.17557.

  5. [5]

    Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds

    Thomas, N., Smidt, T., Kearnes, S., Yang, L., Li, L., Kohlhoff, K., and Riley, P. Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds. arXiv preprint arXiv:1802.08219.

  6. [6]

    LLaMA: Open and Efficient Foundation Language Models

    Touvron, H., Lavril, T., Izacard, G., et al. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.

  7. [7]

    Neural Logic Machines for Sudoku Solving

    Zhang, J., Li, Z., and Chen, F. Neural logic machines for Sudoku solving. arXiv preprint arXiv:2108.06455.

  8. [8]

    LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels

    Maes, L., Le Lidec, Q., Scieur, D., LeCun, Y., and Balestriero, R. LeWorldModel: Stable end-to-end joint-embedding predictive architecture from pixels. arXiv preprint arXiv:2603.19312.

  9. [9]

    World Models

    Ha, D. and Schmidhuber, J. World models. arXiv preprint arXiv:1803.10122.

  10. [10]

    Mastering Diverse Domains through World Models

    Hafner, D., Pasukonis, J., Ba, J., and Lillicrap, T. Mastering diverse domains through world models. arXiv preprint arXiv:2301.04104.

  11. [11]

    DINO-WM: World Models on Pre-trained Visual Features Enable Zero-Shot Planning

    Zhou, Y., Zhang, Y., Zhai, Y., and LeCun, Y. DINO-WM: World models on pre-trained visual features enable zero-shot planning. arXiv preprint arXiv:2411.04983.

  12. [12]

    Learning the Metriplectic Dynamics of Complex Systems

    Lee, K., Trask, N., and Stinis, P. Learning the metriplectic dynamics of complex systems. arXiv preprint arXiv:2208.05929.

  13. [13]

    Dissipative Hamiltonian Neural Networks: Learning Dissipative and Conservative Dynamics Separately

    Sosanya, A. and Greydanus, S. Dissipative Hamiltonian neural networks: Learning dissipative and conservative dynamics separately. arXiv preprint arXiv:2201.10085.

  14. [14]

    Graph Neural Networks Informed Locally by Thermodynamics

    Hernández, Q., Badías, A., Chinesta, F., and Cueto, E. Graph neural networks informed locally by thermodynamics. arXiv preprint arXiv:2405.13093.

  15. [15]

    Data-Driven Particle Dynamics: Structure-Preserving Coarse-Graining for Emergent Behavior in Non-Equilibrium Systems

    Hernández, Q., Win, M., O'Connor, T. C., Arratia, P. E., and Trask, N. Data-driven particle dynamics: Structure-preserving coarse-graining for emergent behavior in non-equilibrium systems. arXiv preprint arXiv:2508.12569.

  16. [16]

    Metriplectic Conditional Flow Matching for Dissipative Dynamics

    Baheri, A. and Lindemann, L. Metriplectic conditional flow matching for dissipative dynamics. arXiv preprint arXiv:2509.19526.

  17. [17]

    Meta-learning Structure-Preserving Dynamics

    Jing, C., Mudiyanselage, U. B., Cho, W., Jo, M., Gruber, A., and Lee, K. Meta-learning structure-preserving dynamics. arXiv preprint arXiv:2508.11205.

  18. [18]

    Internal anchor: Appendix A, Implicit Differentiation

    Given ψ* = A⁻¹b, where A = L_W + Λ, and a downstream loss L:

      ∂L/∂b = A⁻¹ (∂L/∂ψ*) = v,  (24)

      ∂L/∂w_ij = −v_i (ψ*_i − ψ*_j) − v_j (ψ*_j − ψ*_i),  (25)

    where v = A⁻¹(∂L/∂ψ*) is the adjoint variable. The same CG solver is reused for both forward and adjoint solves, requiring O(N) total memory. Appendix B (Dirichlet energy derivation) sets ∇_ψ E_Dir = 0 from Eq. (9): …
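Equation (24) in this anchor is easy to check numerically: the gradient of a scalar loss with respect to the source b is the adjoint solve v = A⁻¹(∂L/∂ψ*), reusing the same CG routine as the forward pass. A small self-contained check with a random SPD stand-in for A = L_W + Λ; since ψ* is linear in b, the difference quotient is exact up to solver tolerance:

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(4)
N = 50
B = rng.normal(size=(N, N))
A = B @ B.T + N * np.eye(N)              # random SPD stand-in for A = L_W + Lambda
b = rng.normal(size=N)

psi, _ = cg(A, b)                        # forward solve: psi* = A^-1 b
g = rng.normal(size=N)                   # upstream gradient dL/dpsi*
v, _ = cg(A, g)                          # adjoint solve: v = A^-1 g   (Eq. 24)

# Check dL/db = v for the scalar loss L(b) = g . psi*(b), perturbing b
# along the first coordinate.
eps = 1e-2
e0 = np.zeros(N)
e0[0] = 1.0
psi_p, _ = cg(A, b + eps * e0)
print((g @ psi_p - g @ psi) / eps, v[0])  # the two numbers should agree closely
```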