arXiv preprint arXiv:2602.03655 , year =

· 2026 · cs.LG · arXiv 2602.03655

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

open full Pith review browse 5 citing papers arXiv PDF

abstract

How do neural networks trained over sequences acquire the ability to perform structured operations, such as arithmetic, geometric, and algorithmic computation? To gain insight into this question, we introduce the sequential group composition task. In this task, networks receive a sequence of elements from a finite group encoded in a real vector space and must predict their cumulative product. This task can be order-sensitive and cannot be solved by a linear model. Our analysis isolates the roles of the group structure, encoding statistics, and sequence length in shaping learning. We prove that two-layer networks from vanishing initialization learn this task one irreducible representation of the group at a time in an order determined by the Fourier statistics of the encoding. To perfectly learn the task, these networks require a hidden width exponential in the sequence length $k$. In contrast, we construct deeper architectures that exploit associativity to dramatically improve this scaling: recurrent neural networks can compose elements sequentially in $k$ steps, while multilayer networks can compose adjacent pairs in parallel in $\log k$ layers. Overall, the sequential group composition task offers a tractable window into the mechanics of deep learning.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Neural Networks Provably Learn Spectral Representations for Group Composition

cs.LG · 2026-06-02 · unverdicted · novelty 8.0

Two-layer neural networks provably converge almost surely to irreducible representations of finite groups when trained on the group composition task, with the dynamics governed by Riemannian gradient ascent on a representation-theoretic energy functional.

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.

Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts

cs.AI · 2026-05-01 · unverdicted · novelty 7.0

Llama-3.1-8B computes sums for cyclic concepts using base-10 addition via task-agnostic Fourier features with periods 2, 5, and 10 rather than modular arithmetic in the concept period.

Singular Learning and Occam's Razor in Deep Monomial Networks

cs.LG · 2026-06-26 · unverdicted · novelty 5.0

For large monomial activation degree, critical points in deep fully-connected networks coincide exactly with subnetwork configurations where neurons are inactive or redundant.

There Will Be a Scientific Theory of Deep Learning

stat.ML · 2026-04-23 · unverdicted · novelty 2.0

A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.

citing papers explorer

Showing 5 of 5 citing papers after filters.

Neural Networks Provably Learn Spectral Representations for Group Composition cs.LG · 2026-06-02 · unverdicted · none · ref 47 · internal anchor
Two-layer neural networks provably converge almost surely to irreducible representations of finite groups when trained on the group composition task, with the dynamics governed by Riemannian gradient ascent on a representation-theoretic energy functional.
Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior cs.LG · 2026-05-06 · unverdicted · none · ref 1 · internal anchor
Manifold steering along activation geometry induces behavioral trajectories matching the natural manifold of outputs, while linear steering produces off-manifold unnatural behaviors.
Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts cs.AI · 2026-05-01 · unverdicted · none · ref 1 · internal anchor
Llama-3.1-8B computes sums for cyclic concepts using base-10 addition via task-agnostic Fourier features with periods 2, 5, and 10 rather than modular arithmetic in the concept period.
Singular Learning and Occam's Razor in Deep Monomial Networks cs.LG · 2026-06-26 · unverdicted · none · ref 10 · internal anchor
For large monomial activation degree, critical points in deep fully-connected networks coincide exactly with subnetwork configurations where neurons are inactive or redundant.
There Will Be a Scientific Theory of Deep Learning stat.ML · 2026-04-23 · unverdicted · none · ref 218 · internal anchor
A mechanics of the learning process is emerging in deep learning theory, characterized by dynamics, coarse statistics, and falsifiable predictions across idealized settings, limits, laws, hyperparameters, and universal behaviors.

arXiv preprint arXiv:2602.03655 , year =

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer