Conservation Laws for Modern Neural Architectures

Nam Nguyen; Tan Lai Ngoc; Tan M. Nguyen; Tuan Dam; Viet-Hoang Tran; Vinh Khanh Bui

arXiv: 2606.17816 · v1 · pith:AX36XNWF · submitted 2026-06-16 · cs.LG · cs.AI

keywords: laws, architectures, conservation, gradient, models, modern, networks, activations

Understanding gradient descent dynamics is key to explaining the success of over-parameterized models, in which implicit bias manifests as conservation laws of the gradient flow. While such laws are well understood for linear and ReLU networks, they remain largely unexplored for modern architectures. This work develops a unified framework for characterizing conservation laws in contemporary models, including feedforward networks with GELU, SiLU, and SwiGLU activations, multi-head attention with sinusoidal and rotary positional encodings, and Mixture-of-Experts architectures under diverse gating designs. Our theoretical findings are supported by experiments that validate the predicted invariants.
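To make "conservation law in gradient flow" concrete for the ReLU case the abstract calls well understood, here is a minimal sketch, not taken from the paper. It assumes a toy two-layer network f(x) = Σ_j a_j·ReLU(w_j·x) trained with full-batch gradient descent on squared error. For this model the per-neuron quantity ‖w_j‖² − a_j² is exactly conserved under gradient flow because ReLU is positively homogeneous (ReLU'(z)·z = ReLU(z)). Under discrete gradient descent, it changes only by lr²·(‖∇w_j‖² − (∇a_j)²) per step. The data, sizes, and function names (`train`, `balance`, etc.) are hypothetical choices for illustration.

```python
import math
import random

def relu(z):
    return z if z > 0.0 else 0.0

def loss(W, a, data):
    # 0.5 * mean squared error of f(x) = sum_j a_j * relu(w_j . x)
    total = 0.0
    for x, y in data:
        out = sum(a_j * relu(sum(wi * xi for wi, xi in zip(w_j, x)))
                  for w_j, a_j in zip(W, a))
        total += 0.5 * (out - y) ** 2
    return total / len(data)

def grads(W, a, data):
    # Exact gradients of loss() w.r.t. hidden weights W and output weights a.
    gW = [[0.0] * len(w) for w in W]
    ga = [0.0] * len(a)
    n = len(data)
    for x, y in data:
        pre = [sum(wi * xi for wi, xi in zip(w_j, x)) for w_j in W]
        out = sum(a_j * relu(p) for a_j, p in zip(a, pre))
        r = (out - y) / n
        for j, p in enumerate(pre):
            ga[j] += r * relu(p)
            if p > 0.0:  # ReLU derivative is 1 on the active side, 0 otherwise
                for i, xi in enumerate(x):
                    gW[j][i] += r * a[j] * xi
    return gW, ga

def balance(W, a):
    # Per-neuron quantity ||w_j||^2 - a_j^2 (conserved under gradient flow for ReLU).
    return [sum(wi * wi for wi in w_j) - a_j * a_j for w_j, a_j in zip(W, a)]

def train(steps=500, lr=1e-3, hidden=8, dim=3, n=32, seed=0):
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        data.append((x, math.sin(x[0]) + 0.5 * x[1]))  # hypothetical toy target
    W = [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(hidden)]
    a = [rng.gauss(0.0, 0.5) for _ in range(hidden)]
    before, loss_before = balance(W, a), loss(W, a, data)
    for _ in range(steps):
        gW, ga = grads(W, a, data)
        for j in range(hidden):
            for i in range(dim):
                W[j][i] -= lr * gW[j][i]
            a[j] -= lr * ga[j]
    return before, balance(W, a), loss_before, loss(W, a, data)

if __name__ == "__main__":
    before, after, l0, l1 = train()
    print("loss:", l0, "->", l1)
    print("max drift of ||w_j||^2 - a_j^2:",
          max(abs(b - c) for b, c in zip(before, after)))
```

The loss decreases while each neuron's balance term stays nearly fixed, with drift shrinking as the learning rate goes to zero. GELU and SiLU are not positively homogeneous, so this exact identity does not carry over to them directly, which is one reason modern activations need separate treatment.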
