Rational Neural Networks have Expressivity Advantages

Alex Townsend; Maosen Tang

arxiv: 2602.12390 · v2 · pith:DYVQII2Hnew · submitted 2026-02-12 · 💻 cs.LG · cs.AI · cs.NA · math.NA

classification 💻 cs.LG · cs.AI · cs.NA · math.NA

keywords activations · networks · rational · varepsilon · architectures · fixed · network · neural

We study neural networks with trainable low-degree rational activation functions and show that they are more expressive and parameter-efficient than modern piecewise-linear and smooth activations such as ELU, LeakyReLU, LogSigmoid, PReLU, ReLU, SELU, CELU, Sigmoid, SiLU, Mish, Softplus, Tanh, Softmin, Softmax, and LogSoftmax. For an error target of $\varepsilon>0$, we establish approximation-theoretic separations: Any network built from standard fixed activations can be uniformly approximated on compact domains by a rational-activation network with only $\mathrm{poly}(\log\log(1/\varepsilon))$ overhead in size, while the converse provably requires $\Omega(\log(1/\varepsilon))$ parameters in the worst case. This exponential gap persists at the level of full networks and extends to gated activations and transformer-style nonlinearities. In practice, rational activations integrate seamlessly into standard architectures and training pipelines, allowing rationals to match or outperform fixed activations under identical architectures and optimizers.
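
As a rough illustration of how composing low-degree rationals can approximate a piecewise-linear activation, the sketch below uses the classical Halley-type map x(3 + x^2)/(1 + 3x^2), a type-(3, 2) rational whose iterates converge to sign(x) for x ≠ 0, and builds ReLU(x) = x(1 + sign(x))/2 from it. This is an assumption chosen for illustration only: it is not the construction from the paper, its coefficients are fixed rather than trainable, and it makes no claim to the rates stated in the abstract. The names `halley_step`, `rational_relu`, and `max_error` are hypothetical.

```python
import numpy as np


def halley_step(x):
    """Type-(3, 2) rational map x(3 + x^2) / (1 + 3x^2).

    The denominator is at least 1, so the map has no real poles.
    """
    return x * (3.0 + x**2) / (1.0 + 3.0 * x**2)


def rational_relu(x, depth):
    """Approximate ReLU(x) = x(1 + sign(x)) / 2 with `depth` composed rational steps."""
    x = np.asarray(x, dtype=float)
    s = x.copy()
    for _ in range(depth):
        s = halley_step(s)  # iterates move toward sign(x) for x != 0
    return 0.5 * x * (1.0 + s)


def max_error(depth, n=2001):
    """Largest deviation from ReLU on an n-point grid over [-1, 1]."""
    x = np.linspace(-1.0, 1.0, n)
    return float(np.max(np.abs(rational_relu(x, depth) - np.maximum(x, 0.0))))


if __name__ == "__main__":
    # Illustrative only: print the grid error for a few composition depths.
    for depth in (2, 4, 6):
        print(depth, max_error(depth))
```

Each extra composition adds only a handful of arithmetic operations, and the grid error decreases as the depth grows. A trainable rational activation would instead treat the numerator and denominator coefficients as learnable parameters.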

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Rational Sparse Autoencoder
cs.LG · 2026-06 · unverdicted · novelty 7.0

RSAE replaces fixed SAE encoder activations (ReLU, JumpReLU, TopK) with trainable rational functions, initialized from baselines and fine-tuned to improve reconstruction and downstream metrics on language-model residu...