Understanding and improving layer normalization. Advances in neural information processing systems, 32

Jingjing Xu, Xu Sun, Zhiyuan Zhang, Guangxiang Zhao, Junyang Lin · 2019

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Cubit: Token Mixer with Kernel Ridge Regression

cs.LG · 2026-05-07 · unverdicted · novelty 5.0 · 2 refs

Cubit replaces Transformer's attention with a closed-form Kernel Ridge Regression token mixer and reports larger gains as training sequence length increases.

Global-local Spatial-temporal Aware Graph Attention Network for Network Traffic Forecasting

cs.IR · 2025-05-11 · unverdicted · novelty 3.0

GLSTaGAT is a spatial-temporal graph attention network using data-driven fusion graphs, global-local blocks, node normalization, and a transformer encoder to outperform baselines on real-world network traffic datasets.

citing papers explorer

Cubit: Token Mixer with Kernel Ridge Regression cs.LG · 2026-05-07 · unverdicted · none · ref 90 · 2 links
Global-local Spatial-temporal Aware Graph Attention Network for Network Traffic Forecasting cs.IR · 2025-05-11 · unverdicted · none · ref 41
