Journal of the American Statistical Association
4 Pith papers cite this work.
Citing papers by year: 2026 (4)
Citing papers explorer
- Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm
  A framework to identify and convert foldable layer normalizations to RMSNorm for exact equivalence and faster inference in deep neural networks.
- Optimal Phylogenetic Reconstruction from Sampled Quartets
  An efficient algorithm recovers phylogenetic trees from Θ(n) noisy quartets under random classification noise, matching the information-theoretic lower bound and achieving near-optimal quartet distance.
- Stories in Space: In-Context Learning Trajectories in Conceptual Belief Space
  LLMs perform in-context learning as trajectories through a structured low-dimensional conceptual belief space, with the structure visible in both behavior and internal representations and causally manipulable via interventions.
- Provable Accuracy Collapse in Embedding-Based Representations under Dimensionality Mismatch
  Triplet constraints realizable in D-dimensional Euclidean space cannot be preserved above 50% accuracy by any embedding of dimension at most cD for a constant c < 1, and UGC-hardness rules out better polynomial-time solutions in any dimension.
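The first entry summarizes converting "foldable" LayerNorms into RMSNorm with exact equivalence. The sketch below is not that paper's method; it only illustrates a standard identity such a conversion can rest on: LayerNorm(x) = RMSNorm(x − mean(x)) + β with the same γ and ε, and when x is the output of a linear layer, the mean subtraction can be absorbed into that layer by centering its weight rows and bias. The helper `center_linear` and all numeric values are hypothetical, and plain Python is used to avoid dependencies.

```python
import math

def layer_norm(x, gamma, beta, eps=1e-5):
    """Standard LayerNorm over one feature vector x."""
    d = len(x)
    mu = sum(x) / d
    var = sum((xi - mu) ** 2 for xi in x) / d
    inv = 1.0 / math.sqrt(var + eps)
    return [g * (xi - mu) * inv + b for xi, g, b in zip(x, gamma, beta)]

def rms_norm(x, gamma, eps=1e-5):
    """RMSNorm: rescale by the root mean square; no mean subtraction, no bias."""
    d = len(x)
    ms = sum(xi * xi for xi in x) / d
    inv = 1.0 / math.sqrt(ms + eps)
    return [g * xi * inv for xi, g in zip(x, gamma)]

def linear(W, b, h):
    """y = W h + b, with W stored as a list of rows (one row per output feature)."""
    return [sum(wij * hj for wij, hj in zip(row, h)) + bi for row, bi in zip(W, b)]

def center_linear(W, b):
    """Hypothetical helper: subtract the mean row of W and the mean of b, so the
    returned layer's outputs always have zero mean across output features."""
    d, n_in = len(W), len(W[0])
    mean_row = [sum(W[i][j] for i in range(d)) / d for j in range(n_in)]
    mean_b = sum(b) / d
    W_c = [[W[i][j] - mean_row[j] for j in range(n_in)] for i in range(d)]
    b_c = [bi - mean_b for bi in b]
    return W_c, b_c

# Illustrative (made-up) parameters: 3 inputs -> 4 normalized features.
W = [[0.5, -1.0, 2.0], [1.5, 0.3, -0.7], [-0.2, 0.8, 0.1], [1.0, 1.0, 1.0]]
b = [0.1, -0.4, 0.3, 2.0]
gamma = [1.0, 0.5, 2.0, -1.0]
beta = [0.0, 0.2, -0.3, 0.1]
h = [0.7, -1.2, 0.4]

# Original path: linear layer followed by LayerNorm.
ln_out = layer_norm(linear(W, b, h), gamma, beta)

# Folded path: centered linear layer, RMSNorm, then beta as a plain bias.
W_c, b_c = center_linear(W, b)
rms_out = [r + bt for r, bt in zip(rms_norm(linear(W_c, b_c, h), gamma), beta)]

max_diff = max(abs(p - q) for p, q in zip(ln_out, rms_out))
print(max_diff)  # differs only by floating-point rounding
```

Because the centered layer already emits zero-mean features for every input h, no mean has to be computed at inference time; β survives as an ordinary bias, which could itself be folded into the next layer if that operation is linear.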