A framework to identify and convert foldable layer normalizations to RMSNorm for exact equivalence and faster inference in deep neural networks.
9 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (9)
roles: background (3)
polarities: background (3)
citing papers explorer
- Enjoy Your Layer Normalization with the Computational Efficiency of RMSNorm
  A framework to identify and convert foldable layer normalizations to RMSNorm for exact equivalence and faster inference in deep neural networks.
- Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications
  Cumulative state updates in CMRU restore gradient flow through time in quantized bistable RNNs, yielding more stable convergence and competitive or superior performance versus LRUs and minGRUs on long-range sequence tasks.
- ResRL: Boosting LLM Reasoning via Negative Sample Projection Residual Reinforcement Learning
  ResRL decouples shared semantics between positive and negative responses in LLM reinforcement learning via SVD-based projection residuals, outperforming baselines including NSR by up to 9.4% on math reasoning benchmarks.
- Latent Dynamics for Full Body Avatar Animation
  The work augments pose-conditioned 3D Gaussian avatars with a residual latent evolved by a transformer decoder that decomposes updates into driving, restoring, and dissipative forces to produce history-dependent, temporally coherent full-body animations.
- Behavior-Consistent Deep Reinforcement Learning
  QED bounds cross-run KL divergence in Boltzmann policies by setting temperature proportional to Q-disagreement and reduces return variance by two orders of magnitude on 18 continuous-control tasks without performance loss.
- STELLAR: Scaling 3D Perception Large Models for Autonomous Driving
  STELLAR trains up to 500M-parameter multi-modal models on 50M driving scenes and reports empirical scaling trends plus new state-of-the-art results on the Waymo Open Dataset.
- Double Metric Learning for Building Directed Graphs with Chain Connections for the ATLAS ITk Detector
  Double metric learning learns two embeddings per node to build directed graphs with chain connections, yielding better performance than single metric learning for high-pT particles and accurate edge direction prediction in ATLAS ITk simulations.
- RT-Transformer: The Transformer Block as a Spherical State Estimator
  Transformer components arise as the natural solution to precision-weighted directional state estimation on the hypersphere.
- mlr3torch: A Deep Learning Framework in R based on mlr3 and torch
  mlr3torch introduces an extensible deep learning framework in R that integrates torch models into the mlr3 ecosystem via graph-based architectures for classification, regression, and multimodal tasks.