How neural networks extrapolate: From feedforward to graph neural networks. arXiv:2009.11848, 2020b
5 Pith papers cite this work.
citing papers explorer
- Massive Activations in Large Language Models
  Massive activations are constant large values in LLMs that function as indispensable bias terms and concentrate attention probabilities on specific tokens.
- Parameter-Efficient Conditioning for Material Generalization in Graph-Based Simulators
  FiLM conditioning targeted at early message-passing layers lets pretrained GNS models generalize to new material properties using only 12 trajectories, a 5-fold data reduction versus multi-task baselines.
- How Far is Video Generation from World Model: A Physical Law Perspective
  Video generation models generalize perfectly inside the training distribution but fail out-of-distribution and rely on case-based mimicking of nearest training examples instead of abstracting physical laws.
- SLIDE: A machine-learning based method for forced dynamic response estimation of multibody systems
  SLIDE is a deep learning estimator that truncates initial effects via complex eigenvalues of linearized equations to predict output sequences of damped multibody systems, reporting speedups up to several million times.
- Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
  Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.