Deep residual learning for image recognition
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3
representative citing papers
HELIX uses learnable feature identities and hybrid temporal-feature attention to achieve state-of-the-art time series imputation across multiple datasets and settings.
Orthogonal subspace decomposition via SVD on vision foundation model features preserves high-rank pre-trained knowledge by freezing principal components and adapting residuals, reducing overfitting for better generalization in AI-generated image detection.
citing papers explorer
- Energy-Gated Attention: Spectral Salience as an Inductive Bias for Transformer Attention
  Energy-Gated Attention improves language model validation loss by gating attention according to spectral energy of key embeddings discovered by a learned projection, with consistent gains on TinyShakespeare and Penn Treebank using under 0.26% extra parameters.
- HELIX: Hybrid Encoding with Learnable Identity and Cross-dimensional Synthesis for Time Series Imputation
  HELIX uses learnable feature identities and hybrid temporal-feature attention to achieve state-of-the-art time series imputation across multiple datasets and settings.
- Orthogonal Subspace Decomposition for Generalizable AI-Generated Image Detection
  Orthogonal subspace decomposition via SVD on vision foundation model features preserves high-rank pre-trained knowledge by freezing principal components and adapting residuals, reducing overfitting for better generalization in AI-generated image detection.