Attention is all you need
9 Pith papers cite this work.
citing papers explorer
- MapFormer: Self-Supervised Learning of Cognitive Maps with Input-Dependent Positional Embeddings
  MapFormers learn cognitive maps via input-dependent Lie-algebra positional encodings and achieve near-perfect OOD generalization on cognitive tasks where standard transformers fail.
- HyperPersona: A Multi-Level Hypergraph Framework for Text-Based Automatic Personality Prediction
  HyperPersona is a hypergraph framework that jointly models document, sentence, and word levels of text via hyperedges and nodes, then uses a transformer graph encoder to predict Big Five personality traits from text alone.
- Transformer as an Euler Discretization of Score-based Variational Flow
  The Transformer is recovered exactly as the forward Euler step of spherical SVFlow, with multi-head attention and MoE/FFN as approximations to its vector field.
- Learning-Based Sparsification of Dynamic Graphs in Robotic Exploration Algorithms
  A PPO-trained transformer policy sparsifies dynamic graphs during RRT frontier exploration, cutting size by up to 96% and yielding the most consistent exploration rates across environments.
- AE-ViT: Stable Long-Horizon Parametric Partial Differential Equations Modeling
  AE-ViT combines a convolutional autoencoder with a latent-space transformer and multi-stage parameter plus coordinate injection to deliver stable long-horizon predictions for parametric PDEs, cutting relative rollout error by roughly five times versus prior DL-ROMs and ViTs on advection-diffusion-re
- BASIS: Balanced Activation Sketching with Invariant Scalars for "Ghost Backpropagation"
  BASIS uses balanced hashing and invariant scalars to sketch activations, cutting memory to O(L*R*N) while matching exact backprop performance on GPT training at R=32.
- Scalable Spatiotemporal Inference with Biased Scan Attention Transformer Neural Processes
  BSA-TNP is a new neural process model with KRBlocks and biased scan attention that claims to match top accuracy while scaling inference to over 1M points in under a minute on a single GPU and supporting translation invariance.
- Identifying and Mitigating Gender Cues in Academic Recommendation Letters: An Interpretability Case Study
  Transformer models detect applicant gender in de-gendered academic recommendation letters via implicit linguistic patterns such as associations with words like 'emotional' and 'humanitarian', and removing these cues reduces but does not eliminate prediction accuracy above chance.
- Ordinary Least Squares is a Special Case of Transformer
  Ordinary least squares is a special case of the single-layer linear transformer when attention parameters are set via spectral decomposition of the empirical covariance matrix.