RACE Attention is a strictly linear-time attention mechanism that approximates softmax attention outputs using Gaussian projections and soft LSH to enable training on contexts up to 12 million tokens.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
dataset 1polarities
use dataset 1representative citing papers
Organizing diffusion model design choices yields SOTA FID of 1.79 on CIFAR-10 with only 35 network evaluations per image and similar gains on ImageNet-64.
CutMix augmentation during training induces spatial locality in early layers of Vision Transformers trained from scratch, as measured by reduced Mean Attention Distance.
citing papers explorer
-
RACE Attention: A Strictly Linear-Time Attention Layer for Training on Outrageously Large Contexts
RACE Attention is a strictly linear-time attention mechanism that approximates softmax attention outputs using Gaussian projections and soft LSH to enable training on contexts up to 12 million tokens.
-
Elucidating the Design Space of Diffusion-Based Generative Models
Organizing diffusion model design choices yields SOTA FID of 1.79 on CIFAR-10 with only 35 network evaluations per image and similar gains on ImageNet-64.
-
Inducing Spatial Locality in Vision Transformers through the Training Protocol
CutMix augmentation during training induces spatial locality in early layers of Vision Transformers trained from scratch, as measured by reduced Mean Attention Distance.