Simplified self-attention for transformer-based end-to-end speech recognition

2020 · arXiv 2005.10463

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

RACE Attention: A Strictly Linear-Time Attention Layer for Training on Outrageously Large Contexts

cs.LG · 2025-10-05 · unverdicted · novelty 7.0

RACE Attention is a strictly linear-time attention mechanism that approximates softmax attention outputs using Gaussian projections and soft LSH to enable training on contexts up to 12 million tokens.

Rethinking Attention with Performers

cs.LG · 2020-09-30 · unverdicted · novelty 7.0

Performers approximate full-rank softmax attention in Transformers via FAVOR+ random features for linear complexity, with theoretical guarantees of unbiased estimation and competitive results on pixel, text, and protein tasks.

citing papers explorer

RACE Attention: A Strictly Linear-Time Attention Layer for Training on Outrageously Large Contexts

cs.LG · 2025-10-05 · unverdicted · none · ref 19

Rethinking Attention with Performers

cs.LG · 2020-09-30 · unverdicted · none · ref 138
