Simplified self-attention for transformer-based end-to-end speech recognition

2 Pith papers cite this work. Polarity classification is still being indexed.

2 Pith papers citing it

fields

cs.LG 2

years

2025 1 · 2020 1

verdicts

UNVERDICTED 2

representative citing papers

Rethinking Attention with Performers

cs.LG · 2020-09-30 · unverdicted · novelty 7.0

Performers approximate full-rank softmax attention in Transformers via FAVOR+ random features for linear complexity, with theoretical guarantees of unbiased estimation and competitive results on pixel, text, and protein tasks.

citing papers explorer

  • RACE Attention: A Strictly Linear-Time Attention Layer for Training on Outrageously Large Contexts · cs.LG · 2025-10-05 · unverdicted · none · ref 19

    RACE Attention is a strictly linear-time attention mechanism that approximates softmax attention outputs using Gaussian projections and soft LSH to enable training on contexts up to 12 million tokens.

  • Rethinking Attention with Performers · cs.LG · 2020-09-30 · unverdicted · none · ref 138

    Performers approximate full-rank softmax attention in Transformers via FAVOR+ random features for linear complexity, with theoretical guarantees of unbiased estimation and competitive results on pixel, text, and protein tasks.