Title resolution pending

3 Pith papers cite this work. Polarity classification is still indexing.

fields

cs.LG: 2 · cs.CL: 1

verdicts

UNVERDICTED: 3

representative citing papers

Rethinking Attention with Performers

cs.LG · 2020-09-30 · unverdicted · novelty 7.0

Performers approximate full-rank softmax attention in Transformers via FAVOR+ random features for linear complexity, with theoretical guarantees of unbiased estimation and competitive results on pixel, text, and protein tasks.

citing papers explorer

  • Rethinking Attention with Performers · cs.LG · 2020-09-30 · unverdicted · none · ref 141

    Performers approximate full-rank softmax attention in Transformers via FAVOR+ random features for linear complexity, with theoretical guarantees of unbiased estimation and competitive results on pixel, text, and protein tasks.

  • Learning Latent Trees with Stochastic Perturbations and Differentiable Dynamic Programming · cs.CL · 2019-06-24 · unverdicted · none · ref 27

    A fully differentiable parser that stochastically samples projective dependency trees using Gumbel perturbations and dynamic programming to boost downstream task performance without direct supervision.

  • Towards Understanding Self-Pretraining for Sequence Classification · cs.LG · 2026-05-20 · unverdicted · none · ref 70

    Self-pretraining improves Transformer sequence classification by enabling learning of proximity-biased attention from positional encodings that label supervision alone cannot easily acquire from random starts.