Title resolution pending

Nikita Nangia, Samuel Bowman · 2018 · DOI 10.18653/v1/n18-4013

3 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Rethinking Attention with Performers

cs.LG · 2020-09-30 · unverdicted · novelty 7.0

Performers approximate full-rank softmax attention in Transformers via FAVOR+ random features for linear complexity, with theoretical guarantees of unbiased estimation and competitive results on pixel, text, and protein tasks.

Learning Latent Trees with Stochastic Perturbations and Differentiable Dynamic Programming

cs.CL · 2019-06-24 · unverdicted · novelty 7.0

A fully differentiable parser that stochastically samples projective dependency trees using Gumbel perturbations and dynamic programming to boost downstream task performance without direct supervision.

Towards Understanding Self-Pretraining for Sequence Classification

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

Self-pretraining improves Transformer sequence classification by enabling learning of proximity-biased attention from positional encodings that label supervision alone cannot easily acquire from random starts.

citing papers explorer

Showing 3 of 3 citing papers.

Rethinking Attention with Performers · cs.LG · 2020-09-30 · unverdicted · none · ref 141
Learning Latent Trees with Stochastic Perturbations and Differentiable Dynamic Programming · cs.CL · 2019-06-24 · unverdicted · none · ref 27
Towards Understanding Self-Pretraining for Sequence Classification · cs.LG · 2026-05-20 · unverdicted · none · ref 70
