Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova

URL https://proceedings · 2020

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Transformers Provably Learn Sparse XOR with Polylogarithmic Parameters

cs.LG · 2025-02-11 · unverdicted · novelty 7.0

Single-layer two-head Transformers learn sparse XOR with O(polylog(d)) parameters in one gradient step, breaking the Omega(d) parameter bottleneck of FFNNs.

citing papers explorer

Transformers Provably Learn Sparse XOR with Polylogarithmic Parameters

cs.LG · 2025-02-11 · unverdicted · none · ref 6

Single-layer two-head Transformers learn sparse XOR with O(polylog(d)) parameters in one gradient step, breaking the Omega(d) parameter bottleneck of FFNNs.