Gated attention for large language models: Non-linearity, sparsity, and attention-sink-free

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

fields: cs.LG (2), cs.CV (1)

years: 2026 (3)

verdicts: unverdicted (3)

representative citing papers

Vision Transformers Need Better Token Interaction

cs.CV · 2026-05-22 · unverdicted · novelty 5.0

Replacing softmax attention with entmax-1.5 in DINOv1 ViT-S/16 improves semantic segmentation mIoU on three benchmarks while keeping ImageNet linear-probing accuracy unchanged.
