Sparse Sinkhorn Attention, 2020

Tay, Y · 2020 · arXiv 2002.11296

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Transformer Neural Processes - Kernel Regression

cs.LG · 2024-11-19 · unverdicted · novelty 7.0

TNP-KR adds a kernel regression transformer block, kernel attention bias, scan attention for translation invariance, and deep kernel attention to achieve lower complexity and state-of-the-art results on meta-regression and related benchmarks.

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

cs.LG · 2024-01-19 · conditional · novelty 7.0

Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.

citing papers explorer

Showing 2 of 2 citing papers.

Transformer Neural Processes - Kernel Regression

cs.LG · 2024-11-19 · unverdicted · none · ref 30

TNP-KR adds a kernel regression transformer block, kernel attention bias, scan attention for translation invariance, and deep kernel attention to achieve lower complexity and state-of-the-art results on meta-regression and related benchmarks.

Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads

cs.LG · 2024-01-19 · conditional · none · ref 240

Medusa augments LLMs with multiple decoding heads and tree-based attention to predict and verify several tokens in parallel, yielding 2.2-3.6x inference speedup via two fine-tuning regimes.
