Simple linear attention language models balance the recall-throughput tradeoff

Simran Arora, Sabri Eyuboglu, Michael Zhang, Aman Timalsina, Silas Alberti, Dylan Zinsley, James Zou, Atri Rudra, Christopher R´ e · 2024

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

KernelBench: Can LLMs Write Efficient GPU Kernels?

cs.LG · 2025-02-14 · accept · novelty 7.0

KernelBench shows that even the best current LLMs generate correct and faster-than-baseline GPU kernels in fewer than 20 percent of realistic ML workloads.

citing papers explorer

Showing 1 of 1 citing paper.

KernelBench: Can LLMs Write Efficient GPU Kernels? cs.LG · 2025-02-14 · accept · none · ref 2
KernelBench shows that even the best current LLMs generate correct and faster-than-baseline GPU kernels in fewer than 20 percent of realistic ML workloads.

Simple linear attention language models balance the recall-throughput tradeoff

fields

years

verdicts

representative citing papers

citing papers explorer