N., Kaiser, ., and Polosukhin, I

15 Pith papers cite this work.

representative citing papers

Randomness is sometimes necessary for coordination

cs.AI · 2026-05-07 · conditional · novelty 7.0

Structured per-agent randomness via ranked masking in attention allows symmetric agents to break ties and coordinate, achieving perfect success on symmetric tasks where deterministic policies fail and enabling zero-shot transfer across team sizes.

Fast Inference from Transformers via Speculative Decoding

cs.LG · 2022-11-30 · accept · novelty 7.0

Speculative decoding accelerates exact sampling from large autoregressive models, by 2-3x on T5-XXL: a smaller approximation model proposes several tokens, and the large model then verifies them in parallel while preserving the original output distribution.

Stochastic Sparse Attention for Memory-Bound Inference

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

SANTA replaces full value-cache multiply-accumulates with stochastic gather-and-add sampling from the attention distribution to reduce memory bandwidth while preserving an unbiased estimator.

LACE: Lattice Attention for Cross-thread Exploration

cs.AI · 2026-04-16 · unverdicted · novelty 5.0 · 3 refs

LACE enables concurrent reasoning paths in LLMs to interact via lattice attention and a synthetic training pipeline, raising accuracy more than 7 points over independent parallel search.

The Serial Scaling Hypothesis

cs.LG · 2025-07-16 · unverdicted · novelty 5.0

The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.
