hub

N., Kaiser, Ł., and Polosukhin, I.

15 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Randomness is sometimes necessary for coordination

cs.AI · 2026-05-07 · conditional · novelty 7.0

Structured per-agent randomness via ranked masking in attention allows symmetric agents to break ties and coordinate, achieving perfect success on symmetric tasks where deterministic policies fail and enabling zero-shot transfer across team sizes.

Fast Inference from Transformers via Speculative Decoding

cs.LG · 2022-11-30 · accept · novelty 7.0

Speculative decoding accelerates exact sampling from large autoregressive models by 2-3x on T5-XXL: smaller approximation models propose token sequences, and the large model verifies them in parallel batches while preserving the original output distribution.

Stochastic Sparse Attention for Memory-Bound Inference

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

SANTA replaces full value-cache multiply-accumulates with stochastic gather-and-add sampling from the attention distribution to reduce memory bandwidth while preserving an unbiased estimator.

LACE: Lattice Attention for Cross-thread Exploration

cs.AI · 2026-04-16 · unverdicted · novelty 5.0 · 3 refs

LACE enables concurrent reasoning paths in LLMs to interact via lattice attention and a synthetic training pipeline, raising accuracy more than 7 points over independent parallel search.

The Serial Scaling Hypothesis

cs.LG · 2025-07-16 · unverdicted · novelty 5.0

The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.

citing papers explorer

Showing 3 of 3 citing papers after filters.

  • Randomness is sometimes necessary for coordination cs.AI · 2026-05-07 · conditional · none · ref 94

    Structured per-agent randomness via ranked masking in attention allows symmetric agents to break ties and coordinate, achieving perfect success on symmetric tasks where deterministic policies fail and enabling zero-shot transfer across team sizes.

  • Position: The Turing-Completeness of Autoregressive Transformers Relies Heavily on Context Management cs.AI · 2026-05-19 · unverdicted · none · ref 50

    Existing proofs of autoregressive Transformer Turing-completeness apply to scaling families of models rather than fixed systems with context management, so they do not establish Turing-completeness for real-world LLMs.

  • LACE: Lattice Attention for Cross-thread Exploration cs.AI · 2026-04-16 · unverdicted · none · ref 36 · 3 links

    LACE enables concurrent reasoning paths in LLMs to interact via lattice attention and a synthetic training pipeline, raising accuracy more than 7 points over independent parallel search.