Conference on Empirical Methods in Natural Language Processing

Consistent Accelerated Inference via Confident Adaptive Transformers

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

Fast Inference from Transformers via Speculative Decoding

cs.LG · 2022-11-30 · accept · novelty 7.0

Speculative decoding accelerates exact sampling from large autoregressive models by 2-3x on T5-XXL: a smaller approximation model proposes several tokens, and the large model verifies them in parallel, preserving the original output distribution.
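
The verification step in this summary can be illustrated with a minimal sketch of the accept/reject rule usually given for speculative sampling. It assumes both models expose full next-token distributions as plain lists; the function name `speculative_step` and the parameters `p`, `q`, and `draft_token` are hypothetical and chosen for illustration, not taken from the cited paper's code.

```python
import random


def speculative_step(p, q, draft_token, rng=random):
    """Accept or replace one drafted token (illustrative sketch).

    p: target (large) model probabilities over the vocabulary
    q: draft (small) model probabilities over the same vocabulary
    draft_token: vocabulary index that was sampled from q
    Returns (token, accepted).
    """
    # Accept the drafted token with probability min(1, p/q).
    if rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return draft_token, True

    # On rejection, resample from the residual max(0, p - q), renormalized.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(residual)
    weights = [r / total for r in residual]
    token = rng.choices(range(len(p)), weights=weights, k=1)[0]
    return token, False
```

Under these assumptions, a token drawn from `q` and passed through `speculative_step` is distributed according to `p`, which is why the large model's output distribution is preserved even though the proposals come from the smaller model.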

citing papers explorer

Showing 1 of 1 citing paper.

Fast Inference from Transformers via Speculative Decoding

cs.LG · 2022-11-30 · accept · none · ref 13
