{"work":{"id":"9d4637dd-1cab-4f10-82d4-8c8d14bb96ed","openalex_id":null,"doi":null,"arxiv_id":"2401.15077","raw_key":null,"title":"EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty","authors":null,"authors_text":"Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang Zhang","year":2024,"venue":"cs.LG","abstract":"Autoregressive decoding makes the inference of Large Language Models (LLMs) time-consuming. In this paper, we reconsider speculative sampling and derive two key observations. Firstly, autoregression at the feature (second-to-top-layer) level is more straightforward than at the token level. Secondly, the inherent uncertainty in feature (second-to-top-layer) level autoregression constrains its performance. Based on these insights, we introduce EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency), a simple yet highly efficient speculative sampling framework. By incorporating a token sequence advanced by one time step, EAGLE effectively resolves the uncertainty, enabling precise second-to-top-layer feature prediction with minimal overhead. We conducted comprehensive evaluations of EAGLE, including all models from the Vicuna and LLaMA2-Chat series, the MoE model Mixtral 8x7B Instruct, and tasks in dialogue, code generation, mathematical reasoning, and instruction following. For LLaMA2-Chat 70B, EAGLE achieved a latency speedup ratio of 2.7x-3.5x, doubled throughput, while maintaining the distribution of the generated text.","external_url":"https://arxiv.org/abs/2401.15077","cited_by_count":null,"metadata_source":"pith","metadata_fetched_at":"2026-05-25T05:06:38.507278+00:00","pith_arxiv_id":"2401.15077","created_at":"2026-05-09T05:55:31.588570+00:00","updated_at":"2026-06-05T21:23:00.469572+00:00","title_quality_ok":true,"display_title":"EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty","render_title":"EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty"},"hub":{"state":{"work_id":"9d4637dd-1cab-4f10-82d4-8c8d14bb96ed","tier":"hub","tier_reason":"10+ Pith inbound or 1,000+ external citations","pith_inbound_count":34,"external_cited_by_count":null,"distinct_field_count":9,"first_pith_cited_at":"2024-02-19T17:04:04+00:00","last_pith_cited_at":"2026-05-22T02:31:32+00:00","author_build_status":"not_needed","summary_status":"needed","contexts_status":"needed","graph_status":"needed","ask_index_status":"not_needed","reader_status":"not_needed","recognition_status":"not_needed","updated_at":"2026-06-11T06:47:40.199769+00:00","tier_text":"hub"},"tier":"hub","role_counts":[{"context_role":"background","n":8},{"context_role":"method","n":2}],"polarity_counts":[{"context_polarity":"background","n":8},{"context_polarity":"use_method","n":2}],"runs":{},"summary":{},"graph":{},"authors":[]}}