{"paper":{"title":"SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding","license":"http://creativecommons.org/licenses/by/4.0/","headline":"SPEED-Bench establishes a unified benchmark for speculative decoding that covers diverse semantic domains, throughput across concurrencies, and integration with production engines.","cross_cats":["cs.AI"],"primary_cat":"cs.DC","authors_text":"Benjamin Chislett, Bita Darvish Rouhani, Izzy Putterman, Maor Ashkenazi, Ran Zilberstein, Talor Abramovich, Tiyasa Mitra, Yonatan Geifman","submitted_at":"2026-02-10T16:19:56Z","abstract_excerpt":"Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Unlike deterministic system optimizations, SD performance is inherently data-dependent, meaning that diverse and representative workloads are essential for accurately measuring its effectiveness. Existing benchmarks suffer from limited task diversity, inadequate support for throughput-oriented evaluation, and a reliance on high-level implementations that fail to reflect production environments. To address this, we introduce SPEED-Bench, a comprehensive suite designed to standard"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"SPEED-Bench establishes a unified evaluation standard for practical comparisons of SD algorithms by offering diverse semantic domains, throughput splits across concurrencies, and integration with production engines like vLLM and TensorRT-LLM.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the qualitative data split, curated by prioritizing semantic diversity across samples, sufficiently represents real-world workloads and that integration with the named production engines accurately exposes behaviors masked by other benchmarks.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"SPEED-Bench is a new standardized benchmark for speculative decoding that supplies semantically diverse qualitative data and throughput-oriented splits across concurrency levels, integrated with vLLM and TensorRT-LLM.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"SPEED-Bench establishes a unified benchmark for speculative decoding that covers diverse semantic domains, throughput across concurrencies, and integration with production engines.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"175995a3c54657284e858c8b6d83def9ca9c5dc6f75644999908b5535f472656"},"source":{"id":"2604.09557","kind":"arxiv","version":2},"verdict":{"id":"8782035f-fd2d-4952-8abc-2d3819e344e0","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T03:05:43.307851Z","strongest_claim":"SPEED-Bench establishes a unified evaluation standard for practical comparisons of SD algorithms by offering diverse semantic domains, throughput splits across concurrencies, and integration with production engines like vLLM and TensorRT-LLM.","one_line_summary":"SPEED-Bench is a new standardized benchmark for speculative decoding that supplies semantically diverse qualitative data and throughput-oriented splits across concurrency levels, integrated with vLLM and TensorRT-LLM.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the qualitative data split, curated by prioritizing semantic diversity across samples, sufficiently represents real-world workloads and that integration with the named production engines accurately exposes behaviors masked by other benchmarks.","pith_extraction_headline":"SPEED-Bench establishes a unified benchmark for speculative decoding that covers diverse semantic domains, throughput across concurrencies, and integration with production engines."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2604.09557/integrity.json","findings":[],"available":true,"detectors_run":[],"snapshot_sha256":"c28c3603d3b5d939e8dc4c7e95fa8dfce3d595e45f758748cecf8e644a296938"},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}