Efficient memory management for large language model serving with PagedAttention

2023

2 Pith papers cite this work.

representative citing papers

Improving Speech Recognition of Named Entities in Classroom Speech with LLM Revision and Phonetic-Semantic Context

cs.CL · 2025-06-12 · unverdicted · novelty 6.0 · ref 34

An LLM-based revision method with phonetic-semantic context reduces the named-entity word error rate by up to 30% (relative) on a new 45-hour MIT classroom speech dataset.

WAND: Windowed Attention and Knowledge Distillation for Efficient Autoregressive Text-to-Speech Models

cs.CL · 2026-03-17 · unverdicted · novelty 5.0 · ref 28

WAND adapts autoregressive text-to-speech (AR-TTS) models to constant complexity via windowed attention and knowledge distillation, cutting KV cache memory by up to 66.2% while preserving quality and achieving length-invariant latency.
