Splitwise: Efficient generative LLM inference using phase splitting

Canonical reference. 83% of citing Pith papers cite this work as background.

12 Pith papers citing it
Background: 83% of classified citations

citation-role summary

background: 5 · method: 1

representative citing papers

Sparse Prefix Caching for Hybrid and Recurrent LLM Serving

cs.LG · 2026-04-17 · unverdicted · novelty 7.0

Sparse prefix caching via dynamic programming for optimal checkpoint placement under overlap distributions improves the Pareto frontier for recurrent and hybrid LLM serving on shared-prefix data.

A Survey on Efficient Inference for Large Language Models

cs.CL · 2024-04-22 · accept · novelty 3.0

The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.

citing papers explorer

Showing 12 of 12 citing papers.