Title resolution pending

H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models · 2023

2 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

cs.CL · 2024-06-16 · unverdicted · novelty 6.0

Quest speeds up long-context LLM self-attention by up to 2.23x via query-dependent selection of top-K critical KV cache pages, cutting overall latency by 7.03x with negligible accuracy loss.

Stability Implies Redundancy: Delta Attention Selective Halting for Efficient Long-Context Prefilling

cs.AI · 2026-04-20

citing papers explorer

Showing 2 of 2 citing papers.

Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

cs.CL · 2024-06-16 · unverdicted · none · ref 54

Quest speeds up long-context LLM self-attention by up to 2.23x via query-dependent selection of top-K critical KV cache pages, cutting overall latency by 7.03x with negligible accuracy loss.

Stability Implies Redundancy: Delta Attention Selective Halting for Efficient Long-Context Prefilling

cs.AI · 2026-04-20 · unreviewed · ref 1
