Structured Pruning Learns Compact and Accurate Models

Mengzhou Xia, Zexuan Zhong, Danqi Chen · 2022 · DOI 10.18653/v1/2022.acl-long.107

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs

cs.CL · 2023-10-03 · conditional · novelty 6.0

FastGen adaptively compresses LLM KV caches via lightweight attention profiling: evicting long-range contexts on local heads, non-special tokens on special-token heads, and retaining full caches on broad-attention heads, yielding substantial memory savings with negligible quality loss.
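The per-head policy choice summarized above can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not FastGen's code: attention maps are given as nested lists of causal attention probabilities (row i covers keys 0..i), only the three policies named in the summary are considered, and the recovery threshold (0.95), local window (2), and all function names are hypothetical choices for illustration. A head is assigned the first policy whose retained keys recover enough of its profiled attention mass, falling back to a full cache.

```python
from typing import Callable, List, Set

KeepFn = Callable[[int, int], bool]  # (query index, key index) -> keep?


def recovered(attn: List[List[float]], keep: KeepFn) -> float:
    """Average fraction of each query row's attention mass on kept keys."""
    total = 0.0
    for i, row in enumerate(attn):
        total += sum(p for j, p in enumerate(row) if keep(i, j))
    return total / len(attn)


def choose_policy(attn: List[List[float]], special: Set[int],
                  window: int = 2, threshold: float = 0.95) -> str:
    """Pick the first (cheapest-assumed) policy meeting the recovery threshold.

    Hypothetical ordering: special-token cache, then local window, then full.
    """
    candidates = [
        ("special", lambda i, j: j in special),  # keep only special tokens
        ("local", lambda i, j: i - j < window),  # keep the last `window` keys
    ]
    for name, keep in candidates:
        if recovered(attn, keep) >= threshold:
            return name
    return "full"  # broad-attention head: keep everything


def kept_positions(policy: str, n: int, special: Set[int],
                   window: int = 2) -> List[int]:
    """KV positions retained for the newest query when the cache holds n tokens."""
    q = n - 1
    if policy == "special":
        return [j for j in range(n) if j in special]
    if policy == "local":
        return [j for j in range(n) if q - j < window]
    return list(range(n))
```

For example, with token 0 treated as special, a head that always attends to token 0 is assigned "special", a purely diagonal head is assigned "local", and a head with uniform causal attention recovers too little mass under either cheap policy and keeps its full cache.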

citing papers explorer


Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs cs.CL · 2023-10-03 · conditional · none · ref 34