SnapKV selects clustered important KV positions per attention head from an observation window at the prompt end, yielding 3.6x faster generation and 8.2x better memory efficiency on 16K-token inputs with comparable performance across 16 datasets.
How long can context length of open-source llms truly promise? In NeurIPS 2023 Workshop on Instruction Tuning and Instruction Following
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
dataset 1
citation-polarity summary
years
2024 2representative citing papers
A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.
citing papers explorer
-
SnapKV: LLM Knows What You are Looking for Before Generation
SnapKV selects clustered important KV positions per attention head from an observation window at the prompt end, yielding 3.6x faster generation and 8.2x better memory efficiency on 16K-token inputs with comparable performance across 16 datasets.
-
A Survey on the Memory Mechanism of Large Language Model based Agents
A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.