H2O evicts non-heavy-hitter tokens from the KV cache using a dynamic submodular policy, retaining recent and frequent-co-occurrence tokens to reduce memory while preserving accuracy.
Cognitive Graph for Multi-Hop Reading Comprehension at Scale
1 Pith paper cite this work. Polarity classification is still indexing.
abstract
We propose a new CogQA framework for multi-hop question answering in web-scale documents. Inspired by the dual process theory in cognitive science, the framework gradually builds a \textit{cognitive graph} in an iterative process by coordinating an implicit extraction module (System 1) and an explicit reasoning module (System 2). While giving accurate answers, our framework further provides explainable reasoning paths. Specifically, our implementation based on BERT and graph neural network efficiently handles millions of documents for multi-hop reasoning questions in the HotpotQA fullwiki dataset, achieving a winning joint $F_1$ score of 34.9 on the leaderboard, compared to 23.6 of the best competitor.
fields
cs.LG 1years
2023 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
H2O evicts non-heavy-hitter tokens from the KV cache using a dynamic submodular policy, retaining recent and frequent-co-occurrence tokens to reduce memory while preserving accuracy.