Lazyformer: Self attention with lazy update, 2021 b

Ying, C · 2021 · arXiv 2102.12702

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

representative citing papers

HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing

cs.PF · 2026-04-20 · unverdicted · novelty 7.0

HybridGen achieves 1.41x-3.2x average speedups over six prior KV cache methods for LLM inference by using attention logit parallelism, a feedback-driven scheduler, and semantic-aware KV cache mapping.

Transformer Neural Processes - Kernel Regression

cs.LG · 2024-11-19 · unverdicted · novelty 7.0

TNP-KR adds a kernel regression transformer block, kernel attention bias, scan attention for translation invariance, and deep kernel attention to achieve lower complexity and state-of-the-art results on meta-regression and related benchmarks.

citing papers explorer

Showing 2 of 2 citing papers.

HybridGen: Efficient LLM Generative Inference via CPU-GPU Hybrid Computing cs.PF · 2026-04-20 · unverdicted · none · ref 48
HybridGen achieves 1.41x-3.2x average speedups over six prior KV cache methods for LLM inference by using attention logit parallelism, a feedback-driven scheduler, and semantic-aware KV cache mapping.
Transformer Neural Processes - Kernel Regression cs.LG · 2024-11-19 · unverdicted · none · ref 37
TNP-KR adds a kernel regression transformer block, kernel attention bias, scan attention for translation invariance, and deep kernel attention to achieve lower complexity and state-of-the-art results on meta-regression and related benchmarks.

Lazyformer: Self attention with lazy update, 2021 b

fields

years

verdicts

representative citing papers

citing papers explorer