Title resolution pending

Efficient Streaming Language Models with Attention Sinks · 2024

5 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Self-Pruned Key-Value Attention: Learning When to Write by Predicting Future Utility

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

SP-KV trains a utility predictor jointly with the LLM to dynamically prune low-utility KV cache entries, achieving 3-10x memory reduction during generation with negligible performance loss.

Optimizer-Induced Mode Connectivity: From AdamW to Muon

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

Optimizer choice induces distinct connected regions in the loss landscape of two-layer ReLU networks, with AdamW and Muon sometimes separated by provable barriers.

Key-Value Means: Transformers with Expandable Block-Recurrent Compressed Memory

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 3 refs

KVM is a new block-recurrent compressed KV attention that turns transformers into O(N) chunked RNNs or growable sublinear-memory models while remaining implementable with standard operations.

Metaphor Is Not All Attention Needs

cs.CL · 2026-05-12 · unverdicted · novelty 5.0

Poetic jailbreaks succeed because they induce distinct attention patterns in LLMs that are independent of harmful-content detection, not because models fail to recognize literary formatting.

Stability Implies Redundancy: Delta Attention Selective Halting for Efficient Long-Context Prefilling

cs.AI · 2026-04-20

citing papers explorer

Showing 5 of 5 citing papers.

Self-Pruned Key-Value Attention: Learning When to Write by Predicting Future Utility

cs.LG · 2026-05-13 · unverdicted · none · ref 32

SP-KV trains a utility predictor jointly with the LLM to dynamically prune low-utility KV cache entries, achieving 3-10x memory reduction during generation with negligible performance loss.

Optimizer-Induced Mode Connectivity: From AdamW to Muon

cs.AI · 2026-05-11 · unverdicted · none · ref 42

Optimizer choice induces distinct connected regions in the loss landscape of two-layer ReLU networks, with AdamW and Muon sometimes separated by provable barriers.

Key-Value Means: Transformers with Expandable Block-Recurrent Compressed Memory

cs.LG · 2026-05-11 · unverdicted · none · ref 16 · 3 links

KVM is a new block-recurrent compressed KV attention that turns transformers into O(N) chunked RNNs or growable sublinear-memory models while remaining implementable with standard operations.

Metaphor Is Not All Attention Needs

cs.CL · 2026-05-12 · unverdicted · none · ref 15

Poetic jailbreaks succeed because they induce distinct attention patterns in LLMs that are independent of harmful-content detection, not because models fail to recognize literary formatting.

Stability Implies Redundancy: Delta Attention Selective Halting for Efficient Long-Context Prefilling

cs.AI · 2026-04-20 · unreviewed · ref 7
