Title resolution pending

URL: https://arxiv · 2025 · arXiv 2507.02559

3 Pith papers cite this work. Polarity classification is still in progress.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Key and Value Weights Are Probably All You Need: On the Necessity of the Query, Key, Value weight Triplet in Self-Attention Transformers

cs.LG · 2025-10-27 · unverdicted · novelty 7.0

One of the Q, K or V weights in transformer self-attention is redundant and replaceable by the identity matrix under mild assumptions, reducing parameters by 25 percent with no loss in small-model performance.

Gated Normalization Removal and Scale Anchoring in Pre-Norm Transformers

cs.LG · 2026-02-11 · unverdicted · novelty 6.0

TaperNorm gradually removes internal normalization in pre-norm transformers via learned gates that reach zero, revealing the final norm as a scale anchor and enabling up to 1.18x faster KV-cached decoding with small loss increases.

Selective Neuron Amplification in Transformer Language Models

cs.LG · 2026-04-08 · unverdicted · novelty 5.0 · 2 refs

Selective Neuron Amplification boosts task-relevant neurons during inference to improve uncertain outputs in language models.

citing papers explorer

Key and Value Weights Are Probably All You Need: On the Necessity of the Query, Key, Value weight Triplet in Self-Attention Transformers cs.LG · 2025-10-27 · unverdicted · none · ref 3
Gated Normalization Removal and Scale Anchoring in Pre-Norm Transformers cs.LG · 2026-02-11 · unverdicted · none · ref 2
Selective Neuron Amplification in Transformer Language Models cs.LG · 2026-04-08 · unverdicted · none · ref 1 · 2 links
