3 Pith papers cite this work.
Citing papers
- Delta Attention Residuals
  Delta Attention Residuals attend over per-sublayer deltas instead of cumulative hidden states, producing higher-contrast attention weights and 1.7-8.2% validation perplexity gains over standard and attention residuals across 220M-7.6B models.
- Refusal in Language Models Is Mediated by a Single Direction
  Refusal in language models is mediated by a single direction in residual stream activations that can be erased to disable safety or added to elicit refusal.
- BoostAPR: Boosting Automated Program Repair via Execution-Grounded Reinforcement Learning with Dual Reward Models
  BoostAPR boosts automated program repair by training a sequence-level assessor and line-level credit allocator from execution outcomes, then applying them in PPO to reach 40.7% on SWE-bench Verified.