Selective RoPE adds input-dependent rotations to generalize RoPE, showing implicit positional structure in softmax attention and improving performance on language modeling, copying, state tracking, and retrieval when added to gated transformers.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 3representative citing papers
Setting β in balanced Adam to achieve a refresh count R_β ≈1000 based on effective learning horizon T_ES improves validation robustness over fixed-β baselines across 11 vision and language experiments.
C51 matches StreamQ in streaming RL on 55 Atari games while a new Adaptive Q(λ) algorithm based on bounded derivatives and variance-adjusted updates reaches nearly double the human baseline.
citing papers explorer
-
Selective Rotary Position Embedding
Selective RoPE adds input-dependent rotations to generalize RoPE, showing implicit positional structure in softmax attention and improving performance on language modeling, copying, state tracking, and retrieval when added to gated transformers.
-
Refresh-Scaling the Memory of Balanced Adam
Setting β in balanced Adam to achieve a refresh count R_β ≈1000 based on effective learning horizon T_ES improves validation robustness over fixed-β baselines across 11 vision and language experiments.
-
Revisiting Adam for Streaming Reinforcement Learning
C51 matches StreamQ in streaming RL on 55 Atari games while a new Adaptive Q(λ) algorithm based on bounded derivatives and variance-adjusted updates reaches nearly double the human baseline.