STAPO: Stabilizing Reinforcement Learning for LLMs by Silencing Rare Spurious Tokens cs.CL · 2026-02-17