Post-training on reasoning tasks sparks the emergence of specialized attention heads that enable structured computation, with SFT adding stable heads while GRPO uses dynamic activation and pruning tied to reward signals, and controllable think models relying on compensatory heads instead of specific
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2025 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
ULPT optimizes prompts in ultra-low dimensions with frozen random up-projection to cut training parameters by 98% while matching vanilla prompt tuning performance on NLP tasks.
citing papers explorer
- Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training
Post-training on reasoning tasks sparks the emergence of specialized attention heads that enable structured computation, with SFT adding stable heads while GRPO uses dynamic activation and pruning tied to reward signals, and controllable think models relying on compensatory heads instead of specific
- Ultra-Low-Dimensional Prompt Tuning via Random Projection
ULPT optimizes prompts in ultra-low dimensions with frozen random up-projection to cut training parameters by 98% while matching vanilla prompt tuning performance on NLP tasks.