DASH assigns segment-level credit in reasoning traces using drift toward ground-truth answers, yielding 50.8% accuracy on AIME25 versus 45.4% for GRPO while reducing overthinking behaviors.
Emily Pronin, Daniel Y Lin, and Lee Ross
7 Pith papers cite this work. Polarity classification is still indexing.
years
2026 7representative citing papers
Post-training quantization increases overthinking errors in reasoning models; a logit penalty on curated overthinking markers reduces CoT length 12-23% without accuracy loss.
GitHub Actions workflows achieve only 28% overall compliance with best practices, with LLMs enabling an 81% reduction in verification effort via hybrid adjudication but still requiring expert oversight for security judgments.
Reinforced Mode Regulation (RMR) applies low-rank damping to the Transformer value cache to prevent geometric collapse and enable stable autoregressive generation at entropy rates as low as 0.8 nats/step.
Reasoning SFT generalizes cross-domain conditionally on sufficient optimization, high-quality long-CoT data, and strong base models, while degrading safety.
An exploratory red-teaming study documents eleven cases of security, privacy, and governance failures in autonomous language-model agents with tool access and persistent memory.
citing papers explorer
No citing papers match the current filters.