GitHub Actions workflows achieve only 28% overall compliance with best practices, with LLMs enabling an 81% reduction in verification effort via hybrid adjudication but still requiring expert oversight for security judgments.
Emily Pronin, Daniel Y Lin, and Lee Ross
4 Pith papers cite this work.
Years: 2026 (4)
Representative citing papers
- Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability
  Reasoning SFT generalizes cross-domain conditionally on sufficient optimization, high-quality long-CoT data, and strong base models, while degrading safety.
- Agents of Chaos
  An exploratory red-teaming study documents eleven cases of security, privacy, and governance failures in autonomous language-model agents with tool access and persistent memory.