Introduces rewriting categories to formalize proof equivariance and success invariance, shows LLM provers violate both, and demonstrates test-time aggregation recovers invariance and boosts performance.
Consistency training helps stop sycophancy and jailbreaks
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
years
2026 3representative citing papers
Positive Alignment introduces AI systems that support human flourishing pluralistically and proactively while remaining safe, as a necessary complement to traditional safety-focused alignment research.
citing papers explorer
-
What are the Right Symmetries for Formal Theorem Proving?
Introduces rewriting categories to formalize proof equivariance and success invariance, shows LLM provers violate both, and demonstrates test-time aggregation recovers invariance and boosts performance.
-
Positive Alignment: Artificial Intelligence for Human Flourishing
Positive Alignment introduces AI systems that support human flourishing pluralistically and proactively while remaining safe, as a necessary complement to traditional safety-focused alignment research.
- Self-Mined Hardness for Safety Fine-Tuning