STRATAGEM uses a Reasoning Transferability Coefficient and Reasoning Evolution Reward in game self-play to promote domain-agnostic reasoning in language models, yielding gains on math, general reasoning, and code benchmarks.
Alternating Turn Structure.In our formulation, players take turns rather than acting simultaneously
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Stratagem: Learning Transferable Reasoning via Trajectory-Modulated Game Self-Play
STRATAGEM uses a Reasoning Transferability Coefficient and Reasoning Evolution Reward in game self-play to promote domain-agnostic reasoning in language models, yielding gains on math, general reasoning, and code benchmarks.