ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.
International conference on machine learning , pages=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
verdicts
UNVERDICTED 2representative citing papers
MoBA routes attention over blocks via MoE-style gating to enable dynamic, bias-light long-context attention that matches full attention performance at lower cost.
citing papers explorer
-
ZAYA1-8B Technical Report
ZAYA1-8B is a reasoning MoE model with 700M active parameters that matches larger models on math and coding benchmarks and reaches 91.9% on AIME'25 via Markovian RSA test-time compute.
-
MoBA: Mixture of Block Attention for Long-Context LLMs
MoBA routes attention over blocks via MoE-style gating to enable dynamic, bias-light long-context attention that matches full attention performance at lower cost.