A controlled multi-model evaluation on shared data subsets shows that deployment metrics and prompting choices create important tradeoffs and alter model rankings beyond accuracy alone.
Outrageously large neural networks: The sparsely-gated mixture-of- experts layer
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Unified Deployment-Aware Evaluation of Open Reasoning Language Models
A controlled multi-model evaluation on shared data subsets shows that deployment metrics and prompting choices create important tradeoffs and alter model rankings beyond accuracy alone.