MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

2024 · arXiv 2406.06565

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

cs.CL · 2026-05-29 · unverdicted · novelty 3.0

Mellum 2 is a 12B-parameter MoE model with 2.5B active parameters, trained on 10.6T tokens using GQA, SWA, and MTP, then post-trained into Instruct and Thinking variants. It is claimed to be competitive with 4B-14B models at the compute cost of its 2.5B active parameters.

citing papers explorer

Showing 1 of 1 citing paper.

Mellum2 Technical Report · cs.CL · 2026-05-29 · unverdicted · none · ref 51
