NAACL , year=

Dynabench: Rethinking benchmarking in NLP , author=

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

MathDuels: Evaluating LLMs as Problem Posers and Solvers

cs.CL · 2026-04-23 · unverdicted · novelty 7.0

Self-play between LLMs for problem authoring and solving, scored via Rasch modeling, shows that authoring and solving skills are partially decoupled and that the benchmark difficulty evolves with new models.

citing papers explorer

Showing 1 of 1 citing paper.

MathDuels: Evaluating LLMs as Problem Posers and Solvers cs.CL · 2026-04-23 · unverdicted · none · ref 47
Self-play between LLMs for problem authoring and solving, scored via Rasch modeling, shows that authoring and solving skills are partially decoupled and that the benchmark difficulty evolves with new models.

NAACL , year=

fields

years

verdicts

representative citing papers

citing papers explorer