arXiv preprint arXiv:2508.17580 , year=

UQ: Assessing Language Models on Unsolved Questions , author= · arXiv 2508.17580

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval

cs.AI · 2026-04-20 · accept · novelty 8.0

MathNet delivers the largest multilingual Olympiad math dataset and benchmarks where models like Gemini-3.1-Pro reach 78% on solving but embedding models struggle on equivalent problem retrieval, with retrieval augmentation yielding up to 12% gains.

citing papers explorer

Showing 1 of 1 citing paper.

MathNet: a Global Multimodal Benchmark for Mathematical Reasoning and Retrieval cs.AI · 2026-04-20 · accept · none · ref 55
MathNet delivers the largest multilingual Olympiad math dataset and benchmarks where models like Gemini-3.1-Pro reach 78% on solving but embedding models struggle on equivalent problem retrieval, with retrieval augmentation yielding up to 12% gains.

arXiv preprint arXiv:2508.17580 , year=

fields

years

verdicts

representative citing papers

citing papers explorer