
Recoverable Identifier

arXiv:2604.25578 · detector doi_compliance · incontrovertible · 2026-05-19 20:57:08.590593+00:00

advisory doi_compliance recoverable_identifier

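The DOI in the printed bibliography is fragmented by whitespace or a line break. A longer candidate (10.18653/v1/2024.findings-naacl.149.URL) was visible in the surrounding text but could not be confirmed against doi.org as printed. The trailing ".URL" appears to come from the "URL" label that follows the DOI in the printed entry (see the evidence text), which suggests the intended identifier is 10.18653/v1/2024.findings-naacl.149.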
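As an illustration of how a whitespace-fragmented DOI like this one might be rejoined, here is a minimal Python sketch. It is not the detector's actual code: the function name `rejoin_fragmented_doi`, the prefix pattern, and the rule "pull in the next token only when the fragment ends in a separator" are all assumptions made for this example.

```python
import re

# Assumed pattern: a DOI prefix ("10." plus 4-9 digits and a slash)
# followed by whatever non-space characters come directly after it.
_DOI_HEAD = re.compile(r"10\.\d{4,9}/\S*")


def rejoin_fragmented_doi(text):
    """Return a DOI candidate rejoined across whitespace breaks, or None.

    Hypothetical helper: a heuristic sketch, not a validated DOI parser.
    """
    match = _DOI_HEAD.search(text)
    if match is None:
        return None
    candidate = match.group(0)
    rest = text[match.end():]
    # A fragment ending in a separator suggests the line broke
    # mid-identifier, so append the next whitespace-delimited token.
    while candidate[-1] in "/-_":
        nxt = re.match(r"\s+(\S+)", rest)
        if nxt is None:
            break
        candidate += nxt.group(1)
        rest = rest[nxt.end():]
    # Drop sentence punctuation that is not part of the identifier.
    return candidate.rstrip(".,;")


printed = ("doi: 10.18653/v1/ 2024.findings-naacl.149. "
           "URL https://aclanthology.org/2024.findings-naacl.149/.")
print(rejoin_fragmented_doi(printed))
```

Because the sketch stops at the sentence-ending period, it does not absorb the following "URL" label. Any candidate produced this way would still need to be confirmed against doi.org before being treated as resolved.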


Evidence text

URL https://arxiv.org/abs/2506.05176. Y. Zhang, M. Konomi, C. Xypolopoulos, K. Divriotis, K. Skianis, G. Nikolentzos, G. Stamou, G. Shang, and M. Vazirgiannis. Greekmmlu: A native-sourced multitask benchmark for evaluating language models in greek, 2026. URL https://arxiv.org/abs/2602.05150. W. Zhong, R. Cui, Y. Guo, Y. Liang, S. Lu, Y. Wang, A. Saied, W. Chen, and N. Duan. AGIEval: A human-centric benchmark for evaluating foundation models. In K. Duh, H. Gomez, and S. Bethard, editors,Findings of the Association for Computational Linguistics: NAACL 2024, pages 2299–2314, Mexico City, Mexico, June 2024. Association for Computational Linguistics. doi: 10.18653/v1/ 2024.findings-naacl.149. URL https://aclanthology.org/2024.findings-naacl.149/. F. Zhou, Z. Wang, N. Ranjan, Z. Cheng, L. Tang, G. He, Z. Liu, and E. P. Xing. Megamath: Pushing the limits of open math corpora.arXiv preprint arXiv:2504.02807, 2025. Preprint. Z. Zhu, C. Xie, X. Lv, and slime Contributors. slime: An llm post-training framework for rl scaling. https://github.com/THUDM/slime, 2025. GitHub repository. Corresponding author: Xin Lv. B. Zoph, I. Bello, S. Kumar, N. Du, Y. Huang, J. Dean, N. Shazeer, and W. Fedus. St-moe: Designing stable and transferable sparse expert models, 2022. URL https://arxiv.org/abs/2202.08906. 36 Marco-MoE : Open Multilingual Mixture-of-Expert Language Models with Efficient Upcycling A. Per-Benchmark Results across Pre-training Phases Benchmark (Metric) Stage-1 Stage-2 Stage-3 Stage-

Evidence payload

{
  "printed_excerpt": "URL https://arxiv.org/abs/2506.05176. Y. Zhang, M. Konomi, C. Xypolopoulos, K. Divriotis, K. Skianis, G. Nikolentzos, G. Stamou, G. Shang, and M. Vazirgiannis. Greekmmlu: A native-sourced multitask benchmark for evaluating language models i",
  "reconstructed_doi": "10.18653/v1/2024.findings-naacl.149.URL",
  "ref_index": 10,
  "resolved_title": null,
  "verdict_class": "incontrovertible"
}