A multi-language object-oriented programming benchmark for large language models. arXiv preprint arXiv:2509.26111, 2025

Shuai Wang, Liang Ding, Li Shen, Yong Luo, Han Hu, Lefei Zhang, Fu Lin · 2025 · arXiv 2509.26111

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

MirrorCode: AI can rebuild entire programs from behavior alone

cs.AI · 2026-06-29 · unverdicted · novelty 7.0

MirrorCode benchmark shows current AI models achieving up to 56% success reimplementing 25 diverse full programs from behavior alone, including a 16,000-line bioinformatics toolkit.

citing papers explorer

Showing 1 of 1 citing paper.

MirrorCode: AI can rebuild entire programs from behavior alone · cs.AI · 2026-06-29 · unverdicted · none · ref 21
