arxiv: 2605.03546 · 2 revisions
ProgramBench: Can Language Models Rebuild Programs From Scratch?