arXiv preprint arXiv:2305.14591

URLhttps://arxiv · arXiv 2305.14591

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

cs.SE · 2024-03-12 · unverdicted · novelty 6.0

LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.

Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models

cs.LG · 2025-10-16 · unverdicted · novelty 5.0

GenCluster scales test-time compute via large-scale generation, behavioral clustering, ranking, and round-robin submission to achieve IOI gold medal performance with the open-weight gpt-oss-120b model.

citing papers explorer

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code · cs.SE · 2024-03-12 · unverdicted · none · ref 110
Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models · cs.LG · 2025-10-16 · unverdicted · none · ref 23
