Livecodebench pro: How do olympiad medalists judge llms in competitive programming?

Zihan Zheng, Zerui Cheng, Zeyu Shen, Shang Zhou, Kaiyuan Liu, Hansen He, Dongruixuan Li, Stanley Wei, Hangyi Hao, Jianzhu Yao, et al · 2025 · arXiv 2506.11928

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

read on arXiv browse 6 citing papers

citation-role summary

dataset 1

citation-polarity summary

use dataset 1

representative citing papers

Syntax Without Semantics: Teaching Large Language Models to Code in an Unseen Language

cs.CL · 2026-05-15 · unverdicted · novelty 7.0

Fine-tuning LLMs on an unseen language teaches syntax but fails to transfer semantic competence, leaving Python with up to a 19% performance advantage and no tested intervention closing the gap.

CodeSpecBench: Benchmarking LLMs for Executable Behavioral Specification Generation

cs.SE · 2026-04-14 · accept · novelty 7.0

CodeSpecBench shows LLMs achieve at most 20.2% pass rate on repository-level executable behavioral specification generation, revealing that strong code generation does not imply deep semantic understanding.

How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data

cs.CL · 2026-03-23 · conditional · novelty 7.0

TESSY creates stylistically consistent synthetic data via teacher-student token interleaving, yielding 11.25% and 6.68% gains on code benchmarks where pure teacher data causes 3.25% and 10.02% drops.

OpenDeepThink: Parallel Reasoning via Bradley-Terry Aggregation

cs.AI · 2026-05-14 · conditional · novelty 6.0 · 2 refs

OpenDeepThink uses Bradley-Terry aggregation of LLM pairwise judgments to rank and evolve parallel reasoning traces, improving Gemini 3.1 Pro Codeforces Elo by 405 points over eight rounds.

When Independent Sampling Outperforms Agentic Reasoning

cs.LG · 2026-05-08 · unverdicted · novelty 5.0

On Codeforces problems, independent k-shot sampling achieves better accuracy-cost and accuracy-query tradeoffs than agentic reasoning, even with prompt caching.

Seed1.8 Model Card: Towards Generalized Real-World Agency

cs.AI · 2026-03-21 · unverdicted · novelty 5.0

Seed1.8 is a new foundation model that adds unified agentic capabilities for search, code execution, and GUI interaction to existing LLM and vision strengths.

citing papers explorer

Showing 6 of 6 citing papers.

Syntax Without Semantics: Teaching Large Language Models to Code in an Unseen Language cs.CL · 2026-05-15 · unverdicted · none · ref 14
Fine-tuning LLMs on an unseen language teaches syntax but fails to transfer semantic competence, leaving Python with up to a 19% performance advantage and no tested intervention closing the gap.
CodeSpecBench: Benchmarking LLMs for Executable Behavioral Specification Generation cs.SE · 2026-04-14 · accept · none · ref 42
CodeSpecBench shows LLMs achieve at most 20.2% pass rate on repository-level executable behavioral specification generation, revealing that strong code generation does not imply deep semantic understanding.
How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data cs.CL · 2026-03-23 · conditional · none · ref 50
TESSY creates stylistically consistent synthetic data via teacher-student token interleaving, yielding 11.25% and 6.68% gains on code benchmarks where pure teacher data causes 3.25% and 10.02% drops.
OpenDeepThink: Parallel Reasoning via Bradley-Terry Aggregation cs.AI · 2026-05-14 · conditional · none · ref 23 · 2 links
OpenDeepThink uses Bradley-Terry aggregation of LLM pairwise judgments to rank and evolve parallel reasoning traces, improving Gemini 3.1 Pro Codeforces Elo by 405 points over eight rounds.
When Independent Sampling Outperforms Agentic Reasoning cs.LG · 2026-05-08 · unverdicted · none · ref 24
On Codeforces problems, independent k-shot sampling achieves better accuracy-cost and accuracy-query tradeoffs than agentic reasoning, even with prompt caching.
Seed1.8 Model Card: Towards Generalized Real-World Agency cs.AI · 2026-03-21 · unverdicted · none · ref 93
Seed1.8 is a new foundation model that adds unified agentic capabilities for search, code execution, and GUI interaction to existing LLM and vision strengths.

Livecodebench pro: How do olympiad medalists judge llms in competitive programming?

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer