HardTests: Synthesizing high-quality test cases for LLM coding

Zhongmou He, Yee Man Choi, Kexun Zhang, Jiabao Ji, Junting Zhou, Dejia Xu, Ivan Bercovich, Aidan Zhang, Lei Li · 2025 · arXiv 2505.24098

3 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale

cs.LG · 2026-05-14 · conditional · novelty 7.0

FrontierSmith automates synthesis of open-ended coding problems from closed-ended seeds and shows measurable gains on two open-ended LLM coding benchmarks.

VeriContest: A Competitive-Programming Benchmark for Verifiable Code Generation

cs.SE · 2026-05-08 · unverdicted · novelty 6.0

VeriContest supplies 946 problems with specs, code, proofs, and tests to benchmark verifiable code generation in Rust/Verus, showing models reach 92% on code but only 5% end-to-end on full verifiable synthesis.

Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems

cs.RO · 2026-04-22 · unverdicted · novelty 6.0

Waypoint-based bi-level planning with curriculum RLVR improves multi-robot task success rates in dense-obstacle benchmarks over motion-agnostic and VLA baselines.

citing papers explorer

Showing 3 of 3 citing papers.

FrontierSmith: Synthesizing Open-Ended Coding Problems at Scale
cs.LG · 2026-05-14 · conditional · none · ref 10
FrontierSmith automates synthesis of open-ended coding problems from closed-ended seeds and shows measurable gains on two open-ended LLM coding benchmarks.

VeriContest: A Competitive-Programming Benchmark for Verifiable Code Generation
cs.SE · 2026-05-08 · unverdicted · none · ref 18
VeriContest supplies 946 problems with specs, code, proofs, and tests to benchmark verifiable code generation in Rust/Verus, showing models reach 92% on code but only 5% end-to-end on full verifiable synthesis.

Navigating the Clutter: Waypoint-Based Bi-Level Planning for Multi-Robot Systems
cs.RO · 2026-04-22 · unverdicted · none · ref 118
Waypoint-based bi-level planning with curriculum RLVR improves multi-robot task success rates in dense-obstacle benchmarks over motion-agnostic and VLA baselines.
