Generating executable oracles to check conformance of client code to requirements of
2 Pith papers cite this work.
years
2026: 2
representative citing papers
Execution-based selectors for LLM code candidates outperform textual voting by large margins across configurations, with input generation quality mattering more than the specific aggregation rule.
citing papers explorer
- Sketch-and-Verify: Structured Inference-Time Scaling via Program Sketching
Sketch-and-Verify improves small-LLM code generation on HumanEval+ by factorizing search into K algorithmic sketches and M fillings each, outperforming flat sampling by up to 32 percentage points at matched budget while remaining cheaper than upgrading model tier.
- Semantic Voting: Execution-Grounded Consensus for LLM Code Generation
Execution-based selectors for LLM code candidates outperform textual voting by large margins across configurations, with input generation quality mattering more than the specific aggregation rule.