KernelBench shows that even the best current LLMs generate correct and faster-than-baseline GPU kernels in fewer than 20 percent of realistic ML workloads.
Can language models solve olympiad programming?
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
Solvita is an agentic evolution system using Planner, Solver, Oracle, and Hacker agents with trainable graph knowledge networks updated by reinforcement learning on pass/fail and vulnerability signals to achieve SOTA code generation performance.
CoNL lets LLMs self-improve on non-verifiable tasks by rewarding critiques that produce better solutions in multi-agent conversations, jointly optimizing generation and judging without external feedback.
An open-source MoE code model matches GPT-4 Turbo on coding and math benchmarks while expanding to 338 languages and 128K context length.
CodeGolf Bench is a dynamic benchmark for LLM concise code generation in 60 languages, showing reasoning models reach 70.97% average human percentile on Python and C++ tasks while non-reasoning models lag.
AgentCrypt introduces a deterministic three-tier privacy framework for AI agent collaboration that uses masking and homomorphic encryption to protect data independently of model accuracy.
citing papers explorer
-
KernelBench: Can LLMs Write Efficient GPU Kernels?
KernelBench shows that even the best current LLMs generate correct and faster-than-baseline GPU kernels in fewer than 20 percent of realistic ML workloads.