KernelBench shows that even the best current LLMs generate correct and faster-than-baseline GPU kernels in fewer than 20 percent of realistic ML workloads.
Can language models solve olympiad programming?arXiv preprint arXiv:2404.10952, 2024
5 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
Solvita is an agentic evolution system using Planner, Solver, Oracle, and Hacker agents with trainable graph knowledge networks updated by reinforcement learning on pass/fail and vulnerability signals to achieve SOTA code generation performance.
CoNL lets LLMs self-improve on non-verifiable tasks by rewarding critiques that produce better solutions in multi-agent conversations, jointly optimizing generation and judging without external feedback.
An open-source MoE code model matches GPT-4 Turbo on coding and math benchmarks while expanding to 338 languages and 128K context length.
AgentCrypt introduces a deterministic three-tier privacy framework for AI agent collaboration that uses masking and homomorphic encryption to protect data independently of model accuracy.
citing papers explorer
-
KernelBench: Can LLMs Write Efficient GPU Kernels?
KernelBench shows that even the best current LLMs generate correct and faster-than-baseline GPU kernels in fewer than 20 percent of realistic ML workloads.
-
Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution
Solvita is an agentic evolution system using Planner, Solver, Oracle, and Hacker agents with trainable graph knowledge networks updated by reinforcement learning on pass/fail and vulnerability signals to achieve SOTA code generation performance.
-
Conversation for Non-verifiable Learning: Self-Evolving LLMs through Meta-Evaluation
CoNL lets LLMs self-improve on non-verifiable tasks by rewarding critiques that produce better solutions in multi-agent conversations, jointly optimizing generation and judging without external feedback.
-
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
An open-source MoE code model matches GPT-4 Turbo on coding and math benchmarks while expanding to 338 languages and 128K context length.
-
AgentCrypt: Advancing Privacy and (Secure) Computation in AI Agent Collaboration
AgentCrypt introduces a deterministic three-tier privacy framework for AI agent collaboration that uses masking and homomorphic encryption to protect data independently of model accuracy.