AIDev is a new open dataset of 456k AI-agent pull requests showing agents submit code faster than humans but with lower acceptance rates and simpler changes.
Feedbackeval: A benchmark for evaluating large language models in feedback-driven code repair tasks
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
LLM agents in a solver-aware harness recover global constraints from MIP formulations, generate executable propagation-only handlers for SCIP, and solve five additional MIPLIB 2017 instances.
A survey that organizes existing work on LLM-based agents around code as the central harness, structured in three layers of interfaces, mechanisms, and multi-agent scaling, with applications across domains and listed open challenges.
An external controller for frozen LLMs raises strict validation success on three RL coding tasks from 0/9 to 8/9 by selecting memory records and skills, running fail-fast checks, and propagating credit via eligibility traces.
citing papers explorer
-
Agentic MIP Research: Accelerated Constraint Handler Generation
LLM agents in a solver-aware harness recover global constraints from MIP formulations, generate executable propagation-only handlers for SCIP, and solve five additional MIPLIB 2017 instances.