SpecPylot generates and validates icontract specifications for Python programs by combining LLM proposals with Crosshair symbolic execution feedback.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.SE 3roles
background 1polarities
background 1representative citing papers
LLM agents resolve fewer than half of issues while satisfying design constraints despite passing tests, as shown by a benchmark of 495 issues and 1787 constraints from six repositories.
Empirical study of 3977 agent trajectories finds Python execution errors correlate with lower success rates on GitHub issues, flags challenging errors, and reports three confirmed bugs in the SWE-Bench platform.
citing papers explorer
-
SpecPylot: Python Specification Generation using Large Language Models
SpecPylot generates and validates icontract specifications for Python programs by combining LLM proposals with Crosshair symbolic execution feedback.
-
Does Pass Rate Tell the Whole Story? Evaluating Design Constraint Compliance in LLM-based Issue Resolution
LLM agents resolve fewer than half of issues while satisfying design constraints despite passing tests, as shown by a benchmark of 495 issues and 1787 constraints from six repositories.
-
Beyond Final Code: A Process-Oriented Error Analysis of Software Development Agents in Real-World GitHub Scenarios
Empirical study of 3977 agent trajectories finds Python execution errors correlate with lower success rates on GitHub issues, flags challenging errors, and reports three confirmed bugs in the SWE-Bench platform.