SMT-LLM builds a constraint graph from PyPI metadata and AST-derived imports, solves it with Z3, and uses LLM imputation only when needed, resolving 83.6% of HG2.9K snippets versus PLLM's 54.8% while cutting median time by 6.3x and LLM calls by 11x.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.SE 3roles
background 1polarities
background 1representative citing papers
A custom LLM agent achieves 94% manually verified success on a new benchmark of 35 software analysis setups, outperforming baselines at 77%, but struggles with stage mixing, error localization, and overestimating its own success.
Empirical study of 3977 agent trajectories finds Python execution errors correlate with lower success rates on GitHub issues, flags challenging errors, and reports three confirmed bugs in the SWE-Bench platform.
citing papers explorer
-
Breaking the Dependency Chaos: A Constraint-Driven Python Dependency Resolution Strategy with Selective LLM Imputation
SMT-LLM builds a constraint graph from PyPI metadata and AST-derived imports, solves it with Z3, and uses LLM imputation only when needed, resolving 83.6% of HG2.9K snippets versus PLLM's 54.8% while cutting median time by 6.3x and LLM calls by 11x.
-
Evaluating LLM Agents on Automated Software Analysis Tasks
A custom LLM agent achieves 94% manually verified success on a new benchmark of 35 software analysis setups, outperforming baselines at 77%, but struggles with stage mixing, error localization, and overestimating its own success.
-
Beyond Final Code: A Process-Oriented Error Analysis of Software Development Agents in Real-World GitHub Scenarios
Empirical study of 3977 agent trajectories finds Python execution errors correlate with lower success rates on GitHub issues, flags challenging errors, and reports three confirmed bugs in the SWE-Bench platform.