IEEE Transactions on Software Engineering SE-2(4), 308–320 (1976)
6 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background 2
Citation-polarity summary: background 2
Verdicts: UNVERDICTED 6
citing papers explorer
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
  SWE-bench reveals that even top language models like Claude 2 resolve only 1.96% of 2,294 real-world GitHub issues, highlighting a gap in practical coding capabilities.
- EnCoDe: Energy Estimation of Source Code At Design-Time
  EnCoDe enables design-time prediction of block-level energy consumption in Python code via static features and ML models trained on a dataset from 18,000 programs, achieving R²=0.75 and 80.6% hotspot classification accuracy.
- Process-Centric Analysis of Agentic Software Systems
  Graphectory turns stochastic agent trajectories into analyzable graphs, showing that stronger models and successful fixes follow coherent localization-validation steps while failures are chaotic, and online detection plus rollback improves resolution rates by 6.9–23.5%.
- Autark: A Serverless Toolkit for Prototyping Urban Visual Analytics Systems
  Autark is a serverless toolkit that enables rapid prototyping of urban visual analytics systems via domain-aware abstractions and supports more reliable LLM-assisted coding.
- Statistical Effort Modelling of Game Resource Localisation Attacks
  Full instantiation of a statistical effort modeling method for game resource localisation attacks on two use cases, confirming feasibility for MATE attack analysis.
- A Non-Destructive Methodological Framework for Modernizing Legacy Clinical Reporting Systems for AI-Driven Pharmacoinformatics: A SAS Case Study
  A metadata framework modernizes legacy SAS clinical reporting for AI by adding a non-destructive wrapper layer, achieving 92% code reduction on consolidation and high report parity in validations.