EM-Assist: Safe Automated ExtractMethod Refactoring with LLMs
8 Pith papers cite this work.
fields: cs.SE (8)
roles: background (3)
citing papers explorer
- ConCovUp: Effective Agent-Based Test Driver Generation for Concurrency Testing
  ConCovUp uses static analysis to ground LLM test generation and backward tracing to produce concurrent test drivers that raise average shared-memory access pair coverage from 36.6% to 68.1% on nine real-world libraries.
- Do AI Models Dream of Faster Code? An Empirical Study on LLM-Proposed Performance Improvements in Real-World Software
  LLMs propose volatile performance improvements on real-world Java tasks that lag human developers on average, showing algorithmic benchmarks overestimate capabilities.
- Program Structure-aware Language Models: Targeted Software Testing beyond Textual Semantics
  GLMTest integrates code property graphs and GNNs with LLMs to steer test case generation toward targeted branches, raising branch accuracy from 27.4% to 50.2% on the TestGenEval benchmark.
- Learned or Memorized? Quantifying Memorization Advantage in Code LLMs
  A perturbation method shows memorization advantage in code LLMs varies widely by model and task, remaining low on CVEFixes and Defects4J benchmarks.
- CoCoMUT: A Tool for Code-Context Mining and Automated Dataset Generation
  CoCoMUT is a reusable pipeline that discovers project structure, constructs call graphs, extracts source, reconciles bytecode to source, and emits versioned JSON datasets of method contexts, demonstrated on 20 Java repositories with 97.8% reconciliation and 99% audit accuracy.
- A Blueprint for AI-Driven Software Quality: Integrating LLMs with Established Standards
  Survey mapping LLM applications in software quality assurance to established standards including ISO/IEC 12207, ISO 25010, CMMI, and TMM, with case studies, challenges, and future directions.
- To Vibe Research or Not to Vibe Research? Generative AI in Qualitative Research
  Generative AI suitability in qualitative research depends primarily on the approach (small-q positivist/post-positivist or Big Q non-positivist) along with skills, ethics, and personal preferences.