RepairAgent autonomously repairs 164 bugs on Defects4J including 39 not fixed by prior techniques by treating an LLM as an agent that invokes tools via a finite state machine and dynamic prompts.
De-hallucinator: Mitigating llm hallucina- tions in code generation tasks via iterative grounding
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2representative citing papers
MR-Coupler leverages functional coupling analysis and LLMs to generate valid metamorphic test cases for over 90% of tasks while detecting 44% of real bugs, outperforming baselines by 64.90% in validity and 36.56% in false-alarm reduction.
LLMs predict story points better in zero-shot prompting than supervised deep learning models trained on 80% of project data, with few-shot examples and comparative judgments further improving performance.
A study of seven LLMs finds that realistic prompt variations such as one-character misspellings trigger library hallucinations in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%, and introduces LibHalluBench for evaluation.
A distribution-free abstention rule grounded in multiple hypothesis testing uses execution consistency to let code LLMs avoid hallucination-prone tasks with theoretical guarantees.
uGen is the first retrieval-augmented multi-agent LLM framework for generating functionally correct microarchitectural attack PoCs, reporting up to 100% success on Spectre-v1 and 80% on Prime+Probe at low cost.
Post-release defects concentrate in older, frequently modified high-churn components and require longer and more complex fixes than pre-release defects.
MR-Adopt deduces input transformations from hard-coded MR test cases using LLMs, data-flow refinement, and output-relation selection to enable reuse with new source inputs.
A survey classifying RAG foundations for AIGC, summarizing enhancements, cross-modal applications, benchmarks, limitations, and future directions.
citing papers explorer
-
RepairAgent: An Autonomous, LLM-Based Agent for Program Repair
RepairAgent autonomously repairs 164 bugs on Defects4J including 39 not fixed by prior techniques by treating an LLM as an agent that invokes tools via a finite state machine and dynamic prompts.
-
MR-Coupler: Automated Metamorphic Test Generation via Functional Coupling Analysis
MR-Coupler leverages functional coupling analysis and LLMs to generate valid metamorphic test cases for over 90% of tasks while detecting 44% of real bugs, outperforming baselines by 64.90% in validity and 36.56% in false-alarm reduction.
-
Story Point Estimation Using Large Language Models
LLMs predict story points better in zero-shot prompting than supervised deep learning models trained on 80% of project data, with few-shot examples and comparative judgments further improving performance.
-
Library Hallucinations in LLM-Generated Code: A Risk Analysis Grounded in Developer Queries
A study of seven LLMs finds that realistic prompt variations such as one-character misspellings trigger library hallucinations in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%, and introduces LibHalluBench for evaluation.
-
Task Abstention for Large Language Models in Code Generation
A distribution-free abstention rule grounded in multiple hypothesis testing uses execution consistency to let code LLMs avoid hallucination-prone tasks with theoretical guarantees.
-
uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs
uGen is the first retrieval-augmented multi-agent LLM framework for generating functionally correct microarchitectural attack PoCs, reporting up to 100% success on Spectre-v1 and 80% on Prime+Probe at low cost.
-
What Makes Software Bugs Escape Testing? Evidence from a Large-Scale Empirical Study
Post-release defects concentrate in older, frequently modified high-churn components and require longer and more complex fixes than pre-release defects.
-
MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing
MR-Adopt deduces input transformations from hard-coded MR test cases using LLMs, data-flow refinement, and output-relation selection to enable reuse with new source inputs.
-
Retrieval-Augmented Generation for AI-Generated Content: A Survey
A survey classifying RAG foundations for AIGC, summarizing enhancements, cross-modal applications, benchmarks, limitations, and future directions.