De-Hallucinator: Mitigating LLM Hallucinations in Code Generation Tasks via Iterative Grounding

Aryaz Eghbali, Michael Pradel · 2024 · arXiv 2401.01701

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 1 · support 1

representative citing papers

RepairAgent: An Autonomous, LLM-Based Agent for Program Repair

cs.SE · 2024-03-25 · conditional · novelty 8.0

RepairAgent autonomously repairs 164 bugs on Defects4J including 39 not fixed by prior techniques by treating an LLM as an agent that invokes tools via a finite state machine and dynamic prompts.

MR-Coupler: Automated Metamorphic Test Generation via Functional Coupling Analysis

cs.SE · 2026-04-11 · conditional · novelty 7.0

MR-Coupler leverages functional coupling analysis and LLMs to generate valid metamorphic test cases for over 90% of tasks while detecting 44% of real bugs, outperforming baselines by 64.90% in validity and 36.56% in false-alarm reduction.

Story Point Estimation Using Large Language Models

cs.SE · 2026-03-06 · unverdicted · novelty 7.0

LLMs predict story points better in zero-shot prompting than supervised deep learning models trained on 80% of project data, with few-shot examples and comparative judgments further improving performance.

Library Hallucinations in LLM-Generated Code: A Risk Analysis Grounded in Developer Queries

cs.SE · 2025-09-26 · unverdicted · novelty 7.0

A study of seven LLMs finds that realistic prompt variations such as one-character misspellings trigger library hallucinations in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%, and introduces LibHalluBench for evaluation.

Task Abstention for Large Language Models in Code Generation

cs.SE · 2026-05-16 · unverdicted · novelty 6.0

A distribution-free abstention rule grounded in multiple hypothesis testing uses execution consistency to let code LLMs avoid hallucination-prone tasks with theoretical guarantees.

uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs

cs.CR · 2026-05-15 · unverdicted · novelty 6.0

uGen is the first retrieval-augmented multi-agent LLM framework for generating functionally correct microarchitectural attack PoCs, reporting up to 100% success on Spectre-v1 and 80% on Prime+Probe at low cost.

What Makes Software Bugs Escape Testing? Evidence from a Large-Scale Empirical Study

cs.SE · 2026-04-29 · unverdicted · novelty 6.0

Post-release defects concentrate in older, frequently modified high-churn components and require longer and more complex fixes than pre-release defects.

MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing

cs.SE · 2024-08-28 · unverdicted · novelty 6.0

MR-Adopt deduces input transformations from hard-coded MR test cases using LLMs, data-flow refinement, and output-relation selection to enable reuse with new source inputs.

Retrieval-Augmented Generation for AI-Generated Content: A Survey

cs.CV · 2024-02-29 · accept · novelty 5.0

A survey classifying RAG foundations for AIGC, summarizing enhancements, cross-modal applications, benchmarks, limitations, and future directions.

citing papers explorer

RepairAgent: An Autonomous, LLM-Based Agent for Program Repair cs.SE · 2024-03-25 · conditional · none · ref 35
MR-Coupler: Automated Metamorphic Test Generation via Functional Coupling Analysis cs.SE · 2026-04-11 · conditional · none · ref 19
Story Point Estimation Using Large Language Models cs.SE · 2026-03-06 · unverdicted · none · ref 25
Library Hallucinations in LLM-Generated Code: A Risk Analysis Grounded in Developer Queries cs.SE · 2025-09-26 · unverdicted · none · ref 13
Task Abstention for Large Language Models in Code Generation cs.SE · 2026-05-16 · unverdicted · none · ref 11
uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs cs.CR · 2026-05-15 · unverdicted · none · ref 8
What Makes Software Bugs Escape Testing? Evidence from a Large-Scale Empirical Study cs.SE · 2026-04-29 · unverdicted · none · ref 55
MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing cs.SE · 2024-08-28 · unverdicted · none · ref 9
Retrieval-Augmented Generation for AI-Generated Content: A Survey cs.CV · 2024-02-29 · accept · none · ref 241
