CodeMirage: Hallucinations in Code Generated by Large Language Models

Agarwal, V · 2024 · arXiv 2408.08333

7 Pith papers cite this work.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

When LLMs Invent Rust Crates: An Empirical Study of Hallucination Patterns and Mitigation

cs.SE · 2026-06-07 · unverdicted · novelty 7.0 · ref 1

The first empirical study of crate hallucination in LLM-generated Rust code finds rates that are consistent across models and insensitive to parameters, and it tests prompt-based mitigation.

Library Hallucinations in LLM-Generated Code: A Risk Analysis Grounded in Developer Queries

cs.SE · 2025-09-26 · unverdicted · novelty 7.0 · ref 2

A study of seven LLMs finds that realistic prompt variations trigger library hallucinations: one-character misspellings in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%. It also introduces LibHalluBench for evaluation.

FASE: Fast Adaptive Semantic Entropy for Code Quality

cs.SE · 2026-06-08 · unverdicted · novelty 6.0 · ref 1

FASE approximates functional correctness using minimum spanning trees (MST) over structural and semantic dissimilarity graphs. On HumanEval and BigCodeBench it reports 25% better Spearman correlation and 19% better ROC-AUC than LLM-based semantic entropy, at 0.3% of the runtime cost.

Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code

cs.SE · 2026-05-06 · accept · novelty 6.0 · ref 3

A review of 114 studies builds taxonomies of code and data quality issues, formalizes 18 mechanisms by which training-data defects propagate into defects in LLM-generated code, and synthesizes detection and mitigation techniques.

ALL-FEM: Agentic Large Language models Fine-tuned for Finite Element Methods

cs.CE · 2026-01-08 · unverdicted · novelty 6.0 · ref 11

ALL-FEM fine-tunes LLMs on a corpus of verified FEniCS scripts and uses multi-agent workflows to automate finite element code generation, achieving 71.79% success on 39 benchmarks across elasticity, flow, and coupled problems.

MAFIG: Multi-agent Driven Formal Instruction Generation Framework

cs.AI · 2026-04-13 · unverdicted · novelty 5.0 · ref 36

MAFIG combines a Perception Agent and an Emergency Decision Agent with span-focused local distillation so that lightweight models can rapidly generate formal instructions that fix local scheduling failures. It achieves over 94% success with sub-second latency on port, warehousing, and deck datasets.

Vibe coding for clinicians: democratising bespoke software development for digital health innovation

cs.HC · 2026-04-24 · unverdicted · novelty 3.0 · ref 18

Vibe coding enables clinicians to prototype digital health tools by prompting LLMs in natural language, democratizing bespoke software development.
