CodeMirage: Hallucinations in Code Generated by Large Language Models
7 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- When LLMs Invent Rust Crates: An Empirical Study of Hallucination Patterns and Mitigation
  The first empirical study of crate hallucination in LLM-generated Rust code finds hallucination rates that are consistent across models and insensitive to parameters, and tests prompt-based mitigation.
- Library Hallucinations in LLM-Generated Code: A Risk Analysis Grounded in Developer Queries
  A study of seven LLMs finds that realistic prompt variations trigger library hallucinations: one-character misspellings in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%. The study also introduces LibHalluBench for evaluation.
- FASE: Fast Adaptive Semantic Entropy for Code Quality
  FASE approximates functional correctness via MST on structural and semantic dissimilarity graphs, reporting 25% better Spearman correlation and 19% better ROC AUC than LLM-based semantic entropy at 0.3% of the runtime cost on HumanEval and BigCodeBench.
- Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code
  A review of 114 studies creates taxonomies for code and data quality issues, formalizes 18 propagation mechanisms from training data defects to LLM-generated code defects, and synthesizes detection and mitigation techniques.
- ALL-FEM: Agentic Large Language models Fine-tuned for Finite Element Methods
  ALL-FEM fine-tunes LLMs on a corpus of verified FEniCS scripts and uses multi-agent workflows to automate finite element code generation, achieving 71.79% success on 39 benchmarks across elasticity, flow, and coupled problems.
- MAFIG: Multi-agent Driven Formal Instruction Generation Framework
  MAFIG combines a Perception Agent, an Emergency Decision Agent, and span-focused local distillation so that lightweight models can rapidly generate formal instructions that fix local scheduling failures, achieving over 94% success with sub-second latency on port, warehousing, and deck datasets.
- Vibe coding for clinicians: democratising bespoke software development for digital health innovation
  Vibe coding enables clinicians to prototype digital health tools by prompting LLMs in natural language, democratizing bespoke software development.