LibEvoBench benchmark shows LLMs are version-oblivious on evolving APIs, with documentation helping but version specification not.
LLM Hallucinations in Practical Code Generation : Phenomena , Mechanism , and Mitigation
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.SE 5roles
background 1polarities
background 1representative citing papers
A study of seven LLMs finds that realistic prompt variations such as one-character misspellings trigger library hallucinations in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%, and introduces LibHalluBench for evaluation.
EyeMulator augments CodeLLM fine-tuning loss with token weights derived from human eye-tracking scan paths, producing large gains on code translation and summarization across StarCoder, Llama-3.2 and DeepSeek-Coder.
A review of 114 studies creates taxonomies for code and data quality issues, formalizes 18 propagation mechanisms from training data defects to LLM-generated code defects, and synthesizes detection and mitigation techniques.
Proposes guidance for responsible AI use in scientific software development under NQA-1 standards, illustrated with TMAP8 V&V cases to ensure accountability and auditability.
citing papers explorer
-
Bridging the Gap on AI-Assisted Scientific Software Development Through Transparency and Traceability
Proposes guidance for responsible AI use in scientific software development under NQA-1 standards, illustrated with TMAP8 V&V cases to ensure accountability and auditability.