Exploring and Evaluating Hallucinations in LLM-Powered Code Generation

Fang Liu, Yang Liu, Lin Shi, Houkun Huang, Ruifeng Wang, Zhen Yang, Li Zhang, Zhongqi Li, Yuchi Ma · 2024 · arXiv 2404.00971

14 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Evaluating the Environmental Impact of using SLMs and Prompt Engineering for Code Generation

cs.SE · 2026-04-03 · unverdicted · novelty 7.0

Chain-of-Thought prompting balances high accuracy with low energy use in small language models for code generation, while multi-sampling strategies add high energy costs for small accuracy gains.

Library Hallucinations in LLM-Generated Code: A Risk Analysis Grounded in Developer Queries

cs.SE · 2025-09-26 · unverdicted · novelty 7.0

A study of seven LLMs finds that realistic prompt variations such as one-character misspellings trigger library hallucinations in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%, and introduces LibHalluBench for evaluation.

Uncertainty Quantification for LLM-based Code Generation

cs.SE · 2026-05-12 · unverdicted · novelty 6.0

RisCoSet applies multiple hypothesis testing to construct risk-controlling partial-program prediction sets for LLM code generation, achieving up to 24.5% less code removal than prior methods at equivalent risk levels.

SOMA: Efficient Multi-turn LLM Serving via Small Language Model

cs.CL · 2026-05-11 · unverdicted · novelty 6.0

SOMA estimates a local response manifold from early turns and adapts a small surrogate model via divergence-maximizing prompts and localized LoRA fine-tuning for efficient multi-turn serving.

Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code

cs.SE · 2026-05-06 · accept · novelty 6.0

A review of 114 studies creates taxonomies for code and data quality issues, formalizes 18 propagation mechanisms from training data defects to LLM-generated code defects, and synthesizes detection and mitigation techniques.

iCoRe: An Iterative Correlation-Aware Retriever for Bug Reproduction Test Generation

cs.SE · 2026-04-21 · conditional · novelty 6.0

iCoRe improves Fail-to-Pass rates to 42.0% and 52.8% on two bug reproduction benchmarks by using correlation-aware iterative retrieval instead of standard semantic or BM25 methods.

SOCIA-EVO: Automated Simulator Construction via Dual-Anchored Bi-Level Optimization

cs.AI · 2026-04-19 · unverdicted · novelty 6.0

SOCIA-EVO generates statistically consistent simulators by separating structural refinement from parameter calibration via bi-level optimization and falsifying strategies through execution feedback in a Bayesian-weighted playbook.

ContextCov: Deriving and Enforcing Executable Constraints from Agent Instruction Files

cs.SE · 2026-02-28 · unverdicted · novelty 6.0

ContextCov compiles agent instruction files into static, runtime, and architectural guardrails, raising constraint compliance to 88.3% on SWE-bench Lite tasks versus 67% and 50.3% for prompt and reflection baselines.

FELA: A Multi-Agent Evolutionary System for Feature Engineering of Industrial Event Log Data

cs.AI · 2025-10-29 · unverdicted · novelty 6.0

FELA deploys specialized LLM agents in an evolutionary framework to generate, validate, and refine explainable features from heterogeneous industrial event logs, improving downstream model performance.

Multi-LLM Orchestration for High-Quality Code Generation: Exploiting Complementary Model Strengths

cs.SE · 2025-10-01 · conditional · novelty 6.0

PerfOrch is a four-agent multi-LLM system that uses offline profiling to build language-and-category rankings for routing tasks, achieving 97.19% and 95.83% pass@1 on HumanEval-X and EffiBench-X with generalization across benchmarks.

A Large Language Model Approach to Generating Bypass Rules for Malware Evasion in Analysis Sandbox

cs.CR · 2026-05-20 · unverdicted · novelty 5.0

ABLE uses LLMs with sanitization and iterative refinement to generate bypass YARA rules from malware traces, achieving 79% success on 334 samples and 47% more family detections.

ClusterFusion++: Expanding Cluster-Level Fusion to Full Transformer-Block Decoding

cs.DC · 2026-04-26 · unverdicted · novelty 5.0

ClusterFusion++ fuses the entire Transformer block (LayerNorm to residual) via CUDA extensions and achieves 1.34x throughput on Pythia-2.8B with near-identical output fidelity.

Can LLMs be Effective Code Contributors? A Study on Open-source Projects

cs.SE · 2026-04-25 · unverdicted · novelty 5.0

LLMs achieve only 0-60% success when asked to contribute code to sizable open-source projects, often failing basic checks or simply repeating training data.

Context-Guided Decompilation: A Step Towards Re-executability

cs.SE · 2025-11-03 · unverdicted · novelty 5.0

ICL4Decomp applies in-context learning to guide LLMs in generating re-executable decompiled code from binaries, reporting roughly 40% higher re-executability than prior methods across datasets and optimization levels.

citing papers explorer

Showing 14 of 14 citing papers.

Evaluating the Environmental Impact of using SLMs and Prompt Engineering for Code Generation cs.SE · 2026-04-03 · unverdicted · none · ref 24
Library Hallucinations in LLM-Generated Code: A Risk Analysis Grounded in Developer Queries cs.SE · 2025-09-26 · unverdicted · none · ref 30
Uncertainty Quantification for LLM-based Code Generation cs.SE · 2026-05-12 · unverdicted · none · ref 56
SOMA: Efficient Multi-turn LLM Serving via Small Language Model cs.CL · 2026-05-11 · unverdicted · none · ref 29
Bridging Generation and Training: A Systematic Review of Quality Issues in LLMs for Code cs.SE · 2026-05-06 · accept · none · ref 70
iCoRe: An Iterative Correlation-Aware Retriever for Bug Reproduction Test Generation cs.SE · 2026-04-21 · conditional · none · ref 20
SOCIA-EVO: Automated Simulator Construction via Dual-Anchored Bi-Level Optimization cs.AI · 2026-04-19 · unverdicted · none · ref 87
ContextCov: Deriving and Enforcing Executable Constraints from Agent Instruction Files cs.SE · 2026-02-28 · unverdicted · none · ref 46
FELA: A Multi-Agent Evolutionary System for Feature Engineering of Industrial Event Log Data cs.AI · 2025-10-29 · unverdicted · none · ref 15
Multi-LLM Orchestration for High-Quality Code Generation: Exploiting Complementary Model Strengths cs.SE · 2025-10-01 · conditional · none · ref 40
A Large Language Model Approach to Generating Bypass Rules for Malware Evasion in Analysis Sandbox cs.CR · 2026-05-20 · unverdicted · none · ref 60
ClusterFusion++: Expanding Cluster-Level Fusion to Full Transformer-Block Decoding cs.DC · 2026-04-26 · unverdicted · none · ref 6
Can LLMs be Effective Code Contributors? A Study on Open-source Projects cs.SE · 2026-04-25 · unverdicted · none · ref 9
Context-Guided Decompilation: A Step Towards Re-executability cs.SE · 2025-11-03 · unverdicted · none · ref 40
