arXiv.org

Gao, Cuiyun, Fan, Guodong, Chong, Chun Yong, Chen, Shizhan, Liu, Chao, Lo, David

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

cs.LG · 2026-05-07 · conditional · novelty 7.0

Delulu is a multi-lingual benchmark showing that top code LLMs still hallucinate in FIM tasks, with the strongest model reaching only 84.5% pass@1.

Delulu: A Verified Multi-Lingual Benchmark for Code Hallucination Detection in Fill-in-the-Middle Tasks
cs.LG · 2026-05-07 · conditional · none · ref 19