‘ChatGPT 4.0 ghosted us while conducting literature search:’ Modeling the chatbot’s generated non-existent references using regression analysis
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing
Grok and Copilot show lower hallucination rates on references but higher rates on abstracts, while Gemini and ChatGPT control tone better but make more factual errors, as quantified by a new Hallucination Index.