Multimodal fine-tuning of LLMs for robust document visual question answering
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing
Grok and Copilot hallucinate less on references but more on abstracts, while Gemini and ChatGPT control tone better but make more factual errors, as quantified by a new Hallucination Index.